Astrocade QA Calibration Tool

project_details.sh

$ cat astrocade-qa-calibration-tool.json

title: Astrocade QA Calibration Tool

category: development

client: Astrocade AI

date: 2025 - Present

stack: Python, LLM-Assisted Review, Moderation Tooling, Analytics, Human-in-the-Loop QA


Project Overview

Built and operated a human-in-the-loop QA calibration toolchain for Astrocade's UGC moderation pipeline, improving precision/recall tuning, reducing repeat rejections, and speeding daily publishing decisions.

Situation

Overview

Astrocade needed a reliable moderation workflow that balanced automation speed with reviewer judgment quality while maintaining a one-day review turnaround target.

Challenge

False and repeated rejections, along with inconsistent reviewer interpretation of policy, were creating friction for creators and slowing publishing throughput.

Context

The moderation stack included auto-review logic, multiple human review layers, and creator feedback loops that required measurable calibration over time.

Task

Overview

Design and run a QA calibration system that continuously measures moderation quality and improves decision consistency across automation and human review.

Precision/Recall Tuning

Create operational controls for threshold and policy tuning to reduce false rejects while preserving safety standards.
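A minimal sketch of what such threshold tuning can look like, assuming each item has a model risk score and a human ground-truth label (all names and the data shape are illustrative, not Astrocade's actual schema):

```python
def precision_recall_at(scores_labels, threshold):
    """Compute precision/recall for a reject-if-score>=threshold policy.

    scores_labels: list of (model_score, should_reject) pairs, where
    should_reject is the ground-truth label from human review.
    """
    tp = fp = fn = 0
    for score, should_reject in scores_labels:
        flagged = score >= threshold
        if flagged and should_reject:
            tp += 1          # correct reject
        elif flagged and not should_reject:
            fp += 1          # false reject (creator friction)
        elif not flagged and should_reject:
            fn += 1          # missed reject (safety miss)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep_thresholds(scores_labels, thresholds):
    """Return (threshold, precision, recall) rows to pick an operating point."""
    return [(t, *precision_recall_at(scores_labels, t)) for t in thresholds]
```

Sweeping a grid of thresholds like this makes the precision/recall trade-off explicit, so raising the bar on false rejects can be weighed directly against safety coverage.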

Workflow Throughput

Identify and remove bottlenecks across review queues, escalation paths, and feedback handling.

Action

1. Built calibration workflows

Implemented repeatable QA passes that compare model-assisted and human decisions, then flag disagreement patterns for targeted fixes.
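A hedged sketch of the core comparison in such a QA pass, assuming paired auto/human decisions per item (field names are illustrative placeholders, not the production schema):

```python
from collections import Counter


def disagreement_report(decisions):
    """Summarize where auto-review and human review disagree.

    decisions: list of dicts with 'item_id', 'auto', 'human', and
    'category' keys, where 'auto'/'human' are 'approve' or 'reject'.
    """
    disagreements = [d for d in decisions if d["auto"] != d["human"]]
    by_category = Counter(d["category"] for d in disagreements)
    rate = len(disagreements) / len(decisions) if decisions else 0.0
    return {
        "disagreement_rate": rate,
        "by_category": dict(by_category),   # hot spots for targeted fixes
        "items": [d["item_id"] for d in disagreements],
    }
```

Bucketing disagreements by content category turns a raw error rate into an actionable queue: the categories with the most auto/human splits are where calibration fixes pay off first.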

2. Shipped tooling and backend fixes

Improved queue behavior, triage visibility, and reviewer ergonomics to reduce latency and increase throughput.

3. Ran recurring audits

Led calibration audits and documentation updates to align enforcement logic with reviewer judgment and policy intent.

4. Closed creator feedback loops

Integrated rejection feedback into workflow updates so recurring edge cases could be addressed systematically.
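Closing the loop systematically can be as simple as counting which rejection reasons recur past a threshold; a minimal sketch under that assumption (data shape illustrative only):

```python
from collections import Counter


def recurring_edge_cases(feedback, min_count=2):
    """Surface rejection reasons that recur across creator feedback.

    feedback: list of (creator_id, rejection_reason) tuples. A reason
    appearing at least min_count times is a candidate for a policy or
    workflow update rather than one-off handling.
    """
    counts = Counter(reason for _, reason in feedback)
    return [(reason, n) for reason, n in counts.most_common() if n >= min_count]
```

Running this over each review cycle's feedback keeps recurring edge cases visible until a workflow or policy change actually retires them.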

Result

Higher moderation consistency

Improved alignment between automated decisions and human reviewers through recurring calibration and audit cycles.

Reduced repeat/false rejections

Precision and recall tuning cut unnecessary friction for creators and improved trust in review outcomes.

Faster daily operations

Pipeline and tooling improvements supported reliable day-level turnaround targets for publishing review.