Mike Chaves
AI Systems Engineering

Astrocade QA Calibration Tool

Built and operated a human-in-the-loop QA calibration toolchain for Astrocade's UGC moderation pipeline, improving precision/recall tuning, reducing repeat rejections, and speeding daily publishing decisions.

Python, LLM-Assisted Review, Moderation Tooling, Analytics, Human-in-the-Loop QA

Client

Astrocade AI

Date

2025 - Present

Category

AI Systems Engineering

Services

Moderation Pipeline Engineering / Calibration & Evaluation / QA Operations

Situation

Overview

Astrocade needed a reliable moderation workflow that balanced automation speed with reviewer judgment quality while maintaining a one-day review turnaround target.

Challenge

False or repeated rejections and inconsistent reviewer interpretation were creating friction for creators and slowing publishing throughput.

Context

The moderation stack included auto-review logic, multiple human review layers, and creator feedback loops that required measurable calibration over time.

Task

Overview

Design and run a QA calibration system that continuously measures moderation quality and improves decision consistency across automation and human review.

Precision/Recall Tuning

Create operational controls for threshold and policy tuning to reduce false rejects while preserving safety standards.

Workflow Throughput

Identify and remove bottlenecks across review queues, escalation paths, and feedback handling.
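
The threshold tuning described under Precision/Recall Tuning can be illustrated with a small sweep. This is a minimal sketch under stated assumptions, not the production logic: the Sample fields, the score scale, and the pick_threshold helper are hypothetical names introduced here. It assumes each audited item carries a model score and a human-confirmed label, and it picks the threshold with the highest precision (fewest false rejects) among those that still meet a recall floor standing in for the safety standard.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    score: float        # hypothetical model "violation" score in [0, 1]
    is_violation: bool  # label confirmed by a human audit


def precision_recall(samples, threshold):
    """Precision and recall of the reject decision (score >= threshold)."""
    tp = sum(1 for s in samples if s.score >= threshold and s.is_violation)
    fp = sum(1 for s in samples if s.score >= threshold and not s.is_violation)
    fn = sum(1 for s in samples if s.score < threshold and s.is_violation)
    # With no predicted (or no actual) positives the ratio is undefined;
    # report 1.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


def pick_threshold(samples, candidates, min_recall):
    """Return (threshold, precision, recall) for the most precise candidate
    whose recall is at least min_recall, or None if none qualifies."""
    best = None
    for t in candidates:
        p, r = precision_recall(samples, t)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p, r)
    return best


# Illustrative toy data only; not taken from any real review queue.
AUDITED = [
    Sample(0.95, True), Sample(0.90, True), Sample(0.80, False),
    Sample(0.70, True), Sample(0.60, False), Sample(0.40, False),
    Sample(0.30, True), Sample(0.10, False),
]

choice = pick_threshold(AUDITED, candidates=[0.25, 0.5, 0.75], min_recall=0.75)
```

Holding recall as a hard floor and optimizing precision beneath it mirrors the stated goal of reducing false rejects without relaxing safety; a real deployment would likely tune per policy category rather than with one global threshold.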

Action

01

Built calibration workflows

Implemented repeatable QA passes that compare model-assisted and human decisions, then flag disagreement patterns for targeted fixes.

02

Shipped tooling and backend fixes

Improved queue behavior, triage visibility, and reviewer ergonomics to reduce latency and increase throughput.

03

Ran recurring audits

Led calibration audits and documentation updates to align enforcement logic with reviewer judgment and policy intent.

04

Closed creator feedback loops

Integrated rejection feedback into workflow updates so recurring edge cases could be addressed systematically.
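
The calibration pass in step 01 (compare model-assisted and human decisions, then flag disagreement patterns) can be sketched roughly as follows. The record fields policy, auto, and human and the min_count cutoff are assumptions made for illustration; the actual schema is not described here.

```python
from collections import Counter


def disagreement_patterns(records, min_count=2):
    """Group items where the automated and human decisions differ by
    (policy, auto decision, human decision) and return the groups seen
    at least min_count times, most frequent first."""
    counts = Counter(
        (r["policy"], r["auto"], r["human"])
        for r in records
        if r["auto"] != r["human"]
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Illustrative toy records only; field names are hypothetical.
REVIEWS = [
    {"policy": "ip_content", "auto": "reject", "human": "approve"},
    {"policy": "ip_content", "auto": "reject", "human": "approve"},
    {"policy": "ip_content", "auto": "reject", "human": "approve"},
    {"policy": "ip_content", "auto": "reject", "human": "reject"},
    {"policy": "violence", "auto": "approve", "human": "reject"},
]

flagged = disagreement_patterns(REVIEWS)
```

Grouping by the direction of the disagreement as well as the policy keeps over-rejection and under-rejection separate, since they usually call for different fixes.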

Result

Higher moderation consistency

Improved alignment between automated decisions and human reviewers through recurring calibration and audit cycles.

Reduced repeat/false rejections

Precision and recall tuning cut unnecessary friction for creators and improved trust in review outcomes.

Faster daily operations

Pipeline and tooling improvements helped the team reliably meet the one-day turnaround target for publishing review.
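
Alignment between automated and human decisions, as described under Higher moderation consistency, needs a number to track across audit cycles. One common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an assumption about how such a metric could be computed, not a statement of the metric actually used on this project.

```python
from collections import Counter


def cohens_kappa(auto_labels, human_labels):
    """Cohen's kappa between two equal-length sequences of decisions."""
    if len(auto_labels) != len(human_labels) or not auto_labels:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(auto_labels)
    # Fraction of items where both sources made the same decision.
    observed = sum(a == h for a, h in zip(auto_labels, human_labels)) / n
    auto_freq = Counter(auto_labels)
    human_freq = Counter(human_labels)
    # Agreement expected by chance from each source's label frequencies.
    expected = sum(auto_freq[k] * human_freq[k] for k in auto_freq) / (n * n)
    if expected == 1.0:
        # Both sources used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Illustrative toy labels only.
kappa = cohens_kappa(
    ["approve", "approve", "reject", "reject"],
    ["approve", "reject", "reject", "reject"],
)
```

Tracking this value per audit cycle, rather than raw percent agreement, avoids overstating alignment when most items are approvals.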