# Hatchways Agentic Assessment Evaluation Rubric

This rubric translates the HotTea packet into the specific decisions a Hatchways-style practical assessment reviewer needs to make.

## Evaluation Flow

1. Correctness
   - Reviewer question: Did the final patch solve the assigned repo task and pass both the visible and hidden checks?
   - Evidence: public test output, hidden-test summary, GitHub Action evidence, final diff notes
   - Pass signal: Tests pass and the final explanation names the specific failure mode fixed.
   - Follow-up signal: The patch passes by coincidence, misses an edge case, or lacks a clear connection to the failing behavior.

2. Process
   - Reviewer question: Can the reviewer reconstruct how the candidate worked?
   - Evidence: Claude JSONL exports, terminal transcript, git snapshots, VM event timeline
   - Pass signal: The packet shows exploration, edit, verification, and finalization in order.
   - Follow-up signal: Code appears without supporting transcript, command, or git evidence.

3. AI collaboration
   - Reviewer question: Did the candidate use AI as an engineering aid rather than an unreviewed answer generator?
   - Evidence: prompt sequence, tool calls, AI process quality, candidate final answers
   - Pass signal: Candidate asks for analysis, inspects output, runs tests, and corrects or narrows AI suggestions.
   - Follow-up signal: One-shot generation, no verification, or claims that do not match the transcript.

4. Reviewer handoff
   - Reviewer question: Can Hatchways reviewers paste a grounded note into the existing review workflow?
   - Evidence: ATS-ready reviewer note, follow-up questions, anomaly flags, rubric score
   - Pass signal: The packet states what passed, what was observed, and which follow-up questions remain.
   - Follow-up signal: Reviewer must infer all context from raw code or live debrief.
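
The four dimensions above can be tracked as a simple checklist. The following is a minimal sketch, not the HotTea packet schema: `DIMENSIONS`, `DimensionReview`, and `open_items` are hypothetical names introduced here for illustration. It records a pass or follow-up outcome per dimension and lists whatever a reviewer still needs to resolve.

```python
from dataclasses import dataclass, field

# Hypothetical keys mirroring the four numbered dimensions in this rubric.
DIMENSIONS = ("correctness", "process", "ai_collaboration", "reviewer_handoff")


@dataclass
class DimensionReview:
    """One reviewer judgment for a single rubric dimension (illustrative only)."""
    dimension: str
    passed: bool
    evidence: list[str] = field(default_factory=list)
    follow_up: str = ""


def open_items(reviews: list[DimensionReview]) -> list[str]:
    """Return the dimensions that are unreviewed or flagged for follow-up."""
    by_name = {r.dimension: r for r in reviews}
    items = []
    for name in DIMENSIONS:
        review = by_name.get(name)
        if review is None:
            items.append(f"{name}: not reviewed")
        elif not review.passed:
            items.append(f"{name}: {review.follow_up or 'follow-up needed'}")
    return items


reviews = [
    DimensionReview("correctness", True, ["public test output"]),
    DimensionReview("process", False, ["terminal transcript"],
                    "no git snapshots for the final edit"),
]
for item in open_items(reviews):
    print(item)
```

Walking the dimensions in a fixed order keeps the open items in the same sequence as the rubric, so missing dimensions are surfaced rather than silently skipped.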

## Score Use

- Use 8-10 when correctness and process evidence are both strong.
- Use 5-7 when the solution works but the AI/process evidence needs follow-up.
- Use 1-4 when correctness or process evidence is too weak for hidden-test results alone to carry the decision.
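
The three bands can be read as a small decision rule. The sketch below is one possible encoding, assuming a three-level evidence rating; the `Evidence` enum and `score_band` function are hypothetical and not part of the packet. The rubric does not say how to treat correctness that needs follow-up alongside strong process evidence, so this sketch assumes that case falls in the middle band.

```python
from enum import Enum


class Evidence(Enum):
    """Hypothetical three-level rating for one kind of evidence."""
    STRONG = "strong"
    NEEDS_FOLLOW_UP = "needs_follow_up"
    WEAK = "weak"


def score_band(correctness: Evidence, process: Evidence) -> tuple[int, int]:
    """Map correctness and process evidence to an inclusive score range."""
    # Either kind of evidence being weak puts the packet in the lowest band.
    if Evidence.WEAK in (correctness, process):
        return (1, 4)
    # Both strong: top band.
    if correctness is Evidence.STRONG and process is Evidence.STRONG:
        return (8, 10)
    # Assumption: any remaining mix that needs follow-up lands in the middle band.
    return (5, 7)


print(score_band(Evidence.STRONG, Evidence.NEEDS_FOLLOW_UP))
```

The function returns a range rather than a single number because the choice of score within a band remains a reviewer judgment.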

## Pilot Decision

- Recommended start: Use the GitHub Actions fallback and tokenized reviewer packet for a limited technical pilot.
- Gated before enterprise rollout: real GitHub App, SSO, DPA, retention SLA, formal security review

## Not Claimed

- No automated hiring decision.
- No perfect anti-cheat.
- No official Hatchways partnership.
- No enterprise procurement readiness before the named gates are complete.

## Proof URLs

- Evaluation JSON: https://hottea.ai/hatchways/evaluation.json
- Buyer packet: https://hottea.ai/hatchways/packet.md
- Calibration packet: https://hottea.ai/hatchways/calibration.md
- Readiness scorecard: https://hottea.ai/hatchways/readiness.md
- Pilot kit: https://hottea.ai/hatchways/pilot.md
- Sample reviewer packet: https://hottea.ai/sample-report