Categorisation accuracy
91.4% vs. target ≥ 90%
Agreement with expert human review on this week's 20% audit sample (84 of 412 reviews)
Priority assignment accuracy
86.2% vs. target ≥ 85%
Agreement with manager judgement on the same audit sample
Hallucination rate (N/A reviews)
0% vs. target = 0%
Cust2024-006 ("Nice place") and 11 similar short reviews returned Tags = N/A — zero false aspect tags this week
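Each of the three scorecards above comes down to a reported value, a target, and a comparison. A minimal sketch of that check, assuming the figures are percentages exactly as reported; the `Metric` type and the `failing` helper are hypothetical names used for illustration, not part of the pilot tooling:

```python
import operator
from typing import Callable, NamedTuple


class Metric(NamedTuple):
    name: str
    value: float    # reported value, in percent
    target: float   # target value, in percent
    compare: Callable[[float, float], bool]  # how value must relate to target


# Values and targets copied from the scorecards in this report.
METRICS = [
    Metric("Categorisation accuracy", 91.4, 90.0, operator.ge),
    Metric("Priority assignment accuracy", 86.2, 85.0, operator.ge),
    Metric("Hallucination rate (N/A reviews)", 0.0, 0.0, operator.eq),
]


def failing(metrics: list[Metric]) -> list[str]:
    """Return the names of metrics that miss their target."""
    return [m.name for m in metrics if not m.compare(m.value, m.target)]


print(failing(METRICS))  # [] : all three targets are met this week
```

Keeping the comparison on the metric itself lets "at least" targets and the exact-zero hallucination target share one code path.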
Versioned prompt change log
What changed, why, and the measured effect — owned by Sankar Kumar Palaniappan
- v2.3 · Active · Deployed Jun 2, 2026
  Tightened the N/A rule for short/ambiguous reviews
  Added explicit guidance to return Aspect Tags = N/A for reviews under ~6 words (the Cust2024-006 case) instead of inferring tags from sentiment alone. Effect: hallucination rate dropped from 4% to 0%.
- v2.2 · Deployed May 14, 2026
  Added idiom-awareness examples ("killer pasta" vs. "killed my appetite")
  Injected contrastive few-shot examples so the model weighs context over keyword roots. Effect: categorisation accuracy rose from 87% to 90%.
- v2.1 · Deployed Apr 22, 2026
  Recalibrated priority criteria around business impact, not just polarity
  Re-weighted "High" toward safety/health flags and repeat-pattern complaints (e.g. "ambiance was terrifyingly unsafe" vs. "a bit quiet": same aspect, different severity). Effect: priority agreement rose from 79% to 85%.
- v1.0 · Deployed Mar 3, 2026 (pilot launch)
  Initial engineered prompt: 5-column structured output
  Baseline version defining the Sentiment / Aspect Tags / Priority / Suggested Action / 1st Reply schema used across the pilot.
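The v2.3 rule in the change log lives in the prompt itself, but its effect can be checked after the fact. The sketch below is one possible post-hoc check, not the team's actual audit script: it assumes "under ~6 words" can be treated as a hard cutoff on whitespace-separated words, and the record layout (`id`, `text`, `tags`) and function names are hypothetical. The two sample records reuse the Cust2024-006 case as reported before and after v2.3.

```python
SHORT_REVIEW_WORD_LIMIT = 6  # assumption: "under ~6 words" treated as a hard cutoff


def is_short_review(text: str, limit: int = SHORT_REVIEW_WORD_LIMIT) -> bool:
    """True when the review has fewer than `limit` whitespace-separated words."""
    return len(text.split()) < limit


def false_tag_violations(records: list[dict]) -> list[str]:
    """IDs of short reviews whose tags are anything other than N/A."""
    return [
        r["id"]
        for r in records
        if is_short_review(r["text"]) and r["tags"] != "N/A"
    ]


# Cust2024-006 as reported before v2.3 (tagged 'Service') and after (N/A).
before_v23 = [{"id": "Cust2024-006", "text": "Nice place", "tags": "Service"}]
after_v23 = [{"id": "Cust2024-006", "text": "Nice place", "tags": "N/A"}]

print(false_tag_violations(before_v23))  # ['Cust2024-006']
print(false_tag_violations(after_v23))   # []
```

A check like this only catches tags on short reviews; it says nothing about whether tags on longer reviews are correct, which is what the manual audit covers.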
This week's 20% manual audit sample
FR10 — 84 of 412 reviews independently re-scored by Sankar against the live prompt output
| Customer ID | AI sentiment | Auditor sentiment | AI priority | Auditor priority | Match? |
|---|---|---|---|---|---|
| Cust2024-001 | Positive / Negative | Positive / Negative | High | High | Yes |
| Cust2024-005 | Neutral | Neutral | Normal | Low | No (priority) |
| Cust2024-006 | Positive (Tags = N/A) | Positive (Tags = N/A) | Low | Low | Yes |
| Cust2024-008 | Negative | Negative | Normal | Normal | Yes |
| Cust2024-009 | Negative | Negative | High | High | Yes |
Sample disagreement: Cust2024-005 — AI assigned Normal, auditor judged Low ("nothing special, no real risk of churn"). Logged for the v2.4 calibration review.
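The agreement figures come from comparing each AI label with the auditor's label, row by row. A minimal sketch of that comparison over the five rows shown in the table; the `AuditRow` type and helper names are hypothetical, and because these rows are only an excerpt of the 84-review sample, the rates printed here are not expected to reproduce the headline 91.4% and 86.2%:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AuditRow:
    customer_id: str
    ai_sentiment: str
    auditor_sentiment: str
    ai_priority: str
    auditor_priority: str


def sentiment_match(row: AuditRow) -> bool:
    return row.ai_sentiment == row.auditor_sentiment


def priority_match(row: AuditRow) -> bool:
    return row.ai_priority == row.auditor_priority


def agreement_rate(rows: list[AuditRow], matches: Callable[[AuditRow], bool]) -> float:
    """Percentage of rows where the AI label equals the auditor label."""
    return 100 * sum(matches(r) for r in rows) / len(rows)


# The five audit rows displayed in the table above.
ROWS = [
    AuditRow("Cust2024-001", "Positive / Negative", "Positive / Negative", "High", "High"),
    AuditRow("Cust2024-005", "Neutral", "Neutral", "Normal", "Low"),
    AuditRow("Cust2024-006", "Positive", "Positive", "Low", "Low"),
    AuditRow("Cust2024-008", "Negative", "Negative", "Normal", "Normal"),
    AuditRow("Cust2024-009", "Negative", "Negative", "High", "High"),
]

disagreements = [
    r.customer_id for r in ROWS if not (sentiment_match(r) and priority_match(r))
]

print(agreement_rate(ROWS, sentiment_match))  # 100.0
print(agreement_rate(ROWS, priority_match))   # 80.0
print(disagreements)                          # ['Cust2024-005']
```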
Reported miscategorisations — open queue
- Open · Cust2024-005: "Priority felt too high for a low-stakes neutral review" (reported by Maria Reyes, Jun 6)
- Resolved in v2.3 · Cust2024-006: "Tagged 'Service' on a one-line review with no detail" (reported by Jordan Tate, May 28)