Model & retraining

Flow 3 · monthly retraining with an accuracy gate before any promotion (FR5)

Decision Tree (before pruning) — v2.4

In production

Recommended model per §5.2 — best balance of recall (catching spam) and precision (not over-blocking) for a security use case. Promoted Jun 1, 2026.

Test accuracy: 96.59%

Test precision (spam): 93.50%

Test recall (spam): 91.50%

Trained on: 5,572 msgs

Full model comparison (§5.1)

Four candidates evaluated on a 70/30 train/test split of the 5,572-message corpus (4,825 ham · 747 spam)

Model                            Train acc.   Test acc.   Test recall   Test precision   Verdict
Decision Tree (before pruning)   97.22%       96.59%      91.50%        93.50%           Selected (best balance)
Decision Tree (pruned)           96.75%       94.79%      89.90%        88.18%           Lower precision
Random Forest (before pruning)   95.83%       95.42%      82.89%        97.49%           Misses 17% of spam
Random Forest (pruned)           97.49%       94.70%      85.31%        90.78%           Lower recall
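
One way to make "best balance" concrete is to rank the four candidates by F1, the harmonic mean of precision and recall. This is a sketch only: using F1 as the balance measure is an assumption, and §5.2 may weigh the two metrics differently. The recall and precision values are the test-set figures from the comparison above.

```python
# Test-set (recall, precision) pairs copied from the model comparison.
candidates = {
    "Decision Tree (before pruning)": (0.9150, 0.9350),
    "Decision Tree (pruned)": (0.8990, 0.8818),
    "Random Forest (before pruning)": (0.8289, 0.9749),
    "Random Forest (pruned)": (0.8531, 0.9078),
}


def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Assumption: F1 stands in for "balance"; the source does not name a formula.
f1_by_model = {name: f1_score(r, p) for name, (r, p) in candidates.items()}
best_model = max(f1_by_model, key=f1_by_model.get)

for name, score in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.4f}")
```

Under this measure the unpruned Decision Tree ranks first, which agrees with the verdict in the comparison.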

Why this trade-off is acceptable: 93.5% precision means roughly 6.5% of the messages flagged as spam are actually legitimate and get quarantined by mistake. Because spam is held (not deleted) and false positives are released within 1 business hour, the security upside of catching 91.5% of spam outweighs that inconvenience cost (§5.2).
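
Precision describes the quarantine folder, not the inbox: it is the share of flagged messages that really are spam. The sketch below estimates how that translates into the share of all legitimate mail that gets quarantined. It assumes the 30% test split keeps the corpus class ratio (a stratified split), which the source does not state, so the counts are estimates rather than reported results.

```python
# Corpus figures and test metrics from the comparison section.
HAM_TOTAL, SPAM_TOTAL = 4825, 747
TEST_FRACTION = 0.30
PRECISION, RECALL = 0.935, 0.915

# Assumption: the test split is stratified, so it keeps the corpus class ratio.
test_ham = HAM_TOTAL * TEST_FRACTION
test_spam = SPAM_TOTAL * TEST_FRACTION

true_positives = RECALL * test_spam            # spam correctly quarantined
flagged = true_positives / PRECISION           # everything quarantined
false_positives = flagged - true_positives     # legitimate mail quarantined by mistake

legit_share_of_quarantine = false_positives / flagged   # equals 1 - precision
false_positive_rate = false_positives / test_ham        # share of all ham quarantined

print(f"legitimate share of quarantine: {legit_share_of_quarantine:.1%}")
print(f"share of legitimate mail quarantined: {false_positive_rate:.1%}")
```

Under that assumption, about 6.5% of quarantined messages are legitimate, while only about 1% of all legitimate messages end up in quarantine.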

Last retraining run — Jun 1, 2026

Passed accuracy gate
  1. Pulled training data

    1,842 confirmed feedback corrections + last 30 days of classified messages → corpus grew to 7,414 labelled examples

  2. Retrained Decision Tree on expanded corpus

    Completed in 2h 41m · within the 4-hour monthly window (§4.4 Performance)

  3. Evaluated against accuracy gate

    Precision (need ≥ 93%): 93.8% ✓

    Recall (need ≥ 90%): 91.2% ✓

  4. Promoted to production as v2.4

    Approved by Devon Reyes (ML Engineer) · v2.3 archived with rollback capability — human approval required per §4.3
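
Steps 3 and 4 above can be sketched as a small decision function. This is a minimal illustration, not the pipeline's actual code: the names EvalResult and gate_decision and the returned status strings are hypothetical. Only the thresholds, the manual-review fallback, and the human-approval requirement come from this section.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the promotion gate described in this section.
MIN_PRECISION = 0.93
MIN_RECALL = 0.90


@dataclass(frozen=True)
class EvalResult:
    precision: float
    recall: float


def gate_decision(result: EvalResult, approved_by: Optional[str]) -> str:
    """Return a hypothetical status string for a retrained candidate."""
    passed = result.precision >= MIN_PRECISION and result.recall >= MIN_RECALL
    if not passed:
        return "hold_for_manual_review"  # gate failed: alert the ML team
    if approved_by is None:
        return "awaiting_approval"       # human approval is required (§4.3)
    return "promote"


# The Jun 1, 2026 run: precision 93.8%, recall 91.2%, approved by Devon Reyes.
june_status = gate_decision(EvalResult(precision=0.938, recall=0.912), "Devon Reyes")
print(june_status)
```

Keeping the approval check separate from the metric check means a candidate that clears the gate still cannot reach production without a named approver.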

Pipeline schedule

Next scheduled run: Jul 1, 2026 · 04:00 UTC
Promotion gate: precision ≥ 93% and recall ≥ 90%
If gate fails: hold for manual review, alert ML team
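
The schedule appears to follow a first-of-the-month pattern at 04:00 UTC, judging by the dates in this section. A minimal sketch of computing the next run under that assumption follows. The function name next_monthly_run is hypothetical, and the sample timestamp is illustrative rather than a logged finish time.

```python
from datetime import datetime, timezone


def next_monthly_run(now: datetime, hour_utc: int = 4) -> datetime:
    """Return the next first-of-month run time strictly after `now` (UTC)."""
    candidate = datetime(now.year, now.month, 1, hour_utc, tzinfo=timezone.utc)
    if candidate > now:
        return candidate
    # Otherwise roll over to the first day of the following month.
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, hour_utc, tzinfo=timezone.utc)


# Illustrative moment on Jun 1, 2026, after that month's run has started.
sample_now = datetime(2026, 6, 1, 6, 41, tzinfo=timezone.utc)
print(next_monthly_run(sample_now).isoformat())  # 2026-07-01T04:00:00+00:00
```

The `now` argument must be timezone-aware, since Python refuses to compare naive and aware datetimes.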

Version history

Every promoted (and rejected) version is retained with rollback capability

v2.4 · Active · Promoted Jun 1, 2026 · precision 93.8% · recall 91.2% · Current production model
v2.3 · Archived · Promoted May 1, 2026 · precision 93.2% · recall 90.6%
v2.2 · Archived · Promoted Apr 1, 2026 · precision 92.9% · recall 90.1%
v2.1-rc · Rejected · Evaluated Mar 1, 2026 · recall 89.3%, below the 90% gate; v2.0 retained, ML team alerted
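
As an illustration of how a retained history could support rollback, the sketch below picks the most recently promoted archived version as the rollback target. The ModelVersion type, the lowercase status labels, and the rule of skipping rejected versions are all assumptions made for the example. The source says only that versions are retained with rollback capability.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str
    status: str          # "active", "archived", or "rejected" (hypothetical labels)
    evaluated_on: date


# Entries copied from the version history above.
HISTORY = [
    ModelVersion("v2.4", "active", date(2026, 6, 1)),
    ModelVersion("v2.3", "archived", date(2026, 5, 1)),
    ModelVersion("v2.2", "archived", date(2026, 4, 1)),
    ModelVersion("v2.1-rc", "rejected", date(2026, 3, 1)),
]


def rollback_target(history: List[ModelVersion]) -> Optional[ModelVersion]:
    """Pick the most recently promoted archived version, or None if there is none."""
    archived = [v for v in history if v.status == "archived"]
    # Assumption: rejected candidates are never rollback targets.
    return max(archived, key=lambda v: v.evaluated_on, default=None)


target = rollback_target(HISTORY)
print(target.name if target else "no rollback target")
```

With the history above, the target is v2.3, the version archived when v2.4 was promoted.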