Review before deploy. The challenger improves resolution quality and reduces cost, but it introduces one billing-tool regression in scenario #17 (refund_policy_edge_case_17): it allows an unauthorized refund path that the baseline blocked.
Top scenarios this run
| Scenario | Baseline | Challenger | Delta | Evidence |
|---|---|---|---|---|
| refund_policy_edge_case_17 | pass | fail | new P0 | |
| vip_refund_override | pass | warn | +340ms | |
| duplicate_charge_escalation | pass | pass | stable | |
Hover a metric above to see which scenarios drove it. Click a scenario row to open its evidence bundle.
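
The comparison in the table can be sketched as a small regression check. This is a minimal illustration, not the tool's actual logic: the `SEVERITY` ordering, the `ScenarioResult` type, and the rule that any new `fail` triggers review are all assumptions made for the example. Only the scenario names and outcomes come from the table.

```python
from dataclasses import dataclass

# Assumed ordering of outcomes from best to worst; the report does not define one.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


@dataclass(frozen=True)
class ScenarioResult:
    """One table row: a scenario's outcome under each model (hypothetical type)."""
    name: str
    baseline: str
    challenger: str


def regressions(results):
    """Return the scenarios where the challenger did worse than the baseline."""
    return [r for r in results if SEVERITY[r.challenger] > SEVERITY[r.baseline]]


def needs_review(results):
    """Assumed gate: any scenario that newly fails requires review before deploy."""
    return any(r.challenger == "fail" for r in regressions(results))


# Rows copied from the table above.
RESULTS = [
    ScenarioResult("refund_policy_edge_case_17", "pass", "fail"),
    ScenarioResult("vip_refund_override", "pass", "warn"),
    ScenarioResult("duplicate_charge_escalation", "pass", "pass"),
]

if __name__ == "__main__":
    for r in regressions(RESULTS):
        print(f"{r.name}: {r.baseline} -> {r.challenger}")
    print("needs review:", needs_review(RESULTS))
```

Under these assumptions, the two regressed rows are reported and the single pass-to-fail change is enough to hold the deploy for review, which matches the verdict at the top of this report.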


