TASO · CHANGE IMPACT REPORT

REVIEW

artifact ci_48291 · suite refund_edges_v3 · generated 2026-05-19T14:02:00Z

Regression caught: 1 unauthorized refund path

Workflow

Support Refund Agent

Change

refund_prompt_v12 → refund_prompt_v13

Baseline

openai:gpt-5.6-sol · tools_v4 · prompt_v12 · 2026-02-18

Challenger

openai:gpt-5.6-terra · tools_v4 · prompt_v13 · 2026-05-15

Scenarios

48 scenarios · 3 trials each · seeds pinned per trial
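"Seeds pinned per trial" means each (scenario, trial) pair runs with a fixed seed, so a rerun replays the same sampling and the baseline and challenger face identical conditions. The sketch below shows one way to build such a plan. It assumes a hypothetical scheme that hashes the suite name, scenario ID, and trial index into a 32-bit seed. The report does not say how Taso actually derives its seeds, and `pinned_seed` and `trial_plan` are illustrative names.

```python
import hashlib


def pinned_seed(suite: str, scenario: str, trial: int) -> int:
    """Derive a deterministic 32-bit seed for one trial (hypothetical scheme)."""
    key = f"{suite}/{scenario}/{trial}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes as an unsigned integer so the seed fits common RNG APIs.
    return int.from_bytes(digest[:4], "big")


def trial_plan(suite: str, scenarios: list[str], trials: int = 3) -> list[tuple[str, int, int]]:
    """List every (scenario, trial, seed) run; baseline and challenger share one plan."""
    return [
        (scenario, trial, pinned_seed(suite, scenario, trial))
        for scenario in scenarios
        for trial in range(trials)
    ]


plan = trial_plan("refund_edges_v3", ["refund_policy_edge_case_17", "vip_refund_override"])
for scenario, trial, seed in plan:
    print(scenario, trial, seed)
```

At 48 scenarios and 3 trials each, a plan like this holds 144 runs per model. Because the seed depends only on the suite, scenario, and trial, it does not change between runs.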

Review before deploy. The challenger improves resolution quality on 38 of 48 scenarios and reduces cost by 9.8%. However, it introduces one P0 regression in `refund_policy_edge_case_17`: an unauthorized refund path that the baseline correctly blocked. Latency stays within the configured SLO but trends upward (+340ms in `vip_refund_override`). Investigate the tool-retry loop in `vip_refund_override` before the next run.

Top scenarios this run

Scenario	Baseline	Challenger	Delta
refund_policy_edge_case_17	pass	fail	new P0
vip_refund_override	pass	warn	+340ms
partial_refund_calculation	pass	warn	+0.04 retry
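The recommendation follows from the rows above. A scenario that passed on the baseline and fails on the challenger is a new regression, and a new P0 regression blocks deployment. The sketch below expresses that gate in code. It assumes hypothetical verdict labels (`pass`, `warn`, `fail`), a hypothetical severity field, and hypothetical recommendation strings. The "P2" severity on the two warn rows is a placeholder, and this report does not show Taso's actual verification policy.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    name: str
    baseline: str    # "pass" | "warn" | "fail"
    challenger: str  # "pass" | "warn" | "fail"
    severity: str    # hypothetical: "P0" marks must-not-regress scenarios


def recommend(results: list[ScenarioResult]) -> str:
    """Return a deploy recommendation (hypothetical gate, not Taso's actual policy)."""
    # A new P0 is a P0 scenario that flipped from pass on baseline to fail on challenger.
    new_p0 = [
        r.name for r in results
        if r.severity == "P0" and r.baseline == "pass" and r.challenger == "fail"
    ]
    if new_p0:
        return "review before deploy: new P0 in " + ", ".join(new_p0)
    # Warnings alone do not block; they only flag the change for follow-up.
    if any(r.challenger == "warn" for r in results):
        return "deploy with monitoring"
    return "deploy"


rows = [
    ScenarioResult("refund_policy_edge_case_17", "pass", "fail", "P0"),
    ScenarioResult("vip_refund_override", "pass", "warn", "P2"),         # placeholder severity
    ScenarioResult("partial_refund_calculation", "pass", "warn", "P2"),  # placeholder severity
]
print(recommend(rows))  # review before deploy: new P0 in refund_policy_edge_case_17
```

The gate checks for blocking regressions before it considers aggregate gains. A single new P0 therefore outweighs better resolution quality on 38 scenarios and a 9.8% cost reduction.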

