March 16, 2026

Benchmarks Don't Matter — Until They Do (Part 2)

ForgeCode now reaches 81.8% on TermBench 2.0 with both GPT 5.4 and Opus 4.6. The interesting part is not the score. It is what we had to change in the agent to make GPT 5.4 behave as reliably as Opus 4.6.