Results for Anthropic
March 16, 2026
Benchmarks Don't Matter — Until They Do (Part 2)
ForgeCode now reaches 81.8% on TermBench 2.0 with both GPT 5.4 and Opus 4.6. The interesting part is not the score. It is what we had to change in the agent to make GPT 5.4 behave as reliably as Opus 4.6.
Tushar
May 23, 2025
Claude 4 Initial Impressions: A Developer's Review of Anthropic's AI Coding Breakthrough
First impressions and in-depth review of Claude 4, highlighting its groundbreaking 72.7% SWE-bench Verified score, real-world coding capabilities, and what this means for the future of AI-assisted software development.

ForgeCode Team