Results forSee all Tags
Developer Experience
August 10, 20257 min read
Claude Sonnet 4 vs Kimi K2 vs Gemini 2.5 Pro: Which AI actually ships production code?
I ran Claude Sonnet 4, Kimi K2, and Gemini 2.5 Pro on the same Next.js app and measured cost, speed, and whether the code actually shipped without follow-ups.
July 23, 20259 min read
Kimi K2 vs Qwen-3 Coder: Testing Two AI Models on Coding Tasks
I tested Kimi K2 and Qwen-3 Coder on 13 Rust development tasks across a 38k-line codebase and 2 Frontend refactor tasks. The results reveal differences in code quality, instruction following, and development capabilities.