DeepSWE benchmark and GPT-5.5 coding performance

6 items ● 2 sources ● updated 16d ago ● trend 0

┌─ summary ─────────────────────────────┐

OpenAI's GPT-5.5 topped the DeepSWE benchmark, a new contamination-free evaluation for long-horizon coding agents, outperforming Claude Opus 4.8 on software engineering tasks. The benchmark measures frontier AI coding capabilities and has become a key leaderboard for assessing agent performance.

┌─ key points ──────────────────────────┐

GPT-5.5 scored higher than Claude Opus 4.8 on DeepSWE at lower cost
DeepSWE is a contamination-free benchmark designed specifically for long-horizon coding agents
Cognition raised $1B at $25B pre-money valuation, with $492M annualized revenue run rate
DeepSWE benchmark launched May 26-27, 2026 and became a new AI coding leaderboard standard

┌─ items (6) ───────────────────────────┐

[HN] hacker news (5)

DeepSWE: More and cheaper intelligence from maxed GPT 5.5 than maxed Opus 4.8

HN: GPT · rajveerb · ▲5 · 17d

Tesla's 'Full Self-Driving' fraud lawsuit gets first hearing in China

HN: GPT · breve · ▲12 · 17d

DeepSWE: Measuring frontier coding agents

HN: agents · e2e4 · ▲3 · 20d

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5

HN: AI · ripvanwinkle · ▲2 · 21d

DeepSWE: A contamination-free benchmark for long-horizon coding agents

HN: agents · ammar_x · ▲41 · 21d

[BLG] blog/rss (1)

AI coding startup Cognition raises $1B at $25B pre-money valuation

TechCrunch AI · Julie Bort · 20d