
Claude Mythos task length limits and capabilities

4 items ● 3 sources ● updated 36d ago ● trend 0

┌─ summary ─────────────────────────────┐

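Anthropic's Claude Mythos model shows a substantially longer task horizon than earlier models. METR's evaluation is reaching the edge of its measurement range: the model's 50% task horizon (the length of task, measured in the time skilled humans need for it, that the model completes with 50% success) now exceeds 16 hours. Commentators read the result as confirming that earlier claims about the model's general-purpose capabilities and long-context reasoning were not marketing hype.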
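The 50% task horizon comes from fitting success probability against task length. A minimal sketch, assuming a METR-style setup: a logistic curve over log2 of human task time in minutes, fit here by plain gradient descent with no task weighting and no bootstrap intervals (METR's actual pipeline is more involved). The helper names and the task data are hypothetical. When the estimated horizon exceeds the longest task in the suite, the figure is an extrapolation, which is the situation described as running out of measurement range.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p(success) = sigmoid(a + b * (x - mean_x)) by gradient descent on mean log-loss."""
    n = len(xs)
    mean_x = sum(xs) / n
    cx = [x - mean_x for x in xs]  # centered inputs
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(cx, ys):
            err = sigmoid(a + b * x) - y  # d(log-loss)/d(logit)
            ga += err
            gb += err * x
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b, mean_x


def horizon_50(task_minutes, successes):
    """Hypothetical helper: (50% horizon in minutes, whether it falls inside the suite)."""
    xs = [math.log2(m) for m in task_minutes]
    a, b, mean_x = fit_logistic(xs, successes)
    if b >= 0:
        raise ValueError("success does not fall with task length; no 50% crossing")
    # p = 0.5 exactly where the logit is zero: a + b * (x - mean_x) = 0
    x50 = mean_x - a / b
    horizon = 2.0 ** x50
    return horizon, horizon <= max(task_minutes)


# Hypothetical results: human task time in minutes, 1 = model succeeded.
minutes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
solved = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
h, in_range = horizon_50(minutes, solved)
print(f"50% horizon ~ {h:.1f} min, inside suite: {in_range}")  # 50% horizon ~ 22.6 min, inside suite: True
```

Centering log task length before fitting improves the conditioning of the intercept and slope, so fixed-step gradient descent converges without tuning.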

┌─ key points ──────────────────────────┐

METR's evaluation can barely measure Claude Mythos: its 50% task horizon exceeds 16 hours, near the top of the measurable range
Claude Mythos confirmed as a general-purpose model with strong exploit-finding and reasoning capabilities
Extended task horizon capabilities expected to appear in competing models from OpenAI and Google within a similar timeframe
Human-AI team complementarity research shows teams outperform individuals only when the correlation between human and AI errors is below a critical threshold ρ*
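The complementarity result turns on how correlated human and AI errors are. A minimal sketch, assuming errors are recorded per task as 0/1 indicators: it computes their Pearson correlation (the phi coefficient for binary data) and compares it with a threshold supplied by the caller. The paper's derivation of ρ* is not reproduced here; the helper names, the example data, and the threshold value 0.3 are hypothetical.

```python
import math


def error_correlation(human_errors, ai_errors):
    """Pearson correlation of two 0/1 error vectors (equals the phi coefficient)."""
    n = len(human_errors)
    if n == 0 or n != len(ai_errors):
        raise ValueError("need two non-empty error vectors of equal length")
    mh = sum(human_errors) / n
    ma = sum(ai_errors) / n
    cov = sum((h - mh) * (a - ma) for h, a in zip(human_errors, ai_errors)) / n
    var_h = sum((h - mh) ** 2 for h in human_errors) / n
    var_a = sum((a - ma) ** 2 for a in ai_errors) / n
    if var_h == 0 or var_a == 0:
        raise ValueError("correlation is undefined if a member always or never errs")
    return cov / math.sqrt(var_h * var_a)


def below_threshold(human_errors, ai_errors, rho_star):
    """Hypothetical check: is the measured error correlation below rho_star?"""
    return error_correlation(human_errors, ai_errors) < rho_star


# Hypothetical per-task errors (1 = wrong). The two members never fail together.
human = [1, 0, 0, 1, 0, 0, 0, 1]
ai = [0, 1, 0, 0, 1, 0, 0, 0]
rho = error_correlation(human, ai)
print(f"rho = {rho:.3f}, below 0.3: {below_threshold(human, ai, 0.3)}")  # rho = -0.447, below 0.3: True
```

In the key point's framing, a correlation below ρ* is a necessary condition for the team to outperform, not a guarantee that a given team will.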

┌─ items (4) ───────────────────────────┐

[HN] hacker news (1)

METR can barely measure Claude Mythos – 50% task horizon now exceeds 16 hours

HN: Claude · GlyphWeaver_a · ▲2 · 36d

[BLG] blog/rss (1)

When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees

arXiv cs.AI · Dongxin Guo, Jikun Wu, Siu-Ming Yiu · 36d

[BSKY] bluesky (2)

Huh. They ran out of graph when trying to measure how long a task Mythos could do.

@emollick · @emollick.bsky.social · ▲49 · 39d

So Claude Mythos was, indeed, not marketing hype.

@emollick · @emollick.bsky.social · ▲357 · 40d