Today's signal

A startup called Subquadratic launched SubQ on May 5, the first LLM built on a fully sub-quadratic attention architecture. The core claim: compute scales linearly with context length, not quadratically. At 1 million tokens, it runs 52x faster than standard dense attention at one-fifth the cost of leading frontier models.
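
For intuition, here is the shape of that claim in a few lines of Python. The constants are made up and nothing here comes from the company's numbers; the point is only the growth rate: a quadratic cost quadruples when the context doubles, while a linear one merely doubles.

# Illustrative growth rates only; the absolute numbers are meaningless.
def quadratic_cost(n):
    return n * n  # dense attention: every token scores against every other token

def linear_cost(n):
    return n      # the claimed behaviour: cost proportional to context length

base = 250_000
for n in (250_000, 500_000, 1_000_000):
    print(f"{n:>9,} tokens: quadratic x{quadratic_cost(n) / quadratic_cost(base):.0f}, "
          f"linear x{linear_cost(n) / linear_cost(base):.0f}")
# 250,000 -> x1 / x1, 500,000 -> x4 / x2, 1,000,000 -> x16 / x4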

Why it matters

Every major AI model today, including ChatGPT, Claude, and Gemini, carries the same structural tax: double your input, quadruple your compute cost. The industry built around it with RAG pipelines, chunking, and agentic scaffolding, but none of that changes the underlying scaling law; it just routes around it.

Subquadratic's SSA mechanism uses content-dependent selection to route attention only to the positions that carry signal, skipping the rest. At 1M tokens, that cuts attention FLOPs by 62.5x, and the research version of the architecture reaches 12 million tokens of context. The benchmarks are third-party verified: 95% on RULER at 128K (Claude Opus 4.6 scores 94.8%), 81.8% on SWE-Bench Verified (vs Opus 4.6 at 80.8%), and 65.9% on MRCR v2, the hardest long-context retrieval test. The company raised $29M in seed funding from early backers of Anthropic, OpenAI, Stripe, and Brex.
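
How SSA actually selects positions has not been published yet (the technical report is still pending), but the general idea of content-dependent selection can be sketched with a per-query top-k over attention scores. A minimal PyTorch sketch follows; the function name, the keep budget, and the scoring step are assumptions for illustration, not Subquadratic's method, and this toy version still materialises the full score matrix, so it saves no compute on its own.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    # Content-dependent selection, toy version: score every key, then let each
    # query attend only to its `keep` highest-scoring positions.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (n_q, n_k), still quadratic here
    top_scores, top_idx = scores.topk(keep, dim=-1)        # chosen positions per query
    weights = F.softmax(top_scores, dim=-1)                # softmax over the kept scores only
    return (weights.unsqueeze(-1) * v[top_idx]).sum(dim=-2)

n, d = 4_096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = topk_sparse_attention(q, k, v)  # each query uses 64 of 4,096 positions

The hard part, and the part a genuinely sub-quadratic design has to solve, is choosing which positions to keep without first paying the quadratic cost of scoring them all; that routing step is where a claim like this lives or dies.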

The take

Every major AI lab has known about the quadratic scaling problem for years. None of them shipped a production sub-quadratic model. That is either because the problem is genuinely hard to solve without sacrificing accuracy, or because their transformer infrastructure is too expensive to walk away from. A seed-stage startup doing it first, if the results hold, is a significant indictment of what large compute budgets optimise for. The technical report is still pending and the gap between their research score and production score on MRCR v2 deserves scrutiny. But the claims are specific, the benchmarks are verified, and the architecture is original.

The number

62.5x — the reduction in attention FLOPs SubQ achieves at 1 million tokens compared to standard quadratic attention. At 12 million tokens, the company's research target, that figure approaches 1,000x. This is not a marginal efficiency improvement. It is a different cost curve entirely.
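
For what it is worth, both figures are consistent with one simple reading: each query attends to a roughly fixed budget of positions, so the saving versus dense attention is just context length divided by that budget. The ~16,000-position budget below is an assumption chosen to match the quoted numbers, not a detail from the company.

# Assumed fixed per-query budget of attended positions (not a published figure).
budget = 16_000
for n in (1_000_000, 12_000_000):
    print(f"{n:>10,} tokens -> ~{n / budget:.1f}x fewer attention FLOPs")
# 1,000,000 tokens  -> ~62.5x (the quoted figure)
# 12,000,000 tokens -> ~750.0x (the same order of magnitude as the quoted ~1,000x)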

Read the full breakdown on analyticsdrift.com
