Subquadratic, a Miami-based AI startup, has launched its first model, featuring a 12-million-token context window powered by a novel architecture called Subquadratic Selective Attention (SSA). Unlike standard transformer attention, whose cost scales quadratically with context length, SSA is claimed to scale linearly in both compute and memory. The model reportedly runs 52x faster than dense attention at 1M tokens, scores 83% on MRCR v2 (beating GPT-5.5's 74%), and achieves 92.1% on needle-in-a-haystack retrieval at 12M tokens. It also claims 82.4% on SWE-Bench Verified, edging out Anthropic's Opus 4.6 and Google's Gemini 3.1 Pro. The company is launching an API with the full 12M-token window and a CLI coding agent (SubQ Code), with a 50M-token window targeted for Q4. Subquadratic has raised $29M at a $500M valuation. The article notes caveats, including single-run benchmarks and a cautionary parallel with Magic.dev's 100M-token claims, which never materialized publicly.
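Subquadratic has not published SSA's internals, so the following is an illustration only: a minimal NumPy sketch of one common route to subquadratic attention, where each query keeps just its top-k strongest keys, so the softmax and value mix touch O(n·k) entries instead of the O(n²) of dense attention. The function names, the top-k selection rule, and all parameters here are assumptions for illustration, not SSA's actual mechanism.

```python
# Illustrative sketch only: SSA's real mechanism is unpublished.
# Contrasts dense attention (O(n^2) scores) with a "selective" variant
# where each query attends to a fixed top_k subset of keys (O(n * k)).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Full (n, n) score matrix: quadratic in sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def selective_attention(q, k, v, top_k=64):
    # Each query keeps only its top_k strongest keys, so the softmax
    # and weighted sum operate on n * top_k entries. (Note: this sketch
    # still computes the full q @ k.T pass for clarity; a genuinely
    # subquadratic method must avoid that too, e.g. via hashing or
    # learned routing.)
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]  # (n, top_k)
    sel = np.take_along_axis(scores, idx, axis=-1)
    w = softmax(sel)                                            # (n, top_k)
    return np.einsum("nk,nkd->nd", w, v[idx])                   # gather + mix

rng = np.random.default_rng(0)
n, d = 1024, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(dense_attention(q, k, v).shape, selective_attention(q, k, v).shape)
```

Whether SSA uses anything like top-k selection is unknown; the sketch only makes the claimed compute/memory scaling gap concrete.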

7 min read · From thenewstack.io
Table of contents
- What came before
- What SSA says it does differently
- The benchmarks
- What Subquadratic is shipping now
- Funding
