IQ — AI IQ

IQ

Composite IQ, cost tradeoffs, frontier progress, then the dimension and benchmark evidence behind the score

AI Models by IQ

Each model's estimated IQ plotted on a standard normal IQ distribution
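To read a position on that curve as a percentile, an IQ estimate can be converted to a z-score and passed through the normal CDF. The sketch below assumes the conventional human IQ scale (mean 100, standard deviation 15); the page does not state the scale parameters, so treat both constants as assumptions.

```python
import math

# Assumed scale parameters: the conventional IQ scale, not confirmed by the page.
IQ_MEAN = 100.0
IQ_SD = 15.0


def iq_to_percentile(iq: float) -> float:
    """Return the percentile (0-100) of an IQ value under a normal distribution."""
    z = (iq - IQ_MEAN) / IQ_SD
    # Standard normal CDF expressed through the error function.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for iq in (85, 100, 115, 130):
        print(f"IQ {iq}: {iq_to_percentile(iq):.1f}th percentile")
```

Under these assumptions an IQ of 115 sits one standard deviation above the mean, near the 84th percentile.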

Composite IQ

AI IQ combines abstract, mathematical, programmatic, and academic reasoning estimates. When a model lacks coverage on a benchmark, the scoring pipeline fills the gap conservatively, so omissions cannot inflate a score. The filled values are used only inside the pipeline.
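The exact combination rule lives in the methodology, not on this page. As a purely illustrative sketch of what conservative filling can look like, the code below assumes an unweighted mean over the four dimensions and fills a missing dimension with the model's lowest observed dimension score. Both choices, and the name `composite_iq`, are hypothetical and are not the site's documented method.

```python
from statistics import mean
from typing import Optional

DIMENSIONS = ("abstract", "mathematical", "programmatic", "academic")


def composite_iq(dimension_iq: dict[str, Optional[float]]) -> Optional[float]:
    """Hypothetical composite: unweighted mean with a conservative fill.

    A missing dimension is replaced by the lowest observed dimension score,
    so leaving a benchmark out can never raise the composite above the
    plain mean of the observed dimensions.
    """
    observed = [v for d in DIMENSIONS if (v := dimension_iq.get(d)) is not None]
    if not observed:
        return None  # nothing to score
    fill = min(observed)  # assumed conservative fill value
    filled = [
        dimension_iq[d] if dimension_iq.get(d) is not None else fill
        for d in DIMENSIONS
    ]
    return mean(filled)


if __name__ == "__main__":
    scores = {"abstract": 120.0, "mathematical": 110.0,
              "programmatic": None, "academic": 100.0}
    print(composite_iq(scores))
```

With this fill rule the example scores 107.5, below the 110 that averaging only the observed dimensions would give.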

IQ vs Effective Cost

Each model's estimated IQ plotted against its per-task effective cost (token cost × token usage multiplier)

Effective cost & iso-curves

Effective cost on the X-axis is token cost (the cost of 2M input + 1M output tokens) × token usage multiplier (this model's token usage on Artificial Analysis (AA) ÷ the median model's). It is what each model spends on a task that the median model completes with that 2:1 token mix.
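The arithmetic can be sketched as follows. The function and parameter names are illustrative, prices are assumed to be quoted in dollars per million tokens, and the numbers in the usage example are made up.

```python
# Token mix from the definition above: 2M input + 1M output tokens.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000


def token_cost(input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of the 2M-in / 1M-out mix at per-million-token prices."""
    return (input_price_per_m * INPUT_TOKENS / 1e6
            + output_price_per_m * OUTPUT_TOKENS / 1e6)


def effective_cost(input_price_per_m: float, output_price_per_m: float,
                   model_token_usage: float, median_token_usage: float) -> float:
    """Token cost scaled by how many tokens this model uses relative to the median."""
    multiplier = model_token_usage / median_token_usage
    return token_cost(input_price_per_m, output_price_per_m) * multiplier


if __name__ == "__main__":
    # Hypothetical model: $3/M input, $15/M output, 1.5x the median token usage.
    print(effective_cost(3.0, 15.0, model_token_usage=150.0, median_token_usage=100.0))
```

In the hypothetical example the token cost is $21 and the usage multiplier is 1.5, giving an effective cost of $31.50.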

Iso-curves trace lines of equal preference. The dropdown picks the Y-axis metric (overall IQ, the four dimension IQs, or any of the 10 individual benchmarks). The ratio control, 1:1 by default, weights quality against cost: at 1:1, one IQ point is worth one halving of cost. Click right (1:2, 1:5…) to make cost matter more, or left (2:1, 5:1…) to make quality matter more. Models above and to the right of a curve are strictly better at the chosen ratio than any model on it.
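One way to read the ratio control is as a pair of weights on IQ and on halvings of cost. The sketch below assumes a ratio q:c means utility = q × IQ − c × log2(cost), which reproduces the stated 1:1 behaviour; the interpretation of the other ratios and the function names are assumptions, not the site's published formula.

```python
import math


def utility(iq: float, cost: float, quality_weight: float = 1.0,
            cost_weight: float = 1.0) -> float:
    """Assumed preference score: higher IQ and lower cost both raise it.

    At 1:1, halving the cost adds exactly as much as one extra IQ point.
    """
    return quality_weight * iq - cost_weight * math.log2(cost)


def iso_curve_iq(cost: float, level: float, quality_weight: float = 1.0,
                 cost_weight: float = 1.0) -> float:
    """IQ a model needs at a given cost to sit on the iso-curve with this utility."""
    return (level + cost_weight * math.log2(cost)) / quality_weight


if __name__ == "__main__":
    # Two hypothetical models: B is one IQ point lower but half the cost of A.
    a, b = (120.0, 8.0), (119.0, 4.0)
    print(utility(*a), utility(*b))              # equal at 1:1
    print(utility(*a, 1, 2), utility(*b, 1, 2))  # cheaper model wins at 1:2
```

Under this reading, moving the control toward 1:2 steepens the curves, so a cheaper model needs fewer IQ points to match a more expensive one.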

Frontier IQ Over Time

X = release date. Y = estimated IQ. Provider step-lines connect each provider's flagship frontier checkpoints over time.

Tracking frontier progress

This chart focuses on flagship frontier checkpoints rather than every SKU. It is the fastest way to see whether the leading model curve is actually moving.
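A step-line of this kind can be built by keeping, for each provider, only the releases that raise that provider's best IQ so far. The page does not say how flagship checkpoints are chosen, so the running-maximum rule, the `Release` type, and the sample data below are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    provider: str
    model: str
    released: date
    iq: float


def frontier_steps(releases: list[Release]) -> dict[str, list[Release]]:
    """Per provider, keep releases that set a new best IQ, in date order."""
    best: dict[str, float] = {}
    steps: dict[str, list[Release]] = {}
    for r in sorted(releases, key=lambda rel: rel.released):
        if r.iq > best.get(r.provider, float("-inf")):
            best[r.provider] = r.iq
            steps.setdefault(r.provider, []).append(r)
    return steps


if __name__ == "__main__":
    # Made-up providers, models, dates, and scores.
    data = [
        Release("ProviderA", "a-1", date(2024, 1, 10), 105.0),
        Release("ProviderA", "a-1-mini", date(2024, 4, 2), 98.0),
        Release("ProviderA", "a-2", date(2024, 9, 15), 118.0),
        Release("ProviderB", "b-1", date(2024, 6, 1), 110.0),
    ]
    for provider, line in frontier_steps(data).items():
        print(provider, [(r.model, r.iq) for r in line])
```

The smaller `a-1-mini` release drops out because it does not advance ProviderA's best score, which matches the chart's focus on frontier checkpoints rather than every SKU.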

Read the methodology for how raw benchmark values become dimension and composite IQ estimates.

IQ by Dimension

Abstract Reasoning IQ

Each model's Abstract Reasoning IQ plotted on a standard normal IQ distribution

Mathematical Reasoning IQ

Each model's Mathematical Reasoning IQ plotted on a standard normal IQ distribution

Programmatic Reasoning IQ

Each model's Programmatic Reasoning IQ plotted on a standard normal IQ distribution

Academic Reasoning IQ

Each model's Academic Reasoning IQ plotted on a standard normal IQ distribution

Abstract Reasoning Benchmarks

Source: arcprize.org/leaderboard

ARC-AGI-2 vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = ARC-AGI-2 %. Color = provider.

ARC-AGI-1 vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = ARC-AGI-1 %. Color = provider.

Mathematical Reasoning Benchmarks

Sources: vals.ai/benchmarks/aime, epoch.ai/frontiermath, vals.ai/benchmarks/proofbench

ProofBench vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = ProofBench %. Color = provider.

FrontierMath Tier 4 vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = FrontierMath Tier 4 %. Color = provider.

FrontierMath Tier 1-3 vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = FrontierMath Tier 1-3 %. Color = provider.

AIME vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = AIME %. Color = provider.

Programmatic Reasoning Benchmarks

Sources: tbench.ai, vals.ai/benchmarks/swebench, scicode-bench.github.io

SciCode vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = SciCode %. Color = provider.

Terminal-Bench 2.0 vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = Terminal-Bench 2.0 %. Color = provider.

SWE-Bench Verified vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = SWE-Bench Verified %. Color = provider.

Academic Reasoning Benchmarks

Sources: agi.safe.ai, critpt.com, artificialanalysis.ai/evaluations/gpqa-diamond

CritPt vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = CritPt Score (0-20). Color = provider.

Humanity's Last Exam vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = Humanity's Last Exam %. Color = provider.

GPQA Diamond vs Effective Cost

X = effective cost per median-model 2M in + 1M out task (log, reversed). Y = GPQA Diamond %. Color = provider.
