The AI IQ Leaderboard

AI IQ estimates the IQs of popular AI models from public benchmark results

AI Models by IQ
Each model's estimated IQ plotted on a standard normal IQ distribution
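As a rough illustration of how to read a point on this bell curve, the sketch below converts an IQ estimate into a population percentile. It assumes the conventional IQ scale (mean 100, standard deviation 15). The leaderboard does not state its own parameters here, and `iq_to_percentile` is a hypothetical helper.

```python
import math

# Assumption: conventional IQ scale, normally distributed with mean 100 and SD 15.
IQ_MEAN = 100.0
IQ_SD = 15.0

def iq_to_percentile(iq: float) -> float:
    """Percentage of a normal IQ population scoring below `iq`."""
    z = (iq - IQ_MEAN) / IQ_SD
    # Standard normal CDF expressed through the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for iq in (100, 115, 130, 145):
    print(iq, round(iq_to_percentile(iq), 2))
```

Under this assumption, an IQ of 130 sits two standard deviations above the mean, at roughly the 98th percentile.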

How AI IQ estimates model intelligence

  1. We archive source captures from public benchmark leaderboards and extract only source-backed values
  2. We map each benchmark score to an implied IQ using calibrated difficulty curves
  3. We group scored benchmarks into seven dimensions: abstract, mathematical, scientific, software engineering, computer use, reliability, and social reasoning
  4. We conservatively fill missing benchmark and dimension estimates only inside the scoring pipeline
  5. Every derived IQ averages all seven dimensions, so missing coverage cannot make a model look better by omission
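The steps above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual pipeline. The logistic difficulty curve, the benchmark names `bench_a` to `bench_c`, their calibration numbers, and the fill value `CONSERVATIVE_FILL_IQ` are all hypothetical stand-ins for the calibrated curves and conservative imputation the text describes.

```python
import math
from statistics import mean

DIMENSIONS = ("abstract", "mathematical", "scientific", "software_engineering",
              "computer_use", "reliability", "social")

# Hypothetical calibration: benchmark -> (dimension, difficulty b, scale s).
# b is the IQ at which a model is assumed to score 50%; s sets the curve's steepness.
CALIBRATION = {
    "bench_a": ("abstract", 120.0, 12.0),
    "bench_b": ("mathematical", 130.0, 10.0),
    "bench_c": ("software_engineering", 115.0, 15.0),
}

# Hypothetical low value used for a dimension with no scored benchmarks.
CONSERVATIVE_FILL_IQ = 85.0

def score_to_iq(score: float, b: float, s: float) -> float:
    """Invert the logistic curve score = 1 / (1 + exp(-(iq - b) / s))."""
    p = min(max(score, 1e-3), 1 - 1e-3)  # clamp so 0% or 100% stays finite
    return b + s * math.log(p / (1 - p))

def overall_iq(scores: dict[str, float]) -> float:
    """Average of all seven dimension IQs for one model's benchmark scores (0..1)."""
    by_dim: dict[str, list[float]] = {d: [] for d in DIMENSIONS}
    for bench, score in scores.items():
        dim, b, s = CALIBRATION[bench]
        by_dim[dim].append(score_to_iq(score, b, s))
    # Uncovered dimensions get the fill value instead of being dropped.
    dim_iqs = [mean(v) if v else CONSERVATIVE_FILL_IQ for v in by_dim.values()]
    return mean(dim_iqs)  # the denominator is always seven

print(overall_iq({"bench_a": 0.5, "bench_b": 0.5, "bench_c": 0.5}))
```

For example, 50% on each hypothetical benchmark yields dimension IQs of 120, 130, and 115, plus 85 for each of the four uncovered dimensions. Because `overall_iq` always divides by seven, a gap in coverage receives the low fill value rather than disappearing from the average.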
IQ vs Effective Cost
Each model's estimated IQ plotted against effective cost per 1M I/O Tokens (sticker price × blended usage multiplier).

Effective cost & iso-curves

Effective cost on the X-axis is sticker price for 1M I/O Tokens × token usage multiplier. 1M I/O Tokens means 1M input tokens plus 1M output tokens, priced at the model's published rates.
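Here is a minimal sketch of the effective-cost formula, assuming per-1M-token input and output prices and a usage multiplier are already known. How the blended usage multiplier is derived is not described here, so the function simply takes it as an input. The prices in the example are hypothetical.

```python
def effective_cost(input_price_per_1m: float, output_price_per_1m: float,
                   usage_multiplier: float) -> float:
    """Effective cost for 1M input + 1M output tokens, scaled by token usage."""
    # Sticker price for 1M I/O Tokens at the model's published rates.
    sticker = input_price_per_1m + output_price_per_1m
    return sticker * usage_multiplier

# Hypothetical model: $3 per 1M input, $15 per 1M output, 1.5x token usage.
print(effective_cost(3.0, 15.0, 1.5))
```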

Iso-curves trace lines of equal preference between IQ and cost. The slider sets how quality is weighted against cost: at the center the weighting is 1:1; drag toward Cost to make cost matter more, or toward IQ to make quality matter more. Models above and to the right of a curve are strictly better under the current weighting.
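The exact preference function behind the iso-curves is not given here. One plausible form, shown purely as an assumption, scores a model as weighted IQ minus weighted log2 of effective cost, with equal weights standing in for the slider's 1:1 center. An iso-curve is then every (cost, IQ) pair with the same score. `preference` and `iso_curve_iq` are hypothetical names.

```python
import math

def preference(iq: float, cost: float, w_iq: float = 1.0, w_cost: float = 1.0) -> float:
    """Hypothetical score: weighted IQ minus weighted log2 of effective cost.
    Equal weights stand in for the slider's 1:1 center position."""
    return w_iq * iq - w_cost * math.log2(cost)

def iso_curve_iq(level: float, cost: float, w_iq: float = 1.0, w_cost: float = 1.0) -> float:
    """IQ a model needs at `cost` to reach preference `level` (one iso-curve)."""
    return (level + w_cost * math.log2(cost)) / w_iq

# Every point on this curve scores the same as a hypothetical IQ 110 model at $8.
level = preference(110.0, 8.0)
for cost in (1.0, 8.0, 64.0):
    print(cost, iso_curve_iq(level, cost))
```

With these weights, each doubling of cost must be offset by one extra IQ point to stay on the same curve, and raising `w_cost` plays the role of dragging the slider toward Cost. Log cost is used because effective prices span orders of magnitude. That is a design assumption, not something the chart confirms.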

Frontier IQ Over Time
X = release date. Y = estimated IQ. Provider step-lines connect each provider's flagship frontier checkpoints over time.

Tracking frontier progress

Each dot is a model with a known release date and a derived IQ estimate. Models are positioned left-to-right by release date, so the chart shows how the frontier changes over time rather than just where models rank today.

Provider-colored lines connect each lab's flagship frontier checkpoints. Codex, mini, nano, flash, coder, and smaller open-weight variants are omitted so the chart tracks each lab's main offering rather than every SKU.
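The following minimal sketch shows how flagship step-lines could be assembled: drop variant SKUs by name, then group the remaining checkpoints by provider in release order. Matching the omitted variants on hyphen- or space-separated name tokens is an assumption. Smaller open-weight variants cannot be recognized from a name alone, so the sketch relies on a hypothetical `small_open_weight` metadata flag. Model names in any example data should be treated as placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Model:
    name: str
    provider: str
    released: date
    iq: float
    small_open_weight: bool = False  # hypothetical metadata flag

# Variant keywords named in the text above.
EXCLUDED_VARIANTS = {"codex", "mini", "nano", "flash", "coder"}

def is_flagship(m: Model) -> bool:
    """True unless the model is a small open-weight release or a named variant SKU."""
    if m.small_open_weight:
        return False
    tokens = m.name.lower().replace("_", "-").replace(" ", "-").split("-")
    return not any(t in EXCLUDED_VARIANTS for t in tokens)

def provider_step_lines(models: list[Model]) -> dict[str, list[tuple[date, float]]]:
    """Flagship (release date, IQ) checkpoints per provider, in release order."""
    lines: dict[str, list[tuple[date, float]]] = defaultdict(list)
    for m in sorted(models, key=lambda model: model.released):
        if is_flagship(m):
            lines[m.provider].append((m.released, m.iq))
    return dict(lines)
```

Token matching rather than substring matching keeps a name like `minimax-m1` from being mistaken for a `mini` variant.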

This view is most useful for spotting whether a new release is actually ahead of its direct predecessor, or whether source coverage and conservative imputations are shaping the comparison.
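To check whether a release is ahead of its direct predecessor, one can take the difference between consecutive points on a provider's line. The hypothetical helper below takes one provider's (release date, IQ) checkpoints. A negative or near-zero delta is the kind of result worth checking against source coverage and imputed values before reading it as a real regression; the sketch itself cannot tell the two apart.

```python
from datetime import date

def predecessor_deltas(checkpoints: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """For one provider's flagship line, each release's IQ change vs the release before it."""
    ordered = sorted(checkpoints)  # oldest release first
    return [(cur_date, cur_iq - prev_iq)
            for (_, prev_iq), (cur_date, cur_iq) in zip(ordered, ordered[1:])]

# Hypothetical checkpoints for one provider.
print(predecessor_deltas([(date(2025, 1, 1), 100.0),
                          (date(2025, 6, 1), 110.0),
                          (date(2025, 9, 1), 108.0)]))
```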

Abstract Reasoning IQ
Each model's Abstract Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Fluid problem-solving on novel puzzles a model cannot have memorized — abstracting patterns from just a few examples.

Mathematical Reasoning IQ
Each model's Mathematical Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Multi-step quantitative reasoning, from competition problems to research-level proofs.

Scientific Reasoning IQ
Each model's Scientific Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Graduate-level reasoning across the natural sciences and applying scientific knowledge to hard problems.

Software Engineering IQ
Each model's Software Engineering IQ plotted on a standard normal IQ distribution

What it measures

Real-world coding: resolving issues in live repositories, building front-end apps, and competitive programming.

Computer Use IQ
Each model's Computer Use IQ plotted on a standard normal IQ distribution

What it measures

Agentic operation of real tools and environments — terminals, browsers, and desktop apps.

Reliability IQ
Each model's Reliability IQ plotted on a standard normal IQ distribution

What it measures

Following instructions precisely and knowing the limits of its own knowledge instead of guessing.

Social Reasoning / EQ
Each model's Social Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Emotional and social intelligence — reading intent, attunement, and the quality of human interaction.

IQ Methodology