Eco-efficiency methodology

The 1-to-5 star energy rating answers one question: how much energy does this model burn? It says nothing about how capable the model is. A tiny model can earn five stars by being weak and cheap to run; a frontier model can earn one star while being far more useful per unit of energy.

The eco-efficiency score closes that gap. It measures capability per unit of environmental cost: how much useful output a model delivers for the energy, water, and carbon it consumes. It is a separate measure from the energy-only star rating, and the two are never conflated.

Inputs and output

Eco-efficiency is computed with Data Envelopment Analysis (DEA) using cross-efficiency weighting. DEA frames each model as converting environmental inputs (the cost) into a capability output (the benefit).
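For reference, plain DEA is often written in the input-oriented CCR multiplier form shown here, for a model k with inputs x_{ik} (the m = 5 inputs listed below) and a single output y_k. This form and the symbols x, y, u, v, and θ are assumptions for illustration; the page does not state which DEA variant is used.

```latex
\begin{aligned}
\theta_k^{*} = \max_{u,\,v}\quad & u\, y_k \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{ik} = 1, \\
& u\, y_j - \sum_{i=1}^{m} v_i\, x_{ij} \le 0 \quad \text{for every model } j, \\
& u \ge 0,\quad v_i \ge 0 .
\end{aligned}
```

Each model solves this program for itself, so the weights (u, v) it gets are the ones that make its own ratio of output to weighted input look best.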

Inputs (the environmental cost):

| Input | Unit |
| --- | --- |
| Per-query energy | Wh |
| PUE | dimensionless |
| WUE_site | L/kWh |
| WUE_source | L/kWh |
| Carbon intensity factor (CIF) | kgCO2e/kWh |

Output (the capability) — the AI-Index composite:

The output is a single composite capability index built from public benchmark scores with fixed weights:

| Capability area | Weight | Component benchmarks |
| --- | --- | --- |
| Reasoning + knowledge | 50% | e.g. MMLU-Pro, HLE, GPQA |
| Math | 25% | e.g. MATH-500, AIME |
| Coding | 25% | e.g. SciCode, LiveCodeBench |

The composite weights are fixed (50 / 25 / 25) so that the score is stable and comparable across the catalog rather than tuned per model.
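The fixed weighting can be sketched as below. This is a minimal illustration, not the catalog's actual pipeline: the function name, the area keys, the 0-100 score scale, and the use of a plain mean to combine benchmarks within one area are all assumptions, since the page only specifies the 50 / 25 / 25 area weights.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence

# Fixed area weights from the table above (50 / 25 / 25).
AREA_WEIGHTS = {"reasoning_knowledge": 0.50, "math": 0.25, "coding": 0.25}


def capability_composite(
    area_scores: Mapping[str, Sequence[float]],
) -> Optional[float]:
    """Return the weighted composite, or None if any area has no scores."""
    total = 0.0
    for area, weight in AREA_WEIGHTS.items():
        scores = area_scores.get(area)
        if not scores:
            return None  # a missing area leaves the composite unknown
        # Assumption: benchmarks inside one area are combined by a plain mean.
        total += weight * mean(scores)
    return total


# Hypothetical benchmark scores on a 0-100 scale.
example = {
    "reasoning_knowledge": [70.0, 20.0, 60.0],
    "math": [90.0, 80.0],
    "coding": [40.0, 50.0],
}
composite = capability_composite(example)  # 0.5*50 + 0.25*85 + 0.25*45 = 57.5
```

Because the weights are constants rather than fitted per model, two models with the same area means always receive the same composite.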

Why cross-efficiency

Plain DEA lets each model choose the input/output weights that flatter it most (self-evaluation). Cross-efficiency instead scores each model under the weight schemes preferred by all models and averages the results, which sharply reduces self-evaluation bias and produces a fairer ranking. The result is a unitless eco_efficiency_score; higher means more capability per unit of environmental cost.
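The averaging step can be sketched as follows. This is a simplified illustration under stated assumptions: each model's preferred weight scheme is taken as given (in a full pipeline it would come from solving that model's own DEA linear program), only two inputs are used instead of five to keep the numbers readable, and all names and values are hypothetical.

```python
from typing import List, Sequence, Tuple

# A weight scheme is (input weights v, output weight u).
WeightScheme = Tuple[Sequence[float], float]


def efficiency(inputs: Sequence[float], output: float, scheme: WeightScheme) -> float:
    """Weighted output divided by weighted input under one weight scheme."""
    v, u = scheme
    cost = sum(vi * xi for vi, xi in zip(v, inputs))
    return u * output / cost


def cross_efficiency(
    inputs: Sequence[Sequence[float]],
    outputs: Sequence[float],
    schemes: Sequence[WeightScheme],
) -> List[float]:
    """Score every model under every model's scheme, then average per model."""
    n = len(outputs)
    return [
        sum(efficiency(inputs[j], outputs[j], schemes[k]) for k in range(n)) / n
        for j in range(n)
    ]


# Three hypothetical models, two inputs each, same capability output.
inputs = [(2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outputs = [10.0, 10.0, 10.0]
# Hypothetical preferred schemes: A stresses input 2, B stresses input 1,
# C weights both inputs equally.
schemes = [((0.0, 1.0), 0.1), ((1.0, 0.0), 0.1), ((0.25, 0.25), 0.075)]

scores = cross_efficiency(inputs, outputs, schemes)  # roughly [0.83, 0.83, 0.58]
```

Models A and B each score 1.0 under their own scheme but only 0.5 under the other's, so averaging pulls them below 1.0, while model C, which uses more of both inputs, ends up lowest.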

Null, never zero

eco_efficiency_score is null whenever any input or the capability output is null. If energy is unknown, or a water or benchmark term is missing, the score is not computed rather than fabricated. This is the same honest-unknown discipline applied to energy, carbon, and water.
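The null rule can be sketched as a guard in front of whatever computes the score. The record type, field names, and the `scorer` callback are hypothetical and stand in for the real DEA computation, which is not shown on this page.

```python
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass
class EcoRecord:
    # Hypothetical field names for the five inputs and the capability output.
    energy_wh: Optional[float]
    pue: Optional[float]
    wue_site: Optional[float]
    wue_source: Optional[float]
    cif: Optional[float]
    capability: Optional[float]


def eco_efficiency_or_none(
    record: EcoRecord, scorer: Callable[[EcoRecord], float]
) -> Optional[float]:
    """Return None if any term is unknown; otherwise delegate to the scorer."""
    values = [getattr(record, f.name) for f in fields(record)]
    # Test identity with None, not truthiness, so a real 0.0 is not
    # mistaken for a missing value.
    if any(v is None for v in values):
        return None
    return scorer(record)


def placeholder_scorer(record: EcoRecord) -> float:
    """Stand-in for the DEA cross-efficiency computation."""
    return 0.5


complete = EcoRecord(0.3, 1.2, 0.4, 3.1, 0.35, 57.5)
missing_water = EcoRecord(0.3, 1.2, 0.4, None, 0.35, 57.5)

known = eco_efficiency_or_none(complete, placeholder_scorer)         # 0.5
unknown = eco_efficiency_or_none(missing_water, placeholder_scorer)  # None
```

Returning None rather than 0 keeps an unknown score from being read as the worst possible score.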

Distinct from the star rating

To be explicit about the two scores, because they are easy to confuse:

| | Energy star rating | Eco-efficiency score |
| --- | --- | --- |
| Question answered | How much energy does it burn? | How much capability per environmental cost? |
| Inputs | Per-task energy only | Energy + PUE + water + carbon-intensity vs. capability |
| Scale | 1–5 stars (relative quintiles, per task) | Unitless DEA cross-efficiency score |
| Method | 20% quintile bands at render time | Data Envelopment Analysis (cross-efficiency) |
| Comparability | Within one task and catalog version | Across the catalog |

A model can be five-star on energy and middling on eco-efficiency, or the reverse. Read each for what it measures.
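For contrast, the star rating's quintile banding can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, lower energy is assumed to earn more stars, ties are broken by model name, and models with unknown energy are left unrated. The real rendering logic may differ.

```python
from typing import Dict, Optional


def star_ratings(energy_wh: Dict[str, Optional[float]]) -> Dict[str, Optional[int]]:
    """Assign 1-5 stars by 20% rank bands within one task's catalog."""
    # Rank only the models with known energy, lowest energy first.
    known = sorted((e, name) for name, e in energy_wh.items() if e is not None)
    n = len(known)
    stars: Dict[str, Optional[int]] = {name: None for name in energy_wh}
    for rank, (_, name) in enumerate(known):
        band = rank * 5 // n  # 0 for the lowest-energy 20%, 4 for the highest
        stars[name] = 5 - band
    return stars


# Hypothetical per-task energy figures in Wh.
catalog = {"a": 0.2, "b": 1.0, "c": 0.5, "d": 3.0, "e": 2.0, "f": None}
ratings = star_ratings(catalog)
# a: 5, c: 4, b: 3, e: 2, d: 1, f: None
```

Nothing in this banding looks at capability, which is why a model's star count and its eco-efficiency score can point in different directions.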

Where to go next