Eco-efficiency methodology

The 1-to-5 star energy rating answers one question: how much energy does this model burn? It says nothing about how capable the model is. A tiny model can earn five stars by being weak and cheap to run; a frontier model can earn one star while being far more useful per unit of energy.

The eco-efficiency score closes that gap. It measures capability per unit of environmental cost: how much useful output a model delivers for the energy, water, and carbon it consumes. It is a separate measure from the energy-only star rating, and the two are never conflated.

Inputs and output

Eco-efficiency is computed with Data Envelopment Analysis (DEA) using cross-efficiency weighting. DEA frames each model as converting environmental inputs (the cost) into a capability output (the benefit).
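For orientation, this framing can be written in the ratio form of the classic CCR DEA model. This page does not state which DEA variant is used, so the formulation below is an assumption, and the symbols are illustrative only: $x_{ik}$ is input $i$ of model $k$ (one of the five inputs listed below), $y_k$ is its capability output, and $v_i$ and $u$ are non-negative weights.

```latex
\[
E_{kk} \;=\; \max_{u,\,v \ge 0} \; \frac{u\,y_k}{\sum_{i=1}^{5} v_i\,x_{ik}}
\quad \text{subject to} \quad
\frac{u\,y_j}{\sum_{i=1}^{5} v_i\,x_{ij}} \le 1 \;\; \text{for every model } j .
\]
```

Under this reading, each model is scored with the weights that favor it most, capped so that no model in the catalog exceeds 1 under those same weights.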

Inputs (the environmental cost):

Input	Unit
Per-query energy	Wh
PUE	dimensionless
`WUE_site`	L/kWh
`WUE_source`	L/kWh
Carbon intensity factor (CIF)	kgCO2e/kWh

Output (the capability) — the AI-Index composite:

The output is a single composite capability index built from public benchmark scores with fixed weights:

Capability area	Weight	Component benchmarks
Reasoning + knowledge	50%	e.g. MMLU-Pro, HLE, GPQA
Math	25%	e.g. MATH-500, AIME
Coding	25%	e.g. SciCode, LiveCodeBench

The composite weights are fixed (50 / 25 / 25) so that the score is stable and comparable across the catalog rather than tuned per model.
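The fixed weighting can be sketched as follows. This is a minimal illustration, not the catalog's implementation: the names `AREA_WEIGHTS` and `composite_capability` are hypothetical, and it assumes that benchmarks within one area are averaged with equal weight and already share a 0-100 scale, neither of which is stated on this page. The scores in the example are made up.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence

# Fixed area weights from the table above (50 / 25 / 25).
AREA_WEIGHTS = {"reasoning_knowledge": 0.50, "math": 0.25, "coding": 0.25}


def composite_capability(
    area_scores: Mapping[str, Sequence[Optional[float]]],
) -> Optional[float]:
    """Return the weighted composite, or None if any area lacks a score.

    Assumption: benchmarks inside one area are averaged with equal weight
    and are already on a common 0-100 scale.
    """
    total = 0.0
    for area, weight in AREA_WEIGHTS.items():
        scores = area_scores.get(area, ())
        if not scores or any(s is None for s in scores):
            return None  # a missing benchmark is unknown, not zero
        total += weight * mean(scores)
    return total


# Hypothetical benchmark scores for one model.
example = {
    "reasoning_knowledge": [70.0, 20.0, 60.0],
    "math": [90.0, 50.0],
    "coding": [40.0, 60.0],
}
print(composite_capability(example))
```

Because the weights are constants rather than parameters fitted per model, two models with the same area averages always receive the same composite.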

Why cross-efficiency

Plain DEA lets each model choose the input/output weights that flatter it most (self-evaluation). Cross-efficiency instead scores each model under the weight schemes preferred by all models and averages the results, which sharply reduces self-evaluation bias and produces a fairer ranking. The result is a unitless `eco_efficiency_score`; a higher score means more capability per unit of environmental cost.
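The averaging step can be sketched in a few lines. This is an illustration under stated assumptions, not the production method: each weight scheme is taken as given (in practice it would come from solving that model's DEA linear program, which is omitted here), the efficiency ratio is assumed to be weighted output over weighted input, and the three models, two inputs, and hand-picked weights are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One weight scheme = (input weights v, output weight u).
WeightScheme = Tuple[Sequence[float], float]


def ratio(v: Sequence[float], u: float, x: Sequence[float], y: float) -> float:
    """Weighted output divided by weighted input for one model."""
    return (u * y) / sum(vi * xi for vi, xi in zip(v, x))


def cross_efficiency(
    inputs: Sequence[Sequence[float]],
    outputs: Sequence[float],
    schemes: Sequence[WeightScheme],
) -> List[float]:
    """Average each model's ratio over every model's weight scheme."""
    scores = []
    for x, y in zip(inputs, outputs):
        per_scheme = [ratio(v, u, x, y) for v, u in schemes]
        scores.append(sum(per_scheme) / len(per_scheme))
    return scores


# Hypothetical catalog: models A, B, C with two inputs and one output each.
inputs = [(1.0, 4.0), (2.0, 2.0), (4.0, 4.0)]
outputs = [2.0, 2.0, 2.0]
# Hand-picked schemes standing in for each model's preferred weights.
schemes = [
    ((1.0, 0.0), 0.5),  # favors A: only the first input counts
    ((0.0, 1.0), 1.0),  # favors B: only the second input counts
    ((1.0, 1.0), 2.0),  # favors C as far as possible
]
scores = cross_efficiency(inputs, outputs, schemes)
print(scores)
```

In this toy catalog, A and B each reach a ratio of 1.0 under their own scheme, so self-evaluation alone cannot separate them. Averaging over all three schemes ranks B above A, because A looks good only when the second input is ignored.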

Null, never zero

`eco_efficiency_score` is null whenever any term it depends on is null: if energy is unknown, or a water or benchmark term is missing, the score is left uncomputed rather than fabricated or reported as zero. This is the same honest-unknown discipline applied to energy, carbon, and water.
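A minimal sketch of this rule is shown below. The field names in `REQUIRED_TERMS`, the function `eco_efficiency_or_null`, and the stand-in `toy_scorer` are all hypothetical; the real score comes from the DEA computation, which is replaced here by a placeholder so that the null handling can be shown on its own.

```python
from typing import Callable, Mapping, Optional

# Hypothetical field names for the five inputs and the capability output.
REQUIRED_TERMS = (
    "energy_wh", "pue", "wue_site", "wue_source", "cif", "capability",
)


def eco_efficiency_or_null(
    terms: Mapping[str, Optional[float]],
    scorer: Callable[[Mapping[str, float]], float],
) -> Optional[float]:
    """Return scorer(terms) only when every required term is known."""
    if any(terms.get(name) is None for name in REQUIRED_TERMS):
        return None  # unknown stays unknown; it is never coerced to 0.0
    return scorer({name: terms[name] for name in REQUIRED_TERMS})


def toy_scorer(t: Mapping[str, float]) -> float:
    """Placeholder for the DEA score, used only to exercise the guard."""
    return t["capability"] / t["energy_wh"]


known = {
    "energy_wh": 0.5, "pue": 1.2, "wue_site": 0.0,
    "wue_source": 3.0, "cif": 0.4, "capability": 55.0,
}
missing_water = dict(known, wue_source=None)

print(eco_efficiency_or_null(known, toy_scorer))
print(eco_efficiency_or_null(missing_water, toy_scorer))
```

The check uses `is None` rather than truthiness, so a measured value of `0.0` (here, `wue_site`) counts as known, while a missing value blocks the score.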

Distinct from the star rating

To be explicit about the two scores, because they are easy to confuse:

	Energy star rating	Eco-efficiency score
Question answered	How much energy does it burn?	How much capability per environmental cost?
Inputs	Per-task energy only	Energy + PUE + water + carbon-intensity vs. capability
Scale	1–5 stars (relative quintiles, per task)	Unitless DEA cross-efficiency score
Method	20% quintile bands at render time	Data Envelopment Analysis (cross-efficiency)
Comparability	Within one task and catalog version	Across the catalog

A model can be five-star on energy and middling on eco-efficiency, or the reverse. Read each for what it measures.
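To make the contrast concrete, the star side of the table can be sketched as a rank-based quintile split. The table only says "20% quintile bands", so everything else here is an assumption: the function name `star_ratings`, the rule that the lowest-energy fifth gets five stars, the tie-breaking by model name, and the handling of unknown energy as an unrated model.

```python
from typing import Dict, Optional


def star_ratings(energy_wh: Dict[str, Optional[float]]) -> Dict[str, Optional[int]]:
    """Map per-task energy to 1-5 stars by rank quintile (lowest energy = 5)."""
    # Rank only models with known energy; ties are broken by model name.
    known = sorted((e, name) for name, e in energy_wh.items() if e is not None)
    n = len(known)
    stars: Dict[str, Optional[int]] = {name: None for name in energy_wh}
    for rank, (_, name) in enumerate(known):
        band = rank * 5 // n  # 0..4, each band covers 20% of ranked models
        stars[name] = 5 - band
    return stars


# Hypothetical catalog for one task: ten measured models and one unknown.
catalog: Dict[str, Optional[float]] = {f"m{i}": float(i + 1) for i in range(10)}
catalog["unmeasured"] = None
print(star_ratings(catalog))
```

Capability never enters this calculation, which is why a star rating and an eco-efficiency score for the same model can point in different directions.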

Where to go next

Energy methodology — the energy term and the separate star rating.
Water methodology — the water inputs.
Public AI Energy Index — the eco-efficiency leaderboard and the energy-vs-capability frontier chart.
Licenses & attributions — benchmark and data attribution.