Eco-efficiency methodology
The 1-to-5-star energy rating answers one question: how much energy does this model burn? It says nothing about how capable the model is. A tiny model can earn five stars by being weak and cheap to run, while a frontier model can earn one star despite being far more useful per unit of energy.
The eco-efficiency score closes that gap. It measures capability per unit of environmental cost: how much useful output a model delivers for the energy, water, and carbon it consumes. It is separate from the energy-only star rating, and the two are never conflated.
Inputs and output
Eco-efficiency is computed with Data Envelopment Analysis (DEA) using cross-efficiency weighting. DEA frames each model as converting environmental inputs (the cost) into a capability output (the benefit).
Inputs (the environmental cost):
| Input | Unit |
|---|---|
| Per-query energy | Wh |
| PUE | dimensionless |
| WUE_site | L/kWh |
| WUE_source | L/kWh |
| Carbon intensity factor (CIF) | kgCO2e/kWh |
Output (the capability) — the AI-Index composite:
The output is a single composite capability index built from public benchmark scores with fixed weights:
| Capability area | Weight | Component benchmarks |
|---|---|---|
| Reasoning + knowledge | 50% | e.g. MMLU-Pro, HLE, GPQA |
| Math | 25% | e.g. MATH-500, AIME |
| Coding | 25% | e.g. SciCode, LiveCodeBench |
The composite weights are fixed (50 / 25 / 25) so that the score is stable and comparable across the catalog rather than tuned per model.
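The composite can be sketched in Python as follows. This is a minimal illustration, not the catalog's code. The source fixes only the 50 / 25 / 25 area weights. The area keys, the assumption that all benchmarks share a common 0-100 scale, and the rule that an area's score is the plain mean of its component benchmarks are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Fixed composite weights from the table above (50 / 25 / 25).
# The dictionary keys are hypothetical names for the three capability areas.
WEIGHTS = {"reasoning_knowledge": 0.50, "math": 0.25, "coding": 0.25}


def ai_index(benchmarks: Dict[str, List[Optional[float]]]) -> Optional[float]:
    """Weighted composite of per-area benchmark scores, or None if anything is missing.

    Assumes every benchmark is already on a common 0-100 scale and that an area's
    score is the unweighted mean of its component benchmarks (hypothetical choices).
    """
    total = 0.0
    for area, weight in WEIGHTS.items():
        scores = benchmarks.get(area)
        if not scores or any(s is None for s in scores):
            return None  # honest unknown: never fill a missing benchmark with zero
        total += weight * mean(scores)
    return total
```

For example, if the reasoning benchmarks average 70, math averages 70 and coding averages 50, the index is 0.5 × 70 + 0.25 × 70 + 0.25 × 50 = 65.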
Why cross-efficiency
Plain DEA lets each model choose the input and output weights that flatter it most (self-evaluation). As a result, many models can tie at the maximum efficiency of 1, and the ranking discriminates poorly. Cross-efficiency instead scores each model under the weight schemes preferred by every model in the catalog and averages the results. This sharply reduces self-evaluation bias and produces a fairer ranking. The result is a unitless eco_efficiency_score, where a higher score means more capability per unit of environmental cost.
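A minimal sketch of cross-efficiency follows, with its assumptions stated up front:

- It uses the constant-returns-to-scale (CCR) multiplier model with a single capability output and strictly positive inputs.
- It averages every model's appraisal with equal weight, including the model's own.
- The function names `ccr_weights` and `cross_efficiency` are hypothetical.

The source does not say which DEA variant, averaging rule or tie-breaking rule the production score uses. To stay dependency-free, the sketch solves each small linear program by enumerating vertices with exact `Fraction` arithmetic. That only scales to toy catalogs; a catalog with all five inputs would call an LP solver instead. When a model has several optimal weight vectors, the sketch simply keeps the first one found. Cross-efficiency results can depend on that choice.

```python
from fractions import Fraction
from itertools import combinations


def _solve(A, b):
    """Solve the square system A z = b exactly (Gauss-Jordan); None if singular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def ccr_weights(X, y, k):
    """Weights z = (u, v_1..v_m) that maximise model k's own CCR efficiency.

    Multiplier form: max u*y_k  s.t.  v.x_k = 1,  u*y_j <= v.x_j for all j,  u, v >= 0.
    Solved by enumerating vertices of the feasible polytope (toy sizes only).
    """
    X = [[Fraction(v) for v in row] for row in X]
    y = [Fraction(v) for v in y]
    m = len(X[0])
    # Each row r encodes the inequality r . z >= 0 over z = (u, v_1..v_m).
    ineq = [[-y[j]] + X[j] for j in range(len(X))]                       # v.x_j - u*y_j >= 0
    ineq += [[int(i == c) for c in range(m + 1)] for i in range(m + 1)]  # z >= 0
    eq = [Fraction(0)] + X[k]                                            # v.x_k = 1
    best_obj, best_z = None, None
    for active in combinations(ineq, m):  # eq + m active rows pin down one vertex
        z = _solve([eq, *active], [1] + [0] * m)
        if z is None:
            continue
        if all(sum(a * zi for a, zi in zip(row, z)) >= 0 for row in ineq):
            obj = z[0] * y[k]
            if best_obj is None or obj > best_obj:  # first optimum found wins ties
                best_obj, best_z = obj, z
    return best_z


def cross_efficiency(X, y):
    """Average efficiency of each model under every model's preferred weights."""
    X = [[Fraction(v) for v in row] for row in X]
    y = [Fraction(v) for v in y]
    n = len(X)
    weights = [ccr_weights(X, y, d) for d in range(n)]
    scores = []
    for k in range(n):
        # Efficiency of model k when judged with model d's weights (always <= 1).
        effs = [w[0] * y[k] / sum(v * x for v, x in zip(w[1:], X[k])) for w in weights]
        scores.append(sum(effs) / n)
    return scores


if __name__ == "__main__":
    # Toy catalog A, B, C, D: inputs are (energy, water); capability is 1 for all.
    X = [[1, 4], [2, 2], [4, 1], [4, 4]]
    y = [1, 1, 1, 1]
    print([float(s) for s in cross_efficiency(X, y)])
```

In this toy catalog, plain CCR efficiency rates A, B and C all at exactly 1, so self-evaluation alone cannot rank them. Model D uses twice B's energy and water for the same capability. Under every weight scheme its efficiency is therefore half of B's, and its cross-efficiency score is exactly half of B's as well.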
Null, never zero
eco_efficiency_score is null whenever any input or the capability output is null. If energy is unknown, or a water term or benchmark score is missing, the score is not computed rather than fabricated. This is the same honest-unknown discipline that the energy, carbon, and water methodologies apply.
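One way to enforce the null rule when scoring a whole catalog is sketched below. It assumes that models with any missing input or output are left out of the DEA run entirely, so they neither receive a score nor shift anyone else's. The source does not say this; it states only that their score is null. `scores_with_nulls` and the `scorer` callback are hypothetical names, and the callback could be any cross-efficiency routine.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical scorer signature: complete input rows and outputs in, one score per row out.
Scorer = Callable[[List[Sequence[float]], List[float]], List[float]]


def scores_with_nulls(
    inputs: Sequence[Sequence[Optional[float]]],
    outputs: Sequence[Optional[float]],
    scorer: Scorer,
) -> List[Optional[float]]:
    """Score only models whose every input and output is known; the rest get None."""
    complete = [
        i for i, (row, out) in enumerate(zip(inputs, outputs))
        if out is not None and all(v is not None for v in row)
    ]
    result: List[Optional[float]] = [None] * len(inputs)  # None means "unknown", never zero
    if complete:
        scores = scorer([inputs[i] for i in complete], [outputs[i] for i in complete])
        for i, score in zip(complete, scores):
            result[i] = score
    return result
```

Returning `None` rather than `0.0` matters downstream. A zero would rank the model as the least eco-efficient in the catalog, while `None` keeps it visibly unscored.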
Distinct from the star rating
To be explicit about the two scores, because they are easy to confuse:
| | Energy star rating | Eco-efficiency score |
|---|---|---|
| Question answered | How much energy does it burn? | How much capability per environmental cost? |
| Inputs | Per-task energy only | Energy + PUE + water + carbon-intensity vs. capability |
| Scale | 1–5 stars (relative quintiles, per task) | Unitless DEA cross-efficiency score |
| Method | 20% quintile bands at render time | Data Envelopment Analysis (cross-efficiency) |
| Comparability | Within one task and catalog version | Across the catalog |
A model can be five-star on energy and middling on eco-efficiency, or the reverse. Read each for what it measures.
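For contrast, the star rating's 20% bands can be sketched as a pure ranking on energy within one task, in which capability never enters. The function name and the exact boundary and tie rules are hypothetical. The source states only that the bands are relative quintiles computed per task at render time, with lower energy earning more stars.

```python
from typing import Dict, List


def energy_stars(energy_wh: Dict[str, float]) -> Dict[str, int]:
    """Relative 1-5 stars for models that ran the same task: lowest energy gets 5.

    Hypothetical banding: models are sorted by energy and split into five equal
    20% bands by rank; ties keep their input order (sorted() is stable).
    """
    ranked: List[str] = sorted(energy_wh, key=lambda name: energy_wh[name])
    n = len(ranked)
    return {name: 5 - (5 * i) // n for i, name in enumerate(ranked)}
```

Because the stars depend only on a model's rank among its peers on one task, adding or removing models can change them. The eco-efficiency score instead trades energy, water and carbon off against capability.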
Where to go next
- Energy methodology — the energy term and the separate star rating.
- Water methodology — the water inputs.
- Public AI Energy Index — the eco-efficiency leaderboard and the energy-vs-capability frontier chart.
- Licenses & attributions — benchmark and data attribution.