Energy and carbon methodology
This page is the reference specification for how Venturi derives the energy and carbon figures it attributes to every AI invocation. It expands the customer-facing overview in Energy & carbon with the exact formulas, constants, and provenance behind each number.
Venturi's scope is operational inference energy — the energy a model burns answering a request. Training energy and embodied (manufacturing) energy are out of scope and appear here only as labelled historical context, never folded into a per-request figure.
Two principles govern everything below:
- Null, never zero. Any unknown term yields a null result, never a fabricated 0. A zero would falsely imply no energy was consumed.
- Energy is never restated as cost. Energy and carbon are sustainability context. Cost comes from the billing and pricing path and is reported separately. The two are never multiplied into one another.
The derivation tier ladder (T0–T4)
Venturi runs a tiered derivation engine. Every output row records a
derivation_method, a confidence level, and a provenance string, so you can
always tell how a number was produced and how much to trust it. The engine
walks the ladder highest-trust first and stops at the first tier it can satisfy.
| Tier | Name | What it gives | When it is used |
|---|---|---|---|
| T0 | Measured full-system | Metered GPU + CPU + RAM energy via CodeCarbon (NVML / RAPL / psutil) | Self-hosted inference that Venturi instruments directly |
| T1 | AI Energy Score coefficient | GPU watt-hours per 1,000 queries, batch = 1, H100, by (model, task, class) | The model is on the AI Energy Score leaderboard |
| T2 | Infra-aware estimate | Per-query Wh from latency and throughput, scaled by power × utilization × PUE | The model has published performance data but no AI Energy Score result |
| T3 | Class-analogue estimate | Borrows the nearest same-class, same-task coefficient | A brand-new model whose parameter count is known but that has no performance data |
| T4 | Unestimated | null | Nothing is available |
T1 is the primary tier for Venturi's vendored catalog. T0 is the cross-check where Venturi measures hardware directly (see deployment modes and licenses).
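The ladder walk described above can be sketched as a first-match selection. This is a minimal illustration, not Venturi's engine: the `Evidence` class, its field names, and `select_tier` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """What is known about one invocation's model (hypothetical field names)."""
    measured_wh: Optional[float] = None       # T0: metered full-system energy
    aies_coefficient: Optional[float] = None  # T1: AI Energy Score coefficient
    has_perf_data: bool = False               # T2: latency and throughput published
    param_count: Optional[float] = None       # T3: parameter count known


def select_tier(ev: Evidence) -> str:
    """Walk the ladder highest-trust first and stop at the first tier satisfied."""
    if ev.measured_wh is not None:
        return "T0"
    if ev.aies_coefficient is not None:
        return "T1"
    if ev.has_perf_data:
        return "T2"
    if ev.param_count is not None:
        return "T3"
    return "T4"  # unestimated: the energy result stays None, never 0
```

A model with both a leaderboard coefficient and published performance data resolves to T1, because the walk stops at the first tier it can satisfy.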
Energy
The stored primitive: watt-hours
Venturi stores absolute energy in watt-hours (Wh) as its primitive. The 1-to-5 star rating you see in the product is derived at render time from that primitive and is never persisted as the source of truth. This matters because star bands are relative: a model's stars can change when the population is recalibrated even though its measured energy did not. The raw Wh is the durable fact; the star is a presentation layer over it.
The per-invocation energy estimate is:
energy_wh = (output_tokens / 1000) × coefficient_wh_per_1k
where coefficient_wh_per_1k is the model's per-1,000-output-token coefficient
for its task type.
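As a minimal sketch, assuming the estimate scales the coefficient by output tokens in thousands (the reading suggested by the description of coefficient_wh_per_1k), the calculation and the "null, never zero" rule look like this. The function name is hypothetical.

```python
from typing import Optional


def invocation_energy_wh(output_tokens: Optional[int],
                         coefficient_wh_per_1k: Optional[float]) -> Optional[float]:
    """Per-invocation energy in Wh; None when any input term is unknown."""
    if output_tokens is None or coefficient_wh_per_1k is None:
        return None  # unknown stays unknown, never a fabricated 0
    return output_tokens / 1000 * coefficient_wh_per_1k
```

Returning `None` rather than `0.0` keeps an unknown invocation distinguishable from one that is known to have consumed nothing.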
Render-time star bands (20% quintiles)
The 1-to-5 star rating is computed at render time from five equal bands (quintiles, 20% each) of the per-task energy population:
- ★☆☆☆☆ — least efficient quintile (highest energy).
- ★★★★★ — most efficient quintile (lowest energy).
Bands are recalibrated on a regular cadence (see version history). Because the bands are relative within a task, a star is only meaningful within the same task and the same catalog version — never compare a star across tasks or across versions.
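One way to derive a star from the stored Wh is a rank-based quintile over the per-task population. This is a sketch under assumptions: the exact quantile method and tie handling are not specified above, and `star_rating` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def star_rating(energy_wh: Optional[float],
                population_wh: Sequence[float]) -> Optional[int]:
    """Map an energy value to 1..5 stars within one task's population."""
    if energy_wh is None or not population_wh:
        return None
    ordered = sorted(population_wh)
    # Number of population members that use strictly less energy.
    rank = bisect_left(ordered, energy_wh)
    rank = min(rank, len(ordered) - 1)  # clamp values above the population
    # Integer arithmetic: the lowest 20% of ranks get 5 stars, the highest 20% get 1.
    return 5 - (rank * 5) // len(ordered)
```

Because the rating depends on `population_wh`, the same Wh value can earn a different star after a recalibration, which is why only the Wh is stored.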
T1: AI Energy Score coefficient (primary)
The AI Energy Score is a standardized, independent benchmark maintained by Hugging Face with Salesforce, Cohere, and Carnegie Mellon University. It measures model inference energy on NVIDIA H100 GPUs and publishes a coefficient and a 1-to-5 rating per model and task. Venturi vendors this catalog and looks up each invocation's model to derive its energy. The benchmark is published at huggingface.co/blog/sasha/ai-energy-score-v2.
The coefficient unit is GPU watt-hours per 1,000 queries. Only GPU energy is scored; CPU and RAM are measured but excluded from the rating.
Venturi records the AI Energy Score run configuration as provenance, verbatim, on every coefficient:
| Run-config field | Value |
|---|---|
| Batch size | 1 |
| Dataset size | 1,000 points per task |
| Runs averaged | 10 |
| Hardware | NVIDIA H100 80 GB, single GPU |
| Numeric precision | FP16 for text generation; FP32 default otherwise |
Energy is summed across the model's phases. The current schema reports
preprocess + prefill + decode; an older schema reports a single forward
phase and a per_token figure. Venturi's ingestion tolerates both shapes so the
catalog does not break when the upstream schema evolves.
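A schema-tolerant phase sum could look like the sketch below. The phase names come from the text; the flat dictionary shape of a record and the function name are assumptions made for illustration, not the upstream file format.

```python
from typing import Mapping, Optional

CURRENT_PHASES = ("preprocess", "prefill", "decode")


def total_phase_energy(record: Mapping[str, Optional[float]]) -> Optional[float]:
    """Sum energy across phases, accepting either schema shape."""
    if all(name in record for name in CURRENT_PHASES):
        values = [record[name] for name in CURRENT_PHASES]  # current schema
    elif "forward" in record:
        values = [record["forward"]]                        # older schema
    else:
        return None  # unrecognized shape: report unknown
    if any(v is None for v in values):
        return None  # a missing phase makes the total unknown, not smaller
    return sum(values)
```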
Hardware comparability
Coefficients measured on non-H100 hardware (for example A100 or TPU) are flagged as not AI Energy Score comparable and are segregated in the catalog, because the rating's relative bands are only meaningful within the same measurement hardware.
The reasoning premium
Reasoning ("thinking") modes consume dramatically more energy than standard generation for the same model, because they emit far more output tokens. The AI Energy Score v2 analysis finds:
- Reasoning models average roughly 30× the energy of non-reasoning models.
- Turning reasoning on versus off for the same model spans 150× to 700×.
Venturi treats reasoning as a first-class engine factor so that reasoning invocations are not silently under-counted. This is also why an energy multiplier must never be read as a cost multiplier — see Energy is not cost.
T2: Infra-aware estimate
When a model is not on the AI Energy Score leaderboard but has published performance data (first-token latency and throughput), Venturi estimates energy from the work and the hardware:
base_time_h = (latency_s + output_tokens / throughput_tps) / 3600
energy_wh = base_time_h × (P_gpu × util_gpu + P_nongpu × util_nongpu) × PUE
A minimum (util_gpu,min) and a maximum (util_gpu,max) bound the utilization;
the reported expectation blends them and a Monte-Carlo pass produces the standard
deviation (the uncertainty band shown in the public viz). GPU count scales with
model size class. These rows are recorded with confidence = medium.
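The two T2 formulas and the uncertainty pass can be sketched as follows. Assumptions to note: power is taken in watts so the result is in Wh, GPU utilization is sampled uniformly between its bounds, and the function names are hypothetical. The sampling distribution Venturi actually uses is not stated here.

```python
import random
from statistics import mean, stdev


def t2_energy_wh(latency_s, output_tokens, throughput_tps,
                 p_gpu_w, util_gpu, p_nongpu_w, util_nongpu, pue):
    """Per-query energy in Wh from latency, throughput, power, and PUE."""
    base_time_h = (latency_s + output_tokens / throughput_tps) / 3600
    return base_time_h * (p_gpu_w * util_gpu + p_nongpu_w * util_nongpu) * pue


def t2_monte_carlo(util_gpu_min, util_gpu_max, n=10_000, seed=0, **fixed):
    """Return (mean, standard deviation) over sampled GPU utilization.

    `fixed` carries every t2_energy_wh argument except util_gpu.
    Uniform sampling is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    samples = [
        t2_energy_wh(util_gpu=rng.uniform(util_gpu_min, util_gpu_max), **fixed)
        for _ in range(n)
    ]
    return mean(samples), stdev(samples)
```

PUE enters here, inside the energy term, so a later carbon step has no reason to apply it again.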
T3: Class-analogue estimate
For a brand-new model with a known parameter count but no performance data, Venturi bins the model into a size class using the memory formula
with bytes_per_param = 2 (FP16), quant_bits = 16, and overhead = 1.2, then
borrows the nearest same-class, same-task coefficient. These rows are recorded
with derivation_method = class_analogue and confidence = low.
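The memory formula itself is not reproduced above, so the sketch below is only one plausible reading of the three listed constants: parameters times bytes per parameter, scaled by quantization relative to 16 bits, times the overhead factor. The formula shape and the function name are assumptions. Size-class boundaries are not stated in this page, so the binning step is left out.

```python
from typing import Optional


def estimate_memory_gb(params_billions: Optional[float],
                       bytes_per_param: float = 2.0,   # FP16
                       quant_bits: int = 16,
                       overhead: float = 1.2) -> Optional[float]:
    """Approximate serving memory in GB (assumed formula shape)."""
    if params_billions is None:
        return None  # no parameter count: T3 cannot be satisfied
    return params_billions * bytes_per_param * (quant_bits / 16) * overhead
```

With the defaults, a 7-billion-parameter model comes out at roughly 16.8 GB under this reading.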
T0: Measured full-system (self-hosted)
Where Venturi instruments the hardware (self-hosted inference), it can measure
energy directly with CodeCarbon. Power is derived backwards from energy-counter
deltas (power_kw = Δenergy_kwh / Δhours), which gives the average power over the
sampling interval rather than an instantaneous reading. T0 covers GPU + CPU + RAM.
A caveat rides every T0-vs-T1 comparison: measured GPU energy is device-level, not
per-tenant, so it is not directly comparable to the per-request normalized
AI Energy Score figure on multi-tenant hardware.
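The counter-delta calculation can be sketched as below. This does not call CodeCarbon; it only shows the arithmetic on two counter readings, and the function and parameter names are hypothetical.

```python
from typing import Optional


def average_power_kw(energy_kwh_start: Optional[float],
                     energy_kwh_end: Optional[float],
                     t_start_s: float,
                     t_end_s: float) -> Optional[float]:
    """Average power over an interval from two energy-counter readings."""
    if energy_kwh_start is None or energy_kwh_end is None:
        return None
    delta_hours = (t_end_s - t_start_s) / 3600
    if delta_hours <= 0:
        return None  # no elapsed time: power is undefined, not zero
    return (energy_kwh_end - energy_kwh_start) / delta_hours
```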
Carbon
Carbon is a single decomposition across all tiers:
carbon (gCO2e) = C × E
where E is energy in kWh and C is the grid carbon intensity in gCO2e/kWh.
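Reading the decomposition as carbon = C × E, the form the PUE section of this page uses, a minimal sketch follows. Because energy is stored in Wh, the sketch converts to kWh before multiplying. The function name is hypothetical.

```python
from typing import Optional


def carbon_g_co2e(energy_wh: Optional[float],
                  intensity_g_per_kwh: Optional[float]) -> Optional[float]:
    """Carbon in gCO2e from stored Wh and grid intensity in gCO2e/kWh."""
    if energy_wh is None or intensity_g_per_kwh is None:
        return None  # any unknown term yields an unknown result
    energy_kwh = energy_wh / 1000
    return intensity_g_per_kwh * energy_kwh
```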
The five-step C resolution chain
Venturi resolves the grid carbon intensity C in a fixed order of precedence, so
the most specific value available is always used:
- An explicit override supplied for the calculation (force_carbon_intensity_g_co2e_kwh).
- A live grid-intensity feed (optional ElectricityMaps integration).
- The cloud provider and region (vendored per-region intensities).
- The country ISO code (and sub-region where available), via the generation-mix derivation below.
- The world-average fallback of 475 gCO2e/kWh — always flagged as estimated, never a silent value.
A region you configure resolves higher up the chain and is materially more accurate than the fallback.
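The precedence order above amounts to a first-non-null lookup with a flagged fallback. In this sketch the caller has already looked up each candidate value; the function name, argument names, and source labels are hypothetical.

```python
from typing import Optional, Tuple

WORLD_AVERAGE_G_PER_KWH = 475.0  # fallback value stated in the chain above


def resolve_intensity(override: Optional[float],
                      live_feed: Optional[float],
                      cloud_region: Optional[float],
                      country_mix: Optional[float]) -> Tuple[float, str, bool]:
    """Return (intensity, source label, estimated flag) by fixed precedence."""
    candidates = (
        ("override", override),
        ("live_feed", live_feed),
        ("cloud_region", cloud_region),
        ("country_mix", country_mix),
    )
    for source, value in candidates:
        if value is not None:
            return value, source, False
    # Nothing more specific is known: use the world average and say so.
    return WORLD_AVERAGE_G_PER_KWH, "world_average", True
```

Returning the source label alongside the value keeps the fallback visible in provenance instead of letting it pass as a resolved figure.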
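Generation-mix derivation and the fuel-factor table
Where carbon is derived from a country's generation mix, C is the share-weighted sum of the per-fuel emission factors:
C = Σ (share_fuel × factor_fuel), summed over the fuels in the mix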
The per-fuel emission factors (g/kWh, equivalently kg/MWh) are:
| Fuel | Emission factor (gCO2e/kWh) |
|---|---|
| Coal | 995 |
| Petroleum / oil | 816 |
| Natural gas | 743 |
| Solar | 48 |
| Geothermal | 38 |
| Nuclear | 29 |
| Hydro | 26 |
| Wind | 26 |
Worked example for a mix of 25% coal, 35% oil, 26% gas, 14% nuclear:
C = 0.25 × 995 + 0.35 × 816 + 0.26 × 743 + 0.14 × 29 = 731.59 gCO2e/kWh
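The worked example can be checked numerically with a share-weighted sum over the fuel-factor table. The dictionary keys and the function name are hypothetical. Rejecting mixes that contain an unknown fuel or do not sum to 100% is an assumption of this sketch.

```python
from typing import Mapping, Optional

# Emission factors in gCO2e/kWh, copied from the table above.
FUEL_FACTORS = {
    "coal": 995.0, "petroleum": 816.0, "natural_gas": 743.0, "solar": 48.0,
    "geothermal": 38.0, "nuclear": 29.0, "hydro": 26.0, "wind": 26.0,
}


def mix_intensity(mix: Mapping[str, float]) -> Optional[float]:
    """Grid intensity in gCO2e/kWh for a generation mix given as fractions."""
    if any(fuel not in FUEL_FACTORS for fuel in mix):
        return None  # unknown fuel: do not guess a factor
    if abs(sum(mix.values()) - 1.0) > 1e-6:
        return None  # incomplete mix: the result would be misleading
    return sum(share * FUEL_FACTORS[fuel] for fuel, share in mix.items())


example = {"coal": 0.25, "petroleum": 0.35, "natural_gas": 0.26, "nuclear": 0.14}
print(round(mix_intensity(example), 2))  # 731.59
```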
PUE is applied once, in the energy term
Power Usage Effectiveness (PUE) accounts for datacenter overhead beyond the IT
load. Venturi applies PUE once, inside the energy term (T1/T2 energy already
includes it), and then computes carbon as C × E_total. PUE is never
double-applied in the carbon step. This is an explicit decision that closes a
known discrepancy between two upstream methodologies (one applies PUE in carbon,
one does not) and is enforced by a gate so it cannot regress.
Reconciling the boundaries
The three measurement families draw different system boundaries:
| | AI Energy Score (T1) | CodeCarbon (T0) | Infra-aware (T2) |
|---|---|---|---|
| Boundary | GPU only | GPU + CPU + RAM | (GPU + non-GPU, each × utilization) × PUE |
| Basis | Measured, batch = 1, H100 | Measured counters | Statistically inferred |
| GPU attribution | Per-model normalized | Device-level | Per-query utilization share |
The model-of-record: Venturi stores the GPU-only AI Energy Score coefficient
(T1) as the primary number, because it is the apples-to-apples normalized index
that matches the token × coefficient engine. To compare against a full-system or
datacenter figure, Venturi grosses up the GPU-only number by adding CPU + RAM
(≈ 30% of GPU energy) and applying PUE once. Where hardware is instrumented, the
T0 measured figure is the validation cross-check and the residual is stored.
Every comparison carries the device-level caveat above, and every missing term
resolves to null.
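The gross-up described above can be sketched as follows, using the approximate 30% CPU + RAM share from the text as a default. The function name is hypothetical, and the PUE value must be supplied by the caller because no default is given here.

```python
from typing import Optional


def gross_up_wh(gpu_wh: Optional[float],
                pue: Optional[float],
                non_gpu_fraction: float = 0.30) -> Optional[float]:
    """Full-system comparison figure from a GPU-only coefficient.

    Adds CPU + RAM as a fraction of GPU energy, then applies PUE once.
    """
    if gpu_wh is None or pue is None:
        return None  # every missing term resolves to None
    return gpu_wh * (1 + non_gpu_fraction) * pue
```

For example, 10 Wh of GPU energy at a PUE of 1.2 grosses up to roughly 15.6 Wh. The stored primary number remains the GPU-only figure.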
Version history of the AI Energy Score
Venturi records the AI Energy Score release history so that older coefficients stay interpretable:
- v1 introduced the standardized 10-task benchmark and the 1-to-5 rating.
- v2 added Reasoning as an 11th task and surfaced the "newer is not always greener" finding: newer model generations are not uniformly more efficient than the models they replace, so efficiency must be measured per release rather than assumed to improve over time.
See version history for how Venturi re-bands stars across catalog versions while preserving the raw Wh, and licenses for the attribution of every upstream source.
Where to go next
- Energy & carbon overview — the customer-facing summary this page expands.
- Water methodology — the two-term water model.
- Eco-efficiency methodology — capability per environmental cost.
- Version history — recalibrate-not-re-benchmark semantics.
- Public AI Energy Index — the public dataset and viewer.
- Licenses & attributions — upstream source credit.
- FAQ — common energy and carbon questions.