Energy and carbon methodology

This page is the reference specification for how Venturi derives the energy and carbon figures it attributes to every AI invocation. It expands the customer-facing overview in Energy & carbon with the exact formulas, constants, and provenance behind each number.

Venturi's scope is operational inference energy — the energy a model burns answering a request. Training energy and embodied (manufacturing) energy are out of scope and appear here only as labelled historical context, never folded into a per-request figure.

Two principles govern everything below:

  • Null, never zero. Any unknown term yields a null result, never a fabricated 0. A zero would falsely imply no energy was consumed.
  • Energy is never restated as cost. Energy and carbon are sustainability context. Cost comes from the billing and pricing path and is reported separately. The two are never multiplied into one another.

The derivation tier ladder (T0–T4)

Venturi runs a tiered derivation engine. Every output row records a derivation_method, a confidence level, and a provenance string, so you can always tell how a number was produced and how much to trust it. The engine walks the ladder highest-trust first and stops at the first tier it can satisfy.

| Tier | Name | What it gives | When it is used |
| --- | --- | --- | --- |
| T0 | Measured full-system | Metered GPU + CPU + RAM energy via CodeCarbon (NVML / RAPL / psutil) | Self-hosted inference that Venturi instruments directly |
| T1 | AI Energy Score coefficient | GPU watt-hours per 1,000 queries, batch = 1, H100, by (model, task, class) | The model is on the AI Energy Score leaderboard |
| T2 | Infra-aware estimate | Per-query Wh from latency + throughput × utilization × PUE | The model has published performance data but no AI Energy Score result |
| T3 | Class-analogue estimate | The nearest same-class, same-task coefficient, borrowed | A brand-new model whose parameter count is known but has no performance data |
| T4 | Unestimated | null | Nothing is available |

T1 is the primary tier for Venturi's vendored catalog. T0 is the cross-check where Venturi measures hardware directly (see deployment modes and licenses).
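
The ladder walk can be sketched as a first-match loop. This is a minimal illustration under stated assumptions, not Venturi's implementation: the `Derivation` record, the resolver callables, the `toy_t1` resolver, and the method and confidence labels are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Derivation:
    energy_wh: Optional[float]   # None means "unknown", never 0
    derivation_method: str
    confidence: Optional[str]
    provenance: str


# A resolver returns a Derivation when its tier can be satisfied, else None.
Resolver = Callable[[dict], Optional[Derivation]]


def walk_ladder(invocation: dict, resolvers: Sequence[Resolver]) -> Derivation:
    """Try resolvers in order (highest trust first) and stop at the first hit."""
    for resolve in resolvers:
        result = resolve(invocation)
        if result is not None:
            return result
    # T4: nothing is available, so the energy stays null.
    return Derivation(None, "unestimated", None, "no tier could be satisfied")


def toy_t1(invocation: dict) -> Optional[Derivation]:
    """Hypothetical resolver backed by a one-entry, made-up lookup table."""
    toy_energy_wh = {"example-model": 2.0}  # illustrative value, not real data
    energy = toy_energy_wh.get(invocation.get("model"))
    if energy is None:
        return None
    # The labels below are assumptions chosen for this sketch.
    return Derivation(energy, "t1_coefficient", "high", "toy lookup table")
```

Passing the resolvers as an ordered sequence keeps the precedence explicit: reordering the list is the only way to change which tier wins.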

Energy

The stored primitive: watt-hours

Venturi persists absolute energy in watt-hours (Wh); this is the stored primitive. The 1-to-5 star rating you see in the product is derived at render time from that primitive and is never persisted as the source of truth. This matters because star bands are relative: a model's stars can change when the population is recalibrated even though its measured energy did not. The raw Wh is the durable fact; the star is a presentation layer over it.

The per-invocation energy estimate is:

energy_wh = (output_token_count / 1000) × coefficient_wh_per_1k

where coefficient_wh_per_1k is the model's per-1,000-output-token coefficient for its task type.
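
As a minimal sketch, the formula combined with the "null, never zero" principle might look like the following. The function name and signature are assumptions made for illustration.

```python
from typing import Optional


def energy_wh(output_token_count: Optional[int],
              coefficient_wh_per_1k: Optional[float]) -> Optional[float]:
    """Per-invocation energy in Wh; any unknown input yields None, not 0."""
    if output_token_count is None or coefficient_wh_per_1k is None:
        return None
    return (output_token_count / 1000) * coefficient_wh_per_1k
```

For example, `energy_wh(500, 0.4)` evaluates to 0.2 Wh, while `energy_wh(500, None)` stays `None` because the coefficient is unknown.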

Render-time star bands (20% quintiles)

The 1-to-5 star rating is computed at render time as five equal 20% quintiles of the per-task energy population:

  • ★☆☆☆☆ — least efficient quintile (highest energy).
  • ★★★★★ — most efficient quintile (lowest energy).

Bands are recalibrated on a regular cadence (see version history). Because the bands are relative within a task, a star is only meaningful within the same task and the same catalog version — never compare a star across tasks or across versions.
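
One way to sketch the render-time banding is with quintile cut points from the standard library. This is an assumption-laden illustration: the function name is hypothetical, and the interpolation method and the handling of values that fall exactly on a cut point are choices made here, not documented Venturi behavior.

```python
import bisect
import statistics
from typing import Optional, Sequence


def star_rating(energy_wh: Optional[float],
                population_wh: Sequence[float]) -> Optional[int]:
    """Map an energy value to 1-5 stars within one task's population."""
    if energy_wh is None:
        return None  # unknown energy has no star
    # Four cut points split the population into five 20% bands.
    cuts = statistics.quantiles(population_wh, n=5)
    # Count how many cut points lie strictly below this energy value.
    bands_below = bisect.bisect_left(cuts, energy_wh)
    # Lowest energy band earns 5 stars, highest earns 1.
    return 5 - bands_below
```

Because the cut points are recomputed from the population on every call, the same Wh value can land in a different band after recalibration, which is the behavior the text describes.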

T1 — AI Energy Score coefficient (primary)

The AI Energy Score is a standardized, independent benchmark maintained by Hugging Face with Salesforce, Cohere, and Carnegie Mellon University. It measures model inference energy on NVIDIA H100 GPUs and publishes a coefficient and a 1-to-5 rating per model and task. Venturi vendors this catalog and looks up each invocation's model to derive its energy. The benchmark is published at huggingface.co/blog/sasha/ai-energy-score-v2.

The coefficient unit is GPU watt-hours per 1,000 queries. Only GPU energy is scored; CPU and RAM are measured but excluded from the rating.

Venturi records the AI Energy Score run configuration as provenance, verbatim, on every coefficient:

| Run-config field | Value |
| --- | --- |
| Batch size | 1 |
| Dataset size | 1,000 points per task |
| Runs averaged | 10 |
| Hardware | NVIDIA H100 80 GB, single GPU |
| Numeric precision | FP16 for text generation; FP32 default otherwise |

Energy is summed across the model's phases. The current schema reports preprocess + prefill + decode; an older schema reports a single forward phase and a per_token figure. Venturi's ingestion tolerates both shapes so the catalog does not break when the upstream schema evolves.
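
A schema-tolerant reader could be sketched as follows. The phase names come from the text above, but the flat dictionary layout and the function name are assumptions; the real upstream record shape may differ.

```python
from typing import Mapping, Optional

NEW_PHASES = ("preprocess", "prefill", "decode")


def total_gpu_energy(record: Mapping[str, Optional[float]]) -> Optional[float]:
    """Sum phase energies for the current schema, or fall back to the older one."""
    # Current schema: all three phases present.
    if all(record.get(phase) is not None for phase in NEW_PHASES):
        return sum(record[phase] for phase in NEW_PHASES)
    # Older schema: a single forward phase.
    if record.get("forward") is not None:
        return record["forward"]
    # Neither shape is recognizable: stay null rather than guess.
    return None
```

A record with only some of the three current phases is treated as unknown here rather than partially summed, which is one conservative reading of the null principle.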

Hardware comparability

Coefficients measured on non-H100 hardware (for example A100 or TPU) are flagged as not AI Energy Score comparable and are segregated in the catalog, because the rating's relative bands are only meaningful within the same measurement hardware.

The reasoning premium

Reasoning ("thinking") modes consume dramatically more energy than standard generation for the same model, because they emit far more output tokens. The AI Energy Score v2 analysis finds:

  • Reasoning models average roughly 30× the energy of non-reasoning models.
  • Turning reasoning on for the same model multiplies energy by 150× to 700× compared with reasoning off.

Venturi treats reasoning as a first-class engine factor so that reasoning invocations are not silently under-counted. This is also why an energy multiplier must never be read as a cost multiplier — see Energy is not cost.

T2 — Infra-aware estimate

When a model is not on the AI Energy Score leaderboard but has published performance data (first-token latency and throughput), Venturi estimates energy from the work and the hardware:

base_time_h = (latency_s + output_tokens / throughput_tps) / 3600
energy_wh   = base_time_h × (P_gpu × util_gpu + P_nongpu × util_nongpu) × PUE

Utilization is bounded by a minimum (util_gpu,min) and a maximum (util_gpu,max). The reported expectation blends the two bounds, and a Monte Carlo pass produces the standard deviation (the uncertainty band shown in the public visualization). GPU count scales with model size class. These rows are recorded with confidence = medium.
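
The two formulas and the uncertainty pass can be sketched as below. Several things are assumptions made for illustration: the function names, drawing GPU utilization uniformly between its bounds (the text does not state the distribution), and every numeric input in the usage note.

```python
import random
import statistics


def t2_energy_wh(latency_s, output_tokens, throughput_tps,
                 p_gpu_w, util_gpu, p_nongpu_w, util_nongpu, pue):
    """Per-query energy in Wh from timing, power draw (watts), and PUE."""
    base_time_h = (latency_s + output_tokens / throughput_tps) / 3600
    return base_time_h * (p_gpu_w * util_gpu + p_nongpu_w * util_nongpu) * pue


def t2_with_uncertainty(util_gpu_min, util_gpu_max,
                        n_samples=10_000, seed=0, **fixed):
    """Return (mean, standard deviation) of energy over sampled utilization."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    samples = [
        # Assumption: utilization is uniform between the two bounds.
        t2_energy_wh(util_gpu=rng.uniform(util_gpu_min, util_gpu_max), **fixed)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)
```

With illustrative inputs of 0.5 s latency, 350 output tokens at 100 tokens/s, 700 W GPU and 300 W non-GPU power at 50% utilization each, and a PUE of 1.2, `t2_energy_wh` gives about 0.67 Wh: 4 s of work at 500 W, scaled by 1.2.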

T3 — Class-analogue estimate

For a brand-new model with a known parameter count but no performance data, Venturi bins the model into a size class using the memory formula

M(GB) = (params × bytes_per_param / (32 / quant_bits)) × overhead

with params expressed in billions (so that M comes out in GB), bytes_per_param = 2 (FP16), quant_bits = 16, and overhead = 1.2, then borrows the nearest same-class, same-task coefficient. These rows are recorded with derivation_method = class_analogue and confidence = low.
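
A minimal sketch of the memory formula and the borrowing step follows. The formula and its default constants are applied exactly as written above. The helper names, the candidate record layout, and the choice of "nearest by memory footprint" are assumptions; the size-class thresholds are not given here, so the candidates are assumed to be pre-filtered to the same size class.

```python
from typing import Optional, Sequence


def model_memory_gb(params_billion: float, bytes_per_param: float = 2.0,
                    quant_bits: int = 16, overhead: float = 1.2) -> float:
    """Memory estimate in GB, with the parameter count given in billions."""
    return (params_billion * bytes_per_param / (32 / quant_bits)) * overhead


def borrow_coefficient(memory_gb: float, task: str,
                       same_class_candidates: Sequence[dict]) -> Optional[float]:
    """Borrow the coefficient of the closest same-task candidate, if any."""
    same_task = [c for c in same_class_candidates if c["task"] == task]
    if not same_task:
        return None  # no analogue available: stay null
    nearest = min(same_task, key=lambda c: abs(c["memory_gb"] - memory_gb))
    return nearest["coefficient_wh_per_1k"]
```

With the stated defaults, `model_memory_gb(7)` evaluates to 8.4 GB for a 7-billion-parameter model.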

T0 — Measured full-system (self-hosted)

Where Venturi instruments the hardware (self-hosted inference), it can measure energy directly with CodeCarbon. Power is derived backwards from energy-counter deltas (power_kw = Δenergy_kwh / Δhours), which gives the average power over the sampling interval rather than an instantaneous reading. T0 covers GPU + CPU + RAM. A caveat rides every T0-vs-T1 comparison: measured GPU energy is device-level, not per-tenant, so on multi-tenant hardware it is not directly comparable to the per-request normalized AI Energy Score figure.
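
The backwards power derivation can be illustrated without any metering library. This sketch assumes two cumulative energy-counter readings with timestamps in seconds; the function name and argument layout are hypothetical, and it does not call the CodeCarbon API.

```python
from typing import Optional


def average_power_kw(energy_kwh_start: float, energy_kwh_end: float,
                     t_start_s: float, t_end_s: float) -> Optional[float]:
    """Average power over an interval, from cumulative energy counter deltas."""
    delta_hours = (t_end_s - t_start_s) / 3600
    if delta_hours <= 0:
        return None  # no usable interval: report null rather than a guess
    return (energy_kwh_end - energy_kwh_start) / delta_hours
```

A counter that advances from 1.00 kWh to 1.35 kWh over 30 minutes corresponds to an average of 0.7 kW across that interval.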

Carbon

Carbon is a single decomposition across all tiers:

carbon_gco2e = C × E

where E is energy in kWh (the stored Wh value divided by 1,000) and C is the grid carbon intensity in gCO2e/kWh.
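
Because energy is stored in Wh while C is expressed per kWh, the unit conversion is the step most easily missed. A minimal sketch, with a hypothetical function name:

```python
from typing import Optional


def carbon_gco2e(energy_wh: Optional[float],
                 intensity_g_per_kwh: Optional[float]) -> Optional[float]:
    """Carbon in gCO2e from energy in Wh and grid intensity in gCO2e/kWh."""
    if energy_wh is None or intensity_g_per_kwh is None:
        return None  # an unknown term yields null, never 0
    energy_kwh = energy_wh / 1000  # Wh -> kWh
    return intensity_g_per_kwh * energy_kwh
```

For instance, 2 Wh at 475 gCO2e/kWh works out to 0.95 gCO2e.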

The five-step C resolution chain

Venturi resolves the grid carbon intensity C in a fixed order of precedence, so the most specific value available is always used:

  1. An explicit override supplied for the calculation (force_carbon_intensity_g_co2e_kwh).
  2. A live grid-intensity feed (optional ElectricityMaps integration).
  3. The cloud provider and region (vendored per-region intensities).
  4. The country ISO code (and sub-region where available), via the generation-mix derivation below.
  5. The world-average fallback of 475 gCO2e/kWh — always flagged as estimated, never a silent value.

A region you configure resolves higher up the chain and is materially more accurate than the fallback.
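
The precedence order above can be sketched as a first-non-null scan. The `Intensity` record, the argument names, and the source labels are hypothetical; each argument stands in for whatever lookup produces that step's value. The `estimated` flag here marks only the fallback, which is the one case the text says is always flagged.

```python
from typing import NamedTuple, Optional

WORLD_AVERAGE_G_PER_KWH = 475.0


class Intensity(NamedTuple):
    g_co2e_per_kwh: float
    source: str
    estimated: bool


def resolve_intensity(override: Optional[float] = None,
                      live_feed: Optional[float] = None,
                      provider_region: Optional[float] = None,
                      country_mix: Optional[float] = None) -> Intensity:
    """Return the most specific intensity available, in fixed precedence."""
    steps = [
        ("override", override),
        ("live_feed", live_feed),
        ("provider_region", provider_region),
        ("country_mix", country_mix),
    ]
    for source, value in steps:
        # "is not None" so that an explicit 0.0 is still honoured.
        if value is not None:
            return Intensity(value, source, estimated=False)
    # Step 5: world-average fallback, always flagged as estimated.
    return Intensity(WORLD_AVERAGE_G_PER_KWH, "world_average", estimated=True)
```

Returning the source label alongside the value keeps the fallback visible to downstream consumers instead of letting it pass as a measured figure.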

Generation-mix derivation and the fuel-factor table

Where carbon is derived from a country's generation mix:

C = Σ (fraction_fuel × emission_factor_fuel)

The per-fuel emission factors (gCO2e/kWh, equivalently kgCO2e/MWh) are:

| Fuel | Emission factor (gCO2e/kWh) |
| --- | --- |
| Coal | 995 |
| Petroleum / oil | 816 |
| Natural gas | 743 |
| Solar | 48 |
| Geothermal | 38 |
| Nuclear | 29 |
| Hydro | 26 |
| Wind | 26 |

Worked example for a mix of 25% coal, 35% oil, 26% gas, 14% nuclear:

C = 0.25×995 + 0.35×816 + 0.26×743 + 0.14×29 ≈ 731.6 gCO2e/kWh
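
The same weighted sum can be expressed in code. The factor values are the ones in the table above; the dictionary keys, the function name, and the choice to return null for an unrecognized fuel are assumptions made for this sketch.

```python
from typing import Mapping, Optional

# gCO2e/kWh per fuel, as listed in the fuel-factor table.
EMISSION_FACTORS = {
    "coal": 995, "petroleum": 816, "natural_gas": 743, "solar": 48,
    "geothermal": 38, "nuclear": 29, "hydro": 26, "wind": 26,
}


def mix_intensity(mix: Mapping[str, float]) -> Optional[float]:
    """Grid intensity in gCO2e/kWh from a {fuel: fraction} generation mix."""
    if any(fuel not in EMISSION_FACTORS for fuel in mix):
        return None  # unknown fuel: stay null rather than guess a factor
    return sum(fraction * EMISSION_FACTORS[fuel]
               for fuel, fraction in mix.items())
```

Calling it with the worked-example mix of coal 0.25, petroleum 0.35, natural gas 0.26, and nuclear 0.14 reproduces the figure of roughly 731.6 gCO2e/kWh.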

PUE is applied once, in the energy term

Power Usage Effectiveness (PUE) accounts for datacenter overhead beyond the IT load. Venturi applies PUE once, inside the energy term (T1/T2 energy already includes it), and then computes carbon as C × E_total. PUE is never double-applied in the carbon step. This is an explicit decision that closes a known discrepancy between two upstream methodologies (one applies PUE in carbon, one does not) and is enforced by a gate so it cannot regress.

Reconciling the boundaries

The three measurement families draw different system boundaries:

| | AI Energy Score (T1) | CodeCarbon (T0) | Infra-aware (T2) |
| --- | --- | --- | --- |
| Boundary | GPU only | GPU + CPU + RAM | GPU + non-GPU × utilization × PUE |
| Basis | Measured, batch = 1, H100 | Measured counters | Statistically inferred |
| GPU attribution | Per-model normalized | Device-level | Per-query utilization share |

The model-of-record: Venturi stores the GPU-only AI Energy Score coefficient (T1) as the primary number, because it is the apples-to-apples normalized index that matches the token × coefficient engine. To compare against a full-system or datacenter figure, Venturi grosses up the GPU-only number by adding CPU + RAM (≈ 30% of GPU energy) and applying PUE once. Where hardware is instrumented, the T0 measured figure is the validation cross-check and the residual is stored. Every comparison carries the device-level caveat above, and every missing term resolves to null.
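
The gross-up can be sketched as follows. The 30% share is the approximate figure stated above; the function names and the sign convention of the residual (measured minus estimated) are assumptions.

```python
from typing import Optional

CPU_RAM_SHARE_OF_GPU = 0.30  # approximate CPU + RAM share of GPU energy


def gross_up_wh(gpu_wh: Optional[float], pue: Optional[float]) -> Optional[float]:
    """Full-system estimate from a GPU-only figure; PUE is applied once."""
    if gpu_wh is None or pue is None:
        return None  # a missing term resolves to null
    return gpu_wh * (1 + CPU_RAM_SHARE_OF_GPU) * pue


def residual_wh(measured_wh: Optional[float],
                estimated_wh: Optional[float]) -> Optional[float]:
    """Difference between a measured figure and an estimate (assumed sign)."""
    if measured_wh is None or estimated_wh is None:
        return None
    return measured_wh - estimated_wh
```

A GPU-only figure of 10 Wh with a PUE of 1.2 grosses up to 15.6 Wh: 10 × 1.3 × 1.2.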

Version history of the AI Energy Score

Venturi records the AI Energy Score release history so that older coefficients stay interpretable:

  • v1 introduced the standardized 10-task benchmark and the 1-to-5 rating.
  • v2 added Reasoning as an 11th task and surfaced the "newer is not always greener" finding: newer model generations are not uniformly more efficient than the models they replace, so efficiency must be measured per release rather than assumed to improve over time.

See version history for how Venturi re-bands stars across catalog versions while preserving the raw Wh, and licenses for the attribution of every upstream source.

Where to go next