Skip to content

Energy catalog version history

Venturi distributes energy coefficients, carbon factors, and water-intensity reference data as a versioned, signed catalog. This page explains how the catalog changes over time and — importantly — why a model's star rating can change without its measured energy changing.

Recalibrate, don't re-benchmark

There are two very different kinds of catalog change:

  • Re-benchmark — a model is measured again and its raw energy in watt-hours changes. This happens only when there is new measurement data (a fresh AI Energy Score run, new hardware, a model revision).
  • Recalibrate — the raw watt-hours are preserved, but the 1-to-5 star bands are recomputed because the population shifted (new models entered the catalog and moved the 20% quintile boundaries).

Venturi's discipline is recalibrate, not re-benchmark. Recalibration never mutates a stored watt-hour value. It only re-derives the relative star bands.

This is a direct consequence of storing watt-hours as the primitive and deriving stars at render time (see energy methodology). The raw energy is the durable fact; the star is a relative presentation over the current population.

Stars are version-relative

Because recalibration re-bands the stars, a star rating is only meaningful within its own catalog version. A model that was four-star in one version and three-star in the next has not necessarily gotten worse — the population around it may simply have improved, moving the quintile boundaries. Venturi blocks cross-version star comparisons for this reason. Compare raw watt-hours across versions, never stars.

What every catalog version records

Each published catalog version carries:

  • A catalog version identifier and a generated-at date.
  • An Ed25519 signature so the catalog is tamper-evident and version-pinned at distribution. The same signed catalog backs both the customer-facing numbers and the public AI Energy Index, so the public number and your number are byte-for-byte the same.
  • An issue date on every star, so an exported or stored rating always carries the version it was banded under.
  • Per-row derivation_method, confidence, and provenance, unchanged across a recalibration.

Freshness

The catalog has a freshness service-level objective tied to its refresh cadence. If the distributed catalog is older than that objective, the staleness is surfaced on the customer status surface rather than hidden — an old catalog is a visible state, not a silent one. New model coverage is added on a regular polling cadence; star bands are recalibrated on a longer (biannual) cadence, at which point any model whose raw watt-hours moved is flagged, because that indicates an actual measurement change rather than a re-banding.

Reading a version diff

When you compare two catalog versions:

  • Raw Wh changed → a genuine measurement update (re-benchmark). This is a real change in the model's energy.
  • Raw Wh unchanged, stars changed → a recalibration. The model's energy is the same; the population around it shifted.
  • New rows → newly covered models (or newly upgraded from an estimated tier to a measured one).

Where to go next