Observability & diagnostics

Venturi runs inside your environment, so you should never have to take its health on faith. Venturi self-reports — exposing its degradation state, freshness, coverage, and queue health where you read attribution, over the API, and in a form your own monitoring stack can ingest. This page is the operator's guide to the signals Venturi emits and how to act on them.

The signals that matter

| Signal | What it answers |
| --- | --- |
| Degradation state | Is attribution being produced normally, or on a fallback? |
| Connector freshness | When did each source last sync successfully? |
| Index freshness | How current are dashboard and API results? |
| Coverage | What fraction of AI spend is being attributed? |
| Queue depth | Is the processor keeping up with event volume? |
| Gateway latency | Is the decision-time path inside its budget? |
| Export status | Did governed exports complete and get signed? |

Degradation state

Every customer-visible attribution result carries a degradation state as part of its evidence, and the same signal is exposed over the API on each attribution record. This is how Venturi tells you, honestly, whether a number was produced normally or by a fail-open fallback.

| degradation_state | Meaning | What to do |
| --- | --- | --- |
| none | Healthy: trained-model inference active, all components current | Nothing; results are at full basis |
| degraded_serving | The trained model is unavailable; attribution is produced on the deterministic heuristic baseline | Results remain usable; treat newly-fallback attributions as provisional and review their evidence |
| disabled | Inference is administratively disabled for the scope | Re-enable in settings if unexpected |
| model_recalled | A model version was withdrawn; affected results revert to the baseline basis | Await the replacement model; affected records are clearly marked |

# Read the degradation state on a specific attribution record
curl "https://<your-venturi-instance>/api/v1/attribution/attr_01HF8..." \
  -H "Authorization: Bearer $ARGMIN_TOKEN"

# Example response
{
  "id": "attr_01HF8...",
  "cost_usd": 12.40,
  "coper": 0.91,
  "output_state": "strongly_inferred",
  "degradation_state": "none",
  "freshness": "2026-06-03T17:02:11Z"
}
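
A monitoring job can turn the degradation_state field into a next step. The following is a minimal sketch, assuming the record shape shown in the example response; the state names come from the table above, while the function name and the action labels are illustrative and not part of the Venturi API.

```python
# Illustrative mapping from degradation_state to an operator action.
# State names are from the table above; the action labels are assumptions.
ACTIONS = {
    "none": "ok",
    "degraded_serving": "review_provisional",
    "disabled": "check_settings",
    "model_recalled": "await_replacement",
}


def triage(record: dict) -> str:
    """Return the suggested action for one attribution record."""
    state = record.get("degradation_state")
    # Unknown or missing states are surfaced rather than treated as healthy.
    return ACTIONS.get(state, "unknown_state")


record = {"id": "attr_example", "degradation_state": "degraded_serving"}
print(triage(record))  # review_provisional
```

Treating an unrecognized state as its own outcome, rather than as healthy, keeps the check honest if new states are introduced.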

A fallback never blocks your traffic and never hides itself

When the trained model is unavailable, attribution continues on the heuristic baseline and the result is marked degraded_serving — it is never silently presented as if the model produced it. The fail-open guarantee means this degradation is invisible to your AI traffic and fully visible to you. See Confidence & evidence.

Freshness and coverage

Freshness

Each attribution result carries a freshness timestamp: the moment as of which the underlying data was current. Two freshness objectives are instrumented, and you can hold the platform to them:

  • Index freshness P99 ≤ 90 s — the lag from an invocation arriving to its appearing in the attribution index/API.
  • Reconciliation freshness ≤ 24 h — the window within which attributed cost is reconciled against authoritative provider billing.

Connector freshness is a separate, per-source signal: when each connector last synced successfully. A connector that has stopped syncing is the most common root cause of a coverage drop, so connector freshness is the first thing to check when attribution looks low. A freshness lag is also surfaced as a degraded state in-product, so a stale number is never presented as current.
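
To check a record against the index freshness objective from your own tooling, compare its freshness timestamp with the current time. The sketch below assumes the ISO 8601 UTC format shown in the example response; the helper names are hypothetical. Because the objective is a P99, a single record over 90 s is a prompt to look closer, not proof of a breach.

```python
from datetime import datetime, timezone

INDEX_FRESHNESS_OBJECTIVE_S = 90  # the P99 objective stated above


def freshness_lag_seconds(freshness: str, now: datetime) -> float:
    """Seconds between a record's freshness timestamp and `now`."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    as_of = datetime.fromisoformat(freshness.replace("Z", "+00:00"))
    return (now - as_of).total_seconds()


def is_stale(freshness: str, now: datetime) -> bool:
    """True when the lag exceeds the index freshness objective."""
    return freshness_lag_seconds(freshness, now) > INDEX_FRESHNESS_OBJECTIVE_S


now = datetime(2026, 6, 3, 17, 4, 0, tzinfo=timezone.utc)
print(freshness_lag_seconds("2026-06-03T17:02:11Z", now))  # 109.0
print(is_stale("2026-06-03T17:02:11Z", now))  # True
```

In a live check, pass `datetime.now(timezone.utc)` as `now`.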

Coverage

Coverage answers a different and equally important question: of the AI spend Venturi can see, how much is being attributed? Coverage is a first-class signal — a healthy serving plane with low coverage tells you a connector is missing or misconfigured, not that your costs dropped.

# Read coverage for the current period
curl "https://<your-venturi-instance>/api/v1/coverage" \
  -H "Authorization: Bearer $ARGMIN_TOKEN"

When coverage drops, Venturi tells you why — never with a blank screen. An empty or suppressed view explains its cause: no connector configured, data not yet fresh, or rows withheld by cohort privacy suppression (minimum cohort of 5). A coverage-drop notification carries the affected scope and a deep link to resolve it. See Reporting & exports for the coverage report and Ingestion for connector setup.
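
This page does not document the response shape of the coverage endpoint, so the sketch below works on two plain numbers instead: attributed spend and visible spend. The parameter names and the 0.9 floor are assumptions chosen for illustration. It shows one way to keep "no visible spend yet" distinct from "0% coverage" in your own alerting.

```python
from typing import Optional


def coverage_ratio(attributed_usd: float, visible_usd: float) -> Optional[float]:
    """Fraction of visible AI spend that is attributed, or None if no spend is visible."""
    if visible_usd <= 0:
        return None  # nothing to attribute yet; not the same as 0% coverage
    return attributed_usd / visible_usd


def coverage_dropped(ratio: Optional[float], floor: float = 0.9) -> bool:
    """True when coverage is known and below the chosen floor (0.9 is an arbitrary example)."""
    return ratio is not None and ratio < floor


ratio = coverage_ratio(attributed_usd=720.0, visible_usd=1000.0)
print(ratio, coverage_dropped(ratio))  # 0.72 True
```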

Throughput and latency

Two operational signals tell you the pipeline is keeping pace:

  • Queue depth — whether the asynchronous processor is keeping up with event volume. Sustained growth indicates backpressure and usually precedes a freshness lag.
  • Gateway latency — whether the decision-time path is inside its 50 ms P99 end-to-end budget. The trained-model adapter inside that path carries a 20 ms wall-clock fail-open timeout; a breach falls back to the baseline rather than slowing your traffic. See SLAs & SLOs.
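
"Sustained growth" in queue depth can be detected with a simple rule over recent samples. The following is a sketch, assuming you collect queue depth readings at a fixed interval; how the readings are obtained, the function name, and the window of five samples are all illustrative choices.

```python
from typing import Sequence


def sustained_growth(depths: Sequence[int], min_samples: int = 5) -> bool:
    """True if the last `min_samples` readings are strictly increasing."""
    if len(depths) < min_samples:
        return False  # not enough history to call it sustained
    window = depths[-min_samples:]
    return all(later > earlier for earlier, later in zip(window, window[1:]))


print(sustained_growth([120, 90, 140, 180, 260, 410, 700]))  # True
print(sustained_growth([120, 300, 280, 310, 290, 305, 300]))  # False
```

A strictly increasing window is deliberately conservative: a queue that spikes and drains does not trigger it, while one that keeps climbing does.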

Wiring Venturi into your monitoring stack

Venturi is built to be observed by the tools you already run, not only through its own UI. Health and diagnostic endpoints are read-only and tenant-scoped.

Target-state — record-level tenant scoping

The diagnostic and health surfaces are read-only today. The record-level tenant-isolation boundary that backs the per-tenant scoping of these endpoints — the 403 TENANT_MISMATCH rejection — is a Target-state control, not yet general. See the binding label in Docs authority & product state.

Read degradation, freshness, coverage, and rate-limit headroom on a schedule and feed them into your existing dashboards and alerting.

# Current rate-limit headroom for an API client
curl "https://<your-venturi-instance>/api/v1/rate-limit" \
  -H "Authorization: Bearer $ARGMIN_TOKEN"
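
A scheduled poller can be sketched with the Python standard library alone. The paths and the bearer-token header are taken from the curl examples on this page; the function names are hypothetical, and the assumption that each endpoint returns a JSON body is not confirmed here.

```python
import json
import urllib.request

# Paths taken from the curl examples on this page.
ENDPOINTS = ("/api/v1/coverage", "/api/v1/rate-limit")


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one diagnostic endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def poll_once(base_url: str, token: str, timeout_s: float = 10.0) -> dict:
    """Fetch each endpoint once and return the decoded bodies keyed by path."""
    results = {}
    for path in ENDPOINTS:
        request = build_request(base_url, path, token)
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            results[path] = json.load(resp)  # assumes a JSON response body
    return results
```

Call `poll_once` from cron or your scheduler of choice, reading the token from the environment (for example `os.environ["ARGMIN_TOKEN"]`), and forward the result to your metrics pipeline.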

Use webhooks to react to coverage drops, RAIL-degraded transitions, export-ready events, and credential expiry — push, not poll.

Critical events route to in-app, email, Slack, and Microsoft Teams under consistent rules, with P95 ≤ 30 s delivery; a missed delivery raises an alarm rather than being dropped silently. Configure channels, severities, and quiet hours in Budgets & alerts.

Diagnosing common conditions

A diagnostic warning is not automatically a billing defect. The product labels which outputs are affected — chargeback, coverage, adoption, optimization, exports, or only freshness — so you can scope the impact precisely.

| You observe | Likely cause | Where to look |
| --- | --- | --- |
| Attribution degradation_state is degraded_serving | Trained model temporarily unavailable; baseline in use | Status & incidents; awaits model recovery |
| Freshness timestamp lagging > 90 s | Index materialization behind objective, or growing queue depth | SLAs & SLOs |
| Coverage dropped for a scope | Connector down, credential expired, or new uncaptured pathway | Troubleshooting; Ingestion |
| A view is empty or partially suppressed | No data yet, or cohort suppression (k=5) | The view's own empty-state explanation |
| API calls returning 429 | Rate limit reached | Versioning & rate limits |
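
For the 429 case, a polling client should back off rather than retry immediately. The sketch below computes a wait time using capped exponential backoff with jitter, and prefers a `Retry-After` value when one is present. `Retry-After` is a standard HTTP header; whether Venturi sends it is an assumption, and the function name and defaults are illustrative.

```python
import random
from typing import Optional


def retry_delay_s(attempt: int, retry_after: Optional[str] = None,
                  base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Prefer the server's hint when it is a plain number of seconds.
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap_s)
    # Otherwise use capped exponential backoff with full jitter.
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))


print(retry_delay_s(0, retry_after="7"))  # 7.0
```

The jitter spreads retries from multiple clients so they do not hit the limit again in lockstep.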

What you do not have to monitor

You do not have to monitor whether Venturi is blocking your AI traffic, because it structurally cannot. The decision-time path is fail-open with a hard 50 ms budget; a Venturi problem degrades your visibility into attribution, never your production inference.