Observability & diagnostics

Venturi runs inside your environment, so you should never have to take its health on faith. Venturi self-reports — exposing its degradation state, freshness, coverage, and queue health where you read attribution, over the API, and in a form your own monitoring stack can ingest. This page is the operator's guide to the signals Venturi emits and how to act on them.

The signals that matter

Signal	What it answers
Degradation state	Is attribution being produced normally, or on a fallback?
Connector freshness	When did each source last sync successfully?
Index freshness	How current are dashboard and API results?
Coverage	What fraction of AI spend is being attributed?
Queue depth	Is the processor keeping up with event volume?
Gateway latency	Is the decision-time path inside its budget?
Export status	Did governed exports complete and get signed?

Degradation state

Every customer-visible attribution result carries a degradation state as part of its evidence, and the same signal is exposed over the API on each attribution record. This is how Venturi tells you, honestly, whether a number was produced normally or by a fail-open fallback.

`degradation_state`	Meaning	What to do
`none`	Healthy — trained-model inference active, all components current	Nothing; results are at full basis
`degraded_serving`	The trained model is unavailable; attribution is produced on the deterministic heuristic baseline	Results remain usable; treat attributions produced on the fallback as provisional and review their evidence
`disabled`	Inference is administratively disabled for the scope	Re-enable in settings if unexpected
`model_recalled`	A model version was withdrawn; affected results revert to the baseline basis	Await the replacement model; affected records are clearly marked

# Read the degradation state on a specific attribution record
curl https://<your-venturi-instance>/api/v1/attribution/attr_01HF8... \
  -H "Authorization: Bearer $ARGMIN_TOKEN"

{
  "id": "attr_01HF8...",
  "cost_usd": 12.40,
  "coper": 0.91,
  "output_state": "strongly_inferred",
  "degradation_state": "none",
  "freshness": "2026-06-03T17:02:11Z"
}
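
As a sketch of how a monitoring job might act on this field, the snippet below maps each `degradation_state` value from the table above to a coarse severity. Only the field name and its four values come from this page; the helper `classify_record`, the severity labels, and the handling of unknown values are assumptions made for illustration and are not part of Venturi's API.

```python
import json

# Severity labels are illustrative; only the four state names come from the docs.
SEVERITY_BY_STATE = {
    "none": "ok",
    "degraded_serving": "warn",  # baseline in use; treat results as provisional
    "disabled": "warn",          # re-enable in settings if unexpected
    "model_recalled": "warn",    # await the replacement model
}


def classify_record(record: dict) -> str:
    """Return a coarse severity for one attribution record (hypothetical helper)."""
    state = record.get("degradation_state")
    # A missing or unrecognized state is surfaced, never treated as healthy.
    return SEVERITY_BY_STATE.get(state, "unknown")


# Hypothetical record, shaped like the response shown above.
sample = json.loads('{"id": "attr_example", "degradation_state": "degraded_serving"}')
print(classify_record(sample))  # prints "warn"
```

Returning `"unknown"` for unexpected values is a deliberate choice in this sketch: a new or misspelled state should raise a question, not silently pass as healthy.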

A fallback never blocks your traffic and never hides itself

When the trained model is unavailable, attribution continues on the heuristic baseline and the result is marked `degraded_serving`; it is never silently presented as if the model produced it. The fail-open guarantee means this degradation is invisible to your AI traffic and fully visible to you. See Confidence & evidence.

Freshness and coverage

Freshness

Attribution carries a freshness timestamp: the moment as of which the underlying data was current. Two freshness objectives are instrumented, and you can hold the platform to them:

Index freshness P99 ≤ 90 s — the lag from an invocation arriving to its appearing in the attribution index/API.
Reconciliation freshness ≤ 24 h — the window within which attributed cost is reconciled against authoritative provider billing.

Connector freshness is a separate, per-source signal: when each connector last synced successfully. A connector that has stopped syncing is the most common root cause of a coverage drop, so connector freshness is the first thing to check when attribution looks low. A freshness lag is also surfaced as a degraded state in-product, so a stale number is never presented as current.
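
A minimal sketch of a freshness check, assuming the `freshness` field is an ISO 8601 UTC timestamp as in the example record earlier on this page. The helper name `freshness_lag_seconds` and the fixed `now` value are illustrative. A single record lagging more than 90 s does not by itself mean the P99 objective is breached; it is only a prompt to look closer.

```python
from datetime import datetime, timezone

INDEX_FRESHNESS_OBJECTIVE_S = 90  # index freshness P99 objective stated above


def freshness_lag_seconds(freshness: str, now: datetime) -> float:
    """Seconds between a record's freshness timestamp and `now` (hypothetical helper)."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    as_of = datetime.fromisoformat(freshness.replace("Z", "+00:00"))
    return (now - as_of).total_seconds()


# A fixed "now" keeps the example deterministic; a real job would use
# datetime.now(timezone.utc).
now = datetime(2026, 6, 3, 17, 4, 0, tzinfo=timezone.utc)
lag = freshness_lag_seconds("2026-06-03T17:02:11Z", now)
print(lag, lag > INDEX_FRESHNESS_OBJECTIVE_S)  # prints "109.0 True"
```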

Coverage

Coverage answers a different and equally important question: of the AI spend Venturi can see, how much is being attributed? Coverage is a first-class signal — a healthy serving plane with low coverage tells you a connector is missing or misconfigured, not that your costs dropped.

# Read coverage for the current period
curl "https://<your-venturi-instance>/api/v1/coverage" \
  -H "Authorization: Bearer $ARGMIN_TOKEN"

When coverage drops, Venturi tells you why — never with a blank screen. An empty or suppressed view explains its cause: no connector configured, data not yet fresh, or rows withheld by cohort privacy suppression (minimum cohort of 5). A coverage-drop notification carries the affected scope and a deep link to resolve it. See Reporting & exports for the coverage report and Ingestion for connector setup.
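
The response shape of the coverage endpoint is not shown on this page, so the following is only a sketch of the arithmetic an alerting rule might apply once you have the two underlying amounts. The parameter names `attributed_usd` and `visible_usd`, both helper functions, and the 10-point drop threshold are assumptions for illustration.

```python
from typing import Optional


def coverage_fraction(attributed_usd: float, visible_usd: float) -> Optional[float]:
    """Fraction of visible AI spend that is attributed, or None if nothing is visible."""
    if visible_usd <= 0:
        # No visible spend: coverage is undefined, not zero.
        return None
    return attributed_usd / visible_usd


def coverage_dropped(previous: Optional[float], current: Optional[float],
                     max_drop: float = 0.10) -> bool:
    """True when coverage fell by more than `max_drop` between two periods."""
    if previous is None or current is None:
        return False
    return (previous - current) > max_drop


print(coverage_fraction(80.0, 100.0))  # prints 0.8
print(coverage_dropped(0.9, 0.7))      # prints True
```

Treating "no visible spend" as undefined mirrors the point above: an empty view should explain itself and not be read as a coverage of zero.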

Throughput and latency

Two operational signals tell you the pipeline is keeping pace:

Queue depth — whether the asynchronous processor is keeping up with event volume. Sustained growth indicates backpressure and usually precedes a freshness lag.
Gateway latency — whether the decision-time path is inside its 50 ms P99 end-to-end budget. The trained-model adapter inside that path carries a 20 ms wall-clock fail-open timeout; a breach falls back to the baseline rather than slowing your traffic. See SLAs & SLOs.
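
The two checks above can be sketched as follows, assuming you already collect queue-depth samples and gateway latency samples in your own monitoring stack. Both helpers, the five-sample window, and the nearest-rank percentile method are illustrative choices, not how Venturi itself computes these signals.

```python
GATEWAY_P99_BUDGET_MS = 50  # end-to-end P99 budget stated above


def sustained_growth(depths: list, window: int = 5) -> bool:
    """True if the last `window` queue-depth samples are strictly increasing."""
    if len(depths) < window:
        return False
    recent = depths[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def p99(samples: list) -> float:
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("p99 requires at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.99 * n), which avoids floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


print(sustained_growth([10, 12, 15, 19, 30]))          # prints True
print(p99(list(range(1, 101))) <= GATEWAY_P99_BUDGET_MS)  # prints False
```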

Wiring Venturi into your monitoring stack

Venturi is built to be observed by the tools you already run, not only through its own UI. Health and diagnostic endpoints are read-only and tenant-scoped.

Target-state — record-level tenant scoping

The diagnostic and health surfaces are read-only today. The record-level tenant-isolation boundary that backs the per-tenant scoping of these endpoints — the 403 TENANT_MISMATCH rejection — is a Target-state control, not yet general. See the binding label in Docs authority & product state.

Pull signals over the API

Read degradation, freshness, coverage, and rate-limit headroom on a schedule and feed them into your existing dashboards and alerting.

# Current rate-limit headroom for an API client
curl https://<your-venturi-instance>/api/v1/rate-limit \
  -H "Authorization: Bearer $ARGMIN_TOKEN"

Subscribe to events

Use webhooks to react to coverage drops, RAIL-degraded transitions, export-ready events, and credential expiry (push, not poll).

Route alerts to your channels

Critical events route to in-app, email, Slack, and Microsoft Teams with consistent rules, P95 ≤ 30 s delivery, and a missed delivery alarmed rather than dropped silently. Configure channels, severities, and quiet hours in Budgets & alerts.
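
For the pull approach, a scheduled job needs little more than an authenticated GET against the endpoints shown on this page. The sketch below builds such a request with the Python standard library. The host name, the token value, and the helper `build_request` are placeholders for illustration, and the example stops short of sending the request.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a Venturi diagnostic endpoint."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder host and token; substitute your instance URL and a real credential.
req = build_request("https://venturi.example.internal", "/api/v1/coverage", "example-token")
print(req.full_url)  # prints "https://venturi.example.internal/api/v1/coverage"

# To actually send it from a scheduled job:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       body = resp.read()
```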

Diagnosing common conditions

A diagnostic warning is not automatically a billing defect. The product labels which outputs are affected — chargeback, coverage, adoption, optimization, exports, or only freshness — so you can scope the impact precisely.

You observe	Likely cause	Where to look
Attribution `degradation_state` is `degraded_serving`	Trained model temporarily unavailable; baseline in use	Status & incidents; awaits model recovery
Freshness timestamp lagging > 90 s	Index materialization behind objective, or growing queue depth	SLAs & SLOs
Coverage dropped for a scope	Connector down, credential expired, or new uncaptured pathway	Troubleshooting; Ingestion
A view is empty or partially suppressed	No data yet, or cohort suppression (k=5)	The view's own empty-state explanation
API calls returning `429`	Rate limit reached	Versioning & rate limits
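
For the `429` row, a polling client typically backs off before retrying. The sketch below computes a capped exponential delay. Whether Venturi returns a `Retry-After` header is not stated on this page, so honoring one here is an assumption; the helper name, base delay, and cap are likewise illustrative. See Versioning & rate limits for the actual limits.

```python
from typing import Optional


def retry_delay_seconds(attempt: int, retry_after: Optional[str] = None,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based) after a 429 response."""
    # Assumption: if the server sent a numeric Retry-After value, prefer it.
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    # Otherwise double the delay on each attempt, up to the cap.
    return min(base * (2 ** attempt), cap)


print(retry_delay_seconds(0))       # prints 1.0
print(retry_delay_seconds(3))       # prints 8.0
print(retry_delay_seconds(0, "5"))  # prints 5.0
```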

What you do not have to monitor

You do not have to monitor whether Venturi is blocking your AI traffic, because it structurally cannot. The decision-time path is fail-open with a hard 50 ms budget; a Venturi problem degrades your visibility into attribution, never your production inference.

SLAs & SLOs — the objectives behind these signals.
Status & incidents — how degraded states are communicated and resolved.
Confidence & evidence — the evidence card every result carries.
Troubleshooting — step-by-step diagnostics.