Observability & diagnostics
Venturi runs inside your environment, so you should never have to take its health on faith. Venturi self-reports — exposing its degradation state, freshness, coverage, and queue health where you read attribution, over the API, and in a form your own monitoring stack can ingest. This page is the operator's guide to the signals Venturi emits and how to act on them.
The signals that matter
| Signal | What it answers |
|---|---|
| Degradation state | Is attribution being produced normally, or on a fallback? |
| Connector freshness | When did each source last sync successfully? |
| Index freshness | How current are dashboard and API results? |
| Coverage | What fraction of AI spend is being attributed? |
| Queue depth | Is the processor keeping up with event volume? |
| Gateway latency | Is the decision-time path inside its budget? |
| Export status | Did governed exports complete and get signed? |
Degradation state
Every customer-visible attribution result carries a degradation state as part of its evidence, and the same signal is exposed over the API on each attribution record. This is how Venturi tells you, honestly, whether a number was produced normally or by a fail-open fallback.
| degradation_state | Meaning | What to do |
|---|---|---|
| none | Healthy: trained-model inference active, all components current | Nothing; results are at full basis |
| degraded_serving | The trained model is unavailable; attribution is produced on the deterministic heuristic baseline | Results remain usable; treat newly-fallback attributions as provisional and review their evidence |
| disabled | Inference is administratively disabled for the scope | Re-enable in settings if unexpected |
| model_recalled | A model version was withdrawn; affected results revert to the baseline basis | Await the replacement model; affected records are clearly marked |
# Read the degradation state on a specific attribution record
curl "https://<your-venturi-instance>/api/v1/attribution/attr_01HF8..." \
  -H "Authorization: Bearer $ARGMIN_TOKEN"
{
  "id": "attr_01HF8...",
  "cost_usd": 12.40,
  "coper": 0.91,
  "output_state": "strongly_inferred",
  "degradation_state": "none",
  "freshness": "2026-06-03T17:02:11Z"
}
A fallback never blocks your traffic and never hides itself
When the trained model is unavailable, attribution continues on the heuristic baseline and the result is marked degraded_serving — it is never silently presented as if the model produced it. The fail-open guarantee means this degradation is invisible to your AI traffic and fully visible to you. See Confidence & evidence.
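The degradation states above map naturally onto a small triage helper. What follows is a minimal sketch, assuming the record has already been fetched and parsed into a dictionary shaped like the example response. The function name `triage` and the returned strings are illustrative and are not part of the Venturi API.

```python
# Guidance per degradation_state, paraphrased from the table on this page.
ACTIONS = {
    "none": "no action; results are at full basis",
    "degraded_serving": "treat as provisional and review the evidence",
    "disabled": "re-enable inference in settings if unexpected",
    "model_recalled": "await the replacement model",
}


def triage(record: dict) -> str:
    """Return the suggested operator action for one attribution record."""
    state = record.get("degradation_state")
    # A missing or unrecognised state is surfaced rather than treated as healthy.
    return ACTIONS.get(state, f"unrecognised degradation_state: {state!r}")


# Hypothetical record; only the degradation_state field is read.
record = {"id": "attr_example", "degradation_state": "degraded_serving"}
print(triage(record))
```

Falling through to an explicit "unrecognised" message, instead of defaulting to healthy, keeps the helper from masking a state it does not know about.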
Freshness and coverage
Freshness
Each attribution result carries a freshness timestamp: the moment at which the underlying data was last known to be current. Two freshness objectives are instrumented, and you can hold the platform to them:
- Index freshness P99 ≤ 90 s — the lag from an invocation arriving to its appearing in the attribution index/API.
- Reconciliation freshness ≤ 24 h — the window within which attributed cost is reconciled against authoritative provider billing.
Connector freshness is a separate, per-source signal: when each connector last synced successfully. A connector that has stopped syncing is the most common root cause of a coverage drop, so connector freshness is the first thing to check when attribution looks low. A freshness lag is also surfaced as a degraded state in-product, so a stale number is never presented as current.
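A per-record staleness check against the 90 s index objective could look like the sketch below. It assumes the `freshness` field is an ISO 8601 UTC timestamp as in the example response; the function names are illustrative. Note that the objective is a P99, so one lagging record is a prompt to look further, not proof of a breach.

```python
from datetime import datetime, timezone

INDEX_FRESHNESS_OBJECTIVE_S = 90  # index freshness P99 objective from this page


def freshness_lag_seconds(freshness: str, now: datetime) -> float:
    """Seconds between a record's freshness timestamp and `now`."""
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    stamp = datetime.fromisoformat(freshness.replace("Z", "+00:00"))
    return (now - stamp).total_seconds()


def is_lagging(freshness: str, now: datetime) -> bool:
    """True when the record is older than the index freshness objective."""
    return freshness_lag_seconds(freshness, now) > INDEX_FRESHNESS_OBJECTIVE_S


# A fixed "now" keeps the example deterministic; use datetime.now(timezone.utc) in practice.
now = datetime(2026, 6, 3, 17, 4, 11, tzinfo=timezone.utc)
print(freshness_lag_seconds("2026-06-03T17:02:11Z", now))  # 120.0
print(is_lagging("2026-06-03T17:02:11Z", now))  # True
```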
Coverage
Coverage answers a different and equally important question: of the AI spend Venturi can see, how much is being attributed? Coverage is a first-class signal — a healthy serving plane with low coverage tells you a connector is missing or misconfigured, not that your costs dropped.
# Read coverage for the current period
curl "https://<your-venturi-instance>/api/v1/coverage" \
  -H "Authorization: Bearer $ARGMIN_TOKEN"
When coverage drops, Venturi tells you why — never with a blank screen. An empty or suppressed view explains its cause: no connector configured, data not yet fresh, or rows withheld by cohort privacy suppression (minimum cohort of 5). A coverage-drop notification carries the affected scope and a deep link to resolve it. See Reporting & exports for the coverage report and Ingestion for connector setup.
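The "check connector freshness first" rule can be sketched as a first-pass reading of a coverage figure. This page does not show the shape of the coverage response, so the sketch takes plain values instead of parsing one. The coverage fraction, the 0.90 floor, the one-hour staleness limit, and the connector names are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_connectors(last_sync: dict, now: datetime, max_age: timedelta) -> list:
    """Names of connectors whose last successful sync is older than max_age."""
    return sorted(name for name, ts in last_sync.items() if now - ts > max_age)


def coverage_hint(coverage: float, floor: float, stale: list) -> str:
    """Rough first-pass reading of a coverage figure (thresholds are assumptions)."""
    if coverage >= floor:
        return "coverage within expected range"
    if stale:
        return "low coverage; check connectors: " + ", ".join(stale)
    return "low coverage; look for a missing connector or uncaptured pathway"


now = datetime(2026, 6, 3, 18, 0, tzinfo=timezone.utc)
last_sync = {
    "provider_a": now - timedelta(minutes=5),  # hypothetical connector names
    "provider_b": now - timedelta(hours=7),
}
stale = stale_connectors(last_sync, now, max_age=timedelta(hours=1))
print(coverage_hint(0.62, floor=0.90, stale=stale))
```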
Throughput and latency
Two operational signals tell you the pipeline is keeping pace:
- Queue depth — whether the asynchronous processor is keeping up with event volume. Sustained growth indicates backpressure and usually precedes a freshness lag.
- Gateway latency — whether the decision-time path is inside its 50 ms P99 end-to-end budget. The trained-model adapter inside that path carries a 20 ms wall-clock fail-open timeout; a breach falls back to the baseline rather than slowing your traffic. See SLAs & SLOs.
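"Sustained growth" in queue depth can be approximated as a run of consecutive increases across recent samples. The sketch below assumes you already collect queue-depth samples at a regular interval; the function name and the five-sample window are illustrative choices, not Venturi defaults.

```python
def sustained_growth(depths: list, window: int = 5) -> bool:
    """True if each of the last `window` samples exceeds the one before it."""
    if len(depths) < window + 1:
        return False  # not enough history to judge
    recent = depths[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


print(sustained_growth([10, 12, 11, 14, 19, 25, 40, 58]))  # True
print(sustained_growth([10, 40, 12, 11, 30, 9, 14, 10]))   # False
```

Requiring an unbroken run is deliberately strict: a single spike does not trigger it, which suits a signal meant to anticipate a freshness lag, not to report momentary bursts.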
Wiring Venturi into your monitoring stack
Venturi is built to be observed by the tools you already run, not only through its own UI. Health and diagnostic endpoints are read-only and tenant-scoped.
Target-state — record-level tenant scoping
The diagnostic and health surfaces are read-only today. The record-level tenant-isolation boundary that backs the per-tenant scoping of these endpoints — the 403 TENANT_MISMATCH rejection — is a Target-state control, not yet general. See the binding label in Docs authority & product state.
Read degradation, freshness, coverage, and rate-limit headroom on a schedule and feed them into your existing dashboards and alerting.
Use webhooks to react to coverage drops, RAIL-degraded transitions, export-ready events, and credential expiry — push, not poll.
Critical events route to in-app, email, Slack, and Microsoft Teams with consistent rules, P95 ≤ 30 s delivery, and a missed delivery alarmed rather than dropped silently. Configure channels, severities, and quiet hours in Budgets & alerts.
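For the scheduled-read path, a collector only needs to issue authenticated GET requests. The sketch below builds such a request with the Python standard library. `ARGMIN_TOKEN` matches the curl examples on this page, while `VENTURI_BASE_URL`, the placeholder host, and the function name are assumptions.

```python
import os
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated, read-only GET request for a Venturi endpoint."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


# VENTURI_BASE_URL is a hypothetical variable name; the default host is a placeholder.
base_url = os.environ.get("VENTURI_BASE_URL", "https://venturi.example.internal")
token = os.environ.get("ARGMIN_TOKEN", "")
request = build_request(base_url, "/api/v1/coverage", token)
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=10)` from your scheduler and forward the parsed body to your metrics pipeline.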
Diagnosing common conditions
A diagnostic warning is not automatically a billing defect. The product labels which outputs are affected — chargeback, coverage, adoption, optimization, exports, or only freshness — so you can scope the impact precisely.
| You observe | Likely cause | Where to look |
|---|---|---|
| Attribution degradation_state is degraded_serving | Trained model temporarily unavailable; baseline in use | Status & incidents; awaits model recovery |
| Freshness timestamp lagging > 90 s | Index materialization behind objective, or growing queue depth | SLAs & SLOs |
| Coverage dropped for a scope | Connector down, credential expired, or new uncaptured pathway | Troubleshooting; Ingestion |
| A view is empty or partially suppressed | No data yet, or cohort suppression (k=5) | The view's own empty-state explanation |
| API calls returning 429 | Rate limit reached | Versioning & rate limits |
What you do not have to monitor
You do not have to monitor whether Venturi is blocking your AI traffic, because it structurally cannot. The decision-time path is fail-open with a hard 50 ms budget; a Venturi problem degrades your visibility into attribution, never your production inference.
Related pages
- SLAs & SLOs — the objectives behind these signals.
- Status & incidents — how degraded states are communicated and resolved.
- Confidence & evidence — the evidence card every result carries.
- Troubleshooting — step-by-step diagnostics.