Operations & reliability
Venturi is the enterprise system of record for AI consumption, and it is operated to the standard that role implies: published, instrumented service-level objectives; transparent incident communication; deep self-diagnostics; tiered support; and a tested disaster-recovery posture.
This section is written for the people who keep Venturi running well inside your environment — your platform, SRE, and FinOps-operations teams — and for the reviewers who need to see Venturi's reliability commitments before they sign. Every number on these pages is grounded in Venturi's binding specifications and is traceable to an instrumented signal. Venturi never publishes a commitment it does not measure.
The operational posture in one page
- Your AI traffic cannot be taken down by Venturi. The decision-time gateway is fail-open with a hard 50 ms P99 end-to-end budget. Under any failure — including a full Venturi outage — your AI request is forwarded unmodified. Fail-open is absolute and not configurable.
- The serving plane targets 99.9% availability. This objective governs Venturi's own in-VPC query and dashboard surface. It does not govern your AI traffic, which stays effectively 100% available because the gateway fails open.
- Durability is expressed as a single customer-facing pair: RPO ≤ 15 minutes and RTO ≤ 1 hour, applied identically to every store. Event sourcing over the durable invocation log underwrites the recovery point.
- Venturi self-reports its own health. A live degradation state, freshness signals, and per-component status are exposed in-product and over the API, so you never have to guess whether attribution is current.
- Support is tiered, audited, and least-privilege. Venturi support has no standing access to your tenant. Any support investigation runs under a time-boxed, customer-approved, fully audited break-glass grant that fails closed on tenant isolation and residency.
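To make the 99.9% serving-plane target concrete, the arithmetic below converts an availability target into a downtime allowance. This is a minimal sketch: the 30-day measurement window and the function name `allowed_downtime_minutes` are assumptions for illustration, not values taken from Venturi's specifications. The SLAs & SLOs page is the authority on how the objective is actually measured.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits.

    `target` is a fraction (0.999 for 99.9%). `window_days` is an assumed
    measurement window; the source text does not state one.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be a fraction in (0, 1]")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_minutes = window_days * 24 * 60
    return (1.0 - target) * total_minutes


if __name__ == "__main__":
    # Under the assumed 30-day window, 99.9% leaves roughly 43 minutes.
    print(round(allowed_downtime_minutes(0.999), 1))
```

This budget applies only to the query and dashboard surface. Because the gateway fails open, serving-plane downtime does not consume availability on your AI request path.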
What's in this section
| Page | What it answers |
|---|---|
| SLAs & SLOs | The published availability, latency, freshness, and durability objectives, what each one governs, and the signals behind them. |
| Status & incidents | How Venturi classifies, communicates, and closes out incidents, the severity model, and the in-product degradation surface. |
| Observability & diagnostics | The health, freshness, and coverage signals you can read yourself — in-product, over the API, and through your own monitoring stack. |
| Support | The support tiers, response targets, escalation paths, and how Venturi support access is governed. |
| Disaster recovery | Backup, restore, the RPO/RTO pair, the recovery design, and how recovery is rehearsed and evidenced. |
The reliability invariants
A small set of frozen invariants shapes every operational control here. They hold on every deployment and cannot be configured away.
Fail-open is absolute on your AI traffic
No Venturi code path may block or degrade a live customer AI request. The synchronous gateway works to a 50 ms P99 end-to-end budget, and the trained-model inference call inside it (the RAIL adapter) carries a 20 ms wall-clock fail-open timeout as its share of that budget. On any breach or failure, your traffic is forwarded unmodified and the event is recorded with conservative attribution. See SLAs & SLOs.
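The fail-open pattern described above can be sketched as a bounded call whose every failure mode resolves to "forward the request untouched". This is an illustration only, not Venturi's implementation: `handle`, `Outcome`, and the `"conservative"` label are hypothetical names, and only the 20 ms timeout comes from the text.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from typing import Callable

ADAPTER_TIMEOUT_S = 0.020  # 20 ms wall-clock timeout, as stated in the text

# Shared pool so a slow adapter call never blocks the caller on shutdown.
_pool = cf.ThreadPoolExecutor(max_workers=4)


@dataclass(frozen=True)
class Outcome:
    forwarded_request: dict  # always the caller's original request object
    attribution: str         # adapter result, or a conservative fallback
    fail_open: bool          # True when the adapter timed out or failed


def handle(request: dict,
           adapter: Callable[[dict], str],
           timeout_s: float = ADAPTER_TIMEOUT_S) -> Outcome:
    """Run a hypothetical attribution adapter under a hard timeout.

    Any timeout or exception is swallowed: the request is returned
    unmodified and the event is marked with a fallback attribution.
    """
    future = _pool.submit(adapter, request)
    try:
        label = future.result(timeout=timeout_s)
        return Outcome(request, label, fail_open=False)
    except Exception:
        # Covers both the timeout and any error raised by the adapter.
        future.cancel()  # best effort; a running call cannot be interrupted
        return Outcome(request, "conservative", fail_open=True)
```

The key design choice is that the request object is never handed back in modified form on any path, so the failure handling has nothing to undo. The adapter result only affects what is recorded about the request.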
Access-control surfaces fail closed
Authentication, authorization, tenant isolation, export, billing mutation, residency routing, and support break-glass fail closed — the gateway's fail-open allowance never extends to an access decision. A cross-tenant request is rejected regardless of role. See Support.
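By contrast, a fail-closed access decision treats every uncertain outcome as a denial. The sketch below is a hypothetical illustration of that rule, not Venturi's authorization code: `AccessRequest`, `is_allowed`, and the `policy` callback are assumed names. It shows the two properties stated above: a cross-tenant request is rejected before the role is even consulted, and an error in the policy lookup results in a denial.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessRequest:
    principal_tenant: str  # tenant the caller belongs to
    resource_tenant: str   # tenant that owns the requested resource
    role: str
    action: str


def is_allowed(req: AccessRequest,
               policy: Callable[[str, str], bool]) -> bool:
    """Return True only when access is positively confirmed."""
    # Tenant isolation is checked first, so no role can override it.
    if req.principal_tenant != req.resource_tenant:
        return False
    try:
        # Only an explicit True grants access; anything else is a denial.
        return policy(req.role, req.action) is True
    except Exception:
        # Fail closed: an unavailable or broken policy check denies access.
        return False
```

Note the asymmetry with the gateway path: there, an exception lets the request through; here, an exception keeps it out.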
Every published number is instrumented
Each availability, latency, freshness, and durability objective on these pages maps to a named service-level indicator. Venturi does not advertise a target it cannot measure, and forward commitments scoped to a future operational milestone are labeled as such, never implied as already in force.
Where Venturi runs
These commitments apply to the serving plane — the in-VPC query and dashboard API that runs inside your own cloud account, alongside the attribution engine and its stores. The Venturi-operated control plane (pricing, release, and aggregate telemetry) is outbound-only: it initiates nothing inbound to your environment and carries no customer-facing availability objective. See Security architecture for the full trust boundary.
Where to start
- Evaluating Venturi's reliability for procurement? Start with SLAs & SLOs, then Disaster recovery.
- Operating a live deployment? Wire your monitoring to the signals in Observability & diagnostics.
- Need help with a live issue? See Support and the troubleshooting reference.