
System architecture

This is the complete architecture of Venturi for customer architects: where the system runs, how it is partitioned into planes, how attribution is built, where your data lives, and — most importantly — where it does not. It is the reference to hand to whoever reviews Venturi before it goes into your environment.

If you only need to connect and start seeing attribution, start with the Quickstart. If you want the security-reviewer summary, read Trust & security. This page is the architecture underneath both.

The one-sentence model

Venturi is the attribution layer for enterprise AI: the enterprise system of record for AI consumption. It deploys as a dedicated data plane inside your cloud trust boundary, reads cost, usage, and identity signals, and links every AI invocation to the team, service, identity, and budget responsible. It never blocks your AI traffic and never stores prompt or completion content.

Deployment model: inside your VPC

Venturi runs as a dedicated data plane inside your own cloud trust boundary (your VPC / project / subscription). Your operational data — invocation events, the attribution graph, the materialized index — stays in your environment. The Venturi control plane is outbound-only: it manages releases, configuration, and licensing, and it never holds your tenant data.

graph LR
  subgraph YOUR["Your cloud trust boundary (VPC)"]
    GW[Gateway / interceptor plane]
    PR[Attribution processor plane]
    DB[Dashboard / API plane]
    IDX[(Materialized index)]
    GRAPH[(Attribution graph)]
    STREAM[(Event stream)]
    GW --> STREAM --> PR --> GRAPH --> IDX --> DB
  end
  subgraph VENTURI["Venturi control plane (outbound-only)"]
    REL[Release & config delivery]
  end
  REL -. signed artifacts, config .-> PR
  USERS[Your engineers / finance] --> DB

Three product invariants follow directly from this shape:

  1. Your operational data stays in your trust boundary. Venturi's own dev, staging, and production environments exist for development, certification, and release rehearsal — not for holding your data.
  2. Integrations are read-only. Venturi reads cost, usage, and identity. It never writes back into your environment. This is enforced, not promised — see Trust & security.
  3. The decision-time interceptor fails open. It never depends on an external control plane to let your traffic through.

Outbound-only control plane

Venturi runs inside your boundary, and the data plane reaches out to the control plane for signed release artifacts and configuration. There is no inbound path from Venturi's control plane into your data. Server-initiated outbound calls to customer-named endpoints (billing hosts, connectors, webhooks) pass through a single hardened fetch wrapper that denies private, link-local, and instance-metadata IP addresses and protects against DNS rebinding, so a connector can never be coerced into reaching your instance-metadata service or internal hosts.
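The address-denial half of such a wrapper can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not Venturi's implementation: the function name `is_denied_address` and the `METADATA_ADDRESSES` set are assumptions made for the example. The usual guard against DNS rebinding is to resolve the hostname once, run this check on the resolved address, and then connect to that same address rather than resolving again.

```python
import ipaddress

# Hypothetical constant: well-known instance-metadata addresses (assumption for this sketch).
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("fd00:ec2::254"),
}


def is_denied_address(resolved_ip: str) -> bool:
    """Return True if an outbound fetch to this already-resolved IP should be refused."""
    try:
        addr = ipaddress.ip_address(resolved_ip)
    except ValueError:
        # Anything that does not parse as an IP address is denied rather than guessed at.
        return True
    return (
        addr in METADATA_ADDRESSES
        or addr.is_private
        or addr.is_link_local
        or addr.is_loopback
        or addr.is_unspecified
        or addr.is_multicast
        or addr.is_reserved
    )
```

A caller would run the check on the address it is about to connect to, for example `is_denied_address("10.0.0.5")`, and abort the fetch when it returns `True`.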

The three planes

Venturi partitions cleanly into three planes with different latency profiles, different failure semantics, and different responsibilities. Keeping them separate is what lets the synchronous path stay fast and fail open while the analytics path stays correct and durable.

| Plane | Job | Path | Failure mode |
| --- | --- | --- | --- |
| Gateway / interceptor | Observe live AI invocations at decision time | Synchronous hot path | Fails open: always forwards your traffic |
| Attribution processor | Build the attribution graph and confidence | Asynchronous, off the hot path | Reconciles later; never on the request path |
| Dashboard / API | Serve attribution to people and systems | Synchronous read | Fails closed: denies on auth/tenant error |

Gateway / interceptor plane (synchronous, fail-open)

The gateway plane optionally sits in the decision-time path of an AI request. Its only job there is to observe — record that an invocation happened and what identity/service made it — and forward the request unmodified. It performs a fast index lookup, not inline model inference, and emits an event onto the stream for the processor plane to attribute asynchronously.

This plane runs on a hard 50 ms P99 end-to-end latency budget, enforced with a wall-clock timeout. Internal calls behind the gateway (index lookup, policy evaluation) fit inside a 15 ms internal budget. Its only runtime dependencies are the fast in-memory index and the event stream — no graph database call ever enters the hot path.

Fail-open is absolute on this plane

No code path on the gateway hot path may block your AI traffic. If anything is slow, degraded, or down, the request is forwarded unmodified and attribution is reconciled later from the event. This is enforced by the wall-clock timeout on the hot path, not by best-effort application logic. See Fail-open vs fail-closed.
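The fail-open rule can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the gateway's code: `observe_and_forward`, `index_lookup`, and `forward` are hypothetical names, and a thread pool with a result timeout stands in for whatever timeout mechanism the gateway actually uses. The default budget reuses the 15 ms internal figure given above.

```python
from concurrent.futures import ThreadPoolExecutor

LOOKUP_BUDGET_SECONDS = 0.015  # the 15 ms internal budget named in the text

# Shared worker pool so a slow lookup never runs on the forwarding thread.
_pool = ThreadPoolExecutor(max_workers=4)


def observe_and_forward(request, index_lookup, forward, budget=LOOKUP_BUDGET_SECONDS):
    """Try to observe within the budget, then forward the request unmodified regardless.

    Returns (response, observation). The observation is None when the lookup
    timed out or failed; attribution would then be reconciled later.
    """
    observation = None
    try:
        future = _pool.submit(index_lookup, request)
        observation = future.result(timeout=budget)
    except Exception:
        # Timeout or lookup error: never block or alter the request.
        observation = None
    return forward(request), observation
```

The key design point is that the call to `forward(request)` sits outside the `try` block, so no failure in the observation step can prevent it from running.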

The gateway is not the only way Venturi sees traffic. Venturi classifies every AI pathway into a 20-class taxonomy (PathwayCategory) — direct API, AI gateway, model router, AWS Bedrock, Azure OpenAI, Vertex AI, orchestration frameworks, agentic AI, SaaS-embedded (per-seat and consumption), developer tools, self-hosted, batch API, RPA workflows, and more — and captures each through whichever of six instrumentation layers is feasible:

| Layer | Source | Example |
| --- | --- | --- |
| L1 | Network proxy / gateway | Live interceptor, AI gateway |
| L2 | SDK / framework | Orchestration-framework instrumentation |
| L3 | Billing / control-plane | CUR, BigQuery billing export, Cost Management |
| L4 | Observability pipelines | OpenTelemetry, metrics |
| L5 | Source-of-record systems | HRIS, IdP, repository ownership |
| L6 | Vendor admin APIs | Provider usage/cost and audit-log APIs |

Each event carries a capture-feasibility class so the system is honest about what is fully capturable versus indirectly capturable versus uncapturable for a given pathway — it never silently treats missing visibility as zero.

Attribution processor plane (asynchronous)

The processor plane consumes the event stream and does the real work of attribution: running the HRE pipeline, building the attribution graph, computing calibrated confidence, and materializing results into the index that the dashboard reads. This is where 95%+ of attribution volume is handled, off the synchronous path, on a generous 100 ms inference design budget measured at the processor seam. Because it is asynchronous and event-sourced, it can take the time to be correct without ever touching your live traffic.

Dashboard / API plane (synchronous reads, fail-closed)

The dashboard and developer API serve attribution to your engineers, finance team, and systems. Reads are served from the materialized index at low latency. Every result carries its interpretation metadata (see Evidence on every result), so a number is never shown with more authority than its evidence supports. This plane is the fail-closed half of the system: authentication, authorization, tenant isolation, exports, and billing all deny on error.

The six-layer attribution graph

Venturi's defensibility is the attribution graph: it links every AI inference signal across six layers into one graph, with no manual tagging required.

graph TD
  R[Invocation] --> S[Service]
  S --> C[Code / Project]
  C --> I[Identity]
  I --> O[Organization]
  O --> B[Budget responsibility]

The product "six-layer" framing maps to an engineering canon of five node types plus a budget edge:

| Product layer | Graph node | Answers |
| --- | --- | --- |
| Invocation | Invocation | Which AI call happened? |
| Service | Service | Which service or workload made it? |
| Code / Project | Project | Which codebase / project owns that service? |
| Identity | Identity | Which person or service account is responsible? |
| Organization | Organization | Which team / org unit do they roll up to? |
| Budget | (edge on Organization) | Which cost center / budget is billed? |

Budget responsibility is an attribute of the organization expressed through billed_to / budgeted_under edges — not a separate node. The graph uses a frozen eight-edge wire taxonomy (owns, member_of, deployed_in, called_by, produced_by, billed_to, owned_by_org, budgeted_under) so that every relationship a result depends on is explicit and auditable.
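A closed edge vocabulary like this is straightforward to enforce in code. The sketch below is hypothetical: the `EdgeType`, `Edge`, and `make_edge` names and the node-identifier strings are assumptions for illustration, and only the eight edge names come from the text above. It shows how an enum can reject any relationship outside the frozen taxonomy.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(str, Enum):
    """The eight edge names listed in the text; nothing else is accepted."""
    OWNS = "owns"
    MEMBER_OF = "member_of"
    DEPLOYED_IN = "deployed_in"
    CALLED_BY = "called_by"
    PRODUCED_BY = "produced_by"
    BILLED_TO = "billed_to"
    OWNED_BY_ORG = "owned_by_org"
    BUDGETED_UNDER = "budgeted_under"


@dataclass(frozen=True)
class Edge:
    source: str  # hypothetical node identifier, e.g. "org:platform"
    target: str
    type: EdgeType


def make_edge(source: str, target: str, edge_type: str) -> Edge:
    """Build an edge, raising ValueError for any name outside the taxonomy."""
    return Edge(source, target, EdgeType(edge_type))
```

For example, `make_edge("org:platform", "budget:cc-42", "billed_to")` succeeds, while an unlisted name such as `"sponsors"` raises `ValueError`.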

The result answers the question finance and engineering both ask — which team, which service, which person, which budget is responsible for a given slice of AI spend — and it does so without you tagging anything by hand.

The HRE three-stage pipeline

The HRE (Heuristic Reconciliation Engine) is the runtime pipeline that turns raw invocation signals into attributed, confidence-scored records. It runs in three stages.

graph LR
  EV[InvocationEvent] --> A
  subgraph A["Stage A — deterministic"]
    A1[R1 direct key match]
    A2[R2 temporal proximity]
    A3[R3 naming correlation]
    A4[R4 historical patterns]
    A5[R5 service-account trace]
  end
  A -->|unresolved edges| B
  subgraph B["Stage B — RAIL inference"]
    RAIL[Trained RAIL model]
    HB[HeuristicBaseline fallback]
  end
  B --> C
  subgraph C["Stage C — allocation"]
    R6[R6 fractional allocation]
  end
  C --> AR[AttributionRecord]

Stage A — deterministic resolution (R1–R5)

Stage A resolves what can be known for certain, using five deterministic reconciliation methods:

| Method | Reconciliation signal |
| --- | --- |
| R1 | Direct key match (API key / service-account identifier) |
| R2 | Temporal proximity |
| R3 | Naming correlation |
| R4 | Historical patterns |
| R5 | Service-account trace |

Stage A is bit-reproducible: the same input always yields the same AttributionRecord fields. Anything Stage A resolves carries stage_origin = stage_a and the strongest confidence the evidence allows. This is the backbone of chargeback-grade attribution.
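To make the determinism concrete, here is a minimal sketch of what an R1-style direct key match could look like. Everything in it is an assumption for illustration: the `api_key_id` field, the `key_owner_index` mapping, and the shape of the returned record are hypothetical, and only the `stage_origin` and `output_state` values come from this page.

```python
from typing import Optional


def stage_a_r1(event: dict, key_owner_index: dict) -> Optional[dict]:
    """Resolve an owner by exact key lookup; return None when R1 cannot decide.

    The function is pure: the same event and index always yield the same result.
    """
    owner = key_owner_index.get(event.get("api_key_id"))
    if owner is None:
        return None  # leave the edge unresolved for later methods and stages
    return {
        "owner": owner,
        "method": "R1",
        "stage_origin": "stage_a",
        "output_state": "deterministically_resolved",
    }
```

Because the lookup has no randomness, clock reads, or hidden state, calling it twice with the same inputs produces identical records, which is the property the text calls bit-reproducible.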

Stage B — trained RAIL edge-existence inference

For edges Stage A cannot resolve deterministically, Stage B asks a narrower question — does this edge exist? — and answers it with the trained RAIL model (RailStageB). RAIL is the attribution-intelligence sidecar described in the next section. It emits a posterior per edge, which is calibrated and materialized into customer-facing confidence.

Stage B is wrapped in fail-open behavior at every level. If the RAIL model times out (its share of the latency budget is a hard 20 ms wall-clock), is missing its artifact, returns an invalid output, or is disabled, the pipeline falls back to the HeuristicBaseline — a permanent safety feature that keeps attribution flowing. RAIL never blocks production traffic and never makes a result fail; it only ever improves the attribution that the deterministic path already guarantees.
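The fallback rule can be sketched as a small wrapper. This is an illustration under stated assumptions, not the RailStageB code: `stage_b_posterior`, `rail_predict`, and `heuristic_baseline` are hypothetical callables, a missing artifact is modelled as `rail_predict` being `None`, and the 20 ms timeout is left out for brevity.

```python
import math
from typing import Callable, Optional, Tuple


def stage_b_posterior(
    features: dict,
    rail_predict: Optional[Callable[[dict], float]],
    heuristic_baseline: Callable[[dict], float],
    rail_enabled: bool = True,
) -> Tuple[float, str]:
    """Return (posterior, source), falling back whenever the model path is unusable."""
    if rail_enabled and rail_predict is not None:
        try:
            p = rail_predict(features)
            # Invalid output (wrong type, NaN, out of range) counts as a failure.
            if isinstance(p, float) and not math.isnan(p) and 0.0 <= p <= 1.0:
                return p, "rail"
        except Exception:
            pass  # any model error falls through to the baseline
    return heuristic_baseline(features), "heuristic_baseline"
```

Returning the source alongside the posterior lets downstream code record whether a result came from the trained model or the baseline.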

Stage C — fractional cost allocation

When an invocation genuinely cannot be pinned to a single owner, Stage C allocates its cost fractionally across the candidate owners using the R6 allocation prior. R6 carries explicit allocation semantics and a deliberate 0.50 confidence ceiling — it is never treated as a Stage B inference feature, and the system is transparent that an allocation is an allocation, not a resolution.
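A fractional split with a confidence ceiling might look like the following sketch. The 0.50 ceiling is from the text; the rest is assumed for illustration, including the function name, the use of normalized candidate weights as shares, and the choice to report each share's confidence as the share itself capped at the ceiling.

```python
R6_CONFIDENCE_CEILING = 0.50  # ceiling stated in the text


def allocate_fractionally(cost: float, weights: dict) -> list:
    """Split a cost across candidate owners in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("need at least one positive candidate weight")
    allocations = []
    for owner, weight in weights.items():
        share = weight / total
        allocations.append({
            "owner": owner,
            "allocated_cost": cost * share,
            # Assumption: confidence tracks the share but never exceeds the ceiling.
            "confidence": min(share, R6_CONFIDENCE_CEILING),
            "stage_origin": "stage_c",
        })
    return allocations
```

For instance, splitting a cost of 10.0 across weights `{"team-a": 3, "team-b": 1}` allocates 7.5 and 2.5, and the larger share still reports a confidence of only 0.50.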

The output of the pipeline is the AttributionRecord: the durable, auditable unit that carries the resolved edge, its output_state, its calibrated confidence (c_oper), the originating stage, and the evidence behind it.

Six output states

Every result lands in exactly one of six honest output states — Venturi never fabricates a default when it does not know:

| output_state | Meaning |
| --- | --- |
| deterministically_resolved | Resolved with certainty by Stage A |
| strongly_inferred | High-confidence Stage B inference |
| bounded | Narrowed to a set; not a single owner |
| ambiguous | Multiple plausible owners remain |
| unknown | Insufficient evidence to attribute |
| not_identifiable | Cannot be attributed even in principle |

RAIL: the attribution-intelligence sidecar

RAIL (Reconciliation Attribution Intelligence Layer) is the tenant-scoped intelligence service that serves Stage B. It runs inside your boundary as a sidecar to the processor plane. It loads a verified model artifact, performs feature extraction and edge-existence inference, emits confidence and evidence semantics, and fails open to HeuristicBaseline on timeout, missing artifact, invalid output, or a disabled flag.

RAIL is deliberately narrow. It is not a customer-facing app, a training UI, or an authorization boundary. It supplies evidence-backed predictions through one seam and owns nothing else: it does not own dashboards, billing, admin mutation, security controls, or exports.

What confidence means and how to act on it

Every attribution carries a single customer-facing confidence value, c_oper, on a 0–1 scale. Two numbers govern how you use it:

| Threshold | Value | What it means |
| --- | --- | --- |
| Confidence cap | 0.95 | The highest confidence Venturi ever asserts. This is a deliberate, conservative policy ceiling: Venturi never claims certainty on an inferred attribution. |
| Chargeback floor | 0.80 | An attribution must be at or above this to be eligible for chargeback and to count toward a savings-share billing base. |

So in practice: anything at 0.80 or above is solid enough to bill and charge back; below it, treat the attribution as advisory and investigate before acting. Venturi caps inferred confidence at 0.95 by policy — when you see a number that high, it means "as confident as we will ever assert," not "certain."
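The two thresholds translate into a pair of simple checks. The sketch below uses hypothetical function names; only the 0.95 cap and the 0.80 floor come from this page.

```python
CONFIDENCE_CAP = 0.95    # highest confidence ever asserted (from the table above)
CHARGEBACK_FLOOR = 0.80  # minimum confidence for chargeback eligibility


def customer_facing_confidence(raw: float) -> float:
    """Clamp a raw confidence into the range [0, CONFIDENCE_CAP]."""
    return min(max(raw, 0.0), CONFIDENCE_CAP)


def is_chargeback_eligible(c_oper: float) -> bool:
    """An attribution is eligible only at or above the chargeback floor."""
    return c_oper >= CHARGEBACK_FLOOR
```

A consumer of the API could use `is_chargeback_eligible` to route attributions below the floor into an advisory review queue instead of a billing run.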

How the model is improved — and who decides

The production RAIL model is selected and promoted by a human Venturi decision through a formal gate, never automatically. Promotion requires per-edge precision and calibration bars to clear, a non-waivable cap on high-confidence errors, and a canary rollout that auto-rolls-back without ever touching customer traffic if calibration drifts. The most explainable model wins ties. You consume the result of that discipline; you never have to operate it.

Evidence on every result

No attribution is presented as a bare number. Every customer-visible result carries interpretation metadata — an evidence card — so you can always see why and how strongly Venturi believes it:

  • stage_origin — Stage A deterministic, Stage B inference (RAIL or HeuristicBaseline), or Stage C allocation.
  • output_state — one of the six states above.
  • c_oper + confidence band — the calibrated confidence and where it sits.
  • Evidence basis — which reconciliation methods (R1–R5) and which input evidence produced the result.
  • Model version — the RAIL artifact version, where applicable.
  • Degradation state — whether the system was running fully or in a fallback/degraded mode when the result was produced.
  • Freshness — when the underlying data was last reconciled.

The product surfaces these as five trust dimensions on every result — Status, Sources, Permissions, Automation, and Consequences — and the absence of any one is rendered explicitly as "unknown / unavailable," never as a blank or a fabricated default. This is the honest-unknown discipline: Venturi would rather tell you it does not know than guess.
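The honest-unknown rule for rendering can be sketched as follows. The five dimension names and the "unknown / unavailable" label come from the text; the function name and the dictionary shape of an evidence card are assumptions for illustration.

```python
TRUST_DIMENSIONS = ("Status", "Sources", "Permissions", "Automation", "Consequences")
UNKNOWN_LABEL = "unknown / unavailable"


def render_trust_dimensions(card: dict) -> dict:
    """Return all five dimensions, labelling any missing or empty one explicitly."""
    rendered = {}
    for dimension in TRUST_DIMENSIONS:
        value = card.get(dimension)
        # Missing and empty values are surfaced, never dropped or defaulted.
        rendered[dimension] = UNKNOWN_LABEL if value in (None, "") else value
    return rendered
```

Iterating over the fixed tuple, rather than over the card's own keys, is what guarantees that an absent dimension still appears in the output.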

Control plane vs data plane

| | Data plane (in your VPC) | Control plane (Venturi, outbound-only) |
| --- | --- | --- |
| Holds | Your events, graph, index, attribution records | Releases, configuration, licensing |
| Holds your data? | Yes, and only here | No |
| Direction | Internal to your boundary | Outbound from your boundary only |
| Customer data egress | None | None |

Your data plane is the only place your operational data lives. The data plane fetches signed, verified release artifacts and configuration from the control plane over outbound connections; the control plane never reaches your data. There is no path by which Venturi's control plane reads your invocation events, your graph, or your index.

Fail-open vs fail-closed: the boundary

The single most important rule in the architecture is the fail-open boundary, and it is drawn precisely:

Fail-open applies to exactly one path

Fail-open applies only to customer AI traffic on the gateway hot path. If Venturi is degraded, your AI request is forwarded unmodified and attribution is reconciled later. Venturi cannot take your AI traffic down.

Everything security-relevant fails closed

Fail-closed applies to every security-relevant decision: authentication, authorization (RBAC), tenant isolation, admin mutation, export creation, billing mutation, data-residency routing, legal-gated adoption/workforce views, and support break-glass access. On error, timeout, ambiguity, or missing input, these deny — with zero data egress and an audit entry.

This split is exhaustive and frozen. There is no feature flag that converts a fail-closed path to fail-open. The gateway forwards your traffic when in doubt; the security boundary denies access when in doubt. Both behaviors are tested directly.
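The fail-closed half can be sketched as a wrapper that treats every error, missing input, or ambiguous answer as a denial and records an audit entry. The names `authorize` and `check`, and the shape of the audit entry, are hypothetical and chosen only for this illustration.

```python
def authorize(check, principal, tenant_id, audit_log: list) -> bool:
    """Allow only on an explicit True from the policy check; deny on anything else."""
    reason = "policy"
    allowed = False
    try:
        if not principal or not tenant_id:
            raise ValueError("missing input")
        # `is True` means None, strings, or other ambiguous results deny.
        allowed = check(principal, tenant_id) is True
    except Exception as exc:
        allowed = False
        reason = type(exc).__name__
    audit_log.append({
        "principal": principal,
        "tenant": tenant_id,
        "decision": "allow" if allowed else "deny",
        "reason": reason,
    })
    return allowed
```

Note the contrast with the gateway path: there, an exception leads to forwarding; here, the same exception leads to a denial plus an audit record.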

Where customer data lives — and where it does not

| Data | Where it lives | Where it never goes |
| --- | --- | --- |
| Invocation events, attribution graph, materialized index | In your VPC, in your data plane | Venturi's control plane or environments |
| Cost / usage / identity signals | Read into your data plane | Written back to your environment (read-only) |
| Prompt and completion content | Nowhere: never captured or stored | The pipeline has no field for it |
| Provider admin keys | KMS-encrypted in your tenant boundary | Venturi control plane never sees plaintext |

No content capture, by design

The core pipeline never stores prompt or completion text. The canonical InvocationEvent schema has no content field at all. Attribution is built entirely from metadata — model, tokens, cost, identity, timing. This is structural, not configurable: there is no setting that turns content capture on.
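What "structural, not configurable" means can be shown with a metadata-only record type. The field names below are hypothetical stand-ins for the metadata categories the text lists (model, tokens, cost, identity, timing); they are not the canonical InvocationEvent schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class InvocationEvent:
    """Metadata-only event: there is deliberately no prompt or completion field."""
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    identity: str
    occurred_at: datetime


def field_names() -> set:
    """Return the full set of fields an event can carry."""
    return {f.name for f in fields(InvocationEvent)}
```

Because the type has no content field, passing a `prompt=` argument fails at construction time with a `TypeError` instead of being silently stored.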

Data that does live in your plane is retained for 13 months operationally, encrypted at rest with a customer-managed key (one key per tenant) and in transit with TLS. Per-subject erasure is supported via crypto-shred within a 30-day SLA. See Trust & security for the full data-handling model.

Energy and carbon attribution

Because the attribution graph already knows which model served each invocation and how many tokens it consumed, Venturi projects the same graph onto energy and carbon accountability. Each invocation is attributed energy (Wh/kWh) and carbon (gCO₂e) from a model-and-region catalog, rolled up the same six layers so a team, service, or budget can see its AI energy and carbon alongside its cost.

Null is not zero

When a model is not in the energy/carbon catalog, Venturi reports the value as null (unknown coverage), never as zero. It will not understate impact by treating missing catalog coverage as no impact, and energy multipliers are never restated as cost figures.
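The null-is-not-zero rule matters most in rollups, where a naive sum would quietly turn unknowns into zeros. The sketch below is an assumption about how such a rollup could behave, with a hypothetical function name and result shape: it sums only known values, reports how many invocations lacked coverage, and returns `None` when nothing is known.

```python
from typing import List, Optional


def rollup_energy_wh(values: List[Optional[float]]) -> dict:
    """Sum known energy values while keeping unknown coverage visible."""
    known = [v for v in values if v is not None]
    return {
        # None (not 0.0) when no invocation had catalog coverage.
        "energy_wh": sum(known) if known else None,
        "covered_invocations": len(known),
        "uncovered_invocations": len(values) - len(known),
    }
```

A genuine zero and an unknown stay distinguishable: `rollup_energy_wh([0.0])` reports an energy of `0.0`, while `rollup_energy_wh([None])` reports `None` with one uncovered invocation.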

How it fits together

The trust & security model in full