Versioning and rate limits
Venturi is the enterprise system of record for AI consumption, so integrations built on it are meant to run for years. Two contracts make that durable: a versioning and deprecation policy that guarantees backward compatibility within a major version, and rate limits that protect tenant isolation and platform stability with a predictable, standards-based 429 contract.
API versioning
The API is versioned by major version in the path — the public API lives under
/api/v1, the partner API under /api/partner/v1. The OpenAPI 3.1 document is
the single source of truth for the contract, published alongside a
machine-readable changelog and the SDKs.
The backward-compatibility guarantee
Within a major version, the contract only grows — it never breaks. Specifically,
within v1:
- No field is ever removed, renamed, retyped, or narrowed.
- No closed enum loses a member — including the output states, the edge taxonomy, and the error-code catalog.
- No required input becomes more restrictive.
- Confidence semantics are never versioned away: c_oper stays capped at 0.95, and the 0.80 chargeback floor stays the single floor.
Additive changes are always allowed: new optional fields, new routes, new optional parameters, and new members on explicitly open enums. Design your clients to tolerate unknown response fields — that is what lets you adopt additive changes without redeploying.
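One way to tolerate unknown response fields is to filter a payload down to the keys your client model declares before constructing it. The sketch below is illustrative only: `UsageRecord`, its fields, and `parse_tolerant` are hypothetical names, not part of the Venturi schema or SDKs.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical fields for illustration; not taken from the Venturi schema.
    id: str
    tokens: int


def parse_tolerant(cls, payload: Mapping[str, Any]):
    """Build cls from payload, dropping any keys the client model does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# A later v1.x response may carry a new optional field; the client ignores it.
record = parse_tolerant(
    UsageRecord,
    {"id": "u_1", "tokens": 42, "new_optional_field": "added in a later v1.x"},
)
```

A strict constructor such as `UsageRecord(**payload)` would raise `TypeError` on the extra key, which is exactly the failure mode an additive change should not trigger.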
A v1.0 client keeps working
A client written against v1.0 continues to function unchanged against any
later v1.x. This is enforced: a schema-diff gate compares each candidate
contract against the last released document for the same major version and
blocks any breaking change in CI. Only additive diffs ship within a major
version.
The same discipline governs the webhook event catalog: within a
major event-schema version, no event payload field is removed, renamed, retyped,
or narrowed, and no event type is removed. New event types and optional payload
fields are additive.
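To make the schema-diff idea concrete, here is a minimal sketch of a breaking-change check over two simplified schemas. It assumes a toy representation (a map of field name to `{"type": ..., "enum": [...]}`) rather than a real OpenAPI document, and `breaking_changes` is a hypothetical helper, not Venturi's actual CI gate.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Return a list of breaking differences between two simplified schemas.

    Each schema maps a field name to a spec such as
    {"type": "string", "enum": ["a", "b"]}. Fields that exist only in
    `new` are additive and are not reported.
    """
    problems = []
    for name, old_spec in old.items():
        new_spec = new.get(name)
        if new_spec is None:
            problems.append(f"field removed: {name}")
            continue
        if new_spec.get("type") != old_spec.get("type"):
            problems.append(f"field retyped: {name}")
        # Only compare enums when both versions declare one.
        if "enum" in old_spec and "enum" in new_spec:
            for member in sorted(set(old_spec["enum"]) - set(new_spec["enum"])):
                problems.append(f"enum member removed: {name}.{member}")
    return problems


old_schema = {
    "state": {"type": "string", "enum": ["a", "b"]},
    "count": {"type": "integer"},
}
new_schema = {
    "state": {"type": "string", "enum": ["a"]},
    "count": {"type": "string"},
    "extra": {"type": "string"},  # additive, so not reported
}
problems = breaking_changes(old_schema, new_schema)
```

A gate built on this idea would fail the build whenever the returned list is non-empty and pass when the diff is purely additive.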
Breaking changes and deprecation
A genuinely breaking change ships as a new major version (/api/v2), served
in parallel with the current one. The old version remains available through a
deprecation window of at least 12 months, so you migrate on your schedule.
| Stage | What it means |
|---|---|
| Active | The current major version; full support and additive evolution. |
| Deprecated | A newer major version exists; the deprecated version keeps working for ≥12 months. Migration guidance is published in the changelog. |
| Sunset | After the window closes, the version is retired. |
The SDKs track major versions: a breaking API change ships alongside a coordinated new SDK major version, and the changelog records the API-to-SDK version mapping. Client generation is deterministic and pinned, so a given SDK version is reproducible against a given API version.
Discover the contract
The OpenAPI document, an interactive reference, and a generated Postman collection are all published from the same route models and versioned with the changelog. Pin your SDK version and watch the changelog for additive updates and deprecation notices.
Rate limits
Rate limiting protects tenant isolation and platform stability. Limits are
enforced per (tenant, principal, route-class) using a token bucket — a
sustained rate plus a burst allowance. Bursts within the allowance pass;
sustained traffic above the rate is throttled.
- Tenant isolation. One tenant exhausting its bucket never throttles another.
- Separate buckets for heavy routes. Bulk and export routes carry their own, lower limits, so a large export run does not starve your interactive reads.
- Defaults per plan, overridable per tenant. Per-tenant overrides take effect without a redeploy.
- Fail closed at the edge. A caller with no valid token is rejected before any limit is even evaluated.
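The token-bucket behavior described above can be sketched in a few lines. This is a generic illustration of the technique, not Venturi's implementation: the class, the `allow_request` helper, and the default rate and burst values are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Generic token bucket: `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, now=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full so an initial burst passes
        self._now = now
        self._last = now()

    def allow(self) -> bool:
        current = self._now()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (current - self._last) * self.rate)
        self._last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per (tenant, principal, route-class), so one key never drains another.
buckets: dict[tuple[str, str, str], TokenBucket] = {}


def allow_request(tenant: str, principal: str, route_class: str,
                  rate: float = 10.0, burst: int = 20) -> bool:
    key = (tenant, principal, route_class)
    if key not in buckets:
        buckets[key] = TokenBucket(rate, burst)  # hypothetical defaults
    return buckets[key].allow()
```

Because each key owns its bucket, exhausting `("tenant-a", "svc", "export")` leaves `("tenant-a", "svc", "read")` and every other tenant's buckets untouched.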
Reading your budget
Every response includes your current limit state in headers, and you can query it directly at any time:
The endpoint reports the same remaining budget the response headers report. Your remaining quota is also visible in the admin console before you ever hit a wall.
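As a rough sketch of reading the budget from response headers, the helper below parses the `RateLimit-*` fields into a small value object. It assumes successful responses carry the same header names this page documents for 429 responses; `RateLimitState` and `read_rate_limit` are hypothetical names, not SDK APIs.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: Optional[int]
    remaining: Optional[int]
    reset_seconds: Optional[int]  # seconds until reset, not a timestamp


def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # header missing or not an integer


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Extract limit state from headers, matching names case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        limit=_to_int(lowered.get("ratelimit-limit")),
        remaining=_to_int(lowered.get("ratelimit-remaining")),
        reset_seconds=_to_int(lowered.get("ratelimit-reset")),
    )


state = read_rate_limit(
    {"RateLimit-Limit": "600", "RateLimit-Remaining": "0", "RateLimit-Reset": "12"}
)
```

A client can check `state.remaining` after each call and slow down before it reaches zero, rather than waiting for a 429.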
The 429 contract
A throttled request returns 429 Too Many Requests as RFC 9457
application/problem+json with error_code: RATE_LIMITED, and carries a pinned,
standards-based header set:
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before retrying. |
| RateLimit-Limit | The bucket's limit for the window. |
| RateLimit-Remaining | Requests remaining in the current window. |
| RateLimit-Reset | Seconds until the window resets (not an absolute timestamp). |
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
RateLimit-Limit: 600
RateLimit-Remaining: 0
RateLimit-Reset: 12
Content-Type: application/problem+json

{
  "type": "https://venturi.systems/errors/rate-limited",
  "title": "Too many requests",
  "status": 429,
  "error_code": "RATE_LIMITED",
  "trace_id": "01HF8...",
  "docs_url": "https://docs.venturi.systems/developers/versioning-and-rate-limits/"
}
```
The header field names and units are pinned to a named specification revision and
are frozen within a major version — so a backoff implementation written today
keeps parsing the same fields for the life of v1.
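If you handle 429 responses by hand, it can help to turn the problem document into a typed error. The following is a minimal sketch under stated assumptions: `RateLimitedError` and `error_from_429` are hypothetical names (not the SDKs' `RateLimitError`), and only the fields shown in the example response are read.

```python
import json
from typing import Mapping, Optional


class RateLimitedError(Exception):
    """Hypothetical client-side error built from a 429 problem document."""

    def __init__(self, retry_after: int, trace_id: Optional[str], title: str):
        super().__init__(f"{title} (retry after {retry_after}s, trace {trace_id})")
        self.retry_after = retry_after
        self.trace_id = trace_id


def error_from_429(headers: Mapping[str, str], body: str) -> Optional[RateLimitedError]:
    """Return a RateLimitedError for a RATE_LIMITED problem body, else None."""
    problem = json.loads(body)
    if problem.get("error_code") != "RATE_LIMITED":
        return None
    return RateLimitedError(
        retry_after=int(headers.get("Retry-After", "1")),  # seconds
        trace_id=problem.get("trace_id"),
        title=problem.get("title", "Too many requests"),
    )


err = error_from_429(
    {"Retry-After": "12"},
    '{"title": "Too many requests", "status": 429, '
    '"error_code": "RATE_LIMITED", "trace_id": "01HF8..."}',
)
```

Keeping `trace_id` on the error makes it easy to include in logs or support requests.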
Handling 429 correctly
Honor Retry-After, and add jitter so retrying clients do not synchronize into a
thundering herd:
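```python
import random
import time


def call_with_backoff(do_request, max_attempts=5):
    """Call do_request() (returning a requests-style response), retrying on 429."""
    for attempt in range(max_attempts):
        resp = do_request()
        if resp.status_code != 429:
            return resp
        if attempt == max_attempts - 1:
            break  # do not sleep after the final attempt
        # Retry-After is in seconds; fall back to 1 if the header is missing.
        retry_after = int(resp.headers.get("Retry-After", "1"))
        time.sleep(retry_after + random.uniform(0, 1))  # jitter
    raise RuntimeError("rate limit: retries exhausted")
```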
Let the SDK do it
The TypeScript and Python SDKs honor Retry-After with jittered
backoff automatically and raise a typed RateLimitError only after retries
are exhausted — so you rarely write this loop by hand.