SLOs & Budgets (Pilot)

This document outlines the initial Service Level Objectives (SLOs) and resource budgets for the Cloudflare RAG service during its pilot phase.

Service Level Objectives (SLOs)

Metric	Threshold	Notes
RAG Query Latency (p95)	≤ 1.5 seconds	Measured on a "warm" worker. Per-stage budgets: `embed` ≤ 300 ms, `search` ≤ 600 ms, `rerank` ≤ 300 ms. These sum to 1.2 seconds, leaving 300 ms for the rest of the request.
Uptime	≥ 99.5%	Measured by the `/rag/health` endpoint.
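
As a rough illustration of how these thresholds could be checked, the sketch below computes a nearest-rank p95 over recorded latencies, compares each stage against its budget, and converts the uptime target into allowed downtime. The function names, the shape of the `timings` input, and the 30-day window are assumptions made for this example; they are not part of the service.

```python
# Per-stage p95 budgets in milliseconds, taken from the SLO table.
STAGE_BUDGET_MS = {"embed": 300, "search": 600, "rerank": 300}
TOTAL_BUDGET_MS = 1500  # end-to-end p95 target for a warm worker


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_violations(timings):
    """Return the stages (or "total") whose p95 exceeds its budget.

    `timings` (hypothetical shape) maps a stage name or "total"
    to a list of observed latencies in milliseconds.
    """
    budgets = {**STAGE_BUDGET_MS, "total": TOTAL_BUDGET_MS}
    return [
        name
        for name, budget in budgets.items()
        if name in timings and p95(timings[name]) > budget
    ]


def allowed_downtime_minutes(slo=0.995, days=30):
    """Downtime permitted by an uptime SLO over a window of `days` days."""
    return (1 - slo) * days * 24 * 60
```

Under the assumed 30-day window, a 99.5% target allows roughly 216 minutes (3.6 hours) of downtime per month.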

The budgets below are soft limits for the pilot phase, intended to control cost and usage.

Resource	Budget (per month)	Environment
RAG Queries	≤ 10,000	`dev` + `stage` combined
Vector Count	≤ 50,000	Per environment (`dev`, `stage`)
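
Because these are soft limits, a periodic check that only reports overruns may be enough for the pilot. The sketch below is one possible shape for such a check; the function name and the input dictionaries are hypothetical, and how the counts are collected is left open.

```python
# Soft monthly budgets from the table above.
MAX_QUERIES_PER_MONTH = 10_000  # dev + stage combined
MAX_VECTORS_PER_ENV = 50_000    # per environment
ENVIRONMENTS = ("dev", "stage")


def budget_warnings(queries_by_env, vectors_by_env):
    """Return human-readable warnings for every exceeded soft budget.

    Both arguments (hypothetical inputs) map an environment name
    to a count for the current month.
    """
    warnings = []

    # The query budget is shared across dev and stage.
    total_queries = sum(queries_by_env.get(env, 0) for env in ENVIRONMENTS)
    if total_queries > MAX_QUERIES_PER_MONTH:
        warnings.append(
            f"RAG queries: {total_queries} > {MAX_QUERIES_PER_MONTH} (dev + stage)"
        )

    # The vector budget applies to each environment separately.
    for env in ENVIRONMENTS:
        count = vectors_by_env.get(env, 0)
        if count > MAX_VECTORS_PER_ENV:
            warnings.append(f"Vector count in {env}: {count} > {MAX_VECTORS_PER_ENV}")

    return warnings
```

An empty list means every budget is within its limit; values exactly at the limit do not trigger a warning, matching the "≤" in the table.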

AI Gateway: The Cloudflare AI Gateway automatically handles retries for requests to third-party providers.
API Client: To avoid compounding retries, configure the Labeeb API's HTTP client with `retry(0)` when it calls endpoints through the AI Gateway, so that a failed request is retried at the gateway layer only.
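
The reason for disabling client retries is that retries at two layers multiply rather than add. The sketch below shows the arithmetic; the gateway retry count used in the usage note is a made-up value for illustration, since this document does not state how many retries the AI Gateway performs.

```python
def max_upstream_attempts(client_retries, gateway_retries):
    """Worst-case number of calls reaching the third-party provider.

    Each layer makes 1 initial attempt plus its retries, and every
    client attempt can trigger a full round of gateway attempts.
    """
    return (client_retries + 1) * (gateway_retries + 1)
```

With a hypothetical gateway setting of 2 retries, a client that also retries 3 times could produce up to `max_upstream_attempts(3, 2) == 12` provider calls for a single query, whereas `retry(0)` on the client caps it at `max_upstream_attempts(0, 2) == 3`.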