SLOs & Budgets (Pilot)
This document outlines the initial Service Level Objectives (SLOs) and resource budgets for the Cloudflare RAG service during its pilot phase.
Service Level Objectives (SLOs)
| Metric | Threshold | Notes |
|---|---|---|
| RAG Query Latency (p95) | ≤ 1.5 seconds | Applies to a "warm" worker. Per-stage budgets: embed ≤ 300 ms, search ≤ 600 ms, rerank ≤ 300 ms. These sum to 1.2 seconds; the remaining 300 ms is not allocated to a named stage. |
| Uptime | ≥ 99.5% | Measured via the `/rag/health` endpoint. |
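
As a rough illustration of how the thresholds above could be checked, the following Python sketch computes a nearest-rank p95 over latency samples, the part of the 1.5-second budget not assigned to a stage, and the downtime that a 99.5% target permits. The function names, the nearest-rank percentile method, and the 30-day window are assumptions made for this example; the document does not say how the SLOs are actually measured.

```python
# Values taken from the SLO table; everything else here is illustrative.
STAGE_BUDGET_MS = {"embed": 300, "search": 600, "rerank": 300}
TOTAL_P95_BUDGET_MS = 1500
UPTIME_TARGET = 0.995


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def unallocated_ms():
    """Part of the total latency budget not assigned to a named stage."""
    return TOTAL_P95_BUDGET_MS - sum(STAGE_BUDGET_MS.values())


def meets_latency_slo(samples):
    """True if the p95 of the samples is within the total budget."""
    return p95(samples) <= TOTAL_P95_BUDGET_MS


def allowed_downtime_minutes(days=30):
    """Downtime permitted by the uptime target over an assumed window."""
    return (1 - UPTIME_TARGET) * days * 24 * 60
```

Under these assumptions, a 30-day window allows roughly 216 minutes of downtime, and the stage budgets leave 300 ms of the total unassigned.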
Resource & Cost Budgets
These are soft limits for the pilot phase to control costs and usage.
| Resource | Budget (per month) | Environment |
|---|---|---|
| RAG Queries | ≤ 10,000 | dev + stage combined |
| Vector Count | ≤ 50,000 | Per environment (dev, stage) |
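
A minimal sketch of how these soft limits might be evaluated, assuming usage counts are already available per environment. The function name and the dictionary inputs are hypothetical; only the two limits and their scopes (queries combined across dev and stage, vectors per environment) come from the table above.

```python
# Soft limits from the budget table.
QUERY_BUDGET_PER_MONTH = 10_000  # dev + stage combined
VECTOR_BUDGET_PER_ENV = 50_000   # applies to each environment separately


def budget_warnings(queries_by_env, vectors_by_env):
    """Return a human-readable warning for each soft limit that is exceeded."""
    warnings = []

    # Queries are budgeted across all environments together.
    total_queries = sum(queries_by_env.values())
    if total_queries > QUERY_BUDGET_PER_MONTH:
        warnings.append(
            f"queries: {total_queries} > {QUERY_BUDGET_PER_MONTH} (combined)"
        )

    # Vectors are budgeted per environment.
    for env, count in sorted(vectors_by_env.items()):
        if count > VECTOR_BUDGET_PER_ENV:
            warnings.append(f"vectors[{env}]: {count} > {VECTOR_BUDGET_PER_ENV}")

    return warnings
```

Because the limits are soft, the sketch only reports overruns and leaves the response (alerting, throttling, or nothing) to the caller.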
Retry Policy
- AI Gateway: The Cloudflare AI Gateway automatically handles retries for requests to third-party providers.
- API Client: To avoid compounding retries, the Labeeb API's HTTP client should be configured with `retry(0)` when calling endpoints via the AI Gateway.
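
To see why stacked retries compound, consider the worst case where every attempt fails: each client attempt can trigger a full round of gateway attempts, so the counts multiply. The sketch below works through that arithmetic. The function name is hypothetical and the gateway retry count of 3 is an illustrative value, not a documented setting of this service.

```python
def max_upstream_attempts(client_retries, gateway_retries):
    """Worst-case number of requests that reach the provider.

    Each layer makes one initial attempt plus its retries, and every
    client attempt can fan out into a full set of gateway attempts.
    """
    return (1 + client_retries) * (1 + gateway_retries)


# Illustrative gateway setting (assumed for this example only).
GATEWAY_RETRIES = 3

with_client_retries = max_upstream_attempts(2, GATEWAY_RETRIES)
without_client_retries = max_upstream_attempts(0, GATEWAY_RETRIES)
```

With the assumed values, two client retries allow up to 12 upstream attempts, while zero client retries caps it at the gateway's own 4.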