Observability: Health, Monitoring & Logs
This guide covers the tools and procedures for monitoring the Cloudflare RAG service.
1. Health Checks
Health checks are the first line of defense for detecting issues.
API-Side Health Probes
The Labeeb API provides endpoints to check the health of its Cloudflare dependencies from its own perspective.
| Endpoint | Probe | Checks | Potential Outcomes |
|---|---|---|---|
| `/api/health/ai-gateway` | Default (`/models`) | AI Gateway connectivity (cached) | `ok`, `degraded` |
| `/api/health/ai-gateway` | `?probe=embeddings` | Upstream provider quota | `ok`, `rate_limited`, `down` |
| `/api/health/edge-llm` | Worker `/llm/health` | Custom worker health | `ok`, `down` |
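
As a minimal sketch of how a monitoring script might consume these probes, the Python snippet below maps each outcome from the table to an alert severity. The JSON field name `status`, the severity labels, and the base URL passed to `check` are assumptions for illustration; this guide does not specify the response body.

```python
import json
import urllib.request

# Assumed mapping from the outcomes listed in the table to alert severities.
SEVERITY = {
    "ok": "none",
    "degraded": "warning",
    "rate_limited": "warning",
    "down": "critical",
}

# Probe paths taken from the table above.
PROBES = [
    "/api/health/ai-gateway",
    "/api/health/ai-gateway?probe=embeddings",
    "/api/health/edge-llm",
]


def severity_for(outcome: str) -> str:
    """Return the alert severity for an outcome; unknown values count as critical."""
    return SEVERITY.get(outcome, "critical")


def check(base_url: str, path: str, timeout: float = 5.0) -> str:
    """Call one probe and return its severity.

    Assumes the body is JSON with a top-level "status" field (hypothetical).
    Network errors and unparseable bodies are reported as critical.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        return "critical"
    if not isinstance(body, dict):
        return "critical"
    return severity_for(str(body.get("status", "")))
```

A scheduler could then loop over `PROBES`, call `check(base_url, path)` for each, and alert on the worst severity returned.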
Worker-Side Health
- Endpoint: `/rag/health`
- Deep Check: Use `?deep=1` to include a quick embedding timing test, which verifies the Workers AI binding is functional.
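
The two probe variants can be built from a single helper. This is a sketch only: the function name `worker_health_url` and the example hostname are hypothetical, and only the `/rag/health` path and the `deep=1` parameter come from this guide.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def worker_health_url(worker_base: str, deep: bool = False) -> str:
    """Build the worker health URL, optionally requesting the deep check."""
    parts = urlsplit(worker_base)
    # Tolerate a trailing slash on the base URL.
    path = parts.path.rstrip("/") + "/rag/health"
    query = urlencode({"deep": 1}) if deep else ""
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


# Hypothetical hostname, for illustration only.
shallow = worker_health_url("https://worker.example.com")
deep = worker_health_url("https://worker.example.com", deep=True)
```

Because the deep check runs an embedding, one reasonable design is to poll the shallow URL frequently and the deep URL less often.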
2. Logs
Real-time logs are essential for debugging.
- Command: Use `wrangler tail` to stream live logs from your worker.
- AI Gateway Logs: View detailed logs and analytics in the Cloudflare dashboard under AI → AI Gateway → Logs.
3. Monitoring & Dashboards
AI Gateway Dashboard
The AI Gateway dashboard is the primary tool for monitoring cost, latency, and errors related to third-party LLM providers.
- Location: AI → AI Gateway → Analytics
- Key Metrics to Watch:
    - Cache Hit Ratio: Enable Cache responses in the gateway settings to reduce costs and latency. A high cache hit ratio is desirable.
    - Errors: Monitor for spikes in error rates, which could indicate issues with an upstream provider.
    - Latency: Track p95 latency to ensure providers are meeting performance expectations.
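
To make the two numeric metrics concrete, here is a small worked example of how cache hit ratio and p95 latency are computed. It assumes you have per-request latencies and cache hit counts available from your own export of the gateway logs. The sample numbers are made up, and the nearest-rank percentile method is one common definition, not necessarily the one the dashboard uses.

```python
import math


def cache_hit_ratio(hits: int, total: int) -> float:
    """Fraction of requests served from cache; 0.0 when there was no traffic."""
    return hits / total if total else 0.0


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up latency samples in milliseconds, including one slow outlier.
latencies_ms = [120, 95, 310, 150, 101, 99, 140, 2050, 130, 110]

ratio = cache_hit_ratio(hits=30, total=120)  # 0.25
p50 = percentile(latencies_ms, 50)           # 120
p95 = percentile(latencies_ms, 95)           # 2050
```

The gap between `p50` and `p95` in this sample shows why the tail percentile is worth tracking: a single slow upstream call barely moves the median but dominates p95.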
Vectorize
Use the Wrangler CLI to inspect the status and metadata of your Vectorize indexes.
```bash
# List all Vectorize indexes in your account
wrangler vectorize list

# List the metadata indexes configured for a specific Vectorize index
wrangler vectorize list-metadata-index labeeb-articles-dev
```
Server-Timing Header
The `/rag/query` endpoint includes a `Server-Timing` response header that breaks down the time spent in each stage of the RAG pipeline (e.g., embed, search, rerank). Use it to see which stage dominates a slow request.
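
The header can be turned into per-stage durations with a few lines of parsing. The sketch below follows the standard `name;dur=value` syntax of `Server-Timing`. The sample header value is invented, using the stage names mentioned above. The parser is deliberately simplified and does not handle commas inside quoted `desc` values.

```python
def parse_server_timing(header: str) -> dict[str, float]:
    """Parse a Server-Timing header into {metric_name: duration_ms}.

    Metrics without a numeric `dur` parameter are skipped.
    """
    timings: dict[str, float] = {}
    for entry in header.split(","):
        params = [p.strip() for p in entry.split(";")]
        name = params[0]
        if not name:
            continue
        for param in params[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                try:
                    timings[name] = float(value.strip().strip('"'))
                except ValueError:
                    pass  # ignore a malformed duration
    return timings


# Invented example value; real stage names and numbers come from the worker.
sample = "embed;dur=12.5, search;dur=40, rerank;dur=7.25"
stages = parse_server_timing(sample)
slowest = max(stages, key=stages.get)
```

With the sample value, `slowest` identifies the search stage, which is where optimization effort would be best spent for that request.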