Observability: Health, Monitoring & Logs

This guide covers the tools and procedures for monitoring the Cloudflare RAG service.

1. Health Checks

Health checks are the first line of defense for detecting issues.

API-Side Health Probes

The Labeeb API provides endpoints to check the health of its Cloudflare dependencies from its own perspective.

| Endpoint | Probe | Checks | Potential outcomes |
|---|---|---|---|
| `/api/health/ai-gateway` | Default (`/models`) | AI Gateway connectivity (cached) | `ok`, `degraded` |
| `/api/health/ai-gateway` | `?probe=embeddings` | Upstream provider quota | `ok`, `rate_limited`, `down` |
| `/api/health/edge-llm` | Worker `/llm/health` | Custom Worker health | `ok`, `down` |
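A minimal TypeScript sketch of an external monitor that calls all three probes and reports the worst outcome. It assumes each endpoint returns a JSON body with a `status` field holding one of the outcomes in the table, and that `down` is the most severe outcome; the response shape, the severity ordering, and the names `probe`, `worstStatus`, and `checkAll` are hypothetical, not the Labeeb API's documented contract.

```typescript
// Outcomes from the table, ordered least to most severe (the ordering is an assumption).
type HealthStatus = "ok" | "degraded" | "rate_limited" | "down";

const SEVERITY: Record<HealthStatus, number> = {
  ok: 0,
  degraded: 1,
  rate_limited: 2,
  down: 3,
};

// Probe paths from the table, relative to the Labeeb API base URL.
const PROBES: readonly string[] = [
  "/api/health/ai-gateway",
  "/api/health/ai-gateway?probe=embeddings",
  "/api/health/edge-llm",
];

// Structural stand-in for fetch so the sketch needs no runtime-specific typings.
type FetchLike = (url: string) => Promise<{ json(): Promise<unknown> }>;

// Own-property check so prototype keys such as "toString" are rejected.
function isHealthStatus(value: unknown): value is HealthStatus {
  return typeof value === "string" && Object.prototype.hasOwnProperty.call(SEVERITY, value);
}

// Hypothetical response shape: { "status": "ok" }. Network errors, non-JSON bodies,
// and unknown status strings are all reported as "down".
async function probe(baseUrl: string, path: string, fetchFn: FetchLike): Promise<HealthStatus> {
  try {
    const res = await fetchFn(baseUrl + path);
    const body = (await res.json()) as { status?: unknown } | null;
    const status = body?.status;
    return isHealthStatus(status) ? status : "down";
  } catch {
    return "down";
  }
}

// Collapse several probe results into the single most severe one.
function worstStatus(statuses: readonly HealthStatus[]): HealthStatus {
  return statuses.reduce<HealthStatus>(
    (worst, s) => (SEVERITY[s] > SEVERITY[worst] ? s : worst),
    "ok",
  );
}

// Run all probes concurrently and return the overall status.
async function checkAll(baseUrl: string, fetchFn: FetchLike): Promise<HealthStatus> {
  const results = await Promise.all(PROBES.map((path) => probe(baseUrl, path, fetchFn)));
  return worstStatus(results);
}
```

The global `fetch` in Node 18+ and in Workers satisfies `FetchLike`, so a call such as `await checkAll("https://<api-host>", fetch)` should work. The sketch reads the body regardless of HTTP status, since a `rate_limited` or `down` result may well arrive with a non-2xx code. Because the `?probe=embeddings` probe exercises the upstream provider, it is probably worth polling less often than the default probe.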

Worker-Side Health

Endpoint: `/rag/health`
Deep check: append `?deep=1` to also run a quick test embedding and report its timing, which verifies that the Workers AI binding is functional.
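The deep check's logic can be sketched as follows. This is a hypothetical illustration, not the Worker's actual code: `AiLike` is a minimal stand-in for the Workers AI binding type, `ragHealth` and `HealthReport` are invented names, and the embedding model `@cf/baai/bge-base-en-v1.5` is an assumed example; substitute whichever model the Worker is configured with.

```typescript
// Minimal stand-in for the Workers AI binding (env.AI); only run() is used here.
interface AiLike {
  run(model: string, inputs: { text: string[] }): Promise<unknown>;
}

// Anything with a URLSearchParams-style get(), e.g. new URL(request.url).searchParams.
interface QueryLike {
  get(name: string): string | null;
}

interface HealthReport {
  status: "ok" | "down";
  embedMs?: number; // only present when ?deep=1 succeeds
  error?: string;   // only present when the deep check fails
}

// Assumed model name; replace with the Worker's configured embedding model.
const EMBED_MODEL = "@cf/baai/bge-base-en-v1.5";

async function ragHealth(query: QueryLike, ai: AiLike): Promise<HealthReport> {
  // Shallow check: reaching this code means the Worker is deployed and routing.
  if (query.get("deep") !== "1") {
    return { status: "ok" };
  }
  // Deep check: embed a tiny input and time the round trip to Workers AI.
  const start = Date.now();
  try {
    await ai.run(EMBED_MODEL, { text: ["health check"] });
    return { status: "ok", embedMs: Date.now() - start };
  } catch (err) {
    return { status: "down", error: err instanceof Error ? err.message : String(err) };
  }
}
```

Inside a Worker's fetch handler this would be called as `ragHealth(new URL(request.url).searchParams, env.AI)` and the result returned with `Response.json(...)`. Timing with `Date.now()` is meaningful here because, in Workers, the clock advances across I/O such as the `ai.run` call.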

2. Logs

Real-time logs are essential for debugging.

Command: Use `wrangler tail` to stream live logs from the Worker.

```shell
# Tail logs for a specific environment
wrangler tail --env stage
```
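`wrangler tail` shows whatever the Worker writes with `console.log` and its siblings, so logging one JSON object per event makes the stream easier to search and post-process. A minimal sketch, assuming you are free to choose the log format; `logEvent` and its field names are hypothetical and not necessarily how the RAG Worker logs today.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Write one JSON object per call so each tailed log line is machine-readable.
// Fixed keys are spread last so caller-supplied fields cannot overwrite them.
// Returns the serialized line, which is convenient for tests.
function logEvent(level: Level, msg: string, fields: Record<string, unknown> = {}): string {
  const line = JSON.stringify({ ...fields, ts: new Date().toISOString(), level, msg });
  if (level === "error") {
    console.error(line);
  } else if (level === "warn") {
    console.warn(line);
  } else {
    console.log(line);
  }
  return line;
}

// Example: record per-stage timings for one query (field names are illustrative).
logEvent("info", "rag.query", { embedMs: 41, searchMs: 12, topK: 8 });
```

With logs in this shape, `wrangler tail --env stage --format json` emits raw events suitable for piping into other tools, while `wrangler tail --env stage --search rag.query` narrows the stream to matching log messages.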

AI Gateway Logs: View detailed logs and analytics in the Cloudflare dashboard under AI → AI Gateway → Logs.

3. Monitoring & Dashboards

AI Gateway Dashboard

The AI Gateway dashboard is the primary tool for monitoring cost, latency, and errors related to third-party LLM providers.

Location: AI → AI Gateway → Analytics
Key Metrics to Watch:
- Cache Hit Ratio: Enable the "Cache responses" setting in the gateway to reduce cost and latency. A higher cache hit ratio means more requests are answered from the gateway cache instead of the upstream provider.
- Errors: Monitor for spikes in error rates, which could indicate issues with an upstream provider.
- Latency: Track p95 latency to ensure providers are meeting performance expectations.
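The three metrics above can also be computed from raw request records, for example when comparing providers offline. A sketch under stated assumptions: the `GatewayRequest` record shape is hypothetical (map it from whatever log export you have), an error is any HTTP status of 400 or above, and p95 uses the nearest-rank method, so figures may differ slightly from the dashboard's if it interpolates.

```typescript
// Hypothetical per-request record; adapt the field names to your log export.
interface GatewayRequest {
  cached: boolean;    // served from the AI Gateway cache
  durationMs: number; // end-to-end latency
  status: number;     // HTTP status returned to the caller
}

interface GatewaySummary {
  requests: number;
  cacheHitRatio: number;
  errorRate: number;
  p95Ms: number;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
// Returns NaN for an empty sample.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function summarize(reqs: readonly GatewayRequest[]): GatewaySummary {
  const n = reqs.length;
  const hits = reqs.filter((r) => r.cached).length;
  const errors = reqs.filter((r) => r.status >= 400).length;
  return {
    requests: n,
    cacheHitRatio: n === 0 ? 0 : hits / n,
    errorRate: n === 0 ? 0 : errors / n,
    p95Ms: percentile(reqs.map((r) => r.durationMs), 95),
  };
}
```

Nearest-rank always returns an observed latency rather than an interpolated one, which keeps the p95 easy to trace back to a specific request.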

Vectorize

Use the Wrangler CLI to inspect the status and metadata of your Vectorize indexes.

```shell
# List all Vectorize indexes in your account
wrangler vectorize list

# List the metadata indexes (filterable metadata properties) configured on a specific index
wrangler vectorize list-metadata-index labeeb-articles-dev
```

Server-Timing Header

The `/rag/query` endpoint returns a `Server-Timing` response header that breaks down the time spent in each stage of the RAG pipeline (e.g., embed, search, rerank). When a query is slow, use it to identify which stage dominates the latency.
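A client-side sketch for reading the header, following the `name;dur=<ms>;desc="<text>"` metric syntax from the W3C Server Timing specification. It is deliberately simplified: it splits on commas and semicolons, so a quoted `desc` value containing either character would be mis-parsed. The metric names in the test data (`embed`, `search`, `rerank`) are illustrative and may not match what the Worker actually emits; `parseServerTiming` and `slowestStage` are hypothetical helpers.

```typescript
interface TimingMetric {
  name: string;
  dur?: number; // milliseconds
  desc?: string;
}

// Parse a Server-Timing header value such as: embed;dur=41.5, search;dur=12
// Simplification: commas or semicolons inside quoted desc values are not handled.
function parseServerTiming(header: string): TimingMetric[] {
  const metrics: TimingMetric[] = [];
  for (const entry of header.split(",")) {
    const [rawName, ...params] = entry.split(";").map((part) => part.trim());
    if (!rawName) continue; // skip empty entries, e.g. from a trailing comma
    const metric: TimingMetric = { name: rawName };
    for (const param of params) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      const key = param.slice(0, eq).trim().toLowerCase();
      const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
      if (key === "dur") {
        const ms = Number(value);
        if (value !== "" && Number.isFinite(ms)) metric.dur = ms;
      } else if (key === "desc") {
        metric.desc = value;
      }
    }
    metrics.push(metric);
  }
  return metrics;
}

// Return the metric with the largest duration, or undefined if none reports one.
function slowestStage(metrics: readonly TimingMetric[]): TimingMetric | undefined {
  let slowest: TimingMetric | undefined;
  let maxDur = -Infinity;
  for (const m of metrics) {
    if (m.dur !== undefined && m.dur > maxDur) {
      slowest = m;
      maxDur = m.dur;
    }
  }
  return slowest;
}
```

Metrics without a `dur` parameter are kept in the parsed list but ignored by `slowestStage`, since the specification allows a metric to carry only a name.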