انتقل إلى المحتوى

Observability: Health, Monitoring & Logs

This guide covers the tools and procedures for monitoring the Cloudflare RAG service.


1. Health Checks

Health checks are the first line of defense for detecting issues.

API-Side Health Probes

The Labeeb API provides endpoints to check the health of its Cloudflare dependencies from its own perspective.

Endpoint Probe Checks Potential Outcomes
/api/health/ai-gateway Default (/models) AI Gateway connectivity (cached) ok, degraded
/api/health/ai-gateway ?probe=embeddings Upstream provider quota ok, rate_limited, down
/api/health/edge-llm Worker /llm/health Custom worker health ok, down

Worker-Side Health

  • Endpoint: /rag/health
  • Deep Check: Use ?deep=1 to include a quick embedding timing test, which verifies the Workers AI binding is functional.

2. Logs

Real-time logs are essential for debugging.

  • Command: Use wrangler tail to stream live logs from your worker.
    # Tail logs for a specific environment
    wrangler tail --env stage
    
  • AI Gateway Logs: View detailed logs and analytics in the Cloudflare dashboard under AI → AI Gateway → Logs.

3. Monitoring & Dashboards

AI Gateway Dashboard

The AI Gateway dashboard is the primary tool for monitoring cost, latency, and errors related to third-party LLM providers.

  • Location: AI → AI Gateway → Analytics
  • Key Metrics to Watch:
    • Cache Hit Ratio: Enable Cache responses in the gateway settings to reduce costs and latency. A high cache hit ratio is desirable.
    • Errors: Monitor for spikes in error rates, which could indicate issues with an upstream provider.
    • Latency: Track p95 latency to ensure providers are meeting performance expectations.

Vectorize

Use the Wrangler CLI to inspect the status and metadata of your Vectorize indexes.

# List all Vectorize indexes in your account
wrangler vectorize list

# List the metadata configuration for a specific index
wrangler vectorize list-metadata-index labeeb-articles-dev

Server-Timing Header

The /rag/query endpoint includes a Server-Timing response header, which provides a detailed breakdown of the time spent in each stage of the RAG pipeline (e.g., embed, search, rerank). This is invaluable for performance analysis.