# Cloudflare RAG Service: Overview
**Mission**
This service provides a unified, high-performance layer for Retrieval-Augmented Generation (RAG), vector search, and on-demand model tasks such as chat, translation, and automatic speech recognition (ASR). It runs on Cloudflare's serverless infrastructure (Workers, Vectorize, Workers AI) to offer a scalable, cost-effective solution that is tightly integrated with the Labeeb API.
## Quick Reference
| Component | Path / Location | Key Bindings / Endpoints |
|---|---|---|
| Worker Code | `cloudflare/` | Entrypoint: `src/index.ts` |
| RAG & LLM | Cloudflare Worker | `/rag/health`, `/rag/query`, `/llm/chat`, `/llm/translate`, `/llm/asr` |
| AI Bindings | `wrangler.jsonc` | `AI` (Workers AI), `VECTORIZE` (Vectorize DB) |
| AI Gateway | Cloudflare Dashboard | Provides an OpenAI-compatible base URL |
| API Clients | `api/app/Services/` | `AiGatewayClient` (for AI Gateway), `EdgeLlmClient` (for this worker) |
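The Worker exposes five endpoints. A minimal sketch of how `src/index.ts` might map an incoming request to a handler is shown below. The route names, the `matchRoute` helper, and the HTTP methods (GET for the health check, POST for everything else) are assumptions for illustration, not taken from the actual worker code.

```typescript
// Hypothetical route table mirroring the endpoints in the Quick Reference.
// The HTTP methods are assumptions; check src/index.ts for the real ones.
type RouteName = "ragHealth" | "ragQuery" | "llmChat" | "llmTranslate" | "llmAsr";

interface Route {
  method: "GET" | "POST";
  path: string;
  name: RouteName;
}

const ROUTES: readonly Route[] = [
  { method: "GET", path: "/rag/health", name: "ragHealth" },
  { method: "POST", path: "/rag/query", name: "ragQuery" },
  { method: "POST", path: "/llm/chat", name: "llmChat" },
  { method: "POST", path: "/llm/translate", name: "llmTranslate" },
  { method: "POST", path: "/llm/asr", name: "llmAsr" },
];

// Resolve a method + URL to a route name.
// Returns null for unknown paths (404) and "method_not_allowed" (405)
// when the path exists but is registered under a different method.
function matchRoute(method: string, url: string): RouteName | "method_not_allowed" | null {
  const { pathname } = new URL(url);
  const candidates = ROUTES.filter((r) => r.path === pathname);
  if (candidates.length === 0) return null;
  const hit = candidates.find((r) => r.method === method.toUpperCase());
  return hit ? hit.name : "method_not_allowed";
}

// Inside a module Worker's fetch() handler this could be used as:
//   const route = matchRoute(request.method, request.url);
//   if (route === null) return new Response("Not Found", { status: 404 });
```

Keeping the routes in a single table makes it easy to check that endpoint changes stay additive, in line with the change-management guidelines.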
## High-Level Architecture
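The Cloudflare Worker is the routing and processing layer between the core API and Cloudflare-native AI resources (Workers AI models and the Vectorize database). Requests for third-party models such as OpenAI or Anthropic bypass the Worker and go from the API through AI Gateway, which caches and logs them.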
```mermaid
graph TD
    subgraph "Labeeb Platform"
        Frontend("Frontend"):::platform
        API("API Service"):::platform
    end
    subgraph "Cloudflare Infrastructure"
        AIGateway("AI Gateway"):::cloudflare
        Worker("Serverless Worker"):::cloudflare
    end
    subgraph "AI Models & Data"
        subgraph "Third-Party (via Gateway)"
            OAI("OpenAI / Anthropic"):::models
        end
        subgraph "Cloudflare Native"
            VDB["Vectorize Database"]:::models
            WAI("Workers AI Models"):::models
        end
    end
    Frontend --> API
    API -- "Summaries, Classifications" --> AIGateway
    AIGateway -- "Cached & Logged Requests" --> OAI
    API -- "RAG, Translate, ASR" --> Worker
    Worker -- "Vector Search" --> VDB
    Worker -- "Embed, Rerank, LLM" --> WAI
```
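The diagram encodes a simple routing rule on the API side: summaries and classifications go to AI Gateway, while RAG, translation, and ASR go to the Worker. The real clients (`AiGatewayClient`, `EdgeLlmClient`) live in `api/app/Services/`; the sketch below is written in TypeScript only to make the rule concrete. The task names and helper functions are hypothetical. The gateway URL follows AI Gateway's provider-specific OpenAI endpoint format; the account and gateway IDs are placeholders you would take from the Cloudflare Dashboard.

```typescript
// Hypothetical task identifiers; the API may name these differently.
type Task = "summarize" | "classify" | "rag" | "translate" | "asr";
type Backend = "ai-gateway" | "edge-worker";

// Routing rule read off the architecture diagram:
// summaries/classifications -> AI Gateway; RAG/translate/ASR -> Worker.
const TASK_BACKEND: Record<Task, Backend> = {
  summarize: "ai-gateway",
  classify: "ai-gateway",
  rag: "edge-worker",
  translate: "edge-worker",
  asr: "edge-worker",
};

function backendFor(task: Task): Backend {
  return TASK_BACKEND[task];
}

// Build an AI Gateway base URL for the OpenAI provider, usable as the
// baseURL of an OpenAI-compatible client.
// Format: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai
function aiGatewayOpenAiBaseUrl(accountId: string, gatewayId: string): string {
  const account = encodeURIComponent(accountId);
  const gateway = encodeURIComponent(gatewayId);
  return `https://gateway.ai.cloudflare.com/v1/${account}/${gateway}/openai`;
}
```

Using a `Record<Task, Backend>` means the compiler flags any new task that has not been assigned a backend.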
## Model Stack (Workers AI)
| Task | Model Name | Notes |
|---|---|---|
| Embeddings | `@cf/baai/bge-m3` | 1024 dimensions |
| Reranking | `@cf/baai/bge-reranker-base` | Improves search relevance |
| General LLM | `@cf/meta/llama-3.1-8b-instruct-fp8-fast` | For chat and generation |
| Translation | `@cf/meta/m2m100-1.2b` | Multilingual translation |
| ASR | `@cf/openai/whisper-large-v3-turbo` | Audio-to-text transcription |
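The table pins one model per task. One concrete consequence is that the Vectorize index must be created with the same dimension count that `@cf/baai/bge-m3` emits (1024). The sketch below also illustrates the rerank step of what is presumably a two-stage retrieval: Vectorize returns the top-K candidates, and the reranker reorders them before the top N are passed to the LLM. The `Candidate` and `RerankScore` shapes and the `selectContext` and `assertEmbeddingDims` helpers are hypothetical and simplified; they do not reproduce the actual Workers AI or Vectorize response formats.

```typescript
// Model IDs from the Model Stack table.
const MODELS = {
  embedding: "@cf/baai/bge-m3",
  reranker: "@cf/baai/bge-reranker-base",
  llm: "@cf/meta/llama-3.1-8b-instruct-fp8-fast",
  translation: "@cf/meta/m2m100-1.2b",
  asr: "@cf/openai/whisper-large-v3-turbo",
} as const;

// bge-m3 emits 1024-dimensional vectors; the Vectorize index must match.
const EMBEDDING_DIMS = 1024;

// Reject vectors whose length does not match the index before upserting.
function assertEmbeddingDims(vector: readonly number[]): void {
  if (vector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dims for ${MODELS.embedding}, got ${vector.length}`);
  }
}

// Simplified shapes (assumptions): a vector-search hit, and a reranker
// score that refers to a hit by its position in the candidate list.
interface Candidate { id: string; text: string; vectorScore: number; }
interface RerankScore { index: number; score: number; }

// Given the top-K hits already returned by vector search and one reranker
// score per hit, order the hits by reranker score and keep the best N.
function selectContext(
  candidates: readonly Candidate[],
  scores: readonly RerankScore[],
  topN: number,
): Candidate[] {
  return [...scores]
    .filter((s) => s.index >= 0 && s.index < candidates.length)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => candidates[s.index]);
}
```

Note that the final order comes from the reranker, not from the original vector similarity, which is why reranking can improve relevance even when it only reshuffles the same K candidates.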
## Change Management
To ensure stability, changes to the worker and its contracts follow these guidelines:
- **Additive Changes:** All configuration changes in `wrangler.jsonc` should be additive. Avoid removing or renaming existing variables or bindings to maintain backward compatibility.
- **Stable Schemas:** The request and response schemas of the `/rag/query` endpoint are considered stable. Add new fields if necessary, but do not remove or alter existing ones.
- **Feature Flags:** Use environment variables as feature flags and tuning parameters to control new or experimental functionality.
    - API-side: `CF_AI_GATEWAY_ENABLED`
    - Worker-side: `CF_RAG_TOP_K`, `CF_RERANK_TOP_N`
- **Issue Tracking:** Use the `infra:cloudflare` label in GitHub issues and pull requests for any changes related to this service.
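On the Worker side, `CF_RAG_TOP_K` and `CF_RERANK_TOP_N` are read from the environment. A minimal sketch of parsing them defensively is shown below, so that a missing or malformed variable falls back to a default instead of breaking requests, which keeps configuration changes additive. The default values, the `RagEnv`/`RagConfig` types, and the `readRagConfig` helper are hypothetical. The variables are typed as `string | number` because Wrangler `vars` may hold either.

```typescript
// Hypothetical defaults; the real values belong in wrangler.jsonc `vars`.
const DEFAULT_RAG_TOP_K = 20;
const DEFAULT_RERANK_TOP_N = 5;

interface RagEnv {
  CF_RAG_TOP_K?: string | number;
  CF_RERANK_TOP_N?: string | number;
}

interface RagConfig {
  topK: number;
  rerankTopN: number;
}

// Parse a positive integer, falling back when the value is absent or invalid.
// Number("") is 0 and Number("abc") is NaN; both are rejected below.
function positiveInt(raw: string | number | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const n = typeof raw === "number" ? raw : Number(raw.trim());
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function readRagConfig(env: RagEnv): RagConfig {
  const topK = positiveInt(env.CF_RAG_TOP_K, DEFAULT_RAG_TOP_K);
  // The reranker cannot return more passages than vector search retrieved.
  const rerankTopN = Math.min(positiveInt(env.CF_RERANK_TOP_N, DEFAULT_RERANK_TOP_N), topK);
  return { topK, rerankTopN };
}
```

Clamping `rerankTopN` to `topK` means the two flags can be changed independently without producing an inconsistent configuration.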