Cloudflare RAG Service: Overview

Mission

This service provides a unified, high-performance layer for Retrieval-Augmented Generation (RAG), vector search, and on-demand LLM tasks (translation, ASR). It leverages Cloudflare's serverless infrastructure (Workers, Vectorize, AI) to offer a scalable and cost-effective solution that is tightly integrated with the Labeeb API.


Quick Reference

| Component | Path / Location | Key Bindings / Endpoints |
|---|---|---|
| Worker Code | cloudflare/ | Entrypoint: src/index.ts |
| RAG & LLM | Cloudflare Worker | /rag/health, /rag/query, /llm/chat, /llm/translate, /llm/asr |
| AI Bindings | wrangler.jsonc | AI (Workers AI), VECTORIZE (Vectorize DB) |
| AI Gateway | Cloudflare Dashboard | Provides an OpenAI-compatible base URL |
| API Clients | api/app/Services/ | AiGatewayClient (for AI Gateway), EdgeLlmClient (for this worker) |
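
For orientation, the sketch below shows one way src/index.ts could wire these endpoints to the bindings. It is a minimal illustration assuming the @cloudflare/workers-types ambient types: the binding names (AI, VECTORIZE) and routes come from this page, while the handler bodies and response shapes are placeholders rather than the actual contract.

```ts
// Hypothetical routing skeleton for src/index.ts.
// Binding names (AI, VECTORIZE) follow wrangler.jsonc; handler bodies are placeholders.
export interface Env {
  AI: Ai;                    // Workers AI binding
  VECTORIZE: VectorizeIndex; // Vectorize database binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);
    switch (pathname) {
      case "/rag/health":
        return Response.json({ status: "ok" });
      case "/rag/query":
        // Embed -> vector search -> rerank -> generate (see the Model Stack sketch below).
        return Response.json({ answer: null, sources: [] });
      case "/llm/chat":
      case "/llm/translate":
      case "/llm/asr":
        // Each task maps to a single Workers AI model (see the Model Stack table).
        return Response.json({ result: null });
      default:
        return new Response("Not found", { status: 404 });
    }
  },
} satisfies ExportedHandler<Env>;
```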

High-Level Architecture

The Cloudflare worker acts as an intelligent routing and processing layer between the core API and various AI models.

graph TD
    subgraph "Labeeb Platform"
        Frontend("Frontend"):::platform
        API("API Service"):::platform
    end

    subgraph "Cloudflare Infrastructure"
        AIGateway("AI Gateway"):::cloudflare
        Worker("Serverless Worker"):::cloudflare
    end

    subgraph "AI Models & Data"
        subgraph "Third-Party (via Gateway)"
            OAI("OpenAI / Anthropic"):::models
        end
        subgraph "Cloudflare Native"
            VDB["Vectorize Database"]:::models
            WAI("Workers AI Models"):::models
        end
    end

    Frontend --> API
    API -- "Summaries, Classifications" --> AIGateway
    AIGateway -- "Cached & Logged Requests" --> OAI

    API -- "RAG, Translate, ASR" --> Worker
    Worker -- "Vector Search" --> VDB
    Worker -- "Embed, Rerank, LLM" --> WAI

Model Stack (Workers AI)

| Task | Model Name | Notes |
|---|---|---|
| Embeddings | @cf/baai/bge-m3 | 1024 dimensions |
| Reranking | @cf/baai/bge-reranker-base | Improves search relevance |
| General LLM | @cf/meta/llama-3.1-8b-instruct-fp8-fast | For chat and generation |
| Translation | @cf/meta/m2m100-1.2b | Multilingual translation |
| ASR | @cf/openai/whisper-large-v3-turbo | Audio-to-text transcription |
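
As a rough sketch of how these models fit together inside the /rag/query handler: embed the query with bge-m3, search Vectorize, rerank the candidates with bge-reranker-base, then answer with the Llama model. Only the model IDs and binding names come from this page; the handler name, request and response fields, the metadata text key, and the defaults are assumptions.

```ts
// Hypothetical /rag/query flow combining the models listed above.
interface RagEnv {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
  CF_RAG_TOP_K?: string;
  CF_RERANK_TOP_N?: string;
}

async function handleRagQuery(request: Request, env: RagEnv): Promise<Response> {
  const { query } = (await request.json()) as { query: string };
  const topK = Number(env.CF_RAG_TOP_K ?? "10"); // assumed default
  const topN = Number(env.CF_RERANK_TOP_N ?? "5"); // assumed default

  // 1. Embed the query (bge-m3 returns 1024-dimensional vectors).
  const embedding = (await env.AI.run("@cf/baai/bge-m3", { text: [query] })) as { data: number[][] };

  // 2. Vector search in Vectorize (metadata is assumed to carry a `text` field).
  const search = await env.VECTORIZE.query(embedding.data[0], { topK, returnMetadata: true });
  const contexts = search.matches.map((m) => ({ text: String(m.metadata?.text ?? "") }));

  // 3. Rerank the candidates to improve relevance.
  const reranked = (await env.AI.run("@cf/baai/bge-reranker-base", { query, contexts })) as {
    response: { id: number; score: number }[];
  };
  const best = [...reranked.response]
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((r) => contexts[r.id].text);

  // 4. Generate the answer grounded in the retrieved context.
  const answer = (await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8-fast", {
    messages: [
      { role: "system", content: `Answer using only this context:\n${best.join("\n---\n")}` },
      { role: "user", content: query },
    ],
  })) as { response: string };

  return Response.json({ answer: answer.response, sources: search.matches.map((m) => m.id) });
}
```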

Change Management

To ensure stability, changes to the worker and its contracts follow these guidelines:

  • Additive Changes: All configuration changes in wrangler.jsonc should be additive. Avoid removing or renaming existing variables or bindings to maintain backward compatibility.
  • Stable Schemas: The /rag/query endpoint's request and response schemas are considered stable. Add new fields if necessary, but do not remove or alter existing ones.
  • Feature Flags: Use environment variables as feature flags and tunables to control new or experimental functionality (see the sketch after this list).
    • API-side: CF_AI_GATEWAY_ENABLED
    • Worker-side: CF_RAG_TOP_K, CF_RERANK_TOP_N
  • Issue Tracking: Use the infra:cloudflare label in GitHub issues and pull requests for any changes related to this service.
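
A minimal sketch of how these variables might be read on the worker side, assuming string-typed environment values; the parsing and default values are illustrative, and only the variable names come from this page.

```ts
// Illustrative flag/tunable parsing; only the variable names come from this page.
interface FlagEnv {
  CF_AI_GATEWAY_ENABLED?: string; // API-side: route summaries through the AI Gateway when "true"
  CF_RAG_TOP_K?: string;          // Worker-side: candidates fetched from Vectorize
  CF_RERANK_TOP_N?: string;       // Worker-side: candidates kept after reranking
}

function ragTuning(env: FlagEnv) {
  return {
    gatewayEnabled: (env.CF_AI_GATEWAY_ENABLED ?? "false").toLowerCase() === "true",
    topK: Number(env.CF_RAG_TOP_K ?? "10"), // assumed default
    topN: Number(env.CF_RERANK_TOP_N ?? "5"), // assumed default
  };
}
```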