Search Service: The Retrieval Playbook

Service Status: Operational

This document is the primary operational manual for the Search Service, powered by OpenSearch. As the retrieval backbone of the Labeeb platform, its performance, data integrity, and availability are paramount. This playbook provides comprehensive, actionable guidance for on-call engineers to deploy, monitor, and troubleshoot the search cluster.

1. Mission & Scope

The Search Service's mission is to provide a fast, scalable, and highly relevant search experience for all data on the Labeeb platform.

It is designed to be a resilient data store, optimized for the complex queries required by the AI-Box. It is built to combine traditional full-text search (BM25) with vector-based semantic search (k-NN); as the "Which to use when" section notes, production queries currently run BM25 only while embedding coverage is built up.

Scope of Responsibilities

Is Responsible For:
- Data Indexing: Storing and indexing all article data sent by the API service.
- Hybrid Search Execution: Running complex search queries that combine multiple retrieval techniques.
- Infrastructure as Code: Managing all cluster settings, index templates, and search pipelines as version-controlled JSON files.
- Data Lifecycle Management: Automatically managing the lifecycle of indices through Index State Management (ISM) policies.
Is NOT Responsible For:
- Data Persistence (System of Record): The PostgreSQL database is the ultimate source of truth. The search index can always be rebuilt from the database if necessary.
- Data Normalization: It expects to receive cleaned and structured data from the API service.
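Because the index can be rebuilt from PostgreSQL, a rebuild largely comes down to streaming rows into the OpenSearch `_bulk` endpoint. The following is a minimal sketch of the serialization step only, not the platform's actual rebuild tooling. The `id` column, the sample index name, and the function name `to_bulk_ndjson` are assumptions made for illustration.

```python
import json
from typing import Any, Iterable, Mapping


def to_bulk_ndjson(index: str, rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize rows into the newline-delimited body that the _bulk API expects."""
    lines = []
    for row in rows:
        # Action line: reuse the database id as the document id, so that
        # re-running the rebuild overwrites documents instead of duplicating them.
        # The "id" key is a hypothetical column name.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself; keep Arabic text readable.
        lines.append(json.dumps(dict(row), ensure_ascii=False))
    # The _bulk body must end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    # Hypothetical index name and rows, for illustration only.
    print(to_bulk_ndjson("articles_rebuild", [{"id": 1, "title": "Example"}]), end="")
```

Using the database id as `_id` is what makes the rebuild safe to repeat: a second run replaces each document rather than adding a copy.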

Component Diagram

flowchart LR
  API-->OS[(OpenSearch)]
  API-->AI[AI-Box]
  subgraph OpenSearch
    Q[Query Pipeline]-->C[Collectors BM25,kNN]
    C-->RRF[RRF Combiner]
  end

Flows

- Query: API → OpenSearch pipeline (BM25 + kNN) → optional rerank via AI-Box.
- Ingest: API (from Scraper) → OpenSearch (index templates, analyzers).
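The RRF Combiner in the diagram refers to reciprocal rank fusion, which merges ranked lists by summing `1 / (k + rank)` for each document across the lists. The sketch below shows the arithmetic in isolation. It is not the OpenSearch pipeline implementation, and the constant `k = 60` is the value commonly used in the literature, assumed here rather than taken from the Labeeb configuration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]  # hypothetical BM25 ranking
    knn_hits = ["b", "d", "a"]   # hypothetical kNN ranking
    print(rrf_fuse([bm25_hits, knn_hits]))
```

Because RRF uses only ranks, it does not need BM25 and vector scores to be on the same scale, which is the usual reason for choosing it over score averaging.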

2. Service Responsibilities & Interactions

This table defines the Search Service's role and its critical dependencies within the Labeeb platform ecosystem.

| Service | Tech Stack | Core Responsibility | Inputs | Outputs | Depends On |
| --- | --- | --- | --- | --- | --- |
| Search | OpenSearch | Provides fast, scalable hybrid search capabilities. | Indexed `Article` JSON | Search results (JSON) | None (base service) |

Embeddings vs. NER: what each one does

| Aspect | Embeddings (AI-Box) | NER (Local LLM / S2) |
| --- | --- | --- |
| Purpose | Find semantically similar docs, even if phrasing differs | Extract entities (PERSON/ORG/LOC/DATE…) from text |
| Input | Query text and/or document text | Document text (or a sentence) |
| Output | A vector (e.g., 768 floats) or ranked results via `/retrieve` | A list of entities with types, spans, normalized names |
| Used by | Search ranking (kNN; Hybrid BM25+kNN+RRF) | Metadata, filters, entity pages, KG/graph, analytics |
| Where stored | `embedding` field in OpenSearch docs | DB tables (`entities`, relationships), optionally indexed as text keywords later |
| If missing | You still have BM25; hybrid is weaker/unavailable | You still have search; you lose facets/links/analytics richness |
| Today in Labeeb | We write vectors with `indexWithEmbedding(...)`; search is BM25 | We run NER to enrich articles; not used for ranking yet |

Which to use when

For search relevance:
- Today: BM25 only (stable).
- Quietly build for tomorrow: keep generating embeddings and writing them to OpenSearch (`indexWithEmbedding`). When vector coverage is high and latency is good, flip to Hybrid with one switch.

For product features & analytics:
- Use NER to power UI facets (“People”, “Locations”), entity pages, cross-article linking, conflict detection, and later trust indicators. NER isn’t a ranking signal at our current stage.
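The "one switch" between BM25 and Hybrid can be pictured as a flag that changes only the query body. The sketch below is a minimal illustration under stated assumptions, not the service's actual query builder. The `content` field and the function name are hypothetical, and `embedding` is the vector field named in the table above. To the best of our understanding, the OpenSearch `hybrid` query also requires the request to run through a search pipeline that normalizes or fuses the sub-query scores.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    text: str,
    vector: Optional[List[float]] = None,
    hybrid: bool = False,
    size: int = 10,
) -> Dict[str, Any]:
    """Build a search request body: BM25 by default, hybrid when switched on."""
    # "content" is a hypothetical text field name.
    bm25 = {"match": {"content": {"query": text}}}
    # Fall back to BM25 when the switch is off or no query vector is available.
    if not hybrid or vector is None:
        return {"size": size, "query": bm25}
    # k-NN clause against the `embedding` field written at index time.
    knn = {"knn": {"embedding": {"vector": vector, "k": size}}}
    return {"size": size, "query": {"hybrid": {"queries": [bm25, knn]}}}


if __name__ == "__main__":
    print(build_search_body("flood warning"))
    print(build_search_body("flood warning", [0.1, 0.2, 0.3], hybrid=True))
```

Falling back to BM25 when no vector is supplied keeps search available even if the embedding call fails, which matches the "If missing" row above.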

Overview

Retrieval and ranking for Labeeb: hybrid BM25 + kNN + optional reranker, with Arabic/English analyzers and versioned indices.

Positioning & Responsibilities

- Provide low-latency retrieval with consistent relevance.
- Offer hybrid search (BM25 + vector) with optional AI-Box rerank.
- Maintain versioned indices with read/write aliases and safe cutover.
- Expose observability to detect hot shards, mapping drift, and slow queries.
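A safe cutover between versioned indices is usually done by moving the read and write aliases in a single `_aliases` request, which OpenSearch applies atomically. The sketch below builds only the request body. The index and alias names are hypothetical, and the real cutover procedure may include extra steps such as verifying document counts first.

```python
from typing import Any, Dict


def alias_cutover_actions(
    old_index: str, new_index: str, read_alias: str, write_alias: str
) -> Dict[str, Any]:
    """Build the body for POST /_aliases that moves both aliases in one call."""
    return {
        "actions": [
            # Readers switch from the old index to the new one.
            {"remove": {"index": old_index, "alias": read_alias}},
            {"add": {"index": new_index, "alias": read_alias}},
            # Writers follow; the new index becomes the write target.
            {"remove": {"index": old_index, "alias": write_alias}},
            {"add": {"index": new_index, "alias": write_alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(alias_cutover_actions("articles_v1", "articles_v2", "articles_read", "articles_write"))
```

Sending all four actions in one request means clients never observe a moment where an alias points at no index.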

Cross-links

Pipelines: services/search/pipelines.md
Indices & ISM: services/search/indices.md
Troubleshooting: services/search/troubleshooting.md

SLOs

- p95 query latency ≤ 800 ms (OpenSearch only), ≤ 1200 ms (with rerank).
- Error rate < 1% over a 10-minute window.
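To make the SLO thresholds concrete, the sketch below evaluates one window of samples against them. It assumes a nearest-rank p95 and assumes that the caller has already collected latencies and error counts for a single 10-minute window. The function names are hypothetical, and real alerting would normally live in the monitoring stack rather than in ad hoc code.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slo_met(
    latencies_ms: Sequence[float], errors: int, total: int, with_rerank: bool = False
) -> bool:
    """Check one window against the latency and error-rate SLOs."""
    budget_ms = 1200 if with_rerank else 800
    error_rate = errors / total if total else 0.0
    return p95(latencies_ms) <= budget_ms and error_rate < 0.01


if __name__ == "__main__":
    window = [120.0] * 95 + [950.0] * 5  # hypothetical samples in ms
    print(p95(window), slo_met(window, errors=0, total=100))
```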

3. Guiding Principles

The architecture and operational philosophy of the Search Service are deeply rooted in these core SRE principles:

Infrastructure as Code:
- What: All cluster configurations—index templates, component templates, pipelines, and ISM policies—are defined as version-controlled JSON files.
- Why: This ensures that the cluster configuration is reproducible, auditable, and can be safely managed through Git workflows.
- How: A set of shell scripts (tools/search/) is used to apply this configuration to the cluster idempotently.
Data is Rebuildable, Not Sacred:
- What: The OpenSearch index is treated as a disposable, high-performance cache for the data stored in the PostgreSQL database.
- Why: This operational mindset simplifies disaster recovery. In a catastrophic failure, the entire search index can be deleted and rebuilt from the source of truth without data loss.
Automated Lifecycle Management:
- What: Index State Management (ISM) policies are used to automate routine tasks like index rollover, snapshotting, and deletion.
- Why: This reduces manual operator toil and ensures that the cluster runs efficiently and within its resource limits.
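The idempotent apply step under Infrastructure as Code can be sketched as a planner that maps each JSON file to a `PUT` against the matching endpoint, since `PUT` creates or replaces templates and pipelines. The actual tooling is a set of shell scripts; the Python below only illustrates the idea. The directory names (`component_templates/`, `index_templates/`, `search_pipelines/`) are assumptions about the repository layout. ISM policies are left out of this sketch because, as far as we know, updating an existing policy also requires sequence-number parameters.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical directory name -> OpenSearch endpoint. Dict order is the apply
# order: component templates go first because index templates compose them.
ENDPOINTS = {
    "component_templates": "_component_template",
    "index_templates": "_index_template",
    "search_pipelines": "_search/pipeline",
}


def plan_requests(files: List[str]) -> List[Tuple[str, str]]:
    """Map config file paths to (method, URL path) pairs in apply order."""
    plan = []
    for kind, endpoint in ENDPOINTS.items():
        for name in sorted(files):
            path = PurePosixPath(name)
            # Only JSON files directly inside a known directory are applied.
            if path.parent.name == kind and path.suffix == ".json":
                plan.append(("PUT", f"/{endpoint}/{path.stem}"))
    return plan


if __name__ == "__main__":
    for method, url in plan_requests(
        ["index_templates/articles.json", "component_templates/analyzers.json"]
    ):
        print(method, url)
```

Running the plan twice sends the same requests and leaves the cluster in the same state, which is the property the principle asks for.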

4. Architecture at a Glance

The Search Service is a self-contained OpenSearch cluster that serves search requests from the AI-Box and receives indexing requests from the API.

flowchart TD
    subgraph "Labeeb Platform"
        API[(API Service)]:::svc
        AIB[(AI-Box)]:::svc
    end

    subgraph "Search Service"
        direction LR
        OS[OpenSearch Cluster]:::store
        T[Index Templates]
        P[Search Pipelines]
    end

    API -- "Index Documents" --> OS
    AIB -- "Execute Search Queries" --> OS
    T -- "Define Structure" --> OS
    P -- "Process Queries" --> OS

    classDef svc fill:#f8fafc,stroke:#64748b,stroke-width:1px;
    classDef store fill:#f0fdf4,stroke:#22c55e,stroke-width:1px;

5. Key Operational Playbooks

This overview is the entry point. For detailed operational procedures, use the following guides:

- Infrastructure as Code (IaC): A detailed breakdown of the templates and pipelines that define the cluster.
- Smoke Tests: The primary tool for verifying the health and correctness of the search configuration.
- Troubleshooting Guide: Step-by-step playbooks for common incident response scenarios.