Search Service: The Retrieval Playbook
Service Status: Operational
This document is the primary operational manual for the Search Service, powered by OpenSearch. As the retrieval backbone of the Labeeb platform, its performance, data integrity, and availability are paramount. This playbook provides comprehensive, actionable guidance for on-call engineers to deploy, monitor, and troubleshoot the search cluster.
1. Mission & Scope
The Search Service's mission is to provide a fast, scalable, and highly relevant search experience for all data on the Labeeb platform.
It is designed to be a robust and resilient data store, optimized for the complex queries required by the AI-Box. It combines traditional full-text search (BM25) with modern vector-based semantic search (k-NN) to deliver state-of-the-art results.
Scope of Responsibilities
- Is Responsible For:
    - Data Indexing: Storing and indexing all article data sent by the API service.
    - Hybrid Search Execution: Running complex search queries that combine multiple retrieval techniques.
    - Infrastructure as Code: Managing all cluster settings, index templates, and search pipelines as version-controlled JSON files.
    - Data Lifecycle Management: Automatically managing the lifecycle of indices through Index State Management (ISM) policies.
- Is NOT Responsible For:
    - Data Persistence (System of Record): The PostgreSQL database is the ultimate source of truth. The search index can always be rebuilt from the database if necessary.
    - Data Normalization: It expects to receive cleaned and structured data from the API service.
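Because the index can be rebuilt from PostgreSQL, a rebuild largely comes down to streaming rows into the OpenSearch `_bulk` API. The sketch below only builds the NDJSON request body. The function name, the `id` column, and the index name are assumptions made for illustration, not the platform's actual rebuild tooling.

```python
import json
from typing import Any, Iterable, Mapping


def build_bulk_payload(rows: Iterable[Mapping[str, Any]], index: str) -> str:
    """Serialize database rows into an OpenSearch _bulk request body (NDJSON)."""
    lines = []
    for row in rows:
        # Action line: reuse the database primary key as the document _id so a
        # repeated rebuild overwrites documents instead of duplicating them.
        # The "id" column name is an assumption for this sketch.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself, without the key already used as _id.
        doc = {key: value for key, value in row.items() if key != "id"}
        lines.append(json.dumps(doc, ensure_ascii=False))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample_rows = [{"id": 1, "title": "خبر"}, {"id": 2, "title": "News"}]
    print(build_bulk_payload(sample_rows, "articles-v1"), end="")
```

Using the primary key as `_id` is what makes the rebuild safe to re-run: the same row always lands on the same document.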
Component Diagram
flowchart LR
API-->OS[(OpenSearch)]
API-->AI[AI-Box]
subgraph OpenSearch
Q[Query Pipeline]-->C[Collectors BM25,kNN]
C-->RRF[RRF Combiner]
end
Flows
- Query: API → OpenSearch pipeline (BM25 + kNN) → optional rerank via AI-Box.
- Ingest: API (from Scraper) → OpenSearch (index templates, analyzers).
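The RRF Combiner in the diagram merges the BM25 and kNN result lists by rank rather than by raw score. A minimal sketch of Reciprocal Rank Fusion follows, assuming each collector returns an ordered list of document IDs. The function name is hypothetical, and `k = 60` is the constant commonly used in the RRF literature, not a value confirmed for this cluster.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; each list contributes 1 / (k + rank) per document.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]
    knn_hits = ["b", "d", "a"]
    print(rrf_fuse([bm25_hits, knn_hits]))
```

Documents that appear near the top of both lists win, which is why RRF needs no score normalization between BM25 and vector similarity.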
2. Service Responsibilities & Interactions
This table defines the Search Service's role and its critical dependencies within the Labeeb platform ecosystem.
| Service | Tech Stack | Core Responsibility | Inputs | Outputs | Depends On |
|---|---|---|---|---|---|
| Search | OpenSearch | Provides fast, scalable hybrid search capabilities. | Indexed Article JSON | Search results (JSON) | None (Base service) |
What each one does
| Aspect | Embeddings (AI-Box) | NER (Local LLM / S2) |
|---|---|---|
| Purpose | Find semantically similar docs, even if phrasing differs | Extract entities (PERSON/ORG/LOC/DATE…) from text |
| Input | Query text and/or document text | Document text (or a sentence) |
| Output | A vector (e.g., 768 floats) or ranked results via /retrieve | A list of entities with types, spans, normalized names |
| Used by | Search ranking (kNN; Hybrid BM25+kNN+RRF) | Metadata, filters, entity pages, KG/graph, analytics |
| Where stored | embedding field in OpenSearch docs | DB tables (entities, relationships), optionally indexed as text keywords later |
| If missing | You still have BM25; hybrid is weaker/unavailable | You still have search; you lose facets/links/analytics richness |
| Today in Labeeb | We write vectors with indexWithEmbedding(...); search is BM25 | We run NER to enrich articles; not used for ranking yet |
Which to use when
- For search relevance:
    - Today: BM25 only (stable).
    - Quietly build for tomorrow: keep generating embeddings and writing them to OpenSearch (indexWithEmbedding). When vector coverage is high and latency is good, flip to Hybrid with one switch.
- For product features & analytics:
    - Use NER to power UI facets (“People”, “Locations”), entity pages, cross-article linking, conflict detection, and later trust indicators. NER isn’t a ranking signal in our current stage.
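The "one switch" can be pictured as a single flag on the query builder. The sketch below assumes a text field named `content` (hypothetical) and the `embedding` vector field from the table above; `build_search_body` is an illustrative name, not the service's real code. OpenSearch's `hybrid` query also needs a search pipeline that combines the sub-query scores, which is not shown here.

```python
from typing import Any, Dict, Optional, Sequence


def build_search_body(
    text: str,
    vector: Optional[Sequence[float]] = None,
    hybrid: bool = False,
    size: int = 10,
) -> Dict[str, Any]:
    """Build a BM25-only or hybrid (BM25 + kNN) search request body."""
    # "content" is an assumed field name for the article text.
    bm25 = {"match": {"content": {"query": text}}}
    if not hybrid or vector is None:
        # Today's behaviour: plain BM25. Also the fallback when no query
        # embedding is available, so the switch degrades gracefully.
        return {"size": size, "query": bm25}
    knn = {"knn": {"embedding": {"vector": list(vector), "k": size}}}
    return {"size": size, "query": {"hybrid": {"queries": [bm25, knn]}}}


if __name__ == "__main__":
    print(build_search_body("election results"))
    print(build_search_body("election results", vector=[0.1, 0.2], hybrid=True))
```

Falling back to BM25 when the vector is missing matches the "If missing" row above: search keeps working, only the hybrid boost is lost.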
Overview
Retrieval and ranking for Labeeb: hybrid BM25 + kNN + optional reranker, with Arabic/English analyzers and versioned indices.
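To make "Arabic/English analyzers and versioned indices" concrete, here is a sketch of a versioned index template body, written as a Python dict so it can be serialized to one of the JSON files. The field names `title_ar` and `title_en`, the `articles-v{n}-*` naming pattern, and the 768 dimension (taken from the "e.g., 768 floats" note above) are assumptions for illustration; the real templates live in the repository.

```python
import json
from typing import Any, Dict


def article_index_template(version: int, dimension: int = 768) -> Dict[str, Any]:
    """Return a composable index template body for one index version."""
    return {
        # Hypothetical naming scheme: one pattern per index version.
        "index_patterns": [f"articles-v{version}-*"],
        "template": {
            # k-NN must be enabled on the index for knn_vector fields to be searchable.
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {
                    # Built-in language analyzers; field names are assumed.
                    "title_ar": {"type": "text", "analyzer": "arabic"},
                    "title_en": {"type": "text", "analyzer": "english"},
                    "embedding": {"type": "knn_vector", "dimension": dimension},
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(article_index_template(version=2), indent=2))
```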
Positioning & Responsibilities
- Provide low-latency retrieval with consistent relevance.
- Offer hybrid search (BM25 + vector) with optional AI-Box rerank.
- Maintain versioned indices with read/write aliases and safe cutover.
- Expose observability to detect hot shards, mapping drift, and slow queries.
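A safe cutover between index versions is typically a single `_aliases` request, because all actions in one request are applied atomically. The sketch below builds that request body; the alias names `articles-read` and `articles-write` and the index names are hypothetical placeholders.

```python
from typing import Any, Dict


def cutover_actions(
    old_index: str,
    new_index: str,
    read_alias: str = "articles-read",
    write_alias: str = "articles-write",
) -> Dict[str, Any]:
    """Build an _aliases request body that moves both aliases in one atomic step."""
    return {
        "actions": [
            # Detach the old index from both aliases...
            {"remove": {"index": old_index, "alias": read_alias}},
            {"remove": {"index": old_index, "alias": write_alias}},
            # ...and attach the new one in the same request, so clients never
            # observe an alias that points at no index.
            {"add": {"index": new_index, "alias": read_alias}},
            {"add": {"index": new_index, "alias": write_alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    print(cutover_actions("articles-v1-000001", "articles-v2-000001"))
```

Rolling back is the same call with the two index arguments swapped, provided the old index has not been deleted yet.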
Cross-links
- Pipelines: services/search/pipelines.md
- Indices & ISM: services/search/indices.md
- Troubleshooting: services/search/troubleshooting.md
SLOs
- p95 query latency ≤ 800 ms (OpenSearch only), ≤ 1200 ms (with rerank).
- Error rate < 1% over a 10-minute window.
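The two SLOs above can be checked mechanically against a window of request data. This is a sketch only: it assumes latencies are collected in milliseconds over the 10-minute window, uses the nearest-rank method for p95 (one of several percentile definitions), and the function names are hypothetical.

```python
from typing import List, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slo_breaches(
    latencies_ms: Sequence[float],
    errors: int,
    total: int,
    with_rerank: bool = False,
) -> List[str]:
    """List which SLOs are violated for one observation window."""
    limit_ms = 1200 if with_rerank else 800
    breaches = []
    if p95(latencies_ms) > limit_ms:
        breaches.append("p95_latency")
    # The SLO is "error rate < 1%", so exactly 1% already counts as a breach.
    if total > 0 and errors * 100 >= total:
        breaches.append("error_rate")
    return breaches


if __name__ == "__main__":
    window = [10 * i for i in range(1, 101)]  # 10 ms .. 1000 ms
    print(slo_breaches(window, errors=5, total=1000))
```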
3. Guiding Principles
The architecture and operational philosophy of the Search Service are rooted in these core SRE principles:
- Infrastructure as Code:
    - What: All cluster configurations (index templates, component templates, pipelines, and ISM policies) are defined as version-controlled JSON files.
    - Why: This ensures that the cluster configuration is reproducible, auditable, and can be safely managed through Git workflows.
    - How: A set of shell scripts (tools/search/) is used to apply this configuration to the cluster idempotently.
- Data is Rebuildable, Not Sacred:
    - What: The OpenSearch index is treated as a disposable, high-performance cache for the data stored in the PostgreSQL database.
    - Why: This operational mindset simplifies disaster recovery. In a catastrophic failure, the entire search index can be deleted and rebuilt from the source of truth without data loss.
- Automated Lifecycle Management:
    - What: Index State Management (ISM) policies are used to automate routine tasks like index rollover, snapshotting, and deletion.
    - Why: This reduces manual operator toil and ensures that the cluster runs efficiently and within its resource limits.
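As an illustration of the lifecycle principle, the sketch below builds a small ISM policy with a rollover state and a delete state, plus a sanity check that could run before the policy file is committed. The ages, state names, and index pattern are hypothetical values, not the cluster's real policy, and snapshotting is left out to keep the example short.

```python
from typing import Any, Dict, List


def rollover_delete_policy(
    index_pattern: str, rollover_age: str = "7d", delete_age: str = "30d"
) -> Dict[str, Any]:
    """Build an ISM policy body: roll over while hot, then delete."""
    return {
        "policy": {
            "description": "Roll over hot indices, delete old ones (illustrative values).",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


def undefined_transitions(policy: Dict[str, Any]) -> List[str]:
    """Return state names referenced by the policy that are never defined."""
    body = policy["policy"]
    defined = {state["name"] for state in body["states"]}
    referenced = [body["default_state"]]
    for state in body["states"]:
        referenced += [t["state_name"] for t in state["transitions"]]
    return [name for name in referenced if name not in defined]


if __name__ == "__main__":
    policy = rollover_delete_policy("articles-*")
    print(undefined_transitions(policy))
```

A check like `undefined_transitions` fits the Infrastructure as Code principle: a typo in a state name is caught in review instead of on the cluster.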
4. Architecture at a Glance
The Search Service is a self-contained OpenSearch cluster that serves search requests from the AI-Box and receives indexing requests from the API.
flowchart TD
subgraph "Labeeb Platform"
API[(API Service)]:::svc
AIB[(AI-Box)]:::svc
end
subgraph "Search Service"
direction LR
OS[OpenSearch Cluster]:::store
T[Index Templates]
P[Search Pipelines]
end
API -- "Index Documents" --> OS
AIB -- "Execute Search Queries" --> OS
T -- "Define Structure" --> OS
P -- "Process Queries" --> OS
classDef svc fill:#f8fafc,stroke:#64748b,stroke-width:1px;
classDef store fill:#f0fdf4,stroke:#22c55e,stroke-width:1px;
5. Key Operational Playbooks
This overview is the entry point. For detailed operational procedures, use the following guides:
- Infrastructure as Code (IaC): A detailed breakdown of the templates and pipelines that define the cluster.
- Smoke Tests: The primary tool for verifying the health and correctness of the search configuration.
- Troubleshooting Guide: Step-by-step playbooks for common incident response scenarios.