Search Service Runbook
Service Status: Operational
The Search Service, powered by OpenSearch, is the retrieval backbone of the Labeeb platform. It returns fast, relevant search results at scale by combining traditional keyword search with vector-based semantic search.
1. Core Responsibilities
- Data Indexing: Stores and indexes all processed articles, making them available for search.
- Hybrid Search: Executes hybrid search queries, combining BM25 (lexical) and k-NN (semantic) search results.
- Search Pipelines: Hosts and manages the OpenSearch pipelines used for query-time result fusion (e.g., Reciprocal Rank Fusion - RRF).
- Infrastructure as Code (IaC): All cluster configurations—index templates, component templates, and pipelines—are managed as version-controlled JSON files.
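To make the fusion step concrete, here is a minimal sketch of the standard Reciprocal Rank Fusion formula, where each document scores the sum of `1 / (k + rank)` over the result lists it appears in. The function name `rrf_fuse`, the input format, and the constant `k = 60` are assumptions for illustration; the actual pipeline settings live in the pipeline JSON and may differ.

```shell
# Reads lines of "doc_id rank" on stdin (rank is 1-based, one line per
# appearance of a document in a result list) and prints "doc_id score",
# highest fused score first. k is the RRF rank constant (assumed default 60).
rrf_fuse() {
  local k="${1:-60}"
  awk -v k="$k" '
    { score[$1] += 1 / (k + $2) }
    END { for (d in score) printf "%s %.5f\n", d, score[d] }
  ' | LC_ALL=C sort -k2,2nr
}

# Example: the first two lines stand for a BM25 list, the last two for a k-NN list.
# printf 'A 1\nB 2\nC 1\nA 2\n' | rrf_fuse 60
```

In this example document A appears in both lists, so it ranks above C and B, which each appear in only one.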
2. Key Operational Data
This section provides the essential commands and endpoints for operating the OpenSearch cluster.
| Operation | Command / Endpoint |
|---|---|
| Cluster Health | `curl 'http://localhost:9200/_cluster/health?pretty'` |
| List Indices | `curl 'http://localhost:9200/_cat/indices?v'` |
| View Index Mapping | `curl 'http://localhost:9200/news_docs/_mapping?pretty'` |
| Open Shell | `docker compose exec opensearch bash` |
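For scripted checks, the health endpoint can be wrapped in a small helper. The following is a sketch only: the function names and the `OPENSEARCH_URL` variable are assumptions, and the status is extracted with `grep` and `sed` so that no extra tools are required.

```shell
# Assumed variable; defaults to the address used throughout this runbook.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Reads a _cluster/health JSON response on stdin and prints its status
# value (green, yellow, or red).
health_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | head -n 1 \
    | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Fetches cluster health and returns 0 only when the status is green.
cluster_is_green() {
  local status
  status="$(curl -fsS "${OPENSEARCH_URL}/_cluster/health" | health_status)"
  echo "cluster status: ${status:-unknown}"
  [ "$status" = "green" ]
}
```

Calling `cluster_is_green` in a deploy or cron script gives a non-zero exit code whenever the cluster is yellow, red, or unreachable.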
3. Infrastructure as Code (IaC)
We do not manage the OpenSearch cluster manually. All configuration is defined as code to ensure consistency, reproducibility, and version control.
Location of IaC Files
The canonical JSON definitions for all OpenSearch resources are located in the api/resources/search/ directory.
- Index Templates: Define the mappings, settings, and aliases for our indices (e.g., `news_docs.json`).
- Component Templates: Reusable building blocks for index templates (e.g., `knn_base.json`, `analysis_ar.json`).
- Pipelines: Define processors for ingest and search operations (e.g., `hybrid_rrf.json`).
IaC Management Scripts
Two primary scripts in the tools/search/ directory are used to manage this configuration.
- `install.sh`: A one-time setup script that applies all the IaC files to the OpenSearch cluster. It creates the templates and pipelines required for the system to function.
- `smoke.sh`: An idempotent script that verifies the entire search configuration. It installs any missing resources, creates a test index, seeds it with data, and runs a series of queries to confirm that hybrid search is working correctly.
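The sketch below illustrates the kind of REST calls an install step could make. It is not the contents of `install.sh`: the function names, the `OPENSEARCH_URL` variable, and the choice to register pipelines as search pipelines (rather than ingest pipelines) are assumptions. The three REST paths themselves are standard OpenSearch endpoints.

```shell
# Assumed variable; defaults to the address used throughout this runbook.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Maps a resource kind and name to its OpenSearch REST path.
resource_path() {
  local kind="$1" name="$2"
  case "$kind" in
    component) echo "_component_template/${name}" ;;
    index)     echo "_index_template/${name}" ;;
    pipeline)  echo "_search/pipeline/${name}" ;;
    *)         echo "unknown resource kind: ${kind}" >&2; return 1 ;;
  esac
}

# PUTs one JSON definition, named after the file. PUT replaces an existing
# resource of the same name, so re-running it is safe.
apply_resource() {
  local kind="$1" file="$2" name path
  name="$(basename "$file" .json)"
  path="$(resource_path "$kind" "$name")" || return 1
  curl -fsS -X PUT "${OPENSEARCH_URL}/${path}" \
    -H 'Content-Type: application/json' \
    --data-binary "@${file}"
}
```

Order matters when applying by hand: component templates need to exist before an index template that references them, so a call such as `apply_resource component knn_base.json` would come before `apply_resource index news_docs.json`.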
4. Troubleshooting
Common Failure Mode: Yellow or Red Cluster Status
Symptom: The /_cluster/health endpoint shows a status of yellow or red.
Triage Steps:
- Check Node Status: A yellow status usually means one or more replica shards are unassigned; the cluster is still functional. A red status means at least one primary shard is unassigned, so searches and writes that touch the affected index will fail or return partial results until it recovers.
- Check for Failing Nodes: Use `docker compose logs opensearch` to look for errors, especially out-of-memory exceptions (`OutOfMemoryError`).
- Allocate More Memory: If you see memory errors, increase the heap size set in `OPENSEARCH_JAVA_OPTS` in your `docker-compose.yml` file (e.g., from `-Xms512m -Xmx512m` to `-Xms1g -Xmx1g`).
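To see which shards are behind a yellow or red status, the `_cat/shards` output can be filtered for unassigned entries. This is a sketch under assumptions: the function names and the `OPENSEARCH_URL` variable are illustrative and not part of the project's tooling.

```shell
# Assumed variable; defaults to the address used throughout this runbook.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Reads _cat/shards rows on stdin (columns: index shard prirep state) and
# prints "index shard primary|replica" for each unassigned shard.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, ($3 == "p" ? "primary" : "replica") }'
}

# Fetches the shard table with exactly the four columns expected above.
list_unassigned_shards() {
  curl -fsS "${OPENSEARCH_URL}/_cat/shards?h=index,shard,prirep,state" \
    | unassigned_shards
}
```

If only replicas are listed on a single-node setup, the yellow status is expected, because a replica cannot be placed on the same node as its primary. For more detail on a specific shard, `curl 'http://localhost:9200/_cluster/allocation/explain?pretty'` reports why an unassigned shard has not been allocated.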
Common Failure Mode: Incorrect Mappings
Symptom: The smoke.sh script fails with an error like embedding.dimension != 768.
Triage Steps:
- Run the Smoke Test: The `smoke.sh` script is designed to detect and often automatically fix mapping issues by rolling over to a new index.
- Inspect Mappings Manually: Use `curl 'http://localhost:9200/news_docs/_mapping?pretty'` to inspect the live mapping and compare it against the definition in `api/resources/search/templates/news_docs.json`.
- Re-install Templates: If the mappings are incorrect, you can manually re-run the `install.sh` script, though this should not typically be necessary. Templates only apply to indices created afterward, so an existing index keeps its old mapping until it is rolled over or recreated.
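The manual inspection can be narrowed to the vector dimension with a small check. This sketch assumes the index has a single vector field, so the first `"dimension"` value in the mapping is the one of interest. The function names and variables are illustrative, and the expected value of 768 is taken from the error message above.

```shell
# Assumed variables; the URL matches the address used in this runbook.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
EXPECTED_DIM="${EXPECTED_DIM:-768}"

# Reads a mapping JSON document on stdin and prints the first "dimension"
# value it contains (assumes only one vector field is mapped).
mapping_dimension() {
  grep -o '"dimension"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Compares the live mapping of news_docs against the expected dimension.
check_embedding_dimension() {
  local actual
  actual="$(curl -fsS "${OPENSEARCH_URL}/news_docs/_mapping" | mapping_dimension)"
  if [ "$actual" = "$EXPECTED_DIM" ]; then
    echo "ok: dimension=${actual}"
  else
    echo "mismatch: expected ${EXPECTED_DIM}, got ${actual:-none}"
    return 1
  fi
}
```

A mismatch here points at the live index rather than the template file, which is the case where rolling over to a new index is needed.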