Search Service Runbook

Service Status: Operational

The Search Service, powered by OpenSearch, is the retrieval backbone of the Labeeb platform. It combines traditional keyword search with vector-based semantic search to return relevant results quickly and at scale.

1. Core Responsibilities

  • Data Indexing: Stores and indexes all processed articles, making them available for search.
  • Hybrid Search: Executes hybrid search queries, combining BM25 (lexical) and k-NN (semantic) search results.
  • Search Pipelines: Hosts and manages the OpenSearch search pipelines used for query-time result fusion (e.g., Reciprocal Rank Fusion, RRF).
  • Infrastructure as Code (IaC): All cluster configuration (index templates, component templates, and pipelines) is managed as version-controlled JSON files.
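
To make the fusion step concrete, here is a minimal sketch of the standard RRF formula, where each document scores the sum of 1 / (k + rank) over the result lists it appears in. It illustrates the idea only and is not the pipeline's actual implementation; the function name rrf_fuse, the RANK_CONSTANT variable, and the default k = 60 are assumptions, and the deployed pipeline may use a different constant.

```shell
#!/usr/bin/env bash
# rrf_fuse: fuse ranked lists of document IDs with Reciprocal Rank Fusion.
# Each argument is one ranked list: space-separated doc IDs, best match first.
# RANK_CONSTANT (k) is a hypothetical knob for this sketch; it defaults to 60.
rrf_fuse() {
  local k="${RANK_CONSTANT:-60}" list
  for list in "$@"; do
    printf '%s\n' "$list"
  done | LC_ALL=C awk -v k="$k" '
    # Field position i on each line is the rank of that doc in that list.
    { for (i = 1; i <= NF; i++) score[$i] += 1 / (k + i) }
    END { for (id in score) printf "%.6f %s\n", score[id], id }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}

# Example: a lexical (BM25) ranking and a semantic (k-NN) ranking.
rrf_fuse "a b c" "c a d"
# 0.032522 a
# 0.032266 c
# 0.016129 b
# 0.015873 d
```

Documents that rank well in both lists (a and c) rise above documents that appear in only one, which is the property hybrid search relies on.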

2. Key Operational Data

This section provides the essential commands and endpoints for operating the OpenSearch cluster.

Operation            Command / Endpoint
Cluster Health       curl 'http://localhost:9200/_cluster/health?pretty'
List Indices         curl 'http://localhost:9200/_cat/indices?v'
View Index Mapping   curl 'http://localhost:9200/news_docs/_mapping?pretty'
Open Shell           docker compose exec opensearch bash

The URLs are quoted because shells such as zsh treat an unquoted ? as a glob character and reject the command.

3. Infrastructure as Code (IaC)

We do not manage the OpenSearch cluster manually. All configuration is defined as code to ensure consistency, reproducibility, and version control.

Location of IaC Files

The canonical JSON definitions for all OpenSearch resources are located in the api/resources/search/ directory.

  • Index Templates: Define the mappings, settings, and aliases for our indices (e.g., news_docs.json).
  • Component Templates: Reusable building blocks for index templates (e.g., knn_base.json, analysis_ar.json).
  • Pipelines: Define processors for ingest and search operations (e.g., hybrid_rrf.json).
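
As a rough sketch of how these three file types map onto the OpenSearch REST API, the helper below picks an endpoint from a file's directory and name. It is an illustration, not the project's actual tooling. Only the templates/ subdirectory is confirmed elsewhere in this runbook; the components/ and pipelines/ directory names, the OPENSEARCH_URL and IAC_DIR variables, and the function names are assumptions, and every pipeline file is treated as a search pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical settings for this sketch.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
IAC_DIR="${IAC_DIR:-api/resources/search}"

# Map a resource file to the REST endpoint that accepts it.
# components/ and pipelines/ are assumed directory names.
resource_endpoint() {
  local file="$1" name
  name="$(basename "$file" .json)"
  case "$file" in
    */components/*) echo "_component_template/$name" ;;
    */templates/*)  echo "_index_template/$name" ;;
    */pipelines/*)  echo "_search/pipeline/$name" ;;  # assumes search, not ingest
    *) return 1 ;;
  esac
}

# PUT one JSON file to its endpoint. PUT overwrites, so re-applying is safe.
apply_resource() {
  local file="$1" endpoint
  endpoint="$(resource_endpoint "$file")" || { echo "skip: $file" >&2; return 0; }
  curl -fsS -X PUT "$OPENSEARCH_URL/$endpoint" \
    -H 'Content-Type: application/json' \
    --data-binary "@$file" >/dev/null && echo "applied $endpoint"
}

# Component templates go first: an index template that references a
# component template which does not exist yet is rejected.
apply_all() {
  local dir file
  for dir in components templates pipelines; do
    for file in "$IAC_DIR/$dir"/*.json; do
      if [ -e "$file" ]; then apply_resource "$file"; fi
    done
  done
}
```

Calling apply_all from the repository root would apply everything in dependency order; nothing runs until that function is invoked.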

IaC Management Scripts

Two primary scripts in the tools/search/ directory are used to manage this configuration.

  • install.sh

    A one-time setup script that applies all the IaC files to the OpenSearch cluster. It creates the templates and pipelines required for the system to function.

  • smoke.sh

    An idempotent script that verifies the entire search configuration end to end. It installs any resources that are missing, creates a test index, seeds it with data, and runs a series of queries to confirm that hybrid search works correctly.

    This is your primary tool for verifying the search subsystem.

    # Run the smoke test to verify or repair the search configuration
    bash tools/search/smoke.sh
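
For a manual spot check alongside the smoke test, a hybrid query can be sent by hand. The sketch below is an assumption-heavy illustration and does not reproduce what smoke.sh runs: title is a hypothetical text field, the pipeline name hybrid_rrf is inferred from the file name hybrid_rrf.json, the embedding field and its 768 dimensions are taken from the smoke test error message in this runbook, and OPENSEARCH_URL and the function names are made up for the example. The query text is inserted without JSON escaping, so keep it to plain words.

```shell
#!/usr/bin/env bash
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # hypothetical variable

# Build a hybrid query body: one lexical (BM25) clause plus one k-NN clause.
# $1 = query text (not escaped), $2 = vector dimension (default 768).
hybrid_query_body() {
  local text="$1" dim="${2:-768}" vector
  # Constant placeholder vector: enough to exercise the search pipeline,
  # but meaningless for judging semantic relevance.
  vector="$(awk -v n="$dim" 'BEGIN {
    for (i = 1; i <= n; i++) printf "%s0.01", (i > 1 ? "," : "")
  }')"
  printf '{"size":5,"query":{"hybrid":{"queries":[{"match":{"title":{"query":"%s"}}},{"knn":{"embedding":{"vector":[%s],"k":5}}}]}}}\n' \
    "$text" "$vector"
}

# Send the query through the (assumed) hybrid_rrf search pipeline.
hybrid_search() {
  hybrid_query_body "$1" | curl -fsS -X POST \
    "$OPENSEARCH_URL/news_docs/_search?search_pipeline=hybrid_rrf&pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Running hybrid_search with a single word should return hits if the index, the pipeline, and the k-NN mapping are all in place. An error that mentions the pipeline or the vector dimension points at the corresponding IaC file.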
    

4. Troubleshooting

Common Failure Mode: Yellow or Red Cluster Status

Symptom: The /_cluster/health endpoint shows a status of yellow or red.

Triage Steps:

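  1. Interpret the Status: Yellow means every primary shard is assigned but at least one replica shard is not; the cluster still serves reads and writes. On a single-node cluster, yellow is expected for any index configured with replicas, because a replica cannot be allocated to the same node as its primary. Red means at least one primary shard is unassigned; searches and writes that touch the affected indices fail or return partial results.
  2. Check for Failing Nodes: Use docker compose logs opensearch to look for errors, especially out-of-memory errors (OutOfMemoryError).
  3. Allocate More Memory: If you see memory errors, increase the JVM heap size set in OPENSEARCH_JAVA_OPTS in your docker-compose.yml file (e.g., from -Xms512m -Xmx512m to -Xms1g -Xmx1g), keeping -Xms and -Xmx equal, then recreate the container with docker compose up -d opensearch.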
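
To see which shards are behind a yellow or red status, the cat shards and allocation explain APIs are useful. The following is a minimal sketch; OPENSEARCH_URL and the function names are assumptions for the example.

```shell
#!/usr/bin/env bash
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # hypothetical variable

# Filter _cat/shards output down to unassigned shards.
# Expects columns: index shard prirep state unassigned.reason
unassigned_shards() {
  awk '$4 == "UNASSIGNED" {
    printf "%s shard %s (%s) reason=%s\n",
      $1, $2, ($3 == "p" ? "primary" : "replica"), $5
  }'
}

triage_shards() {
  curl -fsS "$OPENSEARCH_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | unassigned_shards
  # With no request body, this explains the first unassigned shard it finds.
  # It returns an error when every shard is assigned, so a failure here
  # on a green cluster is expected.
  curl -fsS "$OPENSEARCH_URL/_cluster/allocation/explain?pretty"
}
```

An unassigned primary in the output corresponds to red status; output that lists only replicas corresponds to yellow.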

Common Failure Mode: Incorrect Mappings

Symptom: The smoke.sh script fails with an error like embedding.dimension != 768.

Triage Steps:

  1. Run the Smoke Test: The smoke.sh script is designed to detect and often automatically fix mapping issues by rolling over to a new index.
  2. Inspect Mappings Manually: Use curl 'http://localhost:9200/news_docs/_mapping?pretty' to inspect the live mapping and compare it against the definition in api/resources/search/templates/news_docs.json.
  3. Re-install Templates: If the template definitions themselves are wrong or missing, you can manually re-run the install.sh script, though this should not typically be necessary. Templates are applied only when an index is created, so an existing index keeps its old mapping until it is rolled over or reindexed.
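
The manual inspection in step 2 can be narrowed to the one value the error message complains about. This is a hedged sketch that scans the mapping for dimension values with grep rather than parsing JSON, so it assumes embedding is the only vector field in the index. OPENSEARCH_URL, EXPECTED_DIM, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # hypothetical variable
EXPECTED_DIM="${EXPECTED_DIM:-768}"                        # from the smoke test error

# Print each distinct "dimension" value found in a mapping JSON on stdin.
mapping_dimensions() {
  grep -o '"dimension"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | grep -o '[0-9][0-9]*$' \
    | sort -u
}

# Compare the live mapping against the expected dimension.
check_dimension() {
  local dims
  dims="$(curl -fsS "$OPENSEARCH_URL/news_docs/_mapping" | mapping_dimensions)"
  if [ "$dims" = "$EXPECTED_DIM" ]; then
    echo "ok: embedding dimension is $EXPECTED_DIM"
  else
    echo "mismatch: expected $EXPECTED_DIM, found '${dims:-none}'" >&2
    return 1
  fi
}
```

If check_dimension reports a mismatch, the live index was created from an older or different template, which is the case the rollover in step 1 is meant to repair.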