📜 Managing OpenSearch with Infrastructure as Code (IaC)

To ensure our search infrastructure is consistent, reliable, and easy to manage, we define all OpenSearch configurations as code. This approach allows us to build, replicate, and version-control our entire setup automatically.


1. Architectural Decision (ADR-SRCH-21)

  • Status: Accepted

  • Context: Search templates and pipelines were previously scattered, and manual cluster tweaks led to configuration drift over time. A feature flag was needed to gate the bootstrapping process for safe rollouts.

  • Decision:

    • The canonical source for all search-related JSON configuration (templates, pipelines) lives in api/resources/search/.
    • Operational scripts for applying this configuration reside in tools/search/.
    • The environment variable ENABLE_SRCH_11 controls whether the bootstrap scripts install or update the search resources.
  • Consequences:

    • Setting ENABLE_SRCH_11=true installs all templates and pipelines and runs smoke tests to verify the configuration.
    • Setting it to false skips the search setup entirely, providing a quick rollback path.
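The gating behaviour described above can be sketched as follows. This is a minimal illustration, not the actual bootstrap script: the function names are hypothetical, and treating an unset variable as false is an assumption the ADR does not state.

```python
import os


def search_bootstrap_enabled(env=None):
    """Return True only when ENABLE_SRCH_11 is 'true' (case-insensitive).

    Assumption: an unset or unrecognised value is treated as false.
    """
    env = os.environ if env is None else env
    return env.get("ENABLE_SRCH_11", "false").strip().lower() == "true"


def bootstrap(env=None):
    """Hypothetical entry point showing where the flag gates the work."""
    if not search_bootstrap_enabled(env):
        # Rollback path: leave the cluster untouched.
        return "skipped"
    # Placeholder: install templates and pipelines, then run smoke tests.
    return "installed"
```

Passing the environment as a parameter keeps the check easy to exercise in tests without mutating the real process environment.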

2. Search Configuration as Code

This section details the specific components managed as code.

Component Templates

  • analysis_ar: Provides an analyzer for Arabic text, including stopword removal and stemming.
  • knn_base: A simple template that enables k-NN indexing by setting index.knn: true.
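As a rough illustration, the body of the knn_base component template could look like the structure below, written as a Python dict that serialises to the JSON sent to the `_component_template/knn_base` endpoint. The variable name is hypothetical, and the actual file in api/resources/search/ may contain more settings than shown here.

```python
import json

# Sketch of a component template body that only enables k-NN indexing.
KNN_BASE_TEMPLATE = {
    "template": {
        "settings": {
            "index": {
                "knn": True  # equivalent to the flat setting index.knn: true
            }
        }
    }
}

# Serialise to the JSON form that would be stored as configuration.
knn_base_json = json.dumps(KNN_BASE_TEMPLATE, indent=2)
```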

Index Template: news_docs

This is the main template for all article indices.

  • Index Pattern: news_docs-*
  • Priority: 100
  • Alias: news_docs
  • Composed Of: analysis_ar, knn_base
  • Mappings:
    • title and body: text fields using the rebuilt_arabic analyzer.
    • embedding: A knn_vector field with 768 dimensions using the Lucene HNSW method.
  • Default Pipelines:
    • Ingest Pipeline: text_embed_news (Generates embeddings at index time).
    • Search Pipeline: hybrid_rrf (The default search pipeline).
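Put together, the properties listed above would map onto an index template body roughly like the sketch below (a Python dict mirroring the JSON). It is assembled only from the values in this section. Anything the section does not specify, such as the vector space type or HNSW parameters, is deliberately left out, and the variable name is hypothetical.

```python
# Sketch of the news_docs index template body, built from the listed values.
NEWS_DOCS_TEMPLATE = {
    "index_patterns": ["news_docs-*"],
    "priority": 100,
    "composed_of": ["analysis_ar", "knn_base"],
    "template": {
        "aliases": {"news_docs": {}},
        "settings": {
            "index": {
                # Ingest pipeline applied at index time.
                "default_pipeline": "text_embed_news",
                # Search pipeline applied when a request names none.
                "search": {"default_pipeline": "hybrid_rrf"},
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "rebuilt_arabic"},
                "body": {"type": "text", "analyzer": "rebuilt_arabic"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,
                    "method": {"name": "hnsw", "engine": "lucene"},
                },
            }
        },
    },
}
```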

Search Pipeline: hybrid_rrf

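A Reciprocal Rank Fusion (RRF) pipeline that merges the ranked results of a traditional BM25 keyword search and a k-NN vector search. RRF scores each document by its rank position in each result list rather than by raw scores, so the two result sets need no score normalization before they are combined.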
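To make the fusion step concrete, here is a small standalone sketch of the standard RRF formula, in which each document receives the sum of 1 / (k + rank) over the lists it appears in. It illustrates the technique only. It is not the code OpenSearch or the AI-Box runs, and both the function name and the constant k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: rank constant that dampens the influence of top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]  # keyword search results, best first
knn_hits = ["c", "a", "d"]   # vector search results, best first
fused = rrf_fuse([bm25_hits, knn_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents "a" and "c" appear in both lists and therefore outrank "b" and "d", which each appear in only one.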

Environment Variables

  • SEARCH_PIPELINE_NAME: The name of the search pipeline to be installed and set as the default (defaults to hybrid_rrf).
  • AIBOX_RRF_MODE: Selects where fusion runs. os (the default) delegates RRF to OpenSearch; python uses the legacy Python-based fusion in the AI-Box.
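A consumer of these two variables might resolve them as sketched below. The function name and the choice to reject unknown modes with an error are assumptions. Only the variable names and their defaults come from this section.

```python
import os

VALID_RRF_MODES = {"os", "python"}


def load_search_settings(env=None):
    """Resolve the search pipeline name and RRF mode with documented defaults."""
    env = os.environ if env is None else env
    pipeline = env.get("SEARCH_PIPELINE_NAME", "hybrid_rrf")
    mode = env.get("AIBOX_RRF_MODE", "os").strip().lower()
    if mode not in VALID_RRF_MODES:
        # Assumption: fail fast rather than silently fall back to a default.
        raise ValueError(f"AIBOX_RRF_MODE must be one of {sorted(VALID_RRF_MODES)}, got {mode!r}")
    return {"pipeline": pipeline, "rrf_mode": mode}
```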

3. Usage & Verification

  1. Start the search stack:

    docker compose up -d search dashboards
    

  2. Install templates and run a smoke test: The smoke.sh script is the primary tool for verifying the entire search setup. It installs all resources, indexes sample documents, and verifies that hybrid search is functioning correctly.

    bash tools/search/smoke.sh
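For manual checks beyond the smoke test, a hybrid request against the news_docs alias could be built along the lines of the sketch below. It is not taken from smoke.sh. The function name, the use of multi_match over title and body, and the value of k are illustrative assumptions, and the query vector is assumed to come from the same embedding model used at index time.

```python
def build_hybrid_query(text, vector, k=10):
    """Build a hybrid query body pairing a BM25 clause with a k-NN clause."""
    if len(vector) != 768:
        # The embedding field is mapped with 768 dimensions.
        raise ValueError("query vector must have 768 dimensions")
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Keyword (BM25) sub-query over the analyzed text fields.
                    {"multi_match": {"query": text, "fields": ["title", "body"]}},
                    # Vector sub-query over the knn_vector field.
                    {"knn": {"embedding": {"vector": list(vector), "k": k}}},
                ]
            }
        }
    }
```

Because hybrid_rrf is the index's default search pipeline, a request like this would not need to name a pipeline explicitly.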