📜 Managing OpenSearch with Infrastructure as Code (IaC)
To ensure our search infrastructure is consistent, reliable, and easy to manage, we define all OpenSearch configurations as code. This approach allows us to build, replicate, and version-control our entire setup automatically.
1. Architectural Decision (ADR-SRCH-21)
- Status: Accepted
- Context: Search templates and pipelines were previously scattered, and manual cluster tweaks led to configuration drift over time. A feature flag was needed to gate the bootstrapping process for safe rollouts.
- Decision:
  - The canonical source for all search-related JSON configuration (templates, pipelines) lives in `api/resources/search/`.
  - Operational scripts for applying this configuration reside in `tools/search/`.
  - The environment variable `ENABLE_SRCH_11` controls whether the bootstrap scripts install or update the search resources.
- Consequences:
  - Setting `ENABLE_SRCH_11=true` installs all templates and pipelines and runs smoke tests to verify the configuration.
  - Setting it to `false` skips the search setup entirely, providing a quick rollback path.
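The gating behaviour described in the ADR can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the real bootstrap scripts in `tools/search/` are not shown here, and treating an unset variable as `false` is an assumption.

```python
import os


def search_bootstrap_enabled(env=None):
    """Return True only when ENABLE_SRCH_11 is set to "true" (case-insensitive).

    Assumption: an unset variable behaves like "false".
    """
    env = os.environ if env is None else env
    return env.get("ENABLE_SRCH_11", "false").strip().lower() == "true"


def plan_bootstrap(env=None):
    """List the steps a bootstrap run would take, without executing anything."""
    if not search_bootstrap_enabled(env):
        # Rollback path: the search setup is skipped entirely.
        return ["skip search setup"]
    return ["install templates", "install pipelines", "run smoke tests"]


if __name__ == "__main__":
    for step in plan_bootstrap():
        print(step)
```

Passing the environment as a plain mapping keeps the decision logic easy to test without touching the real process environment.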
2. Search Configuration as Code
This section details the specific components managed as code.
Component Templates
- `analysis_ar`: Provides an analyzer for Arabic text, including stopword removal and stemming.
- `knn_base`: A simple template that enables k-NN indexing by setting `index.knn: true`.
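For orientation, the two component templates might look roughly like the bodies below, written as Python dictionaries that serialize to JSON. The actual files in `api/resources/search/` are not reproduced here. The filter chain is an assumption modelled on the commonly documented rebuilt form of the built-in Arabic analyzer, and the constant names are hypothetical.

```python
import json

# Assumed body for the analysis_ar component template: a custom analyzer
# with Arabic stopword removal, normalization, and stemming.
ANALYSIS_AR = {
    "template": {
        "settings": {
            "analysis": {
                "filter": {
                    "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
                    "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
                },
                "analyzer": {
                    "rebuilt_arabic": {
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "decimal_digit",
                            "arabic_stop",
                            "arabic_normalization",
                            "arabic_stemmer",
                        ],
                    }
                },
            }
        }
    }
}

# Assumed body for the knn_base component template: it only enables k-NN.
KNN_BASE = {"template": {"settings": {"index.knn": True}}}

if __name__ == "__main__":
    print(json.dumps(ANALYSIS_AR, indent=2))
    print(json.dumps(KNN_BASE, indent=2))
```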
Index Template: news_docs
This is the main template for all article indices.
- Index Pattern: `news_docs-*`
- Priority: 100
- Alias: `news_docs`
- Composed Of: `analysis_ar`, `knn_base`
- Mappings:
  - `title` and `body`: `text` fields using the `rebuilt_arabic` analyzer.
  - `embedding`: A `knn_vector` field with 768 dimensions using the Lucene HNSW method.
- Default Pipelines:
  - Ingest Pipeline: `text_embed_news` (generates embeddings at index time).
  - Search Pipeline: `hybrid_rrf` (the default search pipeline).
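Put together, the properties listed above correspond to a composable index template along the following lines. This is a sketch, not the checked-in file: the function name is hypothetical, and the use of the `index.default_pipeline` and `index.search.default_pipeline` settings to attach the two pipelines is an assumption about how the defaults are wired.

```python
import json


def build_news_docs_template(ingest_pipeline="text_embed_news",
                             search_pipeline="hybrid_rrf"):
    """Build an index template body matching the properties listed above."""
    return {
        "index_patterns": ["news_docs-*"],
        "priority": 100,
        "composed_of": ["analysis_ar", "knn_base"],
        "template": {
            "settings": {
                # Assumed settings for the default ingest and search pipelines.
                "index.default_pipeline": ingest_pipeline,
                "index.search.default_pipeline": search_pipeline,
            },
            "mappings": {
                "properties": {
                    "title": {"type": "text", "analyzer": "rebuilt_arabic"},
                    "body": {"type": "text", "analyzer": "rebuilt_arabic"},
                    "embedding": {
                        "type": "knn_vector",
                        "dimension": 768,
                        "method": {"name": "hnsw", "engine": "lucene"},
                    },
                }
            },
            "aliases": {"news_docs": {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_news_docs_template(), indent=2))
```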
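Search Pipeline: hybrid_rrf
A Reciprocal Rank Fusion (RRF) pipeline that combines the results of a traditional BM25 keyword search and a k-NN vector search. RRF merges the two result lists by rank position rather than by raw score, so the BM25 and vector scores do not need to share a scale.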
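To make the fusion step concrete, here is a minimal sketch of RRF in plain Python. Each document receives 1 / (rank_constant + rank) from every list it appears in, and the contributions are summed. The function name is hypothetical and the rank constant of 60 is a commonly used value, not one confirmed for `hybrid_rrf`.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns document IDs ordered by descending fused score
    (ties broken by document ID for determinism).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]   # keyword search, best first
    knn_hits = ["b", "c", "d"]    # vector search, best first
    print(rrf_fuse([bm25_hits, knn_hits]))
```

Documents that appear in both lists accumulate two contributions, which is why `b` and `c` outrank `a` in this example even though `a` is the top keyword hit.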
Environment Variables
- `SEARCH_PIPELINE_NAME`: The name of the search pipeline to be installed and set as the default (defaults to `hybrid_rrf`).
- `AIBOX_RRF_MODE`: Can be set to `os` to delegate RRF to OpenSearch (the default) or `python` to use a legacy Python-based fusion in the AI-Box.
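A small loader along these lines could read both variables with their documented defaults. The class and function names are hypothetical, and rejecting unknown `AIBOX_RRF_MODE` values is an assumption; the source does not say how the AI-Box handles them.

```python
import os
from dataclasses import dataclass

VALID_RRF_MODES = ("os", "python")


@dataclass(frozen=True)
class SearchSettings:
    pipeline_name: str
    rrf_mode: str


def load_search_settings(env=None):
    """Read the search-related environment variables, applying the defaults."""
    env = os.environ if env is None else env
    pipeline_name = env.get("SEARCH_PIPELINE_NAME", "hybrid_rrf")
    rrf_mode = env.get("AIBOX_RRF_MODE", "os").strip().lower()
    if rrf_mode not in VALID_RRF_MODES:
        # Assumption: fail fast rather than silently falling back.
        raise ValueError(
            f"AIBOX_RRF_MODE must be one of {VALID_RRF_MODES}, got {rrf_mode!r}"
        )
    return SearchSettings(pipeline_name=pipeline_name, rrf_mode=rrf_mode)


if __name__ == "__main__":
    print(load_search_settings())
```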
3. Usage & Verification
- Start the search stack:
- Install templates and run a smoke test: The `smoke.sh` script is the primary tool for verifying the entire search setup. It installs all resources, indexes sample documents, and verifies that hybrid search is functioning correctly.
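The contents of the smoke script are not shown here, but a hybrid-search check would typically send a request body shaped like the one below to the `news_docs` alias. This sketch only builds the body. The function name and the choice of `body` as the keyword field are assumptions, and the query vector would in practice come from the same embedding model used by the ingest pipeline.

```python
import json


def build_hybrid_query(text, vector, k=5, size=5):
    """Build a hybrid query body: one BM25 match clause plus one k-NN clause."""
    return {
        "size": size,
        "query": {
            "hybrid": {
                "queries": [
                    # Keyword side (BM25) against the analyzed body field.
                    {"match": {"body": {"query": text}}},
                    # Vector side against the 768-dimensional embedding field.
                    {"knn": {"embedding": {"vector": list(vector), "k": k}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    placeholder_vector = [0.0] * 768  # stand-in for a real query embedding
    body = build_hybrid_query("أخبار", placeholder_vector)
    print(json.dumps(body, ensure_ascii=False)[:200])
```

Because the index template sets a default search pipeline, a request like this would not need to name the pipeline explicitly, assuming that default is in effect.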