# AI-Box Operations Runbook
This document provides step-by-step, checklist-style procedures for all common operational tasks related to the AI-Box service. It is designed for clarity and action under pressure.
## Procedure: Diagnosing Search Relevance Issues
### Objective

Triage a user report of "bad" or irrelevant search results from the `/retrieve` endpoint. This procedure isolates whether the issue lies with keyword search, vector search, or the ranking logic.
### Checklist

- **Replicate the Exact Query:**
  - Use `curl` to send the exact request payload that is producing irrelevant results.
- **Isolate the BM25 (Keyword) Leg:**
  - Re-run the query with vector search disabled (`k_knn: 0`). This shows you the raw keyword search results.
  - Analyze: are these results relevant? If not, the issue may be with the text analysis configuration in OpenSearch.
- **Isolate the k-NN (Vector) Leg:**
  - Re-run the query with keyword search disabled (`k_bm25: 0`). This shows you the raw semantic search results.
  - Analyze: are these results semantically related to the query? If not, the issue may be with the embedding model or the vector index.
- **Check OpenSearch Directly:**
  - If one of the legs is returning poor results, construct a raw OpenSearch query to bypass the AI-Box entirely. This confirms whether the issue lies in the AI-Box's query construction or in the search cluster itself.
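The leg-isolation steps above can be sketched as three payload variants. The `k_bm25`/`k_knn` parameter names come from this runbook; the rest of the payload shape is an assumption, so adjust it to your actual `/retrieve` request schema:

```python
import copy
import json

# Base hybrid request (payload shape is an illustrative assumption).
base = {"query": "example user query", "k_bm25": 10, "k_knn": 10}

bm25_only = copy.deepcopy(base)
bm25_only["k_knn"] = 0   # vector leg disabled: raw keyword results

knn_only = copy.deepcopy(base)
knn_only["k_bm25"] = 0   # keyword leg disabled: raw semantic results

for name, payload in (("hybrid", base), ("bm25_only", bm25_only), ("knn_only", knn_only)):
    print(name, json.dumps(payload))
```

Send each variant with `curl` and compare the result lists; the leg whose isolated results degrade is the one to investigate.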
## Incident Response: High Search Latency

**Incident Priority:** High

**Symptom:** The `aibox_request_duration_seconds` metric for the `/retrieve` or `/retrieve_pack` endpoints is elevated, or API calls are timing out.
### Triage & Recovery Checklist

- **Check AI-Box Service Logs:**
  - Action: `docker compose logs -f ai-box`
  - Look for: any obvious errors, warnings, or timeouts in the application logs.
- **Check OpenSearch Cluster Health:**
  - Action: high search latency in the AI-Box is almost always caused by high search latency in OpenSearch. Check the OpenSearch cluster's CPU, memory, and query performance via your monitoring dashboards.
  - Verify: `curl http://localhost:9200/_cluster/health?pretty`
- **Inspect Query Diagnostics:**
  - Action: re-run a slow query and inspect the `diagnostics` object in the JSON response.
  - Analyze: the `bm25_ms` and `knn_ms` fields will tell you exactly which part of the hybrid search is slow.
- **Check for Expensive Queries:**
  - A very broad query, a query with complex filters, or a very high `k` value can cause high latency. Review the slow query for any obvious issues.
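The diagnostics step above can be automated with a small helper. The `bm25_ms`/`knn_ms` field names come from this runbook; the 500 ms threshold and the `slow_legs` helper itself are illustrative assumptions:

```python
# Sketch: flag which hybrid-search leg exceeds a latency budget,
# based on the diagnostics object in the /retrieve response.
def slow_legs(diagnostics, threshold_ms=500):
    """Return the timing fields whose value exceeds threshold_ms."""
    return [leg for leg in ("bm25_ms", "knn_ms")
            if diagnostics.get(leg, 0) > threshold_ms]

sample = {"bm25_ms": 42, "knn_ms": 1375}
print(slow_legs(sample))  # prints ['knn_ms']
```

Here the vector leg is the outlier, which points at the k-NN index or embedding path rather than the BM25 analyzer.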
## Known Errors & Quick Fixes
### `parsing_exception: Unknown key for a START_OBJECT in [knn]`

Use the query-form `knn` clause inside `query`:

```shell
curl -s -XPOST localhost:9200/news_docs/_search -H 'content-type: application/json' -d '{
  "size": 2,
  "query": {
    "knn": {
      "embedding": { "vector": [0.1, 0.2, 0.3, 0.4], "k": 2 }
    }
  }
}'
```
### `zero vector is not supported when space type is [cosinesimil]`

Ensure non-zero unit vectors when testing kNN queries. Generate one quickly:

```shell
python - <<'PY'
import random, math, json
random.seed(42)
v = [random.random() for _ in range(4)]
n = math.sqrt(sum(x * x for x in v))
print(json.dumps([x / n for x in v]))
PY
```
### `pipeline with id [text_embed_news] does not exist`

Either remove the `pipeline` parameter from the indexing call, or create a stub pipeline before indexing.
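If you take the stub route, an ingest pipeline with an empty processor list is enough to satisfy the reference. A sketch (note it performs no embedding, so documents are indexed without vectors):

```shell
# Stub pipeline: resolves the missing-pipeline error but does no work.
curl -s -XPUT localhost:9200/_ingest/pipeline/text_embed_news \
  -H 'content-type: application/json' \
  -d '{ "description": "stub, no-op", "processors": [] }'
```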
### `strict_dynamic_mapping_exception`

The index uses `dynamic: "strict"`, so documents containing unmapped fields are rejected. Add fields via `PUT _mapping`, or prefer enriching at read time through `/retrieve_pack` hydration instead of `_update`.
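A minimal sketch of the `PUT _mapping` route, assuming the `news_docs` index from the example above (the field name `summary` is illustrative):

```shell
# Explicitly map the new field; strict indices reject unmapped fields.
curl -s -XPUT localhost:9200/news_docs/_mapping \
  -H 'content-type: application/json' \
  -d '{ "properties": { "summary": { "type": "text" } } }'
```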
### 502 "Search backend error"

- Verify OpenSearch health: `curl -s localhost:9200/_cluster/health`
- Confirm env: `OS_INDEX` and `VECTOR_FIELD` match the mapping; list indices with `curl -s localhost:9200/_cat/indices`
- Run a minimal `_search` to validate that the index responds.
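A minimal `_search` for the last step, assuming the `news_docs` index from the earlier example (substitute your `OS_INDEX` value):

```shell
# match_all with size 1: confirms the index answers queries at all.
curl -s -XPOST localhost:9200/news_docs/_search \
  -H 'content-type: application/json' \
  -d '{ "size": 1, "query": { "match_all": {} } }'
```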
## Metrics Quick Queries

- RRF time (avg): `rate(aibox_retrieval_rrf_ms_sum[5m]) / rate(aibox_retrieval_rrf_ms_count[5m])`
- Request latency p95: `histogram_quantile(0.95, rate(aibox_request_duration_seconds_bucket[5m]))`
- Error rate: `rate(aibox_requests_total{status=~"5.."}[5m])`