# AI-Box Operations Runbook
This document provides step-by-step, checklist-style procedures for all common operational tasks related to the AI-Box service. It is designed for clarity and action under pressure.
## Procedure: Diagnosing Search Relevance Issues
### Objective

Triage a user report of "bad" or irrelevant search results from the `/retrieve` endpoint. This procedure isolates whether the issue lies with keyword search, vector search, or the ranking logic.
### Checklist

- **Replicate the Exact Query:**
  - Use `curl` to send the exact request payload that is producing irrelevant results.
- **Isolate the BM25 (Keyword) Leg:**
  - Re-run the query with vector search disabled (`k_knn: 0`). This shows you the raw keyword search results.
  - Analyze: are these results relevant? If not, the issue may be with the text analysis configuration in OpenSearch.
- **Isolate the k-NN (Vector) Leg:**
  - Re-run the query with keyword search disabled (`k_bm25: 0`). This shows you the raw semantic search results.
  - Analyze: are these results semantically related to the query? If not, the issue may be with the embedding model or the vector index.
- **Check OpenSearch Directly:**
  - If one of the legs is returning poor results, construct a raw OpenSearch query to bypass the AI-Box entirely. This confirms whether the issue lies in the AI-Box's query construction or in the search cluster itself.
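The leg-isolation steps above can be sketched as three payload variants. The `k_bm25`/`k_knn` parameter names come from this runbook; the rest of the payload shape is an assumption, so adjust it to your actual `/retrieve` request schema:

```python
import copy
import json

# Base hybrid request (payload shape is an illustrative assumption).
base = {"query": "example user query", "k_bm25": 10, "k_knn": 10}

bm25_only = copy.deepcopy(base)
bm25_only["k_knn"] = 0   # vector leg disabled: raw keyword results

knn_only = copy.deepcopy(base)
knn_only["k_bm25"] = 0   # keyword leg disabled: raw semantic results

for name, payload in (("hybrid", base), ("bm25_only", bm25_only), ("knn_only", knn_only)):
    print(name, json.dumps(payload))
```

Send each variant with `curl` and compare the result lists; the leg whose isolated results degrade is the one to investigate.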
## Incident Response: High Search Latency

**Incident Priority:** High

**Symptom:** The `aibox_request_duration_seconds` metric for the `/retrieve` or `/retrieve_pack` endpoints is elevated, or API calls are timing out.
### Triage & Recovery Checklist

- **Check AI-Box Service Logs:**
  - Action: `docker compose logs -f ai-box`
  - Look for: any obvious errors, warnings, or timeouts in the application logs.
- **Check OpenSearch Cluster Health:**
  - Action: high search latency in the AI-Box is almost always caused by high search latency in OpenSearch. Check the OpenSearch cluster's CPU, memory, and query performance via your monitoring dashboards.
  - Verify: `curl http://localhost:9200/_cluster/health?pretty`
- **Inspect Query Diagnostics:**
  - Action: re-run a slow query and inspect the `diagnostics` object in the JSON response.
  - Analyze: the `bm25_ms` and `knn_ms` fields will tell you exactly which part of the hybrid search is slow.
- **Check for Expensive Queries:**
  - A very broad query, a query with complex filters, or a very high `k` value can cause high latency. Review the slow query for any obvious issues.
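The diagnostics step above can be automated with a small helper. The `bm25_ms`/`knn_ms` field names come from this runbook; the 500 ms threshold and the `slow_legs` helper itself are illustrative assumptions:

```python
# Sketch: flag which hybrid-search leg exceeds a latency budget,
# based on the diagnostics object in the /retrieve response.
def slow_legs(diagnostics, threshold_ms=500):
    """Return the timing fields whose value exceeds threshold_ms."""
    return [leg for leg in ("bm25_ms", "knn_ms")
            if diagnostics.get(leg, 0) > threshold_ms]

sample = {"bm25_ms": 42, "knn_ms": 1375}
print(slow_legs(sample))  # prints ['knn_ms']
```

Here the vector leg is the outlier, which points at the k-NN index or embedding path rather than the BM25 analyzer.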
## Known Errors & Quick Fixes
### `parsing_exception: Unknown key for a START_OBJECT in [knn]`

Use the query-form `knn` clause inside `query`:

```shell
curl -s -XPOST localhost:9200/news_docs/_search -H 'content-type: application/json' -d '{
  "size": 2,
  "query": {
    "knn": {
      "embedding": { "vector": [0.1, 0.2, 0.3, 0.4], "k": 2 }
    }
  }
}'
```
### `zero vector is not supported when space type is [cosinesimil]`

Ensure non-zero unit vectors when testing kNN queries. Generate one quickly:

```shell
python - <<'PY'
import random, math, json
random.seed(42)
v = [random.random() for _ in range(4)]
n = math.sqrt(sum(x * x for x in v))
print(json.dumps([x / n for x in v]))
PY
```
### `pipeline with id [text_embed_news] does not exist`

Either remove the `pipeline` parameter from the indexing call, or create a stub pipeline before indexing.
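If you take the stub route, an ingest pipeline with an empty processor list is enough to satisfy the reference. A sketch (note it performs no embedding, so documents are indexed without vectors):

```shell
# Stub pipeline: resolves the missing-pipeline error but does no work.
curl -s -XPUT localhost:9200/_ingest/pipeline/text_embed_news \
  -H 'content-type: application/json' \
  -d '{ "description": "stub, no-op", "processors": [] }'
```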
### `strict_dynamic_mapping_exception`

The index uses `dynamic: "strict"`, so documents containing unmapped fields are rejected. Add fields via `PUT _mapping`, or prefer enriching at read time through `/retrieve_pack` hydration instead of `_update`.
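A minimal sketch of the `PUT _mapping` route, assuming the `news_docs` index from the example above (the field name `summary` is illustrative):

```shell
# Explicitly map the new field; strict indices reject unmapped fields.
curl -s -XPUT localhost:9200/news_docs/_mapping \
  -H 'content-type: application/json' \
  -d '{ "properties": { "summary": { "type": "text" } } }'
```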
### 502 "Search backend error"

- Verify OpenSearch health: `curl -s localhost:9200/_cluster/health`
- Confirm env: `OS_INDEX` and `VECTOR_FIELD` match the mapping; list indices with `curl -s localhost:9200/_cat/indices`
- Run a minimal `_search` to validate that the index responds.
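A minimal `_search` for the last step, assuming the `news_docs` index from the earlier example (substitute your `OS_INDEX` value):

```shell
# match_all with size 1: confirms the index answers queries at all.
curl -s -XPOST localhost:9200/news_docs/_search \
  -H 'content-type: application/json' \
  -d '{ "size": 1, "query": { "match_all": {} } }'
```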
## Metrics Quick Queries

- RRF time (avg): `rate(aibox_retrieval_rrf_ms_sum[5m]) / rate(aibox_retrieval_rrf_ms_count[5m])`
- Request latency p95: `histogram_quantile(0.95, rate(aibox_request_duration_seconds_bucket[5m]))`
- Error rate: `rate(aibox_requests_total{status=~"5.."}[5m])`