Skip to content

title: Runbook: Reranker Timeout description: Diagnose and mitigate slow cross-encoder reranking in AI-Box. icon: material/timer-sand


Reranker Timeout

Impact: High — severe latency & timeouts

If reranking dominates total time (rerank_ms high), end-user latency and timeouts follow.

Triage (≤5 minutes)

  1. Confirm reranker is on
    docker compose exec ai-box printenv | grep ENABLE_RERANK
    
  2. Inspect diagnostics
    docker compose logs --tail=200 ai-box | grep -E '"rerank_ms"'
    
  3. Check container CPU/mem
    docker stats ai-box
    

Mitigation

Immediate

  • Disable reranker quickly:
    ENABLE_RERANK=false
    
    Or per-request: call /retrieve with "rerank": false.

Short-term tuning

  • Limit candidates sent to reranker (e.g., top 50 after RRF).
  • Use a lighter model (RERANK_MODEL_ID) or ONNX/quantized variant.
  • Pin CPU threads or set an explicit timeout in code that skips rerank if it exceeds the budget, returning fused results instead.

Long-term

  • Add circuit breaker around reranker.
  • Consider bi-encoder reranking (faster) or learning-to-rank with cached features.

Verify

  • After mitigation, /retrieve diagnostics should show rerank_ms: 0.0 (if disabled) or substantially lower than pre-incident.