title: Runbook: Reranker Timeout description: Diagnose and mitigate slow cross-encoder reranking in AI-Box. icon: material/timer-sand
Reranker Timeout¶
Impact: High — severe latency & timeouts
If reranking dominates total time (rerank_ms high), end-user latency and timeouts follow.
Triage (≤5 minutes)¶
- Confirm reranker is on
- Inspect diagnostics
- Check container CPU/mem
Mitigation¶
Immediate¶
- Disable reranker quickly:
Or per-request: call
/retrievewith"rerank": false.
Short-term tuning¶
- Limit candidates sent to reranker (e.g., top 50 after RRF).
- Use a lighter model (
RERANK_MODEL_ID) or ONNX/quantized variant. - Pin CPU threads or set an explicit timeout in code that skips rerank if it exceeds the budget, returning fused results instead.
Long-term¶
- Add circuit breaker around reranker.
- Consider bi-encoder reranking (faster) or learning-to-rank with cached features.
Verify¶
- After mitigation,
/retrievediagnostics should showrerank_ms: 0.0(if disabled) or substantially lower than pre-incident.