Skip to content

title: Runbook: Downstream Dependency Failure description: Diagnose and resolve failures talking to OpenSearch or Backend API (hydration). icon: material/server-network-off


Downstream Dependency Failure

Impact varies by dependency

  • OpenSearch down/retrieve, /retrieve_pack fail (critical).
  • Backend API down/retrieve_pack degrades (no hydration), /retrieve is fine.

Triage (≤5 minutes)

  1. Scan logs for connection errors

    docker compose logs --tail=200 ai-box | grep -E "ConnectionError|RequestException|502 Bad Gateway"
    

  2. Manual connectivity from container

    docker compose exec ai-box bash -lc 'curl -s http://opensearch:9200 || true'
    docker compose exec ai-box bash -lc 'curl -s ${API_REST_BASE:-http://api:8080}/health || true'
    

  3. Resolve name/DNS issues

    docker compose exec ai-box bash -lc 'getent hosts opensearch || nslookup opensearch || true'
    

  4. Validate env

    docker compose exec ai-box printenv | egrep 'OPENSEARCH_URL|API_REST_BASE|OS_INDEX|VECTOR_FIELD'
    

Remediation

  • Fix OS service first (check its runbook). AI-Box cannot serve search without it.
  • Verify OPENSEARCH_URL points at the right host/port.
  • AI-Box should still serve; /retrieve_pack returns null metadata fields.
  • Verify API_REST_BASE and token; consider turning off hydration temporarily.

Post-incident

  • Add quick dep probes to /health (cached) to surface degraded state.
  • Implement a circuit breaker for hydration to reduce noise when API is down.