
---
title: "Runbook: DB Connection Errors"
description: A playbook for diagnosing and resolving PostgreSQL connection failures in the API service.
icon: material/database-alert-outline
---


Runbook: DB Connection Errors

Impact: Critical - Platform Outage

This alert fires when the API service cannot establish a connection to the PostgreSQL database. Because the database is the platform's central system of record, this failure causes a total platform outage: every endpoint that reads or writes persisted data fails, and no new data can be ingested.

Triage Checklist (5 Minutes)

Your immediate goal is to confirm the scope and nature of the database connection failure.

  1. Confirm the Error in Logs: Check the API service logs for specific database connection error messages. This is the fastest way to confirm the issue.

    # Look for messages like "SQLSTATE[08006]" or "could not connect to server"
    docker compose logs --tail=100 api | grep -E "SQLSTATE|could not connect"
    

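  2. Check Database Container Health: Verify that the PostgreSQL container (db) is running and, if it defines a healthcheck, healthy.

    docker compose ps -a db
    # STATUS should read "Up ..." (with "(healthy)" if a healthcheck is defined);
    # "Exited" or "Restarting" means the container is down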
    

  3. Attempt a Manual Connection: From within the API container, try to connect to the database manually using psql. This separates failures in the database or the network path from problems in the application's own configuration.

    # 1. Open a shell in the API container
    docker compose exec api bash
    
    # 2. Attempt to connect (password is in .env)
    psql -h db -U app -d appdb
    
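    If this command fails, the problem lies with the database container, the Docker network, or the credentials; the psql error message indicates which. If it succeeds, the database and network path are working, and the problem is most likely in the Laravel application's configuration.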
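The error text found in step 1 usually points at a specific layer. The helper below is a minimal sketch that maps one log line to a rough failure class. The function name and category labels are hypothetical, and the matched fragments are common libpq/PostgreSQL message wordings that can vary between versions, so an "unknown" result means you should read the line yourself.

```shell
# Hypothetical helper: map one API log line to a rough failure class.
# The fragments below are common libpq/PostgreSQL wordings; exact text varies
# by version, so "unknown" means "read the line yourself".
classify_db_error() {
  case "$1" in
    *"could not translate host name"*)
      echo "dns" ;;        # the db hostname does not resolve: Docker network/DNS
    *"Connection refused"*)
      echo "refused" ;;    # host resolves, but nothing is listening on the port
    *"timeout expired"*)
      echo "timeout" ;;    # no answer within the client's connect timeout
    *"password authentication failed"*|*'role "'*'" does not exist'*|*'database "'*'" does not exist'*)
      echo "auth" ;;       # wrong DB_USERNAME, DB_PASSWORD, or database name
    *"too many clients"*|*"remaining connection slots are reserved"*)
      echo "capacity" ;;   # max_connections exhausted
    *"the database system is starting up"*|*"the database system is shutting down"*)
      echo "not_ready" ;;  # server is starting up or shutting down
    *)
      echo "unknown" ;;
  esac
}

# Usage: classify the newest matching line from the API logs.
#   line=$(docker compose logs --tail=100 api | grep -E "SQLSTATE|could not connect" | tail -n 1)
#   classify_db_error "$line"
```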

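Step 2 can also be checked from a script instead of by reading the table. This sketch assumes Docker's inspect fields .State.Status and .State.Health.Status (the latter exists only when the service defines a healthcheck); the db_state_verdict function and its verdict labels are hypothetical.

```shell
# Hypothetical helper: turn Docker's container state and health into a verdict.
#   $1 = .State.Status        (created, running, restarting, exited, dead, ...)
#   $2 = .State.Health.Status (starting, healthy, unhealthy), or empty when the
#        service defines no healthcheck
db_state_verdict() {
  case "${1-}:${2-}" in
    running:healthy|running:) echo "up" ;;               # without a healthcheck, running is the best signal
    running:starting)         echo "starting" ;;         # wait for the healthcheck, then re-check
    running:unhealthy)        echo "up-but-unhealthy" ;; # process alive, healthcheck failing
    restarting:*|exited:*|dead:*) echo "down" ;;         # use the "exited or restarting" playbook
    *)                        echo "unknown" ;;
  esac
}

# Usage on the Docker host (-a keeps stopped containers in the listing):
#   set -- $(docker inspect --format '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{end}}' "$(docker compose ps -aq db)")
#   db_state_verdict "$1" "$2"
```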

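Because psql in step 3 opens an interactive session, a scriptable alternative can be handy. pg_isready, which ships with the PostgreSQL client tools, reports through its exit status whether the server is accepting connections, but it does not verify the password, so pair it with a one-shot psql query. The sketch assumes the client tools are installed in the API image and that the password is available as DB_PASSWORD in the shell; explain_pg_isready is a hypothetical helper.

```shell
# Hypothetical helper: explain a pg_isready exit status. The meanings of
# codes 0-3 come from the PostgreSQL pg_isready documentation.
explain_pg_isready() {
  case "$1" in
    0) echo "accepting: server is up; verify credentials with a SELECT 1" ;;
    1) echo "rejecting: server is up but refusing connections (e.g. starting up)" ;;
    2) echo "no response: container down, wrong host/port, or broken network path" ;;
    3) echo "no attempt: pg_isready was given invalid parameters" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Usage inside the API container (assumes the PostgreSQL client tools are
# installed there and the password is exported as DB_PASSWORD):
#   pg_isready -h db -p 5432 -U app -d appdb -t 3; explain_pg_isready $?
#   PGPASSWORD="$DB_PASSWORD" psql -w -h db -U app -d appdb -c 'SELECT 1'
```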
Remediation Playbooks

Based on your triage, select the appropriate playbook to resolve the issue.

Symptom: The docker compose ps db command shows the container is exited or restarting.

  1. Check Database Logs for Errors: The database logs are the most likely place to find the root cause. Look for out-of-memory (OOM) errors, disk space issues, or data corruption messages.

    docker compose logs db
    

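  2. Restart the Database Container: A restart can resolve transient issues. If the logs showed an out-of-memory or disk-full condition, expect the container to fail again until that cause is fixed.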

    docker compose restart db
    

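  3. Increase Resources (if OOM): If the logs indicate an out-of-memory error, raise the container's memory limit in docker-compose.yml (for example, mem_limit or deploy.resources.limits.memory) or in your other deployment configuration. Apply the change with docker compose up -d db; docker compose restart does not pick up changes to the compose file.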
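For step 1 of this playbook, the sketch below counts log lines matching signatures that usually separate memory, disk, and corruption failures. The scan_db_logs name is hypothetical, and the patterns are common PostgreSQL and operating-system message fragments, not an exhaustive list. If the kernel kills the main postgres process outright, the log may simply stop without any of these messages.

```shell
# Hypothetical helper: count log lines that match common memory, disk, and
# corruption signatures. Reads PostgreSQL log text on stdin.
scan_db_logs() {
  awk '
    /out of memory|terminated by signal 9/        { mem++ }
    /No space left on device/                     { disk++ }
    /invalid page in block|could not read block/  { corrupt++ }
    END { printf "memory=%d disk=%d corruption=%d\n", mem, disk, corrupt }
  '
}

# Usage on the Docker host:
#   docker compose logs --no-log-prefix db | scan_db_logs
```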

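When the logs are inconclusive, Docker's own record of how the container stopped helps decide whether step 3 applies. This sketch reads .State.OOMKilled and .State.ExitCode from docker inspect; explain_db_exit is a hypothetical helper. An exit code above 128 means the process was killed by signal (code minus 128), so 137 is SIGKILL.

```shell
# Hypothetical helper: interpret how the db container last stopped.
#   $1 = .State.OOMKilled (true/false)   $2 = .State.ExitCode
explain_db_exit() (
  oom=$1 code=$2
  if [ "$oom" = "true" ]; then
    echo "OOM-killed: raise the container's memory limit"
  elif [ "$code" -gt 128 ]; then
    # 128 + N means the process was killed by signal N (137 = SIGKILL).
    echo "killed by signal $((code - 128))"
  elif [ "$code" -eq 0 ]; then
    echo "clean exit: the server shut down normally (e.g. docker compose stop)"
  else
    echo "exited with status $code: check docker compose logs db"
  fi
)

# Usage on the Docker host:
#   explain_db_exit $(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' "$(docker compose ps -aq db)")
```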
Symptom: The database container is running, but the manual psql connection from the API container fails.

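  1. Verify Environment Variables: Double-check that the DB_* variables in your .env file are correct. A typo in DB_HOST, DB_PORT, DB_USERNAME, or DB_PASSWORD is a common cause of this issue. DB_HOST must be the Compose service name (db), not localhost, which inside the API container refers to the API container itself.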

  2. Test Network Path: From within the API container, use ping or nc (if they are installed in the image) to verify that the db hostname resolves and that port 5432 is open.

    # From inside the API container
    ping -c 3 db
    nc -zv db 5432
    

  3. Restart the API Container: If the database was recently restarted, long-running API processes may still hold stale connections. Restarting the API container forces them to open new ones.

    docker compose restart api
    
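Step 1 of this playbook can be partly automated. The sketch below is hypothetical (check_db_env is not an existing tool): it checks a Laravel-style .env file for the DB_* keys named above plus DB_DATABASE, Laravel's standard database-name key, and flags a DB_HOST of localhost, which inside the API container refers to that container rather than the db service.

```shell
# Hypothetical helper: sanity-check the DB_* keys in a Laravel-style .env file.
# Prints one problem per line and prints nothing when every check passes.
check_db_env() (
  file=${1:-.env}
  for key in DB_HOST DB_PORT DB_DATABASE DB_USERNAME DB_PASSWORD; do
    grep -q "^${key}=." "$file" || echo "missing or empty: $key"
  done

  # Value of a key (first assignment if duplicated), with quotes stripped.
  value() { sed -n "s/^$1=//p" "$file" | head -n 1 | tr -d "\"'"; }

  case "$(value DB_HOST)" in
    localhost|127.0.0.1)
      # Inside the API container, localhost is the API container itself.
      echo "DB_HOST points at the API container; use the compose service name (db)" ;;
  esac
  case "$(value DB_PORT)" in
    '') ;;                                  # already reported as missing
    *[!0-9]*) echo "DB_PORT is not a number" ;;
  esac
)

# Usage inside the project directory:
#   check_db_env .env
```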

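Restarting the API container in step 3 only helps once the database is accepting connections again. A small retry loop can gate the restart. The retry function is a hypothetical helper, and the usage line assumes the db service runs the official postgres image, which ships pg_isready.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
#   retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() (
  attempts=$1 delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
)

# Usage on the Docker host: wait for PostgreSQL, then restart the API.
#   retry 10 3 docker compose exec -T db pg_isready -q -U app -d appdb && docker compose restart api
```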

Post-Incident Actions

  • Root Cause Analysis: Determine the underlying cause of the database failure. Was it a resource issue, a configuration error, or a transient network problem?
  • Improve Health Checks: Extend the API service's /health endpoint with a lightweight database query (e.g., SELECT 1), briefly cached to limit load, and point the API container's Docker healthcheck at that endpoint. The container then turns unhealthy when the database is unreachable, allowing faster automated detection.
  • Review Resource Allocation: Proactively review the memory and CPU limits for the PostgreSQL container to ensure it has sufficient resources for the production workload.
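The health-check improvement needs a probe that Docker can run as the API container's healthcheck (for example, from the healthcheck test key in docker-compose.yml). This sketch assumes curl is installed in the image and that /health returns HTTP 200 only when its SELECT 1 succeeds; probe_health, the HEALTH_URL variable, and its default URL are hypothetical.

```shell
# Hypothetical healthcheck probe for the API container. Exits 0 only when the
# health endpoint answers HTTP 200; if /health runs SELECT 1, a database outage
# makes the probe fail and Docker marks the container unhealthy.
# HEALTH_URL and its default are assumptions; adjust host, port, and path.
probe_health() (
  url=${1:-${HEALTH_URL:-http://localhost/health}}
  # -w prints the status code; curl reports 000 when no HTTP response arrives.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url") || code=000
  [ "$code" = "200" ]
)

# Usage inside the API container:
#   probe_health || echo "API health endpoint is failing"
```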