---
title: "Runbook: High Ingestion Error Rate"
description: A playbook for diagnosing and resolving a high rate of 4xx errors at the article ingestion endpoint.
icon: material/tray-alert
---
# Runbook: High Ingestion Error Rate
**Impact:** High - Data Loss
This alert fires when the API service is rejecting a high percentage of incoming requests from the Scraper service. This is a critical issue, as it means that valid data is being fetched by the scraper but is not being saved to the platform, leading to silent data loss.
## Triage Checklist (5 Minutes)
Your immediate goal is to identify the type of error and the source of the invalid data.
1.  **Identify the HTTP Error Code:** Check the API logs to determine the specific `4xx` status code being returned. The code will tell you the nature of the failure (a log-tally sketch follows this checklist):
    - `422 Unprocessable Entity`: The request body failed validation.
    - `401 Unauthorized` or `403 Forbidden`: The `INGEST_TOKEN` is incorrect.
    - `413 Payload Too Large`: The scraper is sending a batch that exceeds the size limit.
2.  **Check Scraper Logs for Error Details:** The scraper's logs will often contain the full error response from the API, which can include detailed validation messages.
3.  **Isolate the Problematic Scraper Profile:** If the errors seem related to content, they are likely coming from a single, misconfigured scraper profile. The API logs may not have this context, but the scraper logs might. Correlate timestamps to identify the source.
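If log volume makes eyeballing impractical, a quick tally can answer step 1 faster. Below is a minimal sketch, assuming the API emits JSON-lines logs with `status` and `path` fields to a file such as `api.log` (the filename and field names are assumptions; adapt them to your logging setup):

```python
import json
from collections import Counter

# Tally 4xx responses per (status code, path) from a JSON-lines log file.
# Assumed line shape: {"status": 422, "path": "/ingest", ...}
counts = Counter()
with open("api.log") as f:
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        status = entry.get("status")
        if isinstance(status, int) and 400 <= status < 500:
            counts[(status, entry.get("path", "?"))] += 1

for (status, path), n in counts.most_common(10):
    print(f"{status}  {path}  x{n}")
```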
## Remediation Playbooks
Based on the HTTP status code you identified, select the appropriate playbook.
**Symptom:** The API is returning `422 Unprocessable Entity` errors. This means the scraper is sending data that violates the API's contract.
1.  **Find the Validation Error:** The API's log entry for the `422` response should contain a JSON object detailing which field failed validation and why (e.g., `"title": ["The title field is required."]`).
2.  **Identify the Root Cause in the Scraper:** This error is almost always caused by a bug or a recent change in a specific scraper provider. The provider is likely failing to extract a required field or is extracting data in an incorrect format.
3.  **Implement a Fix in the Scraper:** The fix must be implemented in the relevant provider file within the `scraper/app/scraping/providers/` directory. You may need to add better error handling or adjust a CSS selector (a sketch of this pattern follows this playbook).
4.  **Deploy the Scraper Fix:** A code change in the scraper requires a new Docker image to be built and deployed.
5.  **Temporarily Disable the Failing Profile:** While a fix is being developed, the safest immediate action is to disable the failing profile in the scraper to stop it from sending invalid data. Follow the procedure in the Profile Failures Runbook.
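For step 3, the exact change depends on the provider, but a common pattern is to extract required fields defensively and skip (while logging) any article that would fail validation, rather than forwarding it to the API. The sketch below is hypothetical: the selector, function signature, and record shape are illustrative, not this project's actual provider code.

```python
import logging

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

def extract_article(html: str, url: str) -> dict | None:
    """Extract required fields; return None instead of an invalid record."""
    soup = BeautifulSoup(html, "html.parser")
    title_el = soup.select_one("h1.article-title")  # hypothetical selector
    if title_el is None or not title_el.get_text(strip=True):
        # A missing title would come back from the API as a 422, so skip
        # the article here and leave a trail for debugging the profile.
        logger.warning("Skipping %s: no title found", url)
        return None
    return {"title": title_el.get_text(strip=True), "url": url}
```

The key design point is that validation failures surface in the scraper's own logs at extraction time, instead of as opaque 4xx noise at the ingestion endpoint.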
**Symptom:** The API is returning `401 Unauthorized` or `403 Forbidden` errors.
1.  **Verify `INGEST_TOKEN`:** This error means the `INGEST_TOKEN` environment variable in the `scraper` service's configuration does not match the one expected by the `api` service.
2.  **Correct the Environment Variable:** Ensure the `INGEST_TOKEN` value is identical in the `.env` files for both services.
3.  **Restart the Scraper Service:** Restart the scraper container so it picks up the corrected environment variable (a verification sketch follows this playbook).
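Before re-enabling full scraping, it can help to confirm the corrected token is accepted with a single minimal request. A sketch using `requests`; the endpoint path, `Bearer` header scheme, and empty-batch payload are all assumptions to match against your API:

```python
import os

import requests

token = os.environ["INGEST_TOKEN"]
resp = requests.post(
    "http://localhost:8000/api/ingest",  # endpoint path is an assumption
    headers={"Authorization": f"Bearer {token}"},  # header scheme may differ
    json={"articles": []},  # empty batch; payload shape is illustrative
    timeout=10,
)
# A 401/403 here means the tokens still disagree; any other status means
# the request cleared authentication.
print(resp.status_code, resp.text[:200])
```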
**Symptom:** The API is returning `413 Payload Too Large` errors.
1.  **Check Scraper Batch Size:** The scraper's `INGEST_BATCH_SIZE` environment variable may be set too high (a size-estimation sketch follows this playbook).
2.  **Reduce Batch Size:** Lower the value of `INGEST_BATCH_SIZE` in the scraper's `.env` file and restart the service.
3.  **Check API Limits (if necessary):** If reducing the batch size is not desirable, you can raise the limit on the API side by adjusting `INGEST_MAX_BODY_BYTES` in the API's `.env` file. Do this with caution: larger request bodies increase per-request memory and processing time on the API.
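When tuning the batch size, it helps to know how large a batch actually serializes. A minimal sketch, assuming batches are sent as a JSON body; the sample article shape is illustrative:

```python
import json

def batch_bytes(articles: list[dict]) -> int:
    """Size in bytes of the JSON body the scraper would send."""
    return len(json.dumps({"articles": articles}).encode("utf-8"))

# Illustrative worst-case batch: 50 articles with ~20 KB of content each.
sample = [{"title": "t" * 80, "content": "x" * 20_000}] * 50
size = batch_bytes(sample)
print(f"{size} bytes ({size / 1_048_576:.2f} MiB)")
# Compare the result against the API's INGEST_MAX_BODY_BYTES to choose a
# batch size with comfortable headroom.
```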
## Post-Incident Actions
- **Strengthen Contract Testing:** Implement automated contract tests between the scraper and the API to catch validation issues in CI before they reach production (see the sketch below).
- **Improve API Error Logging:** Enhance the API's exception handler to always log the full validation error details, making it easier to diagnose which fields are failing.
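For the contract-testing action, one lightweight approach is to validate a representative scraper payload against the API's request schema in CI. The pytest sketch below assumes the API's schema is a Pydantic v2 model; the model names and fields are stand-ins, not this codebase's actual definitions:

```python
import pytest
from pydantic import BaseModel, ValidationError

# Stand-ins for the API's real request models (assumed names and shapes).
class Article(BaseModel):
    title: str
    url: str

class IngestRequest(BaseModel):
    articles: list[Article]

def test_scraper_payload_matches_api_contract():
    # In a real suite, load a fixture produced by the scraper's providers.
    payload = {"articles": [{"title": "Example", "url": "https://example.com/a"}]}
    IngestRequest.model_validate(payload)  # raises ValidationError on drift

def test_missing_title_is_rejected():
    payload = {"articles": [{"url": "https://example.com/a"}]}
    with pytest.raises(ValidationError):
        IngestRequest.model_validate(payload)
```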