Scraper Service Runbook
For On-Call Engineers
This document is the primary operational playbook for the Scraper Service. It contains standardized procedures for deployment, maintenance, and incident response. Read and execute these steps carefully.
1. Standard Deployment Process

Objective: To safely deploy a new version of the Scraper service to production.

This process assumes the new Docker image (`labeeb/scraper:new-version`) has already been built and pushed to the container registry by the CI/CD pipeline.
Deployment Checklist

1. Announce Deployment:
   - Notify the team in the appropriate channel (e.g., `#ops`) that you are beginning a deployment.

2. Place System in Maintenance (if required):
   - If the deployment includes breaking changes to profiles or providers, consider pausing the scheduler via the API.

3. Update Service Configuration:
   - Pull the latest `docker-compose.yml` or Kubernetes manifest that points to the new image version.

4. Perform Rolling Restart:
   - Execute the rolling update command to deploy the new version with zero downtime.

5. Verify Deployment:
   - Check that the new container is running and healthy.
   - Tail the logs to ensure the service started without any fatal errors.

6. Announce Completion:
   - Notify the team that the deployment is complete and the service is operational.
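The verification in step 5 can be sketched as a small decision helper: announce completion only when the container is running and the tailed startup logs contain no fatal errors. The state string and log markers below are assumptions for illustration, not values taken from the Scraper's actual output.

```python
def deployment_healthy(container_state: str, log_lines: list[str]) -> bool:
    """Return True when the newly deployed container looks safe to announce.

    `container_state` is the reported container status (e.g. from your
    orchestrator); `log_lines` are the tailed startup logs. The "running"
    state and FATAL/CRITICAL markers are illustrative assumptions.
    """
    if container_state.lower() != "running":
        return False
    # Fail the check on any fatal startup error in the tailed logs.
    return not any("FATAL" in line or "CRITICAL" in line for line in log_lines)
```

In practice you would feed this from your container status command and log tail rather than hard-coded values.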
2. Incident Response Playbooks
This section contains step-by-step checklists for responding to common alerts and incidents.
Playbook: Ingestion Failures (Upstream API Errors)

- Alert Trigger: `ScraperIngestionFailureRateHigh`
- Symptom: Scraper logs show repeated errors when sending data to the core API (e.g., `4xx` or `5xx` status codes).
Incident Response Checklist

1. Acknowledge the Alert: Acknowledge the alert in your monitoring system to notify the team you are investigating.

2. Identify the Error: Check the Scraper logs to identify the specific error message and status code.

3. Triage Based on Status Code:
   - Authentication errors (e.g., 401/403):
     - Meaning: The `INGEST_TOKEN` is incorrect or has expired.
     - Action: Verify that the `INGEST_TOKEN` in the Scraper's environment matches the one expected by the API service. Update and restart the Scraper if necessary.
   - Conflict (409):
     - Meaning: The API is reporting a data conflict (e.g., duplicate `external_id` with a different `content_hash`).
     - Action: This is likely a data issue, not a service failure. Investigate the specific article URL in the logs. No immediate action is usually required unless all ingestions are failing with conflicts.
   - Server errors (5xx):
     - Meaning: The upstream API service is unhealthy.
     - Action: This is not a Scraper issue. Escalate to the team responsible for the API service. See the API Service Runbook for its troubleshooting procedures.

4. Resolve the Incident: Once the root cause is identified and fixed, resolve the alert in your monitoring system and document the incident.
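The triage table above can be sketched as a small helper that maps a status code to the suggested next action. The code groupings follow the Meaning/Action pairs in this playbook; the function itself and its wording are a sketch for illustration, not part of the Scraper codebase.

```python
def triage_ingestion_error(status_code: int) -> str:
    """Return the suggested next action for an ingestion failure."""
    if status_code in (401, 403):
        # Auth failure: the INGEST_TOKEN is incorrect or has expired.
        return "Verify INGEST_TOKEN matches the API; update and restart the Scraper."
    if status_code == 409:
        # Data conflict, e.g. duplicate external_id with a different content_hash.
        return "Likely a data issue; investigate the article URL in the logs."
    if status_code >= 500:
        # The upstream API itself is unhealthy.
        return "Escalate to the API service team; see the API Service Runbook."
    return "Unrecognized status; inspect the Scraper logs for details."
```

A helper like this is handy in an ops notebook or chat-bot integration, but the runbook steps above remain the source of truth.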
Playbook: Profile Validation Errors

- Alert Trigger: `ScraperInvalidProfilesDetected` (via CI/CD pipeline or log monitoring)
- Symptom: Service fails to start, or logs show `invalid profiles skipped` warnings.
Incident Response Checklist

1. Identify the Invalid Profile: Check the service logs at startup for detailed validation errors.
   > The log will specify the filename (e.g., `profiles/new-source.json`) and the reason for failure (e.g., `provider: unknown_provider`).

2. Correct the Profile:
   - Open the invalid JSON profile.
   - Compare its structure against the official schema defined in `scraper/app/data/schemas/profile.schema.json`.
   - Fix the error (e.g., correct a typo in the provider name, fix a data type).

3. Reload Profiles without Restarting:
   - Use the `/profiles/reload` endpoint to apply the fix immediately.

4. Verify the Fix:
   - Check the logs again to confirm the profile loaded successfully.
   - Call the `GET /profiles` endpoint to ensure the corrected profile is now listed.
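A minimal sketch of the kind of validation the service performs at startup. The authoritative rules live in `scraper/app/data/schemas/profile.schema.json`; the required fields and provider names below are hypothetical placeholders chosen for illustration only.

```python
import json

KNOWN_PROVIDERS = {"rss", "html"}               # placeholder provider names
REQUIRED_FIELDS = ("name", "provider", "url")   # placeholder required fields


def validate_profile(raw: str) -> list[str]:
    """Return human-readable validation errors for a JSON profile (empty if valid)."""
    try:
        profile = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors = [f"missing field: {field}" for field in REQUIRED_FIELDS
              if field not in profile]
    provider = profile.get("provider")
    if provider is not None and provider not in KNOWN_PROVIDERS:
        # Mirrors the "provider: unknown_provider" failure reason in the logs.
        errors.append(f"provider: unknown_provider ({provider})")
    return errors
```

Running a check like this locally before committing a profile catches most errors before the CI/CD pipeline or the service's startup validation does.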
3. Routine Operations & Maintenance

Objective: To perform regular health checks and preventative maintenance on the Scraper service.
Weekly Maintenance Checklist

1. Review Log Volume:
   - Check the disk space consumed by the scraper's logs and any `.jsonl` output files if `write_to_disk` is used.
   - Ensure log rotation is configured correctly.

2. Audit Scraping Performance:
   - Review monitoring dashboards for scrape job durations. Identify any providers that are consistently slow or timing out.
   - Consider disabling or refactoring poorly performing providers.

3. Check for New Libraries:
   - Periodically check for updates to key dependencies like `FastAPI`, `requests`, and `BeautifulSoup` to incorporate performance and security improvements.
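The log-volume check in step 1 can be scripted: sum the size of the `.jsonl` output files under a directory and compare the total against your disk budget. The directory path in the usage note is an example, not a value from the service configuration.

```python
from pathlib import Path


def jsonl_bytes(directory: str) -> int:
    """Total bytes consumed by .jsonl files under `directory`, searched recursively."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*.jsonl"))
```

For example, `jsonl_bytes("/var/lib/scraper/output")` (hypothetical path) could feed a weekly report or a disk-usage alert threshold.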