Runbook: Source Rate-Limiting (HTTP 429)

Impact: Data Latency & Potential Blocking

This alert fires when a specific data source is rate-limiting our requests. If not addressed, this can lead to significant data delays from that source and could escalate to a temporary or permanent IP block.

Triage Checklist (5 Minutes)

Your immediate goal is to identify the scope of the rate-limiting and assess the immediate impact.

Identify the Failing Source(s): Check the service logs to identify which domain(s) are returning 429 status codes.
```
docker compose logs --tail=200 scraper | grep "429"
```
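Check Backoff Metrics: Examine the Prometheus metrics to see whether the built-in backoff mechanism is absorbing the 429s. Because a retry counter only ever increases, look at its rate: a rate that stays elevated or keeps climbing over several minutes indicates the default retry logic is insufficient.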
```
# Hypothetical PromQL query
rate(scraper_http_retries_total{status_code="429"}[5m])
```
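
To see which sources account for the 429s found in the first check, the log output can be grouped by host. The sketch below is a minimal example and not part of the scraper: the function name `count_429_by_host` is hypothetical, and it assumes each relevant log line contains 429 as a standalone number plus a URL for the source. Adjust the patterns to the real log format.

```python
import re
from collections import Counter

# Assumption: a rate-limited request is logged with the status code 429 as a
# standalone number and the request URL on the same line.
STATUS_429 = re.compile(r"(?<!\d)429(?!\d)")
HOST = re.compile(r"https?://([^/\s:\"']+)")


def count_429_by_host(lines):
    """Count log lines mentioning 429, grouped by the host found in the line."""
    counts = Counter()
    for line in lines:
        if not STATUS_429.search(line):
            continue  # skip lines without a standalone 429
        match = HOST.search(line)
        # Lines with a 429 but no recognizable URL are grouped separately.
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts
```

Pass it any iterable of log lines, for example `sys.stdin` when piping the `docker compose logs` output into a script, and print `counts.most_common()` to rank the sources.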

Inspect Retry-After Header: Check the logs for a Retry-After header from the source. This is a clear directive that we must respect.

```
# Look for log entries like: "Received 429 from example.com, respecting Retry-After header of 60 seconds"
docker compose logs scraper | grep "Retry-After"
```
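
A Retry-After value can be either a number of seconds or an HTTP date, so a quick way to turn a value seen in the logs into a concrete wait time can help during triage. The following is a minimal sketch under that assumption; `retry_after_seconds` is a hypothetical helper, not a function in the scraper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the wait in seconds requested by a Retry-After value.

    Accepts delay-seconds ("60") or an HTTP date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None if the value
    cannot be parsed.
    """
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if target.tzinfo is None:
        # Treat dates without an explicit zone as UTC.
        target = target.replace(tzinfo=timezone.utc)
    # A date in the past means no further wait is required.
    return max(0.0, (target - now).total_seconds())
```

Returning None for unparseable values lets the caller fall back to its default backoff instead of guessing.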

Remediation Steps

Follow these steps to mitigate the issue. Step 1 is the safest immediate action; use Step 2 instead when the source is too critical to disable.

Step 1: Temporarily Disable the Profile

If a single profile is causing a high volume of 429 errors, the safest immediate action is to disable it. This stops all requests to the source and allows the situation to cool down.

Edit the Profile: Open the relevant JSON file in scraper/profiles/ (e.g., scraper/profiles/problem-source.json).

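Set enabled to false (the // annotation below is for illustration only; standard JSON does not allow comments, so do not copy it into the file):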

scraper/profiles/problem-source.json
```
{
  "name": "problem-source",
  "enabled": false, // <-- Change this to false
  ...
}
```

Reload Profiles: Apply the change without restarting the service by calling the reload endpoint.
```
curl -X POST http://localhost:9001/profiles/reload
```
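
During an incident it can be safer to script the edit than to hand-edit JSON. The sketch below is one way to do that, assuming the profile file is strict JSON with a top-level object; `set_profile_field` and `reload_profiles` are hypothetical helper names, and the reload URL is the one used in this runbook.

```python
import json
import urllib.request

RELOAD_URL = "http://localhost:9001/profiles/reload"  # endpoint from this runbook


def set_profile_field(path, key, value):
    """Set one top-level key in a profile JSON file and return the new profile."""
    with open(path, encoding="utf-8") as f:
        profile = json.load(f)  # raises if the file is not valid JSON
    profile[key] = value
    # The file is rewritten only after it parsed successfully.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(profile, f, indent=2)
        f.write("\n")
    return profile


def reload_profiles(url=RELOAD_URL, timeout=10):
    """POST to the reload endpoint and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (not executed here):
#   set_profile_field("scraper/profiles/problem-source.json", "enabled", False)
#   reload_profiles()
```

The same helper can change other top-level keys, such as the schedule, before calling the reload endpoint.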

Step 2: Adjust Scraping Frequency

If the source is critical, a less drastic measure is to reduce the scraping frequency.

Edit the Profile's schedule: In the profile's JSON file, change the cron expression so the job runs less often. For example, going from every 5 minutes to every 30 minutes cuts the run count from 288 to 48 per day.
scraper/profiles/problem-source.json
```
{
  ...
  "schedule": "*/30 * * * *", // Changed from "*/5 * * * *"
  ...
}
```
Reload Profiles: Apply the change by calling the reload endpoint.
```
curl -X POST http://localhost:9001/profiles/reload
```

Step 3: Long-Term Fix (Post-Incident)

To prevent recurrence, a permanent change to the provider or profile may be needed.

Prefer RSS/API: If the profile uses a generic HTML scraper, investigate whether the source provides an RSS feed or a public API. These are intended for automated clients, so they are usually lighter to fetch and less likely to be rate-limited.
Implement Provider-Specific Backoff: For high-value sources, consider creating a dedicated provider class that implements more sophisticated, source-specific rate-limiting logic.
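
As an illustration of what source-specific backoff could look like, the sketch below tracks each host separately, applies exponential backoff with full jitter, and never waits less than a Retry-After value from the source. It is an assumption-laden example, not the scraper's provider interface: the class name `SourceThrottle`, its methods, and the default `base` and `cap` values are all hypothetical.

```python
import random
import time


class SourceThrottle:
    """Per-host backoff state: exponential with full jitter, honoring Retry-After."""

    def __init__(self, base=1.0, cap=300.0, clock=time.monotonic, rng=random.random):
        self.base = base    # ceiling for the first retry, in seconds
        self.cap = cap      # upper bound on the jittered delay, in seconds
        self.clock = clock  # injectable for testing
        self.rng = rng      # injectable for testing; returns a float in [0, 1)
        self._attempts = {}
        self._not_before = {}

    def record_429(self, host, retry_after=None):
        """Register a 429 for host and return the delay before the next request."""
        attempt = self._attempts.get(host, 0)
        ceiling = min(self.cap, self.base * 2 ** attempt)
        delay = self.rng() * ceiling  # full jitter: uniform in [0, ceiling)
        if retry_after is not None:
            # Never retry sooner than the source asked for.
            delay = max(delay, retry_after)
        self._attempts[host] = attempt + 1
        self._not_before[host] = self.clock() + delay
        return delay

    def record_success(self, host):
        """Clear backoff state for host after a successful request."""
        self._attempts.pop(host, None)
        self._not_before.pop(host, None)

    def wait_time(self, host):
        """Seconds remaining before host may be requested again."""
        return max(0.0, self._not_before.get(host, 0.0) - self.clock())
```

A provider would call `wait_time(host)` before each request, `record_429` when the source responds with 429, and `record_success` otherwise. Jitter spreads retries out so that several workers do not hit the source again at the same moment.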