Rate limiting is how APIs protect themselves from abuse and ensure fair usage. But from the consumer side, hitting a rate limit at 2 AM during a traffic surge is a nightmare — your service degrades, queues back up, and users experience failures that look like your bugs, not the API provider's. Monitoring your API consumption against limits is how you catch these problems before they cascade.
Understanding API Rate Limits
Rate limits come in several flavors, and each requires different monitoring:
| Limit Type | Example | Risk |
|---|---|---|
| Requests per second | 100 req/s | Burst traffic causes throttling |
| Requests per minute | 1000 req/min | Gradual consumption hits limits |
| Requests per hour | 10000 req/hour | Long-running jobs exhaust quota |
| Requests per day | 50000 req/day | High-traffic days exceed daily quota |
| Concurrent connections | 50 concurrent | Parallel processing hits connection limits |
| Data transfer | 10 GB/hour | Large payload operations hit bandwidth limits |
Most APIs enforce multiple limit types simultaneously. You might be well within your per-minute limit but hitting your per-second burst limit during a synchronous batch operation.
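To make the overlap between limit types concrete, here is a minimal sketch that counts the same requests against two windows at once. The class name `MultiWindowTracker` and its methods are hypothetical, and the limits (100 per second, 1000 per minute) are taken from the table above purely for illustration.

```python
from collections import deque


class MultiWindowTracker:
    """Counts requests against several rate limit windows at once."""

    def __init__(self, limits):
        # limits maps window length in seconds -> max requests in that window
        self.limits = dict(limits)
        self.timestamps = deque()

    def record(self, now):
        """Record one request sent at time `now` (in seconds)."""
        self.timestamps.append(now)

    def usage(self, now):
        """Return {window_seconds: fraction_of_limit_used} at time `now`."""
        longest = max(self.limits)
        # Drop requests older than the longest window; they no longer count
        while self.timestamps and self.timestamps[0] <= now - longest:
            self.timestamps.popleft()
        result = {}
        for window, limit in self.limits.items():
            count = sum(1 for t in self.timestamps if t > now - window)
            result[window] = count / limit
        return result


# A batch job fires 100 requests within a tenth of a second
tracker = MultiWindowTracker({1: 100, 60: 1000})
for i in range(100):
    tracker.record(10.0 + i / 1000)

usage = tracker.usage(now=10.5)
# The 1-second window is fully used while the 60-second window is at 10%
```

A per-minute gauge alone would report this client as healthy, which is why each window needs its own metric.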
Reading Rate Limit Headers
Many APIs expose current usage through response headers. Parse these in your monitoring:
def parse_rate_limit_headers(response_headers):
    """
    Parse common rate limit headers from an API response.

    Header names vary between APIs, so check each provider's documentation.
    Expects a case-insensitive mapping such as requests' response.headers.
    """
    limits = {}

    # De facto convention, also used by GitHub:
    # X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.
    # (RFC 6585 defines only the 429 status code, not these headers.)
    if 'X-RateLimit-Limit' in response_headers:
        prefix = 'X-RateLimit-'
    # Hyphenated variant used by the X (Twitter) API
    elif 'X-Rate-Limit-Limit' in response_headers:
        prefix = 'X-Rate-Limit-'
    else:
        prefix = None

    if prefix:
        limits['limit'] = int(response_headers[prefix + 'Limit'])
        limits['remaining'] = int(response_headers.get(prefix + 'Remaining', 0))
        # Reset is a Unix timestamp on some APIs and seconds-until-reset on others
        limits['reset'] = int(response_headers.get(prefix + 'Reset', 0))
        if limits['limit'] > 0:
            limits['usage_pct'] = (
                (limits['limit'] - limits['remaining']) / limits['limit'] * 100
            )

    # Retry-After (sent with 429 and 503 responses) holds either a number of
    # seconds or an HTTP-date; only the numeric form is handled here.
    retry_after = response_headers.get('Retry-After', '')
    if retry_after.isdigit():
        limits['retry_after_seconds'] = int(retry_after)

    return limits
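The Retry-After header can carry either a delay in seconds or an HTTP-date, so a parser that only understands integers will miss the date form. The sketch below handles both using the standard library. The function name `parse_retry_after` and its `now` parameter (added so the result can be tested deterministically) are assumptions, not part of any particular client library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if unparseable."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: a non-negative integer number of seconds
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately
    return max(0.0, (retry_at - now).total_seconds())
```

Returning `None` for unparseable values lets the caller fall back to its own backoff schedule instead of crashing inside the monitoring path.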
Track these values over time to spot trends before you hit the limit:
// Track rate limit consumption
function trackRateLimitUsage(apiName, headers) {
  // parseRateLimitHeaders: a JavaScript equivalent of the Python parser above
  const usage = parseRateLimitHeaders(headers);
  if (usage.usage_pct === undefined) {
    return; // No rate limit headers on this response
  }

  metrics.gauge('api.rate_limit.remaining', usage.remaining, {
    api: apiName
  });
  metrics.gauge('api.rate_limit.usage_percent', usage.usage_pct, {
    api: apiName
  });

  const pct = usage.usage_pct.toFixed(1);

  // Alert if approaching limit
  if (usage.usage_pct > 80) {
    console.warn(`[WARN] ${apiName}: ${pct}% of rate limit consumed`);
  }
  if (usage.usage_pct > 95) {
    alerting.send({
      severity: 'critical',
      message: `${apiName} rate limit ${pct}% exhausted`,
      remaining: usage.remaining,
      // Assumes reset is a Unix timestamp in seconds
      reset_in: Math.max(0, Math.round(usage.reset - Date.now() / 1000)),
    });
  }
}
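Spotting a trend means projecting where consumption is heading, not only reading the current percentage. One simple approach, sketched here under the assumption that you store `(timestamp, remaining)` samples and that the reset value is a Unix timestamp on the same clock, is to extrapolate the average burn rate. The function names are hypothetical.

```python
def seconds_until_exhausted(samples):
    """
    Estimate when `remaining` reaches zero from (timestamp, remaining) samples.

    Uses the average burn rate between the first and last sample. Returns
    None if there are too few samples or quota is not being consumed
    (for example, right after the window reset).
    """
    if len(samples) < 2:
        return None
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    elapsed = t1 - t0
    consumed = r0 - r1
    if elapsed <= 0 or consumed <= 0:
        return None
    burn_rate = consumed / elapsed  # requests per second
    return r1 / burn_rate


def will_exhaust_before_reset(samples, reset_epoch):
    """True if the projected exhaustion time falls before the window resets."""
    eta = seconds_until_exhausted(samples)
    if eta is None:
        return False
    last_timestamp = samples[-1][0]
    return last_timestamp + eta < reset_epoch
```

For example, samples of `[(0, 1000), (100, 800)]` give a burn rate of 2 requests per second, so the remaining 800 requests last another 400 seconds. If the window resets later than that, an alert can fire well before usage crosses 80%.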
Setting Up 429 Monitoring
A 429 (Too Many Requests) response is the API telling you that you've crossed a rate limit. Monitor for these specifically:
# Alert on 429 response rates
alert:
  name: "API Rate Limit Exceeded"
  metric: "http_requests_total{status='429'}"
  condition: "rate(http_requests_total{status='429'}[5m]) > 0"
  severity: warning
  message: "429 responses detected from upstream API - rate limiting in effect"
When you start seeing 429s, you're already past the prevention stage. Set up earlier warnings:
# Warn before hitting limits
# (a list, because repeating the same key twice in one YAML mapping is invalid)
alerts:
  - name: "Rate Limit 80% Consumed"
    condition: "api_rate_limit_usage_percent > 80"
    severity: warning
  - name: "Rate Limit 95% Consumed"
    condition: "api_rate_limit_usage_percent > 95"
    severity: critical
Monitoring Multiple APIs
If you depend on multiple external APIs, track consumption separately for each:
| API | Limit | Current Usage | Status |
|---|---|---|---|
| Stripe API | 100 req/s | 12 req/s avg | Healthy |
| SendGrid | 100k/day | 34k used | Healthy |
| Google Maps | 10k/day | 8.2k used | Warning |
| Twilio SMS | 5 req/s | 4.8 req/s | Critical |
| OpenAI API | 60 req/min | 45 req/min | Warning |
This dashboard view lets you triage: which API is closest to its limit right now?
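The triage itself is a sort by fraction of limit consumed. Here is a minimal sketch using the figures from the table. The `ApiUsage` type, the function names, and the 70% and 90% thresholds are assumptions, with the thresholds chosen because they reproduce the statuses shown in the table.

```python
from dataclasses import dataclass


@dataclass
class ApiUsage:
    name: str
    used: float   # current usage, in the same unit as `limit`
    limit: float

    @property
    def fraction(self):
        return self.used / self.limit


def status(fraction, warn=0.70, critical=0.90):
    """Map a usage fraction to a dashboard status (thresholds are assumed)."""
    if fraction >= critical:
        return "Critical"
    if fraction >= warn:
        return "Warning"
    return "Healthy"


def triage(apis):
    """Return (name, percent_used, status) tuples, closest to limit first."""
    ranked = sorted(apis, key=lambda a: a.fraction, reverse=True)
    return [(a.name, round(a.fraction * 100), status(a.fraction)) for a in ranked]


apis = [
    ApiUsage("Stripe API", 12, 100),
    ApiUsage("SendGrid", 34_000, 100_000),
    ApiUsage("Google Maps", 8_200, 10_000),
    ApiUsage("Twilio SMS", 4.8, 5),
    ApiUsage("OpenAI API", 45, 60),
]
report = triage(apis)
```

Normalizing every API to a fraction is what makes limits with different units (per second, per minute, per day) comparable on one dashboard.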
Implementing Adaptive Rate Limiting
The best defense against hitting rate limits is adaptive consumption — slow down when approaching limits:
// Adaptive API client with rate limit awareness.
// parseRemainingFromHeaders, parseLimitFromHeaders, parseRetryAfter and
// RateLimitError are helpers defined elsewhere in your codebase.
import (
	"net/http"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

type AdaptiveClient struct {
	baseURL      string
	rateLimiter  *rate.Limiter
	window       time.Duration // length of the provider's rate limit window
	currentUsage float64
	mu           sync.RWMutex
}

func (c *AdaptiveClient) Do(req *http.Request) (*http.Response, error) {
	// Apply local rate limiting
	if err := c.rateLimiter.Wait(req.Context()); err != nil {
		return nil, err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}

	// Parse and react to rate limit headers
	remaining := parseRemainingFromHeaders(resp.Header)
	limit := parseLimitFromHeaders(resp.Header)
	if limit > 0 {
		usage := float64(limit-remaining) / float64(limit)
		c.mu.Lock()
		c.currentUsage = usage
		c.mu.Unlock()

		if c.window > 0 {
			// rate.Limit is events per second, so convert the per-window quota
			perSecond := float64(limit) / c.window.Seconds()

			// Slow down if approaching limit
			switch {
			case usage > 0.9:
				c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.1))
			case usage > 0.7:
				c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.5))
			default:
				c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.8))
			}
		}
	}

	// Handle 429 explicitly
	if resp.StatusCode == http.StatusTooManyRequests {
		retryAfter := parseRetryAfter(resp.Header)
		resp.Body.Close() // Release the connection before discarding the response
		return nil, &RateLimitError{RetryAfter: retryAfter}
	}
	return resp, nil
}
Monitoring Your Own API's Rate Limiter
If you implement rate limiting on your own API, monitor how it's performing:
# Track rate limiter effectiveness.
# Counter and Gauge are the metric types from your metrics library.
rate_limit_metrics = {
    'allowed_requests': Counter,     # Requests that passed
    'throttled_requests': Counter,   # Requests that hit the limit (429)
    'throttled_by_client': Counter,  # Throttled requests, labeled per client
    'throttle_rate': Gauge,          # Current fraction of requests throttled
}

# Alert if throttling rate is unusual
def check_throttle_anomalies(metrics):
    if metrics.total == 0:
        return  # No traffic, nothing to evaluate
    throttle_rate = metrics.throttled / metrics.total
    if throttle_rate > 0.1:  # More than 10% throttled
        # Could indicate a legitimate traffic surge or an abusive client
        alert("High throttle rate - investigate traffic patterns")

    # Check if a single client is responsible for most throttling
    top_client_throttles = get_top_throttled_clients(limit=5)
    if top_client_throttles and top_client_throttles[0].percentage > 0.8:
        top = top_client_throttles[0]
        alert(f"Client {top.id} causing {top.percentage:.0%} of throttles")
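The counters above need a limiter that increments them. As an illustration only, the sketch below uses a token bucket (one common limiter design; the source does not specify which algorithm your API uses) that records allowed and throttled requests as it makes each decision. The class name, the single shared bucket, and the injectable `clock` parameter are all assumptions made to keep the example short and testable. Production limiters usually keep one bucket per client.

```python
import time


class MonitoredTokenBucket:
    """Token bucket limiter that counts allowed and throttled requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()
        self.allowed = 0
        self.throttled = 0
        self.throttled_by_client = {}

    def allow(self, client_id):
        """Return True if the request may proceed, updating the counters."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            self.allowed += 1
            return True
        self.throttled += 1
        self.throttled_by_client[client_id] = (
            self.throttled_by_client.get(client_id, 0) + 1
        )
        return False

    def throttle_rate(self):
        """Fraction of all requests seen so far that were throttled."""
        total = self.allowed + self.throttled
        return self.throttled / total if total else 0.0

    def top_throttled_share(self):
        """Return (client_id, share of all throttles) for the worst client."""
        if not self.throttled:
            return None
        client, count = max(self.throttled_by_client.items(), key=lambda kv: kv[1])
        return client, count / self.throttled
```

`throttle_rate()` and `top_throttled_share()` supply the two values the anomaly check compares against its 10% and 80% thresholds.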
Rate Limit Monitoring for Internal APIs
External API rate limits get attention, but internal APIs can also throttle. Microservice-to-microservice rate limiting should be monitored the same way:
# Internal API rate limit monitoring
monitor:
  name: "User Service Rate Limit Check"
  url: "http://user-service.internal/api/health"
  headers:
    X-Service-Name: "monitoring"
  assertions:
    - type: status_code
      value: 200
    - type: response_header
      header: "X-RateLimit-Remaining"
      operator: greater_than
      value: "100"
When internal services start throttling each other, it usually indicates a runaway process, a misconfigured retry loop, or insufficient rate limit configuration for actual traffic levels.
Building a Rate Limit Runbook
When you hit a rate limit, your team needs a clear runbook:
# Rate Limit Response Runbook

## Immediate Actions (0-15 minutes)
1. Identify which API is throttled (check monitoring dashboard)
2. Check if 429 responses are reaching users (check error rate)
3. Enable circuit breaker if available
4. Reduce non-critical API calls (defer background jobs)

## Investigation (15-60 minutes)
1. Identify what caused the spike in API calls
   - New deployment?
   - Unusual traffic pattern?
   - Misconfigured retry logic?
2. Check if usage is sustainable or a one-time spike

## Resolution Options
- **Short term**: Implement exponential backoff on retries
- **Medium term**: Add request queuing/batching
- **Long term**: Request limit increase from provider, or cache aggressively
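The short-term option in the runbook, exponential backoff on retries, can be sketched as follows. This is one common variant ("full jitter", where each delay is drawn at random below an exponentially growing ceiling), not a prescribed implementation. The `RateLimited` exception, the function names, and the default base and cap values are assumptions. Your request function is expected to raise `RateLimited` when it receives a 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None, rng=random.random):
    """Delay in seconds before retrying after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        # The server's hint takes priority over our own schedule
        return retry_after
    ceiling = min(cap, base * 2 ** attempt)
    # Full jitter: pick uniformly below the ceiling so clients do not retry in sync
    return rng() * ceiling


def call_with_backoff(func, max_attempts=5, sleep=time.sleep):
    """Call func(), retrying on RateLimited for up to max_attempts total tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; let the caller handle it
            sleep(backoff_delay(attempt, retry_after=exc.retry_after))
```

Capping both the delay and the number of attempts matters here: an uncapped retry loop is itself one of the "misconfigured retry logic" causes the investigation step looks for.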
Conclusion
Rate limit monitoring is about staying ahead of throttling rather than reacting to it. By tracking consumption trends, alerting at 80% usage, and implementing adaptive clients that slow down gracefully, you keep most rate limit errors from reaching users. AzMonitor can track API response headers and alert you when your usage patterns suggest you're approaching limits, giving your team time to respond before the throttling starts.