Rate limiting is how APIs protect themselves from abuse and ensure fair usage. But from the consumer side, hitting a rate limit at 2 AM during a traffic surge is a nightmare — your service degrades, queues back up, and users experience failures that look like your bugs, not the API provider's. Monitoring your API consumption against limits is how you catch these problems before they cascade.

Understanding API Rate Limits

Rate limits come in several flavors, and each requires different monitoring:

| Limit Type | Example | Risk |
|---|---|---|
| Requests per second | 100 req/s | Burst traffic causes throttling |
| Requests per minute | 1,000 req/min | Gradual consumption hits limits |
| Requests per hour | 10,000 req/hour | Long-running jobs exhaust quota |
| Requests per day | 50,000 req/day | High-traffic days exceed daily quota |
| Concurrent connections | 50 concurrent | Parallel processing hits connection limits |
| Data transfer | 10 GB/hour | Large payload operations hit bandwidth limits |

Most APIs enforce multiple limit types simultaneously. You might be well within your per-minute limit but hitting your per-second burst limit during a synchronous batch operation.
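
To make the overlap between windows concrete, here is a minimal Python sketch that counts your own outgoing requests against several windows at once. The `MultiWindowTracker` class and the 100 req/s and 1,000 req/min figures are illustrative assumptions, not any provider's real limits.

```python
import time
from collections import deque


class MultiWindowTracker:
    """Counts recent requests against several limits at once (hypothetical helper)."""

    def __init__(self, limits):
        # limits maps window length in seconds to max requests in that window
        self.limits = dict(limits)
        self.timestamps = deque()

    def record(self, now=None):
        """Record one outgoing request."""
        self.timestamps.append(time.monotonic() if now is None else now)

    def usage(self, now=None):
        """Return {window_seconds: fraction of that window's limit in use}."""
        now = time.monotonic() if now is None else now
        longest = max(self.limits)
        # Drop timestamps that are older than the longest window
        while self.timestamps and self.timestamps[0] <= now - longest:
            self.timestamps.popleft()
        result = {}
        for window, limit in self.limits.items():
            count = sum(1 for t in self.timestamps if t > now - window)
            result[window] = count / limit
        return result


# Assumed limits: 100 requests per second and 1,000 requests per minute
tracker = MultiWindowTracker({1: 100, 60: 1000})

# Simulate a batch job firing 90 requests within the same second
for i in range(90):
    tracker.record(now=10.0 + i * 0.001)

usage = tracker.usage(now=10.5)
print(usage)  # {1: 0.9, 60: 0.09}
```

The burst uses 90% of the per-second limit while the per-minute limit sits at 9%, which is why a single usage number per API can hide the window that is about to throttle you.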

Reading Rate Limit Headers

Modern APIs expose current usage through response headers. Parse these in your monitoring:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rate_limit_headers(response_headers):
    """
    Parse common rate limit headers from an API response.

    These header names are de facto conventions, not a standard
    (RFC 6585 defines only the 429 status code), so confirm names and
    units in each provider's documentation. Assumes a case-insensitive
    mapping such as the `headers` attribute of a `requests` response.
    """
    limits = {}

    # X-RateLimit-* is used by GitHub and many other APIs;
    # the X/Twitter API spells the same fields X-Rate-Limit-*.
    for prefix in ('X-RateLimit-', 'X-Rate-Limit-'):
        if prefix + 'Limit' not in response_headers:
            continue
        limits['limit'] = int(response_headers[prefix + 'Limit'])
        # Usually a Unix timestamp; some APIs send seconds until reset instead
        limits['reset'] = int(response_headers.get(prefix + 'Reset', 0))
        remaining = response_headers.get(prefix + 'Remaining')
        # Only report usage when it can actually be computed
        if remaining is not None and limits['limit'] > 0:
            limits['remaining'] = int(remaining)
            limits['usage_pct'] = (
                (limits['limit'] - limits['remaining']) / limits['limit'] * 100
            )
        break

    # Retry-After (used in 429 responses) is either a number of seconds
    # or an HTTP-date
    retry_after = response_headers.get('Retry-After')
    if retry_after is not None:
        try:
            limits['retry_after_seconds'] = int(retry_after)
        except ValueError:
            wait = parsedate_to_datetime(retry_after) - datetime.now(timezone.utc)
            limits['retry_after_seconds'] = max(0, int(wait.total_seconds()))

    return limits

Track these values over time to spot trends before you hit the limit:

// Track rate limit consumption
function trackRateLimitUsage(apiName, headers) {
  const usage = parseRateLimitHeaders(headers);

  // Skip responses that carried no usable rate limit headers
  if (typeof usage.usage_pct !== 'number') {
    return;
  }
  const usagePct = usage.usage_pct.toFixed(1);

  metrics.gauge('api.rate_limit.remaining', usage.remaining, {
    api: apiName
  });

  metrics.gauge('api.rate_limit.usage_percent', usage.usage_pct, {
    api: apiName
  });

  // Warn when approaching the limit
  if (usage.usage_pct > 80) {
    console.warn(`[WARN] ${apiName}: ${usagePct}% of rate limit consumed`);
  }

  // Escalate when the limit is nearly exhausted
  if (usage.usage_pct > 95) {
    alerting.send({
      severity: 'critical',
      message: `${apiName} rate limit ${usagePct}% consumed`,
      remaining: usage.remaining,
      // Assumes `reset` is a Unix timestamp in seconds
      reset_in: Math.max(0, usage.reset - Date.now() / 1000),
    });
  }
}
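
Spotting a trend means more than reading the latest value. One simple approach, sketched below in Python, is to estimate the burn rate from a few `(time, remaining)` samples and project when the quota will reach zero. The function names and sample values are hypothetical, and the straight-line projection assumes all samples fall within the same rate limit window.

```python
def seconds_until_exhausted(samples):
    """
    Estimate seconds until `remaining` reaches zero.

    samples: list of (unix_time, remaining) tuples, oldest first.
    Returns None when quota is not being consumed.
    """
    if len(samples) < 2:
        return None
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    elapsed = t1 - t0
    consumed = r0 - r1
    if elapsed <= 0 or consumed <= 0:
        return None
    burn_rate = consumed / elapsed  # requests per second
    return r1 / burn_rate


def will_exhaust_before_reset(samples, reset_at):
    """True if the projected exhaustion time falls before the window resets."""
    eta = seconds_until_exhausted(samples)
    if eta is None:
        return False
    latest_time = samples[-1][0]
    return latest_time + eta < reset_at


# 1,200 requests consumed over 120 seconds: a burn rate of 10 req/s
samples = [(1000, 5000), (1060, 4400), (1120, 3800)]

print(seconds_until_exhausted(samples))          # 380.0
print(will_exhaust_before_reset(samples, 1600))  # True
print(will_exhaust_before_reset(samples, 1400))  # False
```

Alerting on the projection rather than on a fixed percentage can give earlier warning during a sudden surge, at the cost of some noise when traffic is bursty.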

Setting Up 429 Monitoring

A 429 (Too Many Requests) response is the API telling you that you've crossed a rate limit. Monitor for these specifically:

# Alert on 429 response rates
alert:
  name: "API Rate Limit Exceeded"
  metric: "http_requests_total{status='429'}"
  condition: "rate(http_requests_total{status='429'}[5m]) > 0"
  severity: warning
  message: "429 responses detected from upstream API - rate limiting in effect"

When you start seeing 429s, you're already past the prevention stage. Set up earlier warnings:

# Warn before hitting limits
# (separate YAML documents, so the two alert keys do not collide)
alert:
  name: "Rate Limit 80% Consumed"
  condition: "api_rate_limit_usage_percent > 80"
  severity: warning
---
alert:
  name: "Rate Limit 95% Consumed"
  condition: "api_rate_limit_usage_percent > 95"
  severity: critical

Monitoring Multiple APIs

If you depend on multiple external APIs, track consumption separately for each:

| API | Limit | Current Usage | Status |
|---|---|---|---|
| Stripe API | 100 req/s | 12 req/s avg | Healthy |
| SendGrid | 100k/day | 34k used | Healthy |
| Google Maps | 10k/day | 8.2k used | Warning |
| Twilio SMS | 5 req/s | 4.8 req/s | Critical |
| OpenAI API | 60 req/min | 45 req/min | Warning |

This dashboard view lets you triage: which API is closest to its limit right now?
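
Because these limits use different units, triage is easiest when each API is normalized to a utilization fraction first. The Python sketch below does that for the figures in the table. The `ApiUsage` class and the 70% and 95% status thresholds are assumptions chosen to reproduce the statuses shown, so tune them to your own risk tolerance.

```python
from dataclasses import dataclass


@dataclass
class ApiUsage:
    name: str
    used: float
    limit: float
    unit: str

    @property
    def utilization(self):
        """Fraction of the limit currently in use, regardless of unit."""
        return self.used / self.limit

    @property
    def status(self):
        # Assumed thresholds, not taken from any provider
        if self.utilization >= 0.95:
            return "Critical"
        if self.utilization >= 0.70:
            return "Warning"
        return "Healthy"


def triage(apis):
    """Sort APIs so the one closest to its limit comes first."""
    return sorted(apis, key=lambda a: a.utilization, reverse=True)


apis = [
    ApiUsage("Stripe API", 12, 100, "req/s"),
    ApiUsage("SendGrid", 34_000, 100_000, "req/day"),
    ApiUsage("Google Maps", 8_200, 10_000, "req/day"),
    ApiUsage("Twilio SMS", 4.8, 5, "req/s"),
    ApiUsage("OpenAI API", 45, 60, "req/min"),
]

for api in triage(apis):
    print(f"{api.name:<12} {api.utilization:>4.0%}  {api.status}")
```

Sorting by utilization puts Twilio SMS at the top of the list, even though its absolute request count is the smallest of the five.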

Implementing Adaptive Rate Limiting

The best defense against hitting rate limits is adaptive consumption — slow down when approaching limits:

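package apiclient

import (
    "fmt"
    "net/http"
    "strconv"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// RateLimitError is returned when the upstream API responds with 429.
type RateLimitError struct {
    RetryAfter time.Duration
}

func (e *RateLimitError) Error() string {
    return fmt.Sprintf("rate limited, retry after %s", e.RetryAfter)
}

// headerInt reads an integer header, returning 0 if it is missing or malformed.
func headerInt(h http.Header, name string) int {
    n, _ := strconv.Atoi(h.Get(name))
    return n
}

// Adaptive API client with rate limit awareness
type AdaptiveClient struct {
    baseURL      string
    window       time.Duration // period the provider's limit covers, e.g. time.Minute
    rateLimiter  *rate.Limiter
    currentUsage float64
    mu           sync.RWMutex
}

func (c *AdaptiveClient) Do(req *http.Request) (*http.Response, error) {
    // Apply local rate limiting
    if err := c.rateLimiter.Wait(req.Context()); err != nil {
        return nil, err
    }

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }

    // Parse and react to rate limit headers
    remaining := headerInt(resp.Header, "X-RateLimit-Remaining")
    limit := headerInt(resp.Header, "X-RateLimit-Limit")

    if limit > 0 && c.window > 0 {
        usage := float64(limit-remaining) / float64(limit)
        c.mu.Lock()
        c.currentUsage = usage
        c.mu.Unlock()

        // The header counts requests per window; rate.Limit is events per second
        perSecond := float64(limit) / c.window.Seconds()

        // Slow down if approaching limit
        switch {
        case usage > 0.9:
            c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.1))
        case usage > 0.7:
            c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.5))
        default:
            c.rateLimiter.SetLimit(rate.Limit(perSecond * 0.8))
        }
    }

    // Handle 429 explicitly
    if resp.StatusCode == http.StatusTooManyRequests {
        // Assumes Retry-After is given in seconds rather than as an HTTP-date
        retryAfter := time.Duration(headerInt(resp.Header, "Retry-After")) * time.Second
        resp.Body.Close() // the caller never sees resp, so release the connection here
        return nil, &RateLimitError{RetryAfter: retryAfter}
    }

    return resp, nil
}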

Monitoring Your Own API's Rate Limiter

If you implement rate limiting on your own API, monitor how it's performing:

# Assumes the prometheus_client library for the metric types
from prometheus_client import Counter, Gauge

# Track rate limiter effectiveness
rate_limit_metrics = {
    'allowed_requests': Counter,     # Requests that passed
    'throttled_requests': Counter,   # Requests that hit limit (429)
    'throttled_by_client': Counter,  # Throttled requests, labeled by client
    'throttle_rate': Gauge,          # Current throttle rate
}

# Alert if throttling rate is unusual.
# alert() and get_top_throttled_clients() stand in for your own alerting
# and metrics query helpers.
def check_throttle_anomalies(metrics):
    if metrics.total == 0:
        return  # No traffic in this interval, nothing to evaluate

    throttle_rate = metrics.throttled / metrics.total

    if throttle_rate > 0.1:  # More than 10% throttled
        # Could indicate a legitimate traffic surge or an abusive client
        alert(f"High throttle rate ({throttle_rate:.0%}) - investigate traffic patterns")

    # Check if a single client is responsible for most throttling
    top_client_throttles = get_top_throttled_clients(limit=5)
    if top_client_throttles and top_client_throttles[0].percentage > 0.8:
        top = top_client_throttles[0]
        alert(f"Client {top.id} causing {top.percentage:.0%} of throttles")

Rate Limit Monitoring for Internal APIs

External API rate limits get attention, but internal APIs can also throttle. Microservice-to-microservice rate limiting should be monitored the same way:

# Internal API rate limit monitoring
monitor:
  name: "User Service Rate Limit Check"
  url: "http://user-service.internal/api/health"
  headers:
    X-Service-Name: "monitoring"
  assertions:
    - type: status_code
      value: 200
    - type: response_header
      header: "X-RateLimit-Remaining"
      operator: greater_than
      value: "100"

When internal services start throttling each other, it usually indicates a runaway process, a misconfigured retry loop, or rate limits set too low for actual traffic levels.

Building a Rate Limit Runbook

When you hit a rate limit, your team needs a clear runbook:

# Rate Limit Response Runbook

## Immediate Actions (0-15 minutes)
1. Identify which API is throttled (check monitoring dashboard)
2. Check if 429 responses are reaching users (check error rate)
3. Enable circuit breaker if available
4. Reduce non-critical API calls (defer background jobs)

## Investigation (15-60 minutes)
1. Identify what caused the spike in API calls
   - New deployment?
   - Unusual traffic pattern?
   - Misconfigured retry logic?
2. Check if usage is sustainable or a one-time spike

## Resolution Options
- **Short term**: Implement exponential backoff on retries
- **Medium term**: Add request queuing/batching
- **Long term**: Request limit increase from provider, or cache aggressively
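
The short-term option in the runbook, exponential backoff on retries, can be sketched in Python as follows. This is one common variant ("full jitter") under stated assumptions: the `RateLimited` exception, `backoff_delay`, and `call_with_backoff` are hypothetical names, and your HTTP layer is assumed to raise `RateLimited` when it sees a 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on a 429 response (hypothetical)."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(request_fn, max_attempts=5, sleep=time.sleep):
    """Call request_fn, retrying on RateLimited with growing, randomized delays."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts, surface the error to the caller
            # Prefer the server's Retry-After hint when it is present
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = backoff_delay(attempt)
            sleep(delay)


# Usage: a fake request that is throttled twice, then succeeds
attempts = {"n": 0}


def flaky_request():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RateLimited(retry_after=0.01)
    if attempts["n"] == 2:
        raise RateLimited()
    return "ok"


slept = []  # Collect delays instead of actually sleeping
result = call_with_backoff(flaky_request, sleep=slept.append)
print(result, len(slept))  # ok 2
```

The jitter matters as much as the exponent: without it, many clients throttled at the same moment retry at the same moment and trigger the limit again.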

Conclusion

Rate limit monitoring is about staying ahead of throttling rather than reacting to it. By tracking consumption trends, alerting at 80% usage, and implementing adaptive clients that slow down gracefully, you can keep most rate limit errors from reaching users. AzMonitor can track API response headers and alert you when your usage patterns suggest you're approaching limits, giving your team time to respond before the throttling starts.

Tags: rate limiting, API throttling, API monitoring, 429 errors

AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.

API Rate Limit Monitoring: Detecting Throttling Before It Breaks Your App
