
Webhook Monitoring: Ensuring Reliable Event Delivery

Learn how to monitor webhooks for delivery failures, latency issues, and payload validation. Ensure your event-driven integrations stay reliable in production.

AzMonitor Team · February 5, 2025 · 8 min read · 1,290 words · Updated January 20, 2026
Tags: webhook monitoring, event delivery, API integration, reliability

Webhooks are the silent workhorses of modern integrations. Payment confirmations, user signups, deployment triggers, CI/CD pipelines — all of them rely on HTTP callbacks that fire when something happens. The problem is that webhooks are fundamentally fire-and-forget: the sender fires an event and hopes your endpoint receives it. When something goes wrong, neither side has great visibility into what happened.

The Webhook Reliability Problem

Unlike APIs where you initiate the request, webhooks are incoming. Your endpoint has to be available exactly when the sender decides to fire. If you're down for a 2-minute deployment, you might miss a payment confirmation, a fraud alert, or a critical inventory update. Most providers retry failed deliveries, but retry windows, schedules, and behavior vary wildly:

| Provider | Retry Window | Retry Strategy | Max Retries |
|---|---|---|---|
| Stripe | 72 hours | Exponential backoff | 25 |
| GitHub | None (no automatic retries) | Manual or API redelivery only | 0 |
| Twilio | 11 hours | Exponential backoff | 7 |
| SendGrid | 72 hours | Exponential backoff | 10 |
| Shopify | 48 hours | Exponential backoff | 19 |

Retry policies change, so treat these figures as approximate and confirm them in each provider's documentation.

Even with retries, gaps in availability cause headaches. A missed webhook from Stripe means reconciliation work. A missed GitHub webhook means a deployment didn't trigger. The stakes are real.
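
To get a feel for how a retry strategy and an attempt limit interact with downtime, the sketch below generates an exponential backoff schedule and counts how many retries a short outage consumes. The base delay, multiplier, cap, and retry count are illustrative assumptions, not any provider's documented values.

```python
from itertools import accumulate


def backoff_schedule(base_seconds: float, factor: float, max_retries: int,
                     cap_seconds: float) -> list[float]:
    """Return the wait before each retry, growing by `factor` up to a cap."""
    return [min(base_seconds * factor ** attempt, cap_seconds)
            for attempt in range(max_retries)]


def retries_within(schedule: list[float], outage_seconds: float) -> int:
    """Count the retries that fire while the endpoint is still down."""
    # accumulate() yields the time of each retry measured from the first failure.
    return sum(1 for fired_at in accumulate(schedule) if fired_at <= outage_seconds)


# Hypothetical policy: first retry after 60s, doubling, capped at 1 hour, 10 retries.
schedule = backoff_schedule(base_seconds=60, factor=2, max_retries=10, cap_seconds=3600)
print(schedule)
print(f"Total retry window: {sum(schedule) / 3600:.1f} hours")
print(f"Retries consumed by a 2-minute outage: {retries_within(schedule, 120)}")
```

Under this hypothetical policy the whole schedule spans roughly five hours, so an outage longer than that would lose the event entirely. Plugging in a provider's real numbers shows how much downtime its retries can absorb.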

Monitoring Your Webhook Endpoint

The first line of defense is monitoring your webhook receiver endpoint itself:

monitor:
  name: "Stripe Webhook Receiver"
  url: "https://api.example.com/webhooks/stripe"
  method: POST
  interval: 60
  headers:
    Content-Type: "application/json"
    Stripe-Signature: "t=1234567890,v1=test_signature"
  body: |
    {
      "type": "health.check",
      "data": { "object": {} }
    }
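  assertions:
    # The signature above is deliberately fake, so a receiver that validates
    # signatures should reject it. A 401 (the status the handler later in this
    # post returns) proves the endpoint is reachable and validation is active;
    # use 400 if that is what your receiver returns.
    - type: status_code
      value: 401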
    - type: response_time
      operator: less_than
      value: 3000

This catches basic availability issues. But webhook monitoring goes deeper than uptime.

Validating Webhook Signatures

Most providers sign their webhook payloads so you can verify authenticity. Your receiver must validate these signatures or risk processing forged events. Monitoring should verify your signature validation code is working correctly:

# Stripe signature validation example
import hmac
import hashlib
import time

def validate_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Validate Stripe webhook signature"""
    try:
        # Parse the signature header
        # Format: t=timestamp,v1=signature
        elements = sig_header.split(',')
        timestamp = None
        signatures = []
        
        for element in elements:
            prefix, value = element.split('=', 1)
            if prefix == 't':
                timestamp = int(value)
            elif prefix == 'v1':
                signatures.append(value)
        
        if not timestamp or not signatures:
            return False
        
        # Check timestamp freshness (5 minutes)
        if abs(time.time() - timestamp) > 300:
            return False
        
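        # Compute expected signature over the raw bytes "<timestamp>.<payload>"
        # (no decode step, so a payload that is not valid UTF-8 cannot raise here)
        signed_payload = f"{timestamp}.".encode('utf-8') + payload
        expected = hmac.new(
            secret.encode('utf-8'),
            signed_payload,
            hashlib.sha256
        ).hexdigest()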
        
        # Compare signatures
        return any(hmac.compare_digest(expected, sig) for sig in signatures)
        
    except Exception:
        return False

Test signature validation in your monitoring by sending both valid and invalid signatures to confirm your security checks are functioning.
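
One way to exercise both paths is to generate headers yourself with a test secret. The following is a minimal sketch under that assumption: `sign` and `verify` are hypothetical helper names, `verify` is a compact stand-in for your real validator, and the secret is made up.

```python
import hashlib
import hmac
import time


def sign(payload: bytes, secret: str, timestamp: int) -> str:
    """Build a Stripe-style header: t=<timestamp>,v1=<hex HMAC-SHA256>."""
    signed_payload = f"{timestamp}.".encode("utf-8") + payload
    digest = hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(payload: bytes, header: str, secret: str, tolerance: int = 300) -> bool:
    """Compact verifier: parse the header, check freshness, compare digests."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        expected = sign(payload, secret, timestamp).split("v1=", 1)[1]
        fresh = abs(time.time() - timestamp) <= tolerance
        return fresh and hmac.compare_digest(expected, parts["v1"])
    except (KeyError, ValueError, TypeError):
        return False


SECRET = "whsec_test_secret"  # hypothetical test secret, never a production value
body = b'{"type": "health.check"}'
now = int(time.time())

# Each case: (payload sent, signature header sent, whether it should be accepted)
cases = {
    "valid": (body, sign(body, SECRET, now), True),
    "tampered body": (body + b" ", sign(body, SECRET, now), False),
    "wrong secret": (body, sign(body, "whsec_other", now), False),
    "stale timestamp": (body, sign(body, SECRET, now - 3600), False),
    "malformed header": (body, "garbage", False),
}
for name, (payload, header, expected) in cases.items():
    outcome = "ok" if verify(payload, header, SECRET) == expected else "UNEXPECTED"
    print(f"{name}: {outcome}")
```

In a monitoring check, the same cases would be sent to the real endpoint, asserting a success status for the valid case and a rejection for every other one.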

Tracking Webhook Delivery Metrics

Beyond endpoint uptime, track these webhook-specific metrics:

Delivery rate — What percentage of webhook events are received successfully? If your provider's dashboard shows 1000 events sent but your database shows 950 processed, you have a 5% loss rate.

Processing latency — How long from receipt to processing completion? For time-sensitive operations like fraud prevention, even 30 seconds matters.

Error rate by event type — Some event types might fail consistently due to schema mismatches or missing handlers.

Retry frequency — High retry rates from providers indicate your endpoint is returning errors more often than it should.

-- Query for webhook delivery metrics
SELECT
    event_type,
    COUNT(*) as total_received,
    COUNT(CASE WHEN processed_at IS NOT NULL THEN 1 END) as processed,
    COUNT(CASE WHEN failed_at IS NOT NULL THEN 1 END) as failed,
    AVG(EXTRACT(EPOCH FROM (processed_at - received_at))) as avg_processing_seconds,
    PERCENTILE_CONT(0.99) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM (processed_at - received_at))
    ) as p99_processing_seconds
FROM webhook_events
WHERE received_at > NOW() - INTERVAL '24 hours'
GROUP BY event_type
ORDER BY total_received DESC;
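
The delivery-rate comparison described above (events the provider reports as sent versus events you recorded) can be sketched as a set difference over event IDs. This assumes you can export the provider's event IDs for the same time window; `reconcile` and the `evt_` IDs are illustrative names only.

```python
def reconcile(provider_ids: set[str], local_ids: set[str]) -> dict:
    """Compare provider-side event IDs with locally recorded ones."""
    missing = sorted(provider_ids - local_ids)
    if provider_ids:
        delivery_rate = 100 * (len(provider_ids) - len(missing)) / len(provider_ids)
    else:
        delivery_rate = 100.0  # nothing was sent, so nothing was lost
    return {
        "missing_ids": missing,
        "delivery_rate_pct": delivery_rate,
        "loss_rate_pct": 100 - delivery_rate,
    }


# Simulated data: the provider sent 1000 events and every 20th never arrived.
provider = {f"evt_{i:04d}" for i in range(1000)}
local = {event_id for event_id in provider if int(event_id[4:]) % 20 != 0}

report = reconcile(provider, local)
print(f"delivery rate: {report['delivery_rate_pct']}%")
print(f"loss rate: {report['loss_rate_pct']}%")
print(f"first missing IDs: {report['missing_ids'][:3]}")
```

Returning the missing IDs, not just the percentages, gives you a concrete list of events to replay or re-fetch from the provider.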

Idempotent Webhook Processing

Because providers retry webhooks, your processing code must be idempotent. Processing the same event twice should have the same result as processing it once:

// Idempotent webhook handler
func (h *WebhookHandler) ProcessStripeEvent(w http.ResponseWriter, r *http.Request) {
    payload, err := io.ReadAll(r.Body)
    if err != nil {
        http.Error(w, "Bad request", http.StatusBadRequest)
        return
    }
    
    // Parse event
    event, err := webhook.ConstructEvent(payload, r.Header.Get("Stripe-Signature"), h.stripeSecret)
    if err != nil {
        http.Error(w, "Invalid signature", http.StatusUnauthorized)
        return
    }
    
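    // Idempotency check: skip only events that were already processed successfully.
    // Checking mere existence would also skip an event whose earlier attempt failed,
    // so the retry requested by the 500 below would never be reprocessed.
    // WebhookEventProcessed is a hypothetical helper, like the other h.db methods here.
    processed, err := h.db.WebhookEventProcessed(event.ID)
    if err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }
    if processed {
        // Already processed - return 200 so provider doesn't retry
        w.WriteHeader(http.StatusOK)
        return
    }
    
    // Store event ID first (before processing). StoreWebhookEvent is assumed to
    // upsert, so a retry of a previously failed event does not error here.
    if err := h.db.StoreWebhookEvent(event.ID, event.Type); err != nil {
        http.Error(w, "Storage error", http.StatusInternalServerError)
        return
    }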
    
    // Process the event
    if err := h.processEvent(event); err != nil {
        h.db.MarkWebhookFailed(event.ID, err.Error())
        // Return 500 so provider retries
        http.Error(w, "Processing error", http.StatusInternalServerError)
        return
    }
    
    h.db.MarkWebhookProcessed(event.ID)
    w.WriteHeader(http.StatusOK)
}
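
The same idea can be tried end to end without a web server. The sketch below is an assumption-laden illustration, not the handler above: it uses an in-memory SQLite table as a stand-in for a real database, and `make_store`, `set_status`, and `handle_event` are hypothetical names. It skips only events already marked processed, so a retry after a failure is allowed through.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by event ID (stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webhook_events (event_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def set_status(conn: sqlite3.Connection, event_id: str, status: str) -> None:
    # INSERT OR REPLACE acts as an upsert on the primary key.
    conn.execute(
        "INSERT OR REPLACE INTO webhook_events (event_id, status) VALUES (?, ?)",
        (event_id, status),
    )
    conn.commit()


def handle_event(conn: sqlite3.Connection, event_id: str, process) -> str:
    """Run `process` unless the event already succeeded; report what happened."""
    row = conn.execute(
        "SELECT status FROM webhook_events WHERE event_id = ?", (event_id,)
    ).fetchone()
    if row is not None and row[0] == "processed":
        return "duplicate"  # acknowledge with 200 and do no work
    set_status(conn, event_id, "received")
    try:
        process(event_id)
    except Exception:
        set_status(conn, event_id, "failed")
        return "failed"  # respond with 500 so the provider retries
    set_status(conn, event_id, "processed")
    return "processed"


conn = make_store()
handled = []
attempts = {"count": 0}


def flaky_processor(event_id: str) -> None:
    """Fail on the first call to simulate a transient downstream error."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated downstream failure")
    handled.append(event_id)


print(handle_event(conn, "evt_123", flaky_processor))  # first delivery
print(handle_event(conn, "evt_123", flaky_processor))  # provider retry
print(handle_event(conn, "evt_123", flaky_processor))  # later duplicate
print(handled)
```

This sketch does not guard against two deliveries of the same event arriving concurrently. A production version would need to claim the event atomically, for example with a unique constraint and a single conditional insert.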

Webhook Delay Monitoring

Webhooks should arrive shortly after the triggering event. If there's a significant delay between when an event occurs and when your endpoint receives it, that's a sign of provider-side queuing issues or network problems.

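# Monitor webhook delay
# `metrics` and `alert` are assumed to be your own metrics client and alerting helper.
from datetime import datetime, timezone

def check_webhook_delay(event_data):
    """Alert if webhook arrives more than 5 minutes after event creation"""
    # 'created' is a Unix timestamp; compare timezone-aware UTC values so the
    # result does not depend on the server's local timezone
    event_created = datetime.fromtimestamp(event_data['created'], tz=timezone.utc)
    received_at = datetime.now(timezone.utc)
    delay = (received_at - event_created).total_seconds()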
    
    metrics.gauge('webhook.delivery_delay_seconds', delay, {
        'provider': event_data['provider'],
        'event_type': event_data['type'],
    })
    
    if delay > 300:  # 5 minutes
        alert(
            f"Webhook delivered {delay}s after creation",
            severity='warning',
            context={'event_id': event_data['id']}
        )

Building a Webhook Delivery Dashboard

A useful webhook monitoring dashboard shows:

┌─────────────────────────────────────────┐
│ Webhook Health Dashboard                │
├─────────────┬─────────────┬─────────────┤
│ Received    │ Processed   │ Failed      │
│ 1,247       │ 1,241       │ 6           │
│ (last 24h)  │ (99.5%)     │ (0.5%)      │
├─────────────┴─────────────┴─────────────┤
│ By Event Type (last 24h)                │
│ payment.succeeded    │ 892  │ 0 failed  │
│ payment.failed       │ 124  │ 2 failed  │
│ subscription.updated │ 187  │ 4 failed  │
│ customer.created     │  44  │ 0 failed  │
├─────────────────────────────────────────┤
│ Avg Processing Time: 145ms              │
│ P99 Processing Time: 892ms              │
│ Oldest Unprocessed: 3 min ago           │
└─────────────────────────────────────────┘

Testing Webhook Integrations

Use tools like Stripe CLI, ngrok, or webhook.site to test your endpoint during development:

# Stripe CLI - forward events to local server
stripe listen --forward-to localhost:3000/webhooks/stripe

# In another terminal - trigger a test event
stripe trigger payment_intent.succeeded

# Check your local server received and processed it

For staging environments, automate webhook testing as part of your CI pipeline:

# CI webhook integration test
- name: Test Stripe Webhook Handler
  run: |
    # Start local server
    ./server &
    SERVER_PID=$!
    
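    # Send test webhook. It only reaches the local server if a
    # `stripe listen --forward-to` process is forwarding events to it.
    stripe trigger payment_intent.succeeded \
      --stripe-account "$STRIPE_ACCOUNT_ID"
    
    # Verify it was processed; jq -e exits non-zero unless the result is true
    sleep 2
    curl -s http://localhost:8080/health/last-webhook | jq -e '.processed == true'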
    
    kill $SERVER_PID

Alerting on Webhook Failures

Configure alerts for webhook-specific failure patterns:

alerts:
  - name: "Webhook Failure Rate High"
    condition: "webhook_failure_rate > 5% for 10 minutes"
    severity: critical
    message: "More than 5% of incoming webhooks are failing to process"
    
  - name: "Webhook Queue Growing"
    condition: "unprocessed_webhooks > 100"
    severity: warning
    message: "Webhook processing is falling behind"
    
  - name: "Large Webhook Delay"
    condition: "webhook_delivery_delay_p95 > 300s"
    severity: warning
    message: "Webhooks arriving significantly after event creation"
    
  - name: "Webhook Endpoint Down"
    condition: "webhook_endpoint_available = false"
    severity: critical
    message: "Webhook receiver endpoint is not responding"
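
A condition such as "failure rate > 5% for 10 minutes" has to be evaluated over a window, not on a single sample. The sketch below shows one possible reading, assuming one sample per minute and requiring every sample in the window to breach the threshold. `SustainedRateAlert` is a hypothetical class, not part of any alerting product.

```python
from collections import deque


class SustainedRateAlert:
    """Fire only when the failure rate exceeds a threshold for a full window."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold
        # Keep one breached/not-breached flag per sample, oldest dropped first.
        self.breaches = deque(maxlen=window)

    def observe(self, received: int, failed: int) -> bool:
        """Record one sample and return True if the alert should fire."""
        rate = failed / received if received else 0.0
        self.breaches.append(rate > self.threshold)
        window_full = len(self.breaches) == self.breaches.maxlen
        return window_full and all(self.breaches)


# Assumed: one sample per minute, 5% threshold, 10-minute window.
alert = SustainedRateAlert(threshold=0.05, window=10)

# Ten consecutive minutes at a 10% failure rate: fires only on the tenth.
results = [alert.observe(received=100, failed=10) for _ in range(10)]
print(results)

# A single healthy minute clears the condition.
print(alert.observe(received=100, failed=1))
```

Requiring a full window of breaches keeps a brief spike from paging anyone, at the cost of a delay equal to the window length before a real incident is reported.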

Conclusion

Webhook monitoring is a critical but often overlooked part of integration reliability. Beyond just keeping your endpoint available, you need visibility into delivery rates, processing success, idempotency behavior, and signature validation. Teams that invest in webhook observability catch integration failures within minutes instead of discovering them through customer complaints. AzMonitor can monitor your webhook endpoints for availability and response time as part of your broader API monitoring strategy, helping ensure that when providers send events, your infrastructure is ready to receive them.
