On-Call Management

Alert Routing: Sending the Right Alerts to the Right People

Design effective alert routing that sends critical alerts to on-call engineers, business alerts to stakeholders, and operational alerts to teams — without noise.

AzMonitor Team · June 25, 2025 · 7 min read · 1,213 words · Updated January 20, 2026
Tags: alert routing, monitoring, on-call, PagerDuty

Alert routing is the difference between a focused, actionable notification and an inbox full of noise that everyone ignores. When every alert goes to the same channel with the same urgency, you've essentially built no routing at all. When routing is thoughtful — right alert, right person, right channel, right time — your team responds faster and burns out less.

The Routing Decision Matrix

Every alert should be routed based on four factors:

  1. Severity — How urgent is this?
  2. Service owner — Which team is responsible?
  3. Impact — Who is affected and how broadly?
  4. Time — Is this during business hours or off-hours?

These four factors determine who gets notified, through which channel, and with what urgency.

IF severity = P1:
  → Page primary on-call (phone + SMS)
  → Notify #incidents Slack channel
  → If related to payments → also notify payments team lead

IF severity = P2 AND time = business hours:
  → Push notification to on-call
  → Notify service owner's Slack channel

IF severity = P2 AND time = off-hours:
  → Push notification to on-call
  → No additional notifications (morning review)

IF severity = P3:
  → Slack notification to relevant team channel
  → Create monitoring ticket for weekly review

IF severity = P4:
  → Log to monitoring dashboard only
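To make the matrix testable, it can be encoded as a small function. The sketch below is a minimal Python version under stated assumptions: `Alert` is a hypothetical record with `severity`, `service`, and `business_hours` fields, "related to payments" is taken to mean `service == "payment-service"`, and the action strings are placeholders for real notification calls.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert shape; adapt field names to your monitoring tool
    severity: str          # "P1" through "P4"
    service: str
    business_hours: bool   # computed elsewhere from the team's timezone


def routing_actions(alert: Alert) -> list[str]:
    """Translate the routing decision matrix into a list of actions."""
    if alert.severity == "P1":
        actions = ["page primary on-call (phone + SMS)", "notify #incidents"]
        if alert.service == "payment-service":  # assumption: "related to payments"
            actions.append("notify payments team lead")
        return actions
    if alert.severity == "P2":
        actions = ["push notification to on-call"]
        if alert.business_hours:
            actions.append("notify service owner's Slack channel")
        # Off-hours P2: nothing else until the morning review
        return actions
    if alert.severity == "P3":
        return ["notify team Slack channel", "create monitoring ticket for weekly review"]
    return ["log to monitoring dashboard"]  # P4
```

Keeping the matrix in one function makes it easy to unit-test every severity and time combination before a real incident exercises it.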

Routing by Service Ownership

Different services belong to different teams. Route alerts to the team that owns the service rather than to a generic on-call pool, and keep a catch-all rule as the fallback for anything unowned:

# Alert routing configuration
routing:
  rules:
    - match:
        service: "payment-service"
      route_to:
        escalation_policy: "payments-team-escalation"
        slack_channel: "#payments-alerts"
        
    - match:
        service: "user-service"
      route_to:
        escalation_policy: "platform-team-escalation"
        slack_channel: "#platform-alerts"
        
    - match:
        service: "infrastructure"
        category: ["ssl", "dns", "network"]
      route_to:
        escalation_policy: "sre-escalation"
        slack_channel: "#sre-alerts"
        
    # Default: route to general on-call
    - match:
        service: "*"
      route_to:
        escalation_policy: "default-oncall"
        slack_channel: "#engineering-alerts"
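A rule list like this only behaves predictably if its evaluation order is well defined. A common convention, assumed here rather than taken from any particular tool, is "first match wins," which is why the catch-all `*` rule has to come last. This minimal Python sketch mirrors the YAML as plain dictionaries and also assumes that a list value (such as `category`) matches when the alert's field equals any of its elements.

```python
RULES = [
    {"match": {"service": "payment-service"},
     "route_to": {"escalation_policy": "payments-team-escalation", "slack_channel": "#payments-alerts"}},
    {"match": {"service": "user-service"},
     "route_to": {"escalation_policy": "platform-team-escalation", "slack_channel": "#platform-alerts"}},
    {"match": {"service": "infrastructure", "category": ["ssl", "dns", "network"]},
     "route_to": {"escalation_policy": "sre-escalation", "slack_channel": "#sre-alerts"}},
    # Catch-all: must stay last under first-match semantics
    {"match": {"service": "*"},
     "route_to": {"escalation_policy": "default-oncall", "slack_channel": "#engineering-alerts"}},
]


def field_matches(expected, actual) -> bool:
    # "*" matches anything; a list matches any of its members
    if expected == "*":
        return True
    if isinstance(expected, list):
        return actual in expected
    return actual == expected


def route(alert: dict, rules=RULES) -> dict:
    """Return the route_to block of the first rule whose match fields all agree."""
    for rule in rules:
        if all(field_matches(value, alert.get(key)) for key, value in rule["match"].items()):
            return rule["route_to"]
    raise LookupError("no routing rule matched")  # unreachable while a "*" rule exists
```

Note that an `infrastructure` alert outside the listed categories falls through to the default route, which is exactly the kind of gap a routing audit should surface.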

Multi-Channel Alert Strategies

The same alert might need to go to multiple channels simultaneously:

# Multi-channel routing for critical alerts
alert:
  name: "Checkout Completely Down"
  severity: P1
  routing:
    # Technical team
    - channel: pagerduty
      escalation_policy: engineering_p1
      
    # Customer-facing update
    - channel: statuspage
      action: create_incident
      component: "checkout"
      status: "major_outage"
      
    # Internal stakeholders
    - channel: slack
      workspace: internal
      channel_name: "#incidents"
      message_template: "P1 INCIDENT: Checkout is down. On-call: {{oncall_name}}"
      
    # Support team awareness
    - channel: slack
      workspace: internal
      channel_name: "#support-escalation"
      message_template: |
        Support heads up: Checkout is currently down.
        Users cannot complete purchases.
        Workaround: None currently available.
        Dashboard: {{dashboard_link}}
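Delivering to several channels works best concurrently, so a slow Statuspage call does not delay the page to on-call. A minimal sketch using `asyncio.gather`; the `send_*` functions and the `SENDERS` table are hypothetical stand-ins for real PagerDuty, Statuspage, and Slack clients, and the `routing` list mirrors the YAML above.

```python
import asyncio


# Hypothetical senders: replace each body with a real client call.
async def send_pagerduty(target: dict, alert: dict) -> str:
    await asyncio.sleep(0)  # stand-in for a network request
    return f"paged {target['escalation_policy']}"


async def send_statuspage(target: dict, alert: dict) -> str:
    await asyncio.sleep(0)
    return f"statuspage {target['component']}: {target['status']}"


async def send_slack(target: dict, alert: dict) -> str:
    await asyncio.sleep(0)
    return f"slack {target['channel_name']}"


SENDERS = {
    "pagerduty": send_pagerduty,
    "statuspage": send_statuspage,
    "slack": send_slack,
}


async def fan_out(alert: dict) -> list:
    """Deliver one alert to every target in alert["routing"] concurrently.

    return_exceptions=True collects failures as results instead of raising
    the first one, so every channel's outcome is visible to the caller.
    """
    tasks = [SENDERS[target["channel"]](target, alert) for target in alert["routing"]]
    return await asyncio.gather(*tasks, return_exceptions=True)
```

Because results come back in the same order as the targets, the caller can match any exception object to the channel that failed and retry or log just that one.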

Time-Based Routing

Off-hours routing should be different from business-hours routing:

from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def route_alert(alert, timezone="America/New_York", now=None):
    """
    Route an alert based on its severity and the current time in the
    configured timezone. Pass `now` (a timezone-aware datetime) to check
    a specific moment; otherwise the current time is used.
    """
    tz = ZoneInfo(timezone)
    local_time = now.astimezone(tz) if now is not None else datetime.now(tz)

    hour = local_time.hour
    weekday = local_time.weekday()  # 0=Monday, 6=Sunday

    is_business_hours = (
        weekday < 5 and  # Monday-Friday
        9 <= hour < 18   # 09:00-18:00 local time
    )

    if alert.severity == "P1":
        # P1 always pages through every channel, regardless of time
        return {
            "pagerduty": True,
            "slack": True,
            "email": True,
            "sms": True
        }

    elif alert.severity == "P2":
        # Always push to on-call; the service owner's Slack notification
        # only goes out during business hours (off-hours P2s wait for
        # the morning review)
        return {
            "pagerduty": True,             # Push notification
            "slack": is_business_hours,
            "email": False,
            "sms": False
        }

    elif alert.severity == "P3":
        return {
            "pagerduty": False,
            "slack": True,                 # Slack at any hour
            "email": is_business_hours,    # Email only during business hours
            "sms": False
        }

    else:  # P4
        return {
            "pagerduty": False,
            "slack": is_business_hours,    # Slack only during business hours
            "email": False,
            "sms": False
        }

Slack Alert Routing

Slack is the primary routing target for most alerts. Design your channel structure to support good routing:

Engineering Slack channels:
  #incidents          — Active P1/P2 incidents (high-urgency, all hands)
  #alerts             — P2/P3 alerts for current on-call (medium urgency)
  #payments-alerts    — Payment service alerts (payments team)
  #platform-alerts    — Infrastructure alerts (platform team)
  #sre-alerts         — SRE team alerts (certificates, DNS, etc.)
  #monitoring-digest  — P4 informational alerts (low urgency, daily digest)

Route alerts to appropriate channels:

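// Slack webhook routing
// Uses the @slack/webhook package. Webhooks created by a Slack app always post
// to the channel chosen when the webhook was created, so keep one webhook URL
// per destination channel (the environment variable names are examples).
const { IncomingWebhook } = require('@slack/webhook');

const webhookUrls = {
  '#incidents': process.env.SLACK_WEBHOOK_INCIDENTS,
  '#payments-alerts': process.env.SLACK_WEBHOOK_PAYMENTS,
  '#sre-alerts': process.env.SLACK_WEBHOOK_SRE,
  '#alerts': process.env.SLACK_WEBHOOK_ALERTS,
  '#monitoring-digest': process.env.SLACK_WEBHOOK_DIGEST,
};

async function routeAlertToSlack(alert) {
  let channel;
  let color;

  // Determine channel based on severity first, then service and category
  if (alert.severity === 'P1') {
    channel = '#incidents';
    color = '#FF0000'; // Red
  } else if (alert.service === 'payment-service') {
    channel = '#payments-alerts';
    color = '#FF9900'; // Orange
  } else if (['ssl', 'dns', 'network'].includes(alert.category)) {
    channel = '#sre-alerts';
    color = '#FF9900';
  } else if (alert.severity === 'P2' || alert.severity === 'P3') {
    channel = '#alerts';             // P2/P3 alerts for the current on-call
    color = '#FF9900';
  } else {
    channel = '#monitoring-digest';  // P4 informational alerts
    color = '#36A64F'; // Green
  }

  const message = {
    attachments: [{
      color: color,
      title: `[${alert.severity}] ${alert.name}`,
      text: alert.description,
      fields: [
        { title: 'Service', value: alert.service, short: true },
        { title: 'Region', value: alert.region, short: true },
        { title: 'Status', value: alert.status, short: true },
        { title: 'Value', value: String(alert.current_value), short: true },
      ],
      actions: [
        { type: 'button', text: 'Acknowledge', url: alert.acknowledge_url },
        { type: 'button', text: 'Dashboard', url: alert.dashboard_url },
        { type: 'button', text: 'Runbook', url: alert.runbook_url },
      ],
      footer: `AzMonitor | ${new Date().toISOString()}`
    }]
  };

  // The webhook URL decides which channel receives the message
  const webhook = new IncomingWebhook(webhookUrls[channel]);
  await webhook.send(message);
}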

Deduplication and Grouping

Without deduplication, a single incident can generate dozens of alerts across multiple services. Group related alerts:

# Alert grouping configuration
grouping:
  strategy: intelligent
  
  rules:
    # Group all database-related alerts in a 10-minute window
    - pattern:
        category: database
      group_key: "database-{region}"
      window: 600  # seconds
      
    # Group service-specific alerts
    - pattern:
        service: "payment-service"
      group_key: "payment-service-{check_type}"
      window: 300
      
  # Notify once for the group, not for each individual alert
  group_notifications: true
  initial_notification_delay: 30  # Wait 30s before notifying (catches related alerts)
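The `group_key`, `window`, and `initial_notification_delay` settings describe a simple time-window grouping. A minimal Python sketch of that behavior follows, under stated assumptions: the `AlertGrouper` class is hypothetical, it handles one grouping rule, it takes explicit timestamps so it can be driven from a scheduler or a test, and it does not attempt whatever `strategy: intelligent` does in a particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    key: str
    opened_at: float                 # timestamp (seconds) of the group's first alert
    alerts: list = field(default_factory=list)


class AlertGrouper:
    """Hypothetical time-window grouper for one grouping rule.

    Alerts that share a group key and arrive within `window` seconds of the
    group's first alert are folded into that group. Each group is reported
    once, `delay` seconds after it opens.
    """

    def __init__(self, key_template: str, window: int, delay: int = 30):
        self.key_template = key_template      # e.g. "database-{region}"
        self.window = window
        self.delay = delay
        self.open: dict[str, Group] = {}      # latest group per key
        self.pending: list[Group] = []        # groups not yet notified

    def add(self, alert: dict, ts: float) -> Group:
        key = self.key_template.format(**alert)
        group = self.open.get(key)
        if group is None or ts - group.opened_at > self.window:
            # No group yet, or the old window has expired: start a new group
            group = Group(key, ts)
            self.open[key] = group
            self.pending.append(group)
        group.alerts.append(alert)
        return group

    def due_notifications(self, now: float) -> list:
        """Return groups whose notification delay has elapsed, each only once."""
        due, waiting = [], []
        for group in self.pending:
            (due if now - group.opened_at >= self.delay else waiting).append(group)
        self.pending = waiting
        return due
```

Call `due_notifications()` on a short timer and send one message per returned group. Alerts that arrive after a group has been reported, but still inside its window, are folded in silently rather than triggering a second notification.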

Maintenance Windows

Route alerts differently (or suppress them) during planned maintenance:

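from datetime import datetime, timezone


def should_route_alert(alert, maintenance_windows, now=None):
    """
    Check whether an alert should be routed or suppressed by an active
    maintenance window. Returns (should_route, reason).
    """
    now = now or datetime.now(timezone.utc)

    for window in maintenance_windows:
        # Stored timestamps are ISO 8601 strings that include a UTC offset
        start = datetime.fromisoformat(window["start"])
        end = datetime.fromisoformat(window["end"])
        if start <= now <= end:
            # Scope is either "all" or a list of service names
            if window["scope"] == "all" or alert.service in window["scope"]:
                # Suppress the alert
                return False, f"Maintenance window active: {window['reason']}"

    return True, None


# Register a maintenance window
def create_maintenance_window(scope, start_time, end_time, reason,
                              created_by, maintenance_db, slack):
    """
    Create a maintenance window to suppress alerts during planned work.
    start_time and end_time must be timezone-aware datetimes; scope is
    "all" or a list of service names. maintenance_db and slack stand in
    for your own storage and chat clients.
    """
    window = {
        "scope": scope,
        "start": start_time.isoformat(),
        "end": end_time.isoformat(),
        "reason": reason,
        "created_by": created_by
    }

    maintenance_db.create(window)

    # Notify the team, with times shown in UTC
    start_utc = start_time.astimezone(timezone.utc)
    end_utc = end_time.astimezone(timezone.utc)
    services = scope if scope == "all" else ", ".join(scope)
    slack.post(
        "#alerts",
        f"Maintenance window scheduled: {services} "
        f"({start_utc.strftime('%Y-%m-%d %H:%M')} - {end_utc.strftime('%H:%M')} UTC). "
        f"Reason: {reason}"
    )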

Alert Routing Audit

Regularly audit your routing to ensure it's working as expected:

-- Check alert routing effectiveness
SELECT
    alert_name,
    routing_destination,
    COUNT(*) as alerts_sent,
    AVG(time_to_acknowledge_minutes) as avg_ack_time,
    COUNT(CASE WHEN acknowledged_within_sla THEN 1 END) * 100.0 
      / COUNT(*) as ack_within_sla_pct
FROM alert_routing_log
WHERE sent_at > NOW() - INTERVAL '30 days'
GROUP BY alert_name, routing_destination
ORDER BY alerts_sent DESC;

Look for:

  • Alerts with poor ack rates (wrong destination?)
  • Channels receiving too many alerts (over-routing?)
  • Services with no routing rules (coverage gaps?)
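The same checks can be scripted against raw rows exported from `alert_routing_log`. A minimal sketch, assuming each row is `(alert_name, routing_destination, acknowledged_within_sla)`; the function names and thresholds are illustrative, not recommendations, and `unrouted_services` assumes you can list both the services you monitor and the services that have an explicit (non-catch-all) routing rule.

```python
from collections import defaultdict


def audit_routing(rows, max_alerts_per_destination=200, min_ack_within_sla_pct=80.0):
    """Flag routing problems per destination.

    rows: iterable of (alert_name, routing_destination, acknowledged_within_sla).
    Returns (poor_ack, noisy): destinations whose ack-within-SLA rate is below
    the threshold (wrong destination?) and destinations that received more
    alerts than the threshold (over-routing?).
    """
    sent = defaultdict(int)
    acked = defaultdict(int)
    for _alert_name, destination, acked_within_sla in rows:
        sent[destination] += 1
        acked[destination] += int(bool(acked_within_sla))

    poor_ack = sorted(d for d in sent
                      if acked[d] * 100.0 / sent[d] < min_ack_within_sla_pct)
    noisy = sorted(d for d in sent if sent[d] > max_alerts_per_destination)
    return poor_ack, noisy


def unrouted_services(monitored_services, services_with_rules):
    """Services that only reach the catch-all rule (coverage gaps?)."""
    return sorted(set(monitored_services) - set(services_with_rules))
```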

Conclusion

Good alert routing is like good traffic management: alerts flow to the right destination without congestion, and the people who receive them can actually act on them. Define routing based on severity, service ownership, time of day, and impact. Deduplicate related alerts to prevent storms. Use maintenance windows to suppress noise during planned work. Audit routing regularly to catch gaps. When your monitoring alerts are well routed, your on-call engineers see relevant, actionable notifications, and nothing else. AzMonitor supports flexible alert routing to email, Slack, PagerDuty, SMS, and webhooks, making it straightforward to build the routing logic your team needs.

AzMonitor Team
The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.