Alert routing is the difference between a focused, actionable notification and an inbox full of noise that everyone ignores. When every alert goes to the same channel with the same urgency, you've essentially built no routing at all. When routing is thoughtful — right alert, right person, right channel, right time — your team responds faster and burns out less.
The Routing Decision Matrix
Every alert should be routed based on four factors:
- Severity — How urgent is this?
- Service owner — Which team is responsible?
- Impact — Who is affected and how broadly?
- Time — Is this during business hours or off-hours?
These four factors determine: who gets notified, via what channel, with what urgency.
IF severity = P1:
    → Page primary on-call (phone + SMS)
    → Notify #incidents Slack channel
    → If related to payments → also notify payments team lead

IF severity = P2 AND time = business hours:
    → Push notification to on-call
    → Notify service owner's Slack channel

IF severity = P2 AND time = off-hours:
    → Push notification to on-call
    → No additional notifications (morning review)

IF severity = P3:
    → Slack notification to relevant team channel
    → Create monitoring ticket for weekly review

IF severity = P4:
    → Log to monitoring dashboard only
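The matrix above can be expressed as a small function, which makes the rules easy to unit test. This is a minimal sketch, not a real alerting API: the `Alert` fields and the action strings (`page_primary_oncall`, `push_to_oncall`, and so on) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    severity: str         # "P1" through "P4"
    service: str          # e.g. "payment-service"
    business_hours: bool  # True if the alert fired during business hours


def decide_actions(alert: Alert) -> list[str]:
    """Return notification actions for an alert, following the matrix above."""
    if alert.severity == "P1":
        actions = ["page_primary_oncall", "notify_slack:#incidents"]
        if alert.service == "payment-service":
            actions.append("notify_payments_team_lead")
        return actions
    if alert.severity == "P2":
        actions = ["push_to_oncall"]
        # Off-hours P2s go to on-call only; everyone else sees them in the morning.
        if alert.business_hours:
            actions.append("notify_slack:service_owner_channel")
        return actions
    if alert.severity == "P3":
        return ["notify_slack:team_channel", "create_ticket:weekly_review"]
    # P4 and anything unrecognized: dashboard only.
    return ["log_to_dashboard"]
```

Returning plain action names rather than sending notifications directly keeps the decision logic separate from delivery, so the routing rules can be tested without touching Slack or a paging provider.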
Routing by Service Ownership
Different services belong to different teams. Route alerts to service owners, not a generic on-call pool:
# Alert routing configuration
routing:
  rules:
    - match:
        service: "payment-service"
      route_to:
        escalation_policy: "payments-team-escalation"
        slack_channel: "#payments-alerts"

    - match:
        service: "user-service"
      route_to:
        escalation_policy: "platform-team-escalation"
        slack_channel: "#platform-alerts"

    - match:
        service: "infrastructure"
        category: ["ssl", "dns", "network"]
      route_to:
        escalation_policy: "sre-escalation"
        slack_channel: "#sre-alerts"

    # Default: route to general on-call
    - match:
        service: "*"
      route_to:
        escalation_policy: "default-oncall"
        slack_channel: "#engineering-alerts"
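How such rules are evaluated matters as much as the rules themselves. The sketch below assumes first-match-wins evaluation, with list values meaning "any of these" and string values treated as glob patterns; those semantics are an assumption for illustration, so check how your own alerting tool evaluates rules. `RULES`, `field_matches`, and `resolve_route` are hypothetical names.

```python
from fnmatch import fnmatch

# The rules from the configuration above, in the same order.
RULES = [
    {"match": {"service": "payment-service"},
     "route_to": {"escalation_policy": "payments-team-escalation",
                  "slack_channel": "#payments-alerts"}},
    {"match": {"service": "user-service"},
     "route_to": {"escalation_policy": "platform-team-escalation",
                  "slack_channel": "#platform-alerts"}},
    {"match": {"service": "infrastructure", "category": ["ssl", "dns", "network"]},
     "route_to": {"escalation_policy": "sre-escalation",
                  "slack_channel": "#sre-alerts"}},
    {"match": {"service": "*"},
     "route_to": {"escalation_policy": "default-oncall",
                  "slack_channel": "#engineering-alerts"}},
]


def field_matches(expected, actual) -> bool:
    """A list matches if it contains the value; a string is a glob pattern."""
    if isinstance(expected, list):
        return actual in expected
    return actual is not None and fnmatch(actual, expected)


def resolve_route(alert: dict) -> dict:
    """Return route_to for the first rule whose match fields all match."""
    for rule in RULES:
        if all(field_matches(v, alert.get(k)) for k, v in rule["match"].items()):
            return rule["route_to"]
    raise LookupError("no routing rule matched; add a catch-all rule")
```

Under first-match semantics the catch-all rule must come last, otherwise it shadows every rule after it. An infrastructure alert with a category outside the listed ones (say, `disk`) falls through to the default route.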
Multi-Channel Alert Strategies
The same alert might need to go to multiple channels simultaneously:
# Multi-channel routing for critical alerts
alert:
  name: "Checkout Completely Down"
  severity: P1
  routing:
    # Technical team
    - channel: pagerduty
      escalation_policy: engineering_p1

    # Customer-facing update
    - channel: statuspage
      action: create_incident
      component: "checkout"
      status: "major_outage"

    # Internal stakeholders
    - channel: slack
      workspace: internal
      channel_name: "#incidents"
      message_template: "P1 INCIDENT: Checkout is down. On-call: {{oncall_name}}"

    # Support team awareness
    - channel: slack
      workspace: internal
      channel_name: "#support-escalation"
      message_template: |
        Support heads up: Checkout is currently down.
        Users cannot complete purchases.
        Workaround: None currently available.
        Dashboard: {{dashboard_link}}
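The `{{oncall_name}}` and `{{dashboard_link}}` placeholders in the templates above need to be filled in before a message is sent. The following is a minimal sketch of that substitution, assuming simple `{{name}}` placeholders with no logic or filters; `render_template` is a hypothetical helper, not the template engine of any particular alerting tool.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, context: dict) -> str:
    """Replace {{name}} placeholders; fail loudly if a value is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in context:
            raise KeyError(f"missing template value: {key}")
        return str(context[key])

    return PLACEHOLDER.sub(substitute, template)


message = render_template(
    "P1 INCIDENT: Checkout is down. On-call: {{oncall_name}}",
    {"oncall_name": "Dana"},
)
```

Raising on a missing value is a deliberate choice in this sketch: a P1 announcement that reads "On-call: {{oncall_name}}" is worse than a rendering error caught before the message is posted.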
Time-Based Routing
Off-hours routing should be different from business-hours routing:
from datetime import datetime

import pytz


def route_alert(alert, timezone="America/New_York"):
    """
    Route alert based on current time in configured timezone.
    """
    tz = pytz.timezone(timezone)
    local_time = datetime.now(tz)
    hour = local_time.hour
    weekday = local_time.weekday()  # 0=Monday, 6=Sunday
    is_business_hours = (
        weekday < 5 and  # Monday-Friday
        9 <= hour < 18   # 09:00-18:00 local time
    )

    if alert.severity == "P1":
        # P1 always pages regardless of time
        return {
            "pagerduty": True,
            "slack": True,
            "email": True,
            "sms": True
        }
    elif alert.severity == "P2":
        if is_business_hours:
            return {
                "pagerduty": True,  # Push notification
                "slack": True,
                "email": False,
                "sms": False
            }
        else:
            return {
                "pagerduty": True,  # Still notify on-call, but at lower urgency
                "slack": False,     # No additional notifications; reviewed in the morning
                "email": False,
                "sms": False
            }
    elif alert.severity == "P3":
        return {
            "pagerduty": False,
            "slack": True,               # Slack during any hours
            "email": is_business_hours,  # Email only during business hours
            "sms": False
        }
    else:  # P4
        return {
            "pagerduty": False,
            "slack": is_business_hours,  # Slack only during business hours
            "email": False,
            "sms": False
        }
Slack Alert Routing
Slack is the primary routing target for most alerts. Design your channel structure to support good routing:
Engineering Slack channels:
#incidents — Active P1/P2 incidents (high-urgency, all hands)
#alerts — P2/P3 alerts for current on-call (medium urgency)
#payments-alerts — Payment service alerts (payments team)
#platform-alerts — Infrastructure alerts (platform team)
#sre-alerts — SRE team alerts (certificates, DNS, etc)
#monitoring-digest — P4 informational alerts (low urgency, daily digest)
Route alerts to appropriate channels:
// Slack webhook routing
async function routeAlertToSlack(alert) {
  let channel;
  let color;

  // Determine channel based on service and severity
  if (alert.severity === 'P1') {
    channel = '#incidents';
    color = '#FF0000'; // Red
  } else if (alert.service === 'payment-service') {
    channel = '#payments-alerts';
    color = '#FF9900'; // Orange
  } else if (['ssl', 'dns', 'network'].includes(alert.category)) {
    channel = '#sre-alerts';
    color = '#FF9900';
  } else if (alert.severity === 'P2') {
    channel = '#alerts';
    color = '#FF9900';
  } else {
    channel = '#monitoring-digest';
    color = '#36A64F'; // Green
  }

  const message = {
    channel: channel,
    attachments: [{
      color: color,
      title: `[${alert.severity}] ${alert.name}`,
      text: alert.description,
      fields: [
        { title: 'Service', value: alert.service, short: true },
        { title: 'Region', value: alert.region, short: true },
        { title: 'Status', value: alert.status, short: true },
        { title: 'Value', value: alert.current_value, short: true },
      ],
      actions: [
        { type: 'button', text: 'Acknowledge', url: alert.acknowledge_url },
        { type: 'button', text: 'Dashboard', url: alert.dashboard_url },
        { type: 'button', text: 'Runbook', url: alert.runbook_url },
      ],
      footer: `AzMonitor | ${new Date().toISOString()}`
    }]
  };

  await slackWebhook.send(message);
}
Deduplication and Grouping
Without deduplication, a single incident can generate dozens of alerts across multiple services. Group related alerts:
# Alert grouping configuration
grouping:
  strategy: intelligent
  rules:
    # Group all database-related alerts in a 10-minute window
    - pattern:
        category: database
      group_key: "database-{region}"
      window: 600  # seconds

    # Group service-specific alerts
    - pattern:
        service: "payment-service"
      group_key: "payment-service-{check_type}"
      window: 300

  # Notify once for the group, not for each individual alert
  group_notifications: true
  initial_notification_delay: 30  # Wait 30s before notifying (catches related alerts)
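To make the grouping behavior concrete, here is a rough sketch of how a `group_key` template and a `window` could combine so that only the first alert in a group triggers a notification. It assumes a fixed window that starts at the first alert in the group; `Grouper` and `AlertGroup` are hypothetical names, and the sketch leaves out `initial_notification_delay` and pattern matching for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AlertGroup:
    key: str
    opened_at: float  # timestamp (seconds) of the first alert in the group
    alerts: list = field(default_factory=list)


class Grouper:
    def __init__(self, key_template: str, window_seconds: int):
        self.key_template = key_template
        self.window_seconds = window_seconds
        self.groups: dict[str, AlertGroup] = {}

    def add(self, alert: dict, now: float) -> bool:
        """Add an alert; return True if it opened a new group (notify once)."""
        # e.g. "database-{region}" + {"region": "us-east"} -> "database-us-east"
        key = self.key_template.format(**alert)
        group = self.groups.get(key)
        if group is None or now - group.opened_at >= self.window_seconds:
            # No open group for this key, or the window expired: start a new one.
            self.groups[key] = AlertGroup(key=key, opened_at=now, alerts=[alert])
            return True
        group.alerts.append(alert)
        return False


grouper = Grouper("database-{region}", window_seconds=600)
first = grouper.add({"region": "us-east", "name": "slow queries"}, now=0)
second = grouper.add({"region": "us-east", "name": "replication lag"}, now=120)
```

Here `first` opens the `database-us-east` group and would trigger a notification, while `second` arrives inside the 10-minute window and is folded into the same group silently. Passing `now` explicitly instead of reading the clock inside `add` keeps the behavior deterministic in tests.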
Maintenance Windows
Route alerts differently (or suppress them) during planned maintenance:
from datetime import datetime, timezone


def should_route_alert(alert, maintenance_windows):
    """
    Check if alert should be suppressed during maintenance window.

    Returns (should_route, reason). Each window is a dict as stored by
    create_maintenance_window; start/end are timezone-aware ISO 8601 strings.
    """
    now = datetime.now(timezone.utc)
    for window in maintenance_windows:
        start = datetime.fromisoformat(window["start"])
        end = datetime.fromisoformat(window["end"])
        if start <= now <= end:
            # Check if this service is in scope ("all" covers every service)
            if window["service"] in ("all", alert.service):
                # Suppress the alert
                return False, f"Maintenance window active: {window['reason']}"
    return True, None


# Register maintenance window
# maintenance_db and slack are assumed to be already-configured clients.
def create_maintenance_window(service, start_time, end_time, reason, created_by):
    """
    Create maintenance window to suppress alerts during planned work.

    start_time and end_time must be timezone-aware datetimes in UTC.
    """
    window = {
        "service": service,
        "start": start_time.isoformat(),
        "end": end_time.isoformat(),
        "reason": reason,
        "created_by": created_by
    }
    maintenance_db.create(window)

    # Notify team
    slack.post(
        "#alerts",
        f"Maintenance window scheduled: {service} "
        f"({start_time.strftime('%H:%M')} - {end_time.strftime('%H:%M')} UTC). "
        f"Reason: {reason}"
    )
Alert Routing Audit
Regularly audit your routing to ensure it's working as expected:
-- Check alert routing effectiveness
SELECT
    alert_name,
    routing_destination,
    COUNT(*) AS alerts_sent,
    AVG(time_to_acknowledge_minutes) AS avg_ack_time,
    COUNT(CASE WHEN acknowledged_within_sla THEN 1 END) * 100.0
        / COUNT(*) AS ack_within_sla_pct
FROM alert_routing_log
WHERE sent_at > NOW() - INTERVAL '30 days'
GROUP BY alert_name, routing_destination
ORDER BY alerts_sent DESC;
Look for:
- Alerts with poor ack rates (wrong destination?)
- Channels receiving too many alerts (over-routing?)
- Services with no routing rules (coverage gaps?)
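These checks can be automated over the query results. The sketch below assumes each row is a dict with the column names from the query above; the thresholds (`min_ack_pct`, `max_alerts_per_destination`) are illustrative defaults rather than recommendations, and `audit_findings` and `coverage_gaps` are hypothetical helper names.

```python
from collections import defaultdict


def audit_findings(rows, min_ack_pct=90.0, max_alerts_per_destination=500):
    """Return human-readable findings from the audit query's result rows."""
    findings = []
    volume = defaultdict(int)
    for row in rows:
        volume[row["routing_destination"]] += row["alerts_sent"]
        # A low ack rate often means the alert goes to the wrong destination.
        if row["ack_within_sla_pct"] < min_ack_pct:
            findings.append(
                f"poor ack rate: {row['alert_name']} -> {row['routing_destination']} "
                f"({row['ack_within_sla_pct']:.0f}% within SLA)"
            )
    # Total volume per destination, summed across alert names.
    for destination, total in sorted(volume.items()):
        if total > max_alerts_per_destination:
            findings.append(f"over-routing: {destination} received {total} alerts")
    return findings


def coverage_gaps(all_services, routed_services):
    """Services that exist but have no routing rule."""
    return sorted(set(all_services) - set(routed_services))
```

Coverage gaps cannot be found from the routing log alone, since an unrouted service may never appear in it; `coverage_gaps` therefore compares a service inventory against the services named in your routing rules.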
Conclusion
Good alert routing is like good traffic management: alerts flow to the right destination without congestion, and the people who receive them can actually act on them. Define routing based on severity, service ownership, time of day, and impact. Deduplicate related alerts to prevent storms. Use maintenance windows to suppress noise during planned work. Audit routing regularly to catch gaps. When your monitoring alerts are well-routed, your on-call engineers see relevant, actionable notifications and nothing else. AzMonitor supports flexible alert routing to email, Slack, PagerDuty, SMS, and webhooks, making it straightforward to build the routing logic your team needs.