Setting up monitoring sounds simple — pick a tool, enter your URL, done. But monitoring configured carelessly is almost as useless as no monitoring at all. You'll get false alarms at 3 AM, miss real failures, and eventually stop trusting your alerts. This guide walks through setting up monitoring properly — deciding what to monitor, configuring it correctly, setting up meaningful alerts, and creating a status page.

Step 1: Decide What to Monitor

Before configuring anything, map out what matters. For a typical web application:

Critical paths (must monitor):

Homepage / landing page
Login / authentication endpoint
Core product functionality (checkout, search, main dashboard)
Payment processing API
Primary API endpoints your customers rely on

Important but not critical (should monitor):

Admin dashboard
API documentation
Email/webhook delivery
Background job health

Nice to have (optional):

Marketing pages
Secondary APIs
Analytics endpoints

Prioritize critical paths first. It's better to monitor 5 things well than 50 things poorly.

# Monitoring priority list example
critical:
  - url: "https://app.example.com"
    name: "Main Application"
    interval: 60s
    
  - url: "https://api.example.com/health"
    name: "API Health"
    interval: 60s
    
  - url: "https://api.example.com/auth/login"
    name: "Authentication"
    method: POST
    interval: 60s
    
  - url: "https://api.example.com/checkout"
    name: "Checkout API"
    method: POST
    interval: 60s

important:
  - url: "https://app.example.com/dashboard"
    name: "Dashboard"
    interval: 3min
    
  - url: "https://api.example.com/users"
    name: "Users API"
    interval: 5min
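
The interval you pick for each tier determines how many checks run per day. The following is a minimal sketch for reasoning about that. The helper names and the accepted unit suffixes are assumptions inferred from the example list above, not a documented AzMonitor format.

```python
import re

# Hypothetical helper: the unit suffixes are inferred from the example
# priority list ("60s", "3min"), not from a documented config schema.
_UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600}
_INTERVAL_RE = re.compile(r"^(\d+)(s|min|h)$")


def interval_to_seconds(value: str) -> int:
    """Convert an interval string such as '60s' or '3min' to seconds."""
    match = _INTERVAL_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def checks_per_day(value: str) -> int:
    """Number of checks one monitor runs per day at the given interval."""
    return 86_400 // interval_to_seconds(value)
```

A 60s interval works out to 1,440 checks per day per monitor, while a 5min interval is 288, which is one reason to reserve the shortest intervals for critical paths.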

Step 2: Configure Your First Monitor

Let's start with the most common case — an HTTP/HTTPS monitor:

Basic HTTP Monitor Setup

In AzMonitor, create your first monitor:

Click Add Monitor
Select HTTP/HTTPS
Configure:

name: "Main Website"
url: "https://example.com"
method: GET
interval: 60  # Check every minute
timeout: 30000  # 30 second timeout

Add Response Assertions

Don't just check that the server responds — verify it responds correctly:

assertions:
  # Must return 200 OK
  - type: status_code
    operator: equals
    value: 200
    
  # Must respond within 3 seconds
  - type: response_time
    operator: less_than
    value: 3000
    
  # Must contain expected content (catches broken deploys)
  - type: response_body
    operator: contains
    value: "Example Company"

The content assertion is particularly valuable — it catches scenarios where your server returns 200 but with an error page or completely wrong content.
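
To make the three assertions concrete, here is a rough sketch of how a checker might evaluate them against one observed response. The `CheckResult` type and `evaluate` function are hypothetical names used for illustration; this is not how AzMonitor implements assertions internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    """What one probe observed. Field names are illustrative."""
    status_code: int
    response_time_ms: float
    body: str


def evaluate(result: CheckResult, expected_text: str,
             max_ms: float = 3000, expected_status: int = 200) -> List[str]:
    """Return the failed assertions; an empty list means the check passed."""
    failures = []
    if result.status_code != expected_status:
        failures.append(f"status_code was {result.status_code}, expected {expected_status}")
    if result.response_time_ms >= max_ms:
        failures.append(f"response_time was {result.response_time_ms}ms, limit {max_ms}ms")
    if expected_text not in result.body:
        failures.append(f"response_body does not contain {expected_text!r}")
    return failures
```

A fast 200 response that serves an error page passes the first two assertions and fails only the third, which is exactly the broken-deploy case the content assertion exists to catch.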

Configure Monitoring Regions

Always use multiple regions to avoid false positives:

regions:
  - us-east-1
  - eu-west-1
  - ap-southeast-1
  
# Alert only when 2+ regions confirm the failure
alert_after_regions: 2

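A single-region check will page you whenever the monitoring provider, or the network path from that one location, has a hiccup. Requiring two or more regions to agree before alerting filters out most of these local blips, so the alerts that do fire are far more likely to reflect a real failure.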
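
The region rule boils down to a simple quorum. Here is a minimal sketch, assuming each region reports a pass/fail result for the same check round; the function name is hypothetical and the default mirrors the `alert_after_regions: 2` setting above.

```python
from typing import Dict


def should_alert(region_ok: Dict[str, bool], alert_after_regions: int = 2) -> bool:
    """Alert only when at least `alert_after_regions` regions report a failure."""
    failing = [region for region, ok in region_ok.items() if not ok]
    return len(failing) >= alert_after_regions
```

With three regions and a threshold of 2, one failing region is treated as a local problem and stays quiet, while two or three failing regions trigger the alert.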

Step 3: Configure API Monitors

For API endpoints, add more sophisticated checks:

monitor:
  name: "User API - Get User"
  url: "https://api.example.com/v1/users/{{TEST_USER_ID}}"
  method: GET
  headers:
    Authorization: "Bearer {{API_MONITOR_TOKEN}}"
    Accept: "application/json"
  assertions:
    - type: status_code
      value: 200
    - type: json_path
      path: "$.id"
      operator: exists
    - type: json_path
      path: "$.email"
      operator: is_string
    - type: response_time
      operator: less_than
      value: 1000

monitor:
  name: "Auth API - Token Issuance"
  url: "https://api.example.com/auth/token"
  method: POST
  headers:
    Content-Type: "application/json"
  body: |
    {
      "client_id": "{{MONITOR_CLIENT_ID}}",
      "client_secret": "{{MONITOR_CLIENT_SECRET}}",
      "grant_type": "client_credentials"
    }
  assertions:
    - type: status_code
      value: 200
    - type: json_path
      path: "$.access_token"
      operator: exists
    - type: json_path
      path: "$.expires_in"
      operator: greater_than
      value: 0
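
If you want to see what the `json_path` assertions on the token endpoint amount to, here is a simplified sketch. It handles only plain `$.a.b` paths, which is all the examples above use, and is not a full JSONPath implementation. The function names are hypothetical.

```python
import json
from typing import Any, List

_MISSING = object()  # sentinel so that a JSON null still counts as "exists"


def resolve(document: Any, path: str) -> Any:
    """Resolve a simple '$.a.b' path; return _MISSING if any key is absent."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    current = document
    for key in path[1:].split("."):
        if not key:
            continue  # skip the empty piece before the first dot
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def check_token_response(body: str) -> List[str]:
    """Apply the two token assertions: access_token exists, expires_in > 0."""
    data = json.loads(body)
    failures = []
    if resolve(data, "$.access_token") is _MISSING:
        failures.append("$.access_token does not exist")
    expires = resolve(data, "$.expires_in")
    is_number = isinstance(expires, (int, float)) and not isinstance(expires, bool)
    if not is_number or expires <= 0:
        failures.append("$.expires_in is not greater than 0")
    return failures
```

Checking the shape of the response, and not only the status code, catches an auth service that returns 200 with an empty or malformed token payload.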

Step 4: Set Up SSL Certificate Monitoring

SSL expiry is a predictable failure mode that always seems to sneak up on teams. Set up SSL monitoring:

ssl_monitor:
  name: "Main Domain SSL"
  hostname: "example.com"
  port: 443
  alerts:
    - days_before_expiry: 30
      severity: warning
      channels: ["slack-ops"]
    - days_before_expiry: 7
      severity: critical
      channels: ["pagerduty", "slack-ops", "email-ops"]

Add SSL monitors for every custom domain:

Your main domain
API subdomain
Dashboard subdomain
Any customer-facing subdomains
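
For a sense of what an SSL monitor does under the hood, this sketch reads a certificate's expiry date with Python's standard library and maps the remaining days onto the 30-day and 7-day thresholds from the config above. The function names are hypothetical, and a hosted monitor will do more (chain validation reporting, multiple regions), so treat it as an illustration only.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(hostname: str, port: int = 443, timeout: float = 10.0) -> float:
    """Connect over TLS and return the days left before the certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires_at - time.time()) / 86_400


def severity_for(days_left: float, warning_days: int = 30,
                 critical_days: int = 7) -> Optional[str]:
    """Map days remaining to an alert severity, or None if no alert is due."""
    if days_left <= critical_days:
        return "critical"
    if days_left <= warning_days:
        return "warning"
    return None
```

Calling `severity_for(days_until_expiry("example.com"))` once a day for each hostname in the list above would give the same two-stage warning the config describes.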

Step 5: Configure Alert Channels

Alerts mean nothing if they don't reach the right people through the right channels.

Slack Integration

# Get your Slack webhook URL
# Slack App > Incoming Webhooks > Add New Webhook

# Test the webhook
curl -X POST YOUR_WEBHOOK_URL \
  -H "Content-Type: application/json" \
  -d '{"text": "Test alert from AzMonitor"}'

Configure different Slack channels for different severity levels:

#incidents: P1 critical failures
#alerts: P2 degradations
#monitoring: P3 informational
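
If you ever need to route alerts yourself, the logic is small. This sketch picks a Slack channel by severity and builds the same JSON POST as the curl test above, using only the standard library. The severity names, function names, and the idea of one webhook per channel are assumptions made for illustration.

```python
import json
import urllib.request

# Assumed mapping from alert severity to the Slack channels listed above.
CHANNEL_BY_SEVERITY = {
    "critical": "#incidents",
    "warning": "#alerts",
    "info": "#monitoring",
}


def channel_for(severity: str) -> str:
    """Pick the Slack channel for a severity; unknown values go to #monitoring."""
    return CHANNEL_BY_SEVERITY.get(severity, "#monitoring")


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the POST request for a Slack incoming webhook without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the routing easy to test without posting to a real channel.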

Email Alerts

Add the on-call team email list, not individual accounts. Team email lists ensure alerts reach someone even during vacations.

PagerDuty Integration

For P1 alerts that require 24/7 response:

integrations:
  pagerduty:
    integration_key: "YOUR_PAGERDUTY_INTEGRATION_KEY"
    severity_mapping:
      critical: P1   # Triggers immediately
      warning: P2    # Sends push notification
      info: P3       # Logs to PagerDuty only

Step 6: Set Alert Thresholds Appropriately

The most common monitoring mistake is setting thresholds that generate constant noise:

# Bad: Too sensitive, will generate false positives
alerts:
  - condition: "response_time > 500ms"
    severity: critical

# Good: Accounts for normal variation, requires persistence
alerts:
  - condition: "response_time > 2000ms for 3 consecutive checks"
    severity: warning
    
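  # With only 5 checks in the window, "availability < 95%" would fire on a
  # single failed check (4/5 = 80%), so require consecutive failures instead.
  - condition: "down for 2 consecutive checks"
    severity: critical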

Consecutive failure requirements dramatically reduce false positives. Require 2-3 consecutive failures before alerting for non-critical monitors.
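
The consecutive-failure rule is easy to picture as a counter that resets on every successful check. Here is a minimal sketch; the class name is hypothetical and the default of 3 is just one of the values suggested above.

```python
class ConsecutiveFailureGate:
    """Fire an alert only after `required` failed checks in a row."""

    def __init__(self, required: int = 3) -> None:
        self.required = required
        self.streak = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when the alert should fire.

        The alert fires once, at the moment the streak reaches the threshold,
        so a long outage does not re-alert on every subsequent check.
        """
        if check_passed:
            self.streak = 0
            return False
        self.streak += 1
        return self.streak == self.required
```

A single failed check followed by a success never reaches the threshold, so a momentary blip stays silent while a sustained failure still alerts within a few check intervals.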

Starting Threshold Values

Until you have baseline data, start with these conservative thresholds and refine:

| Monitor Type | Timeout | Warning | Critical |
|---|---|---|---|
| Homepage | 30s | > 3s for 3 checks | Down for 2 checks |
| API endpoint | 10s | > 1s for 3 checks | Down for 2 checks |
| Auth endpoint | 5s | > 500ms for 5 checks | Down for 2 checks |
| Payment API | 5s | > 2s for 3 checks | Down for 2 checks |

Step 7: Create a Status Page

Even if you only have 5 customers, a status page is worth setting up. It provides a canonical place for status information and reduces support ticket volume during incidents.

status_page:
  name: "Example Status"
  domain: "status.example.com"
  
  components:
    - name: "Website"
      monitors: ["main-website"]
      
    - name: "API"
      monitors: ["api-health", "user-api"]
      
    - name: "Authentication"
      monitors: ["auth-api"]
      
    - name: "Payment Processing"
      monitors: ["payment-api"]
      
  settings:
    show_uptime_history: true
    history_days: 90
    enable_subscriber_notifications: true
    custom_css: false
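
Each status page component rolls up one or more monitors. One plausible way to derive a component's status from its monitors is sketched below. The status labels and the rule itself are assumptions for illustration, not AzMonitor's documented behavior.

```python
from typing import Dict, List


def component_status(monitor_up: Dict[str, bool], monitors: List[str]) -> str:
    """Roll monitor states up into one component status.

    A monitor missing from `monitor_up` is treated as down, on the assumption
    that an unknown state should not be shown as healthy.
    """
    states = [monitor_up.get(name, False) for name in monitors]
    if all(states):
        return "operational"
    if any(states):
        return "degraded"
    return "outage"
```

For the "API" component above, `api-health` passing while `user-api` fails would show as degraded, and both failing would show as an outage.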

Step 8: Test Your Setup

Before trusting your monitoring, verify it works:

# Test 1: Verify the monitor is running
# Check your monitoring dashboard - should show last check time within 2 minutes

# Test 2: Simulate a failure
# Temporarily block your server's response (or use a non-existent URL)
# Verify you receive an alert within expected time window

# Test 3: Verify alert resolution
# Unblock your server
# Verify you receive an "all clear" alert

# Test 4: Test each notification channel
# Click "Send test alert" for each configured channel
# Verify Slack, email, and PagerDuty receive the test

Step 9: Document Your Monitoring Setup

Create a monitoring runbook that your entire team can reference:

# Monitoring Runbook

## Where to Find Monitoring
Dashboard: https://app.azmonitor.com/dashboard
Status page: https://status.example.com

## Alert Channels
P1 incidents: PagerDuty (Engineering team) + #incidents Slack
P2 degradations: #alerts Slack channel
P3 informational: #monitoring Slack channel

## What Each Monitor Watches
- Main Website: homepage load time and content
- API Health: /health endpoint availability
- Authentication: login API availability  
- Payment API: checkout endpoint
- SSL - example.com: certificate expiry

## Alert Response Guide
P1 (Critical): Page on-call, update status page, begin incident response
P2 (Warning): Investigate during business hours, update in #alerts
P3 (Info): Note in next standup, create ticket if persistent

## Maintenance Windows
Create maintenance window before: deployments, database migrations, infra changes
Location: AzMonitor > Settings > Maintenance Windows

Step 10: Review and Improve

Monitoring is never "done." Schedule a quarterly review:

Which alerts fired in the last quarter?
Which were false positives? (Tune thresholds)
Which incidents weren't caught by monitoring? (Add coverage)
Which monitors haven't fired in 6 months? (Remove or reconsider)

# Monitoring health check script
# Note: the monitoring_client methods below are illustrative placeholders;
# substitute whatever your monitoring tool's API or export actually provides.
def quarterly_monitoring_review(monitoring_client, all_services):
    """Run quarterly monitoring review checks and return a list of issues.

    all_services is your own service inventory: an iterable of objects
    with .name and .domain attributes.
    """
    issues = []

    # Monitors that never alerted might not be testing the right thing
    silent_monitors = monitoring_client.get_monitors_with_no_alerts(days=90)
    if silent_monitors:
        names = [m.name for m in silent_monitors]
        issues.append(f"Monitors with no alerts in 90 days: {names}")

    # Alerts that mostly resolve on their own are likely false positives
    noisy_monitors = monitoring_client.get_monitors_with_high_auto_resolve_rate(threshold=0.5)
    if noisy_monitors:
        names = [m.name for m in noisy_monitors]
        issues.append(f"Noisy monitors (>50% auto-resolve): {names}")

    # Services whose domain has no monitor at all
    monitored_domains = {m.get_domain() for m in monitoring_client.get_monitors()}
    unmonitored = [s for s in all_services if s.domain not in monitored_domains]
    if unmonitored:
        names = [s.name for s in unmonitored]
        issues.append(f"Services without monitoring: {names}")

    return issues

Conclusion

Good monitoring is a practice, not a one-time setup. Start with your critical paths, configure multi-region checks with meaningful assertions, route alerts appropriately, and review regularly. AzMonitor makes the setup process straightforward with sensible defaults that get you from zero to production-ready monitoring in under 30 minutes — and the review and improvement cycle that follows turns that initial setup into increasingly reliable coverage over time.


AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.


How to Set Up Website Monitoring: A Step-by-Step Guide