DNS is the phone book of the internet, and when it has problems, nothing works. DNS failures are uniquely frustrating because their cause is often invisible to affected users (they just see "site not found"), they can be geographically inconsistent (working in one city, broken in another), and they frequently take engineers by surprise because DNS seems like infrastructure that should "just work." Monitoring DNS properly prevents these surprises.

Why DNS Monitoring Matters

DNS issues cause more outages than most teams realize:

Certificate renewals fail because ACME DNS challenges time out
Deployments point traffic to wrong servers after a zone file error
CDN configurations break because CNAMEs don't resolve
Email delivery fails due to SPF/DKIM/DMARC DNS record issues
Third-party integrations break when their endpoints change DNS

DNS monitoring should be a standard part of your infrastructure monitoring, not an afterthought.

Types of DNS Records to Monitor

Different record types serve different purposes and need different monitoring:

| Record Type | Purpose | Monitoring Focus |
|---|---|---|
| A | IPv4 address for hostname | Correct IP, TTL |
| AAAA | IPv6 address for hostname | Correct IPv6, TTL |
| CNAME | Alias to another hostname | Correct target, chain depth |
| MX | Mail server for domain | Correct priority and server |
| TXT | Text data (SPF, DKIM, verification) | Correct value, syntax |
| NS | Authoritative nameservers | Correct servers, accessibility |
| SOA | Zone authority information | Serial number, timing |
| CAA | Certificate authority authorization | Correct CA entries |
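
The "chain depth" focus for CNAME records can be checked by following the aliases hop by hop. The sketch below is one possible approach, not a definitive implementation: `follow_cname_chain`, the `lookup_cname` callable, and the `max_depth` default of 8 are all hypothetical names and values chosen for illustration. In practice `lookup_cname` would wrap a real resolver query; here an in-memory table stands in for DNS.

```python
from typing import Callable, List, Optional


def follow_cname_chain(
    hostname: str,
    lookup_cname: Callable[[str], Optional[str]],
    max_depth: int = 8,
) -> dict:
    """Follow CNAME aliases from hostname and report the chain and its depth.

    lookup_cname is a hypothetical callable: it returns the CNAME target for
    a name, or None when the name has no CNAME record.
    """
    chain: List[str] = [hostname.rstrip(".").lower()]
    seen = {chain[0]}

    while True:
        target = lookup_cname(chain[-1])
        if target is None:
            # End of the chain: the last name is not an alias.
            return {"status": "ok", "chain": chain, "depth": len(chain) - 1}
        target = target.rstrip(".").lower()
        if target in seen:
            # The alias points back at a name already visited.
            return {"status": "loop", "chain": chain + [target], "depth": len(chain)}
        chain.append(target)
        seen.add(target)
        if len(chain) - 1 > max_depth:
            return {"status": "too_deep", "chain": chain, "depth": len(chain) - 1}


# Usage with an in-memory table standing in for real DNS answers.
fake_zone = {
    "www.example.com": "cdn.example.com.",
    "cdn.example.com": "d111.cloudfront.net.",
}
report = follow_cname_chain("www.example.com", fake_zone.get)
print(report["depth"], report["chain"])
```

Passing the lookup as a parameter keeps the chain logic testable without network access.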

Monitoring DNS Record Values

Verify your DNS records contain expected values:

import dns.resolver
import dns.exception

def check_dns_record(hostname, record_type, expected_value=None, expected_pattern=None):
    """
    Check a DNS record and optionally validate its value.
    
    Args:
        hostname: Domain to check
        record_type: 'A', 'AAAA', 'CNAME', 'MX', 'TXT', etc.
        expected_value: Expected exact value
        expected_pattern: Regex pattern to match against
    
    Returns:
        Dict with status, values, and any errors
    """
    import re
    
    resolver = dns.resolver.Resolver()
    resolver.timeout = 5
    resolver.lifetime = 10
    
    try:
        answers = resolver.resolve(hostname, record_type)
        
        values = []
        for rdata in answers:
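            if record_type in ('A', 'AAAA'):
                values.append(str(rdata.address))
            elif record_type == 'CNAME':
                # Strip the trailing root dot so values compare cleanly
                values.append(str(rdata.target).rstrip('.'))
            elif record_type == 'MX':
                exchange = str(rdata.exchange).rstrip('.')
                values.append(f"{rdata.preference} {exchange}")
            elif record_type == 'TXT':
                # Long TXT values are split into chunks; join them back
                values.append(''.join(s.decode() for s in rdata.strings))
            else:
                values.append(str(rdata))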
        
        result = {
            "status": "resolved",
            "hostname": hostname,
            "record_type": record_type,
            "values": values,
            "ttl": answers.rrset.ttl
        }
        
        # Validate against expected value
        if expected_value and expected_value not in values:
            result["status"] = "value_mismatch"
            result["expected"] = expected_value
            result["actual"] = values
        
        # Validate against pattern
        if expected_pattern:
            matched = any(re.match(expected_pattern, v) for v in values)
            if not matched:
                result["status"] = "pattern_mismatch"
                result["pattern"] = expected_pattern
        
        return result
        
    except dns.resolver.NXDOMAIN:
        return {
            "status": "nxdomain",
            "hostname": hostname,
            "record_type": record_type,
            "error": "Domain does not exist"
        }
    except dns.resolver.NoAnswer:
        return {
            "status": "no_record",
            "hostname": hostname,
            "record_type": record_type,
            "error": f"No {record_type} record found"
        }
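    except dns.exception.Timeout:
        return {
            "status": "timeout",
            "hostname": hostname,
            "record_type": record_type,
            "error": "DNS query timed out"
        }
    except dns.resolver.NoNameservers:
        # Raised when every nameserver fails to answer (e.g. SERVFAIL)
        return {
            "status": "servfail",
            "hostname": hostname,
            "record_type": record_type,
            "error": "No nameserver returned a usable answer"
        }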
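
The status strings returned by `check_dns_record` are easier to act on once they are mapped to alert levels. The following is a minimal sketch of that step; the `SEVERITY_BY_STATUS` table, the `to_alert` helper, and the severity names are assumptions made for illustration, and the right mapping depends on how your team treats each failure.

```python
# Hypothetical mapping from check_dns_record() status strings to alert levels.
SEVERITY_BY_STATUS = {
    "resolved": "ok",
    "pattern_mismatch": "warning",
    "value_mismatch": "critical",
    "no_record": "critical",
    "nxdomain": "critical",
    "timeout": "warning",
}


def to_alert(result: dict) -> dict:
    """Turn a check_dns_record() result into a small alert payload."""
    # Unknown statuses default to "warning" so they are never silently dropped.
    severity = SEVERITY_BY_STATUS.get(result["status"], "warning")
    summary = f'{result["hostname"]} {result["record_type"]}: {result["status"]}'
    if "error" in result:
        summary += f' ({result["error"]})'
    return {"severity": severity, "summary": summary}


# Usage with a result shaped like the NXDOMAIN branch of check_dns_record().
sample = {
    "status": "nxdomain",
    "hostname": "api.example.com",
    "record_type": "A",
    "error": "Domain does not exist",
}
print(to_alert(sample))
```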

Monitoring DNS Propagation

When you change DNS records, the change is not visible everywhere at once: recursive resolvers keep serving the cached answer until its TTL expires. Monitor how your changes propagate across public resolvers:

def monitor_dns_propagation(hostname, record_type, expected_value, resolvers=None):
    """
    Check DNS record propagation across multiple resolvers.
    Shows which resolvers have the new value vs old value.
    """
    if resolvers is None:
        # Well-known public resolvers. Most are anycast, so the node that
        # answers depends on where this check runs, not on the label.
        resolvers = {
            "Google Public DNS": "8.8.8.8",
            "Cloudflare DNS": "1.1.1.1",
            "OpenDNS": "208.67.222.222",
            "Quad9": "9.9.9.9",
            "Google Public DNS (Secondary)": "8.8.4.4",
            "Cloudflare DNS (Secondary)": "1.0.0.1",
            "Comcast DNS": "75.75.75.75",
            "Level3 DNS": "209.244.0.3",
        }
    
    results = {}
    propagated_count = 0
    
    for resolver_name, resolver_ip in resolvers.items():
        try:
            # configure=False skips the system resolver settings, so only
            # the nameserver set below is queried
            resolver = dns.resolver.Resolver(configure=False)
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 3
            resolver.lifetime = 3
            
            answers = resolver.resolve(hostname, record_type)
            values = [str(a) for a in answers]
            
            is_propagated = expected_value in values
            if is_propagated:
                propagated_count += 1
            
            results[resolver_name] = {
                "ip": resolver_ip,
                "values": values,
                "propagated": is_propagated,
                "status": "propagated" if is_propagated else "old_value"
            }
            
        except Exception as e:
            results[resolver_name] = {
                "ip": resolver_ip,
                "status": "error",
                "error": str(e)
            }
    
    total = len(resolvers)
    # Guard against an empty resolver list
    propagation_percentage = (propagated_count / total) * 100 if total else 0.0
    
    return {
        "hostname": hostname,
        "expected_value": expected_value,
        "propagation_percentage": round(propagation_percentage, 1),
        "propagated_count": propagated_count,
        "total_resolvers": total,
        "results": results,
        "fully_propagated": propagated_count == total
    }

# Example output:
# {
#   "propagation_percentage": 75.0,
#   "propagated_count": 6,
#   "total_resolvers": 8,
#   "results": {
#     "Google Public DNS": {"propagated": True, "values": ["192.0.2.1"]},
#     "Cloudflare DNS": {"propagated": False, "values": ["192.0.2.0"]}  # Old value
#   }
# }
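
A single propagation check is a snapshot; during a change window you usually want to repeat it until enough resolvers agree. The sketch below shows one way to poll, under stated assumptions: `wait_for_propagation` and its parameters are hypothetical, and `check` is any callable returning a dict with a `propagation_percentage` key, such as a wrapper around `monitor_dns_propagation`. The demo uses canned values in place of live queries.

```python
import time
from typing import Callable


def wait_for_propagation(
    check: Callable[[], dict],
    threshold: float = 100.0,
    interval_s: float = 30.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll check() until propagation_percentage reaches threshold.

    check is expected to return a dict with a "propagation_percentage" key,
    like the return value of monitor_dns_propagation().
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        pct = check()["propagation_percentage"]
        history.append(pct)
        if pct >= threshold:
            return {"propagated": True, "attempts": attempt, "history": history}
        if attempt < max_attempts:
            sleep(interval_s)
    return {"propagated": False, "attempts": max_attempts, "history": history}


# Usage with canned results in place of live queries.
canned = iter([25.0, 75.0, 100.0])
outcome = wait_for_propagation(
    lambda: {"propagation_percentage": next(canned)},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(outcome)
```

With real queries, `check` could be `lambda: monitor_dns_propagation("api.example.com", "A", "192.0.2.1")`, and an interval close to the record's TTL avoids polling faster than caches can change.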

DNS TTL Management

TTL (Time To Live) determines how long resolvers cache your records. It directly affects propagation time:

| TTL Value | Cache Duration | Propagation Time | Use Case |
|---|---|---|---|
| 60s | 1 minute | Very fast | Before planned changes |
| 300s | 5 minutes | 5-10 minutes | Active change windows |
| 3600s | 1 hour | 1-2 hours | Normal operations |
| 86400s | 24 hours | 24-48 hours | Stable, rarely changed records |

Best practice for DNS changes:

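Lower the TTL to 300s ahead of your change window, at least 2x the current TTL in advance (for a 3600s TTL, two hours before), so caches still holding the old TTL have expired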
Make the DNS change
Verify propagation (see above)
Restore TTL to normal value after successful propagation
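
The timing in the first step is simple arithmetic: subtract twice the current TTL from the start of the change window. A minimal sketch follows; the function name `ttl_lowering_deadline`, the `safety_factor` parameter, and the example date are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def ttl_lowering_deadline(
    change_at: datetime, current_ttl_s: int, safety_factor: int = 2
) -> datetime:
    """Latest moment to lower the TTL so old cached answers expire before change_at."""
    return change_at - timedelta(seconds=current_ttl_s * safety_factor)


# A change planned for 14:00 UTC on a record with a 1-hour TTL.
change_window = datetime(2025, 6, 10, 14, 0, tzinfo=timezone.utc)
deadline = ttl_lowering_deadline(change_window, current_ttl_s=3600)
print(deadline.isoformat())  # 2025-06-10T12:00:00+00:00
```

For a record with an 86400s TTL the same calculation lands two full days before the change, which is why long TTLs need to be lowered well in advance.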

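# Check current TTL as seen by your resolver
# (a cached answer shows the remaining TTL, not the configured one)
dig api.example.com A +noall +answer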

# Check against authoritative nameserver (most accurate)
dig @ns1.example.com api.example.com A +noall +answer

# Output includes TTL in second column:
# api.example.com. 300 IN A 192.0.2.1
#                  ^^^ TTL = 300 seconds
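
If you collect this output in a script, the TTL can be pulled out by splitting the answer line into its five fields. This is a small sketch that assumes the single-line `+noall +answer` format shown above; `parse_dig_answer` is a hypothetical helper name.

```python
def parse_dig_answer(line: str) -> dict:
    """Split one line of `dig +noall +answer` output into its fields."""
    # Fields: owner name, TTL, class, type, then the record data (may contain spaces).
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "class": rclass,
        "type": rtype,
        "rdata": rdata.strip(),
    }


record = parse_dig_answer("api.example.com. 300 IN A 192.0.2.1")
print(record["ttl"])  # 300
```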

Monitoring DNS Response Time

DNS lookup time adds directly to time to first byte (TTFB) for any client that does not already have the answer cached. Monitor it:

# Measure DNS resolution time from different locations
# Using dig with timing
time dig api.example.com A @8.8.8.8 +noall +answer

# Or curl with timing breakdown
curl -s -o /dev/null -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\n" \
  https://api.example.com

# Typical healthy values:
# DNS from cache: < 1ms
# DNS uncached (same continent): 20-50ms
# DNS uncached (cross-continent): 100-300ms
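
The same measurement can be taken from Python when you want to record it alongside other checks. The sketch below times one lookup with a monotonic clock. `time_lookup` is a hypothetical helper, and by default it goes through the operating system's resolver via `socket.gethostbyname`, so a cached answer will look much faster than an uncached one.

```python
import socket
import time
from typing import Callable


def time_lookup(
    hostname: str, resolve: Callable[[str], object] = socket.gethostbyname
) -> dict:
    """Time a single lookup in milliseconds using a monotonic clock."""
    start = time.perf_counter()
    try:
        resolve(hostname)
        status = "resolved"
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        status = f"error: {exc}"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"hostname": hostname, "status": status, "elapsed_ms": round(elapsed_ms, 2)}


# "localhost" resolves without leaving the machine; swap in your own hostname.
print(time_lookup("localhost"))
```

The `resolve` parameter lets you substitute another lookup function, for example one that queries a specific nameserver.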

Alert when DNS resolution time is unexpectedly high:

monitor:
  name: "DNS Resolution Time"
  type: dns
  hostname: "api.example.com"
  record_type: A
  assertions:
    - type: resolution_time
      operator: less_than
      value: 100  # ms
    - type: record_value
      expected: "192.0.2.1"
    - type: ttl
      operator: greater_than
      value: 60  # Ensure minimum TTL is set

Monitoring DNS for Email (MX, SPF, DKIM, DMARC)

Email delivery depends on DNS records that are easy to misconfigure and hard to detect as broken without specific monitoring:

def check_email_dns_health(domain):
    """
    Comprehensive check of email-related DNS records.
    Returns health status for email deliverability.
    """
    checks = {}
    
    # MX Records
    try:
        mx_records = dns.resolver.resolve(domain, 'MX')
        checks['mx'] = {
            'status': 'healthy',
            'records': sorted([str(r.exchange) for r in mx_records])
        }
    except Exception as e:
        checks['mx'] = {'status': 'missing', 'error': str(e)}
    
    # SPF Record (TXT record starting with "v=spf1")
    try:
        txt_records = dns.resolver.resolve(domain, 'TXT')
        spf_records = [
            ''.join(s.decode() for s in r.strings)
            for r in txt_records
            if any(s.decode().startswith('v=spf1') for s in r.strings)
        ]
        
        if spf_records:
            checks['spf'] = {
                'status': 'healthy',
                'record': spf_records[0]
            }
        else:
            checks['spf'] = {
                'status': 'missing',
                'error': 'No SPF record found'
            }
    except Exception as e:
        checks['spf'] = {'status': 'error', 'error': str(e)}
    
    # DMARC Record
    try:
        dmarc_records = dns.resolver.resolve(f'_dmarc.{domain}', 'TXT')
        dmarc_texts = [
            ''.join(s.decode() for s in r.strings)
            for r in dmarc_records
        ]
        dmarc = next((t for t in dmarc_texts if t.startswith('v=DMARC1')), None)
        
        if dmarc:
            # Parse tag=value pairs so "sp=reject" is not read as "p=reject"
            tags = {}
            for part in dmarc.split(';'):
                if '=' in part:
                    tag, value = part.split('=', 1)
                    tags[tag.strip()] = value.strip()
            
            checks['dmarc'] = {
                'status': 'healthy',
                'policy': tags.get('p', 'none'),
                'record': dmarc
            }
        else:
            checks['dmarc'] = {'status': 'invalid', 'record': ' '.join(dmarc_texts)}
            
    except Exception as e:
        checks['dmarc'] = {'status': 'missing', 'error': str(e)}
    
    # Overall email health: an invalid record is as harmful as a missing one
    missing_critical = [
        k for k, v in checks.items()
        if v['status'] in ('missing', 'error', 'invalid')
    ]
    
    return {
        'domain': domain,
        'overall_status': 'unhealthy' if missing_critical else 'healthy',
        'missing_critical': missing_critical,
        'checks': checks
    }
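
The function above confirms that an SPF record exists, but a record can be present and still fail evaluation. One common cause is exceeding the limit of 10 DNS-querying terms that RFC 7208 places on SPF evaluation. The sketch below counts those terms in a single record. `lint_spf` is a hypothetical helper, and it only inspects the top-level record: lookups inside `include` targets also count toward the limit and are not followed here.

```python
import re

# Terms that trigger DNS queries during SPF evaluation (RFC 7208, section 4.6.4):
# the include, a, mx, ptr and exists mechanisms, and the redirect modifier.
LOOKUP_TERM = re.compile(
    r"^[+\-~?]?(include|a|mx|ptr|exists)(?=[:/]|$)|^redirect=", re.IGNORECASE
)


def lint_spf(record: str) -> dict:
    """Count DNS-querying terms in one SPF record and flag basic problems."""
    terms = record.split()
    problems = []
    if not terms or terms[0].lower() != "v=spf1":
        problems.append("record does not start with v=spf1")
    lookups = sum(1 for term in terms[1:] if LOOKUP_TERM.match(term))
    if lookups > 10:
        problems.append(f"{lookups} DNS-querying terms exceed the limit of 10")
    return {"lookups": lookups, "problems": problems}


report = lint_spf("v=spf1 include:_spf.example.net mx ip4:192.0.2.0/24 -all")
print(report)  # {'lookups': 2, 'problems': []}
```

Here `include` and `mx` each count once, while `ip4` and `all` need no DNS query.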

DNS Hijacking Detection

DNS hijacking redirects users to malicious servers. Monitor for unexpected IP changes:

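import json
from datetime import datetime

import dns.resolver

class DNSChangeDetector:
    """
    Detect unexpected DNS record changes.
    Stores baseline and alerts on changes.
    """
    
    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.baseline = self.load_baseline()
    
    def load_baseline(self):
        try:
            with open(self.storage_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}
    
    def save_baseline(self):
        with open(self.storage_path, 'w') as f:
            json.dump(self.baseline, f, indent=2)
    
    def resolve(self, hostname, record_type):
        """Resolve a record and return its values as sorted strings"""
        try:
            answers = dns.resolver.resolve(hostname, record_type)
            return {'values': sorted(str(r) for r in answers)}
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            # A record that disappears is itself a change worth flagging
            return {'values': []}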
    
    def check_and_update(self, hostname, record_type):
        """Check current values and compare to baseline"""
        current = self.resolve(hostname, record_type)
        key = f"{hostname}:{record_type}"
        
        if key not in self.baseline:
            # First run - establish baseline
            self.baseline[key] = {
                'values': current['values'],
                'first_seen': datetime.utcnow().isoformat(),
                'last_seen': datetime.utcnow().isoformat()
            }
            self.save_baseline()
            return {'status': 'baseline_established', 'values': current['values']}
        
        baseline_values = set(self.baseline[key]['values'])
        current_values = set(current['values'])
        
        added = current_values - baseline_values
        removed = baseline_values - current_values
        
        if added or removed:
            alert = {
                'status': 'CHANGE_DETECTED',
                'hostname': hostname,
                'record_type': record_type,
                'added': list(added),
                'removed': list(removed),
                'baseline_values': list(baseline_values),
                'current_values': list(current_values),
                'detected_at': datetime.utcnow().isoformat()
            }
            
            # This could indicate DNS hijacking
            print(f"[DNS_CHANGE] {json.dumps(alert, indent=2)}")
            
            return alert
        
        # Update last seen timestamp
        self.baseline[key]['last_seen'] = datetime.utcnow().isoformat()
        self.save_baseline()
        
        return {'status': 'no_change', 'values': list(current_values)}
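
Not every change is hostile: a planned migration or a CDN rotating addresses will also trip the detector. One way to reduce noise is to compare newly added addresses against the networks you expect to serve from. The sketch below assumes you maintain such a list; `unexpected_addresses` and the example ranges are hypothetical, and the documentation-only prefixes stand in for real ones.

```python
import ipaddress
from typing import Iterable, List


def unexpected_addresses(
    added: Iterable[str], allowed_networks: Iterable[str]
) -> List[str]:
    """Return the added values that fall outside every allowed network."""
    networks = [ipaddress.ip_network(n) for n in allowed_networks]
    suspicious = []
    for value in added:
        try:
            address = ipaddress.ip_address(value)
        except ValueError:
            # Not an IP address (e.g. a CNAME target): treat as needing review.
            suspicious.append(value)
            continue
        if not any(address in net for net in networks):
            suspicious.append(value)
    return suspicious


allowed = ["192.0.2.0/24", "2001:db8::/32"]
print(unexpected_addresses(["192.0.2.7", "203.0.113.9"], allowed))  # ['203.0.113.9']
```

The `added` list from a `CHANGE_DETECTED` alert can be passed in directly; an empty result suggests the change stayed within known infrastructure.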

External DNS Monitoring with AzMonitor

Configure continuous DNS monitoring:

monitors:
  # A record monitoring
  - name: "DNS - API Hostname"
    type: dns
    hostname: "api.example.com"
    record_type: A
    interval: 300
    assertions:
      - type: resolves_to
        expected_ips: ["192.0.2.1", "192.0.2.2"]
      - type: resolution_time
        operator: less_than
        value: 200
    
  # SSL cert + DNS consistency
  - name: "DNS - SSL Alignment"
    type: dns
    hostname: "api.example.com"
    assertions:
      - type: resolves
        # Just checks it resolves to something
      - type: ttl
        operator: greater_than
        value: 60
        
  # CNAME chain monitoring
  - name: "DNS - CDN CNAME"
    type: dns
    hostname: "cdn.example.com"
    record_type: CNAME
    assertions:
      - type: resolves_to_pattern
        pattern: ".*\\.cloudfront\\.net$"

Conclusion

DNS monitoring sits at the intersection of infrastructure reliability and security. Basic health checks confirm your records resolve correctly. Change detection catches unexpected modifications (which can indicate hijacking or misconfiguration). Propagation monitoring tracks how DNS changes roll out globally. And email DNS checks ensure your deliverability setup stays intact. AzMonitor's monitoring capabilities include DNS checks that verify record values, TTL settings, and resolution time from multiple global locations — providing the continuous DNS visibility that prevents the class of "everything is down and nobody knows why" incidents that turn out to be DNS problems in disguise.

Tags: DNS monitoring, DNS propagation, DNS health, infrastructure

AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.

DNS Propagation Monitoring: Tracking Changes Across the Global DNS System
