Microservices architecture fundamentally changes how monitoring works. Instead of monitoring one application with one database, you're monitoring dozens or hundreds of services, each with its own health, dependencies, and failure modes. A single user request might touch 15 services, and any one of them can fail independently.

Traditional uptime monitoring (check one URL, get one status) breaks down in microservices environments. This guide covers the patterns and practices that actually work.

The Microservices Monitoring Challenge

In a monolithic application, the app is either up or it isn't. In microservices:

Individual services can be up while dependent services are down
A "degraded" service (high latency, elevated error rate) can cascade into failures elsewhere
Service A failing might not cause user-visible problems if Service B handles the load
Service A failing might cause catastrophic cascading failures if other services depend on it

This complexity requires monitoring strategies that understand service relationships and cascade effects.

Health Check Patterns for Microservices

Liveness Probes

A liveness probe answers: "Is this process alive and should it keep running?" If it fails, the orchestration system (Kubernetes) restarts the container.

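// Simple liveness check: confirms only that the process can still handle requests.
// It deliberately checks no external dependencies.
// Requires the encoding/json and net/http packages.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{"status": "alive"})
}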

Liveness probes should be simple and should fail only when the process is truly stuck. A liveness probe that checks database connectivity will restart your service whenever the database is temporarily unavailable, which usually makes things worse: restarting the service cannot fix the database, and it throws away in-flight work.

Readiness Probes

A readiness probe answers: "Is this service ready to serve traffic?" If it fails, traffic is routed away from this instance (but the instance isn't restarted).

// Readiness check - verifies dependencies are accessible
func readinessHandler(w http.ResponseWriter, r *http.Request) {
    // Check database
    if err := db.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        json.NewEncoder(w).Encode(map[string]interface{}{
            "status": "not_ready",
            "reason": "database_unavailable",
        })
        return
    }
    
    // Check cache
    if err := cache.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        json.NewEncoder(w).Encode(map[string]interface{}{
            "status": "not_ready",
            "reason": "cache_unavailable",
        })
        return
    }
    
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
}

Deep Health Checks

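A deep health check (sometimes called a diagnostic endpoint) answers: "Is this service fully healthy and functioning as expected?" It is not the same thing as a Kubernetes startup probe, which only holds off liveness and readiness checks until the application has finished starting.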

// Deep health check - comprehensive dependency verification
func healthHandler(w http.ResponseWriter, r *http.Request) {
    health := HealthStatus{
        Service: "order-service",
        Version: buildVersion,
        Checks: map[string]CheckResult{},
    }
    
    allHealthy := true
    
    // Database check with timing
    dbStart := time.Now()
    if err := db.Ping(); err != nil {
        health.Checks["database"] = CheckResult{Status: "unhealthy", Error: err.Error()}
        allHealthy = false
    } else {
        health.Checks["database"] = CheckResult{Status: "healthy", Latency: time.Since(dbStart).Milliseconds()}
    }
    
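    // Upstream service check, bounded by a timeout so a hung dependency
    // cannot stall this handler (2s is an illustrative value).
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Get("http://payment-service/health")
    if err != nil {
        health.Checks["payment_service"] = CheckResult{Status: "unhealthy", Error: err.Error()}
        allHealthy = false
    } else {
        resp.Body.Close() // always close the body to avoid leaking connections
        if resp.StatusCode == http.StatusOK {
            health.Checks["payment_service"] = CheckResult{Status: "healthy"}
        } else {
            health.Checks["payment_service"] = CheckResult{Status: "unhealthy"}
            allHealthy = false
        }
    }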
    
    if allHealthy {
        health.Status = "healthy"
        w.WriteHeader(http.StatusOK)
    } else {
        health.Status = "degraded"
        w.WriteHeader(http.StatusServiceUnavailable)
    }
    
    json.NewEncoder(w).Encode(health)
}

See our complete microservices health check guide for the full implementation reference.

External vs Internal Monitoring in Microservices

Internal monitoring (Kubernetes health probes): Ensures containers are restarted when they fail. This is your infrastructure layer.

External monitoring (AzMonitor): Validates that the service works from the outside — the same perspective as your users. External monitoring doesn't know or care about your Kubernetes cluster; it only knows what your service returns.

Both are required. Internal monitoring keeps your instances healthy. External monitoring confirms the combination of all internal systems delivers correct responses to users.

The External Monitoring Layer for Microservices

For external uptime monitoring in a microservices environment, focus on:

1. API Gateway / Entry Points: Monitor the public-facing entry points to your system. These are the aggregation points where all internal service failures manifest as user-visible errors.

GET https://api.yourapp.com/health
→ Aggregated health of all upstream services
→ Alert if any dependency is degraded

2. Critical User Journey Endpoints: Monitor the API endpoints that represent complete user workflows, not just individual service health checks:

| User Journey | Endpoint to Monitor |
|--------------|---------------------|
| User login | POST /auth/login |
| Core feature | GET /api/feature/list |
| Data creation | POST /api/resource |
| Data retrieval | GET /api/resource/{id} |

3. Event Bus / Message Queue Health: In event-driven microservices, message queue health is critical:

Monitor: Queue depth (alert if growing unbounded)
Monitor: Consumer lag (alert if consumers fall behind)
Monitor: Failed message count (alert on failed message accumulation)
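
The three queue signals above can be evaluated over a short window of samples. The sketch below assumes the samples have already been collected from your broker's metrics; the `QueueSample` type, the `evaluateQueue` function, and the rule "depth rose on every sample in the window" are illustrative choices, not a standard.

```go
package main

import "fmt"

// QueueSample is one observation of a queue (hypothetical type).
type QueueSample struct {
	Depth       int // messages waiting in the queue
	ConsumerLag int // messages the consumers are behind
	Failed      int // cumulative failed or dead-lettered messages
}

// evaluateQueue inspects samples ordered oldest to newest and returns
// the reasons to alert, if any.
func evaluateQueue(samples []QueueSample, maxLag int) []string {
	var alerts []string
	if len(samples) < 2 {
		return alerts // not enough data to judge a trend
	}

	// Depth counts as "growing" only if it increased on every sample.
	growing := true
	for i := 1; i < len(samples); i++ {
		if samples[i].Depth <= samples[i-1].Depth {
			growing = false
			break
		}
	}
	if growing {
		alerts = append(alerts, "queue depth grew across the whole window")
	}

	first, last := samples[0], samples[len(samples)-1]
	if last.ConsumerLag > maxLag {
		alerts = append(alerts, "consumer lag above threshold")
	}
	if last.Failed > first.Failed {
		alerts = append(alerts, "failed messages accumulating")
	}
	return alerts
}

func main() {
	window := []QueueSample{
		{Depth: 100, ConsumerLag: 20, Failed: 0},
		{Depth: 250, ConsumerLag: 180, Failed: 0},
		{Depth: 600, ConsumerLag: 540, Failed: 4},
	}
	for _, a := range evaluateQueue(window, 500) {
		fmt.Println("ALERT:", a)
	}
}
```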

Alert Strategies for Microservices

Single-service alerts generate too much noise in microservices environments. Use alert correlation:

Symptom-based alerting: Alert on user-visible symptoms rather than individual service failures. "Payment API returning 5% errors" is more actionable than "Payment service health check failing."

Dependency-aware suppression: When Service A is down and Services B, C, D all depend on A, suppress alerts for B, C, D and focus on A. This prevents alert floods from cascading failures.
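
Dependency-aware suppression can be sketched as a small graph lookup: a failing service is treated as a root cause only if none of its own dependencies is failing. The `rootCauses` function and the dependency map format are assumptions for illustration, and the sketch only looks at direct dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// rootCauses returns the failing services that have no failing direct
// dependency. Alerts for all other failing services can be suppressed.
// deps maps a service to the services it depends on.
// Note: a cycle of failing services would produce no root cause here.
func rootCauses(deps map[string][]string, failing map[string]bool) []string {
	var roots []string
	for svc, isFailing := range failing {
		if !isFailing {
			continue
		}
		blamed := false
		for _, d := range deps[svc] {
			if failing[d] {
				blamed = true // the failure is explained by a dependency
				break
			}
		}
		if !blamed {
			roots = append(roots, svc)
		}
	}
	sort.Strings(roots) // stable output regardless of map iteration order
	return roots
}

func main() {
	deps := map[string][]string{
		"B": {"A"},
		"C": {"A"},
		"D": {"A"},
	}
	failing := map[string]bool{"A": true, "B": true, "C": true, "D": true}

	fmt.Println(rootCauses(deps, failing)) // prints [A]
}
```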

Alert deduplication: Group related alerts into a single incident. 20 services failing simultaneously due to a database outage is one incident, not 20.

Monitoring Cascading Failures

Cascading failures — where one service's failure causes others to fail — are a major risk in microservices. Detect them early by monitoring:

Error rate trends: A service seeing increasing 5xx responses is under stress and may soon fail completely. Alert on trends, not just binary up/down status.
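
Alerting on a trend rather than a threshold can be as simple as fitting a line through recent error-rate samples and checking its slope. The sketch below uses an ordinary least-squares fit over evenly spaced samples; the function names and the slope threshold are illustrative assumptions.

```go
package main

import "fmt"

// slope returns the least-squares slope of ys plotted against the
// sample indices 0, 1, ..., n-1. It returns 0 for fewer than two samples.
func slope(ys []float64) float64 {
	n := float64(len(ys))
	if n < 2 {
		return 0
	}
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range ys {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	return (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
}

// errorRateRising reports whether the error rate is climbing by at least
// minSlope percentage points per sample interval.
func errorRateRising(rates []float64, minSlope float64) bool {
	return slope(rates) >= minSlope
}

func main() {
	// 5xx error rate in percent, one sample per minute.
	steady := []float64{0.4, 0.5, 0.4, 0.5, 0.4}
	climbing := []float64{0.4, 0.9, 1.6, 2.2, 3.1}

	fmt.Println("steady rising:", errorRateRising(steady, 0.25))
	fmt.Println("climbing rising:", errorRateRising(climbing, 0.25))
}
```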

Latency percentiles (P99): High P99 latency in Service A creates pressure on services that call A. A P99 spike often precedes a cascade.

Connection pool utilization: When a service's connection pool to a database or upstream service is near capacity, new requests start queuing for a connection and then timing out. Alert at 80% utilization, before the pool is exhausted.

Service Mesh Integration

If you're using a service mesh (Istio, Linkerd, Consul Connect), you gain rich telemetry about service-to-service communication. Integrate this data with your external monitoring:

Service mesh provides internal metrics (latency, error rates between services)
External monitoring provides user-perspective availability data
Combined: you know what users experience AND why

AzMonitor integrates with standard observability formats, making it straightforward to correlate external check data with service mesh telemetry.

Building a Microservices Monitoring Dashboard

For effective microservices monitoring, structure your dashboard in layers:

Layer 1: User Impact

Overall service availability (%)
API error rate
P95 response time across all services

Layer 2: Service Health

Health status of each service (healthy/degraded/down)
Per-service error rates and latencies

Layer 3: Infrastructure

Per-service resource utilization
Database connection pool status
Message queue depths

Start external monitoring with AzMonitor for Layer 1 and the critical endpoints in Layer 2. Try it free and build your microservices monitoring foundation in under an hour.

Tags: microservices monitoring, distributed systems, health checks, uptime monitoring


AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.


