DORA Metrics: Measuring Software Delivery and Operational Performance

Learn the four DORA metrics for software delivery performance — deployment frequency, lead time, MTTR, and change failure rate — and how to use them to improve engineering.

AzMonitor Team · November 26, 2025 · 8 min read · 1,569 words · Updated January 20, 2026
Tags: DORA metrics, DevOps, deployment frequency, MTTR, reliability

The DORA (DevOps Research and Assessment) metrics are among the most widely validated frameworks for measuring software delivery performance. Over six years of research involving thousands of teams across hundreds of organizations, DORA established that four key metrics predict both software delivery performance and organizational outcomes such as profitability, market share, and productivity. The metrics are not just descriptive; they are predictive of business results.

The Four DORA Metrics

1. Deployment Frequency

How often does your organization successfully release software to production?

This metric captures the cadence of delivering value to users. Elite performers deploy multiple times per day. Low performers deploy less than once a month.

| Category | Frequency |
|---|---|
| Elite | Multiple deploys per day |
| High | 1 per day to 1 per week |
| Medium | 1 per week to 1 per month |
| Low | Fewer than once per month |

2. Lead Time for Changes

How long does it take to get a committed code change into production?

This metric measures the time from committing code to that code running in production. Shorter lead times mean faster feedback and faster delivery of value.

| Category | Lead Time |
|---|---|
| Elite | Less than 1 hour |
| High | 1 day to 1 week |
| Medium | 1 week to 1 month |
| Low | More than 1 month |

3. Mean Time to Restore (MTTR)

How quickly can you restore service after an incident?

When your service fails, how long does it take to get back to normal? This captures your incident response capability and reliability.

| Category | MTTR |
|---|---|
| Elite | Less than 1 hour |
| High | Less than 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |

4. Change Failure Rate

What percentage of changes cause incidents?

How often do your deployments cause service degradation or require rollback? Lower is better, but note: the DORA research found elite performers don't have zero failures — they have fast recovery.

| Category | Change Failure Rate |
|---|---|
| Elite | 0-15% |
| High | 16-30% |
| Medium | 16-30% |
| Low | 16-30% |

(The research found that high, medium, and low performers reported change failure rates in the same 16-30% band. The difference between those tiers was primarily in deployment frequency and MTTR.)

Calculating DORA Metrics

Deployment Frequency

from datetime import datetime, timedelta

def calculate_deployment_frequency(deployments, days=30):
    """
    Calculate deployment frequency over a time period.
    """
    # Filter to production deployments only
    production_deployments = [
        d for d in deployments 
        if d.environment == "production" and d.status == "success"
    ]
    
    # Count deployments in the window
    start_date = datetime.utcnow() - timedelta(days=days)
    recent_deployments = [
        d for d in production_deployments
        if d.deployed_at >= start_date
    ]
    
    deployments_per_day = len(recent_deployments) / days
    
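    # Classify (the elite tier requires more than one deploy per day;
    # exactly one per day falls in the "high" band of the table above)
    if deployments_per_day > 1:
        category = "elite"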
    elif deployments_per_day >= 1/7:  # At least weekly
        category = "high"
    elif deployments_per_day >= 1/30:  # At least monthly
        category = "medium"
    else:
        category = "low"
    
    return {
        "deployments_count": len(recent_deployments),
        "deployments_per_day": round(deployments_per_day, 2),
        "deployments_per_week": round(deployments_per_day * 7, 1),
        "category": category
    }

# Example calculation (load_deployments is a placeholder for your own data loader)
deployments = load_deployments(days=90)
frequency = calculate_deployment_frequency(deployments, days=90)
print(f"Deployment frequency: {frequency['deployments_per_week']}/week ({frequency['category']})")
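
The example above relies on `load_deployments`, which this article does not define. As a minimal sketch of the record shape the function reads (`environment`, `status`, `deployed_at`), the following uses a plain dataclass with generated sample data. The names `Deployment`, `make_sample_deployments`, and `deployments_per_week` are hypothetical, and the reference time is passed in explicitly so the result is reproducible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record; field names mirror the attributes used in this article."""
    environment: str
    status: str
    deployed_at: datetime
    commit_hash: Optional[str] = None


def make_sample_deployments(now, count, spacing_hours):
    """Build `count` successful production deployments, `spacing_hours` apart, ending at `now`."""
    return [
        Deployment("production", "success", now - timedelta(hours=spacing_hours * i))
        for i in range(count)
    ]


def deployments_per_week(deployments, now, days):
    """Count successful production deployments in the window and scale to a weekly rate."""
    start = now - timedelta(days=days)
    recent = [
        d for d in deployments
        if d.environment == "production"
        and d.status == "success"
        and d.deployed_at >= start
    ]
    return len(recent) * 7 / days


# Assumed reference time (naive UTC, matching datetime.utcnow() in the article's code)
now = datetime(2026, 1, 20, 12, 0)
sample = make_sample_deployments(now, count=10, spacing_hours=48)
print(round(deployments_per_week(sample, now, days=30), 2))
```

Passing `now` as a parameter instead of calling the clock inside the function is a design choice that makes the calculation easy to test against fixed data.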

Lead Time for Changes

def calculate_lead_time(deployments, code_commits):
    """
    Calculate lead time from commit to production deployment.
    
    This requires linking deployments to their commits.
    Methods: git tags, deployment metadata, CI/CD pipeline data.
    """
    lead_times = []
    
    # Index commits by hash once instead of scanning the list per deployment
    commits_by_hash = {c.hash: c for c in code_commits}
    
    for deployment in deployments:
        if deployment.commit_hash and deployment.deployed_at:
            # Find the commit timestamp
            commit = commits_by_hash.get(deployment.commit_hash)
            
            if commit:
                lead_time_hours = (
                    deployment.deployed_at - commit.committed_at
                ).total_seconds() / 3600
                
                lead_times.append(lead_time_hours)
    
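    if not lead_times:
        # Include a category so callers that read ["category"] do not fail
        return {"error": "No deployments with commit data", "category": "insufficient_data"}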
    
    sorted_times = sorted(lead_times)
    median_hours = sorted_times[len(sorted_times) // 2]
    
    # Classify
    if median_hours < 1:
        category = "elite"
    elif median_hours < 168:  # 7 days
        category = "high"
    elif median_hours < 720:  # 30 days
        category = "medium"
    else:
        category = "low"
    
    return {
        "median_lead_time_hours": round(median_hours, 1),
        "p90_lead_time_hours": round(sorted_times[int(len(sorted_times) * 0.9)], 1),
        "category": category,
        "sample_size": len(lead_times)
    }
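
The index-based median above picks the upper of the two middle values when the sample size is even, and the p90 lookup returns the nearest higher sample. As an alternative sketch, Python's standard `statistics` module can interpolate both values. `summarize_lead_times` is a hypothetical helper, not part of the code above, and the choice of the `"inclusive"` quantile method is an assumption.

```python
import statistics


def summarize_lead_times(lead_times_hours):
    """Return the median and 90th percentile of lead times, or None if empty."""
    if not lead_times_hours:
        return None

    median = statistics.median(lead_times_hours)

    if len(lead_times_hours) >= 2:
        # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile
        p90 = statistics.quantiles(lead_times_hours, n=10, method="inclusive")[-1]
    else:
        # quantiles() needs at least two data points
        p90 = lead_times_hours[0]

    return {"median_hours": median, "p90_hours": p90}


# Hypothetical lead times in hours
print(summarize_lead_times([8, 2, 6, 4]))
```

For the four sample values, the interpolated median is 5.0 hours, where the index-based approach would report 6.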

MTTR Calculation

def calculate_mttr(incidents, days=30):
    """
    Calculate Mean Time to Restore from incident data.
    
    Requires incidents with:
    - detected_at: When monitoring first detected the issue
    - resolved_at: When the service returned to normal
    """
    start_date = datetime.utcnow() - timedelta(days=days)
    
    recent_incidents = [
        i for i in incidents
        if i.detected_at >= start_date and i.resolved_at is not None
    ]
    
    if not recent_incidents:
        return {"error": "No resolved incidents in window", "category": "insufficient_data"}
    
    restoration_times = [
        (i.resolved_at - i.detected_at).total_seconds() / 3600
        for i in recent_incidents
    ]
    
    mean_hours = sum(restoration_times) / len(restoration_times)
    sorted_times = sorted(restoration_times)
    median_hours = sorted_times[len(sorted_times) // 2]
    
    # Classify
    if mean_hours < 1:
        category = "elite"
    elif mean_hours < 24:
        category = "high"
    elif mean_hours < 168:  # 7 days
        category = "medium"
    else:
        category = "low"
    
    return {
        "mean_mttr_hours": round(mean_hours, 2),
        "median_mttr_hours": round(median_hours, 2),
        "p95_mttr_hours": round(sorted_times[int(len(sorted_times) * 0.95)], 2),
        "incident_count": len(recent_incidents),
        "category": category
    }
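
The function above classifies on the mean but also reports the median, and the two can disagree sharply. The sketch below uses made-up restoration times to show how a single long outage moves the mean into a different benchmark band. `classify_restore_time` is a hypothetical helper that restates the thresholds used above.

```python
import statistics


def classify_restore_time(hours):
    """Apply the MTTR benchmark bands from this article to a value in hours."""
    if hours < 1:
        return "elite"
    if hours < 24:
        return "high"
    if hours < 168:  # 7 days
        return "medium"
    return "low"


# Hypothetical restoration times in hours: three quick fixes and one long outage
restoration_hours = [0.5, 0.5, 0.5, 30.0]

mean_hours = statistics.mean(restoration_hours)
median_hours = statistics.median(restoration_hours)

print(mean_hours, classify_restore_time(mean_hours))      # 7.875 high
print(median_hours, classify_restore_time(median_hours))  # 0.5 elite
```

Reporting both values, as the function above does, makes it easier to tell whether a poor MTTR reflects typical incidents or one outlier.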

Change Failure Rate

def calculate_change_failure_rate(deployments, incidents, days=30):
    """
    Calculate what percentage of deployments caused incidents.
    
    Methodology: A deployment "caused" an incident if the incident
    was detected within 1 hour after the deployment.
    """
    start_date = datetime.utcnow() - timedelta(days=days)
    
    recent_deployments = [
        d for d in deployments
        if d.deployed_at >= start_date and d.environment == "production"
    ]
    
    failed_deployments = 0
    
    for deployment in recent_deployments:
        # Check if any incident was detected in the hour after this deployment.
        # An incident detected before the deployment cannot have been caused by it.
        deployment_caused_incident = any(
            0 <= (incident.detected_at - deployment.deployed_at).total_seconds() < 3600
            for incident in incidents
            if incident.detected_at >= start_date
        )
        
        if deployment_caused_incident:
            failed_deployments += 1
    
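    if not recent_deployments:
        # Without deployments there is no rate to report; avoid classifying 0% as elite
        return {"error": "No production deployments in window", "category": "insufficient_data"}
    
    cfr = failed_deployments / len(recent_deployments)
    
    # Classify. The benchmark table lists the same 16-30% band for high, medium,
    # and low performers, so this metric alone cannot separate those tiers.
    if cfr <= 0.15:
        category = "elite"
    elif cfr <= 0.30:
        category = "high"
    else:
        category = "low"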
    
    return {
        "change_failure_rate": round(cfr, 3),
        "change_failure_rate_pct": round(cfr * 100, 1),
        "total_deployments": len(recent_deployments),
        "failed_deployments": failed_deployments,
        "category": category
    }
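
One caveat of a time-window heuristic is that a single incident can be blamed on every deployment that falls inside the window. As a hedged alternative sketch, the function below attributes each incident only to the most recent deployment that precedes it within the window. `attribute_incidents` and the sample timestamps are hypothetical. Explicit links between incidents and deployments, where your tooling records them, are more reliable than either heuristic.

```python
from datetime import datetime, timedelta


def attribute_incidents(deploy_times, incident_times, window=timedelta(hours=1)):
    """Return the set of deployment times blamed for at least one incident.

    Each incident is attributed to the latest deployment that happened
    at or before its detection time and within `window` of it.
    """
    failed = set()
    ordered = sorted(deploy_times)
    for detected in incident_times:
        candidates = [t for t in ordered if timedelta(0) <= detected - t < window]
        if candidates:
            failed.add(candidates[-1])  # latest preceding deployment
    return failed


# Hypothetical data: two deployments 30 minutes apart, one much later
base = datetime(2026, 1, 5, 9, 0)
deploys = [base, base + timedelta(minutes=30), base + timedelta(hours=5)]
incidents = [base + timedelta(minutes=40)]

failed = attribute_incidents(deploys, incidents)
print(f"{len(failed)} of {len(deploys)} deployments failed")
```

Here only the 09:30 deployment is counted as failed, whereas a pure window check would count both the 09:00 and 09:30 deployments.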

Creating a DORA Dashboard

Build a dashboard that tracks all four metrics:

def generate_dora_report(deployments, commits, incidents, days=30):
    """Generate comprehensive DORA metrics report"""
    
    # Compute each metric once and reuse the results for the overall category
    metrics = {
        "deployment_frequency": calculate_deployment_frequency(deployments, days),
        "lead_time": calculate_lead_time(deployments, commits),
        "mttr": calculate_mttr(incidents, days),
        "change_failure_rate": calculate_change_failure_rate(deployments, incidents, days),
    }
    
    return {
        "period_days": days,
        "generated_at": datetime.utcnow().isoformat(),
        **metrics,
        "overall_category": determine_overall_category({
            # .get() tolerates error results that carry no "category" key
            name: result.get("category", "insufficient_data")
            for name, result in metrics.items()
        }),
    }

def determine_overall_category(metric_categories):
    """Determine overall DORA performance category"""
    category_scores = {"elite": 4, "high": 3, "medium": 2, "low": 1}
    
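    # Ignore metrics without a known category (e.g. "insufficient_data")
    # so that missing data does not drag the average down
    scores = [
        category_scores[c] for c in metric_categories.values()
        if c in category_scores
    ]
    if not scores:
        return "insufficient_data"
    avg_score = sum(scores) / len(scores)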
    
    if avg_score >= 3.5:
        return "elite"
    elif avg_score >= 2.5:
        return "high"
    elif avg_score >= 1.5:
        return "medium"
    else:
        return "low"
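
To make the scoring concrete, here is a small worked example of the averaging step. It restates the score table so the snippet runs on its own. `average_score` is a hypothetical helper and the category combinations are made up.

```python
CATEGORY_SCORES = {"elite": 4, "high": 3, "medium": 2, "low": 1}


def average_score(metric_categories):
    """Average the numeric scores of the per-metric categories."""
    scores = [CATEGORY_SCORES[c] for c in metric_categories.values()]
    return sum(scores) / len(scores)


balanced = {
    "deployment_frequency": "elite",
    "lead_time": "high",
    "mttr": "high",
    "change_failure_rate": "medium",
}
skewed = {
    "deployment_frequency": "elite",
    "lead_time": "elite",
    "mttr": "elite",
    "change_failure_rate": "low",
}

print(average_score(balanced))  # (4 + 3 + 3 + 2) / 4 = 3.0
print(average_score(skewed))    # (4 + 4 + 4 + 1) / 4 = 3.25
```

Both averages fall between 2.5 and 3.5, so both teams would be labeled "high" overall. The second case shows that an average can hide one weak metric, which is a reason to review the four categories individually as well.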

Using DORA Metrics to Drive Improvement

DORA metrics are useful only when they drive action:

| Problem Metric | Root Cause Patterns | Improvement Approach |
|---|---|---|
| Low deployment frequency | Long release cycles, manual processes | Automate deployment pipeline, reduce batch size |
| Long lead time | Long review processes, manual testing | Parallel reviews, automated testing |
| High MTTR | Poor monitoring, slow diagnosis | Better observability, playbooks, chaos engineering |
| High change failure rate (CFR) | Insufficient testing, risky deployments | Better test coverage, feature flags, gradual rollouts |
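
The table above can be wired to the per-metric categories so that a report points at a next step. This is a rough sketch under assumptions: `weakest_metric` and `IMPROVEMENTS` are hypothetical names, the dictionary keys are assumed to match the metric names used in the report, and ties are resolved in favor of the first metric listed.

```python
CATEGORY_SCORES = {"elite": 4, "high": 3, "medium": 2, "low": 1}

# Improvement approaches copied from the table above
IMPROVEMENTS = {
    "deployment_frequency": "Automate deployment pipeline, reduce batch size",
    "lead_time": "Parallel reviews, automated testing",
    "mttr": "Better observability, playbooks, chaos engineering",
    "change_failure_rate": "Better test coverage, feature flags, gradual rollouts",
}


def weakest_metric(metric_categories):
    """Return the name of the lowest-scoring metric, or None if none can be scored."""
    known = {m: c for m, c in metric_categories.items() if c in CATEGORY_SCORES}
    if not known:
        return None
    return min(known, key=lambda m: CATEGORY_SCORES[known[m]])


# Hypothetical categories for one team
categories = {
    "deployment_frequency": "high",
    "lead_time": "medium",
    "mttr": "low",
    "change_failure_rate": "elite",
}

focus = weakest_metric(categories)
print(focus, "->", IMPROVEMENTS[focus])
```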

Improving MTTR with Better Monitoring

MTTR is the metric most directly influenced by monitoring quality. It has three components:

  1. Time to detect: monitoring alert latency. Reduce with shorter check intervals, synthetic monitoring, and real user monitoring (RUM).
  2. Time to diagnose: observability quality. Reduce with distributed tracing, structured logs, and dashboards.
  3. Time to resolve: playbooks and automation. Reduce with runbooks, automated remediation, and feature flags.

# Track MTTR components separately
def calculate_mttr_breakdown(incidents):
    """
    Break down MTTR into detection, diagnosis, and resolution components.
    Requires incidents to track when each phase ended.
    """
    breakdowns = []
    
    for incident in incidents:
        if not all([incident.detected_at, incident.diagnosed_at, incident.resolved_at]):
            continue
        
        detection_time = (
            incident.detected_at - incident.actual_start_at
        ).total_seconds() / 60 if incident.actual_start_at else None
        
        diagnosis_time = (
            incident.diagnosed_at - incident.detected_at
        ).total_seconds() / 60
        
        resolution_time = (
            incident.resolved_at - incident.diagnosed_at
        ).total_seconds() / 60
        
        total_mttr = (
            incident.resolved_at - incident.detected_at
        ).total_seconds() / 60
        
        breakdowns.append({
            "detection_minutes": detection_time,
            "diagnosis_minutes": diagnosis_time,
            "resolution_minutes": resolution_time,
            "total_minutes": total_mttr
        })
    
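    if not breakdowns:
        return {"error": "No incidents with complete phase timestamps"}
    
    # Detection time is only known for incidents that recorded actual_start_at,
    # so average it over those incidents rather than over all of them
    detection_times = [
        b["detection_minutes"] for b in breakdowns
        if b["detection_minutes"] is not None
    ]
    
    # Average each component
    return {
        "avg_detection_minutes": (
            sum(detection_times) / len(detection_times) if detection_times else None
        ),
        "avg_diagnosis_minutes": sum(b["diagnosis_minutes"] for b in breakdowns) / len(breakdowns),
        "avg_resolution_minutes": sum(b["resolution_minutes"] for b in breakdowns) / len(breakdowns),
        "avg_total_minutes": sum(b["total_minutes"] for b in breakdowns) / len(breakdowns)
    }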

DORA Metrics and Monitoring

DORA metrics have a direct relationship with monitoring capabilities:

  • Deployment Frequency — Automated deployment pipelines with monitoring gates allow faster, safer deployments
  • Lead Time — Automated testing and monitoring-based deploy decisions reduce lead time
  • MTTR — Monitoring quality is the primary determinant of detection and diagnosis time
  • Change Failure Rate — Post-deploy monitoring catches regressions quickly, enabling rapid rollback

Elite engineering teams invest heavily in monitoring because it enables all four DORA improvements. You can't safely increase deployment frequency without monitoring that catches regressions. You can't reduce MTTR without monitoring that detects issues quickly.
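
As an illustration of a monitoring gate, the sketch below compares the error rate before and after a deployment and flags a rollback when the increase exceeds a threshold. Everything here is an assumption for illustration: the function names, the `(errors, requests)` input shape, and the 2 percentage point threshold are not taken from any specific tool.

```python
def error_rate(errors, requests):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def should_roll_back(before, after, max_increase=0.02):
    """Decide whether a deployment should be rolled back.

    `before` and `after` are (errors, requests) tuples measured over
    comparable windows on either side of the deployment. The threshold
    is an assumed default and should be tuned per service.
    """
    return error_rate(*after) - error_rate(*before) > max_increase


# Hypothetical measurements: 1% errors before, 5% after
print(should_roll_back(before=(10, 1000), after=(50, 1000)))  # True
# 1% before, 1.5% after: within the allowed increase
print(should_roll_back(before=(10, 1000), after=(15, 1000)))  # False
```

A real gate would usually look at more than one signal, such as latency and saturation, and would wait for enough post-deploy traffic before deciding.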

Conclusion

DORA metrics give engineering organizations a validated framework for measuring and improving software delivery performance. They work because they measure the outcomes that matter — speed (frequency, lead time) and stability (MTTR, change failure rate) — rather than proxy metrics like code coverage or story points. Start measuring your four DORA metrics now, identify your lowest-performing metric, and focus improvement efforts there. AzMonitor directly impacts two DORA metrics: MTTR (through faster incident detection) and change failure rate (through post-deploy monitoring that catches regressions before they become major incidents).
