The DORA (DevOps Research and Assessment) metrics are the most widely validated framework for measuring software delivery performance. Over six years of research involving thousands of teams across hundreds of organizations, DORA established that four key metrics capture software delivery performance, and that this performance predicts organizational outcomes like profitability, market share, and productivity. These metrics aren't just interesting; they're predictive of business results.
The Four DORA Metrics
1. Deployment Frequency
How often does your organization successfully release software to production?
This metric captures the cadence of delivering value to users. Elite performers deploy multiple times per day. Low performers deploy fewer than once per month.
| Category | Frequency |
|---|---|
| Elite | Multiple deploys per day |
| High | 1 per day to 1 per week |
| Medium | 1 per week to 1 per month |
| Low | Fewer than once per month |
2. Lead Time for Changes
How long does it take to get a committed code change into production?
This metric measures the time from committing code to that code running in production. Shorter lead times mean faster feedback and faster delivery of value.
| Category | Lead Time |
|---|---|
| Elite | Less than 1 hour |
| High | 1 day to 1 week |
| Medium | 1 week to 1 month |
| Low | More than 1 month |
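Lead time applies to each committed change, so a deployment that ships a batch of commits carries several lead times at once. The sketch below illustrates this with made-up timestamps; the helper name `per_commit_lead_times` is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def per_commit_lead_times(commit_times, deployed_at):
    """Return each commit's lead time, in hours, for a single deployment."""
    return [(deployed_at - t) / timedelta(hours=1) for t in commit_times]


deployed_at = datetime(2024, 3, 1, 15, 0)
# Three commits shipped together in one deployment (illustrative data)
commits = [
    datetime(2024, 2, 28, 9, 0),   # oldest change in the batch
    datetime(2024, 3, 1, 10, 0),
    datetime(2024, 3, 1, 14, 30),  # newest change (the deployment's head commit)
]

lead_times = per_commit_lead_times(commits, deployed_at)
print(lead_times)          # [54.0, 5.0, 0.5]
print(median(lead_times))  # 5.0
```

Measuring only from the newest commit would report 0.5 hours here and hide the 54 hours the oldest change waited to reach production.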
3. Mean Time to Restore (MTTR)
How quickly can you restore service after an incident?
When your service fails, how long does it take to get back to normal? This captures your incident response capability and reliability.
| Category | MTTR |
|---|---|
| Elite | Less than 1 hour |
| High | Less than 1 day |
| Medium | 1 day to 1 week |
| Low | More than 1 week |
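Because MTTR is a mean, one long outage can move the category on its own. A small illustration with invented restoration times:

```python
from statistics import mean, median

# Restoration times in hours for five incidents (illustrative data)
restoration_hours = [0.25, 0.5, 0.5, 0.75, 30.0]

print(mean(restoration_hours))    # 6.4
print(median(restoration_hours))  # 0.5
```

Four of the five incidents were fixed in under an hour, yet the mean of 6.4 hours falls in the "high" band (less than 1 day) rather than "elite". Reporting the median next to the mean, as the MTTR calculation later in this article does, shows whether a result is driven by a single outlier.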
4. Change Failure Rate
What percentage of changes cause incidents?
How often do your deployments cause service degradation or require rollback? Lower is better, but the DORA research found that elite performers don't have zero failures; they recover quickly.
| Category | Change Failure Rate |
|---|---|
| Elite | 0-15% |
| High | 16-30% |
| Medium | 16-30% |
| Low | 16-30% |
(The research found that high, medium, and low performers had similar change failure rates of 16-30%; the differences between those groups were primarily in deployment frequency, lead time, and MTTR.)
Calculating DORA Metrics
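The functions in this section read plain record objects with attributes such as `environment`, `status`, `deployed_at`, `commit_hash`, `committed_at`, `detected_at`, and `resolved_at`. The dataclasses below are one hypothetical way to model those records; the class names are assumptions, and timestamps should be in UTC to match the calculations.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, inferred from the attributes the functions read
    environment: str                    # e.g. "production"
    status: str                         # e.g. "success"
    deployed_at: datetime
    commit_hash: Optional[str] = None   # head commit shipped by this deployment


@dataclass
class Commit:
    hash: str
    committed_at: datetime


@dataclass
class Incident:
    detected_at: datetime
    resolved_at: Optional[datetime] = None
    diagnosed_at: Optional[datetime] = None
    actual_start_at: Optional[datetime] = None  # when impact began, if known
```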
Deployment Frequency
```python
from datetime import datetime, timedelta, timezone


def calculate_deployment_frequency(deployments, days=30):
    """
    Calculate deployment frequency over a time period.

    Timestamps are assumed to be timezone-aware UTC datetimes.
    """
    # Filter to successful production deployments only
    production_deployments = [
        d for d in deployments
        if d.environment == "production" and d.status == "success"
    ]

    # Count deployments in the window
    start_date = datetime.now(timezone.utc) - timedelta(days=days)
    recent_deployments = [
        d for d in production_deployments
        if d.deployed_at >= start_date
    ]

    deployments_per_day = len(recent_deployments) / days

    # Classify against the deployment frequency table
    if deployments_per_day > 1:  # Multiple deploys per day
        category = "elite"
    elif deployments_per_day >= 1 / 7:  # At least weekly
        category = "high"
    elif deployments_per_day >= 1 / 30:  # At least monthly
        category = "medium"
    else:
        category = "low"

    return {
        "deployments_count": len(recent_deployments),
        "deployments_per_day": round(deployments_per_day, 2),
        "deployments_per_week": round(deployments_per_day * 7, 1),
        "category": category,
    }


# Example calculation (load_deployments stands in for your own data source)
deployments = load_deployments(days=90)
frequency = calculate_deployment_frequency(deployments, days=90)
print(f"Deployment frequency: {frequency['deployments_per_week']}/week ({frequency['category']})")
```
Lead Time for Changes
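```python
import statistics


def calculate_lead_time(deployments, code_commits):
    """
    Calculate lead time from commit to production deployment.

    This requires linking deployments to their commits.
    Methods: git tags, deployment metadata, CI/CD pipeline data.

    Pass only the deployments from your reporting window. Only each
    deployment's head commit is measured, which understates lead time
    when one deployment ships several commits.
    """
    # Index commits by hash instead of scanning the whole list per deployment
    commits_by_hash = {c.hash: c for c in code_commits}

    lead_times = []
    for deployment in deployments:
        # Count only successful production deployments, as in deployment frequency
        if deployment.environment != "production" or deployment.status != "success":
            continue
        if not (deployment.commit_hash and deployment.deployed_at):
            continue
        commit = commits_by_hash.get(deployment.commit_hash)
        if commit:
            lead_time_hours = (
                deployment.deployed_at - commit.committed_at
            ).total_seconds() / 3600
            lead_times.append(lead_time_hours)

    if not lead_times:
        return {"error": "No deployments with commit data", "category": "insufficient_data"}

    sorted_times = sorted(lead_times)
    median_hours = statistics.median(sorted_times)

    # Classify. The table leaves 1 hour to 1 day unassigned; this treats it as "high".
    if median_hours < 1:
        category = "elite"
    elif median_hours < 168:  # 7 days
        category = "high"
    elif median_hours < 720:  # 30 days
        category = "medium"
    else:
        category = "low"

    return {
        "median_lead_time_hours": round(median_hours, 1),
        # Simple index-based 90th percentile (no interpolation)
        "p90_lead_time_hours": round(sorted_times[int(len(sorted_times) * 0.9)], 1),
        "category": category,
        "sample_size": len(lead_times),
    }
```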
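The docstring above lists git tags, deployment metadata, and CI/CD pipeline data as ways to link deployments to commits. One hedged sketch, assuming you know the commit SHAs of the previous and current deployment, asks git for every commit in between with `git log <previous>..<current>` and the `%H` (full hash) and `%cI` (strict ISO 8601 committer date) format placeholders. The helper names are hypothetical.

```python
import subprocess
from datetime import datetime


def parse_git_log(output):
    """Parse lines of '<hash> <ISO 8601 date>' into (hash, aware datetime) pairs."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, committed = line.split(" ", 1)
        # Accept a trailing "Z" for UTC, which older fromisoformat() rejects
        if committed.endswith("Z"):
            committed = committed[:-1] + "+00:00"
        commits.append((sha, datetime.fromisoformat(committed)))
    return commits


def commits_in_deployment(repo_path, previous_sha, current_sha):
    """List commits reachable from current_sha but not from previous_sha."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H %cI", f"{previous_sha}..{current_sha}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_log(result.stdout)
```

Each returned timestamp can be subtracted from the deployment time to get a lead time per commit; the timestamps are timezone-aware, so deployment times must be too. `%cI` is the committer date, which rebases and cherry-picks reset; `%aI` gives the author date instead.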
MTTR Calculation
```python
import statistics
from datetime import datetime, timedelta, timezone


def calculate_mttr(incidents, days=30):
    """
    Calculate Mean Time to Restore from incident data.

    Requires incidents with:
    - detected_at: When monitoring first detected the issue
    - resolved_at: When the service returned to normal

    Timestamps are assumed to be timezone-aware UTC datetimes. Measuring
    from detected_at leaves out detection latency; if you record when the
    impact actually began, measuring from that captures the full outage.
    """
    start_date = datetime.now(timezone.utc) - timedelta(days=days)
    recent_incidents = [
        i for i in incidents
        if i.detected_at >= start_date and i.resolved_at is not None
    ]
    if not recent_incidents:
        return {"error": "No resolved incidents in window", "category": "insufficient_data"}

    restoration_times = [
        (i.resolved_at - i.detected_at).total_seconds() / 3600
        for i in recent_incidents
    ]
    sorted_times = sorted(restoration_times)
    mean_hours = statistics.mean(restoration_times)
    median_hours = statistics.median(restoration_times)

    # Classify on the mean, matching the MTTR table
    if mean_hours < 1:
        category = "elite"
    elif mean_hours < 24:
        category = "high"
    elif mean_hours < 168:  # 7 days
        category = "medium"
    else:
        category = "low"

    return {
        "mean_mttr_hours": round(mean_hours, 2),
        "median_mttr_hours": round(median_hours, 2),
        # Simple index-based 95th percentile (no interpolation)
        "p95_mttr_hours": round(sorted_times[int(len(sorted_times) * 0.95)], 2),
        "incident_count": len(recent_incidents),
        "category": category,
    }
```
Change Failure Rate
```python
from datetime import datetime, timedelta, timezone


def calculate_change_failure_rate(deployments, incidents, days=30):
    """
    Calculate what percentage of deployments caused incidents.

    Methodology (a heuristic): a deployment "caused" an incident if the
    incident was detected within 1 hour after the deployment. An incident
    that follows several deployments within that hour counts against each.
    """
    start_date = datetime.now(timezone.utc) - timedelta(days=days)
    recent_deployments = [
        d for d in deployments
        if d.deployed_at >= start_date and d.environment == "production"
    ]
    if not recent_deployments:
        return {"error": "No production deployments in window", "category": "insufficient_data"}
    recent_incidents = [i for i in incidents if i.detected_at >= start_date]

    failed_deployments = 0
    for deployment in recent_deployments:
        # Only incidents detected after the deployment can have been caused by it
        deployment_caused_incident = any(
            0 <= (incident.detected_at - deployment.deployed_at).total_seconds() < 3600
            for incident in recent_incidents
        )
        if deployment_caused_incident:
            failed_deployments += 1

    cfr = failed_deployments / len(recent_deployments)

    # Classify. High, medium, and low share the 16-30% band in the table,
    # so 16-30% maps to "high"; anything above 30% is treated as "low".
    if cfr <= 0.15:
        category = "elite"
    elif cfr <= 0.30:
        category = "high"
    else:
        category = "low"

    return {
        "change_failure_rate": round(cfr, 3),
        "change_failure_rate_pct": round(cfr * 100, 1),
        "total_deployments": len(recent_deployments),
        "failed_deployments": failed_deployments,
        "category": category,
    }
```
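The one-hour window is a heuristic: it misses failures that surface later and can blame a deployment for an unrelated incident. If your deploy tooling or incident tracker records which deployment was rolled back or linked to an incident, you can count failures directly. A minimal sketch, assuming hypothetical `rolled_back` and `incident_ids` fields on each deployment record:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeploymentRecord:
    # Hypothetical fields; adapt them to whatever your tooling records
    deploy_id: str
    rolled_back: bool = False
    incident_ids: List[str] = field(default_factory=list)


def explicit_change_failure_rate(deployments: List[DeploymentRecord]) -> Optional[float]:
    """Share of deployments that were rolled back or linked to an incident."""
    if not deployments:
        return None  # No deployments: the rate is undefined
    failed = sum(1 for d in deployments if d.rolled_back or d.incident_ids)
    return failed / len(deployments)


sample = [
    DeploymentRecord("d1"),
    DeploymentRecord("d2", rolled_back=True),
    DeploymentRecord("d3"),
    DeploymentRecord("d4", incident_ids=["INC-7"]),
    DeploymentRecord("d5"),
]
print(explicit_change_failure_rate(sample))  # 0.4
```

Here two of five deployments failed, a 40% rate. Explicit links trade the timing guesswork for a dependency on teams recording rollbacks and incident causes consistently.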
Creating a DORA Dashboard
Build a dashboard that tracks all four metrics:
```python
from datetime import datetime, timezone


def generate_dora_report(deployments, commits, incidents, days=30):
    """Generate comprehensive DORA metrics report"""
    # Compute each metric once and reuse the results for the overall category
    metrics = {
        "deployment_frequency": calculate_deployment_frequency(deployments, days),
        # calculate_lead_time has no window parameter; it uses every deployment passed in
        "lead_time": calculate_lead_time(deployments, commits),
        "mttr": calculate_mttr(incidents, days),
        "change_failure_rate": calculate_change_failure_rate(deployments, incidents, days),
    }
    return {
        "period_days": days,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        **metrics,
        "overall_category": determine_overall_category(
            # A metric that returned an error without a category counts as missing data
            {name: result.get("category", "insufficient_data") for name, result in metrics.items()}
        ),
    }


def determine_overall_category(metric_categories):
    """
    Determine an overall DORA performance category by averaging per-metric scores.

    This averaging is a simple heuristic; the DORA reports derive their
    performance groups with cluster analysis instead.
    """
    category_scores = {"elite": 4, "high": 3, "medium": 2, "low": 1}
    # Skip metrics without enough data rather than scoring them as 0
    scores = [category_scores[c] for c in metric_categories.values() if c in category_scores]
    if not scores:
        return "insufficient_data"
    avg_score = sum(scores) / len(scores)
    if avg_score >= 3.5:
        return "elite"
    elif avg_score >= 2.5:
        return "high"
    elif avg_score >= 1.5:
        return "medium"
    else:
        return "low"
```
Using DORA Metrics to Drive Improvement
DORA metrics are useful only when they drive action:
| Weak Metric | Root Cause Patterns | Improvement Approach |
|---|---|---|
| Low deployment frequency | Long release cycles, manual processes | Automate deployment pipeline, reduce batch size |
| Long lead time | Long review processes, manual testing | Parallel reviews, automated testing |
| High MTTR | Poor monitoring, slow diagnosis | Better observability, playbooks, chaos engineering |
| High CFR | Insufficient testing, risky deployments | Better test coverage, feature flags, gradual rollouts |
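To pick the lowest-performing metric and look up the matching row of the table above, a small helper can rank per-metric categories. This is a sketch: the `IMPROVEMENTS` mapping simply restates the table, metric keys follow the report dictionary used earlier, and ties go to whichever metric is listed first.

```python
# Ordering of DORA categories from weakest to strongest
CATEGORY_RANK = {"low": 0, "medium": 1, "high": 2, "elite": 3}

# Improvement approaches restated from the table above
IMPROVEMENTS = {
    "deployment_frequency": "Automate deployment pipeline, reduce batch size",
    "lead_time": "Parallel reviews, automated testing",
    "mttr": "Better observability, playbooks, chaos engineering",
    "change_failure_rate": "Better test coverage, feature flags, gradual rollouts",
}


def weakest_metric(metric_categories):
    """Return (metric, suggested approach) for the lowest-ranked metric with data."""
    ranked = [
        (CATEGORY_RANK[category], name)
        for name, category in metric_categories.items()
        if category in CATEGORY_RANK  # skip "insufficient_data" and similar
    ]
    if not ranked:
        return None
    # min() with a key returns the first minimal item, so ties keep input order
    _, name = min(ranked, key=lambda item: item[0])
    return name, IMPROVEMENTS.get(name)


categories = {
    "deployment_frequency": "high",
    "lead_time": "medium",
    "mttr": "elite",
    "change_failure_rate": "medium",
}
print(weakest_metric(categories))  # ('lead_time', 'Parallel reviews, automated testing')
```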
Improving MTTR with Better Monitoring
MTTR is the metric most directly influenced by monitoring quality. It breaks down into three components:
- Time to detect: monitoring alert latency. Reduce with shorter check intervals, synthetic monitoring, and real user monitoring (RUM)
- Time to diagnose: observability quality. Reduce with distributed tracing, structured logs, and dashboards
- Time to resolve: playbooks and automation. Reduce with runbooks, automated remediation, and feature flags
```python
# Track MTTR components separately
def calculate_mttr_breakdown(incidents):
    """
    Break down MTTR into detection, diagnosis, and resolution components.

    Requires incidents to track when each phase ended. `actual_start_at`
    (when impact began) is optional; without it, detection time is unknown
    and the total is measured from detection.
    """
    breakdowns = []
    for incident in incidents:
        if not all([incident.detected_at, incident.diagnosed_at, incident.resolved_at]):
            continue
        started_at = getattr(incident, "actual_start_at", None)

        detection_time = (
            (incident.detected_at - started_at).total_seconds() / 60
            if started_at else None
        )
        diagnosis_time = (
            incident.diagnosed_at - incident.detected_at
        ).total_seconds() / 60
        resolution_time = (
            incident.resolved_at - incident.diagnosed_at
        ).total_seconds() / 60
        # Include detection in the total when the impact start time is known
        total_mttr = (
            incident.resolved_at - (started_at or incident.detected_at)
        ).total_seconds() / 60

        breakdowns.append({
            "detection_minutes": detection_time,
            "diagnosis_minutes": diagnosis_time,
            "resolution_minutes": resolution_time,
            "total_minutes": total_mttr,
        })

    if not breakdowns:
        return {"error": "No incidents with complete phase timestamps"}

    count = len(breakdowns)
    # Average detection only over incidents where it is known (0 is a valid value)
    detections = [b["detection_minutes"] for b in breakdowns if b["detection_minutes"] is not None]
    return {
        "avg_detection_minutes": sum(detections) / len(detections) if detections else None,
        "avg_diagnosis_minutes": sum(b["diagnosis_minutes"] for b in breakdowns) / count,
        "avg_resolution_minutes": sum(b["resolution_minutes"] for b in breakdowns) / count,
        "avg_total_minutes": sum(b["total_minutes"] for b in breakdowns) / count,
    }
```
DORA Metrics and Monitoring
DORA metrics have a direct relationship with monitoring capabilities:
- Deployment Frequency — Automated deployment pipelines with monitoring gates allow faster, safer deployments
- Lead Time — Automated testing and monitoring-based deploy decisions reduce lead time
- MTTR — Monitoring quality is the primary determinant of detection and diagnosis time
- Change Failure Rate — Post-deploy monitoring catches regressions quickly, enabling rapid rollback
Elite engineering teams invest heavily in monitoring because it enables all four DORA improvements. You can't safely increase deployment frequency without monitoring that catches regressions. You can't reduce MTTR without monitoring that detects issues quickly.
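A monitoring gate of the kind described above can be as simple as comparing an error rate before and after a deploy and rolling back when it regresses. The sketch below assumes you can already fetch request and error counts for a time window; the `WindowStats` type, the `should_roll_back` name, and the thresholds (a 1% absolute floor, a 2x relative increase, 100 requests minimum) are illustrative choices, not DORA guidance.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(before: WindowStats, after: WindowStats,
                     min_error_rate: float = 0.01, max_ratio: float = 2.0,
                     min_requests: int = 100) -> bool:
    """Decide whether a deploy regressed enough to roll back (illustrative thresholds)."""
    if after.requests < min_requests:
        return False  # Not enough post-deploy traffic to judge yet
    if after.error_rate < min_error_rate:
        return False  # Error rate stays below the absolute floor
    baseline = before.error_rate
    if baseline == 0:
        return True  # Errors appeared where there were none before
    return after.error_rate / baseline >= max_ratio


before = WindowStats(requests=10_000, errors=20)  # 0.2% errors
after = WindowStats(requests=2_000, errors=60)    # 3% errors
print(should_roll_back(before, after))  # True
```

The absolute floor keeps tiny fluctuations on a near-zero baseline from triggering rollbacks, while the ratio catches regressions relative to normal behavior.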
Conclusion
DORA metrics give engineering organizations a validated framework for measuring and improving software delivery performance. They work because they measure the outcomes that matter — speed (frequency, lead time) and stability (MTTR, change failure rate) — rather than proxy metrics like code coverage or story points. Start measuring your four DORA metrics now, identify your lowest-performing metric, and focus improvement efforts there. AzMonitor directly impacts two DORA metrics: MTTR (through faster incident detection) and change failure rate (through post-deploy monitoring that catches regressions before they become major incidents).