API performance benchmarking answers a question that uptime monitoring can't: not just "is the API responding?" but "is it responding as fast as it should?" Establishing performance baselines before problems occur is the only way to detect regressions — without a baseline, you can't tell whether 800ms is normal or 2x slower than last week. Benchmarking creates the reference point that makes performance degradation visible.
Why Baselines Matter
The most common performance monitoring mistake is setting thresholds based on intuition rather than measurement. Teams choose 2000ms as a threshold because it sounds reasonable, without knowing their actual P99 is 150ms. When something genuinely degrades, the monitor doesn't fire until it's 13x slower — which means the degradation has been happening for a long time before it's noticed.
A proper performance baseline gives you:
- Accurate thresholds — Set alerts at 2x your P95, not an arbitrary number
- Regression detection — Compare current performance against historical baseline
- Deployment validation — Automatically flag deployments that regress performance
- SLO input — Base latency SLOs on measured reality, not aspirational targets
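To make the cost of a guessed threshold concrete, here is a minimal Python sketch of the arithmetic. The helper name `slowdown_before_alert` and the 120 ms P95 figure are illustrative assumptions; only the 2000 ms threshold and 150 ms P99 come from the example above.

```python
def slowdown_before_alert(threshold_ms, baseline_ms):
    """How many times slower than baseline a request must be to cross the threshold."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return threshold_ms / baseline_ms


# Figures from the example above: a guessed 2000 ms threshold against a measured P99 of 150 ms.
guessed = slowdown_before_alert(threshold_ms=2000, baseline_ms=150)

# Hypothetical measured P95 of 120 ms, with the alert placed at 2x P95.
measured_p95_ms = 120
measured_threshold_ms = 2 * measured_p95_ms
measured = slowdown_before_alert(measured_threshold_ms, measured_p95_ms)

print(f"guessed threshold fires at {guessed:.1f}x baseline")
print(f"measured threshold ({measured_threshold_ms} ms) fires at {measured:.1f}x baseline")
```

The guessed threshold only fires once latency is more than an order of magnitude worse than normal, while the measured one fires as soon as latency doubles.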
Benchmark Tools
k6 for Load Testing and Benchmarking
// k6-benchmark.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('errors');
const apiLatency = new Trend('api_latency', true);

export const options = {
  // Baseline measurement scenario (not a load test)
  scenarios: {
    baseline: {
      executor: 'constant-vus',
      vus: 5,         // Low concurrency for baseline
      duration: '5m', // Long enough to collect a stable sample, including the tail
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    errors: ['rate<0.01'], // Less than 1% errors
  },
  // p(99) is not part of k6's default summary stats, so request it explicitly;
  // otherwise handleSummary() below would read it as undefined
  summaryTrendStats: ['min', 'avg', 'med', 'p(90)', 'p(95)', 'p(99)', 'max'],
};

export default function () {
  const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';
  const TOKEN = __ENV.API_TOKEN;
  const params = {
    headers: {
      'Authorization': `Bearer ${TOKEN}`,
      'Content-Type': 'application/json',
    },
    timeout: '10s',
  };

  // Benchmark key endpoints
  const endpoints = [
    { name: 'list_users', url: `${BASE_URL}/v3/users?limit=20` },
    { name: 'get_user', url: `${BASE_URL}/v3/users/test-user-id` },
    { name: 'search', url: `${BASE_URL}/v3/search?q=test` },
    { name: 'health', url: `${BASE_URL}/health` },
  ];

  for (const endpoint of endpoints) {
    // Tag each request with a stable name. By default k6 sets the `name` tag
    // to the full URL, which makes per-endpoint grouping of the output awkward.
    const res = http.get(
      endpoint.url,
      Object.assign({}, params, { tags: { name: endpoint.name } })
    );

    // Record k6's own request timing (no script overhead) with an endpoint tag
    apiLatency.add(res.timings.duration, { endpoint: endpoint.name });

    check(res, {
      'status is 200': (r) => r.status === 200,
      'response time < 500ms': (r) => r.timings.duration < 500,
    });

    // Count only failed requests as errors; slow responses are already
    // covered by the http_req_duration thresholds
    errorRate.add(res.status !== 200);

    sleep(0.1); // Small delay between requests
  }
}

export function handleSummary(data) {
  // Output detailed benchmark results
  const results = {};
  for (const [metric, values] of Object.entries(data.metrics)) {
    if (metric.startsWith('http_req_duration') || metric === 'api_latency') {
      results[metric] = {
        min: values.values.min,
        avg: values.values.avg,
        med: values.values.med,
        p90: values.values['p(90)'],
        p95: values.values['p(95)'],
        p99: values.values['p(99)'],
        max: values.values.max,
      };
    }
  }
  return {
    'benchmark-results.json': JSON.stringify({ results, timestamp: new Date().toISOString() }, null, 2),
    stdout: `\nBenchmark complete. Results saved to benchmark-results.json\n`,
  };
}
Apache Bench for Quick Baselines
# Quick benchmark of a specific endpoint
# 1000 requests, 10 concurrent requests at a time
ab -n 1000 -c 10 \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -H "Content-Type: application/json" \
  https://api.example.com/v3/users

# Example output:
# Requests per second:    245.30 [#/sec] (mean)
# Time per request:       40.766 [ms] (mean)
#
# Percentage of the requests served within a certain time (ms)
#   50%     38
#   66%     41
#   75%     44
#   80%     46
#   90%     54
#   95%     62
#   98%     78
#   99%     89
#  100%    203 (longest request)
wrk for More Realistic Load Patterns
# wrk benchmark: 4 threads, 50 open connections, 60 seconds
# (pass a Lua script with -s when you need custom request logic)
wrk -t4 -c50 -d60s \
  --latency \
  -H "Authorization: Bearer ${API_TOKEN}" \
  https://api.example.com/v3/users

# Example output includes:
# Latency Distribution
#   50%   38.24ms
#   75%   48.12ms
#   90%   62.55ms
#   99%  112.34ms
Establishing a Baseline
Run benchmarks in a consistent, controlled environment:
# benchmark_baseline.py
import json
import subprocess
import statistics
from datetime import datetime, timezone
from pathlib import Path


def run_k6_benchmark(script, env_vars, output_file):
    """Run k6 benchmark and capture results"""
    cmd = [
        "k6", "run",
        "--env", f"BASE_URL={env_vars['BASE_URL']}",
        "--env", f"API_TOKEN={env_vars['API_TOKEN']}",
        "--out", f"json={output_file}",
        script,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # k6 exits non-zero when a threshold fails or the run errors out
    return result.returncode == 0


def extract_percentiles(k6_output_file):
    """Extract key percentiles from k6 JSON output (one JSON object per line)"""
    metrics = {}
    with open(k6_output_file) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") == "Point" and entry.get("metric") == "http_req_duration":
                # k6 sets the `name` tag to the request URL unless the script overrides it
                endpoint = entry.get("data", {}).get("tags", {}).get("name", "unknown")
                value = entry["data"]["value"]
                if endpoint not in metrics:
                    metrics[endpoint] = []
                metrics[endpoint].append(value)

    baselines = {}
    for endpoint, values in metrics.items():
        sorted_values = sorted(values)
        n = len(sorted_values)
        baselines[endpoint] = {
            "sample_count": n,
            "p50_ms": sorted_values[int(n * 0.50)],
            "p90_ms": sorted_values[int(n * 0.90)],
            "p95_ms": sorted_values[int(n * 0.95)],
            "p99_ms": sorted_values[int(n * 0.99)],
            "max_ms": sorted_values[-1],
            "mean_ms": statistics.mean(values),
            "stdev_ms": statistics.stdev(values) if len(values) > 1 else 0,
        }
    return baselines


def save_baseline(baselines, baseline_path):
    """Save baseline with metadata"""
    baseline = {
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        "created_at": datetime.now(timezone.utc).isoformat(),
        "environment": "production",
        "metrics": baselines,
        "thresholds": {
            endpoint: {
                # Alert threshold = 2x P95 baseline
                "warning_ms": data["p95_ms"] * 2,
                # Critical threshold = 3x P95 baseline
                "critical_ms": data["p95_ms"] * 3,
            }
            for endpoint, data in baselines.items()
        },
    }
    Path(baseline_path).write_text(json.dumps(baseline, indent=2))
    return baseline
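A baseline is only as trustworthy as the number of samples behind each percentile. The sketch below is a rough heuristic, not a statistical guarantee: it asks for enough requests that a chosen number of them are expected to land beyond the percentile. The function names and the default of 10 tail observations are assumptions made for illustration; the `sample_count` field matches the baseline structure built above.

```python
import math
from fractions import Fraction


def min_samples_for_percentile(pct, tail_observations=10):
    """Samples needed so that about `tail_observations` requests fall beyond `pct`."""
    pct = Fraction(str(pct))  # exact arithmetic avoids float error for values like 99.9
    if not 0 < pct < 100:
        raise ValueError("pct must be between 0 and 100 (exclusive)")
    return math.ceil(tail_observations * 100 / (100 - pct))


def undersampled_endpoints(baselines, pct=99, tail_observations=10):
    """Return endpoints whose sample_count is too small to trust the given percentile."""
    required = min_samples_for_percentile(pct, tail_observations)
    return sorted(
        endpoint
        for endpoint, data in baselines.items()
        if data["sample_count"] < required
    )


# Hypothetical baseline data in the same shape extract_percentiles() produces
example = {
    "list_users": {"sample_count": 1500},
    "search": {"sample_count": 400},
}
print(min_samples_for_percentile(99))   # samples needed for a P99 estimate
print(undersampled_endpoints(example))  # endpoints that need a longer run
```

If an endpoint shows up as undersampled, extending the k6 `duration` is usually a simpler fix than raising concurrency, since higher concurrency changes what the baseline measures.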
Regression Detection in CI/CD
Run performance tests on every deployment and compare against baseline:
# regression_detector.py
import json
import sys
from pathlib import Path


def detect_regression(current_results, baseline_path, threshold_multiplier=2.0):
    """
    Compare current benchmark results against baseline.
    Returns list of regressions detected.
    """
    baseline = json.loads(Path(baseline_path).read_text())
    regressions = []
    for endpoint, current in current_results.items():
        if endpoint not in baseline["metrics"]:
            # New endpoint, no baseline to compare
            continue
        baseline_p95 = baseline["metrics"][endpoint]["p95_ms"]
        current_p95 = current["p95_ms"]
        ratio = current_p95 / baseline_p95 if baseline_p95 > 0 else float('inf')
        if ratio >= threshold_multiplier:
            regressions.append({
                "endpoint": endpoint,
                "baseline_p95_ms": baseline_p95,
                "current_p95_ms": current_p95,
                "ratio": round(ratio, 2),
                "regression_pct": round((ratio - 1) * 100, 1),
                "severity": "critical" if ratio >= 3.0 else "warning",
            })
    return regressions


def generate_regression_report(regressions, current_results, baseline):
    """Generate CI-friendly report"""
    if not regressions:
        print("Performance check PASSED - no regressions detected")
        return True

    print(f"\nPerformance regression detected in {len(regressions)} endpoint(s):\n")
    print(f"{'Endpoint':<30} {'Baseline P95':>12} {'Current P95':>12} {'Change':>10} {'Severity':>10}")
    print("-" * 80)
    for reg in regressions:
        print(
            f"{reg['endpoint']:<30} "
            f"{reg['baseline_p95_ms']:>10.0f}ms "
            f"{reg['current_p95_ms']:>10.0f}ms "
            f"{reg['regression_pct']:>+9.1f}% "
            f"{reg['severity']:>10}"
        )
    print("\nFailing CI check due to performance regression.")
    return False


# In CI pipeline
if __name__ == "__main__":
    current_file = sys.argv[1]
    baseline_file = sys.argv[2]
    current = json.loads(Path(current_file).read_text())
    baseline = json.loads(Path(baseline_file).read_text())
    regressions = detect_regression(current["metrics"], baseline_file)
    passed = generate_regression_report(regressions, current, baseline)
    sys.exit(0 if passed else 1)
CI/CD Integration
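# GitHub Actions workflow with performance gates
name: API Performance Benchmark

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # k6 is not in Ubuntu's default apt repositories, so use Grafana's setup action
      - name: Install k6
        uses: grafana/setup-k6-action@v1

      - name: Deploy to staging
        run: ./scripts/deploy-staging.sh

      - name: Wait for deployment
        env:
          STAGING_URL: ${{ vars.STAGING_URL }}
        run: ./scripts/wait-for-healthy.sh "$STAGING_URL"

      - name: Run performance benchmark
        env:
          BASE_URL: ${{ vars.STAGING_URL }}
          API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}
        run: |
          k6 run \
            --env BASE_URL="$BASE_URL" \
            --env API_TOKEN="$API_TOKEN" \
            --out json=benchmark-raw.json \
            ./tests/k6-benchmark.js

      # regression_detector.py expects per-endpoint percentiles under "metrics",
      # not k6's raw line-delimited output, so summarize the raw file first.
      # Assumes benchmark_baseline.py lives in ./scripts next to regression_detector.py.
      - name: Summarize percentiles
        run: |
          PYTHONPATH=./scripts python3 -c "from benchmark_baseline import extract_percentiles, save_baseline; save_baseline(extract_percentiles('benchmark-raw.json'), 'benchmark-current.json')"

      # The S3 steps assume AWS credentials are already configured for this job
      - name: Download baseline from S3
        run: |
          aws s3 cp s3://benchmarks/baseline.json ./baseline.json

      - name: Check for regressions
        run: |
          python3 ./scripts/regression_detector.py \
            benchmark-current.json \
            baseline.json

      - name: Update baseline (main branch only)
        if: github.ref == 'refs/heads/main' && success()
        run: |
          aws s3 cp ./benchmark-current.json \
            s3://benchmarks/baseline.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: benchmark-results
          path: benchmark-current.json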
Latency Percentiles and What They Tell You
Different decisions call for different percentiles:

| Percentile | Use Case | What It Measures |
|---|---|---|
| P50 (median) | "Typical" user experience | Half of requests are faster than this |
| P90 | "Most" user experience | 90% of requests are faster than this |
| P95 | SLO target setting | 95% of requests are faster than this; the standard latency SLO target |
| P99 | Identifying tail latency | The slowest 1% of requests exceed this |
| P99.9 | Enterprise SLA targets | The slowest 0.1% of requests exceed this |
| Max | Outlier detection | Single worst request |
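The following Python sketch shows how far apart these percentiles can sit on a single tail-heavy data set. The sample latencies are invented for illustration, and `nearest_rank` is a hypothetical helper using the nearest-rank definition of a percentile; other tools may interpolate and report slightly different values.

```python
import math
import statistics


def nearest_rank(samples, pct):
    """Smallest sample value with at least pct% of all samples at or below it."""
    ordered = sorted(samples)
    # round() guards against float noise (e.g. 99.9 * n / 100) before taking the ceiling
    rank = max(1, math.ceil(round(pct * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for 1,000 requests: mostly fast, with a slow tail
samples = [40] * 900 + [90] * 60 + [300] * 30 + [1200] * 9 + [5000]

summary = {f"p{p}": nearest_rank(samples, p) for p in (50, 90, 95, 99, 99.9)}
summary["max"] = max(samples)
summary["mean"] = statistics.mean(samples)

for name, value in summary.items():
    print(f"{name:>6}: {value} ms")
```

In this data set P50 and P90 are identical, so a dashboard that only tracks the median would show nothing unusual even though the slowest 1% of requests take 300 ms or more.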
def set_monitoring_thresholds_from_baseline(baseline):
    """
    Derive monitoring thresholds from measured baseline.
    """
    thresholds = {}
    for endpoint, metrics in baseline["metrics"].items():
        p95 = metrics["p95_ms"]
        p99 = metrics["p99_ms"]
        thresholds[endpoint] = {
            # Uptime monitor: alert if P50 exceeds 2x baseline P95
            # (This means something is very wrong)
            "uptime_alert_threshold_ms": p95 * 2,
            # Performance alert: P95 is 50% above baseline
            "performance_warning_ms": p95 * 1.5,
            # Performance critical: P95 has doubled
            "performance_critical_ms": p95 * 2,
            # SLO target: 99% of requests under this threshold
            "slo_p99_target_ms": p99 * 1.2,  # 20% headroom
        }
    return thresholds
Continuous Performance Monitoring
Beyond CI/CD gates, run continuous performance benchmarks against production:
# Continuous monitoring: response time checks plus a scheduled benchmark every 6 hours
# The scheduled benchmark detects gradual performance drift that CI tests miss
monitors:
  # AzMonitor handles continuous uptime + basic response time
  - name: "API - Users Endpoint"
    url: "https://api.example.com/v3/users?limit=1"
    interval: 60
    assertions:
      - type: response_time
        operator: less_than
        value: 500 # Set from measured P95 baseline
  - name: "API - Search Endpoint"
    url: "https://api.example.com/v3/search?q=test"
    interval: 60
    assertions:
      - type: response_time
        operator: less_than
        value: 1200 # Search is naturally slower

# Separate scheduled k6 benchmark for detailed percentile tracking
scheduled_benchmarks:
  - name: "production-performance-baseline"
    schedule: "0 */6 * * *" # Every 6 hours
    script: "k6-benchmark.js"
    environment: production
    compare_against: rolling_7d_baseline
    alert_on_regression: true
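The configuration above does not spell out how `rolling_7d_baseline` is computed. One plausible reading, sketched here in Python under that assumption, is the median P95 of the benchmark runs from the previous seven days. The function names, the `(timestamp, p95_ms)` run format, and the sample values are all hypothetical.

```python
import statistics
from datetime import datetime, timedelta, timezone


def rolling_baseline_p95(runs, now, window_days=7):
    """Median P95 of runs in the window before `now`; None if the window is empty.

    `runs` is a list of (timestamp, p95_ms) tuples, one per scheduled benchmark.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [p95 for ts, p95 in runs if cutoff <= ts < now]
    return statistics.median(recent) if recent else None


def drift_ratio(current_p95_ms, baseline_p95_ms):
    """Current P95 as a multiple of the rolling baseline."""
    return current_p95_ms / baseline_p95_ms


# Hypothetical run history
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
runs = [
    (now - timedelta(days=10), 500.0),  # outside the 7-day window, ignored
    (now - timedelta(days=6), 100.0),
    (now - timedelta(days=3), 110.0),
    (now - timedelta(days=1), 120.0),
]

baseline_p95 = rolling_baseline_p95(runs, now)
print(f"rolling baseline P95: {baseline_p95} ms")
print(f"drift ratio for a 143 ms run: {drift_ratio(143.0, baseline_p95):.2f}")
```

Using the median rather than the mean keeps one unusually slow run from shifting the baseline. A slow, steady drift will still pull the baseline upward over time, so it can be worth comparing against a fixed reference baseline as well.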
Conclusion
API performance benchmarking transforms monitoring from reactive to proactive. Without baselines, you're guessing at thresholds and missing gradual degradation that happens too slowly to trigger point-in-time alerts. With baselines, you have the context to detect a 50% slowdown before it becomes a user-visible problem, and to automatically block deployments that regress performance. AzMonitor's response time assertions become meaningful gates when they are configured from measured baselines rather than guesses: they catch the regressions that matter and ground both continuous production monitoring and alerting thresholds in real performance data.