SLA negotiation is where engineering reality meets commercial pressure. Sales teams want to promise 99.99% uptime to close deals; engineering teams know that 99.99% means less than 5 minutes of downtime per month across all failure modes. The negotiation determines what you're contractually obligated to deliver — and whether you can actually do it.
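
To make the arithmetic behind those percentages concrete, here is a minimal sketch that converts an availability target into a monthly downtime budget. The function name and the 30-day month are illustrative assumptions, not part of any contract language.

```python
def downtime_budget_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the downtime allowed per month for a given availability target.

    Assumes a 30-day month by default; real contracts should state the basis.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Print the budget for a few commonly negotiated targets.
for target in (99.0, 99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/month")
```

Each extra "nine" cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% is so much harder than it sounds in a sales conversation.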

Start With Your Actual Uptime

Before negotiating any SLA, know your real numbers. Promising 99.9% when you're currently delivering 98.5% creates a contract you'll breach from day one.

def calculate_historical_uptime(monitoring_data, months=12):
    """
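    Calculate actual historical availability as a baseline for SLA negotiations.

    Assumes get_last_n_months, get_month_bounds, and calculate_availability
    are helpers defined elsewhere in your monitoring tooling.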
    """
    monthly_availabilities = []
    
    for month in get_last_n_months(months):
        period_start, period_end = get_month_bounds(month)
        
        checks = monitoring_data.get_checks(
            start=period_start,
            end=period_end
        )
        
        availability = calculate_availability(checks, period_start, period_end)
        monthly_availabilities.append({
            "month": month.strftime("%Y-%m"),
            "availability_pct": availability["availability_pct"],
            "downtime_minutes": availability["downtime_minutes"]
        })
    
    availabilities = [m["availability_pct"] for m in monthly_availabilities]
    
    return {
        "monthly_data": monthly_availabilities,
        "mean_availability": sum(availabilities) / len(availabilities),
        "min_availability": min(availabilities),
        "max_availability": max(availabilities),
        "worst_month": min(monthly_availabilities, key=lambda m: m["availability_pct"]),
        "sla_recommendation": {
            # Both targets sit below the worst observed month; the larger
            # buffer (0.9990) yields the lower, safer commitment.
            "achievable_target": min(availabilities) * 0.9990,  # Conservative buffer
            "aggressive_target": min(availabilities) * 0.9995,  # Less conservative
        }
    }

# If your worst month was 99.7%, don't commit to 99.9%.
# Set the SLA target below your actual worst performance.
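
The function above relies on a `calculate_availability` helper that the article does not show. One possible shape for it is sketched below, assuming each check is a dict with a boolean `ok` field and that checks are evenly spaced across the period. Both assumptions are hypothetical; adapt them to whatever your monitoring data actually looks like.

```python
from datetime import datetime


def calculate_availability(checks, period_start, period_end):
    """Estimate availability from evenly spaced pass/fail checks.

    `checks` is assumed to be a list of dicts with a boolean "ok" key.
    """
    if not checks:
        raise ValueError("no monitoring data for this period")

    period_minutes = (period_end - period_start).total_seconds() / 60
    failed = sum(1 for check in checks if not check["ok"])
    failure_ratio = failed / len(checks)

    return {
        "availability_pct": 100 * (1 - failure_ratio),
        # Each failed check is treated as one check interval of downtime.
        "downtime_minutes": failure_ratio * period_minutes,
    }
```

With one check per minute, a single failed check in a day counts as one minute of downtime. Coarser check intervals make the estimate correspondingly rougher.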

Common SLA Structures

Flat Availability SLA

The simplest structure — a single availability percentage with a credit schedule:

Service Level Agreement

Provider commits to 99.9% monthly availability for the Core API.
Availability is measured as: (total minutes - downtime minutes) / total minutes.

Credit Schedule:
- 99.0% to below 99.9%: 10% credit of monthly fees
- 95.0% to below 99.0%: 25% credit of monthly fees
- Below 95.0%:          50% credit of monthly fees

Maximum credit: 50% of monthly fees in any calendar month.
Credits are the sole remedy for availability failures.
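
A credit schedule like this is easy to encode and test before it goes into a contract. The sketch below assumes each tier includes its lower bound (so exactly 99.0% earns the 10% credit); the function names are illustrative.

```python
def flat_sla_credit_pct(availability_pct: float) -> int:
    """Map measured monthly availability to a credit percentage.

    Tier boundaries follow the sample schedule; each tier is assumed
    to include its lower bound.
    """
    if availability_pct >= 99.9:
        return 0   # SLA met, no credit
    if availability_pct >= 99.0:
        return 10
    if availability_pct >= 95.0:
        return 25
    return 50      # also the maximum credit in this schedule


def credit_amount(availability_pct: float, monthly_fee: float) -> float:
    """Return the credit in currency units for the month."""
    return monthly_fee * flat_sla_credit_pct(availability_pct) / 100


print(credit_amount(99.5, 2000))
```

Writing the schedule as code forces you to decide what happens exactly at a boundary, which is a detail contracts often leave ambiguous.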

Tiered SLA by Service Component

Different components have different reliability requirements:

Service Level Commitments

| Component | Monthly Availability | Max Downtime |
|---|---|---|
| Core API | 99.9% | 43 min/month |
| Dashboard | 99.5% | 3.6 hr/month |
| Reporting | 99.0% | 7.2 hr/month |
| Analytics | 98.0% | 14.4 hr/month |

Rationale: Core API impacts all user operations. Reporting is 
batch-oriented and less time-critical. Separate SLAs reflect 
the actual reliability characteristics of each component.

Response Time SLA

Add latency commitments alongside availability:

Latency Commitments

The API will respond to 99% of requests within 500ms (p99 latency).
The API will respond to 95% of requests within 200ms (p95 latency).

Measurement: Calculated from server-side timing, excluding client 
network latency. Measured in 5-minute rolling windows.

Credit for latency SLA breach: 5% of monthly fees for each calendar 
day where p99 latency exceeds 1000ms.
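
To check a latency commitment you need an agreed way of computing percentiles. The sketch below uses the nearest-rank method on a list of server-side timings for one measurement window. The method choice and the function names are assumptions; the contract should state which percentile definition applies.

```python
import math


def percentile(samples_ms, pct: int):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    if not samples_ms:
        raise ValueError("no latency samples in this window")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_latency_commitments(samples_ms, p95_limit_ms=200, p99_limit_ms=500):
    """Compare one window of samples against the p95/p99 commitments."""
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    return {
        "p95_ms": p95,
        "p99_ms": p99,
        "p95_met": p95 <= p95_limit_ms,
        "p99_met": p99 <= p99_limit_ms,
    }


window = [100] * 98 + [600] * 2
print(check_latency_commitments(window))
```

Note that just two slow requests out of 100 are enough to break a p99 commitment while leaving p95 untouched.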

Key Definitions to Negotiate

The precise definitions in an SLA determine what counts as downtime:

Availability Definition

## Negotiating the Availability Definition

"Availability" can mean different things. Define clearly:

Option A (Provider-favorable):
"Availability means the API returns HTTP responses. Slow responses, 
partial failures, and error responses do not constitute unavailability."

Option B (Customer-favorable):
"Availability means the API returns successful (2xx) responses within 
2000ms. Error responses (5xx) and timeouts constitute unavailability."

Option C (Balanced):
"Availability means the error rate for API requests is below 5% 
as measured in any 5-minute window, AND p99 response time 
is below 2000ms."

Our recommendation: Use a threshold-based definition (error rate %)
rather than binary up/down. This better reflects real user experience
and is more measurable.
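
As a rough illustration of Option C, the sketch below classifies each 5-minute window as available or not, then derives an availability percentage from the windows. The field names and the choice to treat zero-traffic windows as available are assumptions that a real contract would need to spell out.

```python
def window_is_available(total_requests, failed_requests, p99_ms,
                        max_error_rate=0.05, max_p99_ms=2000):
    """Apply the threshold-based definition to one 5-minute window."""
    if total_requests == 0:
        return True  # assumption: no traffic counts as available
    error_rate = failed_requests / total_requests
    return error_rate < max_error_rate and p99_ms < max_p99_ms


def availability_from_windows(windows):
    """Percentage of windows that met both thresholds."""
    available = sum(1 for w in windows if window_is_available(**w))
    return 100 * available / len(windows)


windows = [
    {"total_requests": 1000, "failed_requests": 10, "p99_ms": 800},   # healthy
    {"total_requests": 1000, "failed_requests": 80, "p99_ms": 800},   # 8% errors
    {"total_requests": 1000, "failed_requests": 0, "p99_ms": 2500},   # too slow
    {"total_requests": 0, "failed_requests": 0, "p99_ms": 0},         # idle
]
print(availability_from_windows(windows))
```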

Measurement Source

## Who Measures Availability?

Option A: Provider measures (using their own internal monitoring)
Risk: Conflict of interest; provider can manipulate measurement

Option B: Customer measures (using their own monitoring)
Risk: Customer's measurement may not reflect actual service state;
may include customer-side network issues

Option C: Third-party measurement (external monitoring service)
Best practice: Use independent external monitoring as the reference.
Example: "Availability is measured by [AzMonitor/third party tool], 
using HTTP checks from 3+ geographic regions, with a 60-second 
check interval."
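
Multi-region checks only help if the contract says how to combine them. One possible rule, sketched below, counts an interval as down only when a majority of regions report failure, which filters out single-region network blips. The majority rule and the function names are assumptions, not a description of how any particular monitoring tool works.

```python
from typing import Mapping, Sequence


def interval_is_down(region_results: Mapping[str, bool]) -> bool:
    """Return True when more than half of the regions saw a failed check."""
    if len(region_results) < 3:
        raise ValueError("expected checks from at least 3 regions")
    failures = sum(1 for ok in region_results.values() if not ok)
    return failures > len(region_results) / 2


def downtime_minutes(intervals: Sequence[Mapping[str, bool]],
                     check_interval_seconds: int = 60) -> float:
    """Total downtime, counting each down interval as one check interval."""
    down = sum(1 for results in intervals if interval_is_down(results))
    return down * check_interval_seconds / 60


intervals = [
    {"us": True, "eu": True, "ap": True},
    {"us": False, "eu": True, "ap": True},    # one region failing: not down
    {"us": False, "eu": False, "ap": True},   # majority failing: down
    {"us": False, "eu": False, "ap": False},  # all failing: down
]
print(downtime_minutes(intervals))
```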

Exclusions

Downtime that doesn't count against the SLA:

## Standard SLA Exclusions

The following are typically excluded from SLA calculations:

1. Scheduled maintenance windows
   - Must be announced X hours in advance (typically 48-72 hours)
   - Should be limited to N hours/month (typically 4-8 hours)
   - Usually must occur during low-usage windows

2. Force majeure events
   - Natural disasters, government actions, etc.
   - Internet backbone/carrier failures outside provider control
   - AWS/GCP/Azure regional outages (if applicable)

3. Customer-caused issues
   - Customer-initiated DDoS
   - Excessive API usage beyond contract limits
   - Customer misconfiguration of their integration

4. Beta features
   - Features explicitly marked "beta" or "preview"
   - Typically excluded for 6-12 months after introduction

Negotiating tip: Customers should push back on broad exclusions
like "internet failures" without clear definitions. Outages caused by
the provider's own infrastructure choices (e.g., single-region
deployment) should not be excused.
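
Exclusions change the downtime number that feeds the SLA calculation, so it helps to see the mechanics. The sketch below subtracts any overlap with excluded windows (for example, announced maintenance) from each outage. It assumes the excluded windows do not overlap each other; all names are illustrative.

```python
from datetime import datetime


def overlap_minutes(a_start, a_end, b_start, b_end) -> float:
    """Minutes during which two time ranges overlap (0 if they don't)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max((end - start).total_seconds() / 60, 0.0)


def countable_downtime_minutes(outages, excluded_windows) -> float:
    """Downtime that counts against the SLA after removing exclusions.

    Both arguments are lists of (start, end) datetime pairs. Excluded
    windows are assumed not to overlap each other.
    """
    total = 0.0
    for o_start, o_end in outages:
        duration = (o_end - o_start).total_seconds() / 60
        excluded = sum(
            overlap_minutes(o_start, o_end, w_start, w_end)
            for w_start, w_end in excluded_windows
        )
        total += max(duration - excluded, 0.0)
    return total


outages = [
    (datetime(2024, 3, 5, 2, 0), datetime(2024, 3, 5, 3, 30)),    # 90 min
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 5, 10, 15)),  # 15 min
]
maintenance = [(datetime(2024, 3, 5, 3, 0), datetime(2024, 3, 5, 5, 0))]
print(countable_downtime_minutes(outages, maintenance))
```

Only the 30 minutes of the first outage that fall inside the maintenance window are excluded; the hour before the window opened still counts.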

What Enterprise Customers Will Push For

Know what sophisticated procurement teams will negotiate:

| Customer Request | Provider Position | Compromise |
|---|---|---|
| 99.99% uptime | We deliver 99.9% consistently | 99.95% with carve-outs |
| Unlimited SLA credits | Maximum 50% of monthly fee | Maximum 100% of monthly fee |
| Right to terminate after 1 breach | 3 breaches in 12 months | 2 breaches in 12 months |
| Consequential damages | Limited to fees paid | Fees paid + documented direct costs |
| Real-time uptime data access | API access to monitoring | Dashboard access + monthly report |
| Maintenance windows > 24hr notice | 48 hours | 72 hours for enterprise tier |

Structuring Credits to Align Incentives

Credit schedules should incentivize reliability, not just compensate for failures:

## Credit Schedule Design Principles

1. Make credits meaningful but not punishing
   - 10% credit for minor breach: shows seriousness
   - 50% maximum: maintains viability while compensating impact
   - Don't cap credits below 1 month of downtime value

2. Don't create moral hazard
   - Flat credits invite "once we breach, we might as well be
     completely down" thinking
   - Avoid: same credit for 1 hour and 24 hours of downtime
   - Prefer: tiered credits that increase with breach severity

3. Request-based tracking for API SLAs
   For APIs with variable usage:
   Credit = (failed_requests / total_requests) * monthly_fee * multiplier
   
   This directly correlates credits to actual impact.

4. Automatic application
   "Credits shall be applied automatically to the next invoice 
   without requiring customer request."
   This signals confidence in your reliability.
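
The request-based formula in principle 3 can be sketched as follows. The multiplier value and the 50% cap are illustrative assumptions (the cap mirrors the maximum suggested in principle 1); the formula itself leaves both to negotiation.

```python
def request_based_credit(failed_requests: int, total_requests: int,
                         monthly_fee: float, multiplier: float = 10.0,
                         max_credit_fraction: float = 0.5) -> float:
    """Credit proportional to the share of failed requests.

    `multiplier` and `max_credit_fraction` are hypothetical defaults.
    """
    if total_requests <= 0:
        return 0.0  # no traffic, nothing to credit
    raw_credit = failed_requests / total_requests * monthly_fee * multiplier
    return min(raw_credit, monthly_fee * max_credit_fraction)


# 0.5% of requests failed on a 10,000/month contract.
print(request_based_credit(5_000, 1_000_000, 10_000))
```

Without a multiplier, a 0.5% failure rate would yield a credit of only 0.5% of fees, which is rarely meaningful; the multiplier is what makes the credit proportionate to the customer's pain.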

The Credit Request Process

Define how credits are claimed:

## SLA Credit Process

Standard (Provider-favorable):
Customer must request credits within 30 days of the month end.
Credits applied to following month's invoice.
Customer must provide evidence of downtime to claim credits.

Better (More balanced):
Provider automatically calculates and applies credits 
without requiring customer request.
Credits appear on the following month's invoice with explanation.

Best practice (Trust-building):
Provider proactively notifies customer when an SLA breach occurred
and the credit amount, before the customer requests it.
This is a competitive differentiator.

Maintenance Windows

Structure maintenance windows to minimize customer impact:

## Maintenance Window Best Practices

Standard window parameters:
- Maximum 4 hours of scheduled maintenance per month
- Minimum 72 hours advance notice for enterprise customers
- Maintenance windows during customer's lowest usage period
  (typically 2am-6am in customer's primary timezone)

Notification requirements:
- Email to customer's primary technical contact
- Status page announcement
- In-app notification for maintenance windows > 30 min

Emergency maintenance (unplanned):
- Maximum 4 hours per quarter outside standard windows
- 1-hour notice minimum
- Counts against SLA only if outage exceeds 4 hours

Negotiating tip: Push for customer opt-out or rescheduling rights 
for large enterprise accounts. If a customer has a critical business
event during your maintenance window, they should be able to request
a postponement.
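
Window parameters like these can be enforced mechanically before a maintenance notice goes out. The sketch below checks a proposed window against the 72-hour notice and 4-hour monthly cap from the standard parameters; the function name and return format are assumptions.

```python
from datetime import datetime, timedelta


def validate_maintenance_window(announced_at, starts_at, ends_at,
                                hours_already_used_this_month=0.0,
                                min_notice=timedelta(hours=72),
                                monthly_cap=timedelta(hours=4)):
    """Return a list of rule violations for a proposed window (empty if OK)."""
    problems = []
    if starts_at - announced_at < min_notice:
        problems.append("insufficient advance notice")
    used = timedelta(hours=hours_already_used_this_month) + (ends_at - starts_at)
    if used > monthly_cap:
        problems.append("exceeds monthly maintenance cap")
    return problems


print(validate_maintenance_window(
    announced_at=datetime(2024, 1, 4, 2, 0),
    starts_at=datetime(2024, 1, 5, 2, 0),   # only 24 hours of notice
    ends_at=datetime(2024, 1, 5, 5, 0),     # 3-hour window
    hours_already_used_this_month=2,        # 2 + 3 hours exceeds the cap
))
```

A window that fails validation would not qualify as an exclusion, so any downtime during it would count against the SLA.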

Conclusion

SLA negotiation works best when you enter it with honesty about your capabilities, data supporting your commitments, and a structure that aligns your incentives with your customers' reliability needs. The specific percentages matter less than the measurement methodology, exclusion definitions, and credit processes — these determine whether the SLA is a meaningful commitment or a paper guarantee. AzMonitor provides the independent external monitoring that serves as an objective measurement source for SLA compliance, removing the conflict of interest inherent in self-reported uptime and giving both sides confidence in the accuracy of the availability data underlying the SLA.

AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.

SLA Negotiation: Setting Realistic Availability Commitments You Can Actually Meet