Authentication failures are uniquely disruptive. When your API is slow, users complain; when authentication breaks, they can't get in at all. A broken login endpoint, an expired OAuth client secret, or a misconfigured JWT validator can lock out every user at once. Auth failures often look like user error at first (a mistyped password, a stale session), so they can go undetected longer than other outages.
Why Auth Monitoring Is Harder Than Regular API Monitoring
Standard API monitoring checks: is the endpoint responding? Auth monitoring has to go further:
- Is the authentication server reachable?
- Are tokens being issued correctly?
- Is token validation working end-to-end?
- Are refresh flows functioning?
- Have certificates or signing keys expired?
Each of these can fail independently. Your API might keep accepting requests with valid tokens while token issuance is completely broken. In that case existing users stay logged in, but new users and anyone whose session has expired can't authenticate.
Monitoring OAuth2 Flows
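OAuth2 is one of the most widely used authorization frameworks for modern APIs, and its token endpoint is usually your most critical auth dependency. A client credentials request exercises it without any user interaction: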
```yaml
monitor:
  name: "OAuth2 Token Endpoint"
  url: "https://auth.example.com/oauth/token"
  method: POST
  interval: 60
  headers:
    Content-Type: "application/x-www-form-urlencoded"
  body: "grant_type=client_credentials&client_id=${MONITOR_CLIENT_ID}&client_secret=${MONITOR_CLIENT_SECRET}&scope=monitoring:health"
  assertions:
    - type: status_code
      value: 200
    - type: json_body
      path: "$.access_token"
      operator: exists
    - type: json_body
      path: "$.token_type"
      # RFC 6749 treats token_type as case-insensitive; some servers send "bearer"
      value: "Bearer"
    - type: json_body
      path: "$.expires_in"
      operator: greater_than
      value: 0
    - type: response_time
      operator: less_than
      value: 2000
```
Create a dedicated monitoring OAuth2 client with minimal scopes — just enough to issue tokens and verify the flow works, without broad access to your data.
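You can also express these assertions as a plain function, which is handy if you want to unit-test them or run them from your own scheduler. This is a minimal sketch under stated assumptions. It expects that you already have the HTTP status, the parsed JSON body and the elapsed time. The names `check_token_response` and `expected_scopes` are hypothetical, not part of any monitoring product. The sketch compares `token_type` case-insensitively, as RFC 6749 specifies. It also flags a granted `scope` that is broader than what the monitoring client asked for; the server returns `scope` only when it differs from the request.

```python
def check_token_response(status_code, body, elapsed_ms,
                         expected_scopes=frozenset({"monitoring:health"}),
                         max_latency_ms=2000):
    """Return a list of human-readable failures; an empty list means healthy.

    Hypothetical helper mirroring the YAML assertions above.
    """
    failures = []
    if status_code != 200:
        failures.append(f"unexpected status {status_code}")
    if not body.get("access_token"):
        failures.append("missing access_token")
    # token_type is case-insensitive per RFC 6749, so compare lowercased
    if str(body.get("token_type", "")).lower() != "bearer":
        failures.append(f"unexpected token_type {body.get('token_type')!r}")
    expires_in = body.get("expires_in")
    if not isinstance(expires_in, (int, float)) or expires_in <= 0:
        failures.append(f"invalid expires_in {expires_in!r}")
    if elapsed_ms >= max_latency_ms:
        failures.append(f"slow response: {elapsed_ms:.0f} ms")
    # "scope" appears only when it differs from the request; if present,
    # make sure the monitoring client was not granted more than it asked for
    if "scope" in body:
        extra = set(body["scope"].split()) - set(expected_scopes)
        if extra:
            failures.append(f"unexpected extra scopes: {sorted(extra)}")
    return failures
```

An empty list means every assertion passed. Anything else can go straight into an alert message.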
Multi-Step OAuth2 Monitoring
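For authorization code flows, you need to simulate the full flow: build the authorization URL, exchange a code for tokens, call a protected endpoint and refresh. Authorization codes are single-use and short-lived, so the check needs a fresh code on every run (for example, by driving a test user through the login page with a headless browser). A hard-coded code will work at most once.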
```javascript
// Automated OAuth2 authorization code flow test (Node.js 18+ provides a global fetch)
const crypto = require('node:crypto');

// Unguessable value for the OAuth2 `state` parameter (CSRF protection)
function generateState() {
  return crypto.randomBytes(16).toString('hex');
}

async function testOAuthFlow(config) {
  const steps = {};

  // Step 1: Build the authorization URL; URLSearchParams encodes each value
  const authParams = new URLSearchParams({
    client_id: config.clientId,
    redirect_uri: config.redirectUri,
    response_type: 'code',
    scope: 'openid profile',
    state: generateState(),
  });
  const authUrl = `${config.authServer}/authorize?${authParams}`;
  steps.authUrlGenerated = { status: 'ok', url: authUrl };

  // Step 2: Exchange the code for tokens. config.testCode must be obtained
  // fresh for each run: authorization codes are single-use and short-lived.
  const tokenResponse = await fetch(`${config.authServer}/oauth/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'authorization_code',
      code: config.testCode,
      redirect_uri: config.redirectUri,
      client_id: config.clientId,
      client_secret: config.clientSecret,
    }),
  });
  // Error responses are not always JSON; fall back to an empty object
  const tokenData = await tokenResponse.json().catch(() => ({}));
  steps.tokenIssued = {
    status: tokenResponse.ok ? 'ok' : 'error',
    hasAccessToken: !!tokenData.access_token,
    hasRefreshToken: !!tokenData.refresh_token,
  };
  // Without an access token the remaining steps cannot run
  if (!tokenData.access_token) return steps;

  // Step 3: Validate the token against a protected endpoint
  const validateResponse = await fetch(`${config.apiBase}/user/me`, {
    headers: { Authorization: `Bearer ${tokenData.access_token}` },
  });
  steps.tokenValidated = {
    status: validateResponse.ok ? 'ok' : 'error',
    statusCode: validateResponse.status,
  };

  // Step 4: Test the refresh flow, if the server issued a refresh token
  if (!tokenData.refresh_token) {
    steps.refreshWorked = { status: 'skipped', reason: 'no refresh_token issued' };
    return steps;
  }
  const refreshResponse = await fetch(`${config.authServer}/oauth/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: tokenData.refresh_token,
      client_id: config.clientId,
      client_secret: config.clientSecret,
    }),
  });
  const refreshData = await refreshResponse.json().catch(() => ({}));
  steps.refreshWorked = {
    status: refreshResponse.ok ? 'ok' : 'error',
    hasNewAccessToken: !!refreshData.access_token,
  };

  return steps;
}
```
JWT Monitoring
JWTs can fail in several ways that monitoring should catch:
| Failure Mode | Symptom | Detection Method |
|---|---|---|
| Expired signing key | All new tokens invalid | Verify newly issued tokens |
| Key rotation mismatch | Tokens signed with an unpublished or already-removed key are rejected | Check the token's kid header against the JWKS |
| Algorithm mismatch | Signature validation fails | Test the full issue+validate cycle |
| Missing claim | Authorization failures | Parse and validate JWT claims |
| Clock skew | Valid tokens rejected | Check nbf and iat claims |
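Several rows in this table can be checked from the token's decoded header and payload alone. The sketch below assumes you have already decoded both, for example with PyJWT's `jwt.get_unverified_header(token)` and `jwt.decode(token, options={"verify_signature": False})`. It also assumes you have fetched your JWKS document. The function name, `required_claims` and the 60-second `leeway` are illustrative choices. The check only catches mismatches that are visible without cryptography; it does not replace verifying the signature.

```python
import time


def check_jwt_header_and_claims(header, claims, jwks, now=None, leeway=60,
                                required_claims=("exp", "iat")):
    """Return a list of problems visible without verifying the signature.

    header/claims: the decoded JWT header and payload (dicts).
    jwks: the parsed JSON from your JWKS endpoint, {"keys": [...]}.
    leeway: tolerated clock skew in seconds (illustrative default).
    """
    now = time.time() if now is None else now
    problems = []
    keys = {k.get("kid"): k for k in jwks.get("keys", [])}
    kid, alg = header.get("kid"), header.get("alg")

    # Key rotation mismatch: the signing key must be in the published key set
    if kid not in keys:
        problems.append(f"kid {kid!r} not found in JWKS")
    # Algorithm mismatch: compare with the key's declared alg, if it has one
    elif "alg" in keys[kid] and keys[kid]["alg"] != alg:
        problems.append(f"alg {alg!r} does not match JWKS key alg {keys[kid]['alg']!r}")

    # Missing claims
    for name in required_claims:
        if name not in claims:
            problems.append(f"missing claim {name!r}")

    # Expiry and clock skew
    exp = claims.get("exp")
    if exp is not None and exp <= now:
        problems.append("token already expired")
    for name in ("nbf", "iat"):
        value = claims.get(name)
        if value is not None and value > now + leeway:
            problems.append(f"{name} is {value - now:.0f}s in the future (clock skew?)")
    return problems
```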
JWT Validation Monitor
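```python
import time
from datetime import datetime, timezone

import jwt  # PyJWT
import requests


def monitor_jwt_health(token_endpoint, validation_endpoint, credentials, leeway=60):
    """
    End-to-end JWT health check:
    1. Issue a new token
    2. Decode and sanity-check its time-based claims
    3. Use the token against a protected endpoint
    `leeway` is the tolerated clock skew in seconds.
    """
    results = {}

    # Step 1: issue a token
    try:
        token_response = requests.post(token_endpoint, data=credentials, timeout=5)
    except requests.RequestException as e:
        return {"healthy": False, "step": "token_issuance", "error": str(e)}
    if token_response.status_code != 200:
        return {"healthy": False, "step": "token_issuance",
                "status_code": token_response.status_code}
    try:
        access_token = token_response.json().get("access_token")
    except ValueError:
        return {"healthy": False, "step": "token_issuance",
                "error": "Token response is not valid JSON"}
    if not access_token:
        return {"healthy": False, "step": "token_issuance",
                "error": "No access_token in response"}

    # Step 2: decode WITHOUT signature verification, only to inspect claims.
    # The protected endpoint in step 3 performs the real verification.
    # This assumes the access token is a JWT rather than an opaque token.
    try:
        claims = jwt.decode(access_token, options={"verify_signature": False})
    except jwt.DecodeError as e:
        return {"healthy": False, "step": "token_decode", "error": str(e)}

    now = time.time()
    exp, iat, nbf = claims.get("exp"), claims.get("iat"), claims.get("nbf")
    if exp is None or exp < now:
        return {"healthy": False, "step": "claim_validation",
                "error": "Missing exp claim" if exp is None else "Token already expired"}
    # iat or nbf in the future suggests clock skew between issuer and monitor
    if any(t is not None and t > now + leeway for t in (iat, nbf)):
        return {"healthy": False, "step": "claim_validation",
                "error": "iat/nbf is in the future (possible clock skew)"}
    results["expires_in"] = exp - now
    if iat is not None:
        results["issued_at"] = datetime.fromtimestamp(iat, tz=timezone.utc).isoformat()

    # Step 3: use the token against a protected endpoint
    try:
        protected_response = requests.get(
            validation_endpoint,
            headers={"Authorization": f"Bearer {access_token}"},
            timeout=5,
        )
    except requests.RequestException as e:
        return {"healthy": False, "step": "token_validation", "error": str(e)}
    results["validation_status"] = protected_response.status_code
    results["healthy"] = protected_response.status_code == 200
    return results
```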
API Key Monitoring
API keys are simpler than OAuth2 but still need monitoring:
```yaml
# Monitor API key authentication
monitor:
  name: "API Key Authentication Check"
  url: "https://api.example.com/v1/account"
  method: GET
  headers:
    X-API-Key: "${MONITORING_API_KEY}"
  assertions:
    - type: status_code
      value: 200
    - type: json_body
      path: "$.account.status"
      value: "active"
```
API key monitors should use keys with:
- Read-only permissions
- No access to sensitive data
- Dedicated to monitoring (so you can rotate them independently)
- Longer expiration than normal user keys
Monitoring API Key Expiration
Many APIs issue API keys with expiration dates. Build a check that alerts before keys expire:
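```python
from datetime import datetime, timezone

import requests


def check_api_key_expiration(api_keys_config, warn_days=30, critical_days=7):
    """Alert if any monitoring API key expires within `warn_days` days."""
    alerts = []
    for service, config in api_keys_config.items():
        response = requests.get(
            config['key_info_endpoint'],
            headers={'Authorization': f'Bearer {config["admin_token"]}'},
            timeout=5,
        )
        response.raise_for_status()
        key_info = response.json()

        # Accept a trailing "Z" (fromisoformat only handles it on Python 3.11+)
        # and assume timestamps without an offset are UTC.
        expiry = datetime.fromisoformat(key_info['expires_at'].replace('Z', '+00:00'))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        days_until_expiry = (expiry - datetime.now(timezone.utc)).days

        if days_until_expiry < warn_days:
            alerts.append({
                'service': service,
                'key_id': key_info['id'],
                'expires_in_days': days_until_expiry,
                'severity': 'critical' if days_until_expiry < critical_days else 'warning',
            })
    return alerts
```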
SAML and SSO Monitoring
Enterprise applications often use SAML for single sign-on. Monitoring SAML flows is more complex because they're browser-redirect-based, but you can monitor critical components:
```yaml
# Monitor IdP metadata endpoint
monitor:
  name: "SAML IdP Metadata"
  url: "https://idp.example.com/metadata"
  method: GET
  assertions:
    - type: status_code
      value: 200
    - type: response_contains
      value: "EntityDescriptor"
    - type: response_time
      operator: less_than
      value: 2000
```

```yaml
# Monitor SP metadata endpoint
monitor:
  name: "SAML SP Metadata"
  url: "https://app.example.com/auth/saml/metadata"
  method: GET
  assertions:
    - type: status_code
      value: 200
    - type: response_contains
      value: "AssertionConsumerService"
```
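A string match on `EntityDescriptor` confirms that the endpoint returned something metadata-shaped. Parsing the XML lets you go further. You can confirm that the SSO or ACS endpoint is still advertised, and extract the signing certificates so you can track their expiry. Below is a minimal sketch using only the standard library. It assumes SAML 2.0 metadata whose root is a single `EntityDescriptor`; aggregated `EntitiesDescriptor` feeds would need an extra loop. `inspect_saml_metadata` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def inspect_saml_metadata(xml_text):
    """Return basic facts about a SAML 2.0 metadata document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{MD_NS}}}EntityDescriptor":
        raise ValueError(f"unexpected root element {root.tag}")

    # Signing certificates sit under KeyDescriptor elements with use="signing"
    # (or no use attribute, meaning the key serves both signing and encryption)
    signing_certs = []
    for kd in root.iter(f"{{{MD_NS}}}KeyDescriptor"):
        if kd.get("use") in (None, "signing"):
            for cert in kd.iter(f"{{{DS_NS}}}X509Certificate"):
                # Strip the line breaks and indentation inside the base64 text
                signing_certs.append("".join((cert.text or "").split()))

    return {
        "entity_id": root.get("entityID"),
        "has_sso_service": root.find(f".//{{{MD_NS}}}SingleSignOnService") is not None,
        "has_acs": root.find(f".//{{{MD_NS}}}AssertionConsumerService") is not None,
        "signing_certs": signing_certs,
    }
```

For metadata fetched from sources you don't control, the Python documentation recommends a hardened parser such as defusedxml instead of plain ElementTree.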
Also monitor your SAML certificate expiration — a common cause of sudden SSO outages:
```shell
# Check SAML certificate expiry
openssl x509 -in saml-sp.crt -noout -enddate
# Output: notAfter=Mar 15 12:00:00 2026 GMT
# Set an alert 90 days before this date
```
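To turn that output into an alert, parse the date and count the days remaining. This sketch assumes the `notAfter=...` format shown above, and the function name is hypothetical. Openssl pads single-digit days with an extra space, so the sketch collapses whitespace before parsing.

```python
from datetime import datetime, timezone


def days_until_not_after(openssl_output, now=None):
    """Whole days until a certificate expires, from `openssl x509 -noout -enddate` output.

    Expects text like "notAfter=Mar 15 12:00:00 2026 GMT" (hypothetical helper).
    """
    value = openssl_output.strip().split("=", 1)[1]
    # openssl pads single-digit days ("Mar  5 ..."), so normalize whitespace first
    value = " ".join(value.split())
    # %b expects English month abbreviations (the default C locale)
    expiry = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT").replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc) if now is None else now
    return (expiry - now).days
```

To match the 90-day threshold above, alert when the result drops to 90 or below.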
Monitoring Auth Provider Dependencies
If you use Auth0, Okta, Cognito, or another identity provider, monitor their status and your dependency on them:
| Dependency | What to Monitor | Alert Threshold |
|---|---|---|
| Auth0 | Token endpoint latency | > 500ms |
| Auth0 | Status page | Any degradation |
| Okta | Authorization server availability | Any downtime |
| Cognito | Token issuance latency | > 1000ms |
| Your JWKS endpoint | Key set availability | Any downtime |
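Alerting on a single slow request tends to be noisy, so latency thresholds like these are often evaluated against a percentile over a window of samples. Below is a minimal sketch using the standard library's `statistics.quantiles` (Python 3.8+). The p95 default and the function name are assumptions, not provider recommendations.

```python
import statistics


def latency_breach(samples_ms, threshold_ms, percentile=95):
    """Return (observed percentile in ms, whether it exceeds threshold_ms)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 1st..99th percentiles; index p-1 is the p-th
    observed = statistics.quantiles(samples_ms, n=100)[percentile - 1]
    return observed, observed > threshold_ms
```

A window of 19 fast samples and one 900 ms outlier is enough to push p95 over a 500 ms threshold. That is the kind of tail you would want to hear about before users do.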
Track auth latency separately from API latency so you know when auth is contributing to overall request time:
Total Request Time = Auth Latency + Application Logic + Database
If auth latency is 400ms and your SLA is 500ms, you have only 100ms for everything else.
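To see where the time goes, time the token request and the API call separately, then compare the auth share against your SLA. Below is a sketch with hypothetical helper names that reproduces the 400 ms / 500 ms example above.

```python
import time


def timed_ms(fn, *args):
    """Call fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000


def auth_budget(auth_ms, sla_ms):
    """How much of the SLA auth consumes, and what is left for everything else."""
    return {
        "auth_share_of_sla": auth_ms / sla_ms,
        "remaining_ms": sla_ms - auth_ms,
    }
```

In a real check, you would wrap your token request in `timed_ms`, wrap the API call that uses the token in a second `timed_ms`, and feed the first timing to `auth_budget`.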
Auth Failure Alerting
Configure alerts specifically for authentication failures:
```yaml
alerts:
  - name: "Auth Token Endpoint Down"
    condition: "oauth_token_endpoint_available = false"
    severity: critical
    message: "Authentication service is unavailable - users cannot log in"

  - name: "High Auth Failure Rate"
    condition: "auth_failure_rate > 5% for 5 minutes"
    severity: critical
    message: "Authentication is failing for more than 5% of attempts"

  - name: "JWT Validation Failures"
    condition: "jwt_validation_errors > 10 per minute"
    severity: warning
    message: "JWT validation errors spiking - possible key rotation issue"

  - name: "Refresh Token Failure"
    condition: "refresh_token_failure_rate > 2%"
    severity: warning
    message: "Refresh token flow is failing - users will be logged out"
```
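A condition like `auth_failure_rate > 5% for 5 minutes` has to be computed from raw attempt outcomes somewhere. One reading is the failure rate over a trailing 5-minute window, sketched below with a hypothetical `FailureRateWindow` class. Some alerting systems instead require the condition to hold continuously for 5 minutes, which is a different rule.

```python
from collections import deque


class FailureRateWindow:
    """Auth attempts over a trailing time window (timestamps in seconds).

    Hypothetical helper; assumes attempts are recorded in time order.
    """

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, failed) pairs, oldest first

    def record(self, timestamp, failed):
        self.events.append((timestamp, failed))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop attempts that have fallen out of the window
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def failure_rate(self, now):
        self._evict(now)
        if not self.events:
            return 0.0
        failures = sum(1 for _, failed in self.events if failed)
        return failures / len(self.events)
```

The alert fires when `failure_rate(now) > 0.05`. Keeping the raw events makes it easy to report the failure count alongside the rate.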
Conclusion
Authentication is the gateway to your entire service. An outage in auth doesn't just degrade functionality — it locks users out completely. Comprehensive auth monitoring covers the full authentication lifecycle: token issuance, validation, refresh, and expiration. It monitors your dependencies on external identity providers and alerts early enough that you can fix issues before they cause widespread lockout. AzMonitor makes it straightforward to build multi-step authentication health checks that catch these failures before your users do.