Time to First Byte (TTFB) is the starting gun for every page load. Every loading metric that follows (FCP, LCP, TTI) is bounded below by TTFB. If your server takes 2 seconds to respond, your LCP can't be under 2 seconds, no matter how well you optimize everything else. Reducing TTFB is often the highest-ROI performance optimization available.

What TTFB Measures

TTFB = Time from the browser starting the request (including connection setup) → first byte of response received

It includes:

DNS lookup time
TCP connection time
SSL/TLS handshake time
Server processing time (the part you control most)
Time for first response byte to travel from server to client

TTFB thresholds:

< 200ms: Excellent (server processing is fast)
200-800ms: Acceptable (room for improvement)
800ms+: Needs investigation (significant optimization opportunity)

Google's "Good" threshold for TTFB is 800ms, but the best-performing sites achieve 100-200ms consistently.

Diagnosing High TTFB

Before optimizing, identify where the time is being spent:

Total TTFB: 850ms
├── DNS lookup: 45ms        → Cache at CDN
├── TCP connection: 80ms    → Geographic distance issue
├── SSL handshake: 120ms    → Session resumption, TLS 1.3
└── Server processing: 605ms → Primary optimization target

Use Chrome DevTools → Network tab → click any request → Timing panel to see this breakdown.
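
If you collect these phase timings in a script, a few lines of Python are enough to find the dominant phase. This is a minimal sketch using the figures from the breakdown above; the function name and dictionary keys are illustrative and not part of any tool.

```python
def summarize_ttfb(phases_ms):
    """Return total TTFB, the slowest phase, and that phase's share of the total."""
    total = sum(phases_ms.values())
    slowest = max(phases_ms, key=phases_ms.get)
    share = phases_ms[slowest] / total if total else 0.0
    return {"total_ms": total, "slowest": slowest, "share": share}


# Figures taken from the example breakdown above
example = {
    "dns_lookup": 45,
    "tcp_connection": 80,
    "tls_handshake": 120,
    "server_processing": 605,
}

summary = summarize_ttfb(example)
print(f"{summary['slowest']} accounts for {summary['share']:.0%} of {summary['total_ms']}ms")
# prints: server_processing accounts for 71% of 850ms
```

When one phase takes this much of the total, optimizing the others first is unlikely to pay off.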

Tools for TTFB measurement:

Chrome DevTools (local testing)
WebPageTest (from multiple global locations)
AzMonitor performance monitoring (continuous TTFB tracking)
curl for a quick command-line measurement (the figure includes DNS, connection, and TLS time): curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" https://yoursite.com
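
For scripted checks, the same phases can be timed from Python using only the standard library. The sketch below is an assumption-laden illustration, not a replacement for the tools above: measure_ttfb is a hypothetical helper, it sends a bare HTTP/1.1 GET, it does not follow redirects, and the DNS figure may reflect an operating-system cache hit.

```python
import socket
import ssl
import time


def measure_ttfb(host, port=443, path="/", use_tls=True, timeout=10.0):
    """Time each phase of one HTTP/1.1 GET and return milliseconds per phase."""
    def elapsed_ms(start):
        return (time.perf_counter() - start) * 1000

    timings = {}

    # Phase 1: DNS resolution
    start = time.perf_counter()
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    timings["dns_ms"] = elapsed_ms(start)

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP connection
        start = time.perf_counter()
        sock.connect(address)
        timings["tcp_ms"] = elapsed_ms(start)

        # Phase 3: TLS handshake (skipped for plain HTTP)
        timings["tls_ms"] = 0.0
        if use_tls:
            start = time.perf_counter()
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls_ms"] = elapsed_ms(start)

        # Phase 4: send the request and wait for the first response byte
        start = time.perf_counter()
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        if not sock.recv(1):
            raise ConnectionError("server closed the connection without responding")
        timings["wait_ms"] = elapsed_ms(start)
    finally:
        sock.close()

    timings["ttfb_ms"] = sum(timings.values())
    return timings
```

Calling measure_ttfb("yoursite.com") returns a dictionary with dns_ms, tcp_ms, tls_ms, wait_ms, and ttfb_ms. Run it several times and from more than one location before drawing conclusions, since a single sample is noisy.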

The Biggest TTFB Killers

Slow Database Queries

Database query time is one of the most common causes of high TTFB for dynamic applications. A page that runs 15 database queries one after another, averaging 40ms each, spends 600ms of its TTFB on database time alone.

Diagnostic query:

-- PostgreSQL: Find slow queries (requires the pg_stat_statements extension;
-- the column is named mean_time before PostgreSQL 13)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Solutions:

Add indexes for columns used in WHERE clauses and JOINs
Rewrite N+1 queries as JOINs or batch loads
Cache frequently-read data in Redis/Memcached
Use query result caching at the application layer
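
To make the N+1 problem concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. The authors and posts tables are hypothetical, and an in-memory database hides the per-query network round trip you would pay against a real database server, so treat the query counts as the point rather than the timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    -- Index the column used in the JOIN and WHERE clauses
    CREATE INDEX idx_posts_author_id ON posts (author_id);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Caching'), (2, 2, 'Indexes'), (3, 1, 'Pooling');
""")


def titles_n_plus_one(conn):
    """One query for the posts, then one extra query per post (N+1 in total)."""
    queries = 1
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def titles_joined(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute("""
        SELECT p.title, a.name
        FROM posts p
        JOIN authors a ON a.id = p.author_id
        ORDER BY p.id
    """).fetchall()
    return rows, 1


slow_rows, slow_queries = titles_n_plus_one(conn)
fast_rows, fast_queries = titles_joined(conn)
print(slow_queries, fast_queries)  # prints: 4 1
```

Both functions return the same rows, but the first one issues a query per post. With 100 posts and a few milliseconds of database round trip each, that difference alone can dominate TTFB.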

Synchronous External API Calls in Request Path

If your page rendering requires calling an external API (payment processor status, third-party data), and that call takes 300ms, your TTFB is at least 300ms.

Solutions:

Cache external API responses (even with short TTLs like 30-60 seconds)
Move external API calls to background jobs where possible
Use async/parallel requests for multiple external calls
Implement circuit breakers to fail fast when APIs are slow
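
The parallel and fail-fast ideas above can be sketched with asyncio. The external calls here are simulated with asyncio.sleep, and the service names, delays, and timeouts are made up for illustration; a real implementation would wrap an async HTTP client instead.

```python
import asyncio
import time


async def fetch_with_timeout(name, delay_s, timeout_s, fallback=None):
    """Simulate an external call; return a fallback if it exceeds its timeout."""
    async def call():
        await asyncio.sleep(delay_s)  # stand-in for a real HTTP request
        return f"{name}: ok"

    try:
        return await asyncio.wait_for(call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def load_page_data():
    # All three calls run concurrently, so the total wait is roughly the
    # slowest call (capped by its timeout), not the sum of all three.
    return await asyncio.gather(
        fetch_with_timeout("payments", 0.10, 0.5),
        fetch_with_timeout("inventory", 0.15, 0.5),
        fetch_with_timeout("partner", 2.0, 0.3, fallback="partner: unavailable"),
    )


def run():
    start = time.perf_counter()
    results = asyncio.run(load_page_data())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run()
    print(results, f"{elapsed:.2f}s")
```

Run sequentially without timeouts, these calls would take about 2.25 seconds. Run in parallel with a 0.3 second cap on the slow one, the page waits roughly 0.3 seconds and renders a fallback for the slow dependency.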

Missing or Misconfigured Caching

A page that generates the same HTML for every anonymous visitor but regenerates it on every request is wasting server resources.

Server-side caching options:

Page cache: Full HTML cached (Nginx, Varnish, CDN)
  └─ Best for: Largely static pages (blogs, marketing)

Fragment cache: Cache expensive template fragments
  └─ Best for: Dynamic pages with some shared elements

Object cache: Cache database query results
  └─ Best for: Complex queries used across multiple requests

HTTP cache: Cache-Control headers for browsers and CDNs
  └─ Best for: All pages with appropriate TTLs
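
As an illustration of the object cache option, here is a minimal in-process cache with a time-to-live. It is a sketch only: TTLCache is a hypothetical class, it is not thread-safe, and a production setup would more likely use Redis or Memcached so that all application servers share one cache.

```python
import time


class TTLCache:
    """Cache computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


# Usage: wrap an expensive query (simulated here by a counter)
calls = {"count": 0}


def expensive_query():
    calls["count"] += 1
    return ["row-1", "row-2"]


cache = TTLCache(ttl_seconds=60)
cache.get_or_compute("homepage:top_posts", expensive_query)
cache.get_or_compute("homepage:top_posts", expensive_query)
print(calls["count"])  # prints: 1
```

The second call is served from memory, so the query runs once per TTL window instead of once per request.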

Not Using a CDN

If all your users hit your origin server (wherever it's hosted), users geographically distant from that server experience high network latency. A server in Virginia serving a user in Singapore adds 200-300ms of network latency to every round trip, and a new HTTPS connection needs several round trips before the first byte arrives.

A CDN with a Singapore edge node reduces that to 10-20ms for cached responses, and routes dynamic requests through optimized network paths.
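
A rough back-of-the-envelope model shows why distance matters so much. The sketch below treats the latency figures above as round-trip times (an assumption) and counts the round trips a new HTTPS connection over TCP needs: one for the TCP handshake, two for a full TLS 1.2 handshake or one for TLS 1.3, and one for the request and first response byte. It ignores DNS, server processing, and packet loss, and the function name is illustrative.

```python
def network_ttfb_ms(rtt_ms, tls_round_trips=2):
    """Estimate network-only TTFB for a new HTTPS connection over TCP."""
    tcp_round_trips = 1      # TCP handshake
    request_round_trips = 1  # request out, first response byte back
    return rtt_ms * (tcp_round_trips + tls_round_trips + request_round_trips)


# Illustrative round-trip times picked from the ranges mentioned above
print(network_ttfb_ms(250))                     # distant origin, TLS 1.2: 1000
print(network_ttfb_ms(250, tls_round_trips=1))  # distant origin, TLS 1.3: 750
print(network_ttfb_ms(15))                      # nearby edge node, TLS 1.2: 60
```

Under these assumptions, moving the connection endpoint close to the user saves far more than any single protocol tweak, because every round trip gets cheaper at once.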

Specific TTFB Optimization Techniques

1. Enable HTTP/2 and HTTP/3

HTTP/2 multiplexes multiple requests over a single connection, reducing connection overhead. HTTP/3 (QUIC protocol) further reduces connection establishment time, especially on mobile networks.

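# Nginx: Enable HTTP/2
server {
    # On nginx 1.25.1 and later, prefer "listen 443 ssl;" plus a separate
    # "http2 on;" directive; the http2 listen parameter is deprecated there.
    listen 443 ssl http2;
    # ... rest of config
}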

2. Implement TLS Session Resumption

A full TLS handshake on a new connection takes two round trips with TLS 1.2 and one with TLS 1.3. Session resumption (TLS session tickets or session IDs) lets returning clients skip most of the handshake.

# Nginx: TLS optimization
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_session_tickets on;

3. Enable Server-Side Caching

For dynamic pages, implement output caching that serves pre-generated HTML:

# Django example: Cache entire view output
from django.shortcuts import render
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # Cache for 15 minutes
def my_view(request):
    # Expensive computation or database queries go here
    context = {}  # Placeholder: fill with the data the template needs
    return render(request, 'template.html', context)

4. Optimize Database Connection Pooling

Opening a new database connection for every request adds significant overhead. Connection pooling reuses established connections:

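# SQLAlchemy connection pool configuration
import os

from sqlalchemy import create_engine

DATABASE_URL = os.environ["DATABASE_URL"]  # Connection string read from the environment

engine = create_engine(
    DATABASE_URL,
    pool_size=20,          # Number of persistent connections
    max_overflow=30,       # Extra connections allowed when the pool is exhausted
    pool_pre_ping=True,    # Check connection health before use
    pool_recycle=3600      # Replace connections older than one hour
)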

5. Implement Early Hints (HTTP 103)

HTTP 103 Early Hints allows the server to send Link headers with preload directives before the full response is ready, letting the browser start loading critical resources while the server finishes generating the page.

HTTP/1.1 103 Early Hints
Link: </css/app.css>; rel=preload; as=style
Link: </fonts/main.woff2>; rel=preload; as=font; crossorigin

HTTP/1.1 200 OK
Content-Type: text/html
[page content]

6. Streaming HTML Responses

Instead of buffering the entire response before sending, stream HTML as it's generated. The browser can start parsing and loading resources while the server finishes generating the rest of the page.

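// React 18 streaming SSR, inside a Node.js request handler
import { renderToPipeableStream } from 'react-dom/server';
import App from './App'; // hypothetical path to the root component

function handleRequest(req, res) {
  const { pipe } = renderToPipeableStream(<App />, {
    onShellReady() {
      // Shell is rendered: send headers and start streaming
      res.setHeader('Content-Type', 'text/html');
      pipe(res);
    },
    onShellError() {
      // Shell failed to render: send an error response instead
      res.statusCode = 500;
      res.end('<h1>Something went wrong</h1>');
    },
  });
}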
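
The same idea works outside React. Here is a minimal sketch as a plain Python WSGI application that returns a generator, so the server can send the document head before the slow sections are ready. The section content and the sleep are stand-ins, and whether chunks actually reach the browser early depends on your WSGI server and on any proxy buffering in front of it.

```python
import time


def render_sections():
    """Yield the page in chunks; the head goes out before the slow parts run."""
    yield b"<!doctype html><html><head><title>Demo</title></head><body>"
    for i in range(3):
        time.sleep(0.05)  # stand-in for a slow query or template fragment
        yield f"<section>Part {i + 1}</section>".encode("utf-8")
    yield b"</body></html>"


def app(environ, start_response):
    """WSGI entry point: headers first, then a lazily evaluated body."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return render_sections()
```

To try it locally, serve app with wsgiref.simple_server.make_server from the standard library. Because the first chunk does not wait for the slow sections, the first byte leaves the server almost immediately.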

Monitoring TTFB Continuously

TTFB regressions are common after deployments — a new slow query, an added external API call, or a caching configuration change can double TTFB overnight.

AzMonitor tracks TTFB as part of performance monitoring and alerts when TTFB increases significantly from baseline. Configure alerts for:

TTFB > 800ms (absolute threshold)
TTFB 50% worse than 7-day average (regression detection)
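
The two alert rules are simple enough to express in a few lines. The sketch below only illustrates the logic and is not AzMonitor's API; the function name, parameters, and alert labels are hypothetical.

```python
from statistics import mean


def ttfb_alerts(current_ms, last_7_days_ms, absolute_limit_ms=800, regression_ratio=1.5):
    """Return the names of the alert rules that the current TTFB triggers."""
    alerts = []
    # Rule 1: absolute threshold
    if current_ms > absolute_limit_ms:
        alerts.append("absolute")
    # Rule 2: 50% worse than the 7-day average
    if last_7_days_ms:
        baseline = mean(last_7_days_ms)
        if current_ms > baseline * regression_ratio:
            alerts.append("regression")
    return alerts


print(ttfb_alerts(450, [200] * 7))   # prints: ['regression']
print(ttfb_alerts(900, [850] * 7))   # prints: ['absolute']
print(ttfb_alerts(250, [200] * 7))   # prints: []
```

The first case shows why the relative rule matters: 450ms is well under the absolute limit, yet it is more than double the site's normal TTFB.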

Set up TTFB monitoring with AzMonitor and catch regressions immediately. See also LCP monitoring — since TTFB improvements cascade directly into LCP improvements.

Tags: TTFB, Time to First Byte, performance optimization, server response time

AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.

TTFB Optimization: Reducing Time to First Byte Below 200ms