gRPC powers some of the most performance-sensitive services in modern infrastructure. Google developed it from its internal RPC infrastructure, and companies such as Netflix, Square, and Lyft rely on it for service-to-service communication. If you're running gRPC services in production, you need a monitoring strategy built for its unique characteristics, because the REST monitoring playbook doesn't apply here.
gRPC vs REST: Why Monitoring Differs
gRPC uses HTTP/2 as its transport layer and Protocol Buffers for serialization. This gives you bidirectional streaming, multiplexing, and extremely efficient binary encoding — but it also means:
- Traditional HTTP monitors can't easily inspect gRPC traffic
- Status codes are gRPC-specific and travel in the grpc-status trailer; the HTTP response itself usually reports 200 even when the call fails
- Binary payloads require schema knowledge to validate
- Streaming RPCs have different health semantics than request-response
Most infrastructure tools — load balancers, proxies, HTTP monitors — weren't designed with gRPC in mind. Proper monitoring requires intentional setup.
gRPC Status Codes
gRPC defines its own status codes. Understanding them is essential for writing meaningful monitors:
| Code | Name | Meaning | HTTP Equivalent |
|---|---|---|---|
| 0 | OK | Success | 200 |
| 1 | CANCELLED | Client cancelled | 499 |
| 2 | UNKNOWN | Unknown error | 500 |
| 3 | INVALID_ARGUMENT | Bad request | 400 |
| 4 | DEADLINE_EXCEEDED | Timeout | 504 |
| 5 | NOT_FOUND | Resource not found | 404 |
| 8 | RESOURCE_EXHAUSTED | Rate limit hit | 429 |
| 13 | INTERNAL | Internal server error | 500 |
| 14 | UNAVAILABLE | Service unavailable | 503 |
Alert thresholds should be set per status code. INVALID_ARGUMENT errors are usually client bugs; UNAVAILABLE errors are server-side problems that need immediate attention.
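As a rough illustration of per-code alerting, the sketch below maps the numeric codes from the table above to an alert level. The Severity type, the three levels, and the grouping of codes are assumptions made for this example, not part of gRPC; adjust them to your own on-call policy.

```go
package main

import "fmt"

// Severity is a hypothetical alert level used only in this sketch.
type Severity string

const (
	Ignore Severity = "ignore" // success or client-initiated cancellation
	Ticket Severity = "ticket" // likely a caller-side problem; track, don't page
	Page   Severity = "page"   // server-side fault; alert immediately
)

// severityByCode maps the numeric gRPC status codes from the table above
// to an alert level. The grouping is an illustrative assumption.
var severityByCode = map[int]Severity{
	0:  Ignore, // OK
	1:  Ignore, // CANCELLED
	2:  Page,   // UNKNOWN
	3:  Ticket, // INVALID_ARGUMENT
	4:  Page,   // DEADLINE_EXCEEDED
	5:  Ticket, // NOT_FOUND
	8:  Ticket, // RESOURCE_EXHAUSTED
	13: Page,   // INTERNAL
	14: Page,   // UNAVAILABLE
}

// SeverityFor returns the alert level for a status code. Codes missing
// from the map default to Page so that nothing is silently dropped.
func SeverityFor(code int) Severity {
	if s, ok := severityByCode[code]; ok {
		return s
	}
	return Page
}

func main() {
	for _, code := range []int{0, 3, 14} {
		fmt.Printf("code %d -> %s\n", code, SeverityFor(code))
	}
}
```

Defaulting unknown codes to the loudest level is a deliberate choice here: a new or unexpected status should surface rather than disappear.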
The gRPC Health Check Protocol
gRPC defines a standard health checking protocol you should implement in every service:
// health.proto (from grpc/grpc)
syntax = "proto3";
package grpc.health.v1;

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}
Implement this in your services:
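// Go implementation
import (
    "context"

    "google.golang.org/grpc/health/grpc_health_v1"
)

// healthServer embeds UnimplementedHealthServer so it satisfies the full
// HealthServer interface (including Watch) while only overriding Check.
type healthServer struct {
    grpc_health_v1.UnimplementedHealthServer
}

func (s *healthServer) Check(
    ctx context.Context,
    req *grpc_health_v1.HealthCheckRequest,
) (*grpc_health_v1.HealthCheckResponse, error) {
    // Check dependencies. db is assumed to be a *sql.DB defined elsewhere;
    // PingContext honors the deadline carried by the health check request.
    if err := db.PingContext(ctx); err != nil {
        return &grpc_health_v1.HealthCheckResponse{
            Status: grpc_health_v1.HealthCheckResponse_NOT_SERVING,
        }, nil
    }
    return &grpc_health_v1.HealthCheckResponse{
        Status: grpc_health_v1.HealthCheckResponse_SERVING,
    }, nil
}

// Register it on your server:
// grpc_health_v1.RegisterHealthServer(server, &healthServer{})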
Then monitor it with grpc-health-probe:
# Install grpc-health-probe
go install github.com/grpc-ecosystem/grpc-health-probe@latest
# Check a service
grpc-health-probe -addr=localhost:50051 -service=UserService
# Output: status: SERVING
# Exit codes: 0 = SERVING, 1 = invalid arguments, 2 = connection failed or timed out,
# 3 = RPC failed or timed out, 4 = RPC succeeded but status is not SERVING
This can be integrated into Kubernetes liveness and readiness probes:
livenessProbe:
  exec:
    command:
      - /bin/grpc_health_probe
      - -addr=:50051
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  exec:
    command:
      - /bin/grpc_health_probe
      - -addr=:50051
  initialDelaySeconds: 5
  periodSeconds: 10
Instrumenting gRPC Services with Interceptors
Interceptors (gRPC's equivalent of middleware) are the cleanest way to add observability without modifying every handler:
// Unary server interceptor for metrics
func metricsInterceptor(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (interface{}, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    duration := time.Since(start)
    statusCode := status.Code(err)
    // Record metrics
    requestCounter.WithLabelValues(
        info.FullMethod,
        statusCode.String(),
    ).Inc()
    requestDuration.WithLabelValues(
        info.FullMethod,
    ).Observe(duration.Seconds())
    return resp, err
}

// Register the interceptor
server := grpc.NewServer(
    grpc.UnaryInterceptor(metricsInterceptor),
)
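The go-grpc-prometheus library provides ready-made Prometheus metrics. The project has since been archived in favor of the Prometheus provider in go-grpc-middleware v2, so treat the following as the legacy setup:
import grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"

// Latency histograms are off by default; enable them before registering,
// otherwise grpc_server_handling_seconds is never exported
grpc_prometheus.EnableHandlingTimeHistogram()
server := grpc.NewServer(
    grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
    grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
)
grpc_prometheus.Register(server)
// Expose metrics
http.Handle("/metrics", promhttp.Handler())
go http.ListenAndServe(":9090", nil)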
Key Metrics to Monitor
Once instrumented, track these metrics for each service:
# Request rate (requests per second)
rate(grpc_server_handled_total[5m])
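# Error ratio by service and method
# (aggregate both sides first; otherwise each series is divided by itself)
sum by (grpc_service, grpc_method) (rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
  / sum by (grpc_service, grpc_method) (rate(grpc_server_handled_total[5m]))
# p99 latency by service and method
histogram_quantile(0.99,
  sum by (le, grpc_service, grpc_method) (rate(grpc_server_handling_seconds_bucket[5m]))
)
# In-flight requests
# (aggregate first: started_total has no grpc_code label, so a raw subtraction matches nothing)
sum by (grpc_service, grpc_method) (grpc_server_started_total)
  - sum by (grpc_service, grpc_method) (grpc_server_handled_total)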
Visualize these in a dashboard with alerts:
| Metric | Warning | Critical |
|---|---|---|
| Error rate | > 1% | > 5% |
| p99 latency | > 500ms | > 2000ms |
| Deadline exceeded rate | > 0.5% | > 2% |
| In-flight requests | > 100 | > 500 |
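To make the thresholds concrete, here is a minimal sketch of how a value could be classified against the table above. The metric keys and the Evaluate function are hypothetical names invented for this example; in practice this logic usually lives in your alerting rules rather than in application code.

```go
package main

import "fmt"

// Threshold holds the warning and critical limits for one metric.
type Threshold struct {
	Warning  float64
	Critical float64
}

// thresholds mirrors the table above. The keys are hypothetical names chosen
// for this sketch; rates are fractions (0.01 = 1%) and latency is in ms.
var thresholds = map[string]Threshold{
	"error_rate":             {Warning: 0.01, Critical: 0.05},
	"p99_latency_ms":         {Warning: 500, Critical: 2000},
	"deadline_exceeded_rate": {Warning: 0.005, Critical: 0.02},
	"in_flight_requests":     {Warning: 100, Critical: 500},
}

// Evaluate returns "ok", "warning", or "critical" for a metric value.
// Limits are exclusive, matching the ">" comparisons in the table.
func Evaluate(metric string, value float64) (string, error) {
	t, ok := thresholds[metric]
	if !ok {
		return "", fmt.Errorf("no thresholds defined for %q", metric)
	}
	switch {
	case value > t.Critical:
		return "critical", nil
	case value > t.Warning:
		return "warning", nil
	default:
		return "ok", nil
	}
}

func main() {
	level, err := Evaluate("error_rate", 0.03)
	fmt.Println(level, err) // prints "warning <nil>"
}
```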
Monitoring Streaming RPCs
Streaming RPCs — server streaming, client streaming, bidirectional — require extra attention because they hold connections open for extended periods.
Server-Side Streaming
// Example: streaming log entries
func (s *LogServer) StreamLogs(
    req *pb.LogRequest,
    stream pb.LogService_StreamLogsServer,
) error {
    startTime := time.Now()
    messageCount := 0
    defer func() {
        streamDuration.Observe(time.Since(startTime).Seconds())
        streamMessageCount.Observe(float64(messageCount))
    }()
    ctx := stream.Context()
    for {
        select {
        case <-ctx.Done():
            // Client cancelled or deadline expired; stop even when
            // no entries are arriving on the channel
            return ctx.Err()
        case entry, ok := <-s.logChan:
            if !ok {
                return nil // channel closed: end the stream cleanly
            }
            if err := stream.Send(entry); err != nil {
                streamErrors.Inc()
                return err
            }
            messageCount++
        }
    }
}
Monitor streaming RPCs differently from unary calls:
- Stream duration — How long streams stay open
- Messages per stream — Total messages sent/received
- Stream error rate — Streams that end with non-OK status
- Concurrent streams — Number of active streaming connections
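The four signals above can be tracked with a small in-process aggregator. The sketch below uses only the standard library so the bookkeeping is easy to follow; StreamStats, Begin, and ErrStreamAborted are hypothetical names, and a real service would more likely export these values through Prometheus gauges and histograms.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrStreamAborted is a placeholder error used to simulate a failed stream.
var ErrStreamAborted = errors.New("stream aborted")

// StreamStats aggregates the four streaming signals listed above.
type StreamStats struct {
	mu            sync.Mutex
	active        int           // concurrent streams
	finished      int           // streams that have ended
	errored       int           // streams that ended with an error
	messages      int           // messages across finished streams
	totalDuration time.Duration // summed lifetime of finished streams
}

// Begin records a newly opened stream. The returned function must be called
// exactly once when the stream ends, with its message count and final error.
func (s *StreamStats) Begin() func(messages int, err error) {
	start := time.Now()
	s.mu.Lock()
	s.active++
	s.mu.Unlock()

	return func(messages int, err error) {
		s.mu.Lock()
		defer s.mu.Unlock()
		s.active--
		s.finished++
		s.messages += messages
		s.totalDuration += time.Since(start)
		if err != nil {
			s.errored++
		}
	}
}

// Active returns the number of streams currently open.
func (s *StreamStats) Active() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.active
}

// ErrorRate returns the fraction of finished streams that ended with an error.
func (s *StreamStats) ErrorRate() float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.finished == 0 {
		return 0
	}
	return float64(s.errored) / float64(s.finished)
}

func main() {
	stats := &StreamStats{}
	endA := stats.Begin()
	endB := stats.Begin()
	endA(42, nil)
	endB(7, ErrStreamAborted)
	fmt.Printf("active=%d error_rate=%.2f\n", stats.Active(), stats.ErrorRate())
}
```

In a streaming handler, the function returned by Begin would typically be invoked from a defer, so the stream is counted as finished on every exit path.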
Testing gRPC with grpcurl
For ad-hoc testing and monitoring integration, grpcurl lets you interact with gRPC services without writing code:
# List services (requires server reflection)
grpcurl -plaintext localhost:50051 list
# List methods of a service
grpcurl -plaintext localhost:50051 list UserService
# Call a method
grpcurl -plaintext \
-d '{"user_id": "monitor-test-123"}' \
localhost:50051 \
UserService/GetUser
# With TLS
grpcurl \
-d '{"user_id": "monitor-test-123"}' \
api.example.com:443 \
UserService/GetUser
You can wrap grpcurl calls in shell scripts for basic health monitoring:
#!/bin/bash
RESULT=$(grpcurl -plaintext \
  -d '{"service": "UserService"}' \
  localhost:50051 \
  grpc.health.v1.Health/Check 2>&1)
# Match the quoted value so that "NOT_SERVING" does not count as healthy
if echo "$RESULT" | grep -q '"SERVING"'; then
  echo "HEALTHY"
  exit 0
else
  echo "UNHEALTHY: $RESULT"
  exit 1
fi
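Text matching on the response is brittle, because the string SERVING also appears inside NOT_SERVING. A stricter option is to parse the JSON that grpcurl prints and compare the status field exactly. The sketch below assumes the output has the shape {"status": "SERVING"}; the IsServing helper and healthResponse type are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthResponse models the JSON that grpcurl is assumed to print for
// grpc.health.v1.Health/Check, e.g. {"status": "SERVING"}.
type healthResponse struct {
	Status string `json:"status"`
}

// IsServing reports whether the output is valid JSON whose status field is
// exactly "SERVING". Anything else, including error text, counts as unhealthy.
func IsServing(output []byte) bool {
	var resp healthResponse
	if err := json.Unmarshal(output, &resp); err != nil {
		return false
	}
	return resp.Status == "SERVING"
}

func main() {
	fmt.Println(IsServing([]byte(`{"status": "SERVING"}`)))     // prints "true"
	fmt.Println(IsServing([]byte(`{"status": "NOT_SERVING"}`))) // prints "false"
}
```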
Distributed Tracing for gRPC
For multi-service gRPC architectures, distributed tracing is essential. OpenTelemetry provides gRPC instrumentation:
import (
    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// Server with tracing. Recent otelgrpc releases deprecate the interceptor
// API in favor of stats handlers, which cover unary and streaming RPCs.
server := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)

// Client with tracing (propagates trace context)
conn, err := grpc.NewClient(
    "service:50051",
    // Plaintext for illustration only; use TLS credentials in production
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)
With tracing enabled, a slow gRPC call becomes immediately diagnosable — you can see which service in the call chain introduced the latency.
Load Balancer and Service Mesh Considerations
Traditional L4 load balancers distribute connections, not requests. Because gRPC multiplexes many RPCs over a single long-lived HTTP/2 connection, all of a client's requests can land on one server, and fleet-wide averages will look healthy while individual servers are overloaded.
Proper gRPC load balancing requires:
- L7-aware load balancers (Envoy, NGINX with grpc_pass, AWS ALB)
- Client-side load balancing
- Service mesh (Istio, Linkerd)
Monitor load distribution across instances:
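# Check if load is evenly distributed (coefficient of variation across instances)
# Sum per instance first, then compare instances with each other
stddev(sum by (instance) (rate(grpc_server_handled_total[5m])))
  / avg(sum by (instance) (rate(grpc_server_handled_total[5m])))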
A high coefficient of variation (> 0.3) indicates uneven load distribution.
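For intuition, the coefficient of variation is the standard deviation of the per-instance request rates divided by their mean. The sketch below computes it for a slice of rates, using the population standard deviation; the function name and sample values are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// CoefficientOfVariation returns stddev/mean for the given per-instance
// request rates, using the population standard deviation. It returns 0
// for empty input or a zero mean to avoid dividing by zero.
func CoefficientOfVariation(rates []float64) float64 {
	if len(rates) == 0 {
		return 0
	}
	var sum float64
	for _, r := range rates {
		sum += r
	}
	mean := sum / float64(len(rates))
	if mean == 0 {
		return 0
	}
	var squaredDiffs float64
	for _, r := range rates {
		d := r - mean
		squaredDiffs += d * d
	}
	return math.Sqrt(squaredDiffs/float64(len(rates))) / mean
}

func main() {
	balanced := []float64{100, 100, 100} // every instance gets the same load
	skewed := []float64{10, 10, 280}     // one instance takes most of the traffic
	fmt.Printf("balanced: %.2f\n", CoefficientOfVariation(balanced))
	fmt.Printf("skewed:   %.2f\n", CoefficientOfVariation(skewed))
}
```

The balanced fleet yields 0, while the skewed one lands well above the 0.3 threshold, which is the pattern a connection-pinned client tends to produce.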
Conclusion
gRPC monitoring requires understanding its unique characteristics: binary protocols, streaming RPCs, custom status codes, and HTTP/2 multiplexing. Start with the standard health check protocol, add interceptor-based instrumentation, and build dashboards around error rate and latency percentiles broken down by method. AzMonitor can monitor gRPC health check endpoints and integrate with your existing observability stack to give you unified visibility across REST, GraphQL, and gRPC services.