GraphQL solves real problems — flexible queries, fewer round trips, self-documenting schemas — but it introduces monitoring challenges that traditional REST tooling wasn't built to handle. With REST, every endpoint is a separate URL you can check independently. With GraphQL, everything goes through a single endpoint, and the complexity lives in the request body. This fundamentally changes how you approach monitoring.

Why GraphQL Monitoring Is Different

A REST monitoring check is simple: send a GET to /api/users, expect a 200. GraphQL isn't like that. Queries usually all go to a single endpoint, typically POST /graphql, which means:

You can't infer what's being queried from the URL alone
A 200 response can still contain errors in the response body
One slow resolver poisons an entire query
N+1 query problems can hide behind a healthy-looking endpoint

Your monitoring strategy has to account for all of these.

The GraphQL Error Model

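Most GraphQL servers return HTTP 200 even when something went wrong while executing the query, and the errors live in the response body. (Requests that fail to parse or validate may get a 4xx instead, depending on the server and whether it follows the newer GraphQL-over-HTTP conventions.)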

{
  "data": {
    "user": null
  },
  "errors": [
    {
      "message": "User not found",
      "locations": [{"line": 2, "column": 3}],
      "path": ["user"],
      "extensions": {
        "code": "USER_NOT_FOUND"
      }
    }
  ]
}

This means a traditional HTTP status check will show green while your users are actually getting errors. Your monitoring must parse the response body and check for the errors field.
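A minimal sketch of that body check, assuming the response has already been parsed as JSON. The `classifyGraphQLResponse` name and the `ok` / `partial` / `failed` labels are illustrative, not part of any library. It relies on the GraphQL spec's rules that `errors` appears only when something failed and that `data` may still carry partial results next to it:

```javascript
// Classify a GraphQL HTTP response by inspecting the body, not just the status.
function classifyGraphQLResponse(httpStatus, body) {
  if (httpStatus < 200 || httpStatus >= 300) {
    return { state: 'http_error', codes: [] };
  }
  const errors = Array.isArray(body && body.errors) ? body.errors : [];
  // Collect machine-readable codes (e.g. USER_NOT_FOUND) from extensions, if present
  const codes = errors.map((e) => (e.extensions && e.extensions.code) || 'UNKNOWN');
  if (errors.length === 0) {
    return { state: 'ok', codes };
  }
  // A data object alongside errors means some fields resolved and some did not;
  // missing or null data means the whole operation failed
  const hasData = body.data !== null && body.data !== undefined;
  return { state: hasData ? 'partial' : 'failed', codes };
}
```

Run against the example response above, this returns `partial` with the code `USER_NOT_FOUND`, which a plain status check would have reported as healthy.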

Writing GraphQL Assertions

monitor:
  name: "GraphQL - User Profile Query"
  url: "https://api.example.com/graphql"
  method: POST
  headers:
    Content-Type: "application/json"
    Authorization: "Bearer ${MONITORING_TOKEN}"
  body: |
    {
      "query": "query MonitorCheck { user(id: \"monitor-test-user\") { id name email } }"
    }
  assertions:
    - type: status_code
      value: 200
    - type: json_body
      path: "$.errors"
      operator: not_exists
    - type: json_body
      path: "$.data.user.id"
      operator: exists
    - type: response_time
      operator: less_than
      value: 1500

The key assertion here is checking that $.errors does not exist. If it does, the monitor fails even though the HTTP status was 200.
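If you want to run the same check from a script or CI job, here is a rough JavaScript equivalent of the monitor's assertions. It assumes Node 18+ (global `fetch`); `runGraphQLCheck` and its options are hypothetical names, and `fetchImpl` exists only so the check can be exercised without a live endpoint:

```javascript
// Scripted equivalent of the monitor assertions above (Node 18+, global fetch).
async function runGraphQLCheck({ url, token, query, maxMs = 1500, fetchImpl = fetch }) {
  const started = Date.now();
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ query }),
  });
  const body = await res.json();
  const elapsed = Date.now() - started;

  const failures = [];
  if (res.status !== 200) failures.push(`status ${res.status}`);
  // A 200 that carries an errors array is still a failed check
  if (body.errors !== undefined) {
    failures.push(`errors: ${body.errors.map((e) => e.message).join('; ')}`);
  }
  // Equivalent of asserting that $.data.user.id exists (missing or null fails)
  if (body.data?.user?.id == null) failures.push('missing data.user.id');
  if (elapsed >= maxMs) failures.push(`slow: ${elapsed}ms`);

  return { passed: failures.length === 0, failures, elapsed };
}
```

A call would pass the same URL and query as the monitor, for example `runGraphQLCheck({ url: 'https://api.example.com/graphql', token: process.env.MONITORING_TOKEN, query: 'query MonitorCheck { user(id: "monitor-test-user") { id name email } }' })`, and fail the job when `passed` is false.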

Monitoring Individual Resolvers

The real performance challenge with GraphQL is resolver-level monitoring. A query that fetches a user with their posts and comments might call 3+ database queries. If any resolver is slow, the whole query suffers.

Using Apollo Studio / GraphOS

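If you're using Apollo Server, GraphOS Studio (formerly Apollo Studio) provides resolver-level tracing out of the box. Usage reporting requires the APOLLO_KEY and APOLLO_GRAPH_REF environment variables; adding the plugin explicitly lets you control what gets sent: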

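// Apollo Server 4 setup
import { ApolloServer } from '@apollo/server';
import { ApolloServerPluginUsageReporting } from '@apollo/server/plugin/usageReporting';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [
    ApolloServerPluginUsageReporting({
      sendVariableValues: { none: true }, // Don't send variable values (may contain PII)
      sendHeaders: { onlyNames: ["x-request-id"] }, // Forward only this header
    }),
  ],
});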

This gives you per-resolver latency breakdowns, which you can use to identify slow resolvers before they become incidents.

Custom Resolver Timing

If you're not using Apollo, add resolver timing manually:

// Resolver wrapper for timing.
// `metrics` is your metrics client (StatsD/Datadog-style); substitute your own.
function withTiming(name, resolver) {
  return async (parent, args, context, info) => {
    const start = performance.now();
    try {
      return await resolver(parent, args, context, info);
    } catch (error) {
      metrics.increment('resolver.error', { resolver: name });
      throw error;
    } finally {
      // Record latency for successful and failed resolutions alike
      metrics.histogram('resolver.duration', performance.now() - start, { resolver: name });
    }
  };
}

// Usage
const resolvers = {
  Query: {
    user: withTiming('Query.user', async (_, { id }, { db }) => {
      return db.users.findById(id);
    }),
  },
};
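Wrapping resolvers one by one is easy to forget when new fields are added. A sketch of applying a timing wrapper to every function resolver in a map instead. `instrumentResolvers`, `createMemoryRecorder`, and the `histogram(name, value, tags)` shape are assumptions for illustration, not a specific library's API:

```javascript
// Wrap every function resolver in a resolver map with timing.
function instrumentResolvers(resolverMap, recorder) {
  const wrapped = {};
  for (const [typeName, fields] of Object.entries(resolverMap)) {
    // Skip non-plain objects such as GraphQLScalarType instances
    if (Object.getPrototypeOf(fields) !== Object.prototype) {
      wrapped[typeName] = fields;
      continue;
    }
    wrapped[typeName] = {};
    for (const [fieldName, resolve] of Object.entries(fields)) {
      if (typeof resolve !== 'function') {
        // Enum values, { subscribe, resolve } objects, etc. pass through untouched
        wrapped[typeName][fieldName] = resolve;
        continue;
      }
      const name = `${typeName}.${fieldName}`;
      wrapped[typeName][fieldName] = async (...args) => {
        const start = performance.now();
        try {
          return await resolve(...args);
        } finally {
          recorder.histogram('resolver.duration', performance.now() - start, { resolver: name });
        }
      };
    }
  }
  return wrapped;
}

// In-memory recorder for local testing
function createMemoryRecorder() {
  const samples = [];
  return {
    samples,
    histogram: (metric, value, tags) => samples.push({ metric, value, tags }),
  };
}
```

One trade-off: the wrapper makes synchronous resolvers asynchronous, which adds a little overhead per field. On very hot scalar fields you may prefer to instrument only root and list resolvers.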

Detecting the N+1 Problem

The N+1 query problem is GraphQL's most notorious performance killer. When you fetch a list of users and the posts resolver for each user makes its own database call, you get N+1 queries: one for the user list plus one per user.

How N+1 shows up in database query counts:

| Users Returned | DB Queries (batched) | DB Queries (N+1) |
|---|---|---|
| 10 | 2 | 11 |
| 50 | 2 | 51 |
| 100 | 2 | 101 |
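To make those counts concrete, here is a self-contained sketch that counts queries per request against a hypothetical in-memory data source (`makeDb`, `postsByUser`, and `postsByUsers` are made-up names). The batched variant shows what a loader-style fix such as DataLoader achieves: two queries regardless of list size.

```javascript
// Hypothetical in-memory data source that counts every "database query".
function makeDb(userCount) {
  const users = Array.from({ length: userCount }, (_, i) => ({ id: i + 1, name: `user${i + 1}` }));
  const stats = { queries: 0 };
  return {
    stats,
    async listUsers(limit) {
      stats.queries++;
      return users.slice(0, limit);
    },
    async postsByUser(userId) {
      stats.queries++;
      return [{ id: userId * 10, userId, title: `post by ${userId}` }];
    },
    async postsByUsers(userIds) {
      stats.queries++;
      return userIds.map((userId) => ({ id: userId * 10, userId, title: `post by ${userId}` }));
    },
  };
}

// N+1: one query for the list, then one posts query per user
async function resolveUsersNaive(db, limit) {
  const users = await db.listUsers(limit);
  return Promise.all(users.map(async (u) => ({ ...u, posts: await db.postsByUser(u.id) })));
}

// Batched: one query for the list, one posts query covering all users
async function resolveUsersBatched(db, limit) {
  const users = await db.listUsers(limit);
  const posts = await db.postsByUsers(users.map((u) => u.id));
  return users.map((u) => ({ ...u, posts: posts.filter((p) => p.userId === u.id) }));
}

// Number of queries issued to resolve one request of the given size
async function countQueries(resolve, limit) {
  const db = makeDb(1000);
  await resolve(db, limit);
  return db.stats.queries;
}
```

The same idea works in a real server: increment a per-request counter in your database client and emit it as a metric next to the result size.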

If the database query count, or the response time, grows roughly linearly with result set size, you likely have N+1. Set up monitors that vary the result set size:

# Small query - baseline
query SmallCheck {
  users(limit: 5) { id name posts { id title } }
}

# Medium query - should not be 10x slower
query MediumCheck {
  users(limit: 50) { id name posts { id title } }
}

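The medium query returns 10x as many users. If it also takes close to 10x longer than the small one, N+1 is the likely cause; with batched loading, latency should grow much more slowly than the result size.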
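One way to turn that comparison into an alert condition is to normalize latency growth by result-size growth. A sketch, where `latencyGrowthRatio`, `looksLikeNPlusOne`, and the 0.5 cut-off are illustrative assumptions rather than a standard metric:

```javascript
// Compare two timed runs of the same query shape at different list sizes.
// Returns latency growth relative to result-size growth:
//   ~1 -> latency grows in proportion to size (typical of N+1)
//   ~0 -> latency barely depends on size
function latencyGrowthRatio(small, large) {
  if (large.limit <= small.limit) throw new Error('large.limit must exceed small.limit');
  const latencyGrowth = large.ms / small.ms;
  const sizeGrowth = large.limit / small.limit;
  // Subtract 1 from both so that flat latency maps to 0
  return (latencyGrowth - 1) / (sizeGrowth - 1);
}

// The 0.5 cut-off is an arbitrary starting point; tune it against your own baselines
function looksLikeNPlusOne(small, large, threshold = 0.5) {
  return latencyGrowthRatio(small, large) >= threshold;
}
```

With the queries above, `{ limit: 5, ms: 100 }` against `{ limit: 50, ms: 1000 }` gives a ratio of 1. Single samples are noisy, so feed it medians over several runs. Latency alone also can't separate N+1 from simply returning more data; database query counts per request, if you can collect them, are the more direct signal.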

Schema Change Monitoring

GraphQL schemas evolve, and breaking changes can silently break clients. Monitor for schema changes as part of your CI pipeline:

# Using graphql-inspector: compare the live schema (old) with this branch's schema (new)
npx @graphql-inspector/cli diff \
  "https://api.example.com/graphql" \
  "./schema/current.graphql"

# Output example:
# ✖ Field 'User.email' changed type from 'String!' to 'String'
# ✖ Field 'User.avatar' was removed
# ✓ Field 'User.profilePicture' was added

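Integrate this into your deployment pipeline so breaking changes are caught before they reach production. Loading a schema from a URL relies on introspection, so both endpoints must allow it for the CI runner: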

# GitHub Actions example
- name: Check for GraphQL breaking changes
  run: |
    # Old schema first (production), new schema second (staging)
    npx @graphql-inspector/cli diff \
      "https://api.example.com/graphql" \
      "https://staging-api.example.com/graphql" \
      --rule suppressRemovalOfDeprecatedField
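graphql-inspector handles many more cases than this, but the core rule it applies to output fields is worth understanding. A deliberately simplified sketch over a hypothetical flattened `Type.field -> type` map (not graphql-inspector's algorithm or API). It only looks at top-level nullability, and for arguments and input types the nullability rule is reversed:

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Simplified breaking-change check for output fields, e.g. { 'User.email': 'String!' }.
function diffOutputFields(oldFields, newFields) {
  const changes = [];
  for (const [field, oldType] of Object.entries(oldFields)) {
    if (!has(newFields, field)) {
      changes.push({ field, kind: 'removed', breaking: true });
      continue;
    }
    const newType = newFields[field];
    if (newType === oldType) continue;
    // Tightening nullability (String -> String!) is safe for clients reading the field;
    // any other change, including String! -> String, may break them
    const safe = newType === `${oldType}!`;
    changes.push({ field, kind: 'type_changed', from: oldType, to: newType, breaking: !safe });
  }
  for (const field of Object.keys(newFields)) {
    if (!has(oldFields, field)) changes.push({ field, kind: 'added', breaking: false });
  }
  return changes;
}
```

This is why loosening `User.email` from `String!` to `String` is flagged: clients that assumed the field is never null can now receive null.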

Query Complexity and Depth Limits

Expensive queries, such as deeply nested ones or those requesting large collections, can overwhelm your server. Monitor your depth and complexity limits to make sure they are actually enforced:

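// GraphQL server with depth and complexity limits
import { ApolloServer } from '@apollo/server';
import depthLimit from 'graphql-depth-limit';
import createComplexityRule, { simpleEstimator } from 'graphql-query-complexity';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [
    depthLimit(7),
    createComplexityRule({
      maximumComplexity: 1000,
      // Every field costs 1 unless a more specific estimator is added
      estimators: [simpleEstimator({ defaultComplexity: 1 })],
      // Note: static validation rules don't receive request variables
      onComplete: (complexity) => {
        metrics.histogram('graphql.query.complexity', complexity);
      },
    }),
  ],
});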

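Your monitoring should include two checks against these limits: a query that approaches but stays under the complexity limit, which must keep succeeding, and a query that exceeds it, which must be rejected. Only the second check proves the protection is actually enforced.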
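To pick queries on either side of the limit, it helps to estimate their cost up front. A sketch using a hypothetical simplified selection-tree format (`{ field, limit?, children? }`) and a "field cost times list size" model. That model is common, but it is not what `simpleEstimator` alone computes, so check the numbers against the `onComplete` value your server reports:

```javascript
// Estimate cost and depth of a query from a simplified selection tree.
// Each field costs 1; a list field's children are multiplied by its `limit`.
function estimateQuery(selections) {
  let cost = 0;
  let depth = 0;
  for (const sel of selections) {
    const child = sel.children ? estimateQuery(sel.children) : { cost: 0, depth: 0 };
    const multiplier = sel.limit ?? 1;
    cost += 1 + multiplier * child.cost;
    depth = Math.max(depth, 1 + child.depth);
  }
  return { cost, depth };
}
```

Under this model, `users(limit: 50) { id name posts(limit: 10) { id title } }` scores 1151 with depth 3, so it should be rejected by a 1000 limit, while the same query with `limit: 40` scores 921 and should pass. Together they make a natural pair of monitors.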

Subscription Monitoring

GraphQL subscriptions add WebSocket connections to the mix. Monitoring subscriptions requires:

Establishing a WebSocket connection
Sending a subscription message
Verifying you receive events within a timeout window

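// Subscription health check built on the `ws` package (npm install ws).
// Speaks the graphql-transport-ws protocol used by the graphql-ws library;
// servers on the legacy subscriptions-transport-ws protocol expect the
// 'graphql-ws' subprotocol and 'start'/'data' messages instead.
const WebSocket = require('ws');

function checkSubscriptionHealth(wsUrl, authToken, subscriptionQuery, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const startTime = Date.now();
    const ws = new WebSocket(wsUrl, 'graphql-transport-ws');
    let timeout;
    const fail = (err) => {
      clearTimeout(timeout);
      ws.terminate();
      reject(err);
    };
    timeout = setTimeout(() => fail(new Error('Subscription timeout')), timeoutMs);

    ws.on('error', fail);

    ws.on('open', () => {
      // The auth payload shape depends on your server's connection handler
      ws.send(JSON.stringify({ type: 'connection_init', payload: { authorization: authToken } }));
    });

    ws.on('message', (data) => {
      const msg = JSON.parse(data.toString());
      if (msg.type === 'connection_ack') {
        // Handshake accepted: start the subscription
        ws.send(JSON.stringify({ id: '1', type: 'subscribe', payload: { query: subscriptionQuery } }));
      } else if (msg.type === 'next' && msg.id === '1') {
        // First event arrived within the window: the pipeline is healthy.
        // This needs a subscription that emits regularly (e.g. a heartbeat).
        clearTimeout(timeout);
        ws.close();
        resolve({ healthy: true, latency: Date.now() - startTime });
      } else if (msg.type === 'error') {
        fail(new Error(`Subscription error: ${JSON.stringify(msg.payload)}`));
      }
    });
  });
}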

Monitoring Federated GraphQL

If you're using Apollo Federation or GraphQL Federation, you need to monitor both the gateway and individual subgraphs:

| Layer | What to Monitor | Alert Threshold |
|---|---|---|
| Gateway | Query routing latency | > 50ms overhead |
| Auth subgraph | Token validation time | > 100ms |
| User subgraph | Entity resolution time | > 200ms |
| Products subgraph | Catalog query time | > 300ms |
| Gateway | Overall query latency | > 1000ms p95 |

If the gateway is slow but the subgraphs are fast, look at query planning and other gateway-side work first. If a subgraph is slow, the problem most likely lies in that subgraph's application code or database.
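That rule of thumb can be encoded as a triage step in an alert handler. A sketch with the thresholds from the federation table; the sample shape, `triageFederatedSample`, and the subgraph keys are hypothetical, and `gatewayOverheadMs` (gateway time excluding subgraph fetches) is assumed to come from your tracing data:

```javascript
// Alert thresholds from the federation table (milliseconds).
const FEDERATION_THRESHOLDS = {
  gatewayOverheadMs: 50,
  subgraphMs: { auth: 100, users: 200, products: 300 },
};

// Triage one latency sample: which layer breached its threshold, and where to look first.
function triageFederatedSample(sample, thresholds = FEDERATION_THRESHOLDS) {
  const limits = thresholds.subgraphMs;
  const slowSubgraphs = Object.entries(sample.subgraphMs)
    .filter(([name, ms]) => Object.prototype.hasOwnProperty.call(limits, name) && ms > limits[name])
    .map(([name]) => name);
  const gatewaySlow = sample.gatewayOverheadMs > thresholds.gatewayOverheadMs;
  // A slow subgraph points at its application code or database;
  // a slow gateway with healthy subgraphs points at query planning
  const suspect = slowSubgraphs.length > 0 ? 'subgraph' : gatewaySlow ? 'gateway' : 'none';
  return { suspect, gatewaySlow, slowSubgraphs };
}
```

Subgraph breaches are reported first because a slow subgraph usually dominates overall latency even when gateway overhead is also elevated.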

Setting Up Effective Alerts

GraphQL-specific alert conditions to configure:

alerts:
  - name: "GraphQL Error Rate High"
    condition: "graphql_error_rate > 2%"
    window: "5 minutes"
    severity: critical
    
  - name: "Slow Resolver Detected"
    condition: "resolver_p95_latency{resolver='Query.products'} > 500ms"
    window: "10 minutes"
    severity: warning
    
  - name: "Query Complexity Spike"
    condition: "avg_query_complexity > 800"
    window: "5 minutes"
    severity: warning
    
  - name: "Schema Validation Errors"
    condition: "graphql_validation_errors > 0"
    window: "1 minute"
    severity: critical
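The `graphql_error_rate` condition only works if "error" includes 200 responses that carry an `errors` array. A sketch of computing it over a sliding window, assuming a hypothetical per-request log shape of `{ ts, status, errorCount }`:

```javascript
// GraphQL-aware error rate over a sliding window: a request counts as failed
// if its HTTP status is not 2xx OR its body carried a non-empty `errors` array.
function graphqlErrorRate(samples, nowMs, windowMs = 5 * 60 * 1000) {
  const recent = samples.filter((s) => s.ts > nowMs - windowMs && s.ts <= nowMs);
  if (recent.length === 0) return 0;
  const failed = recent.filter(
    (s) => s.status < 200 || s.status >= 300 || s.errorCount > 0
  ).length;
  return failed / recent.length;
}
```

If some error codes are expected in normal traffic (a lookup for a missing user, say), you may want to exclude them before counting, or the alert will fire on ordinary usage.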

Practical Monitoring Checklist

Before considering your GraphQL API properly monitored, verify:

[ ] Response body is parsed and errors field is checked
[ ] At least one query per major entity type is monitored
[ ] Resolver latency is tracked via APM or custom instrumentation
[ ] N+1 detection queries are running at different result set sizes
[ ] Schema change detection is in CI pipeline
[ ] Query complexity limits are tested
[ ] Subscription health is checked if subscriptions are in use
[ ] Federation subgraphs are monitored independently
[ ] Error rate alerting counts errors inside 200 responses, not just non-2xx statuses

Conclusion

GraphQL monitoring requires reading past the HTTP status code and understanding what's actually happening inside your queries. The teams that invest in proper GraphQL observability — resolver tracing, error body parsing, complexity monitoring, and schema change detection — catch problems in minutes rather than hours. AzMonitor supports custom request bodies and response assertions that work well with GraphQL's single-endpoint model, letting you build comprehensive query health checks without specialized tooling.


AzMonitor Team

The AzMonitor team writes guides based on experience monitoring millions of endpoints daily across 10,000+ customer environments. Our expertise covers uptime monitoring, SRE practices, and reliability engineering.

GraphQL Monitoring: How to Monitor GraphQL APIs Effectively