Graceful shutdown for HTTP services: signals, draining, and Kubernetes

Stop Node and containerized APIs without 502 spikes: SIGTERM semantics, draining in-flight requests, readiness vs liveness, and background job coordination.

Author: Matheus Palma · 7 min read
Software engineering · Backend · Node.js · Kubernetes · Reliability · DevOps

Your load balancer starts routing traffic away from pod checkout-api-7f9c because a new ReplicaSet is rolling out. For thirty seconds, that pod still has open TCP connections and in-flight POST /orders calls. If the process exits the moment it receives SIGTERM, those requests fail mid-flight; clients see errors even though the replacement pod is healthy. In consulting and product work, this shows up as mysterious error spikes during every deploy—often blamed on “the cloud” when the real issue is shutdown discipline.

Graceful shutdown means finishing work you already accepted (or explicitly rejecting new work) before the process exits, within the time the platform gives you. This article breaks down why platforms send SIGTERM, how to drain HTTP servers in Node.js, how readiness probes interact with rollouts, and how to extend the same ideas to queues and workers.

Why shutdown is a first-class production concern

Modern runtimes assume short-lived processes: containers are replaced on deploy, autoscaled away, or evicted when nodes are drained. The orchestrator does not “wait until idle”; it sends a signal and starts a timer. If your app ignores that contract, you get:

  • Client-visible errors on connection reset or truncated responses during deploys
  • Duplicate or orphaned side effects if shutdown races with async work (e.g. a payment call completes after you already returned 503 and the client retried)
  • Stuck jobs if workers stop without releasing locks or acknowledging messages correctly

The fix is not to run forever. It is to define a shutdown sequence: stop accepting new load, bound the time you wait for in-flight work, then exit with a clear code so the platform can replace you.

Signals and platform timeouts

On Linux, PID 1 in a container typically receives SIGTERM first. That is the polite “please exit” signal. If the process does not terminate within a grace period (for example terminationGracePeriodSeconds in Kubernetes, often 30 seconds by default), the kernel may follow with SIGKILL, which cannot be caught. Your graceful logic must complete before that wall clock expires—including flushing logs and closing databases if your drivers require it.

SIGINT (Ctrl+C locally) behaves similarly for local development; handling both keeps dev and prod aligned.

Design implications:

  • Never block signal handlers indefinitely. Node’s default for SIGTERM is to exit; once you install a handler, you own the exit decision.
  • Know your grace budget. If draining takes longer than the platform allows, you need load-shedding earlier (readiness) or smaller units of work—not an infinite wait.
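The handler-ownership point above can be sketched in a few lines. This is a minimal sketch, assuming a 30-second platform grace period and a 5-second safety margin; both numbers are illustrative, not guarantees.

```typescript
// Once a SIGTERM handler is installed, Node no longer exits by default --
// you own the exit decision and must beat the platform's SIGKILL deadline.
const GRACE_BUDGET_MS = 30_000; // assumed platform grace period
const SAFETY_MARGIN_MS = 5_000; // assumed margin before SIGKILL

process.on("SIGTERM", () => {
  // Hard backstop: exit before the platform escalates to SIGKILL.
  const backstop = setTimeout(() => {
    process.exit(1);
  }, GRACE_BUDGET_MS - SAFETY_MARGIN_MS);
  backstop.unref(); // do not keep the event loop alive just for this timer
  // ...begin draining here...
});
```

The `unref()` call matters: without it, the backstop timer itself would keep an otherwise-drained process alive until it fires.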

HTTP: stop listening, then drain

The usual pattern for an HTTP server has three phases:

  1. Readiness down — tell load balancers and orchestrators to stop sending new requests (see next section).
  2. Close the listening socket — server.close() in Node stops accepting new connections; existing keep-alive connections may still send requests until you end them.
  3. Wait for in-flight requests — track active requests or use server.close()’s callback semantics combined with explicit timeouts.

Node’s http.Server#close waits for connections to finish by default, but idle keep-alive connections can delay shutdown unless you track server.on('connection', ...) and destroy sockets after the listening socket is closed, or use server.closeAllConnections() (Node 18.2+) once you have stopped accepting new work.
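The connection-tracking approach can be sketched as follows. This is one way to do it, assuming you want to support Node versions before 18.2; on newer Node, `server.closeAllConnections()` replaces the manual socket set. The drain window value is illustrative.

```typescript
import http from "node:http";
import type { Socket } from "node:net";

const server = http.createServer((_req, res) => res.end("ok"));
const sockets = new Set<Socket>();

// Track every connection so idle keep-alive sockets can be found later.
server.on("connection", (socket) => {
  sockets.add(socket);
  socket.on("close", () => sockets.delete(socket));
});

// Stop accepting new connections, give in-flight requests a drain window,
// then destroy whatever keep-alive sockets remain open.
function closeWithDrain(drainMs: number) {
  server.close();
  setTimeout(() => {
    for (const socket of sockets) socket.destroy();
  }, drainMs).unref();
}
```

Destroying sockets is abrupt for any request still in flight on them, which is why the drain window should cover your typical request latency.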

Readiness versus liveness

Liveness answers: “Should this instance be restarted?” If it fails, the platform may kill the pod.

Readiness answers: “Should traffic be sent here?” If it fails, endpoints are removed from the service while the process often keeps running.

During shutdown, you typically:

  • Flip readiness to not ready immediately on SIGTERM (or slightly before, if you use a preStop hook that sleeps—see pitfalls).
  • Keep liveness passing until you are actually stuck, so you are not killed mid-drain unless you exceed grace.

In Kubernetes, endpoint removal and SIGTERM delivery happen roughly in parallel when a pod is deleted; the ordering is not guaranteed and is not something to rely on blindly. The safe approach is: readiness off, then drain, then exit. preStop hooks can add a short sleep so the control plane propagates endpoint updates before termination begins; tune this against your load balancer’s behavior.
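The readiness-off-then-drain ordering can be sketched as a small helper. This is a sketch under assumptions: `PROPAGATION_MS` is a placeholder you must tune against your environment, and `drain` stands in for whatever closes your listener and waits for in-flight work.

```typescript
// Safe shutdown ordering: flip readiness, wait for endpoint propagation,
// then drain. PROPAGATION_MS is an assumed tuning value, not a constant
// that works everywhere.
let ready = true;
const PROPAGATION_MS = 5_000;

async function beginShutdown(drain: () => Promise<void>) {
  ready = false; // the /health/ready handler should now return 503
  await new Promise((resolve) => setTimeout(resolve, PROPAGATION_MS));
  await drain(); // close the listener and wait for in-flight requests
}
```

Your readiness handler reads `ready`; traffic that arrives during the propagation window is still served, which is the point of waiting before draining.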

Trade-offs and limitations

Bounded wait. You cannot guarantee every request completes if clients hang or if work is unbounded. A shutdown timeout (e.g. 25 seconds when grace is 30) forces exit after logging how many requests were aborted. Document this for API consumers: retries with backoff and idempotency remain essential.

WebSockets and long polling. These tie up connections for minutes. Either use shorter platform grace, migrate sessions to other instances (sticky sessions + cooperative close), or accept hard cutoffs with client-side reconnect.

Separate admin and workload ports. Some teams serve metrics or health on a different port and keep the readiness check on the main app port so draining semantics stay clear. What matters is consistency: readiness must reflect “safe to receive traffic.”

Database and external calls. Draining HTTP does not wait for a slow fetch to a third party unless you track those operations. Per-request cancellation (AbortSignal) lets you fail fast during shutdown instead of hanging until SIGKILL.
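Linking outbound calls to a shutdown signal can look like the sketch below. `AbortSignal.any` requires Node 20+; the combined signal aborts when either the individual request or the whole process is shutting down. The function name and URL handling are illustrative.

```typescript
// Tie outbound calls to a process-wide shutdown signal so a slow upstream
// fails fast once draining starts, instead of hanging until SIGKILL.
const shutdownController = new AbortController();
process.on("SIGTERM", () => shutdownController.abort(new Error("shutting down")));

async function callUpstream(url: string, requestSignal?: AbortSignal) {
  // Abort if either the client cancels or the process begins shutdown.
  const signal = requestSignal
    ? AbortSignal.any([requestSignal, shutdownController.signal])
    : shutdownController.signal;
  return fetch(url, { signal }); // rejects promptly once aborted
}
```

On older Node versions without `AbortSignal.any`, you can listen for the shutdown signal's `abort` event and forward it to a per-request controller by hand.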

Practical example: minimal Node.js HTTP server with draining

The following sketch shows readiness state, signal handling, and a shutdown timeout. Adapt to your framework: Express wraps the same http.Server; Fastify exposes app.close().

import http from "node:http";

let acceptingTraffic = true;
let activeRequests = 0;

const server = http.createServer((req, res) => {
  activeRequests += 1;
  res.on("close", () => {
    activeRequests -= 1;
  });

  if (req.url === "/health/live") {
    res.writeHead(200).end("ok");
    return;
  }
  if (req.url === "/health/ready") {
    if (!acceptingTraffic) {
      res.writeHead(503).end("shutting down");
      return;
    }
    res.writeHead(200).end("ok");
    return;
  }

  // Simulate handler work
  setTimeout(() => {
    res.writeHead(200).end("done");
  }, 100);
});

const GRACE_MS = 25_000;

let shuttingDown = false;

function shutdown(signal: string) {
  if (shuttingDown) return; // ignore repeated signals during drain
  shuttingDown = true;
  console.log(`received ${signal}, draining...`);
  acceptingTraffic = false;

  server.close((err) => {
    if (err) console.error("server.close error", err);
  });

  const deadline = setTimeout(() => {
    console.error(`shutdown timeout after ${GRACE_MS}ms, active=${activeRequests}, exiting`);
    process.exit(1);
  }, GRACE_MS);

  const check = setInterval(() => {
    if (activeRequests === 0 && !server.listening) {
      clearInterval(check);
      clearTimeout(deadline);
      console.log("clean shutdown");
      process.exit(0);
    }
  }, 50);
}

server.listen(3000, () => console.log("listening on :3000"));
process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));

In production you would replace manual counters with your framework’s hooks, add structured logging, and use closeAllConnections() after the grace period’s warning phase if idle keep-alives block exit.

Workers and message consumers

For BullMQ, SQS, or similar: on SIGTERM, stop polling, wait for in-flight message handlers (with a cap), then ack or nack according to your poison-message policy. If you nack and the visibility timeout is short, another consumer can pick up the message—often better than SIGKILL mid-handler. For database-backed job rows, use a lease or “claimed_by” column so abandoned jobs become eligible again after the timeout.
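The bounded wait for a draining consumer reduces to a small decision function, sketched here. The names and return convention are illustrative; wire the result into your queue client's stop/ack flow and poll it from an interval.

```typescript
// Decide whether a draining worker may exit, and with which exit code.
// Returns an exit code, or null to keep waiting.
function drainVerdict(
  inFlight: number,
  nowMs: number,
  deadlineMs: number
): number | null {
  if (inFlight === 0) return 0; // all handlers finished: clean exit
  if (nowMs > deadlineMs) return 1; // cap exceeded: exit abnormally so alerts fire
  return null; // handlers still running and within budget
}
```

Typical use: on SIGTERM, stop polling, compute `deadlineMs = Date.now() + cap`, then check this verdict every ~100 ms and exit with the returned code once it is non-null.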

If your process serves both HTTP and queue work, the same readiness flip applies: mark the worker not ready so orchestrators stop routing HTTP to it while it drains queue tasks.

Common mistakes and pitfalls

Only handling SIGINT in development. Production containers send SIGTERM; if you never register for it, you rely on defaults and may exit before draining.

Readiness still true while draining. Traffic keeps arriving until endpoints update, shrinking your effective grace window.

preStop sleep without tuning. A blind sleep 15 can help or hurt depending on propagation latency; measure with your cloud provider and adjust.

Ignoring keep-alive. server.close() alone may not exit quickly if clients hold idle connections open.

Draining without cancellation. Long external calls block shutdown until SIGKILL; use AbortSignal linked to shutdown.

Exiting zero on timeout after force-close. If you had to abort in-flight work, prefer a non-zero exit or explicit metric so deploy pipelines can alert.

Conclusion

Graceful shutdown connects process lifecycle to user-perceived reliability: signals tell you when to stop taking new work; readiness tells the platform when to stop sending it; draining and timeouts bound how long you wait. Getting this right is standard work when building scalable, production-ready APIs and background processors—whether on Kubernetes, ECS, or a single VM behind a reverse proxy.

Key takeaways:

  • Treat SIGTERM as a contract with a finite grace period, not a suggestion
  • Drop readiness early, then close the listener and wait with a hard timeout
  • Align HTTP draining with queue and job semantics so work is neither lost nor double-processed carelessly

If you are evolving deploy semantics or service boundaries for a team shipping critical paths, the contact page is the right place to reach out. For background on how reliability patterns fit together with APIs and events, see the about page for the broader engineering focus.
