WebSocket production patterns: heartbeats, backpressure, and graceful shutdown in Node.js
Ship bidirectional real-time APIs that survive proxies, load balancers, and traffic spikes. Heartbeats, ping/pong, write buffering, auth at upgrade time, and Kubernetes-friendly shutdown.
You ship a dashboard that updates live as orders change. The first version uses long polling; it works until your busiest customer pins ten tabs and your API servers spend half their time holding idle HTTP requests. You upgrade the route to WebSockets—and a week later, support tickets mention “random disconnects,” ghost sessions after deploys, and one tenant’s browser tab freezing the whole tab group. None of that is mysterious once you treat a WebSocket as a long-lived, stateful TCP session with its own failure modes, not “HTTP but faster.”
This article walks through why proxies and browsers behave the way they do, how to detect half-open connections, how to apply backpressure so slow clients do not exhaust memory, and how to close cooperatively during rollouts. The patterns show up in production services and in consulting work where teams move from SSE or polling to bidirectional channels: the goal is predictable behavior under load, not the smallest demo on localhost.
What makes WebSockets operationally different from HTTP
An ordinary request/response cycle has a clear end: the server sends headers and a body, the connection may be reused, but each exchange is bounded. A WebSocket upgrades a TCP connection and then exchanges framed messages indefinitely until one side closes or the network partitions.
That changes several engineering constraints:
- Middleboxes (corporate proxies, CDNs, API gateways) may buffer, idle-timeout, or strip traffic they classify as “stuck” HTTP. A connection that looks idle for 60 seconds is a common default kill condition—even if your application logic considers the session healthy.
- Backpressure is easy to ignore: socket.send() in many stacks queues data in user space. If the client stops reading, the queue grows until the process runs out of memory or the kernel buffers fill and writes block the event loop.
- Authentication is not automatically repeated per message. Whatever identity you establish at upgrade time is what you have until you revalidate.
- Deployment and scaling require explicit decisions about stickiness, drain, and broadcast semantics across instances.
Understanding those constraints is the difference between a demo that works on ws://localhost and a channel that survives a Tuesday traffic spike behind nginx and a cloud load balancer.
Heartbeats, ping/pong, and idle detection
Why heartbeats exist
TCP gives you no cheap signal that the peer has silently disappeared (laptop lid closed, Wi-Fi handoff, proxy killed the socket without a FIN). Your server may believe a session is active while the client is gone—a half-open connection. Heartbeats turn “assume alive” into “prove liveness on a schedule.”
The WebSocket protocol defines ping and pong control frames. Many servers schedule periodic pings; compliant clients answer with pongs. Libraries differ: in browser WebSockets you typically cannot send raw ping frames from JavaScript—the browser responds to server pings. In Node, libraries like ws expose ping() on the server side.
Choosing intervals and timeouts
There is no universal constant; you are balancing proxy idle limits, battery on mobile, and how quickly you want to reclaim resources from dead peers.
- Interval between server pings: often 20–45 seconds. Corporate proxies commonly cut idle connections at 60 seconds, so stay comfortably below that.
- Missed pong / read idle threshold: allow 2–3 missed cycles before closing server-side, but log why—frequent false positives usually mean your interval fights an intermediary’s buffering policy.
Treat heartbeat configuration as tunable per environment: staging proxies are not production proxies.
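That tuning can be sketched as a small environment-driven config reader. The variable names (HEARTBEAT_INTERVAL_MS, HEARTBEAT_MISSED_LIMIT) and defaults are illustrative, not a standard:

```typescript
interface HeartbeatConfig {
  intervalMs: number; // how often the server pings
  missedLimit: number; // missed pongs tolerated before closing
}

// Call with process.env in production; passing the map explicitly keeps it testable.
function heartbeatConfig(env: Record<string, string | undefined>): HeartbeatConfig {
  const intervalMs = Number(env.HEARTBEAT_INTERVAL_MS ?? 25_000);
  const missedLimit = Number(env.HEARTBEAT_MISSED_LIMIT ?? 2);
  if (!Number.isFinite(intervalMs) || intervalMs <= 0 || !Number.isFinite(missedLimit) || missedLimit < 1) {
    throw new Error("invalid heartbeat configuration");
  }
  return { intervalMs, missedLimit };
}
```

Failing fast on a malformed value is deliberate: a heartbeat interval that silently falls back to NaN disables liveness detection without any visible error.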
Application-level heartbeat messages
Some stacks or gateways interfere with WebSocket control frames. A pragmatic fallback is a small JSON message ({"type":"ping"} / {"type":"pong"}) on the same channel. Trade-offs:
- Pros: visible in application logs, easy to reason about in browsers you control end-to-end.
- Cons: mixes control traffic with business events; ensure your message router does not accidentally fan out pings to subscribers.
In engagements where multiple vendors sit between user and origin, I usually implement both: protocol ping/pong where supported, plus an application ping at a lower frequency as a second line of defense.
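The routing concern from the cons above can be isolated in one function: decide per incoming frame whether it is heartbeat traffic (answered locally, never forwarded) or a business event. The wire format ({"type":"ping"} / {"type":"pong"}) is this article's convention; routeIncoming is an illustrative name:

```typescript
type AppMessage = { type: string; [k: string]: unknown };

// Returns the reply to send (if any) and whether the frame should reach subscribers.
function routeIncoming(raw: string): { reply?: string; forward: boolean } {
  let msg: AppMessage;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { forward: false }; // drop unparseable frames rather than fan them out
  }
  if (msg.type === "ping") return { reply: JSON.stringify({ type: "pong" }), forward: false };
  if (msg.type === "pong") return { forward: false }; // caller records liveness
  return { forward: true };
}
```

Keeping this a pure function means the "do pings leak to subscribers?" question is answerable with a unit test instead of a production incident.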
Backpressure: bounded queues and slow consumers
The failure mode
A producer (market ticks, chat messages, build logs) emits faster than a consumer reads. If each send allocates and enqueues unbounded memory, one slow tab becomes a memory leak for the server process.
Mitigations
- Check backpressure signals before enqueueing. In Node's ws, websocket.bufferedAmount exposes bytes queued for transmission. If it crosses a threshold, drop, sample, or pause upstream producers for that socket.
- Per-connection caps: ring buffer or max queue length with explicit policy (drop oldest vs drop newest vs disconnect).
- Separate concerns: a fan-out service should not run heavy work inline in the message handler; hand off to workers and keep the socket thread/event loop responsive.
Backpressure is a product decision as much as a technical one: for a trading UI you might prefer coalesced latest price over a complete history; for audit logs you might prefer disconnect and resync over silent loss.
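A minimal sketch of a bounded per-connection queue with those policies made explicit; BoundedOutbox and the policy names are illustrative, and the caller is assumed to check ws.bufferedAmount before draining into the socket:

```typescript
type OverflowPolicy = "drop-oldest" | "drop-newest" | "disconnect";

class BoundedOutbox<T> {
  private queue: T[] = [];
  constructor(private maxLen: number, private policy: OverflowPolicy) {}

  // Returns false when the policy says the connection should be closed.
  push(item: T): boolean {
    if (this.queue.length < this.maxLen) {
      this.queue.push(item);
      return true;
    }
    if (this.policy === "drop-oldest") {
      this.queue.shift(); // trading-UI style: latest value wins
      this.queue.push(item);
      return true;
    }
    if (this.policy === "drop-newest") return true; // silently discard the new item
    return false; // "disconnect": audit-log style, force a resync instead of losing data
  }

  drain(): T[] {
    const out = this.queue;
    this.queue = [];
    return out;
  }
}
```

The policy choice encodes the product decision: drop-oldest for coalescing feeds, disconnect for streams where silent loss is worse than a reconnect.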
Authentication and authorization at the upgrade
Browsers send WebSocket handshakes with restricted header sets; you cannot attach arbitrary headers from JavaScript like you would with curl. Common patterns:
- Cookie-based sessions—works if cookies are scoped correctly and CSRF risks at upgrade are understood.
- Ticket in query string—short-lived, single-use tokens issued over HTTPS immediately before connect. Never long-lived secrets in URLs that get logged.
- Post-connect challenge—upgrade first, send a signed message within milliseconds; close if invalid. Slightly more complex but avoids leaking tokens in logs and referrer headers.
Whatever you choose, revalidate on privilege changes: role revocation should close or refresh sockets, not wait until the next HTTP request in another tab.
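The ticket pattern can be sketched as issue-then-redeem with a short TTL and single use. The in-memory Map only works for a single instance; production would use a shared store so any instance can validate. Names and the 30-second TTL are illustrative:

```typescript
import { randomUUID } from "crypto";

const TICKET_TTL_MS = 30_000;
const tickets = new Map<string, { userId: string; expiresAt: number }>();

// Issued over an authenticated HTTPS endpoint immediately before connect.
function issueTicket(userId: string, now = Date.now()): string {
  const ticket = randomUUID();
  tickets.set(ticket, { userId, expiresAt: now + TICKET_TTL_MS });
  return ticket;
}

// Called in the upgrade handler. Single use: deleted whether or not it validates.
function redeemTicket(ticket: string, now = Date.now()): string | null {
  const entry = tickets.get(ticket);
  tickets.delete(ticket);
  if (!entry || entry.expiresAt < now) return null;
  return entry.userId;
}
```

Because the ticket is short-lived and burned on first use, a URL that lands in an access log or referrer header is worthless seconds later.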
Horizontal scaling and broadcast
WebSocket sessions are sticky in practice: a given connection terminates on one instance. Cross-room broadcast requires a side channel (Redis pub/sub, NATS, Kafka compact topics for small control messages, etc.). Design for at-least-once delivery over the bus: your socket layer should deduplicate by message id if ordering and uniqueness matter.
When helping teams scale chat or collaborative editors, the recurring mistake is treating “Redis pub/sub” as a message bus with persistence—it is a signal layer. If you need replay after reconnect, you still need durable history elsewhere.
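Deduplication before fan-out can be a small bounded window keyed by message id, sitting between the bus subscription (Redis pub/sub, NATS, ...) and the socket layer. DedupWindow and the default capacity are illustrative:

```typescript
class DedupWindow {
  private seen = new Set<string>();
  private order: string[] = []; // insertion order, for eviction
  constructor(private capacity = 10_000) {}

  // Returns true the first time an id is seen within the window.
  firstTime(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    this.order.push(id);
    if (this.order.length > this.capacity) {
      this.seen.delete(this.order.shift()!); // evict the oldest id
    }
    return true;
  }
}
```

The window is bounded on purpose: an unbounded seen-set is the same memory leak as an unbounded send queue, just on the subscribe side.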
Graceful shutdown and rolling deploys
Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds, then SIGKILL. Your process should:
- Stop accepting new WebSocket upgrades (readiness probe fails; load balancer stops routing new connections—depending on platform).
- Send a close frame with a machine-readable code and a short reason (“server draining”) so clients can reconnect with exponential backoff.
- Drain existing messages up to a deadline, then close remaining sockets.
Clients must implement reconnect with jitter and resume protocols (cursor, last event id, or snapshot + delta). Without that, draining is indistinguishable from failure.
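The client-side delay schedule can be sketched as capped exponential backoff with full jitter, so a drain does not produce a synchronized reconnect wave. The constants are illustrative; random is injectable for testing:

```typescript
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000;

// "Full jitter": uniform in [0, cap), where cap doubles per attempt up to a maximum.
function reconnectDelay(attempt: number, random: () => number = Math.random): number {
  const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * cap);
}
```

Full jitter spreads the herd across the whole window instead of clustering retries at the cap, which matters most right after a mass disconnect.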
Practical example: minimal guarded echo server with ws
The following example uses the ws package. It shows: origin check, heartbeat with termination on timeout, bufferedAmount guard, and SIGINT drain (extend similarly for SIGTERM in containers).
```ts
import http from "http";
import { WebSocketServer, WebSocket } from "ws";

const HEARTBEAT_MS = 25_000;
const DEAD_AFTER_MS = 45_000;
const MAX_BUFFERED_BYTES = 512 * 1024;

type AliveSocket = WebSocket & { isAlive?: boolean; heartbeat?: NodeJS.Timeout };

function safeOrigin(origin: string | undefined): boolean {
  if (!origin) return false;
  const allowed = new Set(["https://app.example.com", "http://localhost:3000"]);
  return allowed.has(origin);
}

const server = http.createServer((_req, res) => {
  res.writeHead(200, { "content-type": "text/plain" });
  res.end("ok");
});

const wss = new WebSocketServer({ noServer: true });

server.on("upgrade", (req, socket, head) => {
  // Reject unknown origins before any WebSocket state exists.
  if (!safeOrigin(req.headers.origin)) {
    socket.write("HTTP/1.1 403 Forbidden\r\n\r\n");
    socket.destroy();
    return;
  }
  wss.handleUpgrade(req, socket, head, (ws) => {
    wss.emit("connection", ws, req);
  });
});

wss.on("connection", (ws: AliveSocket) => {
  ws.isAlive = true;
  // Any pong proves the peer is still reading.
  ws.on("pong", () => {
    ws.isAlive = true;
  });

  ws.on("message", (data, isBinary) => {
    // Backpressure guard: close instead of queueing unbounded memory
    // for a client that is not draining its receive side.
    if (ws.bufferedAmount > MAX_BUFFERED_BYTES) {
      ws.close(1008, "client too slow");
      return;
    }
    ws.send(data, { binary: isBinary }); // echo with the original frame type
  });

  // Heartbeat: mark dead, ping, and terminate if the mark was never cleared by a pong.
  ws.heartbeat = setInterval(() => {
    if (ws.isAlive === false) {
      ws.terminate(); // no pong since the last ping: assume half-open
      return;
    }
    ws.isAlive = false;
    ws.ping();
  }, HEARTBEAT_MS);

  // Hard cap for the demo: close anything still open after four dead-detection windows.
  const idleKill = setTimeout(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.close(1000, "idle hard stop");
    }
  }, DEAD_AFTER_MS * 4);

  ws.on("close", () => {
    clearInterval(ws.heartbeat!);
    clearTimeout(idleKill);
  });
});

function shutdown() {
  // 1001 "going away" tells well-behaved clients to reconnect with backoff.
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) {
      client.close(1001, "server shutting down");
    }
  }
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 10_000).unref(); // hard deadline for the drain
}

process.on("SIGINT", shutdown);
server.listen(8080);
```
In production you would replace the origin allowlist with configuration, wire metrics (connection_open_total, heartbeat_timeout_close_total, buffered_amount_high_water), and integrate readiness with your orchestrator.
Common mistakes and pitfalls
- No heartbeat behind corporate HTTP proxies → mysterious 60-second disconnects that reproduce only on customer networks.
- Unbounded send queues → process RSS grows until OOM when one client throttles.
- Treating the first HTTP cookie as perpetual proof of identity → revoked users keep receiving private events until reconnect.
- Broadcasting Redis messages without deduplication → duplicate UI events under retries.
- Deploy that kills connections instantly → thundering herd on reconnect because every client uses the same backoff.
- Assuming browser tabs share one WebSocket → they do not; design server-side fan-in/fan-out accordingly.
Conclusion
WebSockets buy you low-latency, bidirectional messaging at the cost of connection lifecycle management: heartbeats for half-open detection, explicit backpressure for slow readers, auth decisions tied to upgrade and revocation, and cooperative shutdown so deploys look like planned maintenance instead of outages. Those concerns are exactly what separates a prototype from something you can run under a load balancer, behind a WAF, and on a cluster that rolls every week.
If you are evaluating real-time channels for a product or hardening an existing gateway, the About page summarizes how I work with teams on scalable, production-ready backends; for a concrete engagement or review, use Contact.