HTTP streaming in production backends: SSE, chunked transfer, proxies, and backpressure

Designing long-lived HTTP responses for live dashboards and LLM-style token delivery: Server-Sent Events vs chunked bodies, intermediary buffering, timeouts, and safe client reconnection.

Author: Matheus Palma · ~8 min read
Tags: Software engineering · Backend · HTTP · Node.js · API design · Reliability · Real-time systems

Introduction

A support dashboard polls /api/tickets every two seconds. At midnight, a regional outage doubles traffic. Your API nodes spend a measurable fraction of CPU serializing the same JSON snapshot for thousands of clients who only needed to know that one field changed. You switch to “push” semantics: the browser opens a stream, and the server emits updates as they happen. Suddenly you are debugging 502s that appear only behind the corporate reverse proxy, memory growth on idle connections, and duplicate events after reconnect—not because streaming is exotic, but because HTTP streaming crosses every layer in the stack, each with its own buffering rules and timeouts.

The topic matters because incremental responses are now the default shape for more than chat UIs: deployment progress, export pipelines, and observability fan-out all reuse the same primitives. This article explains how to implement chunked bodies and Server-Sent Events (SSE) so they behave predictably behind CDNs, load balancers, and service meshes—patterns I apply when building scalable, production-ready HTTP APIs for product teams and integration-heavy clients.

How HTTP streaming differs from a normal request/response

In a classic handler, you compute a full payload, set Content-Length, and return. The runtime may flush the entire response at once. In a streaming handler, you write bytes incrementally on a single response while the connection stays open.

Two common mechanisms:

  1. Chunked transfer encoding (Transfer-Encoding: chunked): arbitrary byte chunks framed by HTTP; the client reads until the terminating zero-length chunk. You control framing; the client must understand your format (often newline-delimited JSON, length-prefixed frames, or a custom binary protocol).

  2. Server-Sent Events (Content-Type: text/event-stream): a text protocol on top of chunked encoding where each logical message is a block of field: value lines terminated by a blank line. Browsers expose it through EventSource, which handles automatic reconnect with Last-Event-ID.

Both are one-way server → client on a standard HTTP connection (often HTTP/2 multiplexed). They differ from WebSockets, which upgrade to a bidirectional channel and require different operational tooling (heartbeats, message framing, often different auth patterns).

Why prefer SSE or chunked HTTP over WebSockets?

Trade-offs:

| Concern | SSE / chunked HTTP | WebSockets |
| --- | --- | --- |
| Direction | Server → client (client still POSTs when needed) | Full duplex |
| Infrastructure | Reuses HTTP routing, auth middleware, mTLS, WAF rules | Separate upgrade path; some proxies restrict or buffer |
| Browser primitives | EventSource for SSE; fetch + ReadableStream for chunked | Custom client code |
| Binary payloads | SSE is text-oriented (base64 if needed) | Native binary frames |
| Backpressure | Write to the socket until kernel buffers fill; must handle drain events in Node | Same underlying issue, different API |

For many dashboard and “AI tokens over HTTP” use cases, SSE is enough: the user action is a POST or GET that returns a stream; the client does not need hundreds of kilobits per second upstream on the same socket.

Server-Sent Events: protocol details that matter

SSE messages look like:

event: ticket.updated
id: 42
data: {"ticketId":"t_9","status":"open"}

Rules worth internalizing:

  • Line endings should be \n (the spec also allows \r\n; pick one and be consistent).
  • A message's payload may span multiple data: lines; the client joins them with \n.
  • id: is optional but critical for resume: compliant clients send Last-Event-ID on reconnect.
  • retry: (milliseconds) hints reconnection delay; clients may ignore it.
  • Comment lines (lines starting with :) keep connections alive through silent proxies that would otherwise close “idle” TCP sessions.
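The joining, id, and comment rules above can be exercised with a tiny parser. This is an illustrative sketch of the client-side bookkeeping (parseSse is a hypothetical helper, not a real API); a production parser must also handle partial chunks and \r\n line endings:

```typescript
interface SseEvent {
  id?: string;
  event: string; // defaults to "message" per the spec
  data: string;  // multiple data: lines joined with "\n"
}

// Parse a complete text/event-stream buffer into discrete events.
function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  // A blank line terminates each event block.
  for (const block of buffer.split("\n\n")) {
    if (!block.trim()) continue;
    const ev: SseEvent = { event: "message", data: "" };
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment / heartbeat line
      const sep = line.indexOf(":");
      if (sep === -1) continue;
      const field = line.slice(0, sep);
      const value = line.slice(sep + 1).replace(/^ /, ""); // strip one leading space
      if (field === "data") dataLines.push(value);
      else if (field === "id") ev.id = value;
      else if (field === "event") ev.event = value;
    }
    if (dataLines.length === 0) continue; // spec: empty data buffer, no dispatch
    ev.data = dataLines.join("\n");       // spec: join data lines with "\n"
    events.push(ev);
  }
  return events;
}
```

Note how a comment-only block produces no event: that is exactly why heartbeats are invisible to application code.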

Heartbeats and comments

If you emit only on rare domain events, middleboxes may treat the connection as idle. Periodic comment pings cost almost nothing:

: keep-alive

In freelance engagements, the absence of these pings is a recurring cause of “works on my machine / breaks only at the customer’s office” reports—corporate HTTP proxies are conservative.

Chunked JSONL streams without SSE

When you need arbitrary JSON objects (not data: lines) or binary, use chunked encoding with a simple contract, for example newline-delimited JSON (NDJSON):

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"type":"delta","text":"Hello"}
{"type":"delta","text":" world"}
{"type":"done"}

Clients using fetch can iterate with for await (const chunk of response.body) and split on newlines. You lose built-in Last-Event-ID semantics; you must design your own cursor (e.g. monotonic sequence numbers in each line).
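The splitting logic is easy to get wrong at chunk boundaries, since a JSON line can arrive in two pieces. A sketch of an incremental decoder (NdjsonDecoder is an illustrative name, not a standard API), with the fetch wiring shown alongside:

```typescript
// Buffers partial lines across chunk boundaries and yields parsed objects.
class NdjsonDecoder {
  private buffer = "";

  // Feed one decoded text chunk; returns the JSON objects it completed.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }
}

// Usage with fetch. Async iteration over response.body works in Node 18+;
// some browsers still require a manual reader loop instead.
async function readNdjson(res: Response, onEvent: (e: unknown) => void) {
  const textDecoder = new TextDecoder();
  const ndjson = new NdjsonDecoder();
  for await (const chunk of res.body as AsyncIterable<Uint8Array>) {
    for (const obj of ndjson.push(textDecoder.decode(chunk, { stream: true }))) {
      onEvent(obj);
    }
  }
}
```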

Intermediaries: where streaming silently breaks

Reverse proxies and buffering

Nginx’s proxy_buffering (on by default in many templates) buffers upstream responses to disk or memory. For SSE, you typically want:

proxy_buffering off;
proxy_cache off;
gzip off;

Otherwise the client sees nothing until the buffer fills or the stream ends—fine for HTML pages, fatal for “live” streams.

Load balancers and idle timeouts

AWS ALB idle timeout defaults to 60 seconds; many teams set 60–120s on other LBs. If your only traffic is occasional SSE events, the connection dies unless you send frequent heartbeats under the LB threshold.

HTTP/2 and SSE

SSE works over HTTP/2, but tooling varies. Some older mobile WebViews had quirks; always test actual customer environments if you ship B2B.

Backpressure and memory: the Node.js Writable contract

In Node.js HTTP, res.write(chunk) may return false when the kernel send buffer is full. Continued writes without waiting for 'drain' buffer unbounded data in user space. Production services must:

  • Respect res.write's return value, or use stream.pipeline / readable.pipe(res, { end: false }) with proper error handling.
  • Cap queue depth for domain events: if the client cannot keep up, drop, sample, or close with a clear error—never let RAM grow with the backlog.

This is the same class of bug as unbounded in-memory message queues: streaming only moves the problem from “response size” to “write queue depth”.
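Both halves of the contract can be sketched in a few lines; writeOrWait and BoundedQueue are illustrative names under the assumptions above, not a prescribed design:

```typescript
import type { Writable } from "node:stream";

// Honor the Writable contract: if write() returns false, wait for 'drain'
// instead of letting user-space buffers grow with the backlog.
function writeOrWait(res: Writable, chunk: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const ok = res.write(chunk, (err) => err && reject(err));
    if (ok) resolve();
    else res.once("drain", () => resolve());
  });
}

// Cap per-client backlog for domain events: drop the oldest event
// rather than growing RAM when a slow consumer falls behind.
class BoundedQueue<T> {
  private items: T[] = [];
  constructor(private readonly capacity: number) {}

  // Returns the dropped item if the cap forced one out, else undefined.
  push(item: T): T | undefined {
    this.items.push(item);
    return this.items.length > this.capacity ? this.items.shift() : undefined;
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}
```

Whether you drop oldest, drop newest, or disconnect the client is a product decision; the invariant that matters operationally is the hard cap.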

Authentication and authorization

EventSource in browsers does not support custom headers, which pushes teams toward:

  • Cookie-based session auth (SameSite, CSRF strategy for non-GET if applicable), or
  • Token in query string (short-lived, narrowly scoped, logged carefully—URLs leak via Referer and logs).

For machine clients, prefer fetch with streaming and normal Authorization headers.

Practical example: minimal SSE endpoint with heartbeats and id

Below is a self-contained Next.js App Route sketch (Node runtime) that streams ticket updates. It uses comment heartbeats and monotonically increasing id fields. For high-throughput sources, replace the bare controller.enqueue loop with a pipeline that respects backpressure (or drop/slow the producer when the client lags).

import type { NextRequest } from "next/server";

export const runtime = "nodejs";

function sseEncode(event: { id: string; event?: string; data: string }): string {
  const lines: string[] = [];
  if (event.event) lines.push(`event: ${event.event}`);
  lines.push(`id: ${event.id}`);
  for (const line of event.data.split("\n")) {
    lines.push(`data: ${line}`);
  }
  lines.push("", "");
  return lines.join("\n");
}

export async function GET(req: NextRequest) {
  const encoder = new TextEncoder();
  let seq = 0;

  const stream = new ReadableStream({
    async start(controller) {
      const send = (bytes: Uint8Array) => {
        controller.enqueue(bytes);
      };

      const heartbeat = setInterval(() => {
        send(encoder.encode(": hb\n\n"));
      }, 15000);

      // Example: push synthetic events; replace with domain subscription.
      const tick = setInterval(() => {
        seq += 1;
        const payload = sseEncode({
          id: String(seq),
          event: "tick",
          data: JSON.stringify({ seq, t: Date.now() }),
        });
        send(encoder.encode(payload));
      }, 5000);

      req.signal.addEventListener("abort", () => {
        clearInterval(heartbeat);
        clearInterval(tick);
        try {
          controller.close();
        } catch {
          /* ignore */
        }
      });
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
      // Help some proxies: disable buffering where honored
      "X-Accel-Buffering": "no",
    },
  });
}

Client-side, new EventSource("/api/tickets/stream") will resume after network blips using Last-Event-ID—as long as your origin can reconstruct or safely ignore gaps (e.g. by sending a snapshot after reconnect).
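Whether a resumed stream is safe to apply depends on your cursor semantics. With monotonic sequence numbers, the client-side decision reduces to a small check (checkSequence is a hypothetical helper, not part of EventSource):

```typescript
type ResumeAction =
  | { kind: "apply" }     // contiguous: apply the event
  | { kind: "duplicate" } // already seen: drop it
  | { kind: "resync" };   // gap detected: fetch a snapshot first

// lastSeen is the highest contiguous sequence number applied so far.
function checkSequence(lastSeen: number, incoming: number): ResumeAction {
  if (incoming <= lastSeen) return { kind: "duplicate" };
  if (incoming === lastSeen + 1) return { kind: "apply" };
  return { kind: "resync" };
}
```

Duplicates are expected after reconnect (the server may replay from Last-Event-ID inclusively), so the client must treat them as a normal case, not an error.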

Common mistakes and pitfalls

  1. Forgetting proxy buffering — everything works locally over plain Node, fails in staging behind Nginx with default proxy_buffering on.

  2. No heartbeats under load balancer idle timeouts — mysterious disconnects every 60 seconds.

  3. Ignoring backpressure — memory climbs when slow clients connect to a high-throughput producer.

  4. Treating SSE as a message bus — it is best-effort over TCP. Combine with durable logs, cursors, or CRDT sync if clients cannot miss events.

  5. Unbounded retry after Last-Event-ID — if your server replays every event since the beginning of time, a stale ID becomes a DoS vector. Cap replay windows or snapshot first.

  6. Mixing gzip with SSE through broken stacks — some combinations buffer aggressively; often simpler to disable compression on the streaming route.
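Pitfall 5 can be sketched as a guard on the replay path. The shape below (replayOrSnapshot over an in-memory window of recent events, oldest first) is one possible design under those assumptions, not a prescribed one:

```typescript
interface StoredEvent {
  seq: number;
  payload: string;
}

// Decide whether to replay from the recent-events window or force a snapshot.
// Caps both the lookback (stale cursors) and the replay volume (DoS guard).
function replayOrSnapshot(
  window: StoredEvent[],
  lastEventId: number,
  maxReplay = 1000,
): { mode: "replay"; events: StoredEvent[] } | { mode: "snapshot" } {
  const oldest = window[0]?.seq;
  // Unknown or stale cursor: the client predates the window, snapshot instead.
  if (oldest === undefined || lastEventId < oldest - 1) return { mode: "snapshot" };
  const missed = window.filter((e) => e.seq > lastEventId);
  // Too much to replay: cheaper and safer to send current state.
  if (missed.length > maxReplay) return { mode: "snapshot" };
  return { mode: "replay", events: missed };
}
```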

Conclusion

HTTP streaming is often the smallest step from polling to push: same TLS, same routing, same auth middleware—different assumptions about buffering and connection lifetime. Server-Sent Events give you a standard text framing, browser-managed reconnect, and Last-Event-ID—ideal for dashboards and incremental text. Chunked encodings give flexibility at the cost of bespoke client logic. In either case, production readiness means explicit heartbeats, proxy-aware configuration, backpressure-aware writes, and a clear story for reconnect and missed events.

Key takeaways:

  • Disable or tune proxy buffering on streaming routes; otherwise clients see batched flushes or nothing until the end.
  • Heartbeat below load-balancer idle timeouts; treat comment lines in SSE as operational infrastructure, not optional polish.
  • Never treat the HTTP connection as a durable log—persist cursors, cap replay after Last-Event-ID, and snapshot when gaps are unacceptable.
  • Measure slow consumers; bounded queues or explicit disconnects beat runaway memory.

Used deliberately, these patterns scale operational visibility and interactive products without immediately jumping to WebSockets—something I reach for when helping teams ship APIs that behave the same in the data center, on Kubernetes, and behind a customer’s legacy gateway. Background on how I work with teams is on About; for new integrations or architecture reviews, Contact is the right place to start.
