HTTP streaming in production backends: SSE, chunked transfer, proxies, and backpressure
Designing long-lived HTTP responses for live dashboards and LLM-style token delivery: Server-Sent Events vs chunked bodies, intermediary buffering, timeouts, and safe client reconnection.
Introduction
A support dashboard polls /api/tickets every two seconds. At midnight, a regional outage doubles traffic. Your API nodes spend a measurable fraction of CPU serializing the same JSON snapshot for thousands of clients who only needed to know that one field changed. You switch to “push” semantics: the browser opens a stream, and the server emits updates as they happen. Suddenly you are debugging 502s that appear only behind the corporate reverse proxy, memory growth on idle connections, and duplicate events after reconnect—not because streaming is exotic, but because HTTP streaming crosses every layer in the stack, each with its own buffering rules and timeouts.
The topic matters because incremental responses are now the default shape for more than chat UIs: deployment progress, export pipelines, and observability fan-out all reuse the same primitives. This article explains how to implement chunked bodies and Server-Sent Events (SSE) so they behave predictably behind CDNs, load balancers, and service meshes—patterns I apply when building scalable, production-ready HTTP APIs for product teams and integration-heavy clients.
How HTTP streaming differs from a normal request/response
In a classic handler, you compute a full payload, set Content-Length, and return. The runtime may flush the entire response at once. In a streaming handler, you write bytes incrementally on a single response while the connection stays open.
Two common mechanisms:
- Chunked transfer encoding (`Transfer-Encoding: chunked`): arbitrary byte chunks framed by HTTP; the client reads until the terminating zero-length chunk. You control framing; the client must understand your format (often newline-delimited JSON, length-prefixed frames, or a custom binary protocol).
- Server-Sent Events (`Content-Type: text/event-stream`): a text protocol, typically delivered over chunked encoding, where each logical message is a block of `field: value` lines terminated by a blank line. Browsers expose it through `EventSource`, which handles automatic reconnect with `Last-Event-ID`.
Both are one-way server → client on a standard HTTP connection (often HTTP/2 multiplexed). They differ from WebSockets, which upgrade to a bidirectional channel and require different operational tooling (heartbeats, message framing, often different auth patterns).
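For intuition, this is what HTTP/1.1 chunked framing looks like on the wire: each chunk is its byte length in hex, a CRLF, the payload, and another CRLF, and a zero-length chunk ends the body. A sketch only; your server and runtime emit this framing automatically whenever you stream without `Content-Length`, so you never hand-roll it in practice:

```typescript
// HTTP/1.1 chunked framing, made visible: length-in-hex, CRLF, payload, CRLF.
// Chunk boundaries are transport framing only; they need not align with your
// application's message boundaries, which is why clients must re-frame.
function frameChunk(data: string): string {
  const byteLength = new TextEncoder().encode(data).length;
  return `${byteLength.toString(16)}\r\n${data}\r\n`;
}

// A zero-length chunk plus a final CRLF terminates the body.
const terminator = "0\r\n\r\n";

const wire = frameChunk("Hello") + frameChunk(" world") + terminator;
```

The sketch mainly shows why clients must read until the zero-length chunk rather than trusting any single read to contain a whole message.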
Why prefer SSE or chunked HTTP over WebSockets?
Trade-offs:
| Concern | SSE / chunked HTTP | WebSockets |
|---|---|---|
| Direction | Server → client (client still POSTs when needed) | Full duplex |
| Infrastructure | Reuses HTTP routing, auth middleware, mTLS, WAF rules | Separate upgrade path; some proxies restrict or buffer |
| Browser primitives | `EventSource` for SSE; `fetch` + `ReadableStream` for chunked | Custom client code |
| Binary payloads | SSE is text-oriented (base64 if needed) | Native binary frames |
| Backpressure | Write to the socket until kernel buffers fill; must handle drain events in Node | Same underlying issue, different API |
For many dashboard and “AI tokens over HTTP” use cases, SSE is enough: the user action is a POST or GET that returns a stream; the client does not need hundreds of kilobits per second upstream on the same socket.
Server-Sent Events: protocol details that matter
SSE messages look like:

```
event: ticket.updated
id: 42
data: {"ticketId":"t_9","status":"open"}
```
Rules worth internalizing:
- Line endings should be `\n` (the spec also allows `\r\n`; be consistent).
- A message's payload may be split across multiple `data:` lines; the client joins them with `\n`.
- `id:` is optional but critical for resume: compliant clients send `Last-Event-ID` on reconnect.
- `retry:` (milliseconds) hints at the reconnection delay; clients may ignore it.
- Comment lines (starting with `:`) keep connections alive through silent proxies that would otherwise close "idle" TCP sessions.
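To see how these rules compose, here is a deliberately minimal parser for complete SSE text. An illustrative sketch, not a replacement for `EventSource`: it ignores partial chunks, `retry:`, and the last-id bookkeeping a compliant client performs:

```typescript
interface SseEvent {
  id?: string;
  event: string;
  data: string;
}

// Parse a fully received SSE stream into events. Messages are separated by
// a blank line; multiple data: lines within one message join with "\n".
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    if (block.trim() === "") continue;
    const dataLines: string[] = [];
    let id: string | undefined;
    let event = "message"; // default event type per the spec
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment / heartbeat line
      const sep = line.indexOf(":");
      const field = sep === -1 ? line : line.slice(0, sep);
      // The spec strips a single space after the colon, if present.
      let value = sep === -1 ? "" : line.slice(sep + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "data") dataLines.push(value);
      else if (field === "id") id = value;
      else if (field === "event") event = value;
    }
    if (dataLines.length > 0) {
      events.push({ id, event, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

Note that a comment-only block (a heartbeat) produces no event at all, which is exactly why heartbeats are free from the application's point of view.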
Heartbeats and comments
If you emit only on rare domain events, middleboxes may treat the connection as idle. Periodic comment pings cost almost nothing:
```
: keep-alive
```
In freelance engagements, the absence of these pings is a recurring cause of “works on my machine / breaks only at the customer’s office” reports—corporate HTTP proxies are conservative.
Chunked JSONL streams without SSE
When you need arbitrary JSON objects (not data: lines) or binary, use chunked encoding with a simple contract, for example newline-delimited JSON (NDJSON):
```
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"type":"delta","text":"Hello"}
{"type":"delta","text":" world"}
{"type":"done"}
```
Clients using `fetch` can iterate `response.body` (via `for await (const chunk of response.body)` where async iteration is supported, or `response.body.getReader()` elsewhere) and split on newlines. You lose built-in `Last-Event-ID` semantics; you must design your own cursor (e.g. monotonic sequence numbers in each line).
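A minimal incremental decoder for such a stream might look like this. A sketch under assumptions: the class name `NdjsonParser` is mine, and a production client would also cap line length and surface parse errors according to your contract:

```typescript
// Incremental NDJSON decoder: feed it raw text chunks (which may split a
// JSON object across chunk boundaries) and get back complete parsed lines.
class NdjsonParser {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const out: unknown[] = [];
    let idx: number;
    // A line is only complete once its trailing "\n" has arrived; anything
    // after the last newline stays buffered for the next chunk.
    while ((idx = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, idx).trim();
      this.buffer = this.buffer.slice(idx + 1);
      if (line !== "") out.push(JSON.parse(line));
    }
    return out;
  }
}
```

With `fetch`, you would pipe `response.body` through a `TextDecoderStream` and call `push` with each decoded chunk; the buffering is what makes chunk boundaries invisible to the application.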
Intermediaries: where streaming silently breaks
Reverse proxies and buffering
Nginx’s `proxy_buffering` (on by default in many templates) buffers upstream responses to disk or memory. For SSE, you typically want:

```nginx
proxy_buffering off;
proxy_cache off;
gzip off;
```
Otherwise the client sees nothing until the buffer fills or the stream ends—fine for HTML pages, fatal for “live” streams.
Load balancers and idle timeouts
AWS ALB idle timeout defaults to 60 seconds; many teams set 60–120s on other LBs. If your only traffic is occasional SSE events, the connection dies unless you send frequent heartbeats under the LB threshold.
HTTP/2 and SSE
SSE works over HTTP/2, but tooling varies. Some older mobile WebViews had quirks; always test actual customer environments if you ship B2B.
Backpressure and memory: the Node.js Writable contract
In Node.js HTTP, `res.write(chunk)` may return `false` when the kernel send buffer is full. Continuing to write without waiting for `'drain'` buffers unbounded data in user space. Production services must:
- Respect the `res.write` return value, or use `stream.pipeline` / `Readable.pipe(res, { end: false })` with proper error handling.
- Cap queue depth for domain events: if the client cannot keep up, drop, sample, or close with a clear error. Never let RAM grow with the backlog.
This is the same class of bug as unbounded in-memory message queues: streaming only moves the problem from “response size” to “write queue depth”.
Authentication and authorization
`EventSource` in browsers does not support custom headers, which pushes teams toward:
- Cookie-based session auth (SameSite, CSRF strategy for non-GET if applicable), or
- Token in query string (short-lived, narrowly scoped, logged carefully—URLs leak via Referer and logs).
For machine clients, prefer `fetch` with streaming and normal `Authorization` headers.
Practical example: minimal SSE endpoint with heartbeats and id
Below is a self-contained Next.js App Route sketch (Node runtime) that streams ticket updates. It uses comment heartbeats and monotonically increasing `id` fields. For high-throughput sources, replace the bare `controller.enqueue` loop with a pipeline that respects backpressure (or drop/slow the producer when the client lags).
```typescript
import type { NextRequest } from "next/server";

export const runtime = "nodejs";

function sseEncode(event: { id: string; event?: string; data: string }): string {
  const lines: string[] = [];
  if (event.event) lines.push(`event: ${event.event}`);
  lines.push(`id: ${event.id}`);
  for (const line of event.data.split("\n")) {
    lines.push(`data: ${line}`);
  }
  lines.push("", "");
  return lines.join("\n");
}

export async function GET(req: NextRequest) {
  const encoder = new TextEncoder();
  let seq = 0;

  const stream = new ReadableStream({
    async start(controller) {
      const send = (bytes: Uint8Array) => {
        controller.enqueue(bytes);
      };

      const heartbeat = setInterval(() => {
        send(encoder.encode(": hb\n\n"));
      }, 15000);

      // Example: push synthetic events; replace with domain subscription.
      const tick = setInterval(() => {
        seq += 1;
        const payload = sseEncode({
          id: String(seq),
          event: "tick",
          data: JSON.stringify({ seq, t: Date.now() }),
        });
        send(encoder.encode(payload));
      }, 5000);

      req.signal.addEventListener("abort", () => {
        clearInterval(heartbeat);
        clearInterval(tick);
        try {
          controller.close();
        } catch {
          /* ignore */
        }
      });
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream; charset=utf-8",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
      // Help some proxies: disable buffering where honored
      "X-Accel-Buffering": "no",
    },
  });
}
```
Client-side, `new EventSource("/api/tickets/stream")` will resume after network blips using `Last-Event-ID`, as long as your origin can reconstruct or safely ignore gaps (e.g. by sending a snapshot after reconnect).
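That "reconstruct or safely ignore gaps" decision can be made explicit on the server. A hypothetical sketch, assuming events carry monotonic sequence numbers and only a bounded window of them is retained:

```typescript
// Decide what to do with a reconnecting client's Last-Event-ID: replay from
// the cursor when it falls inside the retained window, otherwise fall back
// to a fresh snapshot. Names (retainedFrom, head) model a bounded event log.
type Resume =
  | { kind: "replay"; fromSeq: number }
  | { kind: "snapshot" };

function planResume(
  lastEventId: string | null,
  retainedFrom: number, // oldest sequence number still retained
  head: number          // newest sequence number emitted
): Resume {
  const last = lastEventId === null ? NaN : Number(lastEventId);
  // Missing, unparsable, too-old, or from-the-future cursor:
  // never replay unbounded history on a stale ID.
  if (!Number.isFinite(last) || last + 1 < retainedFrom || last > head) {
    return { kind: "snapshot" };
  }
  return { kind: "replay", fromSeq: last + 1 };
}
```

This is also the cheap defense against stale-ID abuse: an arbitrary `Last-Event-ID` degrades to a snapshot instead of triggering a replay of every event since the beginning of time.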
Common mistakes and pitfalls
- Forgetting proxy buffering: everything works locally over plain Node, then fails in staging behind Nginx with the default `proxy_buffering on`.
- No heartbeats under load-balancer idle timeouts: mysterious disconnects every 60 seconds.
- Ignoring backpressure: memory climbs when slow clients connect to a high-throughput producer.
- Treating SSE as a message bus: it is best-effort delivery over TCP. Combine it with durable logs, cursors, or CRDT sync if clients cannot miss events.
- Unbounded replay after `Last-Event-ID`: if your server replays every event since the beginning of time, a stale ID becomes a DoS vector. Cap replay windows or snapshot first.
- Mixing gzip with SSE through broken stacks: some combinations buffer aggressively; it is often simpler to disable compression on the streaming route.
Conclusion
HTTP streaming is often the smallest step from polling to push: same TLS, same routing, same auth middleware—different assumptions about buffering and connection lifetime. Server-Sent Events give you a standard text framing, browser-managed reconnect, and Last-Event-ID—ideal for dashboards and incremental text. Chunked encodings give flexibility at the cost of bespoke client logic. In either case, production readiness means explicit heartbeats, proxy-aware configuration, backpressure-aware writes, and a clear story for reconnect and missed events.
Key takeaways:
- Disable or tune proxy buffering on streaming routes; otherwise clients see batched flushes or nothing until the end.
- Heartbeat below load-balancer idle timeouts; treat comment lines in SSE as operational infrastructure, not optional polish.
- Never treat the HTTP connection as a durable log: persist cursors, cap replay after `Last-Event-ID`, and snapshot when gaps are unacceptable.
- Measure slow consumers; bounded queues or explicit disconnects beat runaway memory.
Used deliberately, these patterns scale operational visibility and interactive products without immediately jumping to WebSockets—something I reach for when helping teams ship APIs that behave the same in the data center, on Kubernetes, and behind a customer’s legacy gateway. Background on how I work with teams is on About; for new integrations or architecture reviews, Contact is the right place to start.