Request coalescing and single-flight: stopping cache stampedes before they flatten your database

Why synchronized TTL expiry causes thundering herds, how in-process and distributed single-flight deduplicate work, and trade-offs when scaling Node.js and Redis-backed caches.

Author: Matheus Palma · 7 min read
Software engineering · Backend · Node.js · Performance · Redis · Architecture

You ship a read-heavy endpoint backed by Redis. Traffic is steady until a hot key expires—or every pod restarts at once—and suddenly Postgres CPU pegs at 100% while p95 latency explodes. The cache was doing its job; the synchronized miss was not. In client projects and product teams I work with, this pattern shows up after deploys, regional failovers, and “simple” cache TTL changes: many concurrent requests discover the same absent key and each tries to recompute or load the underlying row.

This article explains request coalescing (single-flight): deduplicating in-flight work so only one logical computation populates the cache per key (or per shard) while others await the same result. It covers in-process patterns, distributed coordination with Redis, trade-offs, and mistakes that leave you with false confidence.

The stampede: synchronized misses and fan-out

Caching trades freshness for load reduction. When a popular key expires at time T, every request that arrives after T sees a miss. Without coordination, N concurrent requests may issue N identical database queries or N identical calls to an expensive service. That is a cache stampede (thundering herd): load spikes precisely when the protective layer briefly disappears.
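
The fan-out is easy to reproduce in a few lines. The sketch below (the name demoStampede and the 10 ms delay are illustrative) runs N concurrent cache-aside reads against one cold key and counts origin calls:

```typescript
// Minimal demo of the fan-out: with plain cache-aside and no coalescing,
// N concurrent misses on the same key each hit the origin.
async function demoStampede(concurrency: number): Promise<number> {
  const cache = new Map<string, string>();
  let originCalls = 0;
  const read = async (key: string) => {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    originCalls++; // every miss pays the full cost
    await new Promise((r) => setTimeout(r, 10)); // simulated slow origin
    const value = "row";
    cache.set(key, value);
    return value;
  };
  await Promise.all(Array.from({ length: concurrency }, () => read("hot")));
  return originCalls; // equals `concurrency`: every request missed together
}
```

Every request observes the miss before any of them can fill the cache, so the origin absorbs all N calls at once.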

The problem worsens when:

  • TTLs align across keys (e.g. daily refresh at midnight UTC)
  • Cold start clears local or remote cache together (deploy, eviction policy, Redis failover)
  • Nested caches each miss in sequence, amplifying work per request

Coalescing does not replace caching; it ensures that the first miss pays the cost and subsequent concurrent misses wait for that outcome instead of repeating it.

Single-flight semantics

Single-flight means: for a given key k, at most one execution of the expensive function load(k) runs at a time among peers that participate in the same coalescing scope.

Important distinctions:

| Scope | What it deduplicates | Typical use |
| --- | --- | --- |
| In-process | Concurrent async work inside one Node.js process | Per-pod stampedes |
| Distributed | Work across replicas via a shared coordinator | Multi-pod / multi-region |

Not the same as memoizing forever: after completion, callers get the result; the map entry can be dropped. Not the same as a mutex around the entire handler unless you scope it per key—otherwise unrelated requests serialize unnecessarily.

Why Promise sharing works in Node.js

In a single event loop, multiple callers can await the same Promise instance. If the first caller starts load(k) and stores the pending promise in a map keyed by k, subsequent callers retrieve that promise and await it. When it settles, everyone receives the same fulfillment or rejection. That is the minimal in-process single-flight pattern—no extra dependency required.

The subtlety is cleanup: remove the key from the map in finally so the next miss after completion can start a fresh load. If you remove too early (before all waiters attached), you can still duplicate work—so the map must reference the in-flight promise until settlement.

Distributed coalescing with Redis

When multiple processes handle traffic, in-process maps do not talk to each other. Options:

  1. Push computation to one tier — only workers or a BFF populate cache; HTTP handlers read-through only after coordination. This reduces duplication but changes architecture.

  2. Distributed lock or lease per key — first acquirer loads; others wait or fast-fail and retry read. Redis SET key token NX PX ttl is a common lease pattern; release with Lua compare-and-del or let TTL expire.

  3. Let the datastore coordinate — some databases support advisory locks or SELECT FOR UPDATE on a synthetic “lock row”; this trades Redis ops for DB contention and must be sized carefully.

  4. Probabilistic early refresh — refresh hot keys before expiry with jitter so misses do not align, complementary to coalescing.
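
The probabilistic early refresh in item 4 is often implemented with an exponentially weighted check (sometimes called "XFetch"): each read refreshes early with a probability that rises as expiry approaches, scaled by how long the last recompute took. A minimal sketch, with illustrative parameter names:

```typescript
// Probabilistic early refresh: return true when this read should recompute
// the value even though the cached entry has not expired yet.
// `deltaMs` is roughly how long the last recompute took; `beta` > 1 refreshes
// earlier and more aggressively.
export function shouldRefreshEarly(
  nowMs: number,
  expiryMs: number,
  deltaMs: number,
  beta = 1.0,
  rand: () => number = Math.random,
): boolean {
  // -log(rand) is positive and occasionally large, so a small random fraction
  // of reads refresh well before expiry; refreshes no longer align at T.
  return nowMs - deltaMs * beta * Math.log(rand()) >= expiryMs;
}
```

Because the headroom is randomized per read, hot keys are refreshed by one lucky request before the synchronized miss can happen, which complements coalescing rather than replacing it.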

For Redis, many teams use a short-lived lock key lock:{cacheKey}: if acquisition fails, brief backoff and retry GET on the cache key another worker may have filled. The goal is not perfection on every edge case but reducing N to ~1 under burst conditions.
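
The lock-plus-retry-read flow can be sketched against a thin wrapper interface over your Redis client. `RedisLease`, `setIfAbsent`, and `compareAndDel` are illustrative names: `setIfAbsent` maps to SET key token NX PX ttl, and `compareAndDel` to the Lua compare-and-del release mentioned above.

```typescript
import { randomUUID } from "node:crypto";

export interface RedisLease {
  // SET key token NX PX ttl → true when the lease was acquired
  setIfAbsent(key: string, token: string, ttlMs: number): Promise<boolean>;
  // Atomic "delete only if value still equals token" (Lua script on real Redis)
  compareAndDel(key: string, token: string): Promise<void>;
}

export async function withLease<V>(
  redis: RedisLease,
  lockKey: string,
  ttlMs: number,
  load: () => Promise<V>, // primary loader: compute and fill the cache
  readCache: () => Promise<V | null>, // fallback: re-read what a peer filled
): Promise<V | null> {
  const token = randomUUID(); // unique token so we only release our own lease
  if (await redis.setIfAbsent(lockKey, token, ttlMs)) {
    try {
      return await load();
    } finally {
      await redis.compareAndDel(lockKey, token);
    }
  }
  // Lost the race: brief backoff, then retry the cache read another worker
  // may have completed. Production code would loop with a deadline.
  await new Promise((r) => setTimeout(r, 50));
  return readCache();
}
```

A caller that gets null back after losing the race can retry, fall back to the origin, or fail fast, depending on product rules.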

Trade-offs and limitations

Latency for waiters. Callers that hit during an in-flight load wait for the slowest path. Set timeouts so a stuck loader does not block the entire worker; propagate failures clearly (503 vs empty cache) according to product rules.

Failure propagation. If the single flight rejects, all waiters reject. Sometimes you prefer stale reads: serve expired cache while one caller refreshes (stale-while-revalidate). That is a different contract—still valuable, but not identical to single-flight.
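
Stale-while-revalidate composes naturally with single-flight: serve the expired value immediately and let exactly one flight refresh in the background. A minimal in-memory sketch, assuming illustrative names (`createSwrCache`, `ttlMs`, `staleMs`) rather than a specific library:

```typescript
interface Entry<V> { value: V; expiresAt: number }

export function createSwrCache<V>(ttlMs: number, staleMs: number) {
  const cache = new Map<string, Entry<V>>();
  const inflight = new Map<string, Promise<V>>();

  // Single-flight refresh: at most one load per key runs at a time.
  async function refresh(key: string, load: () => Promise<V>): Promise<V> {
    let p = inflight.get(key);
    if (!p) {
      p = load()
        .then((value) => {
          cache.set(key, { value, expiresAt: Date.now() + ttlMs });
          return value;
        })
        .finally(() => inflight.delete(key));
      inflight.set(key, p);
    }
    return p;
  }

  return async function get(key: string, load: () => Promise<V>): Promise<V> {
    const entry = cache.get(key);
    const now = Date.now();
    if (entry && now < entry.expiresAt) return entry.value; // fresh hit
    if (entry && now < entry.expiresAt + staleMs) {
      void refresh(key, load).catch(() => {}); // refresh in the background
      return entry.value; // serve stale immediately
    }
    return refresh(key, load); // hard miss: wait for the flight
  };
}
```

Note the different contract: during the stale window a refresh failure is swallowed and stale data keeps flowing, so pair this with error metrics on the background path.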

Key granularity. Too coarse a key (e.g. one flight for user:*) serializes unrelated users; too fine and memory grows. Align coalescing keys with cache keys.

Cross-region. Distributed locks in one Redis cluster do not coordinate another region’s pods unless they share the store or you accept duplicate work per region (often acceptable for read caches).

Poison keys. If load(k) always throws, repeated retries can hammer the origin. Pair coalescing with circuit breakers and negative caching with short TTL for “known missing” entities—and monitor error rates per key.
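
Negative caching for "known missing" entities can be a small layer in front of the loader. The following sketch uses an illustrative NOT_FOUND sentinel and separate TTLs for hits and misses (the names and numbers are assumptions, not a standard API):

```typescript
const NOT_FOUND = Symbol("not-found");
type Entry<V> = { value: V | typeof NOT_FOUND; expiresAt: number };

export function createNegativeCache<V>(hitTtlMs: number, missTtlMs: number) {
  const cache = new Map<string, Entry<V>>();
  return async function get(
    key: string,
    load: () => Promise<V | null>, // returns null for "does not exist"
  ): Promise<V | null> {
    const entry = cache.get(key);
    if (entry && Date.now() < entry.expiresAt) {
      return entry.value === NOT_FOUND ? null : entry.value;
    }
    const row = await load();
    cache.set(
      key,
      row === null
        ? { value: NOT_FOUND, expiresAt: Date.now() + missTtlMs } // short TTL
        : { value: row, expiresAt: Date.now() + hitTtlMs },
    );
    return row;
  };
}
```

Keep the miss TTL short so a row that is created later becomes visible quickly; the point is only to blunt repeated lookups of the same absent key.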

Practical example: in-process single-flight in TypeScript

The following pattern is suitable for a Node.js service where each instance coalesces its own concurrent misses. Combine with Redis caching so repeat requests across instances eventually benefit from a filled cache.

type Loader<K, V> = (key: K) => Promise<V>;

export function createSingleFlight<K, V>() {
  // Pending loads keyed by cache key; entries live only while a flight is in progress.
  const inflight = new Map<K, Promise<V>>();

  return async function run(key: K, load: Loader<K, V>): Promise<V> {
    const existing = inflight.get(key);
    if (existing) return existing; // join the in-flight load instead of starting another

    const promise = (async () => {
      try {
        return await load(key);
      } finally {
        inflight.delete(key); // clean up on settlement so the next miss starts fresh
      }
    })();

    inflight.set(key, promise);
    return promise;
  };
}

// Example: wrap cache-aside read
const sf = createSingleFlight<string, string>();

// `Redis` and `Db` are stand-ins for your client types (e.g. an ioredis
// instance and a data-access layer).
async function getProfile(userId: string, redis: Redis, db: Db): Promise<string> {
  const cacheKey = `profile:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached !== null) return cached;

  // Coalesce on the cache key so the flight scope matches the cache entry.
  return sf(cacheKey, async () => {
    const row = await db.users.findById(userId);
    const json = JSON.stringify(row);
    await redis.set(cacheKey, json, "EX", 300);
    return json;
  });
}

The finally block removes the in-flight entry after the promise completes; new concurrent requests after completion miss the map and start a new load only when the cache is empty again—no permanent serialization.

To add cross-pod coalescing, introduce a Redis lock around the body of load inside sf(userId, ...), or use a small library that implements Redis-based single-flight if your stack standardizes on one.

Common mistakes and pitfalls

Clearing the map on attach instead of settlement. If you delete the key before the async work finishes, a second caller may start a duplicate load while the first is still running.

Global lock for all keys. A single mutex around getProfile would serialize all users; always scope by cache key (or a stable hash bucket if you must cap map size).
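
If you do need to cap the number of coalescing slots, a stable hash into a fixed set of buckets works; unrelated keys that land in one bucket will serialize, which is the explicit trade-off. A sketch using FNV-1a (the hash choice and bucket count are illustrative):

```typescript
// Map an arbitrary cache key to one of a fixed number of coalescing buckets.
export function bucketKey(key: string, buckets = 1024): string {
  let h = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV prime, 32-bit multiply
  }
  return `bucket:${(h >>> 0) % buckets}`;
}
```

Use the bucket as the single-flight key while still writing the cache under the original key; only the in-flight map is bounded.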

Ignoring timeouts. A hung database connection blocks every waiter on that key; use AbortSignal, query timeouts, or race against a timeout promise at the loader boundary.
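
Racing the loader against a timeout is a few lines; `withTimeout` below is an illustrative helper, not a library API:

```typescript
// Bound a loader with a timeout so a hung dependency cannot block every
// coalesced waiter on that key.
export function withTimeout<V>(load: () => Promise<V>, ms: number): Promise<V> {
  return new Promise<V>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`loader timed out after ${ms}ms`)),
      ms,
    );
    load().then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}
```

Wrap the loader at the single-flight boundary (e.g. sf(key, () => withTimeout(load, 2000))) so one timeout failure settles the flight for all waiters and a fresh attempt can start.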

Assuming Redis SET NX is enough without retry-read. If lock acquisition fails, you need a strategy: wait with backoff, retry cache GET, or occasionally accept duplicate load under extreme contention—document which behavior you chose.

Skipping metrics. Track “coalesced waiters” vs “primary loaders” to verify the pattern actually activates during incidents rather than only in theory.

Conclusion

Cache stampedes are a coordination problem disguised as a performance problem. Single-flight request coalescing ensures concurrent misses for the same key collapse into one origin load per scope, cutting duplicate work when TTLs align or caches cold-start together. In-process Promise sharing is cheap and effective per pod; Redis leases and retry-read patterns extend the idea across replicas at the cost of operational complexity and tail latency for waiters.

Key takeaways:

  • Align coalescing scope with cache keys; avoid over-broad serialization
  • Clean up in-flight maps on settlement; attach timeouts to loaders
  • Combine with jittered TTLs, stale-while-revalidate, and breakers for a full defense-in-depth story

For teams shipping scalable, production-ready APIs and workers, investing once in coalescing and observability around hot keys pays off the first time traffic spikes without taking the database with it. If you want to discuss architecture for your stack or workloads, the contact page is the right place to start.
