Read-your-writes consistency: when CDNs and caches lie (and how to fix it)
Users refresh and still see stale data after a successful write. This article explains read-your-writes consistency, cache-control patterns, surrogate keys, and tokenized URLs for edge-cached APIs.
You ship a profile update. The API returns 200 OK. The user navigates back to the settings page—or simply refreshes—and the old avatar and bio are still there. Nothing is “broken” in the logs: writes succeed, reads succeed, yet the product feels unreliable. In freelance and consulting engagements, this is one of the most common perceived correctness bugs after introducing a CDN or an HTTP cache in front of a BFF or REST API: the system is eventually consistent at the edge, while the product promise is read-your-writes for the actor who just changed data.
This article defines read-your-writes consistency in practical HTTP terms, shows how default caching defeats it, and outlines patterns (headers, surrogate keys, short-lived private caches, and session-scoped cache busting) that keep edge performance without gaslighting users.
What “read-your-writes” means in web systems
Read-your-writes (RYW) is a session-centric guarantee: if principal P performs write W, then any read R by P that happens after W completes should observe W’s effects (or later writes), not a stale prefix of the state.
It is weaker than linearizability and weaker than serializability across arbitrary clients. Two different users can still race; what you are protecting is the self-consistent narrative for the person who clicked “Save.”
Why it matters:
- Trust — A success response implies persistence; if the next screen contradicts that, users file tickets and engineers burn time proving “the database is fine.”
- Downstream workflows — Checkout, onboarding, and permission changes often chain UI steps. Stale reads cause duplicate submissions, wrong role gates, and “phantom” validation errors.
- Support load — “I updated it but it didn’t save” is expensive to debug when the real issue is cache keying or TTL, not persistence.
RYW is not automatically provided by “using React” or “using PostgreSQL.” It is a property of the end-to-end read path: browser caches, service workers, CDNs, reverse proxies, application caches, and read replicas must all be aligned with the product contract.
Where stale reads come from (a layered picture)
A typical read after a write crosses several independent caches:
- Browser HTTP cache — Honors `Cache-Control`, `ETag`, `Last-Modified`, and validators.
- Service worker — May implement its own caching strategies (`cache-first`, `stale-while-revalidate`, etc.).
- CDN edge — Often caches `GET` responses keyed by URL (and sometimes `Vary` headers).
- Origin reverse proxy — nginx, Envoy, or API gateways may cache responses or buffer SSE/WebSocket differently than you expect.
- Application / ORM cache — In-process or Redis layers keyed by entity id without writer awareness.
Any layer can legally serve a cached object that is fresh according to HTTP but stale according to the user’s mental model. RYW failures are almost always policy bugs, not random packet loss.
HTTP caching semantics that interact with RYW
Cache-Control: public with a long max-age
If list and detail resources are marked public, max-age=3600, the CDN will happily serve an hour-old JSON document to the same user who updated the resource two seconds ago. The write path is correct; the read path is by design stale.
ETag / Last-Modified without revalidation on navigation
Validators help conditional requests, but the browser must actually send If-None-Match / If-Modified-Since. SPA navigations that use cache: 'force-cache' in fetch, or libraries that default to cached GETs, can skip validation unless you opt into no-cache behavior.
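Whether a cached response may be reused without contacting the origin comes down to a freshness check. A minimal sketch of that decision, as a deliberately simplified model of the RFC 9111 rules (real caches also handle `s-maxage`, heuristics, and weak validators); the helper and its field names are illustrative:

```typescript
// Simplified freshness model: a cached entry is reusable without a
// conditional request only while it is fresh and not marked no-cache.
interface CachedEntry {
  ageSeconds: number;      // how long ago the response was stored
  maxAge: number | null;   // from Cache-Control: max-age, if any
  noCache: boolean;        // Cache-Control: no-cache → always revalidate
  etag: string | null;     // validator for the conditional request
}

function needsRevalidation(entry: CachedEntry): boolean {
  if (entry.noCache) return true;            // opted into revalidation
  if (entry.maxAge === null) return true;    // no explicit freshness lifetime
  return entry.ageSeconds >= entry.maxAge;   // stale once age exceeds max-age
}
```

The RYW failure mode is the first branch never firing: if nothing in the response (or the fetch call) requests revalidation, a fresh-but-wrong entry is served silently.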
stale-while-revalidate
SWR improves perceived latency by returning stale content while refreshing in the background. That is a trade-off explicitly favoring staleness for speed. It is a poor default for authorization-shaped or user-authored payloads unless paired with stronger invalidation or scoped URLs.
Anonymous shared caches (public) for personalized JSON
If responses vary by Cookie or Authorization, caching them as public is hazardous: you risk cross-user leakage at shared caches. Even when that disaster is avoided, you may still violate RYW if the cache ignores vary dimensions you forgot to declare. Prefer private for personalized reads, and still validate TTL policy against RYW.
Design patterns that restore RYW without throwing away the CDN
Pattern A: classify endpoints and default to non-cacheable user state
A pragmatic baseline:
- Static assets — Long `max-age` + fingerprinted filenames (content hash in the URL). Immutability gives you infinite caching without RYW issues because the URL changes when content changes.
- Public catalog content — Moderate TTL or SWR where the business accepts propagation delay.
- User-private reads — `Cache-Control: no-store`, or `private` with a very short `max-age` and explicit revalidation, plus `Vary: Cookie` only when unavoidable (prefer separate BFF routes over vary explosion).
This is blunt but correct. Many teams under-cache personalized JSON until they have invalidation wired—that is a reasonable engineering sequence.
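Centralizing this classification keeps individual handlers from drifting apart. A minimal sketch, assuming an Express-style `res` object; the class names and TTL values are illustrative defaults, not recommendations for any specific product:

```typescript
// One table mapping endpoint class → Cache-Control, set in one place.
type EndpointClass = "static-asset" | "public-catalog" | "user-private";

const CACHE_POLICY: Record<EndpointClass, string> = {
  // Fingerprinted URL: content change ⇒ URL change, so cache forever.
  "static-asset": "public, max-age=31536000, immutable",
  // Bounded staleness the business has explicitly accepted.
  "public-catalog": "public, s-maxage=300, stale-while-revalidate=600",
  // Default-safe for personalized JSON: shared caches never store it.
  "user-private": "private, max-age=0, must-revalidate",
};

function setCachePolicy(
  res: { setHeader(name: string, value: string): void },
  kind: EndpointClass
): void {
  res.setHeader("Cache-Control", CACHE_POLICY[kind]);
}
```

New endpoints then pick a class instead of hand-writing header strings, which is where most accidental `public` personalized responses come from.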
Pattern B: surrogate keys and admin/user-triggered purge
Fastly, Cloudflare (via its Cache-Tag analogue), and similar stacks support surrogate keys: response headers such as `Surrogate-Key: user-12345 profile` attached at the origin, then purged by key when a user updates their profile.
Mechanically:
- Origin sets `Surrogate-Key` (and often `Cache-Control: public, s-maxage=…` for edge TTL).
- On successful write, the origin (or an async worker) calls the CDN API to purge `user-12345` (or enqueues the purge with backoff).
RYW holds if purge latency is low relative to UI navigation, or if you combine purge with a short max-age so worst-case staleness is bounded.
Trade-offs:
- Purge APIs are operational dependencies (rate limits, partial failures).
- Key cardinality must be managed—per-row surrogate keys for high-cardinality tables can explode purge fan-out.
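The purge call itself is a small HTTP request to the CDN's API. A sketch of building one, following the shape of Fastly's purge-by-surrogate-key endpoint (`POST /service/{service_id}/purge/{key}` with a `Fastly-Key` auth header); verify the exact endpoint, auth, and soft-purge semantics against your vendor's documentation before relying on this:

```typescript
// Build (but do not send) a purge-by-surrogate-key request. Separating
// construction from I/O makes retry/backoff and testing straightforward.
function buildPurgeRequest(serviceId: string, surrogateKey: string, apiToken: string) {
  return {
    method: "POST" as const,
    url: `https://api.fastly.com/service/${serviceId}/purge/${encodeURIComponent(surrogateKey)}`,
    headers: {
      "Fastly-Key": apiToken,
      // Soft purge marks objects stale (revalidate on next hit) rather
      // than evicting them outright, smoothing origin load after a write.
      "Fastly-Soft-Purge": "1",
    },
  };
}

// On write success, typically fire-and-forget with retry:
// const p = buildPurgeRequest(SERVICE_ID, `user-${userId}`, FASTLY_TOKEN);
// await fetch(p.url, { method: p.method, headers: p.headers });
```

Because purge APIs can rate-limit or partially fail, enqueueing these requests with backoff (rather than calling inline on the write path) is usually the safer default.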
Pattern C: URL versioning / content-addressed reads for session scope
Instead of purging, change the cache key when the user mutates state. A classic pattern is a session generation counter or profile version incremented on write:
```http
GET /api/v1/me?rv=42
Cache-Control: public, max-age=60, stale-while-revalidate=120
Surrogate-Key: user-12345 rv-42
```
After a profile update, the client navigates to rv=43 (from the write response). The CDN treats it as a new object; no purge is required for that user’s next read. Old URLs may remain cached but are no longer referenced.
This maps cleanly to read models in CQRS: expose monotonic read model versions per aggregate or per user.
Trade-offs:
- Requires client discipline: the UI must persist the new revision token.
- Public shared links need different rules (you cannot leak revision tokens in URLs if they become shareable secrets).
Pattern D: Cache-Control: private, max-age=0, must-revalidate
For browser-heavy SPAs that must not show stale JSON in the back-forward cache, a conservative directive is:
```http
Cache-Control: private, max-age=0, must-revalidate
```
private keeps the response out of shared CDNs (unless you have enterprise features that respect private at the edge—verify your vendor semantics). must-revalidate forces revalidation once max-age expires (immediately, with max-age=0), shifting work to ETag handling at the origin.
This pattern favors correctness over edge offload. It is often the right stepping stone while you design surrogate keys.
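With this directive every reuse becomes a conditional request, so cheap 304s at the origin matter. A minimal sketch of the `If-None-Match` comparison (strong ETags only; real handling also covers `W/` weak validators per RFC 9110); the helper name is illustrative:

```typescript
// Decide between a full 200 response and a body-less 304, given the
// client's If-None-Match header and the resource's current ETag.
function conditionalStatus(
  ifNoneMatch: string | undefined,
  currentEtag: string
): 200 | 304 {
  if (!ifNoneMatch) return 200; // no validator sent → full response
  // If-None-Match may carry a comma-separated list of ETags, or "*".
  const candidates = ifNoneMatch.split(",").map((s) => s.trim());
  return candidates.includes("*") || candidates.includes(currentEtag) ? 304 : 200;
}
```

A handler would compute `currentEtag` from the row's revision (for example `"\"rv-42\""`), call this before serializing the body, and skip the database-to-JSON work entirely on a 304.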
Pattern E: POST-then-GET with Pragma / Cache-Control: no-store on the mutating navigation
For form posts that redirect to a GET page, legacy stacks used Pragma: no-cache to reduce history-cache issues. Modern SPAs should instead invalidate client caches (React Query, SWR, TanStack Query) on mutation success and refetch with cache: 'no-store' when needed.
The key idea: client libraries are caches too. If your mutation hook updates the server but not the normalized client cache, you can violate RYW without any CDN involved.
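The invariant mutation hooks must maintain: a successful write invalidates (or directly updates) every cached query that write affects. Libraries like TanStack Query expose this as `queryClient.invalidateQueries`; the Map-based cache below is a hand-rolled sketch of the same idea, for illustration only:

```typescript
// Minimal client-side query cache with prefix invalidation. Keys follow
// a "<entity>:<id>" convention chosen for this sketch.
class QueryCache {
  private entries = new Map<string, unknown>();

  get<T>(key: string): T | undefined {
    return this.entries.get(key) as T | undefined;
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, value);
  }

  // Drop every entry whose key starts with the prefix, e.g. "profile:",
  // so the next read refetches from the server.
  invalidatePrefix(prefix: string): void {
    for (const key of this.entries.keys()) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}

// Mutation flow: write to the server, then invalidate affected queries.
// const cache = new QueryCache();
// await patchProfile(data);            // hypothetical write call
// cache.invalidatePrefix("profile:");  // next read is a fresh fetch
```

Forgetting the invalidation step reproduces the CDN staleness bug entirely inside the browser, which is why it belongs in the mutation hook rather than in each calling component.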
Practical example: profile API with CDN and RYW
Imagine a Node handler behind Fastly. You want edge caching for GET /users/:id/profile for public fields, but editors must see their own updates immediately.
Write handler increments a profile_revision in the database and returns it:
```ts
// Express-style pseudocode — not exhaustive error handling
app.patch("/users/:id/profile", requireAuth, async (req, res) => {
  const userId = req.params.id;
  if (req.auth.sub !== userId) return res.status(403).end();
  await db.transaction(async (tx) => {
    await tx.profile.update({ where: { userId }, data: req.body });
    await tx.user.update({
      where: { id: userId },
      data: { profileRevision: { increment: 1 } },
    });
  });
  const user = await db.user.findUnique({
    where: { id: userId },
    select: { profileRevision: true },
  });
  res.setHeader("Surrogate-Key", `user-${userId}`);
  res.json({ ok: true, profileRevision: user!.profileRevision });
});
```
Read handler requires the client to pass the revision it intends to read; the CDN caches each revision independently:
```ts
app.get("/users/:id/profile", async (req, res) => {
  const userId = req.params.id;
  const rv = z.coerce.number().int().parse(req.query.rv); // z = zod
  const row = await db.profile.findUnique({ where: { userId } });
  if (!row) return res.status(404).end();
  // If client is behind, origin can optionally redirect or 400 with hint.
  if (row.revision !== rv) {
    return res.status(409).json({ error: "revision_mismatch", currentRevision: row.revision });
  }
  res.setHeader("Cache-Control", "public, s-maxage=300, stale-while-revalidate=600");
  res.setHeader("Surrogate-Key", `user-${userId} rv-${rv}`);
  res.json({ userId, bio: row.bio, avatarUrl: row.avatarUrl, revision: row.revision });
});
```
Client after a successful patch:
```ts
const patch = await fetch(`/users/${id}/profile`, { method: "PATCH", /* ... */ });
const { profileRevision } = await patch.json();
const doc = await fetch(`/users/${id}/profile?rv=${profileRevision}`, {
  headers: { Accept: "application/json" },
});
```
Why this works:
- The CDN’s cache key includes `rv`, so the post-write read is a cold miss and fetches fresh origin content.
- Old revisions can expire naturally via `s-maxage`, limiting storage pressure.
- Purge on write is optional here; surrogate keys still help if you broadcast invalidation when admin tools mutate profiles without bumping `rv` (you would fix that inconsistency at the business rule layer).
If you cannot thread rv through the client yet, a bridging tactic is Cache-Control: private, max-age=0 for GET /users/self/profile while keeping public caching for other users’ strictly public projections.
Common mistakes and pitfalls
- Treating `ETag` as magic — If clients never revalidate, `ETag` does not help. Verify actual traffic with CDN logs, not only origin unit tests.
- Fingerprinted JS but unfingerprinted JSON — Teams nail asset hashing, then cache `/api/me` aggressively. Uniform rules for cache key evolution matter more than any single layer “best practice.”
- Read replicas without replica lag awareness — After a write to the primary, a read routed to an async replica can return pre-write state. Mitigations include sticky primary reads for a short window, monotonic read tokens, or waiting for replication for critical screens. This is the database-side cousin of CDN staleness.
- GraphQL GET caching footguns — Query strings can be enormous; CDNs may normalize or truncate; caches may accidentally collapse distinct operations. Prefer `POST` for authenticated GraphQL unless you have a deliberate, audited HTTP caching design.
- Overusing `stale-while-revalidate` on identity — Great for marketing pages; risky for entitlements and account state unless paired with surrogate purges or URL versioning.
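The sticky-primary mitigation for replica lag is simple to sketch: after a write, route that session’s reads to the primary for a short window. The function and its default window are illustrative; in practice the window should exceed your observed p99 replication lag:

```typescript
// Database-side RYW: route reads to the primary briefly after a write so
// async replica lag cannot surface pre-write state for this session.
function pickReadTarget(
  lastWriteMs: number | null, // timestamp of this session's last write, if any
  nowMs: number,
  stickyWindowMs = 2000 // should exceed observed p99 replication lag
): "primary" | "replica" {
  if (lastWriteMs !== null && nowMs - lastWriteMs < stickyWindowMs) {
    return "primary";
  }
  return "replica";
}
```

The `lastWriteMs` value typically lives in the session (or a signed cookie), which is what makes the guarantee session-scoped rather than global.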
Conclusion
Read-your-writes consistency is the promise that a successful write reshapes the world the author sees on the next read. CDNs and HTTP caches are excellent at reducing latency and origin load, but their default contracts optimize for shared, time-shifted documents—not for per-session truth.
The durable fixes are boring and explicit: choose cache tiers per data class, prefer immutable URLs for static content, and for dynamic user state use private / short TTL, surrogate key purge, revision-tokenized read URLs, and client cache invalidation—often in combination with a small worst-case staleness budget. Getting this right is part of shipping production-ready web platforms: predictable behavior under load, fewer ghost bugs, and support channels that stay focused on real application defects rather than cache policy.
If you are designing a greenfield API surface or untangling edge behavior in an existing product, the about and contact pages outline how to reach out for architecture reviews and implementation support.