Change data capture for cache invalidation: PostgreSQL WAL, ordering, and eventual consistency
Invalidate Redis and CDN caches from database commits without dual writes. CDC from PostgreSQL WAL, Debezium-style relays, ordering traps, and production guardrails.
A product manager updates a pricing tier in PostgreSQL. Your API still serves the old price from Redis for thirty seconds—or worse, until someone manually flushes a key. You add DELETE FROM cache calls in the application layer, but a background job updates the same row through a different code path and forgets the invalidation. A deploy rolls out with a new field mapping; half your pods invalidate product:123 while the other half invalidate catalog:123. The database is correct; the edge is not.
This is the cache invalidation problem at scale: the moment you introduce a read-through or write-through cache between your source of truth and clients, every write path must remember to bust the right keys. In consulting work on multi-tenant SaaS APIs, the failure mode is rarely “we forgot caching exists.” It is inconsistent invalidation across services, jobs, and admin tools that all touch the same tables.
Change data capture (CDC) watches the database’s own change log—PostgreSQL’s write-ahead log (WAL)—and emits row-level events after commit. A dedicated consumer translates those events into cache deletes, search index updates, or read-model projections. The application stops being the only place that knows how to keep derived state fresh.
What CDC is (and what it is not)
CDC streams committed row changes (insert, update, delete) from a database to downstream systems. Implementations read the replication stream (logical decoding in PostgreSQL) rather than polling tables with updated_at triggers on every row.
CDC is not:
- A replacement for transactional outbox when you need arbitrary domain events with business semantics. CDC gives you row diffs; outbox gives you curated event payloads in the same transaction as the write.
- A guarantee of immediate consistency at the edge. Consumers are asynchronous; you still operate in an eventually consistent model with a bounded staleness window.
- Free of schema coupling. Downstream handlers must understand table and column names—or you must map them deliberately.
Used well, CDC is the backbone for invalidation: one pipeline, every writer, one set of rules.
Why application-level invalidation breaks down
Naive patterns look fine in a monolith:
await db.transaction(async (tx) => {
await tx.update(products).set({ priceCents: 1999 }).where(eq(products.id, id));
});
await redis.del(`product:${id}`);
Problems accumulate as the system grows:
Multiple writers
Admin consoles, ETL backfills, support tooling, and microservices may all mutate the same table. Each must duplicate invalidation logic—and integration tests rarely cover every path.
Partial failure
The DB commit succeeds; redis.del throws. You now serve stale data until TTL expires. Retrying del from the request handler does not help if the process crashed after commit.
Wrong granularity
Deleting product:123 is insufficient when you also cache category:42:products, search:facets, and a CDN URL keyed by content hash. Invalidation rules become a distributed graph, not a single key.
Thundering herds
Blind DEL on hot keys causes cache stampedes on the next wave of traffic. You may prefer version bumps, stale-while-revalidate, or tag-based invalidation—policy that belongs in one place.
CDC moves that policy into a consumer that sees every committed change, regardless of which service performed the write.
PostgreSQL CDC: logical decoding and the WAL
PostgreSQL durably records changes in the WAL before they are visible to readers. Logical replication decodes WAL records into row-level messages for subscribers.
A typical stack:
- Publication on selected tables (
CREATE PUBLICATION cdc_pub FOR TABLE products, categories;). - Replication slot so the server retains WAL until the consumer acknowledges progress.
- Connector (Debezium, PeerDB, Sequin, or a small custom reader using
pgoutput) that reads the slot and publishes to Kafka, Redis Streams, SQS, or an internal bus. - Invalidation worker that maps
(table, operation, key columns)→ cache actions.
Important properties:
- Events reflect committed transactions. Uncommitted writes do not leak.
- You receive before/after images for updates (depending on replica identity and configuration), which helps compute which keys to bust.
- Ordering is per transaction and generally per table partition in the stream—but not globally across all tables unless you design for it.
Operational note: replication slots hold WAL. A stalled consumer can fill the disk. Monitor lag bytes, slot age, and alert before the database is at risk.
Designing invalidation rules from row events
Treat each CDC event as input to a pure function: (event) => InvalidationPlan.
Key extraction
Derive cache keys from primary keys and foreign keys present in the row:
| Table change | Example invalidation |
|---|---|
products update | product:{id}, tenant:{tenant_id}:product:{id} |
products update (price) | above + category:{category_id}:listing |
categories rename | all products in category (tag bust) or version bump on category:{id} |
Prefer tag-based or namespace version keys when fan-out is large:
tenant:acme:catalog_version → increment on any catalog mutation
product:123@v{catalog_version}
Clients embed the version in keys or request headers; a single INCR invalidates an entire slice without deleting thousands of keys.
Deletes vs tombstones
On DELETE, remove keys or write tombstone placeholders with short TTL to prevent negative cache misses from repopulating stale reads from a lagging replica.
Idempotency
CDC is at-least-once. The same update may be delivered twice after crashes. Invalidation handlers must be idempotent: deleting an already absent key is fine; incrementing a version twice for one logical change is not—use transaction id or LSN deduplication.
type CdcEvent = {
lsn: string;
op: "c" | "u" | "d"; // create, update, delete
table: string;
before: Record<string, unknown> | null;
after: Record<string, unknown> | null;
};
const processedLsn = new Set<string>(); // Use Redis SET or DB in production
export function planInvalidation(event: CdcEvent): string[] {
if (processedLsn.has(event.lsn)) return [];
processedLsn.add(event.lsn);
if (event.table === "products" && event.after) {
const id = String(event.after.id);
const tenantId = String(event.after.tenant_id);
const categoryId = String(event.after.category_id);
return [
`product:${id}`,
`tenant:${tenantId}:product:${id}`,
`tenant:${tenantId}:category:${categoryId}:listing`,
];
}
if (event.table === "products" && event.op === "d" && event.before) {
const id = String(event.before.id);
return [`product:${id}`];
}
return [];
}
Keep this mapping in one module owned by the platform team, version-controlled, and covered by unit tests with fixture events—not scattered in ORM hooks.
Ordering, concurrency, and stale reads
CDC does not eliminate read-your-writes challenges by itself.
Consumer lag
If invalidation lags 500 ms behind commits, APIs that read from cache can return stale data for half a second after an update. Product requirements dictate whether that is acceptable or whether write path should also bump a session-scoped version (see the article on read-your-writes consistency).
Out-of-order delivery
Across partitions or multiple tables, events may arrive out of order relative to business causality. Mitigations:
- Per-aggregate sequencing: partition Kafka by
product_idso all product events serialize. - Monotonic versions on rows (
versioncolumn) and ignore older versions in the consumer. - Last-write-wins only when the business allows it.
Dual reads
A request that hits PostgreSQL replica A and cache filled from primary B can still see anomalies. CDC invalidation fixes cache vs primary drift; replica lag is a separate routing concern.
CDC vs alternatives
| Approach | Pros | Cons |
|---|---|---|
| App-level DEL | Simple in one service | Misses alternate writers; dual-write failure modes |
LISTEN/NOTIFY | Low latency, native PG | Payload limits; no persistence; easy to miss messages |
| Transactional outbox | Rich domain events; same-tx guarantee | Requires app code to insert outbox rows; relay to build |
| CDC (WAL) | Captures all writers; no app hooks | Row-level; schema coupling; slot ops burden |
| TTL only | Zero invalidation code | Stale windows; poor for fast-changing data |
Many production systems combine outbox for domain events with CDC for cache and search index hygiene, or use CDC alone when row-level granularity is enough.
Practical example: invalidation worker sketch
The following worker consumes normalized CDC events, deduplicates by LSN, executes Redis deletes, and records metrics. It is illustrative—wire your connector to Kafka, NATS, or a managed CDC product in production.
import { createClient } from "redis";
type InvalidationEvent = {
lsn: string;
table: string;
op: "c" | "u" | "d";
before: Record<string, unknown> | null;
after: Record<string, unknown> | null;
};
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
const DEDUPE_TTL_SEC = 86_400;
async function alreadyProcessed(lsn: string): Promise<boolean> {
const key = `cdc:seen:${lsn}`;
const set = await redis.set(key, "1", { NX: true, EX: DEDUPE_TTL_SEC });
return set === null;
}
function keysForProduct(row: Record<string, unknown>): string[] {
const id = String(row.id);
const tenantId = String(row.tenant_id);
const categoryId = String(row.category_id);
return [
`product:${id}`,
`tenant:${tenantId}:product:${id}`,
`tenant:${tenantId}:category:${categoryId}:listing`,
];
}
export async function handleInvalidation(event: InvalidationEvent): Promise<void> {
if (await alreadyProcessed(event.lsn)) return;
let keys: string[] = [];
if (event.table === "products") {
if (event.op === "d" && event.before) {
keys = keysForProduct(event.before);
} else if (event.after) {
keys = keysForProduct(event.after);
}
}
if (event.table === "categories" && event.after) {
const tenantId = String(event.after.tenant_id);
const categoryId = String(event.after.id);
keys.push(`tenant:${tenantId}:category:${categoryId}:listing`);
// Tag-style: bump catalog version instead of scanning all product keys
await redis.incr(`tenant:${tenantId}:catalog_version`);
}
if (keys.length > 0) {
await redis.del(keys);
}
// metrics: invalidation_keys_total, lag_ms, etc.
}
Pair this with single-flight or request coalescing on hot keys if deletes trigger stampedes—patterns covered in request coalescing.
Operating CDC in production
Slot and WAL monitoring
Alert on pg_replication_slots lag, disk usage, and consumer heartbeat. A broken deploy that stops the consumer is a database availability incident waiting to happen.
Schema migrations
ALTER TABLE can break logical decoding or change message shapes. Use expand-contract migrations and coordinate connector schema versions with deploy order—the same discipline as zero-downtime column renames.
Backfills
Historical imports generate a firehose of events. Pause invalidation or switch to bulk version bump during large backfills to avoid Redis CPU spikes.
Security
WAL streams contain row data. Treat the bus as sensitive: encrypt in transit, restrict ACLs, and avoid logging full payloads in cleartext.
Testing
Maintain a fixture library of Debezium JSON events for representative inserts, updates, deletes, and TOAST-heavy columns. Contract-test the invalidation planner without a live database.
Common mistakes and pitfalls
- Invalidating only from the happy-path API while migrations, support scripts, and async jobs write directly to the database. CDC exists precisely to close that gap—if you still skip alternate writers, you have not finished the design.
- Ignoring replication slot lag until disk fills. Monitor lag as a first-class SLO, not a dashboard afterthought.
- Assuming CDC means strong consistency. Clients can still read stale cache until the consumer runs. Set TTL as a safety net and document expected staleness.
- Fan-out DELETE storms on wide entities. Prefer version tokens or cache tags when one row maps to thousands of keys.
- Processing events without deduplication after consumer restarts, causing redundant work or double version increments.
- Coupling business logic to raw table shapes without a mapping layer. When you rename columns, you want one place to update—not twelve microservices.
Conclusion
Cache invalidation stops being a game of “remember to call Redis” when you derive bust lists from the committed truth in PostgreSQL. CDC turns WAL entries into a reliable, writer-agnostic event stream; your job is to implement clear key rules, idempotent consumers, and operational guardrails around replication slots and lag.
The pattern pairs naturally with work on transactional outbox messaging, read-your-writes routing, and expand-contract migrations—the same ingredients teams need when moving from a single-service cache to a production-grade, multi-writer data plane.
If you are designing or hardening cache layers, search indexes, or read models around PostgreSQL, it helps to treat invalidation as infrastructure, not an afterthought in each handler. For background on how this work fits into larger engagements, see About; for collaboration or inquiries, Contact.
Newsletter abonnieren
E-Mail erhalten, wenn neue Artikel erscheinen. Kein Spam — nur neue Beiträge von diesem Blog.
Über Resend. Abmeldung in jeder E-Mail möglich.