Change data capture for cache invalidation: PostgreSQL WAL, ordering, and eventual consistency

Invalidate Redis and CDN caches from database commits without dual writes. CDC from PostgreSQL WAL, Debezium-style relays, ordering traps, and production guardrails.

Autor: Matheus Palma10 min de leitura
Software engineeringArchitectureBackendPostgreSQLRedisReliabilityCaching

A product manager updates a pricing tier in PostgreSQL. Your API still serves the old price from Redis for thirty seconds—or worse, until someone manually flushes a key. You add DELETE FROM cache calls in the application layer, but a background job updates the same row through a different code path and forgets the invalidation. A deploy rolls out with a new field mapping; half your pods invalidate product:123 while the other half invalidate catalog:123. The database is correct; the edge is not.

This is the cache invalidation problem at scale: the moment you introduce a read-through or write-through cache between your source of truth and clients, every write path must remember to bust the right keys. In consulting work on multi-tenant SaaS APIs, the failure mode is rarely “we forgot caching exists.” It is inconsistent invalidation across services, jobs, and admin tools that all touch the same tables.

Change data capture (CDC) watches the database’s own change log—PostgreSQL’s write-ahead log (WAL)—and emits row-level events after commit. A dedicated consumer translates those events into cache deletes, search index updates, or read-model projections. The application stops being the only place that knows how to keep derived state fresh.

What CDC is (and what it is not)

CDC streams committed row changes (insert, update, delete) from a database to downstream systems. Implementations read the replication stream (logical decoding in PostgreSQL) rather than polling tables with updated_at triggers on every row.

CDC is not:

  • A replacement for transactional outbox when you need arbitrary domain events with business semantics. CDC gives you row diffs; outbox gives you curated event payloads in the same transaction as the write.
  • A guarantee of immediate consistency at the edge. Consumers are asynchronous; you still operate in an eventually consistent model with a bounded staleness window.
  • Free of schema coupling. Downstream handlers must understand table and column names—or you must map them deliberately.

Used well, CDC is the backbone for invalidation: one pipeline, every writer, one set of rules.

Why application-level invalidation breaks down

Naive patterns look fine in a monolith:

await db.transaction(async (tx) => {
  await tx.update(products).set({ priceCents: 1999 }).where(eq(products.id, id));
});
await redis.del(`product:${id}`);

Problems accumulate as the system grows:

Multiple writers

Admin consoles, ETL backfills, support tooling, and microservices may all mutate the same table. Each must duplicate invalidation logic—and integration tests rarely cover every path.

Partial failure

The DB commit succeeds; redis.del throws. You now serve stale data until TTL expires. Retrying del from the request handler does not help if the process crashed after commit.

Wrong granularity

Deleting product:123 is insufficient when you also cache category:42:products, search:facets, and a CDN URL keyed by content hash. Invalidation rules become a distributed graph, not a single key.

Thundering herds

Blind DEL on hot keys causes cache stampedes on the next wave of traffic. You may prefer version bumps, stale-while-revalidate, or tag-based invalidation—policy that belongs in one place.

CDC moves that policy into a consumer that sees every committed change, regardless of which service performed the write.

PostgreSQL CDC: logical decoding and the WAL

PostgreSQL durably records changes in the WAL before they are visible to readers. Logical replication decodes WAL records into row-level messages for subscribers.

A typical stack:

  1. Publication on selected tables (CREATE PUBLICATION cdc_pub FOR TABLE products, categories;).
  2. Replication slot so the server retains WAL until the consumer acknowledges progress.
  3. Connector (Debezium, PeerDB, Sequin, or a small custom reader using pgoutput) that reads the slot and publishes to Kafka, Redis Streams, SQS, or an internal bus.
  4. Invalidation worker that maps (table, operation, key columns) → cache actions.

Important properties:

  • Events reflect committed transactions. Uncommitted writes do not leak.
  • You receive before/after images for updates (depending on replica identity and configuration), which helps compute which keys to bust.
  • Ordering is per transaction and generally per table partition in the stream—but not globally across all tables unless you design for it.

Operational note: replication slots hold WAL. A stalled consumer can fill the disk. Monitor lag bytes, slot age, and alert before the database is at risk.

Designing invalidation rules from row events

Treat each CDC event as input to a pure function: (event) => InvalidationPlan.

Key extraction

Derive cache keys from primary keys and foreign keys present in the row:

Table changeExample invalidation
products updateproduct:{id}, tenant:{tenant_id}:product:{id}
products update (price)above + category:{category_id}:listing
categories renameall products in category (tag bust) or version bump on category:{id}

Prefer tag-based or namespace version keys when fan-out is large:

tenant:acme:catalog_version  →  increment on any catalog mutation
product:123@v{catalog_version}

Clients embed the version in keys or request headers; a single INCR invalidates an entire slice without deleting thousands of keys.

Deletes vs tombstones

On DELETE, remove keys or write tombstone placeholders with short TTL to prevent negative cache misses from repopulating stale reads from a lagging replica.

Idempotency

CDC is at-least-once. The same update may be delivered twice after crashes. Invalidation handlers must be idempotent: deleting an already absent key is fine; incrementing a version twice for one logical change is not—use transaction id or LSN deduplication.

type CdcEvent = {
  lsn: string;
  op: "c" | "u" | "d"; // create, update, delete
  table: string;
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
};

const processedLsn = new Set<string>(); // Use Redis SET or DB in production

export function planInvalidation(event: CdcEvent): string[] {
  if (processedLsn.has(event.lsn)) return [];
  processedLsn.add(event.lsn);

  if (event.table === "products" && event.after) {
    const id = String(event.after.id);
    const tenantId = String(event.after.tenant_id);
    const categoryId = String(event.after.category_id);
    return [
      `product:${id}`,
      `tenant:${tenantId}:product:${id}`,
      `tenant:${tenantId}:category:${categoryId}:listing`,
    ];
  }

  if (event.table === "products" && event.op === "d" && event.before) {
    const id = String(event.before.id);
    return [`product:${id}`];
  }

  return [];
}

Keep this mapping in one module owned by the platform team, version-controlled, and covered by unit tests with fixture events—not scattered in ORM hooks.

Ordering, concurrency, and stale reads

CDC does not eliminate read-your-writes challenges by itself.

Consumer lag

If invalidation lags 500 ms behind commits, APIs that read from cache can return stale data for half a second after an update. Product requirements dictate whether that is acceptable or whether write path should also bump a session-scoped version (see the article on read-your-writes consistency).

Out-of-order delivery

Across partitions or multiple tables, events may arrive out of order relative to business causality. Mitigations:

  • Per-aggregate sequencing: partition Kafka by product_id so all product events serialize.
  • Monotonic versions on rows (version column) and ignore older versions in the consumer.
  • Last-write-wins only when the business allows it.

Dual reads

A request that hits PostgreSQL replica A and cache filled from primary B can still see anomalies. CDC invalidation fixes cache vs primary drift; replica lag is a separate routing concern.

CDC vs alternatives

ApproachProsCons
App-level DELSimple in one serviceMisses alternate writers; dual-write failure modes
LISTEN/NOTIFYLow latency, native PGPayload limits; no persistence; easy to miss messages
Transactional outboxRich domain events; same-tx guaranteeRequires app code to insert outbox rows; relay to build
CDC (WAL)Captures all writers; no app hooksRow-level; schema coupling; slot ops burden
TTL onlyZero invalidation codeStale windows; poor for fast-changing data

Many production systems combine outbox for domain events with CDC for cache and search index hygiene, or use CDC alone when row-level granularity is enough.

Practical example: invalidation worker sketch

The following worker consumes normalized CDC events, deduplicates by LSN, executes Redis deletes, and records metrics. It is illustrative—wire your connector to Kafka, NATS, or a managed CDC product in production.

import { createClient } from "redis";

type InvalidationEvent = {
  lsn: string;
  table: string;
  op: "c" | "u" | "d";
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
};

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const DEDUPE_TTL_SEC = 86_400;

async function alreadyProcessed(lsn: string): Promise<boolean> {
  const key = `cdc:seen:${lsn}`;
  const set = await redis.set(key, "1", { NX: true, EX: DEDUPE_TTL_SEC });
  return set === null;
}

function keysForProduct(row: Record<string, unknown>): string[] {
  const id = String(row.id);
  const tenantId = String(row.tenant_id);
  const categoryId = String(row.category_id);
  return [
    `product:${id}`,
    `tenant:${tenantId}:product:${id}`,
    `tenant:${tenantId}:category:${categoryId}:listing`,
  ];
}

export async function handleInvalidation(event: InvalidationEvent): Promise<void> {
  if (await alreadyProcessed(event.lsn)) return;

  let keys: string[] = [];

  if (event.table === "products") {
    if (event.op === "d" && event.before) {
      keys = keysForProduct(event.before);
    } else if (event.after) {
      keys = keysForProduct(event.after);
    }
  }

  if (event.table === "categories" && event.after) {
    const tenantId = String(event.after.tenant_id);
    const categoryId = String(event.after.id);
    keys.push(`tenant:${tenantId}:category:${categoryId}:listing`);
    // Tag-style: bump catalog version instead of scanning all product keys
    await redis.incr(`tenant:${tenantId}:catalog_version`);
  }

  if (keys.length > 0) {
    await redis.del(keys);
  }

  // metrics: invalidation_keys_total, lag_ms, etc.
}

Pair this with single-flight or request coalescing on hot keys if deletes trigger stampedes—patterns covered in request coalescing.

Operating CDC in production

Slot and WAL monitoring

Alert on pg_replication_slots lag, disk usage, and consumer heartbeat. A broken deploy that stops the consumer is a database availability incident waiting to happen.

Schema migrations

ALTER TABLE can break logical decoding or change message shapes. Use expand-contract migrations and coordinate connector schema versions with deploy order—the same discipline as zero-downtime column renames.

Backfills

Historical imports generate a firehose of events. Pause invalidation or switch to bulk version bump during large backfills to avoid Redis CPU spikes.

Security

WAL streams contain row data. Treat the bus as sensitive: encrypt in transit, restrict ACLs, and avoid logging full payloads in cleartext.

Testing

Maintain a fixture library of Debezium JSON events for representative inserts, updates, deletes, and TOAST-heavy columns. Contract-test the invalidation planner without a live database.

Common mistakes and pitfalls

  • Invalidating only from the happy-path API while migrations, support scripts, and async jobs write directly to the database. CDC exists precisely to close that gap—if you still skip alternate writers, you have not finished the design.
  • Ignoring replication slot lag until disk fills. Monitor lag as a first-class SLO, not a dashboard afterthought.
  • Assuming CDC means strong consistency. Clients can still read stale cache until the consumer runs. Set TTL as a safety net and document expected staleness.
  • Fan-out DELETE storms on wide entities. Prefer version tokens or cache tags when one row maps to thousands of keys.
  • Processing events without deduplication after consumer restarts, causing redundant work or double version increments.
  • Coupling business logic to raw table shapes without a mapping layer. When you rename columns, you want one place to update—not twelve microservices.

Conclusion

Cache invalidation stops being a game of “remember to call Redis” when you derive bust lists from the committed truth in PostgreSQL. CDC turns WAL entries into a reliable, writer-agnostic event stream; your job is to implement clear key rules, idempotent consumers, and operational guardrails around replication slots and lag.

The pattern pairs naturally with work on transactional outbox messaging, read-your-writes routing, and expand-contract migrations—the same ingredients teams need when moving from a single-service cache to a production-grade, multi-writer data plane.

If you are designing or hardening cache layers, search indexes, or read models around PostgreSQL, it helps to treat invalidation as infrastructure, not an afterthought in each handler. For background on how this work fits into larger engagements, see About; for collaboration or inquiries, Contact.

Assine a newsletter

Receba um e-mail quando novos artigos forem publicados. Sem spam — apenas novos posts deste blog.

Via Resend. Você pode cancelar a inscrição em qualquer e-mail.