Change data capture for reactive backends: cache invalidation, read models, and operational reality

Use CDC to turn database writes into durable change streams for cache busting and derived views. WAL semantics, consumer design, backfills, and when simpler patterns still win.

Autor: Matheus Palma18. Juni 202610 Min. Lesezeit

Software engineeringArchitectureBackendPostgreSQLCachingDistributed systemsReliability

A product manager updates pricing in the admin panel. The write commits in PostgreSQL; the API still serves yesterday's numbers from Redis for thirty seconds because your TTL was a guess. You shorten the TTL and watch database load climb. You add LISTEN/NOTIFY and discover missed notifications after deploys. You wire an outbox table and a worker, but cache invalidation still requires every write path to remember to enqueue the right event—and a bulk import from a forgotten script bypasses application code entirely.

Change data capture (CDC) shifts the source of truth for "something changed" from application discipline to the database's own change log. Instead of hoping every code path emits the right signal, you consume a stream of row-level inserts, updates, and deletes and react: invalidate cache keys, refresh materialized views, project read models, or fan out to search indexes. In consulting work on multi-tenant SaaS and read-heavy APIs, CDC is the pattern I reach for when freshness must track writes without coupling every feature team to a bespoke invalidation checklist.

This article explains how CDC works in production, how it differs from adjacent patterns, and how to build consumers that stay correct under retries, schema changes, and backfills.

What CDC is—and what it is not

Change data capture is the practice of extracting ordered, durable change events from a datastore and delivering them to downstream systems.

Typical event shape (conceptual):

{
  "op": "u",
  "table": "products",
  "before": { "id": "p_42", "price_cents": 1999, "version": 7 },
  "after": { "id": "p_42", "price_cents": 2499, "version": 8 },
  "source": { "lsn": "0/16B3748", "ts_ms": 1718712000123 }
}

CDC is not:

Application event sourcing, where domain events are the primary write model. CDC observes physical row changes; it does not know user intent ("price changed because of promotion" vs "manual correction").
A replacement for the transactional outbox when you need rich domain payloads. CDC carries what the row looks like, not necessarily what the business meant. Many systems use both: outbox for workflow integration, CDC for projections and cache coherence.
LISTEN/NOTIFY at scale. Postgres NOTIFY is ephemeral and payload-limited; CDC reads the write-ahead log (WAL) or equivalent and can replay from a position after outages.

The value proposition is decoupling: writers stay simple (update rows); reactors subscribe to truth as it lands in storage.

Capture mechanisms and trade-offs

Log-based CDC (WAL, binlog, oplog)

Tools such as Debezium (Kafka Connect), AWS DMS, or managed logical replication slots read the database's replication stream.

Strengths:

Captures every committed change, including migrations, admin scripts, and services that bypass your API layer.
Durable offsets (LSN in Postgres, binlog position in MySQL) enable replay and at-least-once recovery.
Natural fit for event-driven architectures already running Kafka or similar.

Costs:

Operational complexity: connectors, schema registry, monitoring lag, slot disk retention.
Schema evolution must be managed—adding a NOT NULL column without a default can break consumers expecting old shapes.
Postgres logical replication slots hold WAL until consumed; a stalled consumer can fill disk and halt writes.

Trigger- or table-based CDC

A trigger writes each change into an audit or change_log table; a poller or LISTEN/NOTIFY fan-out processes it.

Strengths: simpler mental model, full control over payload shape, works when managed cloud DBs restrict logical replication.

Weaknesses: trigger overhead on hot tables, another write path to maintain, and you must ensure the log table itself is durable and indexed for polling.

Query-based (polling timestamps)

SELECT * FROM products WHERE updated_at > $cursor on an interval.

Strengths: trivial to prototype.

Weaknesses: misses deletes unless you soft-delete, struggles with clock skew, and creates read amplification. Fine for low-volume reporting; poor for cache invalidation at scale.

For production reactive backends on Postgres, log-based CDC is the default recommendation when infrastructure allows it. Trigger-based logs are a reasonable stepping stone; polling updated_at is a stopgap with known holes.

Designing consumers for cache invalidation

Cache invalidation via CDC is not "delete everything on any write." You want targeted, idempotent reactions keyed by what actually changed.

Map table changes to cache key namespaces

Define a invalidation map from (table, changed columns) to key patterns:

Table change	Invalidate
`products` row update (`price_cents`, `title`)	`product:{id}`, `catalog:tenant:{tenant_id}`
`order_items` insert	`order:{order_id}`, `cart_summary:{user_id}`
`tenants` settings update	`tenant_config:{tenant_id}` (broad but rare)

Column-level filtering avoids busting an entire product cache when only last_viewed_at changes.

Version stamps beat blind deletes

When possible, publish a monotonic version per aggregate alongside invalidation:

type InvalidationHint = {
  tenantId: string;
  resource: "product";
  resourceId: string;
  version: number;
};

function cacheKey(hint: InvalidationHint): string {
  return `product:${hint.tenantId}:${hint.resourceId}:v${hint.version}`;
}

Readers compare cached version to the version in the CDC event (or a lightweight metadata row). Stale entries self-identify without a global flush. This pattern pairs well with the read-your-writes techniques covered elsewhere: routing tokens and version checks converge faster than TTL alone.

Idempotency and ordering

CDC delivers at-least-once. The same update may be replayed after a consumer crash. Invalidation handlers must be idempotent: deleting a Redis key twice is fine; decrementing a counter twice is not.

Ordering matters per aggregate, not globally. Two unrelated product updates can be processed concurrently; two updates to the same product_id should be applied in log order (or use last-write-wins with a version column you trust).

Debouncing and coalescing

A bulk import updating 50,000 rows can emit 50,000 events. Coalesce in the consumer:

Buffer events per product_id for 50–200 ms and keep only the latest version.
Or invalidate segment keys (catalog:tenant:acme) once per batch window instead of per row.

The right window depends on freshness SLOs. A pricing page may tolerate 500 ms; a trading ledger does not.

Derived read models and projections

CDC shines when read paths need a shape different from normalized writes.

Example: a dashboard showing orders_count and revenue_cents per tenant. Maintaining counters in the write transaction works but couples hot paths. A projection consumer applies:

on orders INSERT where status = 'paid':
  increment tenant_stats.revenue_cents by amount
  increment tenant_stats.orders_count by 1

The write path stays a single INSERT INTO orders; the projector catches up asynchronously. Lag becomes a metric (projection_lag_seconds) you alert on, not a hidden correctness bug.

Backfill strategy: when you add a new projection table, snapshot current state with a batch query, then start the CDC consumer from a known LSN after the snapshot transaction commits—or use a two-phase cutover documented in your runbook. Starting the consumer before the snapshot finishes produces gaps or double-counts unless you reconcile.

Placement in the architecture

A typical flow:

┌─────────────┐     WAL / logical repl     ┌──────────────┐
│  PostgreSQL │ ─────────────────────────► │ CDC connector │
└─────────────┘                          └──────┬───────┘
                                                │
                     ┌──────────────────────────┼──────────────────────────┐
                     ▼                          ▼                          ▼
              ┌─────────────┐           ┌─────────────┐            ┌─────────────┐
              │ Kafka topic │           │ Cache       │            │ Search /    │
              │ (optional)  │           │ invalidator │            │ analytics   │
              └─────────────┘           └─────────────┘            └─────────────┘

You do not always need Kafka. Smaller teams run the connector directly into a worker process that invalidates Redis and updates projections—fewer moving parts, but less buffer during spikes.

Practical example: TypeScript consumer with coalescing

The following sketch processes Debezium-style change events, filters relevant columns, coalesces per product, and invalidates Redis keys. It assumes events arrive on a queue you already operate.

import { createClient } from "redis";

type CdcEvent = {
  op: "c" | "u" | "d";
  table: string;
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
};

const PRICE_FIELDS = new Set(["price_cents", "title", "status"]);
const COALESCE_MS = 100;

const pending = new Map<string, { tenantId: string; productId: string; version: number }>();
let flushTimer: NodeJS.Timeout | null = null;

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

function scheduleFlush() {
  if (flushTimer) return;
  flushTimer = setTimeout(flushPending, COALESCE_MS);
}

function relevantProductChange(event: CdcEvent): boolean {
  if (event.table !== "products") return false;
  if (event.op === "d") return true;
  if (event.op === "c") return true;
  const before = event.before ?? {};
  const after = event.after ?? {};
  return [...PRICE_FIELDS].some((f) => before[f] !== after[f]);
}

export async function handleCdcEvent(event: CdcEvent): Promise<void> {
  if (!relevantProductChange(event)) return;

  const row = event.op === "d" ? event.before : event.after;
  if (!row) return;

  const tenantId = String(row.tenant_id);
  const productId = String(row.id);
  const version = Number(row.version ?? 0);
  const key = `${tenantId}:${productId}`;

  const existing = pending.get(key);
  if (!existing || version >= existing.version) {
    pending.set(key, { tenantId, productId, version });
  }
  scheduleFlush();
}

async function flushPending(): Promise<void> {
  flushTimer = null;
  const batch = [...pending.values()];
  pending.clear();

  const pipeline = redis.multi();
  const tenantSegments = new Set<string>();

  for (const item of batch) {
    pipeline.del(`product:${item.tenantId}:${item.productId}`);
    tenantSegments.add(`catalog:${item.tenantId}`);
  }

  for (const segment of tenantSegments) {
    pipeline.incr(`${segment}:gen`);
  }

  await pipeline.exec();
}

Notes on this design:

Column filter avoids invalidation noise.
Coalescing collapses bursty imports.
Segment generation counters (catalog:tenant:gen) let edge caches check freshness without listing every product key—useful when CDN or in-process caches key off tenant catalogs.
Handlers remain idempotent: repeated events only delete keys that are already gone.

Wire this behind your queue's consumer group with manual offset commits after successful flush, or accept duplicate invalidations (safe here) with auto-commit.

Operational concerns you cannot skip

Lag and alerting

Track end-to-end lag: event.source.ts_ms to processing time. Alert when lag exceeds product thresholds (for example, p95 > 5 s for catalog pages). Users experience lag as "the bug is still there" even when the pipeline is technically healthy but slow.

Schema changes

Use expand-contract migrations: add nullable columns first, deploy consumers that tolerate both shapes, backfill, then enforce NOT NULL and remove old fields. CDC will emit before/after images; consumers breaking on unknown columns is a common outage during rapid schema iteration.

Deletes and tombstones

Hard deletes emit op: "d" with before populated. Soft deletes look like updates (deleted_at set). Your invalidation map must handle both. Search indexes often need explicit tombstone events to remove documents.

Security and PII

CDC streams replicate row contents. Restrict topic access, encrypt at rest, and scrub fields in the connector when regulations require it. A cache invalidator rarely needs full row payloads—a compact { table, id, version } envelope is safer.

When CDC is overkill

Skip CDC when:

Traffic is low and short TTLs plus occasional stale reads are acceptable.
You have one monolithic service and can invalidate in-process after every write.
You need rich domain events for workflows—use the transactional outbox instead or in addition.
Your team cannot operate replication slots and monitor WAL retention yet—fix operational readiness first or use trigger-based change logs as training wheels.

Common mistakes and pitfalls

Treating CDC as exactly-once. Connectors and consumers replay. Design for at-least-once with idempotent side effects.
Invalidating without column filters. Every UPDATE touching updated_at on a wide table floods Redis and negates cache benefits.
Ignoring bulk operations. Imports and UPDATE ... WHERE tenant_id = $1 generate event storms; coalesce or pause projections during maintenance windows.
Starting consumers before backfill completes. New projections show partial state forever unless you reconcile.
Letting replication slots stall. A forgotten dev connector against production can halt writes when WAL disk fills. Automate slot lag alerts and ownership tags.
Coupling business workflows to raw CDC. Row-level events are poor substitutes for "OrderShipped" domain semantics—keep workflow integration on the outbox or explicit domain events.

Conclusion

Change data capture turns the database into a durable broadcast channel for everything that actually committed—scripts, admin tools, and microservices included. Used well, it gives you targeted cache invalidation and maintainable read models without scattering invalidation calls across every write path. Used poorly, it becomes another distributed pipeline you do not monitor until pricing is wrong in production.

The durable ingredients are the same as for any reactive system: idempotent consumers, explicit freshness metrics, schema discipline, and a clear map from table changes to cache and projection boundaries. If you are designing read-heavy backends or untangling stale-cache incidents, it helps to choose capture mechanics and consumer semantics deliberately rather than copying a Kafka diagram from a conference talk.

For more on how this site approaches production engineering, see About; for collaboration on scalable systems, Contact.

E-Mail erhalten, wenn neue Artikel erscheinen. Kein Spam — nur neue Beiträge von diesem Blog.

Über Resend. Abmeldung in jeder E-Mail möglich.