Change data capture for event-driven backends: PostgreSQL, the WAL, and production trade-offs

How log-based CDC turns database writes into reliable integration events, when it beats polling or the transactional outbox relay, and what breaks in production if you skip schema discipline.

Autor: Matheus Palma11 Min. Lesezeit
Software engineeringArchitectureBackendPostgreSQLDistributed systemsMessagingReliability

Your analytics pipeline is six hours behind. Search indexes drift from the source of truth. A downstream billing service missed a subscription upgrade because the nightly batch job failed silently. The common thread is not “we need more cron jobs.” It is that state changed in PostgreSQL, but other systems learned about it too late, too indirectly, or not at all.

Change data capture (CDC) reads the database’s own replication log—the write-ahead log (WAL) in PostgreSQL—and emits a stream of row-level changes (INSERT, UPDATE, DELETE) as events. Instead of application code remembering to publish after every mutation, the database becomes the source of truth for what changed, and a connector turns those changes into messages your event bus can consume.

In consulting and product work, CDC shows up when teams outgrow ad hoc triggers, brittle polling, or hand-rolled outbox relays that add latency and database load. This article explains how log-based CDC works, how it relates to the transactional outbox, where it shines, and where it creates operational debt if you treat it as magic plumbing.

What CDC is—and what it is not

CDC is the practice of observing data mutations and making them available to external consumers in near real time. Implementations fall into three families:

ApproachMechanismLatencyDB loadTypical use
Query-basedSELECT ... WHERE updated_at > $cursor on a scheduleSeconds to minutesHigh (repeated scans)Legacy systems, low volume
Trigger-basedAFTER INSERT/UPDATE/DELETE writes to a side tableLowMedium (per-row trigger cost)Small domains, strict control
Log-basedRead WAL / binlog / redo log via replication protocolSub-second to low secondsLow (sequential read)Production event pipelines

This article focuses on log-based CDC, because that is what teams adopt when they need durable, ordered, low-overhead change streams at scale.

CDC is not:

  • A replacement for domain events with rich semantics. A WAL row change is a technical fact (“users.email changed from A to B”), not necessarily a business fact (“UserEmailChanged because the user verified a new address”).
  • A guarantee of exactly-once processing end to end. You get at-least-once from the log; consumers still need idempotency.
  • Free from schema coupling. Downstream services will deserialize payloads; sloppy schema evolution breaks them the same way API breaking changes do.

Understanding those boundaries keeps CDC in the right layer of your architecture: integration and projection, not domain modeling—unless you deliberately design for it.

How log-based CDC works in PostgreSQL

PostgreSQL durably records every committed change in the WAL before it applies to heap pages. Standby replicas consume the WAL through physical replication. Logical decoding reinterprets that stream as logical change records: which table, which operation, which column values (subject to replica identity settings).

At a high level:

Application ──► PostgreSQL (commit) ──► WAL
                                           │
                                           ▼
                              Logical decoding plugin
                                           │
                                           ▼
                              CDC connector (e.g. Debezium)
                                           │
                                           ▼
                              Kafka / Kinesis / Pub/Sub
                                           │
                                           ▼
                         Search index, warehouse, cache, microservice

The application does not call publish() on the hot path. The connector maintains a replication slot, reads decoded changes, serializes them (often as JSON or Avro), and forwards them to a message broker. If the connector falls behind, PostgreSQL retains WAL segments for that slot—so monitoring slot lag is as important as monitoring consumer lag.

Replica identity and what downstream actually sees

For UPDATE and DELETE, PostgreSQL must know which row changed and, for some consumers, what the before image looked like. That is controlled by REPLICA IDENTITY:

  • DEFAULT (primary key): sufficient for most tables with a PK.
  • FULL: includes all columns in the old row—needed when no PK exists or when you must diff wide rows without one.
  • USING INDEX: niche cases with a unique index instead of a PK.

If you omit the right replica identity, your CDC stream will contain updates with incomplete key material, and idempotent upserts downstream will fail or corrupt data silently.

Ordering and partitioning

Changes arrive in commit order on a single PostgreSQL instance. That is not global ordering across shards or multiple databases. When you publish to Kafka, you typically key messages by primary key so all changes for one entity land in one partition and preserve per-entity order.

Cross-table workflows (“order paid” involves orders and payments) still require correlation in application logic or outbox-style domain events—CDC alone will not atomically bundle two table commits into one message unless they share a transaction and you build that semantics layer yourself.

CDC versus the transactional outbox

These patterns overlap in goal—reliable downstream notification—but differ in shape. The transactional outbox stores explicit domain events in an outbox table inside the same transaction as business writes; a relay polls or CDC-reads that table and publishes.

DimensionTransactional outboxFull-table CDC
Event shapeCurated JSON you designRaw row images
NoiseOnly what you insertEvery column change, including backfills
CouplingApplication owns event schemaDatabase schema is the contract
Operational pathPoll or CDC on one tableCDC on many tables
Best forDomain events, integration boundariesProjections, analytics, search sync, cache invalidation

In practice, mature systems combine both: outbox (or application-emitted events) for bounded context boundaries, and CDC for read models, warehouses, and operational mirrors where seeing every column change is acceptable or desirable.

When a client asked me to reduce search index lag from minutes to seconds, we kept domain events on the outbox for billing and notifications, and added CDC on catalog tables so the search projection could rebuild from the log without custom publish hooks in every code path. The split kept business semantics clean while removing N+1 “remember to emit” failure modes in CRUD-heavy areas.

Connector topology and delivery semantics

Debezium is the de facto open-source stack for PostgreSQL CDC into Kafka. Alternatives (PeerDB, Estuary, managed offerings from cloud vendors) differ in ops model, but the invariants are the same:

  1. Replication slot on PostgreSQL—one per logical consumer group you cannot share blindly.
  2. Schema history topic (Debezium) or equivalent—so consumers know how to decode payload across migrations.
  3. At-least-once delivery to the broker; duplicates on restart or network retry.
  4. Offset / LSN tracking—the connector resumes from the last acknowledged log position.

Delivery guarantees stop at the broker unless you implement idempotent consumers (natural keys, upsert by PK, or deduplication stores). Treat every handler as safe under redelivery.

Initial snapshot versus streaming

First deploy usually requires a baseline snapshot of existing rows, then a switch to streaming changes from the current LSN. Connectors typically emit READ events during snapshot and CREATE/UPDATE/DELETE during streaming. Downstream jobs must handle snapshot + stream overlap (dedupe by PK and event timestamp, or use connector-specific signaling).

Skipping snapshot design is a common source of “CDC went live but half our rows are missing” incidents.

Practical example: catalog table to search projection

Consider a products table in PostgreSQL and a search service that maintains an Elasticsearch index. The goal: every insert, update, or delete in products is reflected in search within seconds, without adding publish calls to every repository method.

PostgreSQL prerequisites

-- Logical decoding requires wal_level >= logical (postgresql.conf)
-- SHOW wal_level;

ALTER TABLE products REPLICA IDENTITY DEFAULT; -- PK-based old tuple for updates/deletes

-- Optional: publication for pgoutput (native logical replication)
CREATE PUBLICATION products_pub FOR TABLE products;

Ensure the CDC database user has REPLICATION privilege and SELECT on published tables.

Debezium connector sketch (Kafka Connect)

{
  "name": "products-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "cdc_replicator",
    "database.password": "${secrets:cdc_replicator_password}",
    "database.dbname": "commerce",
    "topic.prefix": "commerce",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput",
    "publication.name": "products_pub",
    "slot.name": "products_cdc_slot",
    "tombstones.on.delete": "true",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
  }
}

Messages land on a topic like commerce.public.products. The key is the product PK; the value wraps before, after, and op (c, u, d, r).

Consumer: idempotent upsert and tombstones

A Node.js consumer (KafkaJS shown; principles apply everywhere):

import { Kafka } from "kafkajs";

type DebeziumPayload = {
  op: "c" | "u" | "d" | "r";
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
};

type ProductRow = {
  id: string;
  name: string;
  description: string;
  price_cents: number;
  updated_at: string;
};

const kafka = new Kafka({ brokers: ["kafka:9092"] });
const consumer = kafka.consumer({ groupId: "search-products-projector" });

function toSearchDoc(row: ProductRow) {
  return {
    id: row.id,
    name: row.name,
    description: row.description,
    priceCents: row.price_cents,
    updatedAt: row.updated_at,
  };
}

export async function runProjector() {
  await consumer.connect();
  await consumer.subscribe({ topic: "commerce.public.products", fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) {
        // Tombstone on delete when tombstones.on.delete=true
        const id = message.key?.toString();
        if (id) await searchIndex.delete(id);
        return;
      }

      const envelope = JSON.parse(message.value.toString()) as { payload: DebeziumPayload };
      const { op, after } = envelope.payload;

      if (op === "d") {
        const id = envelope.payload.before?.id as string | undefined;
        if (id) await searchIndex.delete(id);
        return;
      }

      if (!after) return;

      const row = after as unknown as ProductRow;
      await searchIndex.upsert(row.id, toSearchDoc(row));
    },
  });
}

// Stub — replace with your Elasticsearch/OpenSearch client
const searchIndex = {
  async upsert(id: string, doc: Record<string, unknown>) {
    /* PUT /products/_doc/{id} */
  },
  async delete(id: string) {
    /* DELETE /products/_doc/{id} */
  },
};

Note what this handler does not assume: exactly-once delivery, rich domain language, or cross-table atomicity. It does assume idempotent upsert by PK and explicit delete handling—mandatory under redelivery.

For production, add structured logging with the Kafka offset and product id, metrics on lag and processing latency, and a dead-letter path for poison payloads (malformed JSON, unexpected null after on update).

Schema evolution without breaking consumers

CDC couples downstream services to database column names and types. When you ALTER TABLE ADD COLUMN, new fields appear in events; when you rename or drop columns, you can break deserializers that expect the old shape.

Disciplined approaches:

  1. Expand–contract migrations: add new column, dual-write or backfill, switch consumers, drop old column, as described in zero-downtime migration patterns.
  2. Schema registry (Avro/Protobuf with Debezium): enforce compatible evolution rules.
  3. Transformation layer: a stream processor maps raw CDC to a versioned integration schema before fan-out—extra moving parts, but it isolates domain services from physical table layout.

Never rename a column in one deploy without a consumer migration plan. WAL events do not wait for your deploy train.

Operational concerns that decide success

Replication slot lag and disk pressure

If the connector stops, PostgreSQL retains WAL for the slot. Unchecked lag can fill the disk and take down the primary. Alert on:

  • Replication slot lag bytes and confirmed flush LSN distance
  • Oldest unacknowledged event age on the consumer side

Have a runbook: pause non-critical consumers, scale connector throughput, or temporarily drop and recreate the slot (accepting a re-snapshot cost) under controlled change windows.

High-churn tables and TOAST

Wide rows with large text or JSON blobs generate heavy payloads. Consider separate storage for blobs (reference by id in the row CDC carries) or exclude columns via column filters where the connector supports them.

Security and PII

CDC streams often contain full row images, including fields never exposed on public APIs. Encrypt topics at rest, restrict ACLs, mask in stream processors, and treat CDC topics with the same sensitivity as the source table—not as “internal, therefore safe.”

Multi-tenant filtering

Row-level security in PostgreSQL affects what application roles see; CDC runs as a replication role that bypasses RLS. If tenants must be isolated in downstream systems, enforce tenant_id in the payload and filter in the consumer, or use separate publications per tenant schema in extreme cases.

Common mistakes and pitfalls

Treating CDC events as domain events without translation. Downstream billing should not infer business rules from raw subscriptions.status flips alone; pair with explicit domain events or a well-documented state machine mapping.

No idempotency on consumers. Connector restarts duplicate messages. Handlers that increment counters or send emails without dedupe keys will double-charge or double-notify.

Ignoring snapshot overlap. Baseline snapshots interleave with live changes; projectors that only handle streaming ops miss rows or apply stale state.

Sharing one replication slot across environments. Slots are production resources; a mispointed staging connector can stall WAL retention on prod.

Deploying CDC before fixing replica identity. Updates without PK old tuples look like mystery partial events and are painful to debug under load.

Oversubscribing every table. Full-database CDC creates noise, cost, and compliance surface. Start with tables that have clear projection consumers; expand deliberately.

Conclusion

Log-based change data capture turns PostgreSQL’s WAL into a durable, low-latency integration bus for row-level truth. It excels at search indexes, analytics pipelines, cache invalidation, and cross-service read models where seeing what the database actually stored is the right abstraction. It complements—not replaces—the transactional outbox when you need curated domain language at service boundaries.

Key takeaways:

  • Prefer log-based CDC over polling when volume, latency, or database load matter
  • Configure replica identity, publications, and slot monitoring before production traffic
  • Design consumers for at-least-once delivery with idempotent upserts and explicit deletes
  • Plan schema evolution and initial snapshots as first-class migration work
  • Combine CDC with outbox domain events where business semantics and coupling discipline matter

Teams building scalable, production-ready event pipelines often adopt CDC after the first painful rewrite of “just add another nightly sync.” If you are evaluating how PostgreSQL changes should flow through your architecture—without turning every repository into a messaging client—the contact page is a straightforward way to reach out.

Newsletter abonnieren

E-Mail erhalten, wenn neue Artikel erscheinen. Kein Spam — nur neue Beiträge von diesem Blog.

Über Resend. Abmeldung in jeder E-Mail möglich.