PostgreSQL transaction isolation in production APIs: what READ COMMITTED actually guarantees

How PostgreSQL isolation levels behave under concurrent writes, which anomalies still appear at READ COMMITTED, and when optimistic locking or SERIALIZABLE is worth the cost.

作者: Matheus Palma2026年6月23日约 12 分钟阅读

Software engineeringBackendPostgreSQLAPI designArchitectureReliability

Two API workers process refunds against the same order at the same time. Each reads refunded_cents = 0, computes a partial refund, and writes back refunded_cents = 2500. The business expected at most one refund of $25; the database now shows $50 returned and finance is reconciling a discrepancy that only appears under load. The handlers used a transaction. They even wrapped the read and write in BEGIN / COMMIT. So what went wrong?

The answer is almost never "PostgreSQL is broken." It is that transaction isolation is a contract about which concurrent behaviors you allow, and PostgreSQL's default—READ COMMITTED—is narrower than many developers assume. In consulting work on billing, inventory, and seat-allocation APIs, this gap between intuition and engine behavior is one of the most common sources of production bugs that are maddening to reproduce locally.

This article explains how PostgreSQL implements isolation, which anomalies remain at READ COMMITTED, the patterns that fix them without jumping straight to SERIALIZABLE, and when stronger isolation is justified.

Isolation as a contract, not a switch

A database transaction groups reads and writes into a unit that should appear atomic from the application's perspective. Isolation defines how much one transaction's in-progress work is visible to others while it runs.

The SQL standard names four levels—READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE—and classifies anomalies:

Anomaly	What the client observes
Dirty read	Reading another transaction's uncommitted writes
Non-repeatable read	Reading the same row twice in one transaction and seeing different values
Phantom read	Re-running a range query and seeing new matching rows
Write skew	Two transactions read overlapping state, each updates a disjoint part, and together they violate an invariant

PostgreSQL maps these levels to concrete behavior. The important detail for API engineers: your ORM's transaction() helper does not automatically prevent lost updates unless you choose the right isolation level and the right write pattern.

What PostgreSQL actually does at each level

READ UNCOMMITTED

PostgreSQL treats READ UNCOMMITTED as READ COMMITTED. Dirty reads do not occur. You cannot opt into them on Postgres.

READ COMMITTED (the default)

Every statement in a READ COMMITTED transaction sees a snapshot of committed data as of the moment that statement begins. That has two consequences developers routinely miss:

Non-repeatable reads are allowed within a single transaction. SELECT at the start and SELECT at the end of a long transaction can return different rows for the same predicate.
Each statement gets a fresh snapshot. A multi-statement transaction is not one frozen view of the database.

READ COMMITTED also uses row-level locks on write: when an UPDATE or DELETE targets a row, it waits for concurrent writers to finish, then re-evaluates the row against the WHERE clause. If the row no longer matches, the statement affects zero rows. That behavior is why a naive UPDATE ... WHERE status = 'pending' can silently update nothing after a race—not because the transaction failed, but because the predicate stopped matching.

REPEATABLE READ

The transaction sees a single snapshot taken at the first non-transaction-local statement. Non-repeatable reads and phantom reads (as classically defined) do not occur for plain SELECT within that snapshot.

However, concurrent inserts that would create phantoms can still cause serialization failures on write: PostgreSQL detects snapshot conflicts and aborts one transaction with SQLSTATE 40001 (serialization_failure). Application code must retry.

REPEATABLE READ in PostgreSQL is stronger than the standard's definition in some ways and uses Serializable Snapshot Isolation (SSI) machinery for conflict detection.

SERIALIZABLE

The strictest level. PostgreSQL implements it with SSI, tracking read/write dependencies to detect cycles that would correspond to a non-serializable schedule. Conflicts raise serialization_failure; successful commits behave as if transactions ran one at a time.

Cost: more monitoring overhead, more aborts under contention, and mandatory retry logic in the application for any code path that can hit 40001.

The lost update: READ COMMITTED's most expensive surprise

The refund scenario at the top is a lost update (read-modify-write race):

Tx A: SELECT refunded_cents → 0
Tx B: SELECT refunded_cents → 0
Tx A: UPDATE orders SET refunded_cents = 2500 WHERE id = $1
Tx B: UPDATE orders SET refunded_cents = 2500 WHERE id = $1
-- Both "succeed"; final value is 2500, not 5000, but business logic may have
-- issued two external refund calls—worse than the column value alone suggests.

If the code reads, calls Stripe, then writes, the database transaction boundary does not serialize the external side effect. Even a perfect database fix does not undo a duplicate API call unless idempotency keys guard the outbound request—a topic that belongs in the same design conversation.

Inside the database, the fix is to stop doing blind read-modify-write.

Pattern 1: Single-statement atomic update

Express the change as one SQL statement so the read and write happen under the row lock:

UPDATE orders
SET refunded_cents = refunded_cents + $2,
    updated_at = now()
WHERE id = $1
  AND refunded_cents + $2 <= total_cents
RETURNING refunded_cents;

If RETURNING yields no row, the invariant failed or the row was missing—return 409 Conflict or 422 Unprocessable Entity to the client. This pattern is the default choice for counters, balances, and inventory decrements when the update is expressible in SQL.

Pattern 2: Optimistic concurrency control

When the client edits a document-shaped resource and you need to reject stale writes, use a version column or ETag semantics:

UPDATE orders
SET refunded_cents = $3,
    version = version + 1,
    updated_at = now()
WHERE id = $1
  AND version = $2
RETURNING version;

Zero rows updated means another writer won—surface 409 and let the client refresh. This pairs naturally with HTTP If-Match headers; see also dedicated treatment of optimistic concurrency with ETags on this site.

Pattern 3: Pessimistic locking

When you must run multi-step logic in application code before writing:

SELECT id, refunded_cents, total_cents
FROM orders
WHERE id = $1
FOR UPDATE;

FOR UPDATE blocks other lockers until commit. Use sparingly: held locks reduce throughput and increase deadlock risk. Prefer single-statement updates or optimistic versioning when possible.

READ COMMITTED plus FOR UPDATE or atomic UPDATE is how most production APIs should start. Escalating isolation is for when predicate-based locking is awkward or invariants span multiple rows without a natural lock target.

Write skew: when row locks are not enough

Write skew appears when two transactions read overlapping state, make decisions based on what they read, and write to different rows such that a global rule breaks.

Classic example: on-call scheduling. Rule: at least one doctor must be on call. Two doctors are on call; Alice and Bob each check "am I the only one?" read that the other is still on call, and both go off call—now zero doctors on call.

Row-level FOR UPDATE on a single row does not help if each transaction updates its own row. Fixes include:

Materialize the invariant into a single row (e.g. on_call_count) updated atomically
Advisory locks for coarse coordination when the domain maps to a named lock key—see PostgreSQL advisory locks for trade-offs
SERIALIZABLE when the predicate spans many rows and refactoring the schema is costly

Write skew is the anomaly most teams discover only after shipping READ COMMITTED everywhere.

When to reach for REPEATABLE READ or SERIALIZABLE

Stronger isolation is justified when:

Multi-statement reads must be consistent for reporting or batch decisions inside one transaction (e.g. allocate inventory across several line items based on a single snapshot of stock levels)
Predicate locks are hard to express and refactoring to counters or summary rows is not practical
Correctness dominates throughput for a narrow, high-value path (financial reconciliation, seat holds during checkout)

It is not justified as a blanket replacement for READ COMMITTED because:

serialization_failure is a normal outcome, not an exceptional bug. Retry budgets, jittered backoff, and metrics on 40001 rates are required.
Hot keys become worse. Serializable conflicts scale with overlap; a popular inventory SKU under flash traffic may spend more time retrying than serving.
Long transactions amplify conflicts. Keep serializable units short—no external HTTP calls inside the transaction.

A practical escalation path in client projects:

Default READ COMMITTED
Fix races with atomic SQL, versions, or FOR UPDATE
Introduce SERIALIZABLE only on specific transactions after reproducing a write-skew or multi-row invariant failure in tests
Load-test the conflict rate before promoting to production

Interaction with connection pooling and ORMs

Poolers and prepared statements: Transaction isolation is per connection. PgBouncer transaction pooling mode returns connections to the pool after each transaction; isolation level set with SET TRANSACTION applies to the next transaction on that checkout. Document whether middleware sets isolation per request.

ORM unit-of-work: Frameworks often open READ COMMITTED implicitly. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE must run before the first query in the transaction. In Node.js with pg:

await client.query("BEGIN");
await client.query("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
// ... statements ...
await client.query("COMMIT");

Nested transactions (savepoints) do not weaken isolation on the outer transaction, but error handling around ROLLBACK TO SAVEPOINT must not swallow serialization_failure if the outer commit would still be unsafe—usually you abort the whole transaction on 40001.

Read-only replicas: Isolation guarantees apply on the primary writer. Read-your-writes routing is a separate consistency problem; do not confuse snapshot isolation on the primary with what a stale replica returns.

Practical example: seat hold with retry on serialization failure

The following sketch implements a checkout seat hold across two tables: events (capacity) and seat_holds (reserved seats). The invariant: total holds cannot exceed events.capacity. A naive read-count-then-insert under READ COMMITTED allows overbooking when two transactions both see "room left."

Using SERIALIZABLE for the hold transaction makes conflicting schedules abort; the API retries a few times with backoff.

import type { PoolClient } from "pg";

const MAX_SERIALIZATION_RETRIES = 3;

async function withSerializable<T>(
  client: PoolClient,
  fn: () => Promise<T>,
): Promise<T> {
  for (let attempt = 0; attempt < MAX_SERIALIZATION_RETRIES; attempt++) {
    await client.query("BEGIN");
    await client.query("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
    try {
      const result = await fn();
      await client.query("COMMIT");
      return result;
    } catch (err: unknown) {
      await client.query("ROLLBACK");
      const code =
        typeof err === "object" &&
        err !== null &&
        "code" in err &&
        (err as { code?: string }).code;
      if (code === "40001" && attempt < MAX_SERIALIZATION_RETRIES - 1) {
        await sleep(10 * 2 ** attempt + Math.random() * 10);
        continue;
      }
      throw err;
    }
  }
  throw new Error("unreachable");
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

export async function createSeatHold(
  client: PoolClient,
  input: { eventId: string; userId: string; seats: number },
): Promise<{ holdId: string }> {
  return withSerializable(client, async () => {
    const { rows: capRows } = await client.query<{ capacity: number }>(
      `SELECT capacity FROM events WHERE id = $1`,
      [input.eventId],
    );
    const capacity = capRows[0]?.capacity;
    if (capacity == null) {
      throw new Error("event_not_found");
    }

    const { rows: sumRows } = await client.query<{ held: string }>(
      `SELECT COALESCE(SUM(seats), 0)::text AS held
       FROM seat_holds
       WHERE event_id = $1 AND expires_at > now()`,
      [input.eventId],
    );
    const held = Number(sumRows[0]?.held ?? 0);
    if (held + input.seats > capacity) {
      throw new Error("sold_out");
    }

    const { rows: insertRows } = await client.query<{ id: string }>(
      `INSERT INTO seat_holds (event_id, user_id, seats, expires_at)
       VALUES ($1, $2, $3, now() + interval '15 minutes')
       RETURNING id`,
      [input.eventId, input.userId, input.seats],
    );

    return { holdId: insertRows[0].id };
  });
}

Production code would map sold_out to 409, add idempotency keys for the hold endpoint, emit metrics on 40001 retries, and likely add a partial unique index or capacity counter row if serializable conflict rates rise under peak sales. The example shows the control flow: strong isolation plus explicit retries, not stronger isolation alone.

An alternative schema-level approach—often better at scale—is a single events.seats_remaining column decremented with:

UPDATE events
SET seats_remaining = seats_remaining - $2
WHERE id = $1 AND seats_remaining >= $2
RETURNING seats_remaining;

That stays on READ COMMITTED and avoids serializable retries. Choose based on measured conflict rates and how naturally the domain fits atomic decrements.

Common mistakes and pitfalls

Assuming transaction { } prevents races. Wrapping read-modify-write in a transaction without locking or versioning still loses updates under READ COMMITTED.
Ignoring UPDATE row counts. A zero-row UPDATE is often success from the driver's perspective. Check rowCount and fail loudly.
Calling external APIs inside serializable transactions. Long-held snapshots increase aborts and tie DB locks to network latency—commit database state first, then enqueue side effects (transactional outbox helps).
Using SERIALIZABLE without retry logic. Clients will see intermittent 500 errors that disappear on manual retry—exactly the failure mode users hate.
Mixing isolation levels ad hoc. Document which code paths use which level; integration tests should hammer concurrent workers on critical invariants.
Testing only on empty tables. Lost updates and write skew need concurrent sessions; a single Jest worker against a local Postgres with one connection hides them.

Conclusion

PostgreSQL's default READ COMMITTED is a deliberate trade-off: good throughput, no dirty reads, and no guarantee that your multi-statement business logic executes as if alone unless you design for it. Most API bugs attributed to "race conditions" are really missing write patterns—atomic updates, version checks, or targeted row locks—not a missing SERIALIZABLE everywhere.

Escalate isolation when tests prove multi-row invariants still break after schema and SQL patterns are exhausted, and pair REPEATABLE READ / SERIALIZABLE with retry-aware application code and monitoring on serialization_failure. Getting this right early in billing, inventory, and allocation services saves weeks of forensic SQL later.

For teams shipping production Postgres-backed APIs and want a second pair of eyes on concurrency design, I work with clients on exactly these boundaries—schema shape, transaction boundaries, and HTTP semantics that match what the database can promise. See About and Contact for more on how I help teams build systems that stay correct under load.

新文章发布时收到邮件。无垃圾信息 — 仅本博客的新文章通知。

由 Resend 发送，可在邮件中退订。