PostgreSQL connection pooling in Kubernetes: sizing pools, PgBouncer, and Node.js pitfalls

Right-size Node.js DB pools behind replicas so you do not exhaust PostgreSQL max_connections. PgBouncer modes, prepared statements, checkout timeouts, and what to measure in production.

Author: Matheus Palma · 7 min read
Software engineering · Backend · PostgreSQL · Kubernetes · Node.js · Infrastructure

You scale a Node.js API from four pods to forty. Traffic is healthy, CPU is fine, and then Postgres starts rejecting connections: FATAL: sorry, too many clients already. The fix is rarely “raise max_connections” forever—each connection consumes memory on the server, and idle clients still count. The durable pattern is two layers of pooling: a small, bounded pool inside each application process, and often a shared pooler (PgBouncer, RDS Proxy, or the cloud vendor’s proxy) in front of the database so hundreds of pods do not map one-to-one to server backends.

This article walks through why connection storms happen in containerized backends, how to size pools from real constraints (max_connections, memory, transaction duration), and the sharp edges around PgBouncer transaction pooling and prepared statements—topics that come up repeatedly when helping teams harden APIs for steady production load rather than laptop-scale demos.

Why “one connection per request” breaks at the edge

PostgreSQL uses a process-per-connection model. Each backend holds buffers, lock state, and catalog caches. Opening a connection is also comparatively expensive (TCP + TLS + auth + backend startup). Application frameworks therefore reuse established connections via a client-side pool (pg.Pool in node-postgres, similar abstractions in ORMs).

In Kubernetes, the arithmetic is brutal:

  • replicas × processes_per_pod × pool_max must stay below what the database can accept after you reserve headroom for admin tools, migrations, and other services.
  • Autoscaling multiplies the same configured pool. If pool.max is 20 and HPA runs 50 pods, you are asking for 1,000 server backends before any background job or analyst tool connects.

The failure mode is not gradual: once you approach the limit, latency spikes as clients block waiting for a slot; then you hit hard errors. That is why pooling is a capacity topic, not a micro-optimization.
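
To make that multiplication concrete, here is a quick back-of-the-envelope in TypeScript; the numbers mirror the bullet above and are otherwise arbitrary:

// Illustrative arithmetic only; plug in your own HPA ceiling and pool settings.
function worstCaseBackends(opts: {
  replicas: number;        // worst-case HPA replica count
  processesPerPod: number; // e.g. Node cluster workers per pod
  poolMax: number;         // pg.Pool `max` per process
}): number {
  return opts.replicas * opts.processesPerPod * opts.poolMax;
}

// 50 pods x 1 process x pool.max 20 = 1,000 potential server backends,
// before any background job or analyst tool connects.
console.log(worstCaseBackends({ replicas: 50, processesPerPod: 1, poolMax: 20 }));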

Client pools in Node.js: what pg.Pool is really doing

pg.Pool maintains a set of established connections and hands them to callers for the duration of a query or transaction. Important properties:

  • max caps concurrent server backends from that pool instance, not “total queries”—short queries release the connection quickly; long transactions hold it.
  • idleTimeoutMillis evicts unused sockets so you do not keep warm connections forever after traffic drops (useful, but tune with care: evicting too aggressively means churning connections and paying setup cost on every burst).
  • connectionTimeoutMillis bounds how long your route waits for a free pooled connection. Without this, a saturated pool can block until the OS TCP stack gives up, which often exceeds your HTTP deadline.

For HTTP handlers, a practical discipline is: acquire late, release early. Run everything else (validation, cache reads, external HTTP) outside a held DB connection. In freelance and consulting engagements, a surprising share of “Postgres cannot keep up” traces comes from holding a connection while awaiting a slow third-party API.
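
A minimal sketch of that discipline, using a hypothetical fetchProfile() call to stand in for the slow third-party API: the external call completes before any connection is checked out, and the pool is touched only for the single INSERT.

import express from "express";
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 8,
  connectionTimeoutMillis: 2_000,
});

const app = express();
app.use(express.json());

// Hypothetical third-party call; stands in for any slow external API.
async function fetchProfile(token: string): Promise<{ email: string }> {
  const res = await fetch("https://profile.example.com/v1/me", {
    headers: { authorization: `Bearer ${token}` },
  });
  return (await res.json()) as { email: string };
}

app.post("/signup", async (req, res) => {
  // Slow external work happens first, with no DB connection held.
  const profile = await fetchProfile(req.body.token);

  // Borrow a pooled connection only for the short write; pool.query checks
  // out a client, runs one statement, and releases it automatically.
  const { rows } = await pool.query(
    "INSERT INTO users (email) VALUES ($1) RETURNING id",
    [profile.email]
  );
  res.status(201).json({ id: rows[0].id });
});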

When to add PgBouncer (or a managed proxy)

Client pools alone still multiply by pod count. A server-side pooler multiplexes many client sessions onto fewer Postgres backends.

PgBouncer is the common self-managed choice. Managed offerings (Amazon RDS Proxy, Google Cloud SQL Auth Proxy patterns, Azure flexible server connection pooling) solve a similar problem with vendor-specific trade-offs.

Core modes:

Mode        | Semantics                                                           | Typical fit
------------|---------------------------------------------------------------------|---------------------------------------------
Session     | One server connection for the whole client session                 | Legacy apps, lots of session-level features
Transaction | Server connection only for the duration of an explicit transaction | Most stateless HTTP APIs
Statement   | Aggressive; breaks multi-statement transactions                     | Rare; special cases only

Transaction pooling fits typical REST/GraphQL handlers: BEGIN … work … COMMIT/ROLLBACK, then the backend is returned to the pool. Anything that must persist for longer than a transaction—LISTEN/NOTIFY, SET without SET LOCAL, advisory locks held across requests—needs redesign or session mode.
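
For example, LISTEN/NOTIFY cannot ride through a transaction-mode pooler, because notifications arrive only on whichever backend ran LISTEN. A common workaround is one dedicated pg.Client connected directly to Postgres (or through a session-mode pool), kept apart from the request-serving pool; a sketch under that assumption, with DATABASE_DIRECT_URL as a hypothetical direct connection string:

import { Client } from "pg";

// One long-lived connection dedicated to LISTEN/NOTIFY, bypassing the
// transaction-mode pooler. DATABASE_DIRECT_URL is assumed to point straight
// at Postgres (or at a session-mode PgBouncer pool).
const listener = new Client({ connectionString: process.env.DATABASE_DIRECT_URL });

async function startListening(): Promise<void> {
  await listener.connect();
  listener.on("notification", (msg) => {
    // msg.channel and msg.payload come from NOTIFY on the server side.
    console.log("notification", msg.channel, msg.payload);
  });
  await listener.query("LISTEN cache_invalidation");
}

startListening().catch((err) => {
  console.error("LISTEN connection failed", err);
  process.exit(1);
});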

Prepared statements and transaction pooling: the foot-gun

Drivers and ORMs often use extended query protocol prepared statements (parse / bind / execute). In transaction mode, PgBouncer may assign different server backends across statements unless they are wrapped in a transaction. A prepared plan created on backend A is not visible on backend B, which surfaces as mysterious prepared statement "xyz" does not exist errors under concurrency.

Mitigations teams use in production:

  • Prefer transactions for any route that relies on prepared statements implicitly created by the driver.
  • Configure the driver to use simple protocol or disable prepared statement caching when the pooler requires it (trade: slightly more parse work on the server).
  • Use session pooling for workloads that genuinely need long-lived prepared plans—accepting higher backend count.

The “right” choice depends on your ORM defaults; the wrong choice is assuming the ORM’s laptop defaults remain safe behind PgBouncer at scale.
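
In node-postgres specifically, a query becomes a named, server-side prepared statement only when you give it a name, and that is exactly the kind of implicit reuse that breaks across backends in transaction mode. A sketch of the contrast (other drivers and ORMs have their own switches for this):

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 8 });

async function getUserNamed(id: string) {
  // Named prepared statement: node-postgres sends PARSE once per connection
  // and then reuses the plan by name. Behind transaction-mode PgBouncer, a
  // later call can land on a backend that never saw the PARSE and fail with
  // 'prepared statement "get-user" does not exist'.
  return pool.query({
    name: "get-user",
    text: "SELECT id, email FROM users WHERE id = $1",
    values: [id],
  });
}

async function getUserPoolerSafe(id: string) {
  // Unnamed parameterized query: nothing named survives the call, so it is
  // safe regardless of which backend the pooler picks. Running the named
  // variant inside BEGIN ... COMMIT is the other common fix.
  return pool.query("SELECT id, email FROM users WHERE id = $1", [id]);
}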

Sizing: a back-of-the-envelope that actually lands

Start from Postgres:

  1. Read max_connections and subtract non-application usage (replication slots, admin, BI, migration tools).
  2. Decide how much of the remainder is for PgBouncer → Postgres backends (often most of it).

Then application pools:

  • default_pool_size in PgBouncer (or equivalent) should reflect sustained concurrent transactions, not peak HTTP concurrency. Many APIs spend most of the request time not in SQL.
  • Per-pod pool.max should be small enough that worst_case_replicas × pool.max ≤ what PgBouncer and Postgres can absorb, with burst headroom.

Example: Postgres allows 200 application backends, and PgBouncer is configured for 80 server connections. The pods can never use more than 80 active backends through PgBouncer, so size the per-pod client pools for brief queueing under burst rather than letting each pod try to open 80 connections of its own.
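
The same example as a quick sanity check in code; the replica ceiling is an assumption, the other numbers mirror the paragraph above:

const pgbouncerServerConnections = 80; // PgBouncer -> Postgres budget from above
const worstCaseReplicas = 40;          // assumed HPA ceiling for this service

// Largest per-pod pool.max that keeps worst_case_replicas x pool.max within
// PgBouncer's server-side budget.
const perPodPoolMax = Math.floor(pgbouncerServerConnections / worstCaseReplicas);

console.log(perPodPoolMax); // 2
// Anything much larger and 40 pods can saturate the 80-connection budget on
// their own; burst absorption should come from brief queueing at checkout
// (connectionTimeoutMillis), not from a bigger per-pod pool.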

Also align connection acquisition timeouts with HTTP deadlines: if your p99 route budget is 300 ms, a 30 s pool wait is meaningless except as a way to pile up in-flight requests long after the caller has given up.

Practical example: Express + pg + transaction-scoped work

The following pattern keeps work inside an explicit transaction so transaction-mode poolers see a coherent backend, sets a statement timeout as a guardrail, and enforces a pool checkout timeout. Adjust types and error mapping for your stack.

import { Pool, PoolClient } from "pg";
import express from "express";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: Number(process.env.PG_POOL_MAX ?? 8),
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,
});

async function withTransaction<T>(
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("SET LOCAL statement_timeout = '5s'");
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (e) {
    try {
      await client.query("ROLLBACK");
    } catch {
      // ignore rollback errors; rethrow primary failure
    }
    throw e;
  } finally {
    client.release();
  }
}

const app = express();

app.get("/users/:id", async (req, res) => {
  const id = req.params.id;
  try {
    const row = await withTransaction(async (c) => {
      const { rows } = await c.query<{ id: string; email: string }>(
        "SELECT id, email FROM users WHERE id = $1",
        [id]
      );
      return rows[0] ?? null;
    });
    if (!row) {
      res.status(404).json({ error: "not_found" });
      return;
    }
    res.json(row);
  } catch (err) {
    // map query timeout, pool timeout, etc.
    res.status(503).json({ error: "upstream_unavailable" });
  }
});

// Kubernetes preStop + graceful shutdown should call pool.end() after draining HTTP

Operationally, pair this with metrics: pool waiting count, checkout time, query duration histograms, Postgres pg_stat_activity states, and PgBouncer SHOW POOLS / SHOW STATS. Those signals tell you whether to scale pods, widen the database, or shrink per-pod pools—not guesswork.
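
On the application side, node-postgres already exposes the pool counters behind the first two of those signals; a minimal sketch that samples them on an interval (swap the console.log for your metrics client of choice):

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 8 });

// totalCount: open connections, idleCount: connections sitting idle in the
// pool, waitingCount: callers queued for a checkout. A growing waitingCount
// while totalCount is pinned at max is the classic "pool too small or
// queries too slow" signature.
setInterval(() => {
  console.log({
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
  });
}, 10_000).unref();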

For teams building scalable, production-ready APIs, the recurring recommendation is to treat connection limits like any other shared quota: explicit budgets, timeouts at every wait, and load tests that multiply replicas before Black Friday does it for you.

Common mistakes and pitfalls

  • Setting pool.max to “whatever the default is” on every new service, then deploying 30 copies. Defaults are not universal constants; they are last-resort guesses.
  • Raising max_connections to silence errors without measuring server RAM. You may trade a connection error for swap storms and worse tail latency.
  • Long transactions holding connections while calling external APIs or waiting on user think-time in the same request lifecycle.
  • Ignoring prepared statement semantics behind transaction pooling and blaming “random” 500s under load.
  • No pool checkout timeout, so backpressure becomes unbounded latency instead of a fast, structured failure your caller can retry.
  • Health checks that open a new connection per probe without pooling or with an oversized pool “just for /healthz”.

Conclusion

PostgreSQL connection management in Kubernetes is a multiplication problem: replicas, pool sizes, and concurrent transactions interact with hard server limits. Client pools keep each process efficient; PgBouncer or a managed proxy collapses many pods onto a sustainable number of server backends. The implementation details—transaction scope, prepared statements, acquisition timeouts—determine whether that architecture survives real traffic.

The takeaway is conservative sizing, explicit timeouts, and observability on waits, not only errors. Those practices are what turn connection pooling from a checklist item into something you can defend in an incident review—and what keeps systems maintainable when it is time to scale further or bring in outside help for a deeper performance pass.
