Keyset pagination: building stable, scalable list APIs
Why OFFSET-based pagination breaks at scale and how cursor (keyset) pagination uses indexed columns for predictable latency, with SQL patterns and API design.
Your admin UI loads page 47 of a sortable orders table. The product manager asks for infinite scroll on the public activity feed. A mobile client requests the next chunk of notifications after a reconnect. In each case, the naive answer—LIMIT with OFFSET—works in development and collapses in production: page loads get slower as users go deeper, rows appear twice or vanish when the underlying data changes between requests, and the database scans and discards rows you never return.
Keyset pagination (also called seek or cursor-based pagination) fixes this by naming a position in the result set with the values that define ordering, not with a row offset. Teams building production-ready list APIs and feeds lean on this pattern whenever lists are large, concurrent writes are common, or latency must stay flat. This article walks through the failure modes of offset pagination, the mechanics of keyset queries, trade-offs, and a concrete API and SQL shape you can adapt.
Why OFFSET pagination degrades
Cost grows with depth
ORDER BY created_at DESC OFFSET 100000 LIMIT 20 tells the database to sort, skip one hundred thousand rows, then return twenty. On many engines, the skip is not free: the planner may still walk past those rows. As OFFSET grows, work grows. Users on “later” pages pay more than users on the first page for the same twenty rows.
Unstable snapshots
Between request A (OFFSET 20) and request B (OFFSET 40), a new row can be inserted at the top of the ordering. Every existing row shifts down one position, so the last row of page 1 reappears as the first row of page 2 and the client sees it twice. Deletes shift rows up and cause the opposite problem: a row the client never saw crosses the page boundary and is skipped entirely. For live feeds and ops dashboards, that inconsistency is hard to explain away.
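A tiny simulation makes the boundary problem concrete. The feed and page size here are made up for illustration; a row inserted between two OFFSET requests makes the last row of page 1 reappear on page 2:

```typescript
// Newest-first feed paginated with OFFSET, page size 2.
const pageSize = 2;
let feed = ["d", "c", "b", "a"]; // newest first

// Request A: OFFSET 0 LIMIT 2
const page1 = feed.slice(0, pageSize); // ["d", "c"]

// A new row arrives at the top between requests.
feed = ["e", ...feed];

// Request B: OFFSET 2 LIMIT 2 — every row shifted down by one.
const page2 = feed.slice(pageSize, 2 * pageSize); // ["c", "b"]

console.log(page1, page2); // "c" appears on both pages
```

A keyset request anchored on the last seen row ("everything older than c") would return `["b", "a"]` regardless of the insert.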
Misleading totals
COUNT(*) for “Page 3 of 12,482” is expensive on big tables and often wrong under concurrent writes anyway. Offset-based UX encourages total counts; keyset-based APIs more naturally expose “next page exists” without scanning the table.
Keyset pagination: anchor on indexed columns
Instead of “give me rows 80–99,” the client says: “give me the next rows after this anchor,” where the anchor is the last row the client already saw, expressed in the same columns as ORDER BY.
Example: order by (created_at DESC, id DESC) (tie-breaker id avoids ambiguity when timestamps collide). The first request returns twenty rows. The client stores the last row’s (created_at, id). The next request asks for rows strictly before that pair in sort order—using a compound condition, not OFFSET.
Conceptually:
WHERE (created_at, id) < (:cursor_created_at, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 20
The database uses an index that matches the order (e.g. (created_at DESC, id DESC)), seeks to the starting position, and reads the next slice. Cost is driven by page size and selectivity, not by how “deep” the user scrolled in absolute terms.
Why the tie-breaker matters
If you order only by created_at and two rows share the same timestamp, their relative order is undefined unless you add a unique column. Without it, cursors are ambiguous and rows can duplicate or drop across pages. A stable unique key (often the primary key) appended to ORDER BY is standard.
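The comparison logic can be sketched in plain TypeScript (row shape and values are illustrative). Two rows share a timestamp, and the unique id decides their order deterministically:

```typescript
type Row = { createdAt: string; id: number };

// True when `a` sorts strictly after `b` under (createdAt DESC, id DESC),
// i.e. (a.createdAt, a.id) < (b.createdAt, b.id) lexicographically.
function isAfter(a: Row, b: Row): boolean {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt;
  return a.id < b.id; // tie-breaker: unique id resolves equal timestamps
}

const cursor: Row = { createdAt: "2024-05-01T10:00:00Z", id: 7 };
const sameTs: Row = { createdAt: "2024-05-01T10:00:00Z", id: 3 };

console.log(isAfter(sameTs, cursor)); // true: with DESC ids, 3 comes after 7
```

Without the second comparison, the two rows would compare as equal and the cursor could not tell which side of the boundary they belong on.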
Forward and backward pagination
Forward (“older” items): use < or > comparisons consistent with DESC vs ASC.
Backward (“newer” items): either swap the comparison direction, or swap the ORDER BY direction and reverse the results in the app so the page still reads newest-first. Many APIs expose direction=next|prev and map it to the appropriate predicate.
Bidirectional infinite scroll needs a clear rule for symmetry: the same cursor format should work for both directions with well-defined semantics.
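One way to sketch that mapping for a (created_at DESC, id DESC) feed — the helper name and return shape are illustrative, not a library API:

```typescript
type Direction = "next" | "prev";

// Map a pagination direction onto the SQL fragments it requires.
function keysetClause(dir: Direction): { where: string; orderBy: string; reverse: boolean } {
  if (dir === "next") {
    // Older items: seek strictly before the cursor in DESC order.
    return {
      where: "(created_at, id) < ($1, $2)",
      orderBy: "created_at DESC, id DESC",
      reverse: false,
    };
  }
  // Newer items: flip both the comparison and the sort, then reverse
  // the fetched rows in the app so the page still reads newest-first.
  return {
    where: "(created_at, id) > ($1, $2)",
    orderBy: "created_at ASC, id ASC",
    reverse: true,
  };
}

console.log(keysetClause("prev").orderBy); // created_at ASC, id ASC
```

The same cursor value works for both directions; only the predicate and sort around it change.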
API design: encoding the cursor
Clients should not assemble raw SQL tuples. Typical approaches:
- Opaque string: the server serializes (created_at, id) (and optionally sort/version metadata), signs or HMACs it, and base64url-encodes it. The client passes cursor=eyJ... on the next request.
- Structured query params: after_created_at=...&after_id=... are simpler to debug but easier to tamper with; validate ranges and tie them to allowed sort keys.
Document:
- Sort fields and direction (fixed per endpoint or an explicit sort= parameter).
- Maximum page size.
- That cursors are short-lived if you change sort options or schema; old cursors may return 400 with a message to restart from the first page.
For public feeds, opaque cursors also avoid leaking internal ids if you prefer not to expose raw primary keys (though often the surrogate id is already public).
SQL patterns (PostgreSQL-oriented)
Composite row comparison
PostgreSQL supports row comparisons aligned with index order:
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < ($1::timestamptz, $2::uuid)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Ensure a matching index, e.g. CREATE INDEX ON posts (created_at DESC, id DESC);
Nullable sort columns
If a nullable column participates in ORDER BY, normalize where nulls sort (NULLS FIRST / NULLS LAST) and use comparisons that match. Mixed null handling bugs show up as duplicate or missing rows at boundaries.
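The ordering the SQL must match can be mirrored in application code to see why consistency matters. A minimal sketch, assuming a nullable score column sorted DESC with NULLS LAST (column names are illustrative):

```typescript
type Scored = { score: number | null; id: number };

// Map NULL to -Infinity so it sorts last under DESC — the application-side
// equivalent of ORDER BY score DESC NULLS LAST, id DESC.
const key = (s: number | null): number => (s === null ? -Infinity : s);

function cmpDescNullsLast(a: Scored, b: Scored): number {
  if (key(a.score) !== key(b.score)) return key(b.score) - key(a.score);
  return b.id - a.id; // tie-breaker on the unique id
}

const rows: Scored[] = [
  { score: null, id: 1 },
  { score: 5, id: 2 },
  { score: 5, id: 3 },
];
rows.sort(cmpDescNullsLast);

console.log(rows.map((r) => r.id)); // [3, 2, 1]: the null row lands last
```

If the cursor predicate assumed NULLS FIRST while the ORDER BY used NULLS LAST, the null rows would straddle the page boundary and either repeat or vanish.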
Filtering and secondary predicates
Keyset pagination composes with filters: WHERE tenant_id = $t AND status = 'open' AND (created_at, id) < (...). The index should support the filter and the order—often a composite starting with equality columns: (tenant_id, status, created_at DESC, id DESC).
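A sketch of how the query text composes (table, columns, and placeholders are assumptions for illustration; a real implementation would bind $1..$4 as parameters):

```typescript
// Build the listing query for a tenant's open orders, with or without a
// cursor. The index (tenant_id, status, created_at DESC, id DESC) serves
// both the equality filter and the sort.
function openOrdersQuery(hasCursor: boolean): string {
  const keyset = hasCursor ? "AND (created_at, id) < ($3, $4)" : "";
  return `
    SELECT id, created_at
    FROM orders
    WHERE tenant_id = $1 AND status = $2 ${keyset}
    ORDER BY created_at DESC, id DESC
    LIMIT 20`;
}
```

The first page simply omits the keyset predicate; everything else about the query stays identical.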
Trade-offs and limitations
No arbitrary jump to page N. Keyset fits sequential navigation (infinite scroll, “load more”). Jumping to “page 47” without walking cursors requires either caching, materialized positions, or accepting offset costs for that rare admin use case.
Sort stability. Changing the default sort invalidates old cursors. Version your API or namespace cursors by sort signature.
Total counts. If the product truly needs “result 8,901 of 900,000,” you are back to expensive counts or approximate counts; keyset does not remove that product cost.
Complex sorts. Multi-column sorts with mixed directions or user-defined order are still doable but demand careful encoding and indexes per supported mode.
Practical example: Express + PostgreSQL sketch
The following illustrates one endpoint: list posts newest-first with an opaque cursor. Production code would add auth, input validation, metrics, and migration-managed indexes.
import crypto from "crypto";
import type { Pool } from "pg";
type PostRow = { id: string; created_at: Date; title: string };
const SECRET = process.env.CURSOR_HMAC_SECRET!; // shared server secret
function encodeCursor(createdAt: Date, id: string): string {
const payload = JSON.stringify({
ca: createdAt.toISOString(),
id,
v: 1,
});
const sig = crypto.createHmac("sha256", SECRET).update(payload).digest("base64url");
return Buffer.from(JSON.stringify({ p: payload, s: sig }), "utf8").toString("base64url");
}
function decodeCursor(cursor: string): { createdAt: Date; id: string } | null {
try {
const raw = JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
const sig = crypto.createHmac("sha256", SECRET).update(raw.p).digest("base64url");
if (sig !== raw.s) return null;
const data = JSON.parse(raw.p);
return { createdAt: new Date(data.ca), id: data.id };
} catch {
return null;
}
}
export async function listPosts(
pool: Pool,
opts: { limit: number; cursor?: string },
): Promise<{ rows: PostRow[]; nextCursor: string | null }> {
const limit = Math.min(Math.max(opts.limit, 1), 100);
const decoded = opts.cursor ? decodeCursor(opts.cursor) : null;
if (opts.cursor && !decoded) {
throw new Error("Invalid cursor");
}
const params: unknown[] = [limit + 1];
let where = "";
if (decoded) {
// $2 = created_at, $3 = id — must match (created_at, id) in WHERE and ORDER BY
params.push(decoded.createdAt, decoded.id);
where = `WHERE (created_at, id) < ($2::timestamptz, $3::uuid)`;
}
const { rows } = await pool.query<PostRow>(
`
SELECT id, created_at, title
FROM posts
${where}
ORDER BY created_at DESC, id DESC
LIMIT $1
`,
params,
);
const hasMore = rows.length > limit;
const page = hasMore ? rows.slice(0, limit) : rows;
const last = page[page.length - 1];
const nextCursor =
hasMore && last ? encodeCursor(last.created_at, last.id) : null;
return { rows: page, nextCursor };
}
The limit + 1 trick detects a next page without a separate count query.
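Isolated from the endpoint, the trick is just a slice and a length check (the helper name is illustrative):

```typescript
// Fetch limit + 1 rows upstream; the extra row's presence is the
// has-more flag, and it is never returned to the client.
function paginate<T>(fetched: T[], limit: number): { page: T[]; hasMore: boolean } {
  const hasMore = fetched.length > limit;
  return { page: hasMore ? fetched.slice(0, limit) : fetched, hasMore };
}

console.log(paginate([1, 2, 3], 2)); // { page: [1, 2], hasMore: true }
console.log(paginate([1, 2], 2));    // { page: [1, 2], hasMore: false }
```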
Common mistakes and pitfalls
- Missing tie-breaker: ordering by a non-unique column alone produces unstable cursors.
- Encoding only half the key: sending created_at without id when timestamps can collide.
- Ignoring index alignment: predicates must match ORDER BY so the planner can use the index seek.
- Reusing cursors after data mutations: generally fine for append-only feeds; for mutable rows, document that edits can reorder rows or drop them from a filtered view, the same as any live list.
- Exposing internal tuples to clients without validation, allowing crafted cursors to probe other ranges—mitigate with signing or server-side session state.
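For the signing mitigation, Node's crypto.timingSafeEqual also closes the timing side channel that a plain string comparison leaves open. A minimal sketch, reusing the SECRET name from the earlier example:

```typescript
import crypto from "crypto";

const SECRET = process.env.CURSOR_HMAC_SECRET ?? "dev-only-secret";

// Constant-time check of a cursor payload against its base64url HMAC.
function verifyCursorSig(payload: string, sig: string): boolean {
  const expected = crypto.createHmac("sha256", SECRET).update(payload).digest();
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so guard first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

The length guard matters: timingSafeEqual throws rather than returning false when the buffers differ in size.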
Conclusion
Offset pagination is acceptable for small, stable admin tables; keyset pagination is the default choice for high-volume lists and feeds where stable ordering, flat latency, and honest “load more” semantics matter. The implementation cost—compound sort, tie-breaker column, matching index, and opaque or validated cursors—is repaid in predictable database behavior and fewer impossible-to-reproduce “missing row on page 2” tickets.
In client projects focused on scalable APIs, getting list endpoints right early avoids costly rework when traffic and data volume grow. If you are evaluating pagination for a new service or refactoring a slow listing, the patterns above are a solid baseline; for background on how this site is built and how to get in touch, see About and Contact.