Production webhook receivers: signatures, replay protection, and idempotent delivery
How to build webhook HTTP endpoints that survive retries, clock skew, and malicious traffic: HMAC verification, timestamp windows, deduplication, and clear response contracts.
Your billing provider sends invoice.paid to your API. One night the load balancer returns 504 to the provider’s client after your handler already marked the invoice paid and granted credits. The provider retries with the same payload and Idempotency-Key—or does it? Another integration never retries but duplicates delivery when their queue replays after a crash. In consulting work, webhook endpoints are where “HTTP 200 means success” collides with distributed systems reality: duplicate deliveries, delayed retries, forged requests, and ambiguous failure semantics.
This article is about designing webhook receivers the way you would design a payment callback: cryptographic verification, bounded replay windows, idempotent handlers, and HTTP responses that tell the sender whether to retry. The patterns apply to Stripe-style signing secrets, GitHub HMAC headers, and any partner that documents a shared secret and a delivery identifier.
Why a plain HTTPS POST is not enough
A webhook is an unauthenticated inbound HTTP request unless you verify it. Without verification, anyone who discovers your URL can POST arbitrary JSON and trigger your side effects. Even with TLS, confidentiality does not imply authenticity—the attacker is not decrypting traffic; they are sending their own.
Providers therefore ship:
- A shared secret (or asymmetric keys) used to compute a signature over the raw body (and sometimes headers or timestamps).
- Optional delivery IDs for deduplication.
- Retry policies when your endpoint returns 5xx or times out.
Your job is to turn that into at-least-once delivery with exactly-once business effects on your side, or as close as your domain allows.
Core design: verify, bound, dedupe, respond
Signature verification: always over the raw body
Most providers send a signature in a header (e.g. X-Signature, Stripe-Signature) computed as HMAC-SHA256 over the exact raw request bytes plus sometimes a timestamp. Common mistakes:
- Parsing JSON first and re-serializing for verification — whitespace and key order change the digest.
- Using a framework that mutates the body before your handler runs.
Correct pattern: read the body as a Buffer or Uint8Array, verify the signature against that buffer, then parse JSON. In Node.js (Express-style), use express.raw({ type: 'application/json' }) on the webhook route, or a framework hook that preserves the raw payload.
```typescript
import crypto from "node:crypto";

function timingSafeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a, "utf8");
  const bb = Buffer.from(b, "utf8");
  // crypto.timingSafeEqual throws on length mismatch, so check first.
  if (ab.length !== bb.length) return false;
  return crypto.timingSafeEqual(ab, bb);
}

export function verifyHmacSha256Hex(secret: string, rawBody: Buffer, providedHex: string): boolean {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
  return timingSafeEqual(expected.toLowerCase(), providedHex.toLowerCase());
}
```
Use constant-time comparison for hex or base64 signatures so an attacker cannot infer the digest byte-by-byte. If the provider uses a prefix scheme (e.g. v1=...), strip and compare only the relevant portion per their docs.
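As an illustration of the prefix-scheme point, a Stripe-style header such as `t=1700000000,v1=abc123` can be split into its parts before comparison. The header format and parser name here are assumptions for the sketch; follow your provider's documented scheme exactly:

```typescript
// Parse a comma-separated signature header like "t=1700000000,v1=deadbeef".
// The key names ("t", "v1") are a Stripe-style assumption; check your provider's docs.
export function parseSignatureHeader(header: string): { timestamp: number; signatures: string[] } {
  let timestamp = NaN;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue;
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === "t") timestamp = Number(value);
    if (key === "v1") signatures.push(value); // multiple v1 entries may appear during rotation
  }
  return { timestamp, signatures };
}
```

Collecting all `v1` values (rather than the first) matters during secret rotation, when a provider may sign with more than one key at once.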
Timestamp and replay windows
Many schemes include a Unix timestamp in the signed material or as a separate header. Reject requests whose timestamp is outside a small window (often five minutes) relative to your server clock. That limits how long a captured request remains valid if someone replays it.
Trade-offs:
- Tight window — better security; more false rejects if your NTP or the provider’s clock drifts.
- Loose window — fewer spurious failures; wider replay opportunity.
In production systems, monitor verification failures and alert on spikes—often they indicate clock skew or a secret rotation mismatch, not attacks.
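The window check itself is small. A sketch assuming a seconds-based Unix timestamp and a symmetric skew window (the helper name and default are illustrative):

```typescript
// Returns true if ts (Unix seconds) is within maxSkewSec of the current time.
// The window is symmetric, tolerating clocks that run ahead or behind.
export function isTimestampFresh(ts: number, maxSkewSec: number, nowMs: number = Date.now()): boolean {
  if (!Number.isFinite(ts)) return false;
  return Math.abs(nowMs / 1000 - ts) <= maxSkewSec;
}
```

Passing `nowMs` explicitly keeps the function testable and makes the clock dependency visible.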
Idempotency: natural keys from the provider
After verification, deduplicate using a stable identifier:
- Provider-supplied event ID or delivery ID (preferred).
- If only payload content exists, a hash of (event type + primary business id + logical timestamp) can work but is more fragile.
Store processed IDs in a durable store with a TTL at least as long as the provider’s retry horizon (sometimes days). On duplicate delivery, return 200 quickly without re-running side effects.
```typescript
// Sketch: Redis SET key NX with TTL, or DB unique constraint on event_id.
async function handleVerifiedEvent(eventId: string, process: () => Promise<void>): Promise<"processed" | "duplicate"> {
  const acquired = await idempotencyStore.tryClaim(`webhook:${eventId}`, ttlSeconds);
  if (!acquired) return "duplicate";
  await process();
  return "processed";
}
```
If your datastore does not support atomic claim, use insert-first with a unique index on event_id and treat unique violations as duplicates—simpler than compare-and-swap in many stacks.
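A minimal in-memory sketch of that insert-first idea; in a real system the ledger would be a database table with a unique index on `event_id` (e.g. Postgres `INSERT ... ON CONFLICT DO NOTHING`, treating a skipped insert as a duplicate). The class and method names are hypothetical:

```typescript
// In-memory stand-in for a table with a unique index on event_id.
export class EventLedger {
  private seen = new Set<string>();

  // Mimics insert-first semantics: true if this call inserted the row,
  // false if the "unique constraint" already held the id.
  insertFirst(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

export async function handleOnce(
  ledger: EventLedger,
  eventId: string,
  process: () => Promise<void>
): Promise<"processed" | "duplicate"> {
  if (!ledger.insertFirst(eventId)) return "duplicate";
  await process();
  return "processed";
}
```

Note that claiming before processing means a crash mid-`process` leaves the event claimed but unprocessed; pairing the claim with a TTL or a status column mitigates that.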
Response contract and status codes
Senders usually interpret:
- 2xx — success; do not retry this delivery (some providers still retry on certain codes; read their docs).
- 4xx — often treated as permanent failure or bad payload; may disable the endpoint after repeated failures.
- 5xx / timeout — retry with backoff.
Therefore:
- Return 200 after you have persisted the intent to process (e.g. enqueue to an internal queue) or completed processing—pick one strategy and document it for your team.
- Do not use 200 for “I ignored this event type” if the provider expects you to acknowledge only known types; some APIs want 2xx anyway to stop retries—align with their documentation.
A robust pattern is accept fast, process async: verify signature, enqueue a job with the raw payload reference, return 200. Your worker then retries internally with standard job semantics. This reduces duplicate business effects from HTTP retries while keeping the HTTP layer simple.
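The accept-fast shape can be sketched with an in-memory queue and a worker whose retries are internal job semantics rather than HTTP retries. The queue, job shape, and retry count here are hypothetical stand-ins for a real job system (Redis, SQS, etc.):

```typescript
type WebhookJob = { eventId: string; rawBase64: string; attempts: number };

// Hypothetical in-memory queue; production code would use a durable queue.
const queue: WebhookJob[] = [];

export function acceptDelivery(eventId: string, raw: Buffer): void {
  // The HTTP handler does only this after verification, then returns 200.
  queue.push({ eventId, rawBase64: raw.toString("base64"), attempts: 0 });
}

export async function drainOnce(
  handler: (eventId: string, body: Buffer) => Promise<void>,
  maxAttempts = 3
): Promise<void> {
  // Worker loop: failures requeue the job instead of surfacing as HTTP 5xx.
  while (queue.length > 0) {
    const job = queue.shift()!;
    try {
      await handler(job.eventId, Buffer.from(job.rawBase64, "base64"));
    } catch {
      job.attempts += 1;
      if (job.attempts < maxAttempts) queue.push(job); // simple requeue; real systems add backoff
    }
  }
}
```

Storing the payload base64-encoded preserves the exact raw bytes, so the worker can re-verify or audit them later if needed.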
Practical example: Express-style route with raw body and queue handoff
The following example wires verification, timestamp check, idempotency claim, and a minimal queue handoff. Adapt headers and signature format to your provider.
```typescript
import express from "express";
import crypto from "node:crypto";

const app = express();
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET!; // from KMS or secrets manager
const MAX_SKEW_SEC = 300;

app.post(
  "/webhooks/partner",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const raw = req.body as Buffer;
    const sig = req.header("X-Partner-Signature");
    const ts = req.header("X-Partner-Timestamp");
    const eventId = req.header("X-Partner-Event-Id");
    if (!sig || !ts || !eventId) {
      res.status(400).send("missing_headers");
      return;
    }
    const t = Number(ts);
    if (!Number.isFinite(t) || Math.abs(Date.now() / 1000 - t) > MAX_SKEW_SEC) {
      res.status(400).send("stale_timestamp");
      return;
    }
    const payloadToSign = `${t}.${raw.toString("utf8")}`;
    const expected = crypto.createHmac("sha256", WEBHOOK_SECRET).update(payloadToSign).digest("hex");
    const expectedBuf = Buffer.from(expected);
    const sigBuf = Buffer.from(sig);
    // timingSafeEqual throws on length mismatch, so check lengths first.
    if (expectedBuf.length !== sigBuf.length || !crypto.timingSafeEqual(expectedBuf, sigBuf)) {
      res.status(401).send("bad_signature");
      return;
    }
    const dup = await idempotencyStore.has(`partner:${eventId}`);
    if (dup) {
      res.status(200).json({ status: "duplicate" });
      return;
    }
    await internalQueue.enqueue({
      source: "partner",
      eventId,
      receivedAt: new Date().toISOString(),
      rawBase64: raw.toString("base64"),
    });
    await idempotencyStore.remember(`partner:${eventId}`, 86400 * 7);
    res.status(200).json({ status: "accepted" });
  }
);
```
Key points:
- Timestamp in the signed string ties the signature to a specific instant (the `timestamp.body` pattern is common; match your provider's format exactly).
- 401 vs 400 — use 401 for a failed signature check and 400 for malformed requests. Some senders retry only on 5xx, so a wrong secret should not produce a 5xx.
- Idempotency after enqueue — the sketch calls `remember` only after `enqueue` succeeds, so a failed `remember` means a retried delivery may be processed twice; recording the ID before enqueueing would instead risk losing events when the enqueue fails. Production code often uses a transactional outbox so the idempotency record and the queue publish commit atomically in the same database.
Trade-offs and limitations
Synchronous processing in the HTTP handler — Simpler to reason about, but long work causes timeouts and provider retries, which multiply deliveries. Prefer short acceptance + async processing for non-trivial workflows.
Secret rotation — Supporting two active secrets during rotation avoids hard downtime. Verify with either key until the old one is retired.
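Supporting two active secrets amounts to trying each in turn. A sketch assuming hex-encoded HMAC-SHA256 signatures (the function name is illustrative):

```typescript
import crypto from "node:crypto";

// Accept a signature if it matches under any currently active secret.
// Keep the old secret in the list until rotation is confirmed complete.
export function verifyWithAnySecret(secrets: string[], rawBody: Buffer, providedHex: string): boolean {
  const provided = Buffer.from(providedHex.toLowerCase(), "utf8");
  return secrets.some((secret) => {
    const expected = Buffer.from(
      crypto.createHmac("sha256", secret).update(rawBody).digest("hex"),
      "utf8"
    );
    // timingSafeEqual throws on length mismatch, so guard first.
    return expected.length === provided.length && crypto.timingSafeEqual(expected, provided);
  });
}
```

Trying secrets in order (new first) keeps the common case cheap once most traffic is signed with the new key.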
Multi-region — Idempotency stores must be consistent across regions if the same event could hit different clusters, or you must route webhooks to a single region.
Payload size — Huge JSON bodies can exhaust memory; some providers sign a digest or use detached signatures—still verify against the bytes you received.
Common mistakes and pitfalls
Verifying after JSON parse — Breaks HMAC; always verify the raw body.
Returning 500 for signature failures — Can trigger aggressive retries and provider alerts; use 4xx for client/secret problems.
No idempotency on success path — Duplicate HTTP deliveries duplicate charges, emails, or provisioning.
Ignoring ordering — Some streams are ordered per object; if you process deleted before created due to concurrency, you need per-resource sequencing or version fields.
Logging full payloads — May contain PII; log event IDs and redacted summaries.
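For the ordering pitfall above, one mitigation is to gate updates on a per-resource version or sequence number carried in the payload. A sketch with a hypothetical in-memory version map (a real system would persist this alongside the resource):

```typescript
// Last version applied per resource; stale or out-of-order events are skipped.
const lastApplied = new Map<string, number>();

export function applyIfNewer(resourceId: string, version: number, apply: () => void): boolean {
  const prev = lastApplied.get(resourceId) ?? -Infinity;
  if (version <= prev) return false; // out of order or duplicate: skip
  lastApplied.set(resourceId, version);
  apply();
  return true;
}
```

This works when the provider supplies a monotonic version or `created` timestamp per object; without one, per-resource serialization (a keyed queue or lock) is the fallback.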
Conclusion
Production webhook endpoints are small HTTP services with security and consistency requirements, not “just another route.” Verify signatures on raw bytes, bound replay with timestamps, deduplicate with provider IDs in durable storage, and return status codes that match your retry story. The teams that operate these endpoints calmly under load are the ones that treat inbound webhooks like outbound payments: explicit contracts, observability on verification failures, and idempotent side effects.
Key takeaways:
- Raw body + HMAC + constant-time compare is the baseline for authenticity.
- Timestamp windows limit replay; tune skew against your monitoring.
- Idempotent processing turns at-least-once HTTP into at-most-once business outcomes.
- Fast accept + async process aligns HTTP timeouts with provider retries.
In freelance and consulting engagements, webhook design often surfaces early when integrating billing, CRM, or CI systems—getting the receiver right avoids duplicate revenue recognition and noisy incident pages. For collaborations focused on scalable, production-ready integrations, contact is the right channel; background on experience is on About.