Human-in-the-loop approval for LLM tool actions: policies, queues, and production UX

Gate high-risk agent tools behind durable approval workflows: policy engines, idempotent side effects, timeout semantics, and UX that keeps assistants useful without silent autonomy.

By Matheus Palma · June 3, 2026 · About 10 min read

Software engineering · Artificial intelligence · Backend · API design · TypeScript · PostgreSQL · Architecture

Your assistant can refund orders, rotate API keys, and post to Slack. In staging, the demo is magical: the model picks the right tool, the UI updates, everyone applauds. In production, the first incident is not prompt injection—it is a correct-looking tool call on the wrong account after a long thread, approved implicitly because nobody defined what “autonomous” means. Legal asks who clicked approve; engineering discovers there was no click, only a model completion that your server executed.

Human-in-the-loop (HITL) approval is how you keep agentic features shippable: the model may propose side effects, but your backend decides whether they run now, later, or never—based on policy, role, amount, environment, and audit requirements. This article covers the control plane (not the prompt tricks): durable approval records, idempotent execution, and UX patterns that do not stall every harmless read.

Why “ask the user in chat” is not approval

Chat UIs tempt you to treat “Should I proceed?” as consent. That fails in production for predictable reasons:

No durable witness — Chat messages are not a legal or security audit trail unless you model them explicitly.
Ambiguous scope — The user said “yes” three turns ago; the model now proposes a different amount, recipient, or tenant.
Concurrent sessions — Mobile plus web means the approving principal may not be the session that triggered the tool.
Automation bypass — A compromised prompt or retrieval chunk can mimic affirmative answers in the transcript.

HITL belongs in your domain layer: an approval_request row, a signed link or in-app inbox, and an execution gate that refuses to call side-effecting tools until status is approved (or policy auto-approves).

Teams I work with on production assistants usually already have tool routing; what they lack is a state machine between proposed and executed.
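A minimal sketch of that state machine, assuming the six statuses this article later lists for approval records. The transition table is illustrative rather than prescriptive. What matters is that every status change goes through one check, and that check rejects illegal moves such as going from pending straight to executed.

```typescript
type ApprovalStatus = "pending" | "approved" | "rejected" | "expired" | "executed" | "failed";

// Allowed next states for each status; terminal states map to an empty list.
const TRANSITIONS: Record<ApprovalStatus, readonly ApprovalStatus[]> = {
  pending: ["approved", "rejected", "expired"],
  approved: ["executed", "failed"],
  rejected: [],
  expired: [],
  executed: [],
  failed: [],
};

function canTransition(from: ApprovalStatus, to: ApprovalStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Throw instead of silently accepting an illegal move. For example,
// pending -> executed would mean a side effect ran without a recorded decision.
function assertTransition(from: ApprovalStatus, to: ApprovalStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal approval transition: ${from} -> ${to}`);
  }
}
```

Keeping the table in one place also gives security reviewers a single artifact to read.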

Classify tools: read, write, and irreversible

Before building UI, tag every tool with a risk tier. Keep the taxonomy small so product and security can reason about it.

Tier	Examples	Default behavior
Read	`getOrder`, `searchDocs`, `listInvoices`	Auto-execute; log for audit
Write	`updateShippingAddress`, `addComment`	Policy-based: auto if low risk, else approval
Irreversible / high impact	`issueRefund`, `deleteUser`, `transferFunds`, `sendExternalEmail`	Approval required; optional second factor

Encode tier in the tool registry—not only in documentation—so the orchestrator cannot “forget” to check:

export type ToolRisk = "read" | "write" | "irreversible";

export type RegisteredTool = {
  name: string;
  risk: ToolRisk;
  /** Stable id for policy rules (refunds, exports, …) */
  actionType: string;
  execute: (args: unknown, ctx: ToolContext) => Promise<unknown>;
};

Why action types matter: Policy rules attach to actionType, not function names. Renaming refundOrder to createRefund should not silently bypass compliance rules.
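A registry sketch that enforces these rules at startup. `createToolRegistry` is a hypothetical helper. The types are repeated, and `ToolContext` is reduced to a placeholder, so the block compiles on its own. Duplicate names and missing action types fail fast. Lookups for a name the model invents either return `undefined` or throw, so an unknown tool can never fall through to execution.

```typescript
type ToolRisk = "read" | "write" | "irreversible";
// Placeholder for the article's ToolContext; real fields depend on your app.
type ToolContext = Record<string, unknown>;

type RegisteredTool = {
  name: string;
  risk: ToolRisk;
  actionType: string;
  execute: (args: unknown, ctx: ToolContext) => Promise<unknown>;
};

// Hypothetical helper: validates the tool list once, at startup.
function createToolRegistry(tools: RegisteredTool[]) {
  const byName = new Map<string, RegisteredTool>();
  for (const tool of tools) {
    if (byName.has(tool.name)) throw new Error(`Duplicate tool name: ${tool.name}`);
    if (!tool.actionType) throw new Error(`Tool ${tool.name} has no actionType`);
    byName.set(tool.name, tool);
  }
  return {
    // Returns undefined for unknown names; callers must handle the miss explicitly.
    get(name: string): RegisteredTool | undefined {
      return byName.get(name);
    },
    // Fail-closed lookup: unknown names never reach execute().
    require(name: string): RegisteredTool {
      const tool = byName.get(name);
      if (!tool) throw new Error(`Unknown tool: ${name}`);
      return tool;
    },
  };
}
```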

Policy engine: when approval is required

Policies should be deterministic code or data, not model judgment. The model proposes; code decides.

Typical inputs:

tool.actionType, tool.risk
Principal — user id, roles, tenant, impersonation flag
Arguments — amount, currency, destination domain, record id
Environment — production vs sandbox
Session signals — new device, elevated risk score, rate limits

Example rule sketch:

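export type PolicyDecision =
  | { effect: "allow" }
  | { effect: "deny"; reason: string }
  | { effect: "require_approval"; reason: string; expiresInSec?: number };

export function evaluateToolPolicy(
  tool: RegisteredTool,
  args: Record<string, unknown>,
  ctx: ToolContext,
): PolicyDecision {
  if (tool.risk === "read") return { effect: "allow" };

  if (tool.actionType === "refund.create") {
    const amount = Number(args.amountCents);
    // Fail closed: a missing or non-numeric amount must not slip under the threshold
    // (NaN > limit is false, so an unchecked NaN would be auto-approved).
    if (!Number.isFinite(amount) || amount <= 0) {
      return { effect: "deny", reason: "Refund amount is missing or invalid" };
    }
    if (amount > ctx.autoApproveRefundCents) {
      return {
        effect: "require_approval",
        reason: `Refund ${amount} exceeds auto-approve limit`,
        expiresInSec: 3600,
      };
    }
    // Within the bounded auto-approve limit. Return explicitly so the generic
    // irreversible rule below does not override the threshold.
    return { effect: "allow" };
  }

  if (tool.risk === "irreversible") {
    return { effect: "require_approval", reason: "Irreversible action" };
  }

  return { effect: "allow" };
}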

Trade-off: Hard-coded thresholds are easy to ship; versioned policy documents (JSON/YAML in git, evaluated in CI) scale better for regulated tenants. Either way, log the policy version on every decision for audits.
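The data-driven variant could look roughly like the sketch below. It assumes a hypothetical policy document shape: a `version` plus per-`actionType` rules with an optional numeric auto-approve threshold. In practice the document would be parsed from JSON or YAML kept in git. Every decision carries the document's version, so the audit log can show which rules were in force.

```typescript
type Effect = "allow" | "deny" | "require_approval";

// Hypothetical document shape; in practice loaded from a versioned file in git.
type PolicyDocument = {
  version: string;
  defaultEffect: Effect;
  rules: Array<{
    actionType: string;
    // Optional threshold on one numeric argument, e.g. amountCents.
    maxAutoApprove?: { arg: string; value: number };
    // Effect applied when there is no threshold or the threshold is exceeded.
    effect: Effect;
  }>;
};

type VersionedDecision = { effect: Effect; reason: string; policyVersion: string };

function evaluateWithDocument(
  doc: PolicyDocument,
  actionType: string,
  args: Record<string, unknown>,
): VersionedDecision {
  const v = doc.version;
  const rule = doc.rules.find((r) => r.actionType === actionType);
  if (!rule) return { effect: doc.defaultEffect, reason: "No matching rule", policyVersion: v };

  if (rule.maxAutoApprove) {
    const n = Number(args[rule.maxAutoApprove.arg]);
    // Fail closed on missing or non-numeric values.
    if (!Number.isFinite(n)) {
      return { effect: "deny", reason: `Invalid ${rule.maxAutoApprove.arg}`, policyVersion: v };
    }
    if (n <= rule.maxAutoApprove.value) {
      return { effect: "allow", reason: "Within auto-approve limit", policyVersion: v };
    }
  }
  return { effect: rule.effect, reason: `Rule for ${actionType}`, policyVersion: v };
}
```

A CI job can run this evaluator against a table of fixture arguments for each document version before merge.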

Auto-approve is still a policy outcome

“No human” is not the absence of HITL—it is machine approval with explicit bounds. Document those bounds for security reviews and set monitoring on auto-approve rates per action type.
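One way to watch those bounds is sketched below, in-process and under stated assumptions. It counts policy outcomes per action type and flags any type whose auto-approve share exceeds a ceiling, once there are enough samples. The outcome names, ceiling, and minimum sample size are illustrative. In production these counters would normally be metrics in your monitoring stack rather than a `Map`.

```typescript
type Outcome = "auto_approved" | "human_required" | "denied";

class AutoApproveMonitor {
  private counts = new Map<string, Record<Outcome, number>>();

  record(actionType: string, outcome: Outcome): void {
    const c = this.counts.get(actionType) ?? { auto_approved: 0, human_required: 0, denied: 0 };
    c[outcome] += 1;
    this.counts.set(actionType, c);
  }

  autoApproveRate(actionType: string): number {
    const c = this.counts.get(actionType);
    if (!c) return 0;
    const total = c.auto_approved + c.human_required + c.denied;
    return total === 0 ? 0 : c.auto_approved / total;
  }

  // Action types whose auto-approve share is above `ceiling`, ignoring small samples.
  breaches(ceiling: number, minSamples: number): string[] {
    const out: string[] = [];
    this.counts.forEach((c, actionType) => {
      const total = c.auto_approved + c.human_required + c.denied;
      if (total >= minSamples && c.auto_approved / total > ceiling) out.push(actionType);
    });
    return out;
  }
}
```

A sudden jump in the rate for one action type often means a threshold was loosened or arguments are being shaped to stay under it. Both cases deserve a look.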

Durable approval records

Treat each gated tool call as a workflow entity, not a chat line.

Suggested fields:

id (uuid), session_id, tenant_id
requested_by (user), action_type, tool_name
arguments_json — canonical JSON; hash for integrity
arguments_hash — detect tampering between propose and execute
status — pending | approved | rejected | expired | executed | failed
policy_reason, policy_version
approved_by, approved_at, rejection_reason
idempotency_key — ties to tool round / client retry
expires_at — pending approvals must not linger forever
execution_result_json, executed_at
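Below is a minimal sketch of canonical JSON plus hashing, using Node's built-in `node:crypto`. This `canonicalizeJson` sorts object keys recursively, so `{ b, a }` and `{ a, b }` hash identically. It assumes arguments are plain JSON values with no `undefined`, `Date`, or `BigInt`. It is deliberately simpler than a published scheme such as RFC 8785 (JSON Canonicalization Scheme), which also pins down number serialization.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every level so logically equal
// arguments always produce the same string.
function canonicalizeJson(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalizeJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalizeJson(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 over the canonical string, hex-encoded (64 characters).
function hashArguments(value: Json): string {
  return createHash("sha256").update(canonicalizeJson(value)).digest("hex");
}
```

Store the canonical string in `arguments_json` and the digest in `arguments_hash`. Propose time and execute time must use the same function, or the tamper check raises false alarms.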

Store a human-readable summary generated at propose time (“Refund $240.00 to card •••• 4242 for order #8821”). Approvers should not parse raw JSON under pressure.
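A sketch of such a summary builder for the refund case. The argument names (`amountCents`, `currency`, `cardLast4`, `orderId`) are hypothetical. The standard `Intl.NumberFormat` API renders the currency, so approvers see "$240.00" rather than a raw integer of cents.

```typescript
type RefundArgs = { amountCents: number; currency: string; cardLast4: string; orderId: string };

function formatMoney(amountCents: number, currency: string): string {
  // Assumes a two-decimal currency; zero-decimal currencies (e.g. JPY) need different scaling.
  return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(amountCents / 100);
}

// Generated once at propose time and stored with the approval request,
// so the approver sees the exact text that was reviewed.
function buildRefundSummary(args: RefundArgs): string {
  const amount = formatMoney(args.amountCents, args.currency);
  return `Refund ${amount} to card •••• ${args.cardLast4} for order #${args.orderId}`;
}
```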

In consulting engagements, the mistake I see most often is storing approvals only in Redis: you lose audit history and make incident response painful. PostgreSQL (or your system of record) is the source of truth; Redis can cache pending counts for the inbox UI.

Orchestration: integrate with the tool loop

Your multi-turn orchestrator already runs rounds: model → tool calls → results → model. Insert a gate before execute:

flowchart TD
  A[Model proposes tool calls] --> B{Policy evaluate}
  B -->|allow| C[Execute tool]
  B -->|deny| D[Return denial to model]
  B -->|require_approval| E[Persist approval_request]
  E --> F[Notify approver]
  F --> G[Return pending status to model / UI]
  G --> H{Approver decision}
  H -->|approved| I[Execute with same args hash]
  H -->|rejected| J[Record rejection]
  I --> C

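Critical invariant: Execution uses the stored arguments, not a fresh model re-generation. If the underlying record changes after the proposal (for example, the user edits the order amount in the UI), invalidate the request, whether it is still pending or already approved but not yet executed.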

What the model should see while pending

Return a structured tool result, not silence:

{
  "status": "pending_approval",
  "approvalId": "apr_01H…",
  "summary": "Refund $240.00 for order #8821 awaiting manager approval",
  "expiresAt": "2026-06-03T15:00:00Z"
}

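System instructions should tell the model to inform the user that the action is awaiting approval, not to propose the same action again while that approvalId is pending (the server also deduplicates on idempotency_key), and not to claim the refund has completed.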
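A sketch of a typed contract for what the model receives, assuming the field names from the JSON example above. A discriminated union on `status` keeps the server from returning ambiguous shapes. `expiresAt` is always serialized with `Date.prototype.toISOString()`, which emits UTC with milliseconds (for example `2026-06-03T15:00:00.000Z`).

```typescript
type ToolResultForModel =
  | { status: "executed"; output: unknown }
  | { status: "denied"; reason: string }
  | { status: "pending_approval"; approvalId: string; summary: string; expiresAt: string };

function pendingResult(approvalId: string, summary: string, expiresAt: Date): ToolResultForModel {
  return { status: "pending_approval", approvalId, summary, expiresAt: expiresAt.toISOString() };
}

// Serialized into the tool message content; the model never sees raw database rows.
function toToolMessageContent(result: ToolResultForModel): string {
  return JSON.stringify(result);
}
```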

Notifications and approver UX

Approvers are busy; optimize for a decision in under 30 seconds.

In-app inbox with filters (tenant, action type, age)
Deep links with signed tokens (approve / reject one-time actions)
Slack/email with the summary and buttons—ensure buttons hit your API, not the model
Mobile — irreversible actions often need mobile-friendly approval; chat desktop alone is insufficient

Show diff context: what changed since last state, linked CRM account, fraud score. Approvers are performing operational work, not chatting.

For teams building scalable, production-ready systems, invest early in separation of duties: the requester should not be the sole approver for high-impact actions unless policy explicitly allows it.
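Here is a hypothetical version of the `canApprove` check that the practical example later in this article relies on. It assumes a `tenantId` on the approver, `requested_by`/`tenant_id`/`risk` fields on the request, and an illustrative action-to-role mapping. Self-approval is blocked only for irreversible actions, and only when policy has not opted in. Unmapped action types fail closed.

```typescript
type Approver = { userId: string; roles: string[]; tenantId: string };
type PendingApproval = {
  requested_by: string;
  tenant_id: string;
  action_type: string;
  risk: "write" | "irreversible";
};

// Hypothetical mapping from action type to the role allowed to approve it.
const APPROVER_ROLE: Record<string, string> = {
  "refund.create": "refund_approver",
  "user.delete": "security_admin",
};

function canApprove(approver: Approver, req: PendingApproval, allowSelfApproval = false): boolean {
  // Approvers act only within their own tenant.
  if (approver.tenantId !== req.tenant_id) return false;
  // Separation of duties for high-impact actions, unless policy explicitly allows it.
  if (req.risk === "irreversible" && approver.userId === req.requested_by && !allowSelfApproval) {
    return false;
  }
  const role = APPROVER_ROLE[req.action_type];
  // Unmapped action types fail closed: nobody can approve them until a role is assigned.
  return role !== undefined && approver.roles.includes(role);
}
```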

Timeouts, expiry, and user expectations

Pending approvals need expiry (e.g. 1–24 hours depending on action). When expires_at passes:

Mark expired
Do not execute
Notify the requester session on next poll or via websocket
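A sketch of the expiry rule as pure functions over in-memory rows, which makes the rule easy to test. In PostgreSQL the same rule would typically be a periodic `UPDATE approval_requests SET status = 'expired' WHERE status = 'pending' AND expires_at <= now() RETURNING id, session_id`. The returned rows drive requester notifications. Only `pending` rows expire. An `approved` row that is mid-execution must not be flipped.

```typescript
type Status = "pending" | "approved" | "rejected" | "expired" | "executed" | "failed";
type Row = { id: string; status: Status; expiresAt: Date; sessionId: string };

// Rows that must move to "expired": pending and past their deadline.
function findExpired(rows: readonly Row[], now: Date): Row[] {
  return rows.filter((r) => r.status === "pending" && r.expiresAt.getTime() <= now.getTime());
}

// Marks expired rows and returns the sessions to notify. Nothing is executed:
// expiry is a terminal "no", never an implicit approval.
function applyExpiry(rows: Row[], now: Date): string[] {
  const expired = findExpired(rows, now);
  for (const r of expired) r.status = "expired";
  return expired.map((r) => r.sessionId);
}
```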

If the user still wants the action, the model must create a new proposal with a new idempotency key after re-validation, since amounts and inventory may have changed in the meantime.

Do not auto-approve on expiry unless legal/compliance explicitly permits it—that pattern has caused real financial loss.

Idempotency and exactly-once side effects

Approvals intersect with retries:

Client retries the chat request → same idempotency_key → return existing approval_request, do not create duplicates
Approver double-clicks Approve → second request is a no-op because the status is no longer pending (the first click may still be executing, so checking only for executed is not enough)
Worker crashes after DB commit but before external API → resume execution using status = approved and idempotent downstream keys (payment provider idempotency headers, etc.)

Pattern:

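// Call inside one transaction so the FOR UPDATE row lock is held until the final
// status commits; a concurrent caller blocks, then sees "executed" and returns.
async function executeApprovedRequest(approvalId: string): Promise<void> {
  const row = await db.approvalRequests.findByIdForUpdate(approvalId);
  if (!row) throw new NotFoundError();
  if (row.status === "executed") return; // idempotent: already done
  if (row.status !== "approved") throw new InvalidStateError(row.status);

  const tool = registry.get(row.tool_name);
  if (!tool) throw new NotFoundError();

  // Re-hash the stored arguments with the same canonicalization used at propose
  // time, then execute exactly those arguments (never a fresh model generation).
  const args: unknown = JSON.parse(row.arguments_json);
  if (hash(canonicalizeJson(args)) !== row.arguments_hash) throw new TamperError();

  const result = await tool.execute(args, buildContext(row));
  await db.approvalRequests.markExecuted(approvalId, result);
}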

Use row-level locking or UPDATE … WHERE status = 'approved' with affected-rows check to prevent double execution under concurrency.
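A sketch of the conditional-update variant, applied to the pending → approved transition. It is written against a minimal hypothetical `Queryable` interface whose result exposes `rowCount`, the field node-postgres returns from `client.query(text, values)`. The `WHERE status = 'pending'` clause lets the database arbitrate. If two approvers click at once, exactly one update matches. The other sees zero affected rows and does nothing.

```typescript
// Hypothetical minimal interface; a node-postgres client has a compatible query().
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Compare-and-set: succeeds only if the row is still pending and unexpired.
async function claimApproval(db: Queryable, approvalId: string, approverId: string): Promise<boolean> {
  const res = await db.query(
    `UPDATE approval_requests
        SET status = 'approved', approved_by = $2, approved_at = now()
      WHERE id = $1 AND status = 'pending' AND expires_at > now()`,
    [approvalId, approverId],
  );
  // 1 row: this caller won and may proceed to execution.
  // 0 rows: already decided, expired, or missing; the caller must not execute.
  return res.rowCount === 1;
}
```

The same shape works for `approved → executed` if you add an intermediate status. Without one, a crash between the update and the external call is indistinguishable from success.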

Observability and security

Emit structured logs and metrics:

approval.created, approval.approved, approval.rejected, approval.expired, approval.executed, approval.failed
Histogram: time from create → decision → execute
Alert: spike in denied or failed for a single action_type
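A sketch of those lifecycle events and the two latencies behind the histogram, assuming hypothetical timestamp fields on the approval record. It emits one JSON log line per transition, which keeps logs queryable by action type. The histogram itself would live in your metrics library.

```typescript
type ApprovalEvent =
  | "approval.created" | "approval.approved" | "approval.rejected"
  | "approval.expired" | "approval.executed" | "approval.failed";

type Timestamps = { createdAt: Date; decidedAt?: Date; executedAt?: Date };

// Milliseconds from create to decision and from decision to execution;
// undefined while that stage has not happened yet.
function lifecycleDurations(t: Timestamps): { decisionMs?: number; executionMs?: number } {
  return {
    decisionMs: t.decidedAt ? t.decidedAt.getTime() - t.createdAt.getTime() : undefined,
    executionMs:
      t.decidedAt && t.executedAt ? t.executedAt.getTime() - t.decidedAt.getTime() : undefined,
  };
}

// One structured line per transition; JSON.stringify drops undefined durations.
function logEvent(event: ApprovalEvent, approvalId: string, actionType: string, t: Timestamps): string {
  const line = JSON.stringify({ event, approvalId, actionType, ...lifecycleDurations(t) });
  console.log(line);
  return line;
}
```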

Security notes:

Sign approval links; bind to approver identity and short TTL
Rate-limit approval endpoints separately from chat
Replay — approval IDs are one-time consumables for execution, not reusable bearer tokens
Align with LLM trust boundaries: injection may propose tools; policy + HITL must block execution
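A sketch of a signed approval-link token using HMAC-SHA256 from `node:crypto`. The token binds the approval id, the approver, the decision, and an expiry. Verification recomputes the MAC and compares the two with `timingSafeEqual`. The payload format and secret handling are assumptions. Signing alone does not make a link one-time: the handler must still refuse a token whose approval is no longer `pending`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type LinkClaims = { approvalId: string; approverId: string; decision: "approve" | "reject"; exp: number };

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

function createApprovalToken(claims: LinkClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the claims only if the MAC matches, the token is unexpired (exp in
// epoch seconds), and the logged-in user is the approver the link was issued to.
function verifyApprovalToken(
  token: string,
  secret: string,
  currentUserId: string,
  nowSec: number,
): LinkClaims | null {
  const [payload, mac] = token.split(".");
  if (!payload || !mac) return null;
  const expected = Buffer.from(sign(payload, secret));
  const given = Buffer.from(mac);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as LinkClaims;
  if (claims.exp <= nowSec || claims.approverId !== currentUserId) return null;
  return claims;
}
```

After verification, the link handler would run the same conditional status transition as the inbox. A replayed link then finds the request already decided.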

Practical example: refund tool with approval gate

Below is a condensed but realistic flow in TypeScript. Adapt persistence and auth to your stack; the structure is what matters.

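import { createHash, randomUUID } from "node:crypto";

// Assumed from your stack (not shown): db, registry, ToolContext, notifyApprovers,
// canApprove, buildRefundSummary, canonicalizeJson (stable key ordering), the error
// classes, and executeApprovedRequest from the idempotency section.

type ApprovalStatus = "pending" | "approved" | "rejected" | "expired" | "executed" | "failed";

type ApprovalRow = {
  id: string;
  session_id: string;
  status: ApprovalStatus;
  action_type: string;
  tool_name: string;
  arguments_json: string;
  arguments_hash: string;
  idempotency_key: string;
  summary: string;
  expires_at: Date;
  policy_reason: string;
};

// Hash the canonical form: the model may emit keys in any order, and a plain
// JSON.stringify hash would then fail the tamper check at execution time.
function hashArgs(args: unknown): string {
  return createHash("sha256").update(canonicalizeJson(args)).digest("hex");
}

// Tool result the model sees while a request is pending.
function pendingContent(row: ApprovalRow): string {
  return JSON.stringify({
    status: "pending_approval",
    approvalId: row.id,
    summary: row.summary,
    expiresAt: row.expires_at.toISOString(),
  });
}

export async function handleModelToolCalls(
  sessionId: string,
  toolCalls: Array<{ id: string; name: string; args: unknown }>,
  ctx: ToolContext,
): Promise<Array<{ toolCallId: string; content: string }>> {
  const results: Array<{ toolCallId: string; content: string }> = [];

  for (const call of toolCalls) {
    const tool = registry.get(call.name);
    if (!tool) {
      // Fail closed on unknown or hallucinated tool names.
      results.push({
        toolCallId: call.id,
        content: JSON.stringify({ status: "denied", reason: `Unknown tool: ${call.name}` }),
      });
      continue;
    }
    const decision = evaluateToolPolicy(tool, call.args as Record<string, unknown>, ctx);

    if (decision.effect === "deny") {
      results.push({
        toolCallId: call.id,
        content: JSON.stringify({ status: "denied", reason: decision.reason }),
      });
      continue;
    }

    if (decision.effect === "require_approval") {
      // A retried request replays the same tool call id, so it maps to the same key.
      // Back this lookup with a UNIQUE constraint on idempotency_key for concurrent retries.
      const idempotencyKey = `${sessionId}:${call.id}`;
      const existing: ApprovalRow | null = await db.findApprovalByIdempotency(idempotencyKey);
      if (existing) {
        results.push({ toolCallId: call.id, content: pendingContent(existing) });
        continue;
      }

      const args = call.args;
      const row: ApprovalRow = await db.createApproval({
        id: randomUUID(),
        session_id: sessionId,
        status: "pending",
        action_type: tool.actionType,
        tool_name: tool.name,
        arguments_json: canonicalizeJson(args),
        arguments_hash: hashArgs(args),
        idempotency_key: idempotencyKey,
        summary: buildRefundSummary(args), // refund-specific here; dispatch on actionType in general
        expires_at: new Date(Date.now() + (decision.expiresInSec ?? 3600) * 1000),
        policy_reason: decision.reason,
      });

      await notifyApprovers(row, ctx);
      results.push({ toolCallId: call.id, content: pendingContent(row) });
      continue;
    }

    const output = await tool.execute(call.args, ctx);
    results.push({ toolCallId: call.id, content: JSON.stringify(output) });
  }

  return results;
}

export async function approveAndExecute(
  approvalId: string,
  approver: { userId: string; roles: string[] },
): Promise<void> {
  const row: ApprovalRow | null = await db.approvalRequests.findById(approvalId);
  if (!row) throw new NotFoundError();
  if (!canApprove(approver, row)) throw new ForbiddenError();

  // Double-clicks and retries: anything past "pending" is already decided.
  if (row.status === "executed") return;
  if (row.status === "approved") return executeApprovedRequest(approvalId); // resume after a crash
  if (row.status !== "pending") throw new InvalidStateError(row.status);

  if (row.expires_at.getTime() <= Date.now()) {
    await db.markExpired(approvalId);
    throw new ExpiredError();
  }

  // Conditional update (UPDATE … WHERE status = 'pending'): only one concurrent
  // approver wins the transition; the loser returns without executing anything.
  const won = await db.markApprovedIfPending(approvalId, approver.userId);
  if (!won) return;
  await executeApprovedRequest(approvalId);
}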

Wire approveAndExecute to your inbox UI and signed deep links, and make sure chat endpoints can never execute a gated tool without going through approval state.

Common mistakes and pitfalls

Re-running the model to “fill in” arguments at execution time — Changes scope; breaks audit. Execute exactly what was approved.
Storing approvals only in the chat transcript — Not queryable, not legally robust, lost on session reset.
Requiring approval for every tool — Users abandon the product; approvers fatigue and click through. Tier tools aggressively.
No expiry — Pending refunds pile up; execution fires on stale business state.
Same person proposes and approves high-impact actions — Fails SOC2-style controls; encode separation in policy.
Letting the model call execution endpoints — Tool implementations must be server-side only; never expose raw side-effect APIs to the client.
Ignoring idempotency on approval create — Duplicate rows confuse approvers and can double-charge if execution guards are weak.
Opaque tool results to approvers — “Model wanted to run tool X” without amount, tenant, and target is how mistakes get approved.

Conclusion

Agentic LLM features become production-ready when side effects are gated by policy and durable human decisions, not by conversational politeness. Classify tools by risk, evaluate deterministic policies, persist approval requests with hashed arguments, execute only after explicit approval (or bounded auto-approve), and instrument the full lifecycle. Pair this with solid session and tool-loop design for multi-turn LLM backends, and with defense in depth at your LLM trust boundaries, for a coherent safety story.

If you are designing approval workflows for an assistant or hardening an existing agent stack, get in touch—I help teams ship scalable, auditable backends without sacrificing UX.
