Model-agnostic LLM service layers: adapters, capability maps, and provider boundaries

Route OpenAI, Anthropic, and compatible hosts through one internal API: canonical messages, streaming adapters, tool translation, capability maps, and why full portability is a product decision.

Author: Matheus Palma · May 19, 2026 · 9 min read

Software engineering, Artificial intelligence, Backend, TypeScript, API design, Architecture

Your product standardizes on one hosted model. Six months later, pricing shifts, a regional outage lasts hours, or a security review demands a second vendor for redundancy. Someone says, “We’ll just swap the SDK.” By the second sprint you discover that tool call shapes, streaming event formats, image inputs, and JSON schema constraints are not byte-compatible—and your “thin wrapper” has become a tangle of if (provider === …) branches in every route. The pain is not laziness; it is that each vendor optimizes for its own surface, while your application needs a stable internal contract.

This article describes how to design a model-agnostic service layer that is honest about trade-offs: where adapters buy leverage, where capability maps prevent silent degradation, and why pretending all models are interchangeable usually breaks observability or safety. The patterns reflect recurring work when helping teams ship assistants and API backends that must stay maintainable across provider churn—not a theoretical “universal LLM SDK,” but controlled seams you can test and operate.

What “model-agnostic” should mean (and what it should not promise)

Model-agnostic in production rarely means “identical behavior everywhere.” It means:

Your domain code (billing, permissions, persistence, tool execution) depends on your types and events, not on a vendor’s wire format.
You can route traffic to different providers or model IDs with configuration, not invasive refactors.
You can detect unsupported combinations early (for example, “this route requires JSON schema mode; provider B does not implement it yet”) instead of failing mid-stream.

It does not mean every model produces the same quality, latency, or safety profile. Those differences belong in routing policy and evaluation, not hidden inside a leaky abstraction.

Unified completion contract versus pass-through proxies

A pass-through gateway that forwards arbitrary JSON to whichever host is “OpenAI-compatible” is fast to build and hard to secure. You inherit schema drift, ambiguous errors, and clients that accidentally depend on undocumented fields.

A unified contract defines:

Message roles and content parts your product actually uses (text, image references, optional file handles).
Tool definitions in one canonical shape (name, description, parameters as JSON Schema).
Generation options as a reduced set your product supports (temperature, maxOutputTokens, stopSequences, responseFormat enum).

Adapters translate that contract to vendor requests. When a vendor cannot express a feature, the layer returns a controlled error or a degraded path, not silent omission.
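As a minimal sketch of that last rule, the adapter function below translates canonical generation options into a request shape for a hypothetical provider and raises a controlled error when a feature cannot be expressed. The names `VendorXOptions`, `UnsupportedFeatureError`, `toVendorXOptions`, and the `ResponseFormat` values are assumptions made for illustration; they do not describe any real vendor API.

```typescript
// Canonical generation options; field names follow the list above.
type ResponseFormat = "text" | "json_object" | "json_schema"; // assumed enum values

type GenerationOptions = {
  temperature?: number;
  maxOutputTokens?: number;
  stopSequences?: string[];
  responseFormat?: ResponseFormat;
};

// Hypothetical vendor request fields, invented for this sketch.
type VendorXOptions = {
  temperature?: number;
  max_tokens?: number;
  stop?: string[];
  json_mode?: boolean;
};

export class UnsupportedFeatureError extends Error {
  readonly feature: string;
  readonly provider: string;
  constructor(feature: string, provider: string) {
    super(`${provider} cannot express ${feature}`);
    this.name = "UnsupportedFeatureError";
    this.feature = feature;
    this.provider = provider;
  }
}

export function toVendorXOptions(opts: GenerationOptions): VendorXOptions {
  // Assumption: this vendor has a JSON mode but no schema-constrained mode.
  // Refuse loudly instead of silently dropping the constraint.
  if (opts.responseFormat === "json_schema") {
    throw new UnsupportedFeatureError("responseFormat=json_schema", "vendorX");
  }
  const out: VendorXOptions = {};
  if (opts.temperature !== undefined) out.temperature = opts.temperature;
  if (opts.maxOutputTokens !== undefined) out.max_tokens = opts.maxOutputTokens;
  if (opts.stopSequences !== undefined) out.stop = opts.stopSequences;
  if (opts.responseFormat === "json_object") out.json_mode = true;
  return out;
}
```

Callers can catch `UnsupportedFeatureError` at the routing layer and decide whether to pick another deployment or return a stable error code to the client.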

Designing the internal boundary: `LlmClient` as a port, not a singleton SDK

Treat the LLM as infrastructure behind an interface your application owns—similar to how you would abstract email or object storage.

Canonical messages and tool definitions

Store and log your representation:

Roles: at minimum system, user, assistant, tool. Map vendor-specific roles (for example “developer” or “model”) explicitly in adapters.
Content: prefer structured parts ({ type: "text", text: "…" }) over one concatenated string when multimodal or citations appear later.
Tool results: always include a stable toolCallId generated by your server when executing tools, not only the opaque id the model returned—this matters for replay, audits, and multi-turn repair when providers disagree on id formats.

Canonical shapes make it possible to redact PII consistently, hash prompts for caching, and attach tenant metadata without scraping vendor-specific JSON.
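A small sketch of two of these ideas, explicit role mapping and redaction on the canonical type, might look like the following. The role table and the email pattern are illustrative assumptions: real vendor role names should be checked against each provider's documentation, and a single regular expression is not a complete PII detector.

```typescript
type Role = "system" | "user" | "assistant" | "tool";
type TextPart = { type: "text"; text: string };
type Message = { role: Role; parts: TextPart[]; toolCallId?: string };

// Assumed mapping from vendor role names to canonical roles.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const ROLE_MAP = new Map<string, Role>([
  ["system", "system"],
  ["developer", "system"],
  ["user", "user"],
  ["assistant", "assistant"],
  ["model", "assistant"],
  ["tool", "tool"],
]);

export function toCanonicalRole(vendorRole: string): Role {
  const role = ROLE_MAP.get(vendorRole);
  // Unknown roles are an adapter bug; fail instead of guessing.
  if (role === undefined) throw new Error(`unmapped_vendor_role:${vendorRole}`);
  return role;
}

// Deliberately simple pattern for illustration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

/** Returns a redacted copy; the input message is left unchanged. */
export function redactMessage(msg: Message): Message {
  return {
    ...msg,
    parts: msg.parts.map((p) => ({ ...p, text: p.text.replace(EMAIL, "[redacted-email]") })),
  };
}
```

Because redaction operates on `Message` rather than on vendor JSON, the same function can run before logging, caching, or export regardless of which adapter produced the transcript.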

Capability maps: declare what each deployment can do

Maintain a small capability record per (provider, modelId) or per deployment profile:

Streaming: supported or not; max concurrent streams per process if you self-host.
Tool calling: parallel calls allowed; maximum tools per request; whether strict JSON Schema is honored or “best effort.”
Structured outputs: native schema mode, constrained decoding, or prompt-only JSON (affects reliability guarantees).
Context window and output token cap: used for preflight checks before you pay for a round trip.

At request time, merge route requirements with capabilities. Example: a “contract extraction” route might require structuredOutputs: required. If the selected deployment lacks it, fail fast with a routing hint (“switch model” or “queue for offline batch on vendor A”) instead of returning prose that breaks downstream parsers.

This is where many integrations go wrong: they assume “GPT-class behavior” from every endpoint that speaks roughly the same REST shape. Capability maps turn that assumption into data you can assert in tests.
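One way to turn that into data is a registry keyed by deployment plus a preflight function that returns a routing hint instead of throwing. This is a sketch under stated assumptions: the deployment names, the numbers, the `Preflight` result shape, and the rule that input plus output tokens must fit in the context window are all hypothetical and should be replaced with values you have verified per provider. Here "required" is interpreted strictly as native schema support.

```typescript
type Capabilities = {
  streaming: boolean;
  parallelToolCalls: boolean;
  maxTools: number;
  structuredOutputs: "native" | "prompt_only" | "none";
  contextTokens: number;
  maxOutputTokens: number;
};

type RouteRequirements = {
  structuredOutputs?: "required" | "optional";
  toolCount?: number;
  estimatedInputTokens?: number;
  maxOutputTokens?: number;
};

type Preflight = { ok: true } | { ok: false; code: string; hint: string };

// Placeholder deployments and limits, not vendor specifications.
const CAPABILITIES = new Map<string, Capabilities>([
  ["vendorA/large", { streaming: true, parallelToolCalls: true, maxTools: 64,
    structuredOutputs: "native", contextTokens: 128_000, maxOutputTokens: 4_096 }],
  ["vendorB/small", { streaming: true, parallelToolCalls: false, maxTools: 16,
    structuredOutputs: "prompt_only", contextTokens: 32_000, maxOutputTokens: 2_048 }],
]);

export function preflight(deployment: string, needs: RouteRequirements): Preflight {
  const caps = CAPABILITIES.get(deployment);
  if (!caps) {
    return { ok: false, code: "unknown_deployment", hint: "register capabilities before routing" };
  }
  if (needs.structuredOutputs === "required" && caps.structuredOutputs !== "native") {
    return { ok: false, code: "structured_outputs_not_supported", hint: "switch model" };
  }
  if ((needs.toolCount ?? 0) > caps.maxTools) {
    return { ok: false, code: "too_many_tools", hint: "reduce the tool set for this route" };
  }
  if ((needs.maxOutputTokens ?? 0) > caps.maxOutputTokens) {
    return { ok: false, code: "output_cap_exceeded", hint: "lower maxOutputTokens or switch model" };
  }
  const total = (needs.estimatedInputTokens ?? 0) + (needs.maxOutputTokens ?? 0);
  if (total > caps.contextTokens) {
    return { ok: false, code: "context_window_exceeded", hint: "trim history or switch model" };
  }
  return { ok: true };
}
```

Because the registry is plain data, a unit test can assert that every route in the routing table has at least one deployment for which `preflight` returns `ok: true`.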

Streaming: normalize chunks, preserve cancellation

Streaming protocols differ (SSE framing, chunk types, usage metadata timing), but your downstream consumers—HTTP handlers, workers, WebSocket bridges—should see a single async iterator of typed events:

text_delta — append-only UTF-8 segments (already normalized for split codepoints if needed).
tool_call_delta — incremental JSON arguments where the vendor streams partial tool JSON.
finish — stop reason, resolved model id, usage when available.
error — mapped to stable internal codes.

Why normalize: UI code and analytics should not parse five different SSE dialects. Cancellation (AbortSignal) should tear down upstream readers promptly so disconnecting a browser tab does not leave provider streams running.

Keep vendor raw frames behind a debug flag only; full wire logs are a compliance and storage liability unless you have explicit retention policy and redaction.
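The normalization and cancellation behavior can be sketched with an async generator. The `VendorChunk` shape and the error codes below are invented for illustration. A real adapter would also pass the same `AbortSignal` to its HTTP client so that a blocked read is interrupted, which this sketch does not show; here the signal is only checked when a chunk arrives.

```typescript
type StreamEvent =
  | { kind: "text_delta"; text: string }
  | { kind: "finish"; stopReason: string }
  | { kind: "error"; code: string; retryable: boolean };

// Hypothetical vendor chunk shape, not a real wire format.
type VendorChunk =
  | { type: "content"; delta: string }
  | { type: "done"; reason: string };

export async function* normalize(
  upstream: AsyncIterable<VendorChunk>,
  signal: AbortSignal,
): AsyncGenerator<StreamEvent> {
  try {
    for await (const chunk of upstream) {
      if (signal.aborted) {
        // Returning from inside `for await` calls the upstream iterator's
        // return(), which is where a real adapter would close its reader.
        yield { kind: "error", code: "cancelled", retryable: false };
        return;
      }
      if (chunk.type === "content") {
        yield { kind: "text_delta", text: chunk.delta };
      } else {
        yield { kind: "finish", stopReason: chunk.reason };
        return;
      }
    }
    // Upstream ended without a terminal chunk.
    yield { kind: "error", code: "upstream_truncated", retryable: true };
  } catch {
    // Upstream threw while reading; map to a stable internal code.
    yield { kind: "error", code: "upstream_failed", retryable: true };
  }
}
```

Downstream handlers consume this with `for await (const event of normalize(chunks, controller.signal))` and never see the vendor's framing.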

Tool calling: translate proposals, never execute from raw model strings

Even with a unified tool schema, execution remains your responsibility:

1. Model emits a tool call proposal (name + arguments object or JSON string mid-stream).
2. Adapter parses into your canonical ToolInvocation type.
3. Server-side policy validates: allowlisted tool, argument schema with a validator, tenant scope, rate limits.
4. Handler executes with service credentials appropriate to the user; results are fed back as tool messages in your canonical transcript.

The adapter’s job ends at syntax; authorization belongs in domain services. In practice, when reviewing architectures for consulting clients, the highest-risk shortcut is skipping step 3 because “the model would not call delete_user.” It will. Prompt injection is a separate concern, but tool surfaces must be safe by construction.
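The parse and policy steps can be sketched as two small functions. Everything named here is hypothetical: the `get_invoice` tool, the `acme` tenant, the `Decision` shape, and the reason codes. The hand-written `validate` callback stands in for a real JSON Schema validator, and rate limiting is omitted for brevity.

```typescript
type ToolInvocation = { toolCallId: string; name: string; args: Record<string, unknown> };

type Decision =
  | { allowed: true; invocation: ToolInvocation }
  | { allowed: false; reason: string };

type ToolPolicy = {
  tenants: ReadonlySet<string>;
  validate: (args: Record<string, unknown>) => boolean;
};

// Allowlist: a tool that is not registered here can never execute.
const POLICIES = new Map<string, ToolPolicy>([
  ["get_invoice", {
    tenants: new Set(["acme"]),
    validate: (args) => {
      const id = args["invoiceId"];
      return typeof id === "string" && id.length > 0;
    },
  }],
]);

/** Adapter side: turn a raw proposal into the canonical type, or null if malformed. */
export function parseProposal(toolCallId: string, name: string, rawArgs: string): ToolInvocation | null {
  try {
    const parsed: unknown = JSON.parse(rawArgs);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
    return { toolCallId, name, args: parsed as Record<string, unknown> };
  } catch {
    return null;
  }
}

/** Domain side: allowlist, tenant scope, then argument validation. */
export function authorize(tenantId: string, inv: ToolInvocation | null): Decision {
  if (inv === null) return { allowed: false, reason: "malformed_arguments" };
  const policy = POLICIES.get(inv.name);
  if (!policy) return { allowed: false, reason: "tool_not_allowlisted" };
  if (!policy.tenants.has(tenantId)) return { allowed: false, reason: "tenant_out_of_scope" };
  if (!policy.validate(inv.args)) return { allowed: false, reason: "invalid_arguments" };
  return { allowed: true, invocation: inv };
}
```

With this shape, a proposal for `delete_user` is rejected as `tool_not_allowlisted` before any handler code runs, regardless of what the prompt contained.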

Routing, fallbacks, and blast radius

Multi-provider setups need explicit routing rules:

Primary / secondary by model family for redundancy.
Cost tier routing for internal vs. external traffic.
Data residency constraints that pin certain tenants to specific regions or vendors.

Fallback is not automatic retry on every 5xx: duplicated assistant messages, double tool execution, and mismatched token usage all appear when you naively “try the other provider.” Prefer:

Fail closed on routes with side effects unless idempotency keys cover the operation.
Degrade on read-only routes (for example, return a shorter answer from a smaller model with a banner) only when product accepts quality loss.

Isolate SDK clients in small modules per vendor so a dependency upgrade in one adapter does not force churn across the codebase. Blast radius is also operational: separate API keys, rate limits, and circuit breakers per provider.
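The two fallback preferences above can be written as a pure decision function, which keeps the policy testable apart from any network code. The `Attempt` fields and the decision names are assumptions made for this sketch; a real router would likely also consult per-provider circuit breaker state.

```typescript
type RouteKind = "read_only" | "side_effects";
type FallbackDecision = "fail_closed" | "retry_secondary" | "degrade";

type Attempt = {
  route: RouteKind;
  errorRetryable: boolean;
  /** Present only when the operation is covered by an idempotency key. */
  idempotencyKey?: string;
  /** Product has explicitly accepted lower quality on this route. */
  productAcceptsDegradation: boolean;
};

export function decideFallback(a: Attempt): FallbackDecision {
  // Non-retryable errors (for example, a rejected request) never trigger fallback.
  if (!a.errorRetryable) return "fail_closed";
  if (a.route === "side_effects") {
    // Retrying elsewhere is only safe when duplicates can be detected.
    return a.idempotencyKey ? "retry_secondary" : "fail_closed";
  }
  // Read-only route: degrade only with explicit product sign-off.
  return a.productAcceptsDegradation ? "degrade" : "fail_closed";
}
```

Keeping the default at `fail_closed` means every fallback path is one someone chose deliberately.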

Practical example: TypeScript skeleton

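The following sketch shows ports and adapters: a stable LlmClient interface, a normalized stream event, and two adapter stubs. Note that it splits the tool_call_delta event described earlier into tool_call_start and tool_call_args_delta, so the call id and name are announced separately from the argument fragments. It is illustrative: wire real HTTP/SDK calls, retries, and telemetry in the adapters, not in domain services.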

/** Your product’s stable shapes — not vendor JSON. */
export type Role = "system" | "user" | "assistant" | "tool";

export type TextPart = { type: "text"; text: string };

export type Message = {
  role: Role;
  parts: TextPart[];
  /** Stable id for tool correlation / audits */
  toolCallId?: string;
};

export type ToolDef = {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema object
};

export type GenerateInput = {
  model: string;
  messages: Message[];
  tools?: ToolDef[];
  temperature?: number;
  maxOutputTokens?: number;
};

export type StreamEvent =
  | { kind: "text_delta"; text: string }
  | { kind: "tool_call_start"; id: string; name: string }
  | { kind: "tool_call_args_delta"; id: string; jsonFragment: string }
  | { kind: "finish"; stopReason: string; usage?: { inputTokens: number; outputTokens: number } }
  | { kind: "error"; code: string; retryable: boolean };

export type ModelCapabilities = {
  streaming: boolean;
  tools: boolean;
  structuredJsonMode: "native" | "best_effort" | "none";
  contextTokens: number;
};

export interface LlmClient {
  readonly capabilities: ModelCapabilities;
  streamGenerate(input: GenerateInput, signal: AbortSignal): AsyncIterable<StreamEvent>;
}

/** Example: route picks an implementation based on config / tenant policy. */
export function createLlmClient(profile: "vendorA" | "vendorB"): LlmClient {
  switch (profile) {
    case "vendorA":
      return new VendorAAdapter();
    case "vendorB":
      return new VendorBAdapter();
  }
}

class VendorAAdapter implements LlmClient {
  capabilities: ModelCapabilities = {
    streaming: true,
    tools: true,
    structuredJsonMode: "native",
    contextTokens: 128_000,
  };

  async *streamGenerate(
    input: GenerateInput,
    signal: AbortSignal,
  ): AsyncIterable<StreamEvent> {
    // Map `input` to vendor request; translate SSE chunks → StreamEvent.
    // Yield `text_delta` for token text; on parse errors yield `error` and stop.
    yield { kind: "text_delta", text: "…" };
    yield { kind: "finish", stopReason: "stop" };
  }
}

class VendorBAdapter implements LlmClient {
  capabilities: ModelCapabilities = {
    streaming: true,
    tools: true,
    structuredJsonMode: "best_effort",
    contextTokens: 200_000,
  };

  async *streamGenerate(
    input: GenerateInput,
    signal: AbortSignal,
  ): AsyncIterable<StreamEvent> {
    yield { kind: "text_delta", text: "…" };
    yield { kind: "finish", stopReason: "stop" };
  }
}

/** Preflight: fail before spend when the route needs features the model lacks. */
export function assertRouteSupported(
  caps: ModelCapabilities,
  needs: { tools?: boolean; structuredJson?: "required" | "optional" },
): void {
  if (needs.tools && !caps.tools) {
    throw new Error("tools_not_supported");
  }
  if (needs.structuredJson === "required" && caps.structuredJsonMode === "none") {
    throw new Error("structured_outputs_not_supported");
  }
}

Key properties this enforces:

Domain code depends on StreamEvent, not raw SSE strings.
Capability checks are explicit and testable.
Vendor code stays inside adapters, which is where SDK upgrades should concentrate risk.
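On the consuming side, domain code might fold these events into a final result. The sketch below repeats the `StreamEvent` type from the skeleton so it stands alone; `StreamAssembler`, `AssembledCall`, the error strings, and the choice to treat an empty argument string as `{}` are assumptions for illustration.

```typescript
type StreamEvent =
  | { kind: "text_delta"; text: string }
  | { kind: "tool_call_start"; id: string; name: string }
  | { kind: "tool_call_args_delta"; id: string; jsonFragment: string }
  | { kind: "finish"; stopReason: string; usage?: { inputTokens: number; outputTokens: number } }
  | { kind: "error"; code: string; retryable: boolean };

type AssembledCall = { id: string; name: string; args: unknown };

export class StreamAssembler {
  private text = "";
  private finished = false;
  private readonly calls = new Map<string, { name: string; json: string }>();

  /** Call once per event, typically inside `for await (const e of stream)`. */
  push(event: StreamEvent): void {
    switch (event.kind) {
      case "text_delta":
        this.text += event.text;
        break;
      case "tool_call_start":
        this.calls.set(event.id, { name: event.name, json: "" });
        break;
      case "tool_call_args_delta": {
        const call = this.calls.get(event.id);
        if (!call) throw new Error(`args_before_start:${event.id}`);
        call.json += event.jsonFragment;
        break;
      }
      case "finish":
        this.finished = true;
        break;
      case "error":
        throw new Error(`stream_error:${event.code}`);
    }
  }

  /** Parses argument JSON only after the stream has finished. */
  result(): { text: string; calls: AssembledCall[] } {
    if (!this.finished) throw new Error("stream_not_finished");
    const calls = Array.from(this.calls.entries()).map(([id, c]) => ({
      id,
      name: c.name,
      args: JSON.parse(c.json === "" ? "{}" : c.json) as unknown,
    }));
    return { text: this.text, calls };
  }
}
```

The assembled `args` are still untrusted input at this point; they belong in front of the server-side policy checks described earlier, not in a handler.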

Common mistakes and pitfalls

Leaking vendor-specific types into handlers — The moment your HTTP route imports a provider SDK, refactors balloon. Keep imports one-way: adapters depend on ports, not the reverse.

One “lowest common denominator” feature set forever — Over-normalization stalls product work. Instead, version your internal contract (GenerateInputV2) when you add image parts or new tool modes, and migrate routes incrementally.
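A versioned contract can coexist with the old one through a small upgrade function, so routes migrate one at a time. The `ImagePart` shape and the `version: 2` discriminator are assumptions for this sketch; the article only names `GenerateInputV2`.

```typescript
type Role = "system" | "user" | "assistant" | "tool";
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; ref: string }; // hypothetical image reference

type GenerateInputV1 = {
  model: string;
  messages: { role: Role; parts: TextPart[] }[];
};

type GenerateInputV2 = {
  version: 2;
  model: string;
  messages: { role: Role; parts: (TextPart | ImagePart)[] }[];
};

/** Accepts either version and always returns V2; V2 inputs pass through unchanged. */
export function upgradeToV2(input: GenerateInputV1 | GenerateInputV2): GenerateInputV2 {
  if ("version" in input) return input;
  return {
    version: 2,
    model: input.model,
    messages: input.messages.map((m) => ({ role: m.role, parts: [...m.parts] })),
  };
}
```

Adapters then accept only `GenerateInputV2`, while unmigrated routes keep constructing V1 values and call `upgradeToV2` at the boundary.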

Silent feature loss — If a secondary vendor ignores a JSON Schema constraint, you may ship parsing errors to users. Surface capability gaps as metrics (schema_validation_failures_by_provider) and block routes that cannot meet SLOs.
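A metric like that can start as an in-process counter before it moves to a real metrics backend. In this sketch the function names, the `minSamples` default, and the failure-rate threshold are all hypothetical; the point is only that the gate is computed from observed schema validation results per provider.

```typescript
type Counts = { attempts: number; schemaFailures: number };

const byProvider = new Map<string, Counts>();

/** Record the outcome of validating one structured response against its schema. */
export function recordStructuredResult(provider: string, schemaValid: boolean): void {
  const c = byProvider.get(provider) ?? { attempts: 0, schemaFailures: 0 };
  c.attempts += 1;
  if (!schemaValid) c.schemaFailures += 1;
  byProvider.set(provider, c);
}

/** True while the provider's observed failure rate stays within the allowed rate. */
export function meetsSlo(provider: string, maxFailureRate: number, minSamples = 20): boolean {
  const c = byProvider.get(provider);
  // Assumption: with too little data we do not block the route.
  if (!c || c.attempts < minSamples) return true;
  return c.schemaFailures / c.attempts <= maxFailureRate;
}
```

A router could call `meetsSlo` during provider selection and skip a secondary vendor for schema-dependent routes once its failure rate crosses the threshold.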

Shared rate limits and API keys across tenants — Noisy neighbors become security issues. Prefer per-tenant routing keys and per-provider budgets.

Logging raw prompts without policy — Multi-provider often means multi-region data processing agreements. Centralize redaction on the canonical message type before persistence or export.

Conclusion

A maintainable LLM layer is less about hiding vendors and more about owning the contract your product depends on: canonical messages, normalized streaming, explicit capabilities, and tool execution that stays on the server side of trust. Adapters absorb churn; capability maps make limitations visible; routing policies encode business and compliance constraints that no SDK will provide for you.

Takeaways:

Define your message and tool types first; treat vendor APIs as adapters, not the center of gravity.
Publish capabilities per model deployment and assert them before expensive or fragile paths.
Normalize streaming events and thread cancellation through the stack.
Execute tools only after server-side validation and policy checks, regardless of how “smart” the model appears.

Teams that invest in this seam ship faster afterward: new models become configuration plus adapter work, not a rewrite of every route.
