Why this matters
The query loop does not talk to model providers directly. The API service layer shapes the request, adds the request ID, owns backoff, and carries retry state so the query loop does not have to duplicate that machinery.
The lifecycle in one pass
From top to bottom, this path works like this:
claude.ts assembles provider-ready params, withRetry() wraps the call with
retry and fallback rules, the Anthropic client sends the request, and the API
layer tracks request IDs so failures can still be correlated later.
That means this chapter is not really about one helper function. It is about how several small helpers cooperate so the query loop can ask for one model result without reimplementing transport details.
Request IDs First
The service layer tracks a client request ID for first-party Anthropic traffic.
That makes timeouts and server logs easier to match up later. In the streaming
path, claude.ts generates the ID and passes it into the request. The client
helper also knows the shared header name and can inject one when appropriate.
export const CLIENT_REQUEST_ID_HEADER = 'x-client-request-id'
The header lives in the API client because request correlation is transport work, not turn-loop work.
Retry Context And Failure Handling
The retry layer carries a small context instead of the whole query engine state. That keeps retry decisions focused on what the API layer actually needs.
Before that retry context makes sense, we need the shape of ThinkingConfig:
the small data model that says whether reasoning is adaptive, explicitly
enabled with a token budget, or fully disabled.
export type ThinkingConfig =
| { type: 'adaptive' }
| { type: 'enabled'; budgetTokens: number }
| { type: 'disabled' }
export interface RetryContext {
maxTokensOverride?: number
model: string
thinkingConfig: ThinkingConfig
fastMode?: boolean
}
model says which model we are trying to reach. thinkingConfig carries the
reasoning settings. maxTokensOverride is an optional escape hatch when the
caller needs a temporary cap. fastMode lets the retry layer respect the
current fast-mode choice without pulling the whole query loop into the retry
policy.
When retry cannot continue, the API layer throws a structured error that keeps that context attached.
export class CannotRetryError extends Error {
constructor(
public readonly originalError: unknown,
public readonly retryContext: RetryContext,
) {
const message = errorMessage(originalError)
super(message)
this.name = 'CannotRetryError'
// Preserve the original stack trace if available
if (originalError instanceof Error && originalError.stack) {
this.stack = originalError.stack
}
}
}
That is how the service layer preserves the useful retry details for whoever handles the final error or fallback path.
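A handler consuming that error might look like the sketch below. The class is condensed from the excerpt, errorMessage is simplified inline, and the model names are placeholders; the point is only that the attached retryContext is enough to drive a fallback decision without extra bookkeeping.

```typescript
// Simplified from the excerpt: errorMessage reduced to the common case.
const errorMessage = (e: unknown) => (e instanceof Error ? e.message : String(e))

interface RetryContext {
  model: string
}

class CannotRetryError extends Error {
  constructor(
    public readonly originalError: unknown,
    public readonly retryContext: RetryContext,
  ) {
    super(errorMessage(originalError))
    this.name = 'CannotRetryError'
  }
}

// Hypothetical handler: the context travels with the error, so the
// catch site knows which model exhausted its retries.
let fallbackModel: string | undefined
try {
  throw new CannotRetryError(new Error('overloaded'), { model: 'primary-model' })
} catch (err) {
  if (err instanceof CannotRetryError) {
    fallbackModel =
      err.retryContext.model === 'primary-model' ? 'fallback-model' : undefined
  }
}
```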
Backoff In One Place
Retry sleep timing also belongs here, not in the query loop. The loop should ask for one result or one failure and let the service layer decide how long to wait between attempts.
export function getRetryDelay(
attempt: number,
retryAfterHeader?: string | null,
maxDelayMs = 32000,
): number {
if (retryAfterHeader) {
const seconds = parseInt(retryAfterHeader, 10)
if (!isNaN(seconds)) {
return seconds * 1000
}
}
const baseDelay = Math.min(
BASE_DELAY_MS * Math.pow(2, attempt - 1),
maxDelayMs,
)
const jitter = Math.random() * 0.25 * baseDelay
return baseDelay + jitter
}
The important part is not the exact math. The important part is that the service layer owns backoff, so every caller gets the same retry behavior.
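The two branches are easy to exercise. The sketch below repeats the function with an assumed BASE_DELAY_MS of 1000 ms (the excerpt does not show its value): a parseable Retry-After header wins outright, and otherwise the delay doubles per attempt with up to 25% jitter.

```typescript
// Assumption: the excerpt does not show BASE_DELAY_MS; 1000 ms is a guess.
const BASE_DELAY_MS = 1000

function getRetryDelay(
  attempt: number,
  retryAfterHeader?: string | null,
  maxDelayMs = 32000,
): number {
  // Server-provided Retry-After takes precedence over exponential backoff.
  if (retryAfterHeader) {
    const seconds = parseInt(retryAfterHeader, 10)
    if (!isNaN(seconds)) {
      return seconds * 1000
    }
  }
  // Exponential backoff capped at maxDelayMs, plus up to 25% jitter.
  const baseDelay = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxDelayMs)
  return baseDelay + Math.random() * 0.25 * baseDelay
}

// Retry-After of 7 seconds wins regardless of attempt number.
getRetryDelay(3, '7') // → 7000

// Without a header, attempt 3 starts from 4000 ms plus jitter,
// so the result lands somewhere in [4000, 5000).
const delay = getRetryDelay(3)
```

Note that the jitter keeps a burst of failing callers from retrying in lockstep, which is why the exact math matters less than everyone sharing it.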
Extra Body Parameters
Before the API body gets assembled, it helps to know the JSON shape that this
helper returns. In this file, JsonObject means a plain JSON-safe dictionary.
type JsonValue = string | number | boolean | null | JsonObject | JsonArray
type JsonObject = { [key: string]: JsonValue }
type JsonArray = JsonValue[]
getExtraBodyParams() reads extra body settings, merges beta headers, and
returns one request-ready object. That keeps the payload assembly in the API
layer instead of scattering it through the rest of the runtime.
export function getExtraBodyParams(betaHeaders?: string[]): JsonObject {
// Parse user's extra body parameters first
const extraBodyStr = process.env.CLAUDE_CODE_EXTRA_BODY
let result: JsonObject = {}
if (extraBodyStr) {
try {
// Parse as JSON, which can be null, boolean, number, string, array or object
const parsed = safeParseJSON(extraBodyStr)
// We expect an object with key-value pairs to spread into API parameters
if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
// Shallow clone — safeParseJSON is LRU-cached and returns the same
// object reference for the same string. Mutating `result` below
// would poison the cache, causing stale values to persist.
result = { ...(parsed as JsonObject) }
} else {
logForDebugging(
`CLAUDE_CODE_EXTRA_BODY env var must be a JSON object, but was given ${extraBodyStr}`,
{ level: 'error' },
)
}
} catch (error) {
logForDebugging(
`Error parsing CLAUDE_CODE_EXTRA_BODY: ${errorMessage(error)}`,
{ level: 'error' },
)
}
}
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
feature('ANTI_DISTILLATION_CC')
? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
shouldIncludeFirstPartyOnlyBetas() &&
getFeatureValue_CACHED_MAY_BE_STALE(
'tengu_anti_distill_fake_tool_injection',
false,
)
: false
) {
result.anti_distillation = ['fake_tools']
}
// Handle beta headers if provided
if (betaHeaders && betaHeaders.length > 0) {
if (result.anthropic_beta && Array.isArray(result.anthropic_beta)) {
// Add to existing array, avoiding duplicates
const existingHeaders = result.anthropic_beta as string[]
const newHeaders = betaHeaders.filter(
header => !existingHeaders.includes(header),
)
result.anthropic_beta = [...existingHeaders, ...newHeaders]
} else {
// Create new array with the beta headers
result.anthropic_beta = betaHeaders
}
}
return result
}
What This Leaves To Later Pages
The API service path owns request-ID correlation. In some paths claude.ts
generates the ID first, and the client wrapper shares the header name and
injects one when appropriate. The retry helper owns backoff, retry state, and
the fallback trigger after repeated overloads. Higher layers can choose which
fallback model to supply, but they do not need to rebuild these low-level
request rules themselves.