Why this matters
The query loop does not talk to model providers directly. The API service layer shapes the request, adds the request ID, owns backoff, and carries retry state so the query loop does not have to duplicate that machinery.
The lifecycle in one pass
From top to bottom, this path works like this:
claude.ts assembles provider-ready params, withRetry() wraps the call with
retry and fallback rules, the Anthropic client sends the request, and the API
layer tracks request IDs so failures can still be correlated later.
That means this chapter is not really about one helper function. It is about how several small helpers cooperate so the query loop can ask for one model result without reimplementing transport details.
Request IDs First
The service layer tracks a client request ID for first-party Anthropic traffic.
That makes timeouts and server logs easier to match up later. In the streaming
path, claude.ts generates the ID and passes it into the request. The client
helper also knows the shared header name and can inject one when appropriate.
export const CLIENT_REQUEST_ID_HEADER = 'x-client-request-id'
The header lives in the API client because request correlation is transport work, not turn-loop work.
Retry Context And Failure Handling
The retry layer carries a small context instead of the whole query engine state. That keeps retry decisions focused on what the API layer actually needs.
Before that retry context makes sense, we need the shape of ThinkingConfig:
the small data model that says whether reasoning is adaptive, explicitly
enabled with a token budget, or fully disabled.
export type ThinkingConfig =
| { type: 'adaptive' }
| { type: 'enabled'; budgetTokens: number }
| { type: 'disabled' }
export interface RetryContext {
maxTokensOverride?: number
model: string
thinkingConfig: ThinkingConfig
fastMode?: boolean
}
model says which model we are trying to reach. thinkingConfig carries the
reasoning settings. maxTokensOverride is an optional escape hatch when the
caller needs a temporary cap. fastMode lets the retry layer respect the
current fast-mode choice without pulling the whole query loop into the retry
policy.
When retry cannot continue, the API layer throws a structured error that keeps that context attached.
export class CannotRetryError extends Error {
constructor(
public readonly originalError: unknown,
public readonly retryContext: RetryContext,
) {
const message = errorMessage(originalError)
super(message)
this.name = 'CannotRetryError'
// Preserve the original stack trace if available
if (originalError instanceof Error && originalError.stack) {
this.stack = originalError.stack
}
}
}
That is how the service layer preserves the useful retry details for whoever handles the final error or fallback path.
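A handler consuming that error might look like the sketch below. The class is condensed from the excerpt, errorMessage is simplified inline, and the model names are placeholders; the point is only that the attached retryContext is enough to drive a fallback decision without extra bookkeeping.

```typescript
// Simplified from the excerpt: errorMessage reduced to the common case.
const errorMessage = (e: unknown) => (e instanceof Error ? e.message : String(e))

interface RetryContext {
  model: string
}

class CannotRetryError extends Error {
  constructor(
    public readonly originalError: unknown,
    public readonly retryContext: RetryContext,
  ) {
    super(errorMessage(originalError))
    this.name = 'CannotRetryError'
  }
}

// Hypothetical handler: the context travels with the error, so the
// catch site knows which model exhausted its retries.
let fallbackModel: string | undefined
try {
  throw new CannotRetryError(new Error('overloaded'), { model: 'primary-model' })
} catch (err) {
  if (err instanceof CannotRetryError) {
    fallbackModel =
      err.retryContext.model === 'primary-model' ? 'fallback-model' : undefined
  }
}
```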
Backoff In One Place
Retry sleep timing also belongs here, not in the query loop. The loop should ask for one result or one failure and let the service layer decide how long to wait between attempts.
export function getRetryDelay(
attempt: number,
retryAfterHeader?: string | null,
maxDelayMs = 32000,
): number {
if (retryAfterHeader) {
const seconds = parseInt(retryAfterHeader, 10)
if (!isNaN(seconds)) {
return seconds * 1000
}
}
const baseDelay = Math.min(
BASE_DELAY_MS * Math.pow(2, attempt - 1),
maxDelayMs,
)
const jitter = Math.random() * 0.25 * baseDelay
return baseDelay + jitter
}
The important part is not the exact math. The important part is that the service layer owns backoff, so every caller gets the same retry behavior.
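The two branches are easy to exercise. The sketch below repeats the function with an assumed BASE_DELAY_MS of 1000 ms (the excerpt does not show its value): a parseable Retry-After header wins outright, and otherwise the delay doubles per attempt with up to 25% jitter.

```typescript
// Assumption: the excerpt does not show BASE_DELAY_MS; 1000 ms is a guess.
const BASE_DELAY_MS = 1000

function getRetryDelay(
  attempt: number,
  retryAfterHeader?: string | null,
  maxDelayMs = 32000,
): number {
  // Server-provided Retry-After takes precedence over exponential backoff.
  if (retryAfterHeader) {
    const seconds = parseInt(retryAfterHeader, 10)
    if (!isNaN(seconds)) {
      return seconds * 1000
    }
  }
  // Exponential backoff capped at maxDelayMs, plus up to 25% jitter.
  const baseDelay = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxDelayMs)
  return baseDelay + Math.random() * 0.25 * baseDelay
}

// Retry-After of 7 seconds wins regardless of attempt number.
getRetryDelay(3, '7') // → 7000

// Without a header, attempt 3 starts from 4000 ms plus jitter,
// so the result lands somewhere in [4000, 5000).
const delay = getRetryDelay(3)
```

Note that the jitter keeps a burst of failing callers from retrying in lockstep, which is why the exact math matters less than everyone sharing it.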
Extra Body Parameters
Before the API body gets assembled, it helps to know the JSON shape that this
helper returns. In this file, JsonObject means a plain JSON-safe dictionary.
type JsonValue = string | number | boolean | null | JsonObject | JsonArray
type JsonObject = { [key: string]: JsonValue }
type JsonArray = JsonValue[]
getExtraBodyParams() reads extra body settings, merges beta headers, and
returns one request-ready object. That keeps the payload assembly in the API
layer instead of scattering it through the rest of the runtime.
export function getExtraBodyParams(betaHeaders?: string[]): JsonObject {
// Parse user's extra body parameters first
const extraBodyStr = process.env.CLAUDE_CODE_EXTRA_BODY
let result: JsonObject = {}
if (extraBodyStr) {
try {
// Parse as JSON, which can be null, boolean, number, string, array or object
const parsed = safeParseJSON(extraBodyStr)
// We expect an object with key-value pairs to spread into API parameters
if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
// Shallow clone — safeParseJSON is LRU-cached and returns the same
// object reference for the same string. Mutating `result` below
// would poison the cache, causing stale values to persist.
result = { ...(parsed as JsonObject) }
} else {
logForDebugging(
`CLAUDE_CODE_EXTRA_BODY env var must be a JSON object, but was given ${extraBodyStr}`,
{ level: 'error' },
)
}
} catch (error) {
logForDebugging(
`Error parsing CLAUDE_CODE_EXTRA_BODY: ${errorMessage(error)}`,
{ level: 'error' },
)
}
}
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
feature('ANTI_DISTILLATION_CC')
? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
shouldIncludeFirstPartyOnlyBetas() &&
getFeatureValue_CACHED_MAY_BE_STALE(
'tengu_anti_distill_fake_tool_injection',
false,
)
: false
) {
result.anti_distillation = ['fake_tools']
}
// Handle beta headers if provided
if (betaHeaders && betaHeaders.length > 0) {
if (result.anthropic_beta && Array.isArray(result.anthropic_beta)) {
// Add to existing array, avoiding duplicates
const existingHeaders = result.anthropic_beta as string[]
const newHeaders = betaHeaders.filter(
header => !existingHeaders.includes(header),
)
result.anthropic_beta = [...existingHeaders, ...newHeaders]
} else {
// Create new array with the beta headers
result.anthropic_beta = betaHeaders
}
}
return result
}
What This Leaves To Later Pages
The API service path owns request-ID correlation. In some paths claude.ts
generates the ID first, and the client wrapper shares the header name and
injects one when appropriate. The retry helper owns backoff, retry state, and
the fallback trigger after repeated overloads. Higher layers can choose which
fallback model to supply, but they do not need to rebuild these low-level
request rules themselves.