Why this matters
Long sessions do not fail all at once. They slowly fill the available context until the next turn has nowhere useful to fit. Compaction exists to keep that pressure from ending the session.
In the normal path, these services perform a controlled rewrite that preserves the important pieces in a smaller shape, then hand the session back to the main runtime so it can keep going. There is also a last-resort lossy fallback when the compaction request itself is too large, so this system is careful, but not magical.
Compaction under pressure
The service watches token pressure, decides when to compact, rewrites the conversation into a smaller shape, and returns enough state for the next turn.
Threshold logic first
Compaction starts with a simple question: are we close enough to the context limit that we should intervene now instead of waiting for a failure? That decision eventually depends on a small tracking state plus a threshold helper.
But the service does not jump straight to token math. Before it even checks the
threshold, shouldAutoCompact() applies guardrails: recursion guards for
forked compact/session-memory work, feature and disable gates, and the special
context-collapse suppression path.
AutoCompactTrackingState keeps the minimum moving parts the service needs to
watch the current turn. It remembers whether compaction already happened, which
turn is being tracked, and whether the service has been failing repeatedly.
getAutoCompactThreshold() turns the model’s context window into a concrete
trigger point. The service can also lower that trigger for testing, but the
threshold still lives in the compaction layer rather than in the query loop.
export type AutoCompactTrackingState = {
  compacted: boolean
  turnCounter: number
  // Unique ID per turn
  turnId: string
  // Consecutive autocompact failures. Reset on success.
  // Used as a circuit breaker to stop retrying when the context is
  // irrecoverably over the limit (e.g., prompt_too_long).
  consecutiveFailures?: number
}
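The consecutiveFailures field is what makes the circuit breaker work. A
minimal sketch of how a caller might drive it (the type is restated for
self-containment; MAX_CONSECUTIVE_FAILURES, recordCompactAttempt, and
breakerTripped are illustrative names, not part of the real service):

```typescript
// Hypothetical circuit-breaker logic around AutoCompactTrackingState.
type AutoCompactTrackingState = {
  compacted: boolean
  turnCounter: number
  turnId: string
  consecutiveFailures?: number
}

const MAX_CONSECUTIVE_FAILURES = 3 // assumed limit for this sketch

function recordCompactAttempt(
  state: AutoCompactTrackingState,
  succeeded: boolean,
): AutoCompactTrackingState {
  return succeeded
    ? // Success resets the breaker and marks the turn as compacted.
      { ...state, compacted: true, consecutiveFailures: 0 }
    : // Failure increments the counter toward the trip point.
      { ...state, consecutiveFailures: (state.consecutiveFailures ?? 0) + 1 }
}

function breakerTripped(state: AutoCompactTrackingState): boolean {
  // Stop retrying once the context looks irrecoverably over the limit.
  return (state.consecutiveFailures ?? 0) >= MAX_CONSECUTIVE_FAILURES
}
```

The reset-on-success rule matters: a single good compaction should clear the
failure streak rather than let stale failures trip the breaker later.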
export function getAutoCompactThreshold(model: string): number {
  const effectiveContextWindow = getEffectiveContextWindowSize(model)
  const autocompactThreshold =
    effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS
  // Override for easier testing of autocompact
  const envPercent = process.env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
  if (envPercent) {
    const parsed = parseFloat(envPercent)
    if (!isNaN(parsed) && parsed > 0 && parsed <= 100) {
      const percentageThreshold = Math.floor(
        effectiveContextWindow * (parsed / 100),
      )
      return Math.min(percentageThreshold, autocompactThreshold)
    }
  }
  return autocompactThreshold
}
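A worked example pins down the Math.min clamp. Assuming a 200k effective
window and a 13k buffer (both numbers are illustrative, not the real
constants), the override can only lower the trigger, never raise it:

```typescript
// Standalone sketch of the threshold math with assumed constants.
// The real code reads the window from the model and the override from env.
const EFFECTIVE_CONTEXT_WINDOW = 200_000 // illustrative
const AUTOCOMPACT_BUFFER_TOKENS = 13_000 // illustrative

function thresholdWithOverride(envPercent?: string): number {
  const base = EFFECTIVE_CONTEXT_WINDOW - AUTOCOMPACT_BUFFER_TOKENS // 187_000
  if (envPercent) {
    const parsed = parseFloat(envPercent)
    if (!isNaN(parsed) && parsed > 0 && parsed <= 100) {
      // Percentage override, clamped so it can never exceed the base.
      return Math.min(
        Math.floor(EFFECTIVE_CONTEXT_WINDOW * (parsed / 100)),
        base,
      )
    }
  }
  return base
}
```

So a 50% override yields 100,000, but a 99% override still returns 187,000
because the buffer-derived threshold wins the Math.min. Invalid values fall
through to the base threshold rather than disabling compaction.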
The important idea is not the math itself. The important idea is that the service layer decides when the transcript is close enough to the edge to compact it.
export async function shouldAutoCompact(
  messages: Message[],
  model: string,
  querySource?: QuerySource,
  // Snip removes messages but the surviving assistant's usage still reflects
  // pre-snip context, so tokenCountWithEstimation can't see the savings.
  // Subtract the rough-delta that snip already computed.
  snipTokensFreed = 0,
): Promise<boolean> {
  // Recursion guards. session_memory and compact are forked agents that
  // would deadlock.
  if (querySource === 'session_memory' || querySource === 'compact') {
    return false
  }
  // marble_origami is the ctx-agent — if ITS context blows up and
  // autocompact fires, runPostCompactCleanup calls resetContextCollapse()
  // which destroys the MAIN thread's committed log (module-level state
  // shared across forks). Inside feature() so the string DCEs from
  // external builds (it's in excluded-strings.txt).
  if (feature('CONTEXT_COLLAPSE')) {
    if (querySource === 'marble_origami') {
      return false
    }
  }
  if (!isAutoCompactEnabled()) {
    return false
  }
  // Reactive-only mode: suppress proactive autocompact, let reactive compact
  // catch the API's prompt-too-long. feature() wrapper keeps the flag string
  // out of external builds (REACTIVE_COMPACT is ant-only).
  // Note: returning false here also means autoCompactIfNeeded never reaches
  // trySessionMemoryCompaction in the query loop — the /compact call site
  // still tries session memory first. Revisit if reactive-only graduates.
  if (feature('REACTIVE_COMPACT')) {
    if (getFeatureValue_CACHED_MAY_BE_STALE('tengu_cobalt_raccoon', false)) {
      return false
    }
  }
  // Context-collapse mode: same suppression. Collapse IS the context
  // management system when it's on — the 90% commit / 95% blocking-spawn
  // flow owns the headroom problem. Autocompact firing at effective-13k
  // (~93% of effective) sits right between collapse's commit-start (90%)
  // and blocking (95%), so it would race collapse and usually win, nuking
  // granular context that collapse was about to save. Gating here rather
  // than in isAutoCompactEnabled() keeps reactiveCompact alive as the 413
  // fallback (it consults isAutoCompactEnabled directly) and leaves
  // sessionMemory + manual /compact working.
  //
  // Consult isContextCollapseEnabled (not the raw gate) so the
  // CLAUDE_CONTEXT_COLLAPSE env override is honored here too. require()
  // inside the block breaks the init-time cycle (this file exports
  // getEffectiveContextWindowSize which collapse's index imports).
  if (feature('CONTEXT_COLLAPSE')) {
    /* eslint-disable @typescript-eslint/no-require-imports */
    const { isContextCollapseEnabled } =
      require('../contextCollapse/index.js') as typeof import('../contextCollapse/index.js')
    /* eslint-enable @typescript-eslint/no-require-imports */
    if (isContextCollapseEnabled()) {
      return false
    }
  }
  const tokenCount = tokenCountWithEstimation(messages) - snipTokensFreed
  // With the guardrails cleared, the decision is the token math itself.
  return tokenCount >= getAutoCompactThreshold(model)
}
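How that boolean is consumed is the query loop's business, but the shape of
the call site is worth sketching. This is an illustrative loop, not the real
runtime; nextTurn, runCompaction, and the reduced Message type are all
hypothetical stand-ins:

```typescript
// Hypothetical caller sketch: check pressure before sending a turn.
// The real query loop wires this through autoCompactIfNeeded.
type Message = { role: string; content: string }

async function nextTurn(
  messages: Message[],
  model: string,
  shouldAutoCompact: (m: Message[], model: string) => Promise<boolean>,
  runCompaction: (m: Message[]) => Promise<Message[]>,
): Promise<Message[]> {
  if (await shouldAutoCompact(messages, model)) {
    // Controlled rewrite: replace the transcript with its compacted form,
    // then let the turn proceed against the smaller history.
    messages = await runCompaction(messages)
  }
  return messages
}
```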
When the transcript gets rewritten
Once the threshold says “compact now”, the service rebuilds the transcript in a smaller form. That is a rewrite, not a wipe.
This is where CompactionResult comes in. It names the pieces that must
survive the rewrite and gives later code a stable shape to rebuild from.
export interface CompactionResult {
  boundaryMarker: SystemMessage
  summaryMessages: UserMessage[]
  attachments: AttachmentMessage[]
  hookResults: HookResultMessage[]
  messagesToKeep?: Message[]
  userDisplayMessage?: string
  preCompactTokenCount?: number
  postCompactTokenCount?: number
  truePostCompactTokenCount?: number
  compactionUsage?: ReturnType<typeof getTokenUsage>
}
What the result carries
boundaryMarker shows where the compacted history begins. The summary
messages, attachments, and hook results keep the important content alive after
the rewrite. The token-count fields help the caller understand how much space
was reclaimed, and userDisplayMessage is the human-facing explanation that
can be shown after the rewrite completes.
/**
 * Build the base post-compact messages array from a CompactionResult.
 * This ensures consistent ordering across all compaction paths.
 * Order: boundaryMarker, summaryMessages, messagesToKeep, attachments, hookResults
 */
export function buildPostCompactMessages(result: CompactionResult): Message[] {
  return [
    result.boundaryMarker,
    ...result.summaryMessages,
    ...(result.messagesToKeep ?? []),
    ...result.attachments,
    ...result.hookResults,
  ]
}
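The ordering guarantee is easy to check. A self-contained sketch that mirrors
buildPostCompactMessages, with the message types reduced to plain strings so
it runs standalone (MiniCompactionResult and rebuild are illustrative names):

```typescript
// Minimal standalone version of the rebuild, with Message reduced to a
// string so the ordering is easy to see.
type Msg = string

interface MiniCompactionResult {
  boundaryMarker: Msg
  summaryMessages: Msg[]
  attachments: Msg[]
  hookResults: Msg[]
  messagesToKeep?: Msg[]
}

function rebuild(result: MiniCompactionResult): Msg[] {
  // Same order as the real helper: boundary marker first, then summary,
  // then preserved messages, then attachments and hook results.
  return [
    result.boundaryMarker,
    ...result.summaryMessages,
    ...(result.messagesToKeep ?? []),
    ...result.attachments,
    ...result.hookResults,
  ]
}
```

Note how messagesToKeep defaults to empty via `?? []` rather than being
required, so paths that preserve nothing still produce a valid transcript.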
messagesToKeep is the part of the old conversation that should stay visible
after compaction.
Most of the time, compaction is a controlled rewrite that keeps the meaningful
parts and shrinks the rest so the session can continue. But there is also a
last-resort fallback in compact.ts that can drop the oldest API-round groups
when the compaction request itself is still too long. So the right mental model
is “preserve what matters when possible, fall back to lossy truncation only
when necessary.”
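The lossy path is easier to reason about with a sketch. This assumes messages
are already grouped by API round and that a rough token estimate exists per
group; both are assumptions, and the real compact.ts logic differs in detail
(RoundGroup and dropOldestRounds are hypothetical names):

```typescript
// Hypothetical sketch of the last-resort fallback: drop the OLDEST round
// groups until the estimated size fits the budget. Lossy by design.
type RoundGroup = { messages: string[]; estTokens: number }

function dropOldestRounds(
  groups: RoundGroup[],
  budgetTokens: number,
): RoundGroup[] {
  let total = groups.reduce((sum, g) => sum + g.estTokens, 0)
  let start = 0
  // Always keep at least the most recent group, even if it alone is
  // over budget; dropping everything would leave nothing to continue from.
  while (total > budgetTokens && start < groups.length - 1) {
    total -= groups[start].estTokens
    start++
  }
  return groups.slice(start)
}
```

Dropping from the front is the point: the newest rounds carry the state the
next turn actually needs, so the oldest history is the cheapest to lose.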
Why the tree goes deeper here
Threshold logic, tracking state, and post-compact rebuilding each solve a different problem. The parent page only needs to teach the shape of the flow. Later pages can go deeper into the threshold rules, the recovery path, and the rewrite details without turning this chapter into a wall of code.
Takeaways
Compaction is a service-layer recovery path. It watches pressure, applies guardrails, and rewrites the transcript into a smaller form that tries to keep the most important conversation pieces while preserving forward progress.
Fun Facts
- The threshold helper can be nudged for testing, but the service still owns the decision.
- The rewrite order matters because later code expects the compacted boundary and preserved messages in a stable sequence.