The Request Loop, Traced

Imagine a user types: run the test suite. Walk it through.

Step 1 — REPL captures input

crates/rusty-claude-cli/src/main.rs:3829 has fn run_repl(), which builds a Repl wrapping ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor> (line 3946). Each line of user input is fed through Repl::run_turn() at line 4525.

Step 2 — Enter `ConversationRuntime::run_turn()`

crates/runtime/src/conversation.rs:318:

def run_turn(
    self,
    user_input: str,
    prompter: PermissionPrompter | None = None,
) -> TurnSummary:
    ...

Lines 326–334 do a health probe if a previous compaction happened; lines 336–339 push the user message into the session and emit a turn_started trace event.

Step 3 — Build the API request

Lines 356–359:

request = ApiRequest(
    system_prompt=self.system_prompt,    # list[str], not str
    messages=self.session.messages_for_api(),
)

system_prompt is a list[str] because the Anthropic API accepts a system field that's either a single string or an array. Splitting it lets each segment be marked individually for caching (in theory — claw-code doesn't actually mark them; see Context, Caching, Compaction).

Step 4 — Stream from the API

Line 360: self.api_client.stream(request)?. Inside crates/api/src/providers/anthropic.rs, the request is built (build_request at line 477), POSTed to /v1/messages with stream: true, and the SSE response is parsed by crates/runtime/src/sse.rs's IncrementalSseParser (push_chunk at line 18, finish at line 73). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus the streaming-friendly ping.

Step 5 — Build the assistant message

Line 368: build_assistant_message(events) walks the streamed AssistantEvent enum (variants: Thinking, TextDelta, ToolUse, Usage, PromptCache, MessageStop — defined at conversation.rs:30–44) and assembles a ConversationMessage with a Vec<ContentBlock>, plus a TokenUsage and a list of PromptCacheEvents.

Step 6 — Detect cache breaks

Line 744 inside build_assistant_message emits a PromptCacheEvent whenever the cache_read token count drops unexpectedly (defined at conversation.rs:48–54: unexpected: bool, reason: String, previous_cache_read_input_tokens, current_cache_read_input_tokens, token_drop). The prompt_cache.rs module's detect_cache_break() at line 314 is what flags these. Note this is observation only — claw-code watches caches but doesn't try to place them.

Step 7 — Extract tool uses

Lines 379–388:

pending_tool_uses = [
    (b.id, b.name, b.input)
    for b in assistant_message.blocks
    if b.kind == "tool_use"
]

If pending_tool_uses is empty after pushing the assistant message to the session (lines 395–398), the loop breaks (lines 400–402) — the model has decided no tools are needed and the assistant text is the final response.

Step 8 — For each tool use, run the gate

Lines 404–503. For our bash tool call to run tests:

8a. PreToolUse hook (line 405)

The runtime's HookRunner::run_pre_tool_use() is called via the runtime helper at lines 228–241. The hook payload includes tool_name, tool_input, hook_event_name: "PreToolUse". If the hook returns an abort signal (via HookAbortSignal at hooks.rs:63–81), the tool call is skipped and a synthetic error is returned.

8b. Permission check (lines 414–449)

Calls into permission_policy.authorize() at crates/runtime/src/permissions.rs:148. The decision tree:

1. Match against deny_rules        → Deny  (permissions.rs:182–189)
2. Apply PermissionContext override → Allow/Deny/AskNow (lines 196–242)
3. Match against ask_rules          → AskNow (lines 244–257)
4. Match against allow_rules
   OR current_mode >= required_mode → Allow  (lines 259–264)
5. Mode escalation needed?          → AskNow (lines 266–283)
6. Default                          → Deny  (lines 285–291)

If AskNow, the PermissionPrompter trait is invoked (decide() at permissions.rs:86–88), which in CLI mode shows the (y/n/always) prompt to the user.

For bash run-tests, the input flows through classify_bash_permission() (crates/tools/src/lib.rs:1210) — it inspects the command string and dynamically classifies it as ReadOnly (e.g. ls, cat), WorkspaceWrite (e.g. git status), or DangerFullAccess (e.g. rm -rf). The classified mode then drives the gating decision.

8c. Tool execution (lines 451–487)

tool_executor.execute(name, input) calls into execute_tool() in the tools crate, which dispatches to the registered handler. For bash, that's runtime::execute_bash() at crates/runtime/src/bash.rs:71. Inside:

sandbox_status_for_input() (sandbox.rs:156) decides whether to wrap the command in unshare. If the sandbox is supported and not explicitly disabled, the command runs as unshare --user --ipc --pid --uts --mount [--net] sh -lc "<command>" (sandbox.rs:209+) with HOME redirected to .sandbox-home and TMPDIR to .sandbox-tmp.
The subprocess is spawned, stdout/stderr captured up to size limits, exit code recorded. Returns a BashCommandOutput.

8d. PostToolUse / PostToolUseFailure hook (lines 461–487)

On success, fires PostToolUse (helper at lines 246–263) with payload including tool_output. On error, fires PostToolUseFailure (lines 274–289) with tool_error. Both hooks observe; they cannot retroactively block (the tool already ran). They can, however, inject context into the conversation by writing to specific output channels — see Extensibility (Hooks, Plugins, Skills, MCP).

8e. Push tool_result message (lines 489–502)

tool_result = ConversationMessage.tool_result(
    tool_use_id=tool_use_id,
    tool_name=tool_name,
    output=output,
    is_error=is_error,
)
self.session.push_message(tool_result)

This is what the model sees on the next turn: a User-role message with a ToolResult content block carrying tool_use_id, content, and is_error.

Step 9 — Loop or break

After all pending tool uses are processed, control returns to the top of the iteration loop (line 346). The next API call sends the now-augmented session, including all the tool_result blocks. The model either issues more tool calls (e.g., read_file to look at a failing test's output) or emits a final assistant message with no tool uses, breaking the loop.

Step 10 — Maybe auto-compact

Line 506: self.maybe_auto_compact()? (defined lines 559–582). If cumulative_usage().input_tokens >= auto_compaction_input_tokens_threshold (default 100_000 at line 18), compact_session() is invoked, which is covered in Context, Caching, Compaction.

Step 11 — Build the TurnSummary

Lines 508–515:

return TurnSummary(
    assistant_messages=assistant_messages,
    tool_results=tool_results,
    prompt_cache_events=prompt_cache_events,
    iterations=iterations,
    usage=self.usage_tracker.current_turn_usage(),
    auto_compaction=auto_compaction,
)

Returned to the REPL, which prints the assistant text and waits for the next input.

What this shape teaches

That's one full turn. From the REPL's perspective it's a single function call; from the runtime's perspective it can be many API round-trips with many tool calls. The harness pattern is clean: the model decides what tools to call; the runtime decides whether they're allowed, runs them, and feeds the results back. The model never sees the gating logic, only its outputs.

Continue: Context, Caching, Compaction