The Request Loop, Traced
Imagine a user types: run the test suite. Walk it through.
Step 1 — REPL captures input
crates/rusty-claude-cli/src/main.rs:3829 has fn run_repl(), which builds a Repl wrapping ConversationRuntime<AnthropicRuntimeClient, CliToolExecutor> (line 3946). Each line of user input is fed through Repl::run_turn() at line 4525.
Step 2 — Enter ConversationRuntime::run_turn()
crates/runtime/src/conversation.rs:318:
def run_turn(
self,
user_input: str,
prompter: PermissionPrompter | None = None,
) -> TurnSummary:
...
Lines 326–334 do a health probe if a previous compaction happened; lines 336–339 push the user message into the session and emit a turn_started trace event.
Step 3 — Build the API request
Lines 356–359:
request = ApiRequest(
system_prompt=self.system_prompt, # list[str], not str
messages=self.session.messages_for_api(),
)
system_prompt is a list[str] because the Anthropic API accepts a system field that's either a single string or an array. Splitting it lets each segment be marked individually for caching (in theory — claw-code doesn't actually mark them; see Context, Caching, Compaction).
Step 4 — Stream from the API
Line 360: self.api_client.stream(request)?. Inside crates/api/src/providers/anthropic.rs, the request is built (build_request at line 477), POSTed to /v1/messages with stream: true, and the SSE response is parsed by crates/runtime/src/sse.rs's IncrementalSseParser (push_chunk at line 18, finish at line 73). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus the streaming-friendly ping.
Step 5 — Build the assistant message
Line 368: build_assistant_message(events) walks the streamed AssistantEvent enum (variants: Thinking, TextDelta, ToolUse, Usage, PromptCache, MessageStop — defined at conversation.rs:30–44) and assembles a ConversationMessage with a Vec<ContentBlock>, plus a TokenUsage and a list of PromptCacheEvents.
Step 6 — Detect cache breaks
Line 744 inside build_assistant_message emits a PromptCacheEvent whenever the cache_read token count drops unexpectedly (defined at conversation.rs:48–54: unexpected: bool, reason: String, previous_cache_read_input_tokens, current_cache_read_input_tokens, token_drop). The prompt_cache.rs module's detect_cache_break() at line 314 is what flags these. Note this is observation only — claw-code watches caches but doesn't try to place them.
Step 7 — Extract tool uses
Lines 379–388:
pending_tool_uses = [
(b.id, b.name, b.input)
for b in assistant_message.blocks
if b.kind == "tool_use"
]
If pending_tool_uses is empty after pushing the assistant message to the session (lines 395–398), the loop breaks (lines 400–402) — the model has decided no tools are needed and the assistant text is the final response.
Step 8 — For each tool use, run the gate
Lines 404–503. For our bash tool call to run tests:
8a. PreToolUse hook (line 405)
The runtime's HookRunner::run_pre_tool_use() is called via the runtime helper at lines 228–241. The hook payload includes tool_name, tool_input, hook_event_name: "PreToolUse". If the hook returns an abort signal (via HookAbortSignal at hooks.rs:63–81), the tool call is skipped and a synthetic error is returned.
8b. Permission check (lines 414–449)
Calls into permission_policy.authorize() at crates/runtime/src/permissions.rs:148. The decision tree:
1. Match against deny_rules → Deny (permissions.rs:182–189)
2. Apply PermissionContext override → Allow/Deny/AskNow (lines 196–242)
3. Match against ask_rules → AskNow (lines 244–257)
4. Match against allow_rules
OR current_mode >= required_mode → Allow (lines 259–264)
5. Mode escalation needed? → AskNow (lines 266–283)
6. Default → Deny (lines 285–291)
If AskNow, the PermissionPrompter trait is invoked (decide() at permissions.rs:86–88), which in CLI mode shows the (y/n/always) prompt to the user.
For bash run-tests, the input flows through classify_bash_permission() (crates/tools/src/lib.rs:1210) — it inspects the command string and dynamically classifies it as ReadOnly (e.g. ls, cat), WorkspaceWrite (e.g. git status), or DangerFullAccess (e.g. rm -rf). The classified mode then drives the gating decision.
8c. Tool execution (lines 451–487)
tool_executor.execute(name, input) calls into execute_tool() in the tools crate, which dispatches to the registered handler. For bash, that's runtime::execute_bash() at crates/runtime/src/bash.rs:71. Inside:
sandbox_status_for_input()(sandbox.rs:156) decides whether to wrap the command inunshare. If the sandbox is supported and not explicitly disabled, the command runs asunshare --user --ipc --pid --uts --mount [--net] sh -lc "<command>"(sandbox.rs:209+) withHOMEredirected to.sandbox-homeandTMPDIRto.sandbox-tmp.- The subprocess is spawned, stdout/stderr captured up to size limits, exit code recorded. Returns a
BashCommandOutput.
8d. PostToolUse / PostToolUseFailure hook (lines 461–487)
On success, fires PostToolUse (helper at lines 246–263) with payload including tool_output. On error, fires PostToolUseFailure (lines 274–289) with tool_error. Both hooks observe; they cannot retroactively block (the tool already ran). They can, however, inject context into the conversation by writing to specific output channels — see Extensibility (Hooks, Plugins, Skills, MCP).
8e. Push tool_result message (lines 489–502)
tool_result = ConversationMessage.tool_result(
tool_use_id=tool_use_id,
tool_name=tool_name,
output=output,
is_error=is_error,
)
self.session.push_message(tool_result)
This is what the model sees on the next turn: a User-role message with a ToolResult content block carrying tool_use_id, content, and is_error.
Step 9 — Loop or break
After all pending tool uses are processed, control returns to the top of the iteration loop (line 346). The next API call sends the now-augmented session, including all the tool_result blocks. The model either issues more tool calls (e.g., read_file to look at a failing test's output) or emits a final assistant message with no tool uses, breaking the loop.
Step 10 — Maybe auto-compact
Line 506: self.maybe_auto_compact()? (defined lines 559–582). If cumulative_usage().input_tokens >= auto_compaction_input_tokens_threshold (default 100_000 at line 18), compact_session() is invoked, which is covered in Context, Caching, Compaction.
Step 11 — Build the TurnSummary
Lines 508–515:
return TurnSummary(
assistant_messages=assistant_messages,
tool_results=tool_results,
prompt_cache_events=prompt_cache_events,
iterations=iterations,
usage=self.usage_tracker.current_turn_usage(),
auto_compaction=auto_compaction,
)
Returned to the REPL, which prints the assistant text and waits for the next input.
What this shape teaches
That's one full turn. From the REPL's perspective it's a single function call; from the runtime's perspective it can be many API round-trips with many tool calls. The harness pattern is clean: the model decides what tools to call; the runtime decides whether they're allowed, runs them, and feeds the results back. The model never sees the gating logic, only its outputs.
Continue: Context, Caching, Compaction