
The Big Picture

This series studies the Claude Code agent harness through the lens of ultraworkers/claw-code, a public Rust reimplementation (~48K LOC, 9 crates). Where the Rust port falls short of upstream Claude Code, the gap is flagged; where it gets the architecture right, the patterns are worth borrowing wholesale.

A harness is the layer wrapped around the model that turns a one-shot text generator into something that can read your filesystem, run your tests, edit your code, and remember what it was doing five minutes ago. The model decides what to do; the harness decides whether to let it, how to actually do it, and what state survives to the next turn. Take the harness away and you have a chatbot. Add a good harness and you have an agent.

A user types a prompt. The harness assembles a system prompt out of static guidance + dynamic context, sends it with the conversation history to the model API, streams the response, extracts tool-use blocks, runs each tool through a permission gate and lifecycle hooks, sandboxes filesystem and shell side effects, feeds tool results back into the conversation, and loops until the model emits no more tool calls. Around that core loop sits an extensibility skirt — hooks, plugins, skills, MCP servers, sub-agents — all of which add or modify the tools and prompts the model sees.

Four pressures

Every architectural choice in the rest of this series is, at root, a response to one of four tensions:

  • Capability vs context cost. Every tool you give the model adds tokens to the system prompt. Every tool result adds tokens to the conversation. The model has a finite window, and you're paying for every token.
  • Power vs safety. Once you let the agent run bash, the blast radius is unbounded. You need fences, and the fences need to be cheap to evaluate (so they don't slow every call) and hard to bypass (so a creative prompt can't get around them).
  • Coherence vs drift. Long runs forget themselves. The model that decided the plan at turn 1 is not the same model context that's executing at turn 50; the system prompt is the same, but everything else has shifted. Without scaffolding, agents drift.
  • Determinism vs intelligence. Some behaviors must be predictable and shouldn't depend on the model remembering to do them: running lint after a commit, refreshing a token before it expires. The harness is where these get baked in, not the prompt.

Nine crates

rusty-claude-cli  ← binary entry, REPL, one-shot prompt, subcommand parser
       │
       ├── api               ← HTTP/SSE client, providers (Anthropic, OpenAI-compat), prompt_cache analytics
       ├── commands          ← slash-command implementations (/hooks, /skills, /mcp, /doctor, ~50 in total)
       ├── compat-harness    ← parity test scaffolding
       ├── mock-anthropic-service ← deterministic fake of /v1/messages for end-to-end tests
       ├── plugins           ← PluginManifest types (tools, hooks, commands, lifecycle)
       ├── runtime           ← THE CORE: conversation loop, session, prompt builder, hooks,
       │                        permissions, MCP plumbing, compaction, sub-agents, sandboxing
       ├── telemetry         ← session tracing, metrics
       └── tools             ← the 50-tool registry + dispatcher

The dependency graph is clean: rusty-claude-cli depends on everything; runtime is the trunk; tools depends on runtime+api+commands+plugins; api depends on runtime+telemetry; plugins and telemetry are leaves.

The request loop, in fifteen lines

crates/runtime/src/conversation.rs:318 defines ConversationRuntime::run_turn(). Inside, lines 346–504 are the heartbeat:

loop:
    build ApiRequest from session.messages + system_prompt
    api_client.stream(request) → AssistantEvent stream
    build_assistant_message(events) → ConversationMessage + TokenUsage + cache events
    extract ToolUse blocks
    push assistant message to session
    if no tool uses → break
    for each tool use:
        run PreToolUse hook
        check permission policy → maybe prompt user
        execute tool
        run PostToolUse / PostToolUseFailure hook
        push tool_result message to session
maybe_auto_compact()
return TurnSummary

Every other piece in this series hangs off that loop.
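
The same loop can be sketched in compilable Rust. This is a minimal illustration under stated assumptions, not the code in conversation.rs: the ModelClient and ToolHost traits, the Message enum, the ScriptedModel and EchoTools stand-ins, and the max_iterations cap are all hypothetical names invented for this example. Streaming, hooks, and compaction are left out.

```rust
// Hypothetical, simplified types; none of these mirror the real claw-code signatures.
#[derive(Debug, Clone, PartialEq)]
struct ToolUse {
    name: String,
    input: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant { text: String, tool_uses: Vec<ToolUse> },
    ToolResult { tool_name: String, output: String, is_error: bool },
}

/// Stand-in for the model API: given the history, return text plus any tool calls.
trait ModelClient {
    fn complete(&mut self, history: &[Message]) -> (String, Vec<ToolUse>);
}

/// Stand-in for the permission gate plus the tool dispatcher.
trait ToolHost {
    fn is_allowed(&self, tool: &ToolUse) -> bool;
    fn execute(&mut self, tool: &ToolUse) -> Result<String, String>;
}

/// Run one user turn: call the model, run its tools, repeat until it asks for none.
/// `max_iterations` is an assumed safety cap, not something the source describes.
/// Returns the number of model calls made.
fn run_turn(
    model: &mut dyn ModelClient,
    tools: &mut dyn ToolHost,
    history: &mut Vec<Message>,
    user_input: &str,
    max_iterations: usize,
) -> usize {
    history.push(Message::User(user_input.to_string()));
    let mut iterations = 0;
    while iterations < max_iterations {
        iterations += 1;
        let (text, tool_uses) = model.complete(history.as_slice());
        history.push(Message::Assistant { text, tool_uses: tool_uses.clone() });
        if tool_uses.is_empty() {
            break; // no tool calls: the turn is finished
        }
        for tool in &tool_uses {
            // Pre/post hooks would wrap this step; omitted for brevity.
            let result = if tools.is_allowed(tool) {
                tools.execute(tool)
            } else {
                Err(format!("permission denied for {}", tool.name))
            };
            let (output, is_error) = match result {
                Ok(out) => (out, false),
                Err(err) => (err, true),
            };
            history.push(Message::ToolResult {
                tool_name: tool.name.clone(),
                output,
                is_error,
            });
        }
    }
    iterations
}

/// Scripted model for demos: replays canned responses in order.
struct ScriptedModel {
    responses: Vec<(String, Vec<ToolUse>)>,
}

impl ModelClient for ScriptedModel {
    fn complete(&mut self, _history: &[Message]) -> (String, Vec<ToolUse>) {
        if self.responses.is_empty() {
            (String::new(), Vec::new())
        } else {
            self.responses.remove(0)
        }
    }
}

/// Tool host that allows everything except `bash` and echoes its input.
struct EchoTools;

impl ToolHost for EchoTools {
    fn is_allowed(&self, tool: &ToolUse) -> bool {
        tool.name != "bash"
    }
    fn execute(&mut self, tool: &ToolUse) -> Result<String, String> {
        Ok(format!("ran {} with {}", tool.name, tool.input))
    }
}
```

Note that a denied tool call still produces a tool_result message, flagged as an error, so the model learns the call was refused instead of waiting on a result that never arrives. That choice is an assumption of this sketch.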

Eight architectural moves worth borrowing

  1. Static-vs-dynamic system-prompt boundary. A literal string sentinel (__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__ at crates/runtime/src/prompt.rs:40) splits the prompt into a cache-friendly prefix and a per-turn suffix. The simplest possible affordance for a downstream cache layer.

  2. Tool-spec separation from tool-dispatch. mvp_tool_specs() returns metadata (crates/tools/src/lib.rs:393); execute_tool() does work (line 1197). The spec list can be sent to the model as JSON; the dispatch can be tested and gated independently.

  3. Deferred tool surface. Only 6 of 50 tools are eagerly loaded; the other 44 are exposed to the model by name only and require a ToolSearch round-trip to materialize their schemas (crates/tools/src/lib.rs:4944). Massive context savings.

  4. Per-subagent-type tool allowlists. When the parent invokes the Agent tool, the sub-agent gets a tailored tool subset: Explore is read-only, for example, and Plan adds TodoWrite but no bash (crates/tools/src/lib.rs:3657–3736).

  5. Permission decision tree. Five-tier mode (ReadOnly through Allow), with deny-rules > context-override > ask-rules > allow-rules > mode-sufficiency precedence (crates/runtime/src/permissions.rs:182–291). Predictable, auditable.

  6. Capability-probed sandboxing. resolve_sandbox_status() runs unshare --user --map-root-user true to verify the kernel actually supports user namespaces, rather than checking for the binary's existence (crates/runtime/src/sandbox.rs:156–303). Avoids the entire class of "the binary is there but the syscall is blocked" bugs.

  7. Lifecycle hooks as harness-mediated, not model-mediated. Hooks are config-driven and invoked by the runtime around each tool call — the model itself never sees the hook output unless a hook injects context. This separates "automated behaviors" from "things the model decides to do."

  8. Mock-Anthropic-service for parity tests. A 1124-LOC fake of /v1/messages plus 12 scripted scenarios run the entire CLI loop end-to-end without burning tokens or hitting rate limits. The most copy-paste-able single artifact in the repo.
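
To make move 1 concrete, here is a minimal sketch of a sentinel-delimited prompt. Only the sentinel string comes from the text above; the function names assemble_system_prompt and split_system_prompt are hypothetical, and the real prompt.rs may assemble and consume the boundary quite differently.

```rust
/// Sentinel quoted in the text; everything else in this sketch is illustrative.
const DYNAMIC_BOUNDARY: &str = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

/// Join static guidance and per-turn context around the sentinel.
/// (Hypothetical helper, not the real prompt builder.)
fn assemble_system_prompt(static_part: &str, dynamic_part: &str) -> String {
    format!("{}\n{}\n{}", static_part, DYNAMIC_BOUNDARY, dynamic_part)
}

/// Split an assembled prompt into (cacheable prefix, per-turn suffix).
/// If the sentinel is absent, the whole prompt is treated as static.
fn split_system_prompt(prompt: &str) -> (&str, &str) {
    match prompt.split_once(DYNAMIC_BOUNDARY) {
        Some((prefix, suffix)) => (prefix.trim_end(), suffix.trim_start()),
        None => (prompt, ""),
    }
}
```

With this shape, two prompts that differ only in their dynamic context share a byte-identical prefix, which is the property a downstream cache layer needs.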

The remaining pages expand each of these, plus everything else.
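
As a preview of how one of these moves reads in code, the precedence order from move 5 can be sketched as a chain of early returns. This is an illustration under assumptions, not the contents of permissions.rs: the Policy, Request, and Decision types are hypothetical, modes are reduced to a numeric rank, a context override is modeled as an optional pre-made decision, and falling back to a prompt when the mode is insufficient is a guess.

```rust
// Hypothetical types illustrating the precedence order only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny(String),
    Ask(String),
}

struct Policy {
    deny_rules: Vec<String>,  // tool names that are always refused
    ask_rules: Vec<String>,   // tool names that always prompt the user
    allow_rules: Vec<String>, // tool names that are always permitted
    active_mode_rank: u8,     // assumed encoding: higher rank = more permissive mode
}

struct Request<'a> {
    tool: &'a str,
    required_mode_rank: u8,             // rank the tool needs to run unprompted
    context_override: Option<Decision>, // assumed shape of a "context override"
}

fn rule_matches(rules: &[String], tool: &str) -> bool {
    rules.iter().any(|rule| rule.as_str() == tool)
}

/// Evaluate in the order the text gives:
/// deny rules > context override > ask rules > allow rules > mode sufficiency.
fn decide(policy: &Policy, req: &Request<'_>) -> Decision {
    if rule_matches(&policy.deny_rules, req.tool) {
        return Decision::Deny(format!("{} matches a deny rule", req.tool));
    }
    if let Some(decision) = &req.context_override {
        return decision.clone();
    }
    if rule_matches(&policy.ask_rules, req.tool) {
        return Decision::Ask(format!("{} matches an ask rule", req.tool));
    }
    if rule_matches(&policy.allow_rules, req.tool) {
        return Decision::Allow;
    }
    if policy.active_mode_rank >= req.required_mode_rank {
        Decision::Allow
    } else {
        // Assumed fallback: prompt the user when the mode is insufficient.
        Decision::Ask(format!("{} needs a more permissive mode", req.tool))
    }
}
```

Because each tier returns early, a deny rule can never be overridden by anything below it, which is what makes the outcome easy to audit.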


Continue: The Tool Surface

Last updated: May 14, 2026