N
Nexus API Referencev2.4.1

Plan vs Execution

The most consistent failure mode for agent loops is drift: the model loses track of what it was doing, repeats earlier work, or quietly changes goal mid-run. Claude Code's answer is a deliberate planning protocol with persistent artifacts.

Plan mode

EnterPlanMode and ExitPlanMode are first-class tools. EnterPlanMode flips the runtime into a mode that:

  1. Restricts most write tools (the model can only edit one specific plan file)
  2. Loads a multi-phase workflow into the system prompt instructing the model to: explore → design → review → finalize → exit

Visible in any live Claude Code session's system prompt under ## Plan Workflow:

Phase 1: Initial Understanding   → only Explore agents allowed
Phase 2: Design                   → up to 1 Plan agent
Phase 3: Review                   → AskUserQuestion to clarify
Phase 4: Final Plan              → write to plan file
Phase 5: Call ExitPlanMode       → request approval

The plan file is the only writable artifact. That single constraint forces the model to externalize its thinking incrementally: every refinement is a file edit, every rollback is a diff. The user can review the file at any point and redirect.

In claw-code, EnterPlanMode and ExitPlanMode are registered tools (crates/tools/src/lib.rs:1244–1245) but the workflow is enforced at the system-prompt level rather than the runtime level — there's no claw-code-specific code that checks "is plan mode active?" before each tool call. This is consistent with the harness philosophy: the model is the agent, the runtime is the gate; constraint-enforcement happens via prompts, not hard-coded modes (with one exception: tool registration may be filtered by mode).

Task tracking

Long runs need short-term memory. Two mechanisms:

TodoWrite — for the current turn or short loop

Each entry is {content, activeForm, status: pending|in_progress|completed}. The harness re-renders the todo list after every change, so the user sees a live progress bar in the UI. The model is encouraged to mark in_progress before starting a task and completed immediately after — not batched.

The discipline catches "I forgot to do task 4" early.

TaskRegistry — for cross-turn or sub-agent tracking

crates/runtime/src/task_registry.rs:56–58 defines a thread-safe registry:

class TaskRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._tasks: dict[str, Task] = {}

A Task (lines 35–46) carries task_id, prompt, description, task_packet, status, timestamps, messages, output, team_id. Critically, tasks are pure state — they don't spawn processes; they're records that something else (a worker thread, a sub-agent thread) is doing. The registry methods (create, get, list, stop, update, output, append_output, set_status, assign_team) are all Mutex-guarded.

TaskStatus (lines 14–20): Created, Running, Completed, Failed, Stopped.

TaskPacket (task_packet.rs:30–45) is the richer governance variant — used when creating a task that needs strict parameters: objective, scope, scope_path, repo, worktree, branch_policy, acceptance_tests, commit_policy, reporting_contract, escalation_policy. This is the contract a sub-agent operates under. The RunTaskPacket tool (tools/lib.rs:1261) is what spawns one.

Plan files as durable intent

The plan file pattern (one file at a known path, edited incrementally, source of truth for the rest of the run) is genuinely powerful. Properties that make it work:

  • Single editable artifact. Plan mode forbids edits to anything else, so there's no "what about the changes I made over here" leakage.
  • Full content visible to the user. When the model exits plan mode, the user sees the final plan. There's no hidden state.
  • Approval gate. ExitPlanMode is the moment of consent. After approval, the plan is the contract for execution.
  • Persistence through compaction. Even if the conversation gets compacted, the plan file is on disk; the next turn can re-read it.

You see this pattern in production-grade agent systems — Devin, Aider, OpenDevin, Cursor's task planning — under various names. Claude Code's discipline of forcing a markdown file as the artifact (rather than e.g. structured JSON) makes it cheap to inspect.

Recovery + retry as part of "consistency"

Adjacent to planning is the recovery system covered in Context, Caching, Compaction. The point relevant here: when a long run hits a known-failure pattern (worker boot timeout, MCP handshake failure, stale branch), the harness has a named recipe, not an open-ended retry. Each recipe is testable in isolation and has a clear success condition.

Compare to the naive pattern of "retry the last action 3 times." The naive version sees a failure, retries with no context change, and either succeeds by luck or exhausts retries. The recipe pattern says: "this failure mode means X; the fix is Y; if Y doesn't work in N attempts, escalate to a human." The model never has to re-derive the diagnosis.

For a homegrown agent: catalogue your top 5–10 failure modes from real usage, write a recipe for each, and bind them to a tool_error payload pattern (e.g., regex on the error string). It's much more robust than reactive retry.


Continue: Permissions & Sandboxing

Last updated: May 14, 2026