Plan vs Execution

The most consistent failure mode for agent loops is drift: the model loses track of what it was doing, repeats earlier work, or quietly changes goal mid-run. Claude Code's answer is a deliberate planning protocol with persistent artifacts.

Plan mode

EnterPlanMode and ExitPlanMode are first-class tools. EnterPlanMode flips the runtime into a mode that:

Restricts most write tools (the model can only edit one specific plan file)
Loads a multi-phase workflow into the system prompt instructing the model to: explore → design → review → finalize → exit

Visible in any live Claude Code session's system prompt under ## Plan Workflow:

Phase 1: Initial Understanding   → only Explore agents allowed
Phase 2: Design                   → up to 1 Plan agent
Phase 3: Review                   → AskUserQuestion to clarify
Phase 4: Final Plan              → write to plan file
Phase 5: Call ExitPlanMode       → request approval

The plan file is the only writable artifact. That single constraint forces the model to externalize its thinking incrementally: every refinement is a file edit, every rollback is a diff. The user can review the file at any point and redirect.

In claw-code, EnterPlanMode and ExitPlanMode are registered tools (crates/tools/src/lib.rs:1244–1245) but the workflow is enforced at the system-prompt level rather than the runtime level — there's no claw-code-specific code that checks "is plan mode active?" before each tool call. This is consistent with the harness philosophy: the model is the agent, the runtime is the gate; constraint-enforcement happens via prompts, not hard-coded modes (with one exception: tool registration may be filtered by mode).

Task tracking

Long runs need short-term memory. Two mechanisms:

TodoWrite — for the current turn or short loop

Each entry is {content, activeForm, status: pending|in_progress|completed}. The harness re-renders the todo list after every change, so the user sees a live progress bar in the UI. The model is encouraged to mark in_progress before starting a task and completed immediately after — not batched.

The discipline catches "I forgot to do task 4" early.

TaskRegistry — for cross-turn or sub-agent tracking

crates/runtime/src/task_registry.rs:56–58 defines a thread-safe registry:

class TaskRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._tasks: dict[str, Task] = {}

A Task (lines 35–46) carries task_id, prompt, description, task_packet, status, timestamps, messages, output, team_id. Critically, tasks are pure state — they don't spawn processes; they're records that something else (a worker thread, a sub-agent thread) is doing. The registry methods (create, get, list, stop, update, output, append_output, set_status, assign_team) are all Mutex-guarded.

TaskStatus (lines 14–20): Created, Running, Completed, Failed, Stopped.

TaskPacket (task_packet.rs:30–45) is the richer governance variant — used when creating a task that needs strict parameters: objective, scope, scope_path, repo, worktree, branch_policy, acceptance_tests, commit_policy, reporting_contract, escalation_policy. This is the contract a sub-agent operates under. The RunTaskPacket tool (tools/lib.rs:1261) is what spawns one.

Plan files as durable intent

The plan file pattern (one file at a known path, edited incrementally, source of truth for the rest of the run) is genuinely powerful. Properties that make it work:

Single editable artifact. Plan mode forbids edits to anything else, so there's no "what about the changes I made over here" leakage.
Full content visible to the user. When the model exits plan mode, the user sees the final plan. There's no hidden state.
Approval gate. ExitPlanMode is the moment of consent. After approval, the plan is the contract for execution.
Persistence through compaction. Even if the conversation gets compacted, the plan file is on disk; the next turn can re-read it.

You see this pattern in production-grade agent systems — Devin, Aider, OpenDevin, Cursor's task planning — under various names. Claude Code's discipline of forcing a markdown file as the artifact (rather than e.g. structured JSON) makes it cheap to inspect.

Recovery + retry as part of "consistency"

Adjacent to planning is the recovery system covered in Context, Caching, Compaction. The point relevant here: when a long run hits a known-failure pattern (worker boot timeout, MCP handshake failure, stale branch), the harness has a named recipe, not an open-ended retry. Each recipe is testable in isolation and has a clear success condition.

Compare to the naive pattern of "retry the last action 3 times." The naive version sees a failure, retries with no context change, and either succeeds by luck or exhausts retries. The recipe pattern says: "this failure mode means X; the fix is Y; if Y doesn't work in N attempts, escalate to a human." The model never has to re-derive the diagnosis.

For a homegrown agent: catalogue your top 5–10 failure modes from real usage, write a recipe for each, and bind them to a tool_error payload pattern (e.g., regex on the error string). It's much more robust than reactive retry.

Continue: Permissions & Sandboxing