Plan vs Execution
The most consistent failure mode for agent loops is drift: the model loses track of what it was doing, repeats earlier work, or quietly changes goal mid-run. Claude Code's answer is a deliberate planning protocol with persistent artifacts.
Plan mode
EnterPlanMode and ExitPlanMode are first-class tools. EnterPlanMode flips the runtime into a mode that:
- Restricts most write tools (the model can only edit one specific plan file)
- Loads a multi-phase workflow into the system prompt instructing the model to: explore → design → review → finalize → exit
The workflow appears in the system prompt of a live Claude Code session under ## Plan Workflow:
Phase 1: Initial Understanding → only Explore agents allowed
Phase 2: Design → up to 1 Plan agent
Phase 3: Review → AskUserQuestion to clarify
Phase 4: Final Plan → write to plan file
Phase 5: Call ExitPlanMode → request approval
The plan file is the only writable artifact. That single constraint forces the model to externalize its thinking incrementally: every refinement is a file edit, every rollback is a diff. The user can review the file at any point and redirect.
In claw-code, EnterPlanMode and ExitPlanMode are registered tools (crates/tools/src/lib.rs:1244–1245), but the workflow is enforced at the system-prompt level rather than the runtime level: no claw-code-specific code checks "is plan mode active?" before each tool call. This is consistent with the harness philosophy that the model is the agent and the runtime is the gate. Constraints are enforced through prompts rather than hard-coded modes, with one exception: tool registration may be filtered by mode.
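For a homegrown harness that wants a runtime check in addition to the prompt, the "only the plan file is writable" rule can be sketched as a small gate in front of tool dispatch. This is an illustration, not claw-code's behavior (which, as noted, relies on the prompt); the tool names, the PLAN_FILE path, and the Harness class are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for illustration; a real harness would use its own.
WRITE_TOOLS = {"Write", "Edit", "Bash"}
PLAN_FILE = "plan.md"


@dataclass
class Harness:
    plan_mode: bool = False

    def is_allowed(self, tool: str, target_path: Optional[str] = None) -> bool:
        """Return True if the tool call may proceed under the current mode."""
        if not self.plan_mode:
            return True
        if tool not in WRITE_TOOLS:
            return True  # read-only tools stay available while planning
        # In plan mode, the only permitted write is an edit to the plan file.
        return tool == "Edit" and target_path == PLAN_FILE
```

The gate is deliberately dumb: it knows nothing about phases, only about which writes are legal while planning.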
Task tracking
Long runs need short-term memory. Two mechanisms:
TodoWrite — for the current turn or short loop
Each entry is {content, activeForm, status: pending|in_progress|completed}. The harness re-renders the todo list after every change, so the user sees a live progress bar in the UI. The model is encouraged to mark a task in_progress before starting it and completed immediately after finishing it, rather than batching updates.
The discipline catches "I forgot to do task 4" early.
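A minimal sketch of that loop, assuming a plain in-memory list and a text renderer. The three field names come from the entry shape above; the checkbox markers, the set_status helper, and the choice to display activeForm while a task is in progress are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

VALID_STATUSES = ("pending", "in_progress", "completed")
# Hypothetical display markers; the real UI renders its own.
MARKS = {"pending": "[ ]", "in_progress": "[~]", "completed": "[x]"}


@dataclass
class Todo:
    content: str          # e.g. "Run tests"
    activeForm: str       # e.g. "Running tests"
    status: str = "pending"


def render(todos: List[Todo]) -> str:
    """Render the whole list; called after every change."""
    lines = []
    for t in todos:
        # Assumption: the active form is shown only while the task is running.
        label = t.activeForm if t.status == "in_progress" else t.content
        lines.append(f"{MARKS[t.status]} {label}")
    return "\n".join(lines)


def set_status(todos: List[Todo], index: int, status: str) -> str:
    """Update one entry and immediately re-render, never batching updates."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    todos[index].status = status
    return render(todos)
```

Because every status change returns a fresh render, a skipped task stays visibly pending instead of disappearing into a batched update.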
TaskRegistry — for cross-turn or sub-agent tracking
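crates/runtime/src/task_registry.rs:56–58 defines a thread-safe registry, rendered here as a Python sketch of the Rust source:
import threading

class TaskRegistry:
    def __init__(self):
        # One lock guards every read and write of the task map.
        self._lock = threading.Lock()
        # Maps task_id to its Task record.
        self._tasks: dict[str, "Task"] = {}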
A Task (lines 35–46) carries task_id, prompt, description, task_packet, status, timestamps, messages, output, team_id. Critically, tasks are pure state: they don't spawn processes. They are records of work that something else (a worker thread, a sub-agent thread) is doing. The registry methods (create, get, list, stop, update, output, append_output, set_status, assign_team) are all Mutex-guarded.
TaskStatus (lines 14–20): Created, Running, Completed, Failed, Stopped.
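To make the "pure state behind a lock" idea concrete, here is a pared-down Python sketch covering four of the registry methods. The Task record is trimmed to a few of the fields listed above, and the uuid-based task_id is an assumption rather than something the source specifies.

```python
import threading
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class TaskStatus(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    STOPPED = "stopped"


@dataclass
class Task:
    task_id: str
    prompt: str
    description: str = ""
    status: TaskStatus = TaskStatus.CREATED
    output: str = ""


class TaskRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[str, Task] = {}

    def create(self, prompt: str, description: str = "") -> str:
        # Assumption: ids are random hex strings; the real scheme may differ.
        task = Task(task_id=uuid.uuid4().hex, prompt=prompt, description=description)
        with self._lock:
            self._tasks[task.task_id] = task
        return task.task_id

    def get(self, task_id: str) -> Optional[Task]:
        with self._lock:
            return self._tasks.get(task_id)

    def set_status(self, task_id: str, status: TaskStatus) -> None:
        with self._lock:
            self._tasks[task_id].status = status

    def append_output(self, task_id: str, chunk: str) -> None:
        with self._lock:
            self._tasks[task_id].output += chunk
```

Nothing here starts a thread or a process. A worker calls set_status and append_output as it makes progress, and any other thread can call get to observe it.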
TaskPacket (task_packet.rs:30–45) is the richer governance variant — used when creating a task that needs strict parameters: objective, scope, scope_path, repo, worktree, branch_policy, acceptance_tests, commit_policy, reporting_contract, escalation_policy. This is the contract a sub-agent operates under. The RunTaskPacket tool (tools/lib.rs:1261) is what spawns one.
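As a rough picture of what such a contract looks like in code, the ten fields can be modelled as a dataclass with a completeness check. The field names are the ones listed above; the types, the missing_fields helper, and the rule that every field must be non-empty are assumptions for illustration, not a description of task_packet.rs.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TaskPacket:
    # Types are guesses; the source only names the fields.
    objective: str
    scope: str
    scope_path: str
    repo: str
    worktree: str
    branch_policy: str
    acceptance_tests: List[str]
    commit_policy: str
    reporting_contract: str
    escalation_policy: str


def missing_fields(packet: TaskPacket) -> List[str]:
    """Return the names of fields left empty, in declaration order."""
    return [f.name for f in fields(packet) if not getattr(packet, f.name)]
```

Rejecting a packet with missing fields before the sub-agent starts is cheaper than discovering mid-run that nobody said which tests define "done".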
Plan files as durable intent
The plan file pattern (one file at a known path, edited incrementally, source of truth for the rest of the run) is genuinely powerful. Properties that make it work:
- Single editable artifact. Plan mode forbids edits to anything else, so there's no "what about the changes I made over here" leakage.
- Full content visible to the user. When the model exits plan mode, the user sees the final plan. There's no hidden state.
- Approval gate. ExitPlanMode is the moment of consent. After approval, the plan is the contract for execution.
- Persistence through compaction. Even if the conversation gets compacted, the plan file is on disk; the next turn can re-read it.
You see this pattern in production-grade agent systems — Devin, Aider, OpenDevin, Cursor's task planning — under various names. Claude Code's discipline of forcing a markdown file as the artifact (rather than e.g. structured JSON) makes it cheap to inspect.
Recovery + retry as part of "consistency"
Adjacent to planning is the recovery system covered in Context, Caching, Compaction. The point relevant here: when a long run hits a known-failure pattern (worker boot timeout, MCP handshake failure, stale branch), the harness has a named recipe, not an open-ended retry. Each recipe is testable in isolation and has a clear success condition.
Compare to the naive pattern of "retry the last action 3 times." The naive version sees a failure, retries with no context change, and either succeeds by luck or exhausts retries. The recipe pattern says: "this failure mode means X; the fix is Y; if Y doesn't work in N attempts, escalate to a human." The model never has to re-derive the diagnosis.
For a homegrown agent: catalogue your top 5–10 failure modes from real usage, write a recipe for each, and bind them to a tool_error payload pattern (e.g., regex on the error string). It's much more robust than reactive retry.
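One way to sketch that binding is shown below: each recipe pairs a regex over the error string with a fix callable and an attempt budget, and anything unmatched or unfixed escalates. The Recipe and recover names, the return strings, and the example error messages are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recipe:
    name: str
    pattern: re.Pattern           # matched against the tool_error string
    fix: Callable[[], bool]       # returns True once the success condition holds
    max_attempts: int = 2


def recover(error: str, recipes: List[Recipe]) -> str:
    """Run the first matching recipe; escalate if it fails or none matches."""
    for recipe in recipes:
        if recipe.pattern.search(error):
            for _ in range(recipe.max_attempts):
                if recipe.fix():
                    return f"recovered:{recipe.name}"
            return f"escalate:{recipe.name}"
    # Unknown failure: do not retry blindly, hand off to a human.
    return "escalate:unknown"
```

Each recipe's fix can be unit-tested on its own, and the diagnosis lives in the pattern rather than being re-derived by the model on every failure.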