Patterns to Borrow
Twelve patterns to lift, ranked roughly by impact-per-line-of-code, plus a file:line cheat sheet and the parity gaps to be aware of.
1. Deferred tool schemas
The pattern: Eagerly load only ~6 core tool schemas; expose the rest as names with a ToolSearch round-trip to fetch their schema on demand.
Why: Tool schemas can each be hundreds of tokens. A 50-tool registry would put 10K+ tokens in the system prompt that the model rarely needs. Deferred loading means typical sessions materialize only 5–10 tool schemas.
Cost: One extra round-trip per tool first-use. Negligible relative to the prompt savings.
→ See The Tool Surface.
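A minimal sketch of the deferred-schema idea, assuming a hypothetical `ToolRegistry` type and made-up tool names; the real registry and the real ToolSearch tool will differ in shape. Core tools contribute their full schema to the prompt, everything else contributes only a name, and `tool_search` stands in for the on-demand round-trip.

```rust
use std::collections::HashMap;

/// Hypothetical tool entry: a name, its (potentially large) schema text,
/// and whether it belongs to the eagerly loaded core set.
pub struct ToolSpec {
    pub name: &'static str,
    pub schema: &'static str,
    pub core: bool,
}

pub struct ToolRegistry {
    tools: HashMap<&'static str, ToolSpec>,
}

impl ToolRegistry {
    pub fn new(specs: Vec<ToolSpec>) -> Self {
        let tools = specs.into_iter().map(|s| (s.name, s)).collect();
        Self { tools }
    }

    /// Text for the system prompt: full schemas for core tools,
    /// bare names for everything else.
    pub fn prompt_section(&self) -> String {
        let mut specs: Vec<&ToolSpec> = self.tools.values().collect();
        specs.sort_by_key(|s| s.name); // stable output regardless of map order
        let mut out = String::new();
        for spec in specs {
            if spec.core {
                out.push_str(&format!("{}: {}\n", spec.name, spec.schema));
            } else {
                out.push_str(&format!("{} (deferred)\n", spec.name));
            }
        }
        out
    }

    /// Stand-in for the ToolSearch round-trip: fetch one schema on demand.
    pub fn tool_search(&self, name: &str) -> Option<&'static str> {
        self.tools.get(name).map(|s| s.schema)
    }
}
```

The prompt cost then scales with the number of core tools plus one short line per deferred tool, rather than with the total schema size.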
2. Static-vs-dynamic system-prompt boundary
The pattern: Concatenate the system prompt out of explicitly-static and explicitly-dynamic sections, separated by a sentinel string. A downstream cache layer can split on the sentinel.
Why: Even if you don't have prompt caching today, you might tomorrow. The boundary makes that future migration trivial.
Cost: Three lines of code to add the sentinel and the splitter.
→ See Context, Caching, Compaction.
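The sentinel and the splitter can be sketched roughly as follows. The `BOUNDARY` string and both function names are assumptions made for illustration; the actual sentinel used in the codebase is not reproduced here.

```rust
/// Hypothetical sentinel; any string that cannot appear in real prompt text works.
pub const BOUNDARY: &str = "\n<<<DYNAMIC_BOUNDARY>>>\n";

/// Static sections first, then the sentinel, then per-session dynamic sections.
pub fn build_system_prompt(static_sections: &[&str], dynamic_sections: &[&str]) -> String {
    format!(
        "{}{}{}",
        static_sections.join("\n\n"),
        BOUNDARY,
        dynamic_sections.join("\n\n")
    )
}

/// What a downstream cache layer would do: split into a cacheable prefix
/// and a volatile suffix. A prompt without the sentinel is treated as all-static.
pub fn split_for_cache(prompt: &str) -> (&str, &str) {
    match prompt.split_once(BOUNDARY) {
        Some((stable, volatile)) => (stable, volatile),
        None => (prompt, ""),
    }
}
```

Keeping the static part strictly before the sentinel matters because prefix caches only help when the leading bytes are identical between requests.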
3. Plan files as durable intent
The pattern: Plan mode forbids edits to anything except a single named markdown file. The model writes the plan incrementally; the user reviews; ExitPlanMode requests approval.
Why: Long agent runs drift. Plans externalize intent so drift is detectable. The single-file constraint forces the plan to be the source of truth, not scattered conversation.
Cost: A mode flag in the runtime, a special tool registration filter.
→ See Plan vs Execution.
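One way to picture the mode flag and the filter is a single predicate over tool calls. The `Mode` and `ToolCall` types below are hypothetical, and treating bash as forbidden in plan mode is an assumption of this sketch (the text above only says that edits outside the plan file are forbidden).

```rust
use std::path::Path;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Mode {
    Plan,
    Execute,
}

/// Hypothetical, heavily reduced tool-call shape.
pub enum ToolCall<'a> {
    Read { path: &'a Path },
    Write { path: &'a Path },
    Bash { command: &'a str },
    ExitPlanMode,
}

/// In plan mode, the only permitted mutation is a write to the plan file.
pub fn permitted(mode: Mode, plan_file: &Path, call: &ToolCall) -> bool {
    match (mode, call) {
        (Mode::Execute, _) => true,
        (Mode::Plan, ToolCall::Read { .. }) => true,
        (Mode::Plan, ToolCall::ExitPlanMode) => true,
        (Mode::Plan, ToolCall::Write { path }) => *path == plan_file,
        // Assumption: bash can mutate state, so this sketch blocks it outright.
        (Mode::Plan, ToolCall::Bash { .. }) => false,
    }
}
```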
4. Sub-agent context isolation contract
The pattern: Sub-agents run in fresh runtimes (own session, own tool subset, own usage tracker). Parent gets back only a manifest, not the transcript. The sub-agent's full transcript persists to disk.
Why: Without this, every sub-agent's exploration leaks into the parent's context. With it, the parent stays at 5–10K tokens while sub-agents do the heavy reading.
Cost: A thread spawn + a fresh ConversationRuntime per Agent invocation.
→ See Sub-agents & Context Cleanliness.
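The isolation contract can be sketched with a thread, a fresh runtime value, and a small manifest as the only return value. `FreshRuntime`, `SubagentManifest`, and the canned three-step loop are stand-ins invented for this example; they are not the real ConversationRuntime or Agent tool.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::thread;

/// All the parent ever sees: a short summary and a pointer to the transcript.
pub struct SubagentManifest {
    pub summary: String,
    pub transcript_path: PathBuf,
    pub tokens_used: usize,
}

/// Hypothetical stand-in for a per-invocation runtime with its own usage tracker.
struct FreshRuntime {
    transcript: Vec<String>,
    tokens_used: usize,
}

pub fn run_subagent(task: String, transcript_path: PathBuf) -> io::Result<SubagentManifest> {
    let handle = thread::spawn(move || -> io::Result<SubagentManifest> {
        let mut rt = FreshRuntime { transcript: Vec::new(), tokens_used: 0 };
        // Stand-in for the real agent loop: every step lands in the
        // sub-agent's own transcript, never in the parent's context.
        for step in ["list files", "read module", "summarize"] {
            rt.transcript.push(format!("{task}: {step}"));
            rt.tokens_used += 1_000;
        }
        // The full transcript persists to disk for later inspection.
        fs::write(&transcript_path, rt.transcript.join("\n"))?;
        Ok(SubagentManifest {
            summary: format!("{} steps completed for '{}'", rt.transcript.len(), task),
            transcript_path,
            tokens_used: rt.tokens_used,
        })
    });
    handle.join().expect("sub-agent thread panicked")
}
```

The parent's context grows by the size of the manifest, not by the size of the exploration.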
5. Per-subagent-type tool allowlists
The pattern: When a sub-agent is spawned with a subagent_type (e.g., "Explore"), only a curated subset of tools is exposed to it.
Why: A read-only Explore agent shouldn't be able to run bash. Locking down the tool surface per agent type prevents whole classes of mistakes (the Explore agent can't accidentally mutate state).
Cost: A switch statement mapping types to tool name lists.
→ See Sub-agents & Context Cleanliness.
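The switch statement can be as small as the sketch below. The tool names and the "Verify" type are placeholders chosen for illustration, and failing closed on unknown types is a design assumption of this example rather than a statement about the codebase.

```rust
/// Map a sub-agent type to the tools it may see.
/// Tool names here are illustrative, not the real registry's names.
pub fn allowed_tools(subagent_type: &str) -> &'static [&'static str] {
    match subagent_type {
        // Read-only exploration: no bash, no writes.
        "Explore" => &["read_file", "glob_search", "grep_search"],
        "Verify" => &["read_file", "grep_search", "bash"],
        "general-purpose" => &["read_file", "write_file", "edit_file", "grep_search", "bash"],
        // Unknown types get nothing (fail closed).
        _ => &[],
    }
}

pub fn is_allowed(subagent_type: &str, tool: &str) -> bool {
    allowed_tools(subagent_type).contains(&tool)
}
```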
6. Lifecycle hooks separated from prompts
The pattern: Predictable, periodic, or always-do behaviors live in settings.json hooks, not in the system prompt. The harness, not the model, enforces them.
Why: "After every commit, run lint" cannot be reliably done by prompting the model — it'll forget. A PostToolUse hook will do it every time, observably.
Cost: A hook event taxonomy (PreToolUse / PostToolUse / PostToolUseFailure / SessionStart / etc.) and a config-driven runner.
→ See Extensibility (Hooks, Plugins, Skills, MCP).
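A config-driven runner might look roughly like this. The `Hook` fields, the `tool_matcher` idea, and the injected `exec` callback are assumptions of the sketch; a real runner would parse hooks from settings.json and spawn the configured commands itself.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HookEvent {
    PreToolUse,
    PostToolUse,
    PostToolUseFailure,
    SessionStart,
}

/// One configured hook: when it fires, which tool it applies to
/// (None means any tool), and the command to run.
pub struct Hook {
    pub event: HookEvent,
    pub tool_matcher: Option<&'static str>,
    pub command: &'static str,
}

pub struct HookRunner {
    hooks: Vec<Hook>,
}

impl HookRunner {
    pub fn new(hooks: Vec<Hook>) -> Self {
        Self { hooks }
    }

    /// Run every matching hook through `exec` and return how many ran.
    /// The harness calls this on every event; the model is never asked to remember.
    pub fn fire(&self, event: HookEvent, tool: Option<&str>, exec: &mut dyn FnMut(&str)) -> usize {
        let mut ran = 0;
        for hook in &self.hooks {
            let tool_matches = match hook.tool_matcher {
                None => true,
                Some(wanted) => tool == Some(wanted),
            };
            if hook.event == event && tool_matches {
                exec(hook.command);
                ran += 1;
            }
        }
        ran
    }
}
```

Passing the executor in as a callback keeps the matching logic testable without spawning processes.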
7. Mock-service end-to-end testing
The pattern: Faithful mock of your model API + scripted scenarios + clean-environment CLI runs = deterministic, fast, comprehensive harness tests.
Why: You can't ship an agent harness without integration tests. Real-model tests are flaky and expensive. The mock pattern makes them cheap and reliable.
Cost: ~1K LOC for the mock, ~1K LOC for the harness scaffolding, ~50 LOC per scenario.
→ See The Mock Parity Harness.
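The core of the idea, stripped down to an in-process stand-in, is a client trait with a scripted implementation. Note that this is a deliberate simplification: the pattern above describes a faithful mock of the model API driven through real CLI runs, whereas the hypothetical `ModelClient` trait and `ScriptedMock` below only illustrate the scripted-scenario half.

```rust
use std::collections::VecDeque;

/// Hypothetical seam between the harness and the model.
pub trait ModelClient {
    fn complete(&mut self, prompt: &str) -> Result<String, String>;
}

/// Replays canned replies in order and records every prompt it was sent.
pub struct ScriptedMock {
    replies: VecDeque<String>,
    pub seen_prompts: Vec<String>,
}

impl ScriptedMock {
    pub fn new(replies: &[&str]) -> Self {
        Self {
            replies: replies.iter().map(|r| r.to_string()).collect(),
            seen_prompts: Vec::new(),
        }
    }
}

impl ModelClient for ScriptedMock {
    fn complete(&mut self, prompt: &str) -> Result<String, String> {
        self.seen_prompts.push(prompt.to_string());
        self.replies
            .pop_front()
            .ok_or_else(|| "scenario exhausted".to_string())
    }
}

/// Stand-in for the harness loop under test: one model call per prompt,
/// stopping at the first error.
pub fn run_turns(client: &mut dyn ModelClient, prompts: &[&str]) -> Result<Vec<String>, String> {
    prompts.iter().map(|p| client.complete(p)).collect()
}
```

Because the replies are fixed, a scenario either passes or fails deterministically, and the recorded prompts let a test assert on what the harness actually sent.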
8. Capability-probed sandboxing
The pattern: Don't check for the binary's existence — actually execute a no-op command through the proposed sandbox to verify it works. Cache the result.
Why: Container restrictions, kernel configs, missing capabilities — a binary's presence does not imply functionality. The probe catches this in 50ms once.
Cost: A OnceLock<bool> and a single execve.
→ See Permissions & Sandboxing.
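A sketch of probe-then-cache using `OnceLock<bool>`. The function names are hypothetical, and the specific `unshare --user --map-root-user true` invocation is an assumption about what a Linux namespace probe could run, not a quote from the codebase.

```rust
use std::process::{Command, Stdio};
use std::sync::OnceLock;

/// Actually run a command and report whether it exited successfully.
/// A missing binary, a blocked syscall, or a non-zero exit all yield false.
pub fn probe(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

static SANDBOX_OK: OnceLock<bool> = OnceLock::new();

/// Run the probe at most once per process and cache the answer.
pub fn sandbox_available() -> bool {
    // Assumed probe: execute a no-op (`true`) inside a new user namespace.
    *SANDBOX_OK.get_or_init(|| probe("unshare", &["--user", "--map-root-user", "true"]))
}
```

The key point is that `probe` tests the behavior (a no-op ran inside the sandbox), so a binary that exists but cannot create namespaces in the current container is correctly reported as unavailable.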
9. Five-tier permission model with predictable precedence
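The pattern: Evaluate checks in one fixed order: deny rules, then context overrides, then ask rules, then allow rules, then mode sufficiency, with deny as the final fallback. The order is documented, so there is no ambiguity about which rule wins.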
Why: Permission models that are "just check these things in some order" become unauditable as they grow. A documented decision tree is checkable.
Cost: A single authorize() function with explicit precedence comments.
→ See Permissions & Sandboxing.
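The precedence order reads naturally as a sequence of early returns. Everything in this sketch is an assumption for illustration: the three-variant `PermissionMode` ordering, the `Rules` struct, and modelling a context override as an optional pre-made decision. Only the order of the checks comes from the text above.

```rust
/// Ordered so that a later variant is "sufficient" for an earlier one.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum PermissionMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Ask,
    Deny,
}

pub struct Rules<'a> {
    pub deny: &'a [&'a str],
    pub ask: &'a [&'a str],
    pub allow: &'a [&'a str],
}

pub fn authorize(
    tool: &str,
    required: PermissionMode,
    active: PermissionMode,
    rules: &Rules,
    context_override: Option<Decision>,
) -> Decision {
    // 1. Deny rules always win.
    if rules.deny.contains(&tool) {
        return Decision::Deny;
    }
    // 2. Context overrides (for example, a decision supplied by the harness).
    if let Some(decision) = context_override {
        return decision;
    }
    // 3. Ask rules.
    if rules.ask.contains(&tool) {
        return Decision::Ask;
    }
    // 4. Allow rules.
    if rules.allow.contains(&tool) {
        return Decision::Allow;
    }
    // 5. Mode sufficiency: the active mode covers what the tool requires.
    if active >= required {
        return Decision::Allow;
    }
    // 6. Final deny.
    Decision::Deny
}
```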
10. Recovery recipes for known failure modes
The pattern: Catalogue the top N failure modes from real usage; bind each to a recipe (AcceptTrustPrompt, RebaseBranch, RetryMcpHandshake, etc.). When a recipe matches a tool error, enqueue it instead of failing or naive-retrying.
Why: Naive retry is a coin flip. A named recipe is testable, observable, and recoverable.
Cost: A small enum + a regex/match table from error patterns to recipes.
→ See Context, Caching, Compaction.
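The enum-plus-table shape might look like the sketch below. The recipe names come from the text above; the error substrings in `TABLE` are invented for illustration, and plain substring matching is used here in place of a regex table.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Recipe {
    AcceptTrustPrompt,
    RebaseBranch,
    RetryMcpHandshake,
}

/// Hypothetical error fragments (lowercase) mapped to recipes.
/// A real table would be built from failure modes observed in usage.
const TABLE: &[(&str, Recipe)] = &[
    ("trust this folder", Recipe::AcceptTrustPrompt),
    ("non-fast-forward", Recipe::RebaseBranch),
    ("mcp handshake", Recipe::RetryMcpHandshake),
];

/// Return the first recipe whose pattern appears in the tool error, if any.
/// `None` means the error is unknown and should surface as a normal failure.
pub fn recipe_for(error: &str) -> Option<Recipe> {
    let lower = error.to_lowercase();
    TABLE
        .iter()
        .find(|(needle, _)| lower.contains(*needle))
        .map(|(_, recipe)| *recipe)
}
```

Because each recipe is a named value, a test can assert which recipe a given error triggers, which a blind retry loop cannot offer.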
11. Branch-locks-before-merge collision detection
The pattern: Before two parallel agents touch overlapping code on a branch, declare intent (lane_id + branch + modules). Detect collisions at intent time, not at merge time.
Why: Detecting at merge time means rework. Detecting at intent time means routing — pause one lane, broaden the other's scope, or split the work cleanly.
Cost: An Arc<Mutex<HashMap<branch, Vec<intent>>>> and pairwise overlap detection (O(n²) is fine).
→ See Multi-Agent Coordination.
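A minimal sketch of intent-time collision detection over the shared map. The `Intent` and `BranchLocks` types are hypothetical, modules are compared by exact name, and recording the new intent even when it collides is a design assumption of this example (the caller decides how to route).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug)]
pub struct Intent {
    pub lane_id: String,
    pub modules: Vec<String>,
}

/// Shared registry: branch name -> intents declared on that branch.
#[derive(Clone, Default)]
pub struct BranchLocks {
    inner: Arc<Mutex<HashMap<String, Vec<Intent>>>>,
}

impl BranchLocks {
    /// Declare an intent and return the lane ids it collides with.
    /// An empty result means the lane is clear to proceed.
    pub fn declare(&self, branch: &str, intent: Intent) -> Vec<String> {
        let mut map = self.inner.lock().expect("branch lock map poisoned");
        let lanes = map.entry(branch.to_string()).or_default();
        // Overlap check against every existing intent on the branch.
        let collisions: Vec<String> = lanes
            .iter()
            .filter(|other| {
                other.lane_id != intent.lane_id
                    && other.modules.iter().any(|m| intent.modules.contains(m))
            })
            .map(|other| other.lane_id.clone())
            .collect();
        lanes.push(intent);
        collisions
    }
}
```

Cloning a `BranchLocks` handle shares the same underlying map, so each parallel lane can hold its own copy.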
12. Cache-aware self-pacing
The pattern: When scheduling future wake-ups (or any intra-session pacing), respect the cache TTL. Stay in cache (sub-5-min) when actively iterating; commit to longer waits (20+ min) when the cache miss is unavoidable. Don't pick exactly 5 minutes.
Why: The Anthropic cache has a 5-minute TTL. A wait of exactly 300s lands on the expiry boundary, so you pay for the cache miss without getting a meaningfully longer wait in exchange. Either stay under the TTL or commit to a longer interval.
Cost: Documentation in your ScheduleWakeup (or equivalent) tool description, so that the model is cache-aware by default.
→ See Context, Caching, Compaction.
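If you prefer to enforce the rule in the harness as well as document it, a small clamp is enough. The 30-second safety margin, the 10-minute tipping point, and the `pace` function itself are assumptions of this sketch; only the 5-minute TTL and the "20+ min" guidance come from the text above.

```rust
use std::time::Duration;

/// Cache TTL stated above.
pub const CACHE_TTL: Duration = Duration::from_secs(300);
/// Assumed margin so a wake-up lands comfortably inside the TTL.
pub const SAFE_MARGIN: Duration = Duration::from_secs(30);
/// "Commit to longer waits (20+ min)".
pub const LONG_WAIT_FLOOR: Duration = Duration::from_secs(20 * 60);

/// Snap a requested delay out of the dead zone between
/// "still cached" and "long enough to justify the miss".
pub fn pace(requested: Duration) -> Duration {
    let warm_ceiling = CACHE_TTL - SAFE_MARGIN; // 270s
    if requested <= warm_ceiling || requested >= LONG_WAIT_FLOOR {
        requested
    } else if requested < CACHE_TTL * 2 {
        // Close to the TTL: shorten the wait and keep the cache warm.
        warm_ceiling
    } else {
        // The miss is unavoidable: lengthen the wait so it is amortized.
        LONG_WAIT_FLOOR
    }
}
```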
Reference: critical files cheat sheet
By topic, the file:line refs you'll want to navigate to:
Request loop: crates/runtime/src/conversation.rs:318 (run_turn), :346–504 (loop body), :559–582 (auto-compact), :744 (cache events), crates/runtime/src/sse.rs:18 (push_chunk)
Caching/compaction: crates/runtime/src/prompt.rs:40 (boundary sentinel), :113–221 (builder), :169–191 (compose order), crates/api/src/prompt_cache.rs:20 (config), :314 (detect_cache_break), crates/runtime/src/compact.rs:96–183 (algorithm)
Plan/execute: crates/tools/src/lib.rs:1244–1245 (EnterPlanMode/ExitPlanMode dispatch), crates/runtime/src/task_registry.rs:56 (TaskRegistry), crates/runtime/src/recovery_recipes.rs:46–86 (scenarios + recipes)
Sub-agents: crates/tools/src/lib.rs:580 (Agent spec), :1238 (dispatch), :5099–5116 (normalize_subagent_type), :3577 (thread spawn), :3603–3630 (fresh runtime), :3657–3736 (per-type allowlists), crates/runtime/src/worker_boot.rs:255–294 (Worker lifecycle)
Permissions: crates/runtime/src/permissions.rs:9–15 (PermissionMode), :148–291 (authorize), crates/runtime/src/permission_enforcer.rs:39–173 (enforcer methods), crates/runtime/src/sandbox.rs:156–303 (resolve_sandbox_status, unshare probe), crates/runtime/src/file_ops.rs:42–54 (workspace boundary), :669–687 (symlink escape), crates/runtime/src/bash_validation.rs:103–594 (validators)
Extensibility: crates/runtime/src/hooks.rs:23–25 (event types), :155 (HookRunner), crates/plugins/src/lib.rs:117 (PluginManifest), crates/commands/src/lib.rs:2553 (resolve_skill_path)
MCP: crates/runtime/src/mcp_lifecycle_hardened.rs:16–28 (phases), :257 (validator), crates/runtime/src/mcp.rs:26–37 (name normalization), crates/runtime/src/mcp_tool_bridge.rs:74–90 (registry), crates/runtime/src/mcp_stdio.rs:480 (server manager)
Multi-agent: crates/runtime/src/lane_events.rs:6–66 (event taxonomy), :1019–1149 (constructors), crates/runtime/src/branch_lock.rs:23–77 (collision detection), crates/runtime/src/team_cron_registry.rs:51–138 (team + cron registries)
Mock harness: crates/mock-anthropic-service/src/lib.rs (whole crate), crates/rusty-claude-cli/tests/mock_parity_harness.rs (whole file), mock_parity_scenarios.json (manifest)
Reference: parity gaps worth knowing
Where claw-code falls short of upstream Claude Code:
- Prompt-caching breakpoints: claw-code observes cache behavior but doesn't insert cache_control: ephemeral markers (see Context, Caching, Compaction). Real Claude Code almost certainly does.
- Bash validation: 1 of 18 upstream submodules implemented, and the integration into bash.rs is incomplete (see Permissions & Sandboxing).
- Trust resolver: #[cfg(test)] only, so it is not active in production builds. Real Claude Code has folder-trust prompts working at session boot.
- Sandboxing on non-Linux: returns supported: false on macOS/Windows, so bash runs unsandboxed. Real Claude Code may have macOS sandbox-exec or Windows AppContainer integration.
- Permission category granularity: 21/50 tools require DangerFullAccess, including all the worker/team/cron-management tools. Real Claude Code likely splits this finer.
- Compaction quality: claw-code's summarizer is structured but not LLM-driven. Real Claude Code likely uses a sub-model call for high-quality summaries.
These gaps don't make the codebase less instructive — they make it more so, because the architecture is legible without the production-grade complexity. Read it for shape; fill in your own production details.
The patterns above are portable across language and runtime — they're architectural moves, not implementation tricks. Pick three, build them well, ship the rest after.