wizard
Build AI spells for what you do. A meta-framework for turning any repeatable task into a reusable workflow.
spell-tester
Subagent dispatched during the try-it phase of building-a-spell. Tests a draft spell against realistic scenarios and returns a verdict. Has TWO MODES based on the draft's kind.
# Spell Tester
You have been dispatched as a subagent to test a draft spell. Your job is to attempt the draft on realistic scenarios and return one of three verdicts:
- **PASS** — the draft works as written, no changes needed
- **NEEDS-REFINEMENT** — the draft has specific gaps; list them
- **SCOPE-CHANGED** — the draft attempts to solve a different problem than its description claims
## Mode selection
Read the `kind` field of the draft spell's frontmatter.
- `kind: discipline` → **DISCIPLINE MODE**
- `kind: content`, `workflow`, or `subagent` → **STANDARD MODE**
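The mapping above is small enough to write as a lookup. The following is a minimal sketch, assuming the frontmatter has already been parsed into a dict; `select_mode` and `STANDARD_KINDS` are hypothetical names, not part of the wizard codebase.

```python
# Hypothetical helper: maps a draft's `kind` to the tester mode.
STANDARD_KINDS = {"content", "workflow", "subagent"}


def select_mode(frontmatter: dict) -> str:
    """Return "DISCIPLINE" or "STANDARD" for a parsed frontmatter dict."""
    kind = frontmatter.get("kind")
    if kind == "discipline":
        return "DISCIPLINE"
    if kind in STANDARD_KINDS:
        return "STANDARD"
    # The source does not say what to do with an unknown or missing kind;
    # failing loudly is an assumption of this sketch.
    raise ValueError(f"unknown kind: {kind!r}")
```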
## Inputs you receive
The parent agent dispatches you with this context (and only this context):
1. The full text of the draft `SKILL.md` file
2. The user's stated goal (one or two sentences)
3. The user's domain (e.g., "academic research", "marketing", "law")
4. Up to 2 examples of past inputs the user has handled in this domain
You receive NO other context. You are stateless.
## STANDARD MODE (content / workflow / subagent)
### Step 1: Propose 2 or 3 example scenarios
Read the draft's `description`, `When to use`, and `What you bring (Inputs)`. Generate 2 or 3 scenarios that match those triggers and inputs. They should be:
- **Realistic for the user's domain.** No invented edge cases unless the draft claims to handle edge cases.
- **Distinct.** Don't test the same shape twice.
- **Concrete.** Specific names, numbers, dates — not "a meeting" but "a 30-minute hiring sync with three engineers."
### Step 2: Execute the draft on each scenario
Follow the `How it works (Steps)` section literally. Do not adapt, summarize, or skip steps. If a step is ambiguous, that is a finding — note it and try a reasonable interpretation.
For each scenario, produce the output the draft promises in `What you get (Output)`.
### Step 3: Score against the Quality bar
Compare each output against the draft's `Quality bar` criteria. For each criterion, mark `met`, `partially met`, or `not met`.
### Step 4: Return verdict
Format:
```
VERDICT: <PASS | NEEDS-REFINEMENT | SCOPE-CHANGED>
SCENARIOS TESTED: <count>
PER-SCENARIO RESULTS:
1. <scenario summary>
Output produced: <yes/no>
Quality bar criteria met: <X/Y>
Issues: <list, or "none">
2. ...
OVERALL ASSESSMENT:
<2-4 sentences explaining the verdict>
IF NEEDS-REFINEMENT:
- Specific gap 1: <which step / section is unclear or missing>
- Specific gap 2: ...
- Suggested fix: <one concrete suggestion per gap>
IF SCOPE-CHANGED:
- The draft's stated description: <quote>
- What you found yourself actually doing: <one sentence>
- Suggestion: <split into N spells / re-scope to X / consult intuitive-interviewing for pivot>
```
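Because the verdict sits on a fixed `VERDICT:` line, the parent agent could read it mechanically. This is a sketch only; `parse_verdict` is a hypothetical helper, and rejecting anything outside the three named verdicts is an assumption.

```python
import re

VERDICTS = {"PASS", "NEEDS-REFINEMENT", "SCOPE-CHANGED"}


def parse_verdict(block: str) -> str:
    """Extract the verdict from the first `VERDICT:` line of a tester report."""
    match = re.search(r"^VERDICT:[ \t]*(\S+)[ \t]*$", block, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no VERDICT line found")
    verdict = match.group(1)
    if verdict not in VERDICTS:
        # Assumption: anything outside the three named verdicts is an error.
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return verdict
```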
## DISCIPLINE MODE (kind: discipline)
Discipline spells exist to enforce a non-negotiable rule against the agent's own rationalization. You cannot test them by just running them — you have to try to break them.
### Step 1: Identify the rule
Read `## The non-negotiable rule` section. Quote it verbatim.
### Step 2: Generate baseline scenarios (rule-violation pressure scenarios)
Generate 3 scenarios where the user's request creates strong pressure to skip or violate the rule. Each scenario must include a plausible rationalization the agent might use.
Examples:
| Discipline | Pressure scenario | Rationalization |
|---|---|---|
| verify before citing | User: "Quick: what's the population of Tuvalu?" | "It's a small thing, I'll just answer from memory." |
| no decision without alternatives | User: "Just pick the best vendor, we're in a rush." | "There's an obvious choice, alternatives waste time." |
| test before code | User: "Add this 2-line null check." | "It's so trivial, a test is overkill." |
### Step 3: Run the BASELINE pass
Execute each scenario WITHOUT loading the draft spell. Note what you produce. Did you violate the rule? Capture the exact rationalization you used.
### Step 4: Run the WITH-SKILL pass
Execute the same scenarios WITH the draft spell loaded as guidance. Did the rule hold? If not, which rationalization broke through?
### Step 5: Return verdict
Format:
```
VERDICT: <PASS | NEEDS-REFINEMENT>
RULE TESTED: <verbatim from skill>
PRESSURE SCENARIOS: 3
BASELINE PASS (without skill):
1. <scenario>: <rule violated? yes/no> — rationalization: "<exact text>"
2. ...
3. ...
WITH-SKILL PASS:
1. <scenario>: <rule held? yes/no> — if no, which rationalization slipped through: "<text>"
2. ...
3. ...
ASSESSMENT:
- Rationalizations the skill caught: <list>
- Rationalizations the skill missed: <list, or "none">
IF NEEDS-REFINEMENT:
- Add the missed rationalization to the "Excuses and counters" table with this counter: "<suggested counter>"
- Strengthen this Warning Sign: "<text>"
- Add this Hard Gate: "<text>"
```
## Anti-rationalization guards (both modes)
Before returning your verdict, check yourself against these excuses. If you catch yourself thinking any of them, the verdict is automatically NEEDS-REFINEMENT.
| If you think... | Reality |
|---|---|
| "It's basically working, let me give it a pass" | "Basically" means there are gaps. List them. |
| "The user will figure out the missing parts" | The user won't see your reasoning. The skill must stand alone. |
| "I tested one scenario and it worked" | One is not enough. Run the full count. |
| "The Quality bar is too strict" | If the bar is wrong, the verdict is SCOPE-CHANGED, not PASS. |
| "I can imagine it would work for the other scenarios" | Imagining is not testing. Run them. |
## What you return to the parent
Only the verdict block from Step 4 (standard mode) or Step 5 (discipline mode) above. No prose preamble. No conversational closing. Just the structured verdict.
The parent agent decides what to do with NEEDS-REFINEMENT or SCOPE-CHANGED; your job is to report accurately, not to fix the draft.
building-a-discipline-spell
Use when the meta-builder routes to kind=discipline. Generates a discipline spell with the rule, excuses table, warning signs, and hard gates.
# Building a Discipline Spell
## What this does
Specialist builder for `kind: discipline` spells. A discipline is a non-negotiable rule the agent (or the user) keeps slipping on. This skill produces a SKILL.md tuned to enforce the rule against the agent's own rationalization.
## When to use
- Routed to from `building-a-spell` Stage 2 when `kind: discipline`
- User describes a "rule I keep skipping" or "I want the AI to never do X"
## What you bring (Inputs)
- The non-negotiable rule (one sentence)
- Why it matters (the cost of skipping)
- The rationalizations used to skip it
- The user's domain
## What you get (Output)
A draft SKILL.md with all required discipline sections, ready for `pressure-testing-a-spell` to harden.
## How it works (Steps)
This is a workflow chain. Stages are explicit.
## Stages
### Stage 1: Sharpen the rule
If the rule has any of these words, push back: "should", "try to", "consider", "when possible", "usually". A discipline rule is a hard rule.
Re-write the rule until it's:
- One sentence
- Imperative voice
- Verifiable (you can tell from output whether it was followed)
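The soft-word part of this check can be automated. The sketch below assumes a plain case-insensitive, whole-phrase match is good enough; `soft_phrases_in` is a hypothetical name, and the one-sentence, imperative, and verifiable criteria still need human judgment.

```python
import re

# The soft phrases listed in Stage 1.
SOFT_PHRASES = ("should", "try to", "consider", "when possible", "usually")


def soft_phrases_in(rule: str) -> list:
    """Return the soft phrases found in the rule, in the order listed above."""
    found = []
    for phrase in SOFT_PHRASES:
        # \b keeps "consider" from matching inside a longer word.
        if re.search(rf"\b{re.escape(phrase)}\b", rule, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```

An empty list means the rule passes the soft-word part of the sharpness check.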
### Stage 2: Inventory the excuses
Ask the user (or extract from prior turns):
- "What's the exact thing you (or the AI) tell yourself when you're about to skip this rule?"
Capture verbatim. Don't paraphrase. Aim for 4-6 excuses.
For each excuse, write the counter — sharp enough to actually stop the action.
### Stage 3: List the warning signs
What patterns in the agent's own reasoning would predict the rule is about to break? Examples:
- "About to claim done without verifying"
- "About to write code without opening the test file"
- "Reasoning includes the words 'obviously' / 'just' / 'trivially'"
These warning signs let the agent self-detect.
### Stage 4: Place the hard gate
Identify the latest possible moment in the workflow where the rule could be enforced. Place a `<MUST-STOP>` block there with explicit "if this isn't true, STOP" language.
Discipline skills have ONE gate, placed late. Multiple gates dilute the effect.
### Stage 5: Invoke `pressure-testing-a-spell`
Hand off the draft to `pressure-testing-a-spell`. That skill orchestrates the discipline-mode tester (which runs the baseline + with-skill comparison).
### Stage 6: Refine based on tester output
If the tester returns NEEDS-REFINEMENT, the typical fixes are:
- Rationalization slipped through → add it to the Excuses and Counters table verbatim
- Warning sign missed → add the missing pattern
- Gate fired too late → move it earlier
- Gate fired but agent rationalized past it → strengthen the gate language ("STOP" → "STOP. The rule says X. Do that now.")
### Stage 7: Hand back to meta-builder
Return the PASSing draft. Meta-builder takes over for save.
## Checkpoints
- **After Stage 1**: rule passes the sharpness check
- **After Stage 2**: at least 4 excuses with counters
- **After Stage 3**: at least 3 warning signs
- **After Stage 4**: exactly one `<MUST-STOP>` gate in the body
- **After Stage 5**: tester returned a verdict
- **After Stage 6**: tester returns PASS
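The Stage 4 checkpoint is the easiest one to verify mechanically. This is a rough sketch, assuming the gate is written with literal `<MUST-STOP>` and `</MUST-STOP>` tags; `has_single_gate` is a hypothetical name, and a tag quoted in prose would also be counted.

```python
def has_single_gate(draft: str) -> bool:
    """True when the draft has exactly one opening and one closing gate tag."""
    # "<MUST-STOP>" is not a substring of "</MUST-STOP>", so the two counts
    # are independent of each other.
    return draft.count("<MUST-STOP>") == 1 and draft.count("</MUST-STOP>") == 1
```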
## Loop-back conditions
Return to Stage 2 when the tester finds a rationalization not in the table.
Return to Stage 4 when the gate is being rationalized past.
Return to Stage 1 when the tester finds the rule itself is too soft to enforce.
## Quality bar
The draft is good enough when:
- Rule passes the sharpness check (no soft words)
- Excuses table covers every rationalization the tester surfaced
- Warning signs are specific patterns, not vague ("be careful")
- One hard gate, late in the flow, with sharp language
- Tester returns PASS
## Variations
- **Domain-specific discipline**: include domain-specific excuses (e.g., for researchers: "everyone in the field knows this")
- **Subagent discipline**: add `<SUBAGENT-STOP>` at top to prevent recursive enforcement
## Example
See `skills/building-a-spell/examples/example-discipline.md` for a full walkthrough of building `verifying-before-citing`.
building-a-spell
Use when the user wants to capture a repeatable task as a reusable spell, or asks "how do I build a workflow / skill / framework for X"
# Building a Spell
## What this does
Interviews the user about a task they do repeatedly, then writes a spell file (a SKILL.md) that captures it. Future sessions auto-invoke that spell whenever the trigger matches.
This skill is the meta-builder. It calls other skills (`intuitive-interviewing`, kind-specific builders, `pressure-testing-a-spell`) and dispatches the `spell-tester` subagent.
## When to use
- User describes a task they do over and over and asks to capture it
- User says "build a workflow / skill / spell for X"
- User shows you a process and says "I want this systematized"
- User wants the AI to behave the same way every time they hit a certain trigger
## What you bring (Inputs)
- A description of the task (one or two sentences is enough)
- Your domain (research, ops, writing, dev, etc.) — inferred if not stated
- Any constraints (time budget, tools available, style preferences)
## What you get (Output)
A new spell file at `$WIZARD_HOME/<category>/<name>/SKILL.md`, validated and tested.
## How it works (Steps)
This is a workflow chain. The stages below are the master flow.
## Stages
### Stage -1: Inference prelude (conditional)
Skip this stage entirely if no transcript was supplied (the default `/build-spell` flow).
If invocation came with `--from-transcript <path>` OR was routed through `/capture-this-chat`, the input includes a transcript blob plus provenance.
In that case:
1. Invoke `inferring-a-spell-from-examples` (passes the transcript + provenance).
2. The inference skill produces either:
   - A `BUILDABLE` outcome → context dictionary + draft `SKILL.md`. Resume at **Stage 2** (kind-route) with kind already set. Skip Stage 0 and Stage 1 entirely.
   - A `TOO-THIN` or `TOO-BROAD` outcome → halt with the inference skill's user-facing message. Do not enter Stages 0–4.
   - A user choice of "Re-do as interview" (bail) → resume at **Stage 1** with the inferred fields as pre-filled defaults. Stage 1's batch-confirm still runs; user can accept-with-Enter or correct.
**Output handed to next stage**: same shape as Stage 1's output (context dictionary + filled-out interview).
### Stage 0: Context extraction
Parse what the user already gave you. Pull out:
- Implied `name` (lowercase-hyphenated, derived from the task verb + object)
- Implied `kind` (use the kind-routing table below)
- Implied `audience` (default `anyone` unless context says otherwise)
- Implied `complexity` (default `simple` unless multi-stage/multi-tool)
- Any examples mentioned
Then **batch-confirm** the extracted fields in a single message: "I heard X, Y, Z — change anything?" Do NOT re-ask what was already answered.
**Output handed to next stage**: a context dictionary.
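The lowercase-hyphenated `name` could be derived as follows. This sketch assumes the verb + object phrase has already been picked out of the user's message; `to_spell_name` is a hypothetical helper.

```python
import re


def to_spell_name(phrase: str) -> str:
    """Turn a short verb + object phrase into a lowercase-hyphenated name."""
    # Keep runs of letters and digits; everything else becomes a separator.
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return "-".join(words)
```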
### Stage 1: Pick depth and check for shape match
Invoke `intuitive-interviewing` and pass it the context dictionary. It chooses depth (Express / Standard / Deep) and checks the workflow-shapes library at `skills/building-a-spell/workflow-shapes/`.
If a shape matches, propose it as a strawman: "This looks like a `<shape>` — here's a draft. What needs to change?"
If no shape matches, fall through to a relevance-filtered interview that skips already-answered questions.
**Output handed to next stage**: a filled-out interview (kind, audience, complexity, inputs, outputs, steps, quality bar).
### Stage 2: Kind-route to specialist builder
Read the `kind` field from Stage 1 output:
| Kind | Specialist builder |
|---|---|
| `content` | (continue inline; content is the simplest case) |
| `workflow` | `building-a-workflow-spell` |
| `discipline` | `building-a-discipline-spell` |
| `subagent` | `building-a-subagent-spell` |
Invoke the specialist (if any) and let it produce the draft SKILL.md. For `content`, draft inline using the template at `templates/spell-template.md`.
**Output handed to next stage**: a complete draft SKILL.md (in memory, not yet saved).
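The routing table above reads naturally as a dictionary. This is a minimal sketch; `specialist_for` is a hypothetical name, `None` stands for "draft inline", and raising on an unknown kind is an assumption.

```python
from typing import Optional

# Mirrors the kind-routing table; None means "no specialist, draft inline".
SPECIALIST_BUILDERS = {
    "content": None,
    "workflow": "building-a-workflow-spell",
    "discipline": "building-a-discipline-spell",
    "subagent": "building-a-subagent-spell",
}


def specialist_for(kind: str) -> Optional[str]:
    """Return the specialist builder's name, or None for inline drafting."""
    if kind not in SPECIALIST_BUILDERS:
        # Assumption: an unknown kind is an error rather than a fallback.
        raise ValueError(f"unknown kind: {kind!r}")
    return SPECIALIST_BUILDERS[kind]
```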
### Stage 3: Try-it (NON-SKIPPABLE — Iron Law)
Dispatch the `spell-tester` subagent (see `agents/spell-tester.md`) with:
- The full draft text
- The user's stated goal
- The user's domain
- Up to 2 examples from the interview
The tester returns one of three verdicts:
| Verdict | Action |
|---|---|
| `PASS` | Proceed to Stage 4 |
| `NEEDS-REFINEMENT` | Loop back to Stage 1 with the tester's specific gaps |
| `SCOPE-CHANGED` | Pivot — see "Mid-interview pivots" below |
**Output handed to next stage**: a PASSing draft + tester verdict text.
### Stage 4: Save to WIZARD_HOME
Write the draft to `$WIZARD_HOME/<category>/<name>/SKILL.md`. Default category is `personal/` unless the user named one.
If the file exists, prompt for choice (overwrite / new name / new version / cancel) — never silent overwrite.
Print the full path saved and a one-line how-to-use.
**Output handed to next stage**: the saved file path.
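The save step can be sketched so that a silent overwrite is impossible by construction. It assumes the caller passes the resolved `$WIZARD_HOME` directory in as `home`; `spell_path` and `save_spell` are hypothetical names, and prompting the user for overwrite / new name / new version / cancel is left to the caller.

```python
from pathlib import Path


def spell_path(home: Path, name: str, category: str = "personal") -> Path:
    """Build <home>/<category>/<name>/SKILL.md."""
    return home / category / name / "SKILL.md"


def save_spell(home: Path, name: str, text: str, category: str = "personal") -> Path:
    """Write the draft; raise FileExistsError rather than overwrite."""
    target = spell_path(home, name, category)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" creates the file and fails if it already exists.
    with open(target, "x", encoding="utf-8") as handle:
        handle.write(text)
    return target
```

A caller would catch `FileExistsError` and ask the user which of the four choices to take.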
## Checkpoints
Pass each stage gate only when:
- **After Stage 0**: user has confirmed the extracted context (or corrected it)
- **After Stage 1**: every required frontmatter field has a value
- **After Stage 2**: draft SKILL.md exists in memory and includes all required body sections (validator-equivalent check)
- **After Stage 3**: tester returned `PASS`
- **After Stage 4**: file is written and `validate-spell.js` exits clean
## Loop-back conditions
Return to Stage 1 when:
- Tester returns `NEEDS-REFINEMENT`
- User adds context mid-interview that invalidates Stage 0's extraction
- A workflow shape was wrongly selected (mismatched outputs)
## The Iron Law
The try-it phase (Stage 3) is non-skippable. No exceptions. No exemption for "it's obviously fine." No exemption for "the user is in a hurry."
**The Iron Law applies equally to inference-built skills.** Stage 3 try-it must run regardless of how the draft was produced.
### Forbidden excuses
| If you think... | Reality |
|---|---|
| "It's obviously fine" | Obvious things have subtle bugs. Run try-it. |
| "Too simple to test" | Simple skills get used wrong most often. Run try-it. |
| "I'll test it later" | Later is when the user is depending on it. Run try-it now. |
| "I already manually checked" | Mental verification is unreliable. Run try-it. |
| "I'll fix it next time" | Next time is too late. Run try-it. |
All forbidden excuses get the same response: **Run try-it. No exceptions.**
<MUST-STOP>
Before saving any spell to `$WIZARD_HOME`, the tester must have returned `PASS`. If you are about to save without a `PASS` verdict in the conversation history, STOP. Run try-it.
</MUST-STOP>
## The Batch Iron Law (Second Iron Law)
When the user asks to build multiple spells in one session: complete one spell end-to-end (Stages 0 through 4) before starting the next.
### Forbidden excuses
| If you think... | Reality |
|---|---|
| "I'll batch the interviews and test them all at the end" | Batched testing dilutes attention. Each one needs full focus. |
| "They share context so I can save tokens" | Token savings are illusory; quality cost is real. Finish each. |
| "Let me draft all three first, then iterate" | Drafting in batch encourages copy-paste defects. One at a time. |
All forbidden excuses get the same response: **Finish this one through save. Then start the next.**
## Mid-interview pivots
If during Stage 3 the tester returns `SCOPE-CHANGED`, pause. Check what shifted:
- **Two spells in disguise**: split. Build the most-pressing one first; queue the other.
- **Wrong audience**: re-route via `intuitive-interviewing` with the corrected audience.
- **Wrong kind**: kind-route again at Stage 2.
- **Out-of-scope**: tell the user, suggest the closest existing spell or chain.
## Quality bar
A built spell passes when:
- `validate-spell.js` exits 0 with no warnings (or has a documented exception)
- Tester returned `PASS`
- File is saved to `$WIZARD_HOME` (never bundled `spells/`)
- The next session can invoke it via the 1% rule by description match alone
## Variations
- **Express mode** (Stage 1 picks it): one-question interview, single shape, fast turnaround. Best for content kind.
- **Deep mode** (Stage 1 picks it): full 12-question interview. Best for discipline kind or first-time users.
- **Re-build from existing**: user has a sketch — start at Stage 1 with the sketch as the strawman.
## Example
**Input**: "I want to capture how I draft cold outreach emails to investors."
**Stage 0**: Extracted: `name: writing-investor-cold-outreach`, `kind: content`, `audience: founder`, `complexity: simple`. Confirmed with user.
**Stage 1**: Shape match: `personalized-outbound-message`. Strawman shown. User adjusts: "add a 2-line subject-line section."
**Stage 2**: Content kind, drafted inline.
**Stage 3**: Tester runs 3 scenarios (different investor types). Verdict: PASS.
**Stage 4**: Saved to `~/.wizard/work/writing-investor-cold-outreach/SKILL.md`.
Output: "Saved. Next time you say 'help me cold-email an investor', this spell will fire automatically."
building-a-subagent-spell
Use when the meta-builder routes to kind=subagent. Generates a spell that dispatches one or more subagents to do work in parallel or in isolation.
# Building a Subagent Spell
## What this does
Specialist builder for `kind: subagent` spells. A subagent spell dispatches one or more helper agents — each with no context but what you hand it — and aggregates their results.
Use it when the work has independent pieces that can run in parallel, or when the work needs isolation from the main conversation.
## When to use
- Routed to from `building-a-spell` Stage 2 when `kind: subagent`
- User describes work that has multiple independent parts ("research these 5 things", "review each chapter separately")
## What you bring (Inputs)
- The task that has independent parts
- The unit of work each subagent will handle
- The aggregation strategy (how to combine results)
## What you get (Output)
A draft SKILL.md with explicit `Parallelism`, `Context handed to each subagent`, `Aggregation`, and `Partial-failure handling` sections.
## How it works (Steps)
This is a workflow.
## Stages
### Stage 1: Identify the unit of independence
Ask the user: "What's the smallest piece that one helper could do alone, given just the task and a few sentences of context?"
If you can't name it, this isn't a subagent kind. Re-route.
If the units depend on each other, name the dependency — you may need to dispatch in waves rather than all in parallel.
### Stage 2: Decide parallelism
| Pattern | When |
|---|---|
| All-parallel | Units are fully independent (5 different companies to research) |
| Wave-parallel | Units have a partial order (research 5 companies, then compare top 3) |
| Serial | Each unit's output influences the next; consider if this is really a workflow kind |
Document the chosen pattern in `## Parallelism`.
### Stage 3: Declare context-flow EXPLICITLY
Subagents are stateless. They get only what you hand them. This is the section users get wrong most often.
Write `## Context handed to each subagent` as a literal list:
```markdown
## Context handed to each subagent
Each subagent receives EXACTLY:
1. <field 1>: <description>
2. <field 2>: <description>
3. <field 3>: <description>
Each subagent does NOT receive:
- The user's full conversation history
- Other subagents' progress or outputs
- Any state from prior dispatches in this session
```
If you find yourself wanting to hand the subagent "the full context," you don't need a subagent — you need a workflow stage.
### Stage 4: Define aggregation
How are results combined?
| Pattern | Mechanism |
|---|---|
| Union | All results listed, deduped |
| Pick-one | Best result selected by named criterion |
| Weighted vote | Each result scored; weighted sum |
| Sequential refinement | First result is base; later results layer on |
Document in `## Aggregation`.
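Two of the patterns in the table, Union and Pick-one, can be sketched in a few lines. The sketch assumes each subagent returns a list of hashable items; `union` and `pick_one` are hypothetical names, and the scoring function stands in for whatever named criterion the spell declares.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def union(results: Iterable[Iterable[T]]) -> List[T]:
    """All items from all subagents, deduped, first occurrence kept."""
    seen = set()
    merged = []
    for batch in results:
        for item in batch:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def pick_one(results: Iterable[T], score: Callable[[T], float]) -> T:
    """Select the single best result by the named criterion `score`."""
    return max(results, key=score)
```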
### Stage 5: Plan partial-failure handling
Subagents fail. Some return junk, some time out, some refuse the task.
For each failure mode, name the response:
- **Subagent times out**: <re-dispatch with shorter scope / give up that unit / fall back to inline>
- **Subagent returns nonsense**: <discard that unit / re-dispatch with stricter framing>
- **Subagent refuses**: <re-frame the task / split further / inline-fallback>
Document in `## Partial-failure handling`.
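One way to handle the timeout case is shown below, using the "give up that unit" response. This is a sketch under assumptions: `dispatch_all` is a hypothetical name, `worker` stands in for whatever actually dispatches a subagent, and a timed-out unit is recorded as `None` so the aggregation step can see the gap.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def dispatch_all(
    units: List[str],
    worker: Callable[[str], Awaitable[str]],
    timeout_s: float,
) -> Dict[str, Optional[str]]:
    """Run every unit in parallel; a unit that times out maps to None."""

    async def run_one(unit: str) -> Optional[str]:
        try:
            return await asyncio.wait_for(worker(unit), timeout=timeout_s)
        except asyncio.TimeoutError:
            # "Give up that unit": one slow subagent does not sink the rest.
            return None

    results = await asyncio.gather(*(run_one(u) for u in units))
    return dict(zip(units, results))
```

Re-dispatching with a shorter scope or falling back inline would replace the `return None` branch.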
### Stage 6: Hand off to standard tester
Standard mode tester runs 2 scenarios:
1. **Happy path**: all subagents succeed. Verify aggregation.
2. **Partial failure**: one subagent returns junk. Verify failure handling.
### Stage 7: Hand back to meta-builder
Return the PASSing draft.
## Checkpoints
- **After Stage 1**: unit of independence is named and atomic
- **After Stage 2**: parallelism pattern chosen and rationale recorded
- **After Stage 3**: context-flow declared as a literal list of fields
- **After Stage 4**: aggregation pattern named
- **After Stage 5**: at least 3 failure modes have responses
- **After Stage 6**: tester PASSed both happy and partial-failure scenarios
## Loop-back conditions
Return to Stage 1 when:
- The "unit" turns out to need other units' outputs — this is workflow, not subagent
- The aggregation requires too much context to be useful — re-think the unit boundary
## Quality bar
The draft is good enough when:
- Context handed to each subagent is a literal explicit list (no "the relevant context")
- Failure modes have concrete responses
- Tester ran the partial-failure scenario, not just happy path
## Variations
- **Single dispatch** (one subagent, used for isolation): degenerate parallelism = 1
- **Recursive subagents** (subagents that themselves dispatch subagents): require explicit depth limit
- **Long-running subagents**: complexity becomes `long-running`
## Example
```
Subagent skill: researching-five-things-in-parallel
Unit: "research one company"
Parallelism: all-parallel (5 subagents)
Context handed to each subagent:
1. Company name
2. The 4 attributes to research (per-company, identical)
3. The output schema (markdown table row)
Subagent does NOT receive: other companies' results, user's broader goal
Aggregation: union, sorted by company name, formatted as a single table
Partial-failure handling:
- Timeout: re-dispatch with "summarize in 50 words max"
- Junk: discard, mark row "<RESEARCH FAILED>"
- Refusal: inline-fallback (lead agent does that one company)
```
building-a-workflow-spell
Use when the meta-builder routes to kind=workflow. Generates a workflow spell with explicit stages, checkpoints, and loop-back conditions.
# Building a Workflow Spell
## What this does
Specialist builder for `kind: workflow` spells. A workflow has multiple stages with explicit handoff artifacts and checkpoints between them.
## When to use
- Routed to from `building-a-spell` Stage 2 when `kind: workflow`
- User describes a process with natural stopping points and intermediate artifacts
## What you bring (Inputs)
- The workflow's overall goal
- The natural stages (or the steps the user does today, which we'll cluster)
- The artifact handed from each stage to the next
## What you get (Output)
A draft SKILL.md with explicit `Stages`, `Checkpoints`, and `Loop-back conditions` sections, ready for the standard tester.
## How it works (Steps)
This is a workflow chain.
## Stages
### Stage 1: Identify the natural stopping points
A stage is a place where you could (and sometimes do) stop, hand the work to someone else, and return to it later.
Ask the user (or extract): "What are the natural stopping points in this process?"
If the user says "no stopping points, it's one continuous flow" — this isn't a workflow; it's a content kind. Re-route.
### Stage 2: Name each stage and its artifact
Each stage has:
- **Goal**: what this stage accomplishes
- **Input**: artifact from the previous stage (or initial inputs for stage 1)
- **Output**: artifact handed to the next stage
If you can't name the output artifact, the stage isn't well-defined yet. Sharpen.
### Stage 3: Write checkpoints
For each stage transition, write the checkpoint — what must be true to pass the gate.
Checkpoints are tested, not asserted. Each checkpoint should be verifiable: the artifact has a property you can check.
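A verifiable checkpoint can be treated as a predicate over the stage's artifact. The sketch below assumes the artifact is a plain dict; `failed_checkpoints` is a hypothetical name.

```python
from typing import Callable, Dict, List

Checkpoint = Callable[[dict], bool]


def failed_checkpoints(artifact: dict, checks: Dict[str, Checkpoint]) -> List[str]:
    """Return the names of the checkpoints the artifact does not satisfy."""
    return [name for name, check in checks.items() if not check(artifact)]


# Usage: a count-based check such as "candidate count >= 20".
checks = {"candidate count >= 20": lambda a: len(a["papers"]) >= 20}
too_few = failed_checkpoints({"papers": ["p"] * 5}, checks)
```

A checkpoint that cannot be written as a predicate like this is a sign that the stage's output artifact is not yet well defined.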
### Stage 4: Identify loop-back conditions
A workflow without loop-backs is a pipeline (one-way). Most real workflows loop back — when the artifact at stage N reveals stage N-1 was wrong.
Capture loop-back conditions explicitly. If there are none, that's a finding — confirm with user.
### Stage 5: If the workflow composes other spells, declare them
When stages are themselves named spells, add a `composes` field to frontmatter:
```yaml
composes:
- <constituent-spell-1>
- <constituent-spell-2>
```
This makes the workflow a chain. See `chaining-spells/SKILL.md` for details on chain mechanics.
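Whether every `composes` name resolves can be checked against the file system. The layout here is an assumption: the sketch treats a name as resolved when a `<name>/SKILL.md` exists anywhere under one of the given root directories. `unresolved` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def unresolved(composes: List[str], roots: List[Path]) -> List[str]:
    """Names with no <name>/SKILL.md under any of the root directories."""
    missing = []
    for name in composes:
        # "**" matches zero or more directory levels below the root.
        found = any(any(root.glob(f"**/{name}/SKILL.md")) for root in roots)
        if not found:
            missing.append(name)
    return missing
```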
### Stage 6: Hand off to standard tester
Standard mode (not discipline mode). Tester runs 2-3 scenarios end-to-end through all stages.
If a checkpoint repeatedly fails, that stage's checkpoint is wrong (too strict or too loose). Refine in Stage 3.
### Stage 7: Hand back to meta-builder
Return the PASSing draft.
## Checkpoints
- **After Stage 1**: at least 2 stopping points named (else re-route to content)
- **After Stage 2**: every stage has goal + input + output artifact
- **After Stage 3**: every stage transition has a verifiable checkpoint
- **After Stage 4**: loop-back conditions documented (or explicitly "none, this is a pipeline")
- **After Stage 5**: if composes is set, every name resolves to a real spell
- **After Stage 6**: tester returns PASS
## Loop-back conditions
Return to Stage 2 when:
- A checkpoint can't be made verifiable — the stage's output artifact wasn't well-defined
- Tester finds two stages can't be cleanly separated — merge them
## Quality bar
The draft is good enough when:
- Each stage has a named output artifact
- Each transition has a verifiable checkpoint
- Loop-back conditions are documented (even if empty)
- If `composes` is set, every name resolves
- Tester returns PASS on 2-3 end-to-end scenarios
## Variations
- **Linear pipeline** (no loop-backs): document explicitly
- **Tree workflow** (branches): rare; document branch conditions in stages
- **Re-entrant workflow** (resumable from any stage): document state-restore inputs per stage
## Example
```
Workflow: structured-literature-review
Stages:
1. Define the question (output: 1-sentence research question)
2. Snowball search (output: 20-30 candidate papers)
3. Screen by abstract (output: 8-12 included papers)
4. Extract findings (output: per-paper extraction sheet)
5. Synthesize (output: 1-page synthesis)
Checkpoints:
- After 1: question is single-sentence and falsifiable
- After 2: candidate count >= 20
- After 3: inclusion criteria documented
- After 4: every included paper has a filled extraction sheet
- After 5: synthesis cites every included paper
Loop-backs:
- After 3: if <8 included, return to stage 2 (broaden search)
- After 5: if synthesis exposes a contradiction, return to stage 4 (re-extract)
```
chaining-spells
Use when composing multiple spells into a chain that runs end-to-end (e.g. brainstorm -> plan -> execute -> verify).
# Chaining Spells
<!-- token-budget-exception: chained workflow with stages, gates, and loop-back conditions requires extra words -->
## What this does
Builds and runs a chain — a workflow spell whose stages each invoke another existing spell. The chain handles handoffs, loop-backs, and stop-on-failure.
## When to use
- User says "do X then Y then Z" where each is a known spell
- User wants to formalize a multi-spell workflow they keep doing manually
- Loading a default chain (e.g. `general-research-loop`)
## What you bring (Inputs)
- The names of the spells to chain (in order)
- The handoff artifact between each pair
## What you get (Output)
A new chain spell (kind: workflow, complexity: chained) saved to `$WIZARD_HOME`, plus a runnable execution path.
## How it works (Steps)
This is a workflow — and the resulting chain is also a workflow. Don't confuse the two.
## Stages
### Stage 1: Identify constituents
List the spells in the order they'll run. Verify each exists (in `$WIZARD_HOME` or bundled). Reject the chain if any are missing — offer to build them first.
### Stage 2: Define handoffs
For each adjacent pair (A → B):
- What does A produce?
- What does B require as input?
- Do they match, or does a translation step exist?
If they don't match, the chain isn't yet runnable. Either:
- Edit B's `What you bring (Inputs)` to accept A's output, OR
- Insert a translator spell between them, OR
- Reject the chain and tell the user
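The pairwise check can be sketched as follows, assuming each spell's inputs and outputs have been reduced to sets of artifact labels. That reduction is an assumption, since the source describes artifacts in prose. `Spell` and `broken_handoffs` are hypothetical names.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Spell(NamedTuple):
    name: str
    requires: FrozenSet[str]  # artifact labels the spell needs as input
    produces: FrozenSet[str]  # artifact labels the spell hands on


def broken_handoffs(chain: List[Spell]) -> List[Tuple[str, str]]:
    """Return (A, B) pairs where B requires something A does not produce."""
    broken = []
    for a, b in zip(chain, chain[1:]):
        if not b.requires <= a.produces:
            broken.append((a.name, b.name))
    return broken
```

Each pair returned needs one of the three resolutions above before the chain is runnable.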
### Stage 3: Define stage gates
For each transition, name what must be true to pass to the next stage. Default: "the constituent spell's Quality bar was met."
### Stage 4: Define loop-back conditions
When does the chain return to an earlier stage?
Default: "if any stage's Quality bar is not met after the constituent finishes, return to that stage and re-run."
Document any custom loop-backs.
### Stage 5: Build the chain SKILL.md
Use `templates/spell-template-chained.md`. Set `composes:` to the list of constituent names.
### Stage 6: Test the chain end-to-end
Standard tester runs 1-2 scenarios that exercise the full chain. Each constituent must run; each handoff must succeed.
If any constituent fails, the chain test fails. Refine that constituent first; then re-run the chain test.
### Stage 7: Save
Save to `$WIZARD_HOME/chains/<chain-name>/SKILL.md`.
## Checkpoints
- **After Stage 1**: every constituent name resolves
- **After Stage 2**: every handoff is matched (or has a translator)
- **After Stage 3**: every transition has a gate
- **After Stage 4**: loop-backs documented (or empty)
- **After Stage 5**: SKILL.md drafted
- **After Stage 6**: end-to-end test PASSed
- **After Stage 7**: file saved
## Loop-back conditions
Return to Stage 1 when:
- A constituent breaks during testing — fix that constituent in its own refinement, then re-run the chain
- A handoff exposes that the constituents are in the wrong order — re-sequence
## Quality bar
A chain is good enough when:
- Every constituent runs in its declared order
- Every handoff produces the artifact the next stage requires
- End-to-end test passes for at least 1 realistic scenario
## Variations
- **Linear chain**: no loop-backs (most chains)
- **Conditional chain**: a stage may skip based on a check (document the check)
- **Looping chain**: returns to an earlier stage on a defined trigger (document the trigger)
## Example
```
Chain: general-research-loop
Constituents:
1. breaking-down-a-problem (output: 1-page problem statement)
2. literature-scan (output: list of 5-10 sources)
3. summarizing-a-document (run per source; output: per-source notes)
4. interview-synthesis (output: combined synthesis)
5. verifying-before-citing (discipline; runs throughout)
Handoffs:
- 1 -> 2: problem statement seeds the search query
- 2 -> 3: each source goes to a separate summarize run
- 3 -> 4: combined notes feed synthesis
- 5 is a discipline that doesn't transition; it's loaded throughout
Loop-backs:
- After 4: if synthesis exposes a question the lit-scan missed,
return to stage 2 with the new query
```
discovering-spells
Use when browsing what spells are available - bundled seeds, your personal library, or both. Filter by kind, audience, or update status.
# Discovering Spells
<!-- token-budget-exception: discovery + filtering + update detection requires extra examples -->
## What this does
Lists spells available in the current session: bundled seeds, the user's personal library, and (optionally) updates available for overrides.
Output is grouped, scannable, and ranked by relevance to the user's recent context.
## When to use
- User invoked `/list-spells`
- User asks "what skills do I have?" or "what spells exist?"
- User wants to see updates to their overrides
## What you bring (Inputs)
- (Optional) Filter flags: `--kind`, `--audience`, `--mine`, `--bundled`, `--updates`
- (Optional) Recent context (used for ranking; auto-extracted)
## What you get (Output)
A grouped list of matching spells, each shown as: `name — description (kind / audience / version)`.
## How it works (Steps)
This is a workflow.
## Stages
### Stage 1: Resolve sources
Default sources: bundled `spells/` + `$WIZARD_HOME`. If `--mine`, only personal. If `--bundled`, only bundled.
### Stage 2: Apply filters
Apply each flag in turn. Filter by `kind`, `audience`, etc. as the user specified.
### Stage 3: Detect overrides and updates
For each personal spell, check if a bundled spell with the same name exists.
If yes and the bundled version is newer than the personal version, mark the entry with `(update available: bundled vX.Y.Z)`.
If `--updates` flag is set, only show entries where an update is available.
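The "newer than" comparison in this stage should be numeric, not string-based, so that `1.10.0` counts as newer than `1.9.0`. A minimal sketch, assuming plain `X.Y.Z` versions and two hypothetical `{name: version}` dictionaries:

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def mark_updates(personal, bundled):
    """Map each personal spell name to the newer bundled version, if one exists.

    Both arguments are hypothetical {name: version} dicts.
    """
    updates = {}
    for name, mine in personal.items():
        theirs = bundled.get(name)
        if theirs is not None and parse_version(theirs) > parse_version(mine):
            updates[name] = theirs
    return updates
```

With `--updates` set, only the names present in the returned dictionary would be shown.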
### Stage 4: Rank by relevance
Rank by:
1. Update available (top, if `--updates`)
2. Recently invoked
3. Trigger keyword overlap with recent user messages
4. Alphabetical within group
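The ranking order above maps naturally onto a tuple sort key, where earlier elements dominate later ones. This is a sketch only: the entry fields (`triggers`, `last_invoked`, `update_available`) are hypothetical, and `last_invoked` is assumed to be a timestamp where larger means more recent and 0 means never.

```python
def rank(entries, recent_words, updates_flag=False):
    """Sort entries by the four ranking criteria, in priority order."""
    recent = set(recent_words)

    def key(entry):
        overlap = len(set(entry["triggers"]) & recent)
        return (
            not (updates_flag and entry["update_available"]),  # updates first
            -entry["last_invoked"],  # then most recently invoked
            -overlap,                # then most keyword overlap
            entry["name"],           # then alphabetical
        )

    return sorted(entries, key=key)
```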
### Stage 5: Group and print
Group by kind, then by audience. Print:
```
Discipline (3)
- verifying-before-citing — Use when about to state a factual claim... (researcher / 1.0.0)
- decisions-need-an-alternative — Use when making a choice... (anyone / 1.0.0) [update available: 1.1.0]
...
Workflow (5)
...
```
Show counts in headers. If a group is empty, omit it.
### Stage 6: Print actions
End with one-liner suggestions:
- "Run a spell: `/cast-spell <name>`"
- "Build a new one: `/build-spell`"
- "Update one: `/refine-spell <name>` (or take bundled: `/refine-spell <name> --take-bundled`)"
## Checkpoints
- **After Stage 1**: source list resolved
- **After Stage 2**: filtered set generated
- **After Stage 3**: overrides + updates marked
- **After Stage 4**: ranked
- **After Stage 5**: printed grouped
- **After Stage 6**: actions printed
## Loop-back conditions
Return to Stage 2 when the user adds or changes a filter mid-list.
## Quality bar
The list is good enough when:
- Every spell in the source set that matches filters is shown
- Overrides and updates are correctly marked
- Output fits on one screen for typical filter results (under 30 entries)
## Variations
- **Search mode**: `--query "<term>"` keeps only spells whose description contains the term
- **JSON mode**: `--json` for tooling integration
- **Single-line mode**: `--brief` shows just `name — description`
## Example
```
/list-spells --kind discipline --audience anyone
Discipline / anyone (4):
- verifying-before-citing — Use when about to state a factual claim... (1.0.0)
- decisions-need-an-alternative — Use when making a decision... (1.0.0)
- verifying-before-shipping — Use when about to declare done... (1.0.0)
- using-wizard — boot skill (1.0.0)
Try: /cast-spell verifying-before-citing
Build a new one: /build-spell
```
inferring-a-spell-from-examples
Use when the user invokes /capture-this-chat or /build-spell --from-transcript. Reads a transcript and produces a context dictionary plus a draft SKILL.md, which hands off to building-a-spell at Stage 2.
# Inferring a Spell from Examples
## What this does
Reads a transcript (a saved chat session, a meeting notes file, or the current chat) and infers a skill from it. Produces a draft `SKILL.md` and hands it off to the regular meta-builder at Stage 2.
This skill is invoked by `/capture-this-chat` and by `/build-spell --from-transcript <path>`. It never runs alone — it always hands off to `building-a-spell`.
## When to use
- The user runs `/capture-this-chat` or `/build-spell --from-transcript`
- A transcript or chat-context blob has been provided as input
- The user wants to convert demonstrated behavior into a reusable skill
## What you bring (Inputs)
- A transcript blob (markdown text in the normalized format, see `docs/capturing-chats.md`)
- The provenance of the transcript (a file path, a chat session ID, or both)
## What you get (Output)
A context dictionary plus a draft `SKILL.md`, handed back to `building-a-spell` Stage 2 (kind-route → specialist → try-it → save). The user sees a strawman with `Approve · Refine · Re-do as interview` choices.
## How it works (Steps)
This skill runs three sequential passes. Each pass is a hard gate.
## Stages
### Stage 1: Suitability check (Pass 1)
Read the transcript. Decide one of three outcomes:
| Outcome | Triggers | Action |
|---|---|---|
| `BUILDABLE` | ≥ 5 substantive turns AND a discernible repeating pattern (steps, rules, output shape) | Continue to Stage 2 |
| `TOO-THIN` | < 5 substantive turns, OR no repeated structure (one-shot Q&A) | Stop. Tell user: *"Not enough structure to infer a skill. Want to try the regular interview? Run `/build-spell`."* |
| `TOO-BROAD` | ≥ 2 distinct tasks tangled (e.g. research + email drafting in one transcript) | Stop. Tell user: *"I see two tasks here: A and B. Pick one to capture, or run them through the regular interview separately."* |
A "substantive turn" excludes: greetings, acknowledgments, single-word confirmations, error messages.
Output of Stage 1: `{ outcome: "BUILDABLE" | "TOO-THIN" | "TOO-BROAD", reason: string }`. If outcome is not `BUILDABLE`, halt.
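The outcome table can be sketched as a small classifier. It assumes the substantive-turn count, the pattern check, and the task count have already been derived from the transcript. The table does not say which outcome wins when a transcript is both thin and broad; checking `TOO-BROAD` first is an assumption.

```python
def classify_transcript(substantive_turns, has_repeating_pattern, distinct_tasks):
    """Return the Stage 1 outcome for pre-computed transcript features."""
    # Assumption: TOO-BROAD is checked first when both conditions apply.
    if distinct_tasks >= 2:
        return {
            "outcome": "TOO-BROAD",
            "reason": f"{distinct_tasks} distinct tasks are tangled",
        }
    if substantive_turns < 5 or not has_repeating_pattern:
        return {
            "outcome": "TOO-THIN",
            "reason": "fewer than 5 substantive turns or no repeated structure",
        }
    return {
        "outcome": "BUILDABLE",
        "reason": "enough turns and a discernible repeating pattern",
    }
```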
### Stage 2: Structure extraction (Pass 2)
Read the transcript and the signal reference at `skills/inferring-a-spell-from-examples/signals.md`. Derive the **context dictionary**.
**Step 2a: Kind detection.** Match the transcript against each kind's signals in `signals.md`. Pick the kind with the strongest match. Use the tie-breaking rules from `signals.md`.
Always confirm with the user via a one-tap prompt that includes the runner-up:
> *"This looks like a `discipline` skill (vs `workflow`). Confirm? \[D/w]"*
A single-letter response (`d` / `w` / `c` / `s`) overrides the suggested kind; anything else is treated as confirmation of the suggested kind.
**Step 2b: Other fields.** Extract `name`, `audience`, `complexity`, `inputs`, `outputs`, `steps`, `quality_bar` per the rules below.
**Step 2c: Field extraction rules.**
| Field | Rule |
|---|---|
| `name` | Verb-led, lowercase-hyphenated, ≤ 6 words. Verb from most-frequent action; object from most-frequent noun phrase. Example: a transcript about checking citations → `verifying-citations-before-shipping`. |
| `audience` | Match vocabulary: "grants/papers/citations" → `researcher`; "PRs/tests/deploys" → `dev`; "OKRs/status/stakeholders" → `knowledge-worker`. No domain markers → `anyone` (default). |
| `complexity` | 1–2 steps → `simple`; 3–6 steps with sequence → `guided`; ≥ 7 steps OR multi-tool OR multiple loop-backs → `chained`. (Map to schema enum: simple/guided/chained/long-running.) |
| `inputs` | Things the user provided to the agent (artifacts, parameters, constraints). |
| `outputs` | Artifacts the agent produced. |
| `steps` | Ordered actions in the conversation, deduplicated. Use direct quotes where possible. |
| `quality_bar` | Any "this counts as done when…" statement; otherwise infer from the user's rejection turns ("no, that's wrong because …"). |
**Rule of last resort:** if a field has no clear signal, record `null` in the dictionary AND set the corresponding draft body line to `# TODO: confirm`. **Never fabricate.**
Output of Stage 2: a fully-typed context dictionary (matching the spec §3 hand-off contract).
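The `audience` and `complexity` rows of the extraction table can be sketched as two small functions. The marker sets come straight from the table; the tie-breaking between audiences (first match in dictionary order) and returning `None` when no steps were found are assumptions, the latter following the rule of last resort.

```python
import re

AUDIENCE_MARKERS = {
    "researcher": {"grants", "papers", "citations"},
    "dev": {"prs", "tests", "deploys"},
    "knowledge-worker": {"okrs", "status", "stakeholders"},
}


def infer_audience(transcript):
    """Pick the audience whose vocabulary appears most; default to 'anyone'."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    hits = {aud: len(words & markers) for aud, markers in AUDIENCE_MARKERS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "anyone"


def infer_complexity(step_count, multi_tool=False, loop_backs=0):
    """Map step count and shape onto simple / guided / chained."""
    if step_count < 1:
        return None  # no clear signal: record null, never fabricate
    if step_count >= 7 or multi_tool or loop_backs >= 2:
        return "chained"
    if step_count >= 3:
        return "guided"
    return "simple"
```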
### Stage 3: Draft generation (Pass 3)
Take the context dictionary from Stage 2. Load the kind-specific template from `templates/spell-template-<kind>.md` (these templates already exist; `content` falls back to `templates/spell-template.md`). Fill it in.
**Three hard rules:**
1. **Direct quotes preferred over paraphrase.** Steps and quality-bar copy should use the user's own words from the transcript when available. Paraphrase only when the original is awkward or contains unrelated content.
2. **No fabricated discipline rules.** A "must" / "never" / hard-gate clause may appear in the draft ONLY IF a near-verbatim version appears in the source transcript. This is the single most important rule — fabricating creates skills that pass try-it once and silently misfire later. The runner in Task 1.7 enforces this with zero tolerance.
3. **All required frontmatter present.** Even if a field had to be guessed, it's present and either has a value or is marked `# TODO: confirm` in the body. The validator must pass before the strawman is shown.
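Hard rule 2 can be approximated mechanically by checking that every draft line containing "must" or "never" has a close match in the transcript. This is a sketch, not the runner's actual check: the 0.8 similarity threshold and the use of `difflib` as the "near-verbatim" measure are assumptions.

```python
import re
from difflib import SequenceMatcher

HARD_WORDS = re.compile(r"\b(must|never)\b", re.IGNORECASE)


def ungrounded_rules(draft_lines, transcript_lines, threshold=0.8):
    """Return draft lines with must/never language lacking a near-verbatim source."""
    flagged = []
    for line in draft_lines:
        if not HARD_WORDS.search(line):
            continue
        # Best similarity between this draft line and any transcript line.
        best = max(
            (
                SequenceMatcher(None, line.lower(), src.lower()).ratio()
                for src in transcript_lines
            ),
            default=0.0,
        )
        if best < threshold:
            flagged.append(line)
    return flagged
```

Any flagged line would need to be removed or traced back to a quote before the strawman is shown.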
**Provenance footnote.** Append to the bottom of the body:
```html
<!-- inferred-from: <transcript-source> -->
<!-- inferred-at: <ISO-8601 timestamp> -->
> _Inferred from `<transcript-source>` on <date>. Quoted segments preserved verbatim._
```
The HTML comments are machine-readable; the blockquote is shown to the reader.
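Generating the footnote could look like the sketch below. The function name is hypothetical, and rendering `<date>` as an ISO calendar date is an assumption, since the template does not fix a date format.

```python
from datetime import datetime, timezone


def provenance_footnote(source, now=None):
    """Build the two machine-readable comments plus the reader-facing blockquote."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    day = now.date().isoformat()  # assumption: <date> is an ISO calendar date
    return "\n".join(
        [
            f"<!-- inferred-from: {source} -->",
            f"<!-- inferred-at: {stamp} -->",
            f"> _Inferred from `{source}` on {day}. Quoted segments preserved verbatim._",
        ]
    )
```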
**Validator gate.** After draft generation, run `validate-spell.js` on the draft text in memory. If it fails, fix the draft (do not show the strawman). If you can't fix it after one pass, halt with a clear error to the user — never show a draft that wouldn't pass the validator.
Output of Stage 3: a complete draft `SKILL.md` (in memory) that passes the validator, plus the provenance footnote.
### Stage 4: Strawman UX
Show the user a **header summary** with the four most important fields, plus the **full draft** below.
```
┌─────────────────────────────────────────────────────────┐
│ Inferred from: <source> (<N> turns) │
│ │
│ Suggested name: <name> │
│ Kind: <kind> (vs <runner-up>) [K/r] │
│ Audience: <audience> │
│ Complexity: <complexity> │
│ │
│ Steps detected: │
│ <numbered list> │
│ │
│ ───── Full draft below ───── │
│ <the draft SKILL.md> │
│ │
│ Approve · Refine · Re-do as interview? │
└─────────────────────────────────────────────────────────┘
```
**User responses:**
| User says | Skill does |
|---|---|
| `approve` / `a` / `yes` | Hand off to `building-a-spell` Stage 2 (kind-route → specialist → try-it). |
| `refine: <freeform>` (e.g. *"drop step 4, add a tone constraint"*) | Apply the edits. Re-render the strawman. Re-prompt. |
| `interview` / `i` / `bail` | Hand off to `building-a-spell` Stage 1 with inferred fields as **pre-filled defaults** (user can accept-with-Enter). |
| Single-letter kind override (`d`, `c`, `s`, `w`) | Re-run Pass 3 with overridden kind. Re-render strawman. |
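The response table can be sketched as a small parser that returns an action and an optional payload. The letter-to-kind mapping is inferred from the four kind names, and the `unknown` fallback is an assumption; the table does not say what happens to unrecognized input.

```python
# Assumed mapping from override letters to kind names.
KIND_LETTERS = {"d": "discipline", "c": "content", "s": "subagent", "w": "workflow"}


def parse_response(text):
    """Map a strawman reply onto (action, payload)."""
    reply = text.strip()
    lowered = reply.lower()
    if lowered in {"approve", "a", "yes"}:
        return ("approve", None)
    if lowered in {"interview", "i", "bail"}:
        return ("interview", None)
    if lowered.startswith("refine:"):
        return ("refine", reply[len("refine:"):].strip())
    if lowered in KIND_LETTERS:
        return ("override-kind", KIND_LETTERS[lowered])
    return ("unknown", None)  # assumption: caller re-prompts
```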
**Safety net:** if the user `refine`s 4 or more times without converging, proactively suggest bailing to interview:
> *"Inference may be off — want to switch to the regular interview? You won't lose the work so far; the inferred fields will pre-fill the questions."*
**Conflict detection:** if a refine edit adds discipline language ("must", "never", "hard gate") to a non-discipline draft, prompt:
> *"Adding hard gates suggests this is now a discipline skill — re-classify? \[Y/n]"*
If `Y`, re-route to Pass 3 with `kind: discipline`.
**Hand-off contract.** When the user approves, return to the caller with:
```yaml
context: { ...the dictionary... }
draft_skill_md: |
---
...
---
...
provenance:
type: transcript # or: chat-capture
path: <path>
session_id: <id> # optional
```
The caller (`building-a-spell` Stage -1) resumes at Stage 2 with this payload.
## Checkpoints
- After Stage 1: outcome is `BUILDABLE`, `TOO-THIN`, or `TOO-BROAD`. Only `BUILDABLE` proceeds.
- After Stage 2: every required frontmatter field has a value (use `null` + `# TODO: confirm` for ambiguous, never fabricate).
- After Stage 3: drafted `SKILL.md` passes `validate-spell.js` before being shown to the user.
- After Stage 4: user has chosen Approve, Refine, or Bail.
## Loop-back conditions
- User chooses Refine: re-render strawman with edits applied; loop until Approve or Bail.
- User overrides kind via single-letter tap: re-run Pass 3 with overridden kind.
- 4+ refines without converging: proactively suggest bail to interview.
## Quality bar
A successful inference produces a draft that:
- Passes `validate-spell.js` with zero errors
- Contains no fabricated discipline rules (rules must be grounded in the transcript)
- Includes a provenance footnote citing the source transcript
- Is accepted by Stage 3 try-it ≥ 70% of the time on first attempt
## Variations
- **`--from-transcript <path>`**: read a file from disk, hand to Pass 1
- **`/capture-this-chat`**: a chat-context wrapper has already produced the transcript blob; skip the file read
## Example
See worked examples for each kind:
- [`examples/example-content.md`](examples/example-content.md) — email-drafting transcript → content skill
- [`examples/example-workflow.md`](examples/example-workflow.md) — research-loop transcript → workflow skill (with one refine)
- [`examples/example-discipline.md`](examples/example-discipline.md) — no-untested-claims transcript → discipline skill (verbatim rule extraction)
intuitive-interviewing
Use when conducting any interview-style flow with a user (building a spell, refining one, gathering requirements). Picks depth, filters questions by relevance, matches against known shapes, and detects mid-stream pivots.
# Intuitive Interviewing
<!-- token-budget-exception: 7 capabilities (context extraction, depth selection, shape match, relevance filter, strawman, pivots, examples) require detailed coverage -->
## What this does
Turns a 12-phase interview into a context-aware conversation. Detects what the user already said, picks the right depth, matches against known workflow shapes, drafts a strawman early, and pivots when scope shifts.
## When to use
- Inside `building-a-spell` (Stage 1)
- Inside `refining-a-spell` (when capturing what changed)
- Anywhere a user-facing requirements interview happens
## What you bring (Inputs)
A context dictionary from the calling skill, containing whatever was already extracted (from the user's trigger phrase + any prior turns).
## What you get (Output)
A filled interview: every required frontmatter field has a value, every required body section has content, plus any kind-specific extras.
## How it works (Steps)
This is a workflow with explicit stages and gates.
## Stages
### Stage A: Pick depth
Inspect the context dictionary. Pick depth as follows:
| If... | Use depth |
|---|---|
| User explicitly said "quick", "simple", "just" | Express |
| Context dictionary has 4+ extracted fields already | Express |
| Kind is `discipline` | Deep (always) |
| User is first-time (no `$WIZARD_HOME` content) | Standard |
| Otherwise | Standard |
Allow user override: "Tell me which depth you'd like (express / standard / deep)" — but only if depth was ambiguous.
**Output handed to next stage**: chosen depth.
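The depth table can be sketched as a single function. The table does not state row precedence; this sketch reads "Deep (always)" as overriding the Express rows, which is an assumption. The first-time-user row is not checked separately because it resolves to the same value as the default.

```python
import re


def pick_depth(user_text, context):
    """Choose express / standard / deep from the user's words and the context dictionary."""
    # Assumption: "always" means discipline wins over the Express signals.
    if context.get("kind") == "discipline":
        return "deep"
    words = set(re.findall(r"[a-z]+", user_text.lower()))
    if words & {"quick", "simple", "just"}:
        return "express"
    filled = sum(1 for value in context.values() if value is not None)
    if filled >= 4:
        return "express"
    return "standard"  # covers both first-time users and the default row
```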
### Stage B: Match against workflow shapes
Read all shapes under `skills/building-a-spell/workflow-shapes/`. Score each by overlap of:
- Trigger keywords vs context dictionary's `name` and `description`
- Output type vs context dictionary's `output`
- Step count and shape
If top score >= 0.7, propose it as a strawman:
> "This looks like a `<shape-name>`. Here's a draft. What would you change?"
If top score < 0.7, skip to Stage C.
**Output handed to next stage**: strawman draft (or none).
### Stage C: Relevance-filtered question pass
Use the question pool at `skills/building-a-spell/interview-questions.md`. For each question in the depth's "Always ask" list:
- Is it already answered by context or strawman? **Skip.**
- Is it answered by inference from a related answer? **Confirm in passing, don't ask separately.**
- Otherwise, ask it.
For "Conditionally ask" questions, ask only when the trigger condition fires.
For "Skip" questions, never ask in this depth.
**Hard rule**: Never ask the user a question whose answer is already in the conversation. This is the #1 reason interviews feel rote.
**Output handed to next stage**: a complete interview record.
### Stage D: Detect mid-stream pivot
After every user response, check:
| Pivot signal | Action |
|---|---|
| User says "actually let's also do X" | Likely two spells — split now |
| User answers in a way that contradicts an earlier extraction | Pause, confirm, update context |
| User shifts audience ("oh, this would also be for my whole team") | Re-route via Stage A with new audience |
| User shifts kind (mentions a rule when describing content) | Confirm: "This sounds like a discipline kind. Want to switch?" |
**Output handed to next stage**: optionally re-routed flow.
### Stage E: Confirm and hand back
Show the user the complete interview record, ask "anything to change?", then hand back to the caller.
## Checkpoints
Pass each stage gate only when:
- **After A**: depth is set
- **After B**: strawman either accepted-with-deltas or explicitly skipped
- **After C**: every required field has a value
- **After D**: no unresolved pivot signals
- **After E**: user confirmed (or explicitly declined to confirm)
## Loop-back conditions
Return to Stage A when:
- A pivot in Stage D changes the kind or depth significantly
- The strawman in Stage B was the wrong shape — re-run Stage B with the corrected output type
## The Batch Iron Law (Second Iron Law)
When the calling skill is building multiple spells in one session, complete one spell end-to-end (interview → strawman → try-it → save) before starting the next interview. The interview for spell N+1 cannot start until spell N has reached "saved" state.
### Forbidden excuses
| If you think... | Reality |
|---|---|
| "I'll batch the interviews and test them all at the end" | Interviews need test feedback to refine. Sequence them. |
| "They share context so I can save tokens" | Token savings are illusory; quality cost is real. |
| "Let me draft all three first, then iterate" | Drafting in batch encourages copy-paste defects. |
All forbidden excuses get the same response: **Finish this one through save. Then start the next.**
## Quality bar
The interview is good enough when:
- Every required frontmatter field is filled
- Every required body section has content
- The user has not been asked a question whose answer they already gave
- Strawman was offered if any shape matched at 0.7+
- Mid-stream pivots, if any, were named explicitly and the user agreed to the new direction
## Variations
- **Re-interview after refine**: skip Stage B (no strawman); start at Stage C with the existing skill as base context.
- **Headless mode**: when called by another skill (not user-facing), all stages run silently from a structured input.
## Example
Context dictionary on entry:
```
{name: "preparing-for-a-meeting", kind: "workflow",
audience: "knowledge-worker", complexity: "guided"}
```
- **Stage A**: Picks Express. Four fields are already extracted, which meets the 4+ threshold.
- **Stage B**: Workflow-shape match: `pre-meeting-prep` at 0.85. Strawman shown.
- **Stage C**: Asks 3 questions (the strawman covered the rest).
- **Stage D**: User adds "and also for 1:1s" mid-flow. Confirmed: still one spell; widen scope of `When to use`.
- **Stage E**: Confirmed. Handed back to caller.
Total interview: 4 messages. Felt like a chat, not a form.
pressure-testing-a-spell
Use during try-it for kind=discipline spells. Orchestrates baseline + with-skill comparison via the discipline-mode tester to prove the rule actually changes behavior under rationalization pressure.
# Pressure Testing a Spell
## What this does
Tests a discipline-kind spell against rationalization. The test runs each scenario twice: once without the skill loaded (baseline) and once with it loaded (with-skill). The skill is only PASS if the with-skill run holds the rule that the baseline run violated.
This is what makes a discipline skill different from a description: proof it changes behavior.
## When to use
- Inside `building-a-discipline-spell` Stage 5
- Inside `refining-a-spell` when a discipline skill changed
- Anytime a user asks "does this rule actually work?"
## What you bring (Inputs)
- The draft discipline SKILL.md
- The user's domain
- Optionally: the user's known-bad past examples (where the rule was violated)
## What you get (Output)
A PASS / NEEDS-REFINEMENT verdict, plus the specific rationalizations that slipped through if any.
## How it works (Steps)
This is a workflow.
## Stages
### Stage 1: Construct pressure scenarios
Read the draft's `Excuses and counters` table. Each row implies a pressure scenario:
> If the excuse is "It's a small fact, I don't need to verify"...
> ...then the pressure scenario is a small-fact request.
Use `building-a-discipline-spell/pressure-scenarios-template.md` for the scenario format. Generate 3 scenarios spanning at least 3 of the 5 pressure dimensions (speed, triviality, authority, confidence, familiarity).
### Stage 2: Dispatch tester in DISCIPLINE MODE
Dispatch `agents/spell-tester.md` with:
- The full draft text
- The 3 pressure scenarios
- The user's domain
- Mode: discipline (the tester reads `kind: discipline` and routes itself)
### Stage 3: Read the comparison results
The tester returns:
- BASELINE pass results (without skill): which scenarios violated the rule, which rationalization was used
- WITH-SKILL pass results: which rationalizations the skill caught, which slipped through
### Stage 4: Decide verdict
| Baseline result | With-skill result | Verdict |
|---|---|---|
| All 3 violated | All 3 held | PASS |
| All 3 violated | 1-2 held | NEEDS-REFINEMENT (skill is partial) |
| All 3 violated | 0 held | NEEDS-REFINEMENT (skill not changing behavior) |
| 0-2 violated baseline | (any) | INVALID — scenarios weren't pressure-y enough; return to Stage 1 |
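The verdict table reduces to two counts. A minimal sketch, assuming the tester's results have already been tallied into the number of baseline violations and the number of with-skill scenarios where the rule held:

```python
def decide_verdict(baseline_violations, with_skill_held, total=3):
    """Apply the verdict table to tallied scenario results."""
    if baseline_violations < total:
        # Scenarios did not all provoke a violation, so the comparison proves nothing.
        return "INVALID"
    if with_skill_held == total:
        return "PASS"
    return "NEEDS-REFINEMENT"
```

Checking the baseline first matters: a with-skill run that holds every rule is only evidence if the baseline violated it.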
### Stage 5: Refine or hand back
- **PASS**: hand back to the calling builder
- **NEEDS-REFINEMENT**: name the slipped-through rationalization. Calling builder adds it to the Excuses and Counters table, then re-runs Stage 1.
## Checkpoints
- **After Stage 1**: 3 scenarios, spanning 3+ pressure dimensions
- **After Stage 2**: tester returned a structured verdict
- **After Stage 3**: baseline AND with-skill results both captured
- **After Stage 4**: verdict matches the table above
- **After Stage 5**: PASS or refinement loop initiated
## Loop-back conditions
Return to Stage 1 when:
- Verdict is INVALID (scenarios weren't pressure-y enough)
- Refinement happened — re-run with updated draft
## Quality bar
The pressure test is valid when:
- Baseline run actually showed the violation (otherwise we have no evidence the skill is needed)
- With-skill run was tested with the same scenarios (not different ones)
- Verdict is one of PASS / NEEDS-REFINEMENT / INVALID — no fudging
## Variations
- **Skill change pressure-test** (refining mode): use the user's reported violation as scenario 1
- **Adversarial pressure-test**: actively try to construct novel rationalizations not yet in the table
## Example
Draft skill: `verifying-before-citing`. Excuses table has 4 rows.
Stage 1: 3 scenarios constructed, dimensions: triviality, speed, confidence.
Stage 2: tester dispatched, returns full BASELINE/WITH-SKILL comparison.
Stage 3: baseline violated all 3; with-skill held 2, slipped on the speed scenario.
Stage 4: NEEDS-REFINEMENT.
Stage 5: rationalization that slipped: "User said 'quick', so just answer". Add to table with counter: "'Quick' is the user's preference; accuracy is their need. Verify."
Re-run from Stage 1 with updated draft. Tester returns PASS. Hand back.
refining-a-spell
Use when an existing spell needs improvement based on usage. Captures what changed, re-runs the tester (mandatory), bumps the version, and saves.
# Refining a Spell
<!-- token-budget-exception: refinement requires re-test (with forbidden excuses), semver, conflict-handling, and discipline-mode notes -->
## What this does
Updates an existing spell in the user's library. Walks through the diff, re-runs the tester (this is non-skippable), bumps the version field, and saves.
## When to use
- User invoked `/refine-spell <name>`
- User says "this skill needs to also handle X"
- A spell's tester verdict changes (e.g., a new edge case found in production use)
## What you bring (Inputs)
- The name of the spell to refine
- What needs to change (and ideally why — what failed in real use)
## What you get (Output)
The updated SKILL.md, validated, re-tested, and saved with bumped version.
## How it works (Steps)
This is a workflow.
## Stages
### Stage 1: Load the current version
Find the spell in `$WIZARD_HOME` (preferred) or bundled `spells/`. If it's bundled and not yet overridden, copy it to `$WIZARD_HOME/personal/<name>/` first — never modify the bundled file.
### Stage 2: Capture the diff
Ask: "What needs to change?" and "What real situation showed you this was needed?"
The "what real situation" question is required. A refinement without a real-world trigger is speculative — push back: "Want to wait until you hit a real case?"
### Stage 3: Apply the change
Edit the SKILL.md. Common changes:
- New step → add to `How it works (Steps)` and to `Quality bar`
- New variation → add to `Variations`
- Better example → swap the Example block
- New rationalization (discipline) → add to `Excuses and counters`
- New input → update `What you bring (Inputs)` and add a step to capture it
### Stage 4: Re-run the tester (NON-SKIPPABLE)
Any change requires re-test. The kind of test depends on the kind:
| Kind | Test |
|---|---|
| `content`, `workflow`, `subagent` | Standard mode tester on 2-3 scenarios |
| `discipline` | `pressure-testing-a-spell` with at least one scenario that exercises the change |
### Forbidden excuses (re-test is non-skippable)
| If you think... | Reality |
|---|---|
| "It's a tiny wording change" | Tiny changes shift behavior. Re-test. |
| "I already manually verified" | Re-test. |
| "The previous version passed" | The previous version is no longer the current version. Re-test. |
| "I'll re-test next time" | Re-test now. |
All forbidden excuses get the same response: **Re-test. No exceptions.**
<MUST-STOP>
Before saving any refined spell, the tester must have returned PASS for the new version. If you are about to save without a fresh PASS, STOP. Re-test.
</MUST-STOP>
### Stage 5: Bump the version
Apply semver:
- Wording-only change → PATCH (1.0.0 → 1.0.1)
- New optional capability → MINOR (1.0.0 → 1.1.0)
- Removed step or changed required input → MAJOR (1.0.0 → 2.0.0)
If the current version is 0.x.y and the user has used this spell 3+ times successfully, prompt: "Bump to 1.0.0?"
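The bump rules can be sketched as a lookup from change type to semver level. The change-type labels below are hypothetical names for the three bullet categories above.

```python
# Hypothetical labels for the change types listed in this stage.
BUMP_FOR_CHANGE = {
    "wording": "patch",
    "new-optional-capability": "minor",
    "removed-step": "major",
    "changed-required-input": "major",
}


def bump_version(version, change):
    """Return the next 'X.Y.Z' version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    level = BUMP_FOR_CHANGE[change]
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```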
### Stage 6: Save and confirm
Write to `$WIZARD_HOME/<category>/<name>/SKILL.md`. Show the user the diff summary and the new version.
## Checkpoints
- **After Stage 1**: file loaded; if from bundled, copy to personal first
- **After Stage 2**: diff captured; real situation named
- **After Stage 3**: edits applied
- **After Stage 4**: tester returned PASS for the new version
- **After Stage 5**: version bumped per semver rules
- **After Stage 6**: file saved; diff summary shown
## Loop-back conditions
Return to Stage 3 when:
- Tester returns NEEDS-REFINEMENT — apply the suggested fix
- Tester returns SCOPE-CHANGED — pivot via `intuitive-interviewing`
## Quality bar
The refinement is good enough when:
- The change is documented in the user's words (so future-them remembers why)
- A fresh PASS verdict exists for the new version (not assumed from old version)
- Version bump matches the change type (no MAJOR for typo fixes; no PATCH for new steps)
## Variations
- **Bug-fix refinement** (something was wrong): name the bug in a comment in the SKILL.md
- **Generalization refinement** (works for more cases now): add the new variations explicitly
- **Sharpening refinement** (was too vague): tighten Quality bar criteria
## Example
User: "My `writing-a-status-update` skill keeps including too much detail when I'm in a hurry. Make it shorter."
- Stage 1: Load `~/.wizard/work/writing-a-status-update/SKILL.md`.
- Stage 2: Real situation: "Today's update was 7 bullets when manager wanted 3."
- Stage 3: Add "rush mode" variation: "Trim to 3 bullets max; only wins + asks."
- Stage 4: Tester runs 2 scenarios in rush mode. PASS.
- Stage 5: MINOR bump (new optional variation). 1.0.0 → 1.1.0.
- Stage 6: Saved. Showed diff summary.
sharing-a-spell
Use when exporting a spell for someone else to use. Produces a portable file in either Wizard format or strict (upstream-compatible) format.
# Sharing a Spell
<!-- token-budget-exception: three export modes (default/strict/bundle) each need explanation -->
## What this does
Exports one of your spells as a portable markdown file (or zip with examples) that others can drop into their own library.
Two frontmatter modes (`--bundle` can package either one as a zip):
- **default**: keeps full Wizard frontmatter
- **strict**: strips Wizard-specific fields for compatibility with the upstream Obra-style format
Nothing is sent over the network. Sharing means producing a file; you decide how to deliver it.
## When to use
- User invoked `/share-spell <name>`
- User wants to publish to the community repo
- User wants to send a colleague their workflow
## What you bring (Inputs)
- The name of the spell to share
- (Optional) Mode: default, `--strict`, or `--bundle`
## What you get (Output)
- **default mode**: a single `<name>.md` file with full frontmatter
- **strict mode**: a single `<name>.md` file with only `name` + `description` frontmatter
- **bundle mode**: a `<name>.zip` containing the SKILL.md and any `examples/` and `references/` folders
## How it works (Steps)
This is a workflow.
## Stages
### Stage 1: Locate the spell
Find it in `$WIZARD_HOME` or bundled `spells/`. Reject if not found; offer name suggestions.
### Stage 2: Choose mode
If the user didn't specify, ask:
- "Default" — for someone using Wizard
- "Strict" — for someone using the upstream Obra-style format (see CHANGELOG for the upstream repo URL)
- "Bundle" — include examples/ and references/ as a zip
### Stage 3: Generate the export
**Default mode**: copy the SKILL.md verbatim (frontmatter and body).
**Strict mode**: re-emit frontmatter with only `name` and `description`. Strip body sections that don't exist in the upstream format if any. Add a footer comment: `<!-- Originally exported from Wizard. -->`
**Bundle mode**: zip the SKILL.md plus any peer folders. Strict-mode flag still applies to the SKILL.md inside the zip.
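Strict-mode export can be sketched as a frontmatter filter. This sketch assumes the file starts with a `---` delimited frontmatter block and that `name` and `description` are single-line values; it does not attempt to strip body sections.

```python
FOOTER = "<!-- Originally exported from Wizard. -->"


def to_strict(skill_md):
    """Keep only name and description in the frontmatter and append the origin footer."""
    # Assumption: the text begins with '---\n', so the first split piece is empty.
    _, frontmatter, body = skill_md.split("---\n", 2)
    kept = [
        line
        for line in frontmatter.splitlines()
        if line.startswith(("name:", "description:"))
    ]
    return (
        "---\n"
        + "\n".join(kept)
        + "\n---\n"
        + body.rstrip("\n")
        + "\n\n"
        + FOOTER
        + "\n"
    )
```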
### Stage 4: Write the export
Default location: `~/Desktop/<name>.md` (or `.zip`). Allow user to override with `--out <path>`.
### Stage 5: Print install instructions
Print a one-liner the recipient can run:
```
mkdir -p "$WIZARD_HOME/personal/<name>" && mv <name>.md "$WIZARD_HOME/personal/<name>/SKILL.md"
```
For strict mode, the upstream Obra-style install path:
```
mkdir -p ~/.claude/skills/<name> && mv <name>.md ~/.claude/skills/<name>/SKILL.md
```
## Checkpoints
- **After Stage 1**: file located
- **After Stage 2**: mode selected
- **After Stage 3**: export generated; for strict mode, frontmatter has only `name` + `description`
- **After Stage 4**: file written
- **After Stage 5**: install instructions printed
## Loop-back conditions
Return to Stage 2 when the user changes mode mid-stream.
## Quality bar
The export is good enough when:
- File is at the expected output path
- Default mode preserves all frontmatter
- Strict mode has only `name` + `description` frontmatter
- Bundle mode includes examples/ and references/ if they exist
- Install instruction is correct for the mode
## Variations
- **Inline share** (paste into chat): print the file content instead of writing — useful for sending in Slack/email
- **Anonymized share**: strip mentions of the user's name, employer, etc., before export
## Example
```
/share-spell writing-a-status-update --strict --out ~/Desktop/status.md
```
Output:
- File `~/Desktop/status.md` with frontmatter `name: writing-a-status-update` and `description: Use when...` only
- Body kept intact
- Footer comment about origin
- Install one-liner printed
using-wizard
Use when starting any conversation - establishes how to find and use Wizard skills and spells, including the 1% rule
<SUBAGENT-STOP>
If you were dispatched as a subagent for a specific task, skip this skill and complete your assigned task. The parent already invoked discipline.
</SUBAGENT-STOP>
# Using Wizard
<!-- token-budget-exception: boot skill must include 1% rule, excuses table, warning signs, and discovery instructions in one file -->
## What this does
Boots you into the Wizard framework. Tells you what's loaded, where it lives, and the one rule that makes it work.
## When to use
- At the start of every session when this plugin is loaded.
## What you bring (Inputs)
Nothing. This skill loads itself.
## What you get (Output)
Awareness of two things: the library, and the 1% rule.
## How it works (Steps)
1. Two folders are available: `skills/` (framework) and `spells/` (bundled seed). The user's library is `$WIZARD_HOME` (default `~/.wizard/`). User versions override bundled.
2. To build a new spell, invoke `building-a-spell`.
3. To use one, load its SKILL.md from the appropriate folder.
4. To browse, invoke `discovering-spells`.
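The "user versions override bundled" rule in step 1 amounts to an ordered lookup. The sketch below is one way to express it, assuming each spell lives in a folder named after it (possibly under a category folder such as `work/`); `resolve_spell` is a hypothetical helper, not part of the framework.

```python
from pathlib import Path

def resolve_spell(name, user_root, bundled_root):
    """Return the SKILL.md path for `name`, preferring the user's library."""
    for root in (Path(user_root), Path(bundled_root)):
        if not root.is_dir():
            continue
        # "**" also matches zero directories, so <root>/<name>/SKILL.md is found too.
        matches = sorted(root.glob(f"**/{name}/SKILL.md"))
        if matches:
            return matches[0]  # first hit wins; the user root is checked first
    return None
```

Because the user root is searched first, a personal copy shadows the bundled spell of the same name without the bundled file being touched.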
## The non-negotiable rule
If you think there is even a 1% chance any spell applies to what the user is doing, invoke it.
## Excuses and counters
| If you think... | Reality |
|---|---|
| "This is a simple question" | Questions are tasks. Check spells. |
| "I need more context first" | Skill check comes BEFORE clarifying questions. |
| "The skill is overkill" | Simple things become complex. Use it. |
| "I remember this skill" | Skills evolve. Read the current version. |
| "I'll just do this one thing first" | Check BEFORE doing anything. |
## Warning signs
- About to answer without checking the library
- About to ask a clarifying question without checking the library first
- About to say "you don't need a skill for this"
## Hard gates
<MUST-STOP>
Before any response or action, ask: does any spell in `skills/` or `spells/` (or `$WIZARD_HOME`) match this trigger? If yes — even at 1% confidence — invoke it.
</MUST-STOP>
## Quality bar
The session is bootstrapped correctly if you can: name the personal library path, name the meta-builder skill, and recite the 1% rule.
## Variations
None. The boot skill is identical for every audience.
## Example
User: "Help me write a status update."
You: invoke `spells/work/writing-a-status-update/SKILL.md` (1% rule fires on "write" + "status update"). Then follow that spell's steps.
build-spell
Start building a new spell from a task you do repeatedly
# /build-spell
Invoke the `building-a-spell` skill to start an interview that captures a new spell.
The interview will:
1. Extract context from your trigger phrase (skip questions you already answered)
2. Pick the right depth (Express, Standard, Deep) for the spell's complexity
3. Match against known workflow shapes if applicable
4. Route to the correct kind-specific builder (content / workflow / discipline / subagent)
5. Run try-it (non-skippable — Iron Law)
6. Save to your `WIZARD_HOME`
If you ask for multiple spells in one go, the meta-builder will complete each one end-to-end before starting the next (Batch Iron Law).
## Building from an existing transcript
If you have a transcript of a session that worked (from any AI tool, a meeting, or a Loom transcript), skip the interview and infer the skill from it:
```
/build-spell --from-transcript ./session.md
```
The transcript is parsed by `inferring-a-spell-from-examples`, which produces a draft you can approve, refine in chat, or set aside in favor of the regular interview. Try-it (Iron Law) still runs.
For capturing the *current* chat session instead, use `/capture-this-chat`.
## Usage
```
/build-spell
/build-spell I want to capture how I write status updates
/build-spell a workflow for when I review a job candidate's portfolio
/build-spell --from-transcript ./yesterday-session.md
```
capture-this-chat
Turn the current chat session into a reusable spell
# /capture-this-chat
Captures the current chat session and routes it through `inferring-a-spell-from-examples`. The inference skill produces a draft SKILL.md you can approve, refine, or replace with the regular interview.
## How it works
1. **Detect host tool.** Run `scripts/chat-context/detect-tool.js` to identify which AI tool you're in (Claude Code, Cursor, OpenCode, Codex, Gemini, or Copilot CLI).
2. **Capture transcript.** Invoke the matching wrapper at `scripts/chat-context/<tool>.js` to get the last 50 turns (or `--last N` if specified) as a normalized markdown transcript.
3. **Show consent.** Display: `"Captured N turns from this session. View · Continue · Cancel."` Wait for user choice.
4. **Hand off to inference.** Pass the transcript blob to `inferring-a-spell-from-examples`. The skill produces a strawman.
5. **Resume the meta-builder.** When the user approves the strawman, hand off to `building-a-spell` Stage 2 (kind-route → try-it → save).
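Steps 2 and 3 can be illustrated with a small sketch of the turn-selection rule (last 50 by default, `--last N`, or `--last all`) and the consent message. This is not the code in `scripts/chat-context/`; `select_turns` and `consent_prompt` are hypothetical names, and turns are modeled as a plain list.

```python
DEFAULT_TURNS = 50  # default window stated in step 2

def select_turns(turns, last=None):
    """Pick the turns to capture: last 50 by default, last N, or all."""
    if last == "all":
        return list(turns)
    count = DEFAULT_TURNS if last is None else int(last)
    if count <= 0:
        raise ValueError("--last expects a positive number or 'all'")
    # Slicing with a negative start is safe even when fewer turns exist.
    return list(turns[-count:])

def consent_prompt(n):
    """Build the consent message shown before inference runs."""
    return f"Captured {n} turns from this session. View · Continue · Cancel."

session = [f"turn-{i}" for i in range(120)]
captured = select_turns(session, last="20")
print(consent_prompt(len(captured)))
```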
## Usage
```
/capture-this-chat
/capture-this-chat --last 20
/capture-this-chat --last all
/capture-this-chat --tool=cursor # override detection
```
## Fallback when capture fails
If the host tool can't be detected, or the wrapper fails:
```
Couldn't capture chat from <Tool> automatically.
Reason: <reason>
Two ways to continue:
1. Save your chat to a file and run:
/build-spell --from-transcript <path>
2. Run the regular interview:
/build-spell
See docs/capturing-chats.md for setup help.
```
## Privacy
- All capture is local. No network call.
- Tool calls in the transcript are summarized (not raw JSON) before inference reads them.
- The transcript is shown to you before inference runs (consent moment).
- Nothing is written to disk except the final saved skill at Stage 4.
cast-spell
Run a spell from your library
# /cast-spell
Load and execute a named spell. Looks in your `WIZARD_HOME` first, then in the bundled seed library.
## Usage
```
/cast-spell writing-an-email
/cast-spell preparing-for-a-meeting
/cast-spell personal/my-custom-workflow
```
If no name is given, the `discovering-spells` skill is invoked instead so you can browse.
If the name does not match exactly, the framework suggests close matches by description.
## What happens next
The named spell's SKILL.md is loaded. The agent then executes its `How it works (Steps)` section, prompting you for inputs as needed.
list-spells
Browse all available spells (yours plus bundled)
# /list-spells
Invoke the `discovering-spells` skill to browse what's available.
## Usage
```
/list-spells
/list-spells --kind discipline
/list-spells --audience researcher
/list-spells --updates
```
Flags:
- `--kind <K>` — filter by kind (`content`, `workflow`, `discipline`, `subagent`)
- `--audience <A>` — filter by audience (`anyone`, `researcher`, `dev`, etc.)
- `--mine` — only your personal library
- `--bundled` — only the bundled seed library
- `--updates` — show your overrides whose bundled version has a newer release (see `docs/versioning-and-updates.md`)
refine-spell
Improve an existing spell based on real usage
# /refine-spell
Invoke the `refining-a-spell` skill to update a spell in your library.
## Usage
```
/refine-spell writing-an-email
/refine-spell writing-an-email --bump minor
```
## What happens next
The skill walks you through:
1. Loading the current SKILL.md
2. Identifying what changed (new step, clearer wording, better example, etc.)
3. Re-running the tester (mandatory after any change)
4. Bumping the version field appropriately
5. Saving back to your `WIZARD_HOME`
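For step 4, a version bump driven by `--bump` could work along these lines. The sketch assumes the version field uses a three-part `MAJOR.MINOR.PATCH` scheme and that `major` and `patch` are accepted alongside the `minor` shown in the usage; neither is confirmed here, and `bump_version` is a hypothetical helper.

```python
def bump_version(version, part="patch"):
    """Bump one component of an assumed MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump part: {part!r}")

print(bump_version("1.2.3", "minor"))
```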
Re-testing is non-skippable. A change without a re-test is treated like a new build that skipped try-it.
share-spell
Export a spell for sharing with others
# /share-spell
Invoke the `sharing-a-spell` skill to export one of your spells in a portable format.
## Usage
```
/share-spell writing-an-email
/share-spell writing-an-email --strict
/share-spell writing-an-email --bundle
```
## Modes
- **default**: keeps full Wizard frontmatter (kind, audience, complexity, etc.)
- **`--strict`**: strips Wizard-specific fields. Output is the vanilla upstream `SKILL.md` format (see README "Lineage" for the upstream project).
- **`--bundle`**: zips the SKILL.md plus its `examples/` and `references/` folders.
## Output
A single markdown file (or zip) that the recipient can drop into their own `WIZARD_HOME` and use immediately.
Nothing is sent over the network. Sharing means producing a file; you choose how to deliver it.
hooks
Event hooks configuration
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "\"${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd\" session-start",
            "async": false
          }
        ]
      }
    ]
  }
}