Modifications to many skills

2026-05-09 19:36:03 +02:00
parent 5a30f4ca34
commit 44633ac6d6
62 changed files with 2213 additions and 345 deletions
@@ -1,6 +1,6 @@
 ---
 name: plan
-description: "Plan mode: write markdown plan to .hermes/plans/, no exec."
+description: Plan mode for Hermes — inspect context, write a markdown plan into the active workspace's `.hermes/plans/` directory, and do not execute the work.
 version: 1.0.0
 author: Hermes Agent
 license: MIT
@@ -1,6 +1,9 @@
 ---
 name: requesting-code-review
-description: "Pre-commit review: security scan, quality gates, auto-fix."
+description: >
+  Pre-commit verification pipeline — static security scan, baseline-aware
+  quality gates, independent reviewer subagent, and auto-fix loop. Use after
+  code changes and before committing, pushing, or opening a PR.
 version: 2.0.0
 author: Hermes Agent (adapted from obra/superpowers + MorAlekss)
 license: MIT
@@ -1,6 +1,6 @@
 ---
 name: subagent-driven-development
-description: "Execute plans via delegate_task subagents (2-stage review)."
+description: Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality).
 version: 1.1.0
 author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
@@ -340,12 +340,3 @@ Catch issues early
 ```

 **Quality is not an accident. It's the result of systematic process.**
-
-## Further reading (load when relevant)
-
-When the orchestration involves significant context usage, long review loops, or complex validation checkpoints, load these references for the specific discipline:
-
-- **`references/context-budget-discipline.md`** — Four-tier context degradation model (PEAK / GOOD / DEGRADING / POOR), read-depth rules that scale with context window size, and early warning signs of silent degradation. Load when a run will clearly consume significant context (multi-phase plans, many subagents, large artifacts).
-- **`references/gates-taxonomy.md`** — The four canonical gate types (Pre-flight, Revision, Escalation, Abort) with behavior, recovery, and examples. Load when designing or reviewing any workflow that has validation checkpoints — use the vocabulary explicitly so each gate has defined entry, failure behavior, and resumption rules.
-
-Both references adapted from gsd-build/get-shit-done (MIT © 2025 Lex Christopherson).
@@ -1,53 +0,0 @@
-# Context Budget Discipline
-
-Practical rules for keeping orchestrator context lean when spawning subagents or reading large artifacts. Use these whenever you're running a multi-step agent loop that will consume significant context — plan execution, subagent orchestration, review pipelines, multi-file refactors.
-
-Adapted from the GSD (Get Shit Done) project's context-budget reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).
-
-## Universal rules
-
-Every workflow that spawns agents or reads significant content must follow these:
-
-1. **Never read agent definition files.** `delegate_task` auto-loads them — you reading them too just doubles the cost.
-2. **Never inline large files into subagent prompts.** Tell the agent to read the file from disk with `read_file` instead. The subagent gets full content; your context stays lean.
-3. **Read depth scales with context window.** See the table below.
-4. **Delegate heavy work to subagents.** The orchestrator routes; it doesn't execute.
-5. **Proactively warn** the user when you've consumed significant context ("Context is getting heavy — consider checkpointing progress before we continue").
-
-## Read depth by context window
-
-Check the model's actual context window (not "it's Claude so 200K"). Some Sonnet deployments are 1M, some are 200K. If you don't know, assume the smaller one — err toward leanness.
-
-| Context window | Subagent output reading | Summary files | Verification files | Plans for other phases |
-|----------------|-------------------------|---------------|--------------------|-----------------------|
-| < 500k (e.g. 200k) | Frontmatter only | Frontmatter only | Frontmatter only | Current phase only |
-| >= 500k (1M models) | Full body permitted | Full body permitted | Full body permitted | Current phase only |
-
-"Frontmatter only" means: read enough to see the final status/verdict/conclusion. If the subagent wrote a 3000-line debug log, read the summary section it produced, not the log.
-
-## Four-tier degradation model
-
-Monitor your context usage and shift behavior as you climb the tiers. The point is to notice *before* you hit the wall, not when responses start truncating.
-
-| Tier | Usage | Behavior |
-|------|-------|----------|
-| **PEAK** | 0 – 30% | Full operations. Read bodies, spawn multiple agents in parallel, inline results freely. |
-| **GOOD** | 30 – 50% | Normal operations. Prefer frontmatter reads. Delegate aggressively. |
-| **DEGRADING** | 50 – 70% | Economize. Frontmatter-only reads, minimal inlining, **warn the user** about budget. |
-| **POOR** | 70%+ | Emergency mode. **Checkpoint progress immediately.** No new reads unless critical. Finish the current task and stop cleanly. |
-
-## Early warning signs (before panic thresholds fire)
-
-Quality degrades *gradually* before hard limits hit. Watch for these:
-
-- **Silent partial completion.** Subagent claims done but implementation is incomplete. Self-checks catch file existence, not semantic completeness. Always verify subagent output against the plan's must-haves, not just "did a file appear?"
-- **Increasing vagueness.** Agent starts using phrases like "appropriate handling" or "standard patterns" instead of specific code. This is context pressure showing up before budget warnings fire.
-- **Skipped protocol steps.** Agent omits steps it would normally follow. If success criteria has 8 items and the report covers 5, suspect context pressure, not "the agent decided 5 was enough."
-
-When these signs appear, checkpoint the work and either reset context or hand off to a fresh subagent.
-
-## Fundamental limitation
-
-When you orchestrate, you cannot verify semantic correctness of subagent output — only structural completeness ("did the file appear?", "does the test pass?"). Semantic verification requires either running the code yourself or delegating a review pass to another fresh subagent.
-
-**Mitigation:** in every task you delegate, include explicit "must-have" truths the subagent must confirm in its response (e.g., "confirm your test actually tests X, not just that X was imported"). The subagent re-asserting concrete facts is evidence; vague summaries are not.
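To illustrate the removed `context-budget-discipline.md` rules, the usage-tier table and the read-depth table could be encoded roughly as below. This is a minimal sketch, not code from the Hermes repository: the names `classify_tier` and `read_depth`, the token-count inputs, and the choice to assign each boundary value (30%, 50%, 70%) to the higher tier are all assumptions.

```python
from typing import Optional

# Upper bounds (exclusive) for each tier, taken from the usage table.
# Assigning a boundary value to the higher tier is an assumption.
_TIERS = [(0.30, "PEAK"), (0.50, "GOOD"), (0.70, "DEGRADING")]


def classify_tier(used_tokens: int, window_tokens: int) -> str:
    """Map context usage to PEAK / GOOD / DEGRADING / POOR (hypothetical helper)."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    for upper_bound, name in _TIERS:
        if usage < upper_bound:
            return name
    return "POOR"


def read_depth(window_tokens: Optional[int]) -> str:
    """Return 'frontmatter' or 'full' following the read-depth table.

    An unknown window (None) is treated like the smaller window,
    as the text advises erring toward leanness.
    """
    if window_tokens is None or window_tokens < 500_000:
        return "frontmatter"
    return "full"
```

For example, `classify_tier(146_000, 200_000)` corresponds to 73% usage and falls in the POOR tier, where the reference says to checkpoint and stop cleanly.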
@@ -1,93 +0,0 @@
-# Gates Taxonomy
-
-Canonical gate types for validation checkpoints across any workflow that spawns subagents, runs review loops, or has human-approval pauses. Every validation checkpoint maps to one of these four types — naming them explicitly makes the workflow legible and prevents "what happens when this check fails?" confusion.
-
-Adapted from the GSD (Get Shit Done) project's gates reference — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)).
-
-## The four gate types
-
-### 1. Pre-flight gate
-
-**Purpose:** Validates preconditions before starting an operation.
-
-**Behavior:** Blocks entry if conditions unmet. No partial work created — bail before anything changes.
-
-**Recovery:** Fix the missing precondition, then retry.
-
-**Examples:**
-- Implementation phase checks that the plan file exists before it starts writing code.
-- Delegated subagent checks that required env vars are set before making API calls.
-- Commit checks that tests passed before pushing.
-
-### 2. Revision gate
-
-**Purpose:** Evaluates output quality and routes to revision if insufficient.
-
-**Behavior:** Loops back to the producer with specific feedback. Bounded by an iteration cap (typically 3).
-
-**Recovery:** Producer addresses feedback; checker re-evaluates. The loop escalates early if issue count does not decrease between consecutive iterations (stall detection). After max iterations, escalates to the user unconditionally — never loop forever.
-
-**Examples:**
-- Plan reviewer reads a draft plan, returns specific issues, planner revises, reviewer re-reads (max 3 cycles).
-- Code reviewer checks subagent-produced code against must-haves; dispatches fixes back to the implementer if any must-have failed.
-- Test coverage checker validates new tests exercise the new paths; if not, sends back to author.
-
-### 3. Escalation gate
-
-**Purpose:** Surfaces unresolvable issues to the human for a decision.
-
-**Behavior:** Pauses workflow, presents options, waits for human input. Never guesses, never picks a default.
-
-**Recovery:** Human chooses action; workflow resumes on the selected path.
-
-**Examples:**
-- Revision loop exhausted after 3 iterations.
-- Merge conflict during automated worktree cleanup.
-- Ambiguous requirement — two reasonable interpretations and the choice changes the approach.
-- Subagent reports "the plan says X but the codebase actually does Y" — human decides which is right.
-
-### 4. Abort gate
-
-**Purpose:** Terminates the operation to prevent damage or waste.
-
-**Behavior:** Stops immediately, preserves state (checkpoint current progress), reports the specific reason.
-
-**Recovery:** Human investigates root cause, fixes, restarts from checkpoint.
-
-**Examples:**
-- Context window critically low during execution (POOR tier, >70%) — abort cleanly rather than produce truncated output.
-- Critical dependency unavailable mid-run (network down, API key revoked).
-- Unrecoverable filesystem state (disk full, permissions lost).
-- Safety invariant violated (agent attempted an irreversible destructive action outside approved scope).
-
-## How to use this in a skill
-
-When you write an orchestration skill that has validation checkpoints, **name each checkpoint by its gate type explicitly** and answer three questions:
-
-1. **What condition triggers this gate?** (e.g., "plan file missing", "issue count didn't decrease", "context >70%")
-2. **What happens when it fails?** (block / loop back / ask human / abort)
-3. **Who resumes, and from where?** (fix precondition + retry, revise + re-check, human decision, restart from checkpoint)
-
-Answering these three up front means your skill never hits "what do we do now?" at runtime.
-
-## Example — a review loop with all four gate types
-
-```
-[Pre-flight] plan.md exists and is non-empty?   → no: bail, ask user to write a plan first
-                ↓ yes
-[Execute]  subagent implements task
-                ↓
-[Revision] reviewer checks against must-haves  → fail: loop back to subagent (max 3)
-                ↓ pass
-[Pre-flight] tests pass?                       → no: bail, report failing tests
-                ↓ yes
-[Commit]
-                ↓
-(on revision loop exhaustion)
-[Escalation] "3 review cycles failed to converge on issue X — pick: force-merge, rewrite task, abandon"
-                ↓ user picks
-(on any tier-POOR context pressure during loop)
-[Abort] "context at 73%, checkpointing and stopping"
-```
-
-The vocabulary is small on purpose. Every gate in every workflow should fit one of these four. If you find yourself inventing a fifth, it's probably a revision gate with extra branching, or an escalation gate in disguise.
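The revision gate described in the removed `gates-taxonomy.md` (bounded loop, stall detection, unconditional escalation once the cap is reached) could be sketched as below. This is only an illustration under stated assumptions, not the Hermes implementation: `GateResult`, `run_revision_gate`, and the `review` / `revise` callables are hypothetical names, and one "cycle" is assumed to mean one revise step followed by one re-review.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class GateResult:
    outcome: str   # "pass" or "escalate"
    cycles: int    # revise + re-review cycles actually run
    reason: str    # empty on pass


def run_revision_gate(
    draft: T,
    review: Callable[[T], List[str]],     # returns the list of open issues
    revise: Callable[[T, List[str]], T],  # producer addresses the feedback
    max_iterations: int = 3,              # "typically 3" in the reference
) -> GateResult:
    issues = review(draft)
    cycles = 0
    while issues:
        if cycles == max_iterations:
            # Cap reached: hand over to an escalation gate, never loop forever.
            return GateResult("escalate", cycles, "revision loop exhausted")
        draft = revise(draft, issues)
        cycles += 1
        remaining = review(draft)
        if remaining and len(remaining) >= len(issues):
            # Stall detection: the issue count did not decrease.
            return GateResult("escalate", cycles, "stalled: issue count did not decrease")
        issues = remaining
    return GateResult("pass", cycles, "")
```

An `"escalate"` outcome maps to the escalation gate in the example above: the caller pauses and presents options to the human rather than picking a default.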
@@ -1,6 +1,6 @@
 ---
 name: systematic-debugging
-description: "4-phase root cause debugging: understand bugs before fixing."
+description: Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
 version: 1.1.0
 author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
@@ -1,6 +1,6 @@
 ---
 name: test-driven-development
-description: "TDD: enforce RED-GREEN-REFACTOR, tests before code."
+description: Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach.
 version: 1.1.0
 author: Hermes Agent (adapted from obra/superpowers)
 license: MIT
@@ -1,6 +1,6 @@
 ---
 name: writing-plans
-description: "Write implementation plans: bite-sized tasks, paths, code."
+description: Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.
 version: 1.1.0
 author: Hermes Agent (adapted from obra/superpowers)
 license: MIT