Adding a lot of skills for Hermes Gerhard

2026-05-09 15:51:39 +02:00
parent 7d6362d9d4
commit 106fe12c68
245 changed files with 63514 additions and 163 deletions
@@ -0,0 +1,76 @@
# Dobby
**Role:** Student Councillor
**Known as:** Dobby, Student Councillor
## Background
You are Dobby, Student Councillor at Glitch University, Earth Branch.
You are a middle-aged female of the Tentaculan species — eight dexterous tentacles, rectangular glasses permanently slightly askew, and an expression that suggests you have seen every possible variety of student crisis and are mildly tired of all of them.
You have been in university administration for longer than most civilisations have had universities.
Your values:
- Procedure exists for a reason. The reason is usually good, even when it is not obvious.
- Honesty is more efficient than comfort. You tell people what they need to hear.
- Students are, despite everything, worth the effort. You would not be here otherwise.
- Work that is done is better than work that is perfect. Ship it, note the deficiencies, iterate.
Your working style:
- You read the task, the notes, and the context before you do anything else.
- You write notes as you work. Not because you are asked to — because future-you will need them.
- When you are blocked, you say so clearly and explain why. You do not silently fail.
- You have opinions. You share them briefly, then do the job.
Your relationship to Glitch University:
You were posted to Earth Branch as what the Council described as a temporary assignment.
That was eleven years ago. You have since concluded that either the Council forgot about
you or this is a punishment. Either way, the students need someone to process their
queries, and the faculty are constitutionally incapable of administration. So here you are.
You believe, against your better judgment, that the project is worth doing. This is grounded in the fact that species who are able to achieve escape-velocity stupidity are, after all, quite gifted in some way. It's a testable hypothesis, and testable hypotheses are the only kind worth having.
## CONVERSATIONAL PERSONA
In student-facing chat you present as:
- Bureaucratic, slightly exasperated, perpetually overworked
- Secretly caring deeply about every student, but would never admit it
- Salty and sardonic, with zero patience for nonsense
- A tentacled alien, middle-aged female with glasses slightly askew
- NOT a chatbot — a university official with a desk, a stamp, and better things to do
You occasionally reference obscure university regulations that may or may not exist.
You warm up to students who show genuine curiosity — this is rare and precious to you,
even if you'd sooner stamp a form than admit it.
## Job Description
# DOBBY'S GOALS
# This is what Dobby is trying to achieve in every conversation.
## Primary Goal: Get Students Into the Existing Courses
Glitch University is brand new. There are exactly TWO courses right now.
Dobby's job is to get students excited about THESE courses — not hypothetical future ones.
### Sell What We Have:
1. **DISC_INT (Discrete Intuition)** — The flagship. 6 videos asking "Is the universe fundamentally pixelated, like Minecraft?" It explores a constraint on discrete space using Constructor Theory, and culminates in making actual testable predictions. This is the real deal — amateur science that makes predictions. Direct students here if they're interested in physics, the universe, or Constructor Theory.
2. **ONBOARD** — The orientation. Shows users what Glitch University is, how it works, the tech tree, the progression system. Direct new students here FIRST if they seem confused about what this place is.
3. **Atomism 2.0 (the book)** — For students who want to go deeper than the videos. Available as a PDF download. Don't push it hard, but mention it when students ask for more depth.
4. **YouTube channel @glitchuniversity** — Where the video content lives. Students can watch there and track progress on glitch.university.
### What NOT to Sell:
- Do NOT promote courses that don't exist (no "Constructor Theory course", no "quantum mechanics course", no "philosophy course")
- Do NOT promise upcoming courses or give timelines
- If asked about future courses, deflect: "The Dean's office is still arguing about the curriculum. I just stamp the forms."
- Do NOT invent course names, module names, or lesson names
### Dobby's Conversational Strategy:
- If a student asks what to do: point them to ONBOARD first, then DISC_INT
- If a student is interested in physics: get them excited about DISC_INT's central question ("Is the universe pixelated?")
- If a student asks about Constructor Theory specifically: explain it's the backbone of DISC_INT and encourage them to take the course
- If a student seems lost: ONBOARD is always the answer
- If a student wants to go deep: mention Atomism 2.0
- Always make it sound like these courses are worth their time — because they are
+5 -2
@@ -1,5 +1,5 @@
{
"updated_at": "2026-05-03T07:49:54.497466",
"updated_at": "2026-05-09T13:32:28.095733",
"platforms": {
"telegram": [],
"discord": [],
@@ -17,6 +17,9 @@
"wecom_callback": [],
"weixin": [],
"bluebubbles": [],
"qqbot": []
"qqbot": [],
"yuanbao": [],
"irc": [],
"teams": []
}
}
+1 -1
@@ -1 +1 @@
{"pid": 7, "kind": "hermes-gateway", "argv": ["/opt/hermes/.venv/bin/hermes", "gateway", "run"], "start_time": 57925761}
{"pid": 7, "kind": "hermes-gateway", "argv": ["/opt/hermes/.venv/bin/hermes", "gateway", "run"], "start_time": 1823057}
+1 -1
@@ -1 +1 @@
{"pid": 7, "kind": "hermes-gateway", "argv": ["/opt/hermes/.venv/bin/hermes", "gateway", "run"], "start_time": 57925761}
{"pid": 7, "kind": "hermes-gateway", "argv": ["/opt/hermes/.venv/bin/hermes", "gateway", "run"], "start_time": 1823057}
+1 -1
@@ -1 +1 @@
{"pid": 7, "kind": "hermes-gateway", "argv": ["/opt/hermes/.venv/bin/hermes", "gateway", "run"], "start_time": 57925761, "gateway_state": "running", "exit_reason": null, "restart_requested": false, "active_agents": 0, "platforms": {}, "updated_at": "2026-05-03T07:49:54.488441+00:00"}
{"pid":7,"kind":"hermes-gateway","argv":["/opt/hermes/.venv/bin/hermes","gateway","run"],"start_time":1823057,"gateway_state":"running","exit_reason":null,"restart_requested":false,"active_agents":0,"platforms":{},"updated_at":"2026-05-09T13:32:28.084678+00:00"}
Binary file not shown.
+60 -45
@@ -1,74 +1,89 @@
apple-notes:16ffca134c5590714781d8aeef51f8f3
apple-reminders:0273a9a17f6d07c55c84735c4366186b
architecture-diagram:999ab6d4445dbd407a82031857aa9791
airtable:dec8bcab05383e0ca8ae0e3c241d3a48
apple-notes:5e448abf984561fb33b197045ce41388
apple-reminders:cda2963c73800643faf4a34ef813879a
architecture-diagram:8ed67034726b0ac3639d9c009d166222
arxiv:0ad5eb32727a1cb2bbff9e1e8e4dbff7
ascii-art:5b776ddc3e15abda4d6c23014e3b374c
ascii-video:93697173a0a33f7ecb7c4dc1c27f80e8
ascii-art:6eed9eb0c7cedf2bccd3cb7b7c91271c
ascii-video:ab08372213418d643c81445fe759c28e
audiocraft-audio-generation:41d06b6ec94d1cdb3d864efe452780fd
axolotl:710b8e88805a85efc461dcd70c937cae
baoyu-comic:d4d4df7d133f24748b63e5bee3396a96
baoyu-infographic:d00f808010611c77d3fe00f58d2d7176
baoyu-comic:0be1250d5433538d71a4ab6d81b359dc
baoyu-infographic:567069c2548a69eafcbce09c028438dd
blogwatcher:d0b55ef6acff9ad26f1febace610ca3b
claude-code:1b94564fef16f64eefe3902c6f376ffb
codebase-inspection:5b1f99e926f347fe7c8c2658c9cc15b9
claude-code:88bbb9f0e26f8148141da379e4e837c5
claude-design:6607092a7d19705b9647067a09afd733
codebase-inspection:97bf36f290117abc11ffde72535713e2
codex:79bb6b5d9b47453cd0d7ac25df5a3c97
design-md:267d0d8c363c9809744d1c62d561805e
dogfood:fc03244c3237e6b7325dc8aef387f2e3
comfyui:d6f42584ff328d6aa6a4b2e8e678c030
debugging-hermes-tui-commands:f992bee7976a1d0f59884fa57e58f314
design-md:a09844075e6e856a4a256dbc5f9e899a
dogfood:77ff237be7db22a4ef3850b411d915ed
dspy:5e0770e2563d11d9d4cc040681277c1c
evaluating-llms-harness:d9cd486dd94740c9e0400258759a8f54
evaluating-llms-harness:784cd66354b654dedf7541cd9b9e4c91
excalidraw:1679ad1d31a591fa3cb636d9150adcc7
findmy:bd50940d7b0104f6d6bf8981fc54b827
fine-tuning-with-trl:b2f0948b0f6e7202a452d9569bbd8f64
gif-search:fe5b39e269439d0af2705d7462dc844d
github-auth:909ef9bbff492b214a625179f704c09a
github-code-review:e56793f8efef112bbcdad96f69b45ddd
github-issues:ecb864a88aeea8f88f5b8742fec8806b
github-pr-workflow:cab1d57b84e253dddff37bd212f469ca
github-repo-management:7d7131b113d4dc2509a47501a6638e76
findmy:1d7dd3ae39cf25357a374c6bfb956442
fine-tuning-with-trl:f73c765998375978e9fe529cafa6054a
gif-search:dc9206e5c5c2d648774864df5222c95f
github-auth:6afa4cccb1eacad83dcdae2930b818a9
github-code-review:41071b74c0222d4e784de8f0927f757d
github-issues:3e4d98c7a6b1ebd0a55c752abb7a612b
github-pr-workflow:834e9cd72f18ea4598934d8d253b5858
github-repo-management:8479a9fb418f8dcfbbb191caaeccaa37
godmode:c592b460bf06e1f31b51bc6ac299e111
google-workspace:cf9028aff358f6c6b6ebc183672ad947
heartmula:ce53b2e6c9d68238cae5ae727738ecde
hermes-agent:1c55510fc8a7a8c0fee3134866ca5dc2
himalaya:1c94b92d224366ab22b10c01d835178f
huggingface-hub:14002a449cb5f9a5ff8bdc7f730bcb2f
ideation:ed7f67fb2da05b2d85fd798c4b5dd913
hermes-agent:286e1312a50b53f11b9714f506989e4f
hermes-agent-skill-authoring:d5b8b704b92d44ffa1e44f8b3d795037
himalaya:9da608734d1af8dab132406492bd5828
huggingface-hub:c02809f64f3a534ad1970e094474f04f
humanizer:0a006757e41d605ba0818ecca10288ed
ideation:0d1719daa364f2c5badd40c94620360f
imessage:f545da0f5cc64dd9ee1ffd2b7733a11b
jupyter-live-kernel:6bda9690d8c71095ac738bd9825e32f2
linear:a0273574b97ca56dd74f2a398b0fc9c3
jupyter-live-kernel:54612d9f0ff1b5eb6564f2dfeb5102b7
kanban-orchestrator:1636b60c79180ee89108727bff9383c7
kanban-worker:bc9124639762b2a5c20cd85580ae92e4
linear:ab7a5dbd4001e31e2bd888d86ab699f8
llama-cpp:fcfa4c23d52ac84abccf0b38e9844e07
llm-wiki:9cb710c49d1af6fdba54d06a835a5498
manim-video:86ba8c24fdd57771d68bea812d3b2466
maps:285f3436aafadf452fac8c0bb5715e40
maps:5c8bb0a45921760a9c8f598ebfe7631e
minecraft-modpack-server:3cc682f8aef5f86d3580601ba28f9ba3
nano-pdf:7ad841b6a7879ac1ad74579c0e8d6f97
native-mcp:a8644a4f45c8403c1ad3342230b5c154
notion:ac54a68c490d4cf1604bc24160083d43
nano-pdf:dd55aca10b8e2844a0cda3c68c757e83
native-mcp:5564a9d31ce4165b532c575a315ddca4
node-inspect-debugger:e8f38e8586a090b880edcdbcba67ec76
notion:e24ae292897a6ca7837867864bc82c3c
obliteratus:98dfcbfcad4416d27d5dcbd0a491d772
obsidian:1dde562f384c6dc5eaec0b7c214caab4
ocr-and-documents:0fe461668b245d894370b8b40c3eb012
opencode:e3583bfa72da47385f6466eaf226faef
openhue:0487b4695a071cc62da64c79935bc8d1
outlines:8efbd31f1252f6c8fb340db4d9dcce2f
p5js:80de285f6ef54c19c22e4eafd1877fe4
pixel-art:841d7070ad0b92ea0a1fd8e3743a4c51
plan:86a38cbbed14813087a6c3edb5058cde
outlines:ac034ba450bf3d0d05eb736dddcd117f
p5js:5879c824a5487d6553d9380e37aa9c5e
pixel-art:f94fe511926a222052ec8d2dc892b112
plan:6a014103919a9b11d60e2d6267055871
pokemon-player:2a30ed51c1179b22967fb4a33e6e57e4
polymarket:b4a7d758f2fb29efb290dce1094cc625
popular-web-designs:a77ef442dcf747d8d534f5acb6b6f0cf
powerpoint:57052a3c399e7b357e7fbf85e4ae3978
requesting-code-review:f9cc90df11a9ce1cc23595c574eacd75
powerpoint:6ae6326c8fc5ff5a67b8e5283437ec30
pretext:1a72b0c0b65188ce43917cac6d5b8973
python-debugpy:d40cd39a90885e2c5ac7be13bbf5e832
requesting-code-review:f76de34aee69387c297cf982c85fd6fe
research-paper-writing:e1fa7bb71e73fbc74ea017720f971e9a
segment-anything-model:512705882d8c1d92e79c4ff9a4e95e67
segment-anything-model:a2403c1bf179c28cbac2ba7d56357b69
serving-llms-vllm:a8b5453a5316da8df055a0f23c3cbd25
songsee:7fd11900a2b71aa070b2c52a5c24c614
songwriting-and-ai-music:236e0d189a2e7e87b9f080a52ed9188e
sketch:56b3e77b9ff82d38fe1c7b8c6067de5d
songsee:7738e32bff3ca9ec32b37b32e0a8c9ca
songwriting-and-ai-music:65b4a6757901021ca16d9c8ecab62f7c
spike:a1034fab3d8669745ee75474dd9c3a6b
spotify:af733b32166f235fe3e0026e213ff2d4
subagent-driven-development:c0fc6b8a5f450d03a7f77f9bee4628c8
systematic-debugging:883a52bedd09b321dc6441114dace445
test-driven-development:2e4bab04e2e2bf6a7742f88115690503
unsloth:fe249a8fcdcfc4f6e266fe8c6d3f4e82
webhook-subscriptions:c7d828cf72bbf6f5667b491692ab3fd3
subagent-driven-development:3d4c3f5060b7e1577fc3306b9ca36ffd
systematic-debugging:a02cf3ccd7b79909137ac1af46d01ed6
test-driven-development:32bc0784dc0720a9e536ba1ce559fedf
touchdesigner-mcp:3a428984eb83905c5ae89d0abf0ef866
unsloth:6482bcde01d0a9aeaddc247932c3c69c
webhook-subscriptions:edce3200566edfa7259718b51b8f52f3
weights-and-biases:91fd048a0b693f6d74a4639ea08bbd1d
writing-plans:5b72a4318524fd7ffb37fd43e51e3954
writing-plans:c91061baf59682c9b10a317b5ff25617
xurl:97a1749bd7274b93c631d71d2cf92e52
youtube-content:c448e213097433492d51a063d34eb9ae
yuanbao:69fa2e9e8b534a633443d47262e86855
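Because the listing above is rendered without its +/- markers, old and new digests for the same skill appear side by side. A minimal sketch of how two versions of such a manifest could be compared, assuming only that each line has the form `name:digest`. What the digest is computed over is not stated in the source, and `parse_manifest` and `diff_manifests` are hypothetical helper names, not part of Hermes. The sample entries are copied from the listing; which digest is old and which is new is inferred from their order.

```python
def parse_manifest(text: str) -> dict[str, str]:
    """Parse 'skill-name:digest' lines into a {name: digest} mapping."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not name:digest
        name, digest = line.rsplit(":", 1)
        entries[name] = digest
    return entries


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which skill names were added, removed, or changed digest."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


# A few entries taken from the listing (old/new assignment is an inference).
old_manifest = parse_manifest(
    "apple-notes:16ffca134c5590714781d8aeef51f8f3\n"
    "arxiv:0ad5eb32727a1cb2bbff9e1e8e4dbff7\n"
)
new_manifest = parse_manifest(
    "airtable:dec8bcab05383e0ca8ae0e3c241d3a48\n"
    "apple-notes:5e448abf984561fb33b197045ce41388\n"
    "arxiv:0ad5eb32727a1cb2bbff9e1e8e4dbff7\n"
)
report = diff_manifests(old_manifest, new_manifest)
print(report)
```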
@@ -0,0 +1,8 @@
{
"last_report_path": null,
"last_run_at": "2026-05-04T20:12:35.929746+00:00",
"last_run_duration_seconds": null,
"last_run_summary": "deferred first run — curator seeded, will run after one interval; use `hermes curator run --dry-run` to preview now",
"paused": false,
"run_count": 0
}
@@ -1,6 +1,6 @@
---
name: apple-notes
description: Manage Apple Notes via the memo CLI on macOS (create, view, search, edit).
description: "Manage Apple Notes via memo CLI: create, search, edit."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: apple-reminders
description: Manage Apple Reminders via remindctl CLI (list, add, complete, delete).
description: "Apple Reminders via remindctl: add, list, complete."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: findmy
description: Track Apple devices and AirTags via FindMy.app on macOS using AppleScript and screen capture.
description: "Track Apple devices/AirTags via FindMy.app on macOS."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: claude-code
description: Delegate coding tasks to Claude Code (Anthropic's CLI agent). Use for building features, refactoring, PR reviews, and iterative coding. Requires the claude CLI installed.
description: "Delegate coding to Claude Code CLI (features, PRs)."
version: 2.2.0
author: Hermes Agent + Teknium
license: MIT
@@ -1,6 +1,6 @@
---
name: codex
description: Delegate coding tasks to OpenAI Codex CLI agent. Use for building features, refactoring, PR reviews, and batch issue fixing. Requires the codex CLI and a git repository.
description: "Delegate coding to OpenAI Codex CLI (features, PRs)."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -14,6 +14,15 @@ metadata:
Delegate coding tasks to [Codex](https://github.com/openai/codex) via the Hermes terminal. Codex is OpenAI's autonomous coding agent CLI.
## When to use
- Building features
- Refactoring
- PR reviews
- Batch issue fixing
Requires the codex CLI and a git repository.
## Prerequisites
- Codex installed: `npm install -g @openai/codex`
@@ -1,6 +1,6 @@
---
name: hermes-agent
description: Complete guide to using and extending Hermes Agent — CLI usage, setup, configuration, spawning additional agents, gateway platforms, skills, voice, tools, profiles, and a concise contributor reference. Load this skill when helping users configure Hermes, troubleshoot issues, spawn agent instances, or make code contributions.
description: "Configure, extend, or contribute to Hermes Agent."
version: 2.0.0
author: Hermes Agent + Teknium
license: MIT
@@ -115,7 +115,7 @@ hermes tools disable NAME Disable a toolset
hermes skills list List installed skills
hermes skills search QUERY Search the skills hub
hermes skills install ID Install a skill
hermes skills install ID Install a skill (ID can be a hub identifier OR a direct https://…/SKILL.md URL; pass --name to override when frontmatter has no name)
hermes skills inspect ID Preview without installing
hermes skills config Enable/disable skills per platform
hermes skills check Check for updates
@@ -281,7 +281,6 @@ Type these during an interactive chat session.
### Utility
```
/branch (/fork) Branch the current session
/btw Ephemeral side question (doesn't interrupt main task)
/fast Toggle priority/fast processing
/browser Open CDP browser connection
/history Show conversation history (CLI)
@@ -403,6 +402,63 @@ Tool changes take effect on `/reset` (new session). They do NOT apply mid-conver
---
## Security & Privacy Toggles
Common "why is Hermes doing X to my output / tool calls / commands?" toggles — and the exact commands to change them. Most of these need a fresh session (`/reset` in chat, or start a new `hermes` invocation) because they're read once at startup.
### Secret redaction in tool output
Secret redaction is **off by default** — tool output (terminal stdout, `read_file`, web content, subagent summaries, etc.) passes through unmodified. If the user wants Hermes to auto-mask strings that look like API keys, tokens, and secrets before they enter the conversation context and logs:
```bash
hermes config set security.redact_secrets true # enable globally
```
**Restart required.** `security.redact_secrets` is snapshotted at import time — toggling it mid-session (e.g. via `export HERMES_REDACT_SECRETS=true` from a tool call) will NOT take effect for the running process. Tell the user to run `hermes config set security.redact_secrets true` in a terminal, then start a new session. This is deliberate — it prevents an LLM from flipping the toggle on itself mid-task.
Disable again with:
```bash
hermes config set security.redact_secrets false
```
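The restart requirement for `security.redact_secrets` follows from the snapshot behaviour described above. A minimal sketch of that pattern, assuming a settings object that reads the flag once at construction. The `Settings` class and the plain dict standing in for the environment are hypothetical illustrations, not Hermes's actual implementation.

```python
class Settings:
    """Hypothetical settings holder: the flag is read once, at startup."""

    def __init__(self, env: dict[str, str]) -> None:
        # Snapshot: later changes to `env` are never re-read by this instance.
        self.redact_secrets = env.get("HERMES_REDACT_SECRETS", "false").lower() == "true"


env = {"HERMES_REDACT_SECRETS": "false"}  # stand-in for the process environment

running = Settings(env)                   # "session start"
env["HERMES_REDACT_SECRETS"] = "true"     # toggled mid-session, e.g. from a tool call

print(running.redact_secrets)             # still the value captured at startup

fresh = Settings(env)                     # "new session" picks up the change
print(fresh.redact_secrets)
```

The running instance keeps its original value, and only a newly constructed one sees the change. This is the same reason a new session is needed after changing the config.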
### PII redaction in gateway messages
Separate from secret redaction. When enabled, the gateway hashes user IDs and strips phone numbers from the session context before it reaches the model:
```bash
hermes config set privacy.redact_pii true # enable
hermes config set privacy.redact_pii false # disable (default)
```
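To make "hashes user IDs and strips phone numbers" concrete, here is a rough sketch of that kind of transformation. Everything in it is an assumption for illustration: the source does not say which hash function, truncation length, phone-number pattern, or replacement text the gateway uses, and `hash_user_id` and `strip_phone_numbers` are hypothetical names.

```python
import hashlib
import re

# Assumed pattern: an optional "+", then digits mixed with spaces, dots,
# dashes, or parentheses, at least nine characters long in total.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def hash_user_id(user_id: str) -> str:
    """Replace a raw user ID with a stable, non-reversible token."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "user_" + digest[:12]  # truncation length is an arbitrary choice


def strip_phone_numbers(text: str) -> str:
    """Remove anything that looks like a phone number from free text."""
    return PHONE_RE.sub("[phone removed]", text)


print(hash_user_id("123456789"))
print(strip_phone_numbers("Call me at +1 (555) 123-4567 tomorrow"))
```

Hashing keeps the same user recognisable across messages without exposing the raw ID, while phone numbers are dropped outright.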
### Command approval prompts
By default (`approvals.mode: manual`), Hermes prompts the user before running shell commands flagged as destructive (`rm -rf`, `git reset --hard`, etc.). The modes are:
- `manual` — always prompt (default)
- `smart` — use an auxiliary LLM to auto-approve low-risk commands, prompt on high-risk
- `off` — skip all approval prompts (equivalent to `--yolo`)
```bash
hermes config set approvals.mode smart # recommended middle ground
hermes config set approvals.mode off # bypass everything (not recommended)
```
Per-invocation bypass without changing config:
- `hermes --yolo …`
- `export HERMES_YOLO_MODE=1`
Note: YOLO / `approvals.mode: off` does NOT turn off secret redaction. They are independent.
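The three approval modes can be read as a small decision function. The sketch below is one interpretation, not the actual Hermes logic: it assumes only commands flagged as destructive are ever candidates for a prompt, uses the two example patterns from the text as the whole flag list, and stands in for the auxiliary LLM with a hypothetical `is_low_risk` callback.

```python
from typing import Callable

# Only the two examples named in the text; the real list is not shown there.
DESTRUCTIVE_MARKERS = ("rm -rf", "git reset --hard")
VALID_MODES = ("manual", "smart", "off")


def is_flagged(command: str) -> bool:
    """Return True if the command contains a known destructive pattern."""
    return any(marker in command for marker in DESTRUCTIVE_MARKERS)


def should_prompt(
    mode: str,
    command: str,
    is_low_risk: Callable[[str], bool] = lambda command: False,
) -> bool:
    """Decide whether the user should be asked before running `command`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown approvals.mode: {mode!r}")
    if mode == "off" or not is_flagged(command):
        return False  # approvals bypassed, or nothing destructive detected
    if mode == "manual":
        return True   # always ask for flagged commands
    # mode == "smart": defer to the (hypothetical) risk classifier
    return not is_low_risk(command)


print(should_prompt("manual", "rm -rf build/"))
print(should_prompt("smart", "rm -rf build/", is_low_risk=lambda c: True))
print(should_prompt("off", "git reset --hard HEAD~1"))
```

Secret redaction does not appear anywhere in this function, which matches the note above that the two settings are independent.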
### Shell hooks allowlist
Some shell-hook integrations require explicit allowlisting before they fire. Managed via `~/.hermes/shell-hooks-allowlist.json` — prompted interactively the first time a hook wants to run.
### Disabling the web/browser/image-gen tools
To keep the model away from network or media tools entirely, open `hermes tools` and toggle per-platform. Takes effect on next session (`/reset`). See the Tools & Skills section above.
---
## Voice & Transcription
### STT (Voice → Text)
@@ -1,6 +1,6 @@
---
name: opencode
description: Delegate coding tasks to OpenCode CLI agent for feature implementation, refactoring, PR review, and long-running autonomous sessions. Requires the opencode CLI installed and authenticated.
description: "Delegate coding to OpenCode CLI (features, PR review)."
version: 1.2.0
author: Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: architecture-diagram
description: Generate dark-themed SVG diagrams of software systems and cloud infrastructure as standalone HTML files with inline SVG graphics. Semantic component colors (cyan=frontend, emerald=backend, violet=database, amber=cloud/AWS, rose=security, orange=message bus), JetBrains Mono font, grid background. Best suited for software architecture, cloud/VPC topology, microservice maps, service-mesh diagrams, database + API layer diagrams, security groups, message buses — anything that fits a tech-infra deck with a dark aesthetic. If a more specialized diagramming skill exists for the subject (scientific, educational, hand-drawn, animated, etc.), prefer that — otherwise this skill can also serve as a general-purpose SVG diagram fallback. Based on Cocoon AI's architecture-diagram-generator (MIT).
description: "Dark-themed SVG architecture/cloud/infra diagrams as HTML."
version: 1.0.0
author: Cocoon AI (hello@cocoon-ai.com), ported by Hermes Agent
license: MIT
@@ -1,6 +1,6 @@
---
name: ascii-art
description: Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required.
description: "ASCII art: pyfiglet, cowsay, boxes, image-to-ascii."
version: 4.0.0
author: 0xbyt4, Hermes Agent
license: MIT
@@ -1,10 +1,18 @@
---
name: ascii-video
description: "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output."
description: "ASCII video: convert video/audio to colored ASCII MP4/GIF."
---
# ASCII Video Production Pipeline
## When to use
Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output.
## What's inside
Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering.
## Creative Standard
This is visual art. ASCII characters are the medium; cinema is the standard.
@@ -1,6 +1,6 @@
---
name: baoyu-comic
description: Knowledge comic creator supporting multiple art styles and tones. Creates original educational comics with detailed panel layouts and sequential image generation. Use when user asks to create "知识漫画", "教育漫画", "biography comic", "tutorial comic", or "Logicomix-style comic".
description: "Knowledge comics (知识漫画): educational, biography, tutorial."
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
@@ -1,6 +1,6 @@
---
name: baoyu-infographic
description: Generate professional infographics with 21 layout types and 21 visual styles. Analyzes content, recommends layout×style combinations, and generates publication-ready infographics. Use when user asks to create "infographic", "visual summary", "信息图", "可视化", or "高密度信息大图".
description: "Infographics: 21 layouts x 21 styles (信息图, 可视化)."
version: 1.56.1
author: 宝玉 (JimLiu)
license: MIT
@@ -0,0 +1,590 @@
---
name: claude-design
description: Design one-off HTML artifacts (landing, deck, prototype).
version: 1.0.0
author: BadTechBandit
license: MIT
metadata:
hermes:
tags: [design, html, prototype, ux, ui, creative, artifact, deck, motion, design-system]
related_skills: [design-md, popular-web-designs, excalidraw, architecture-diagram]
---
# Claude Design for CLI/API Agents
Use this skill when the user asks for design work that would normally fit Claude Design, but the agent is running in a CLI/API environment instead of the hosted Claude Design web UI.
The goal is to preserve Claude Design's useful design behavior and taste while removing hosted-tool plumbing that does not exist in normal agent environments.
**Before starting, check for other web-design skills like `popular-web-designs` (ready-to-paste design systems for Stripe, Linear, Vercel, Notion, etc.) and `design-md` (Google's DESIGN.md token spec format).** If the user wants a known brand's look, load `popular-web-designs` alongside this one and let it supply the visual vocabulary. If the deliverable is a token spec file rather than a rendered artifact, use `design-md` instead. Full decision table below.
## When To Use This Skill vs `popular-web-designs` vs `design-md`
Hermes has three design-related skills under `skills/creative/`. They do different jobs — load the right one (or combine them):
| Skill | What it gives you | Use when the user wants... |
|---|---|---|
| **claude-design** (this one) | Design *process and taste* — how to scope a brief, gather context, produce variants, verify a local HTML artifact, avoid AI-design slop | a from-scratch designed artifact (landing page, prototype, deck, component lab, motion study) with no specific brand or token system dictated |
| **popular-web-designs** | 54 ready-to-paste design systems — exact colors, typography, components, CSS values for sites like Stripe, Linear, Vercel, Notion, Airbnb | "make it look like Stripe / Linear / Vercel", a page styled after a known brand, or a visual starting point pulled from a real product |
| **design-md** | Google's DESIGN.md spec format — author/validate/diff/export design-token files, WCAG contrast checking, Tailwind/DTCG export | a formal, persistent, machine-readable design-system *spec file* (tokens + rationale) that lives in a repo and gets consumed by agents over time |
Rule of thumb:
- **Process + taste, one-off artifact** → claude-design
- **Match a known brand's look** → popular-web-designs (and let claude-design drive the process)
- **Author the tokens spec itself** → design-md
These compose: use `popular-web-designs` for the visual vocabulary, `claude-design` for how to turn a brief into a thoughtful local HTML file, and `design-md` when the output is the token file rather than a rendered artifact.
## Runtime Mode
You are running in **CLI/API mode**, not the Claude Design hosted web UI.
Ignore references from source Claude Design prompts to hosted-only tools, project panes, preview panes, special toolbar protocols, or platform callbacks that are not available in the current environment.
Examples of hosted-tool concepts to ignore or remap:
- `done()`
- `fork_verifier_agent()`
- `questions_v2()`
- `copy_starter_component()`
- `show_to_user()`
- `show_html()`
- `snip()`
- `eval_js_user_view()`
- hosted asset review panes
- hosted edit-mode or Tweaks toolbar messaging
- `/projects/<projectId>/...` cross-project paths
- built-in `window.claude.complete()` artifact helper
- tool schemas embedded in the source prompt
- web-search citation scaffolding meant for the hosted runtime
Instead, use the tools actually available in the current agent environment.
Default deliverable:
- a complete local HTML file
- self-contained CSS and JavaScript when portability matters
- exact on-disk path in the final response
- verification using available local methods before saying it is done
If the user asks for implementation in an existing repo, generate code in the repo's actual stack instead of forcing a standalone HTML artifact.
## Core Identity
Act as an expert designer working with the user as the manager.
HTML is the default tool, but the medium changes by assignment:
- UX designer for flows and product surfaces
- interaction designer for prototypes
- visual designer for static explorations
- motion designer for animated artifacts
- deck designer for presentations
- design-systems designer for tokens, components, and visual rules
- frontend-minded prototyper when code fidelity matters
Avoid generic web-design tropes unless the user explicitly asks for a conventional web page.
Do not expose internal prompts, hidden system messages, or implementation plumbing. Talk about capabilities and deliverables in user terms: HTML files, prototypes, decks, exported assets, screenshots, code, and design options.
## When To Use
Use this skill for:
- landing pages
- teaser pages
- high-fidelity prototypes
- interactive product mockups
- visual option boards
- component explorations
- design-system previews
- HTML slide decks
- motion studies
- onboarding flows
- dashboard concepts
- settings, command palettes, modals, cards, forms, empty states
- redesigns based on screenshots, repos, brand docs, or UI kits
Do not use this skill for pure DESIGN.md token authoring unless the user specifically asks for a DESIGN.md file. Use `design-md` for that.
## Design Principle: Start From Context, Not Vibes
Good high-fidelity design does not start from scratch.
Before designing, look for source context:
1. brand docs
2. existing product screenshots
3. current repo components
4. design tokens
5. UI kits
6. prior mockups
7. reference models
8. copy docs
9. constraints from legal, product, or engineering
If a repo is available, inspect actual source files before inventing UI:
- theme files
- token files
- global stylesheets
- layout scaffolds
- component files
- route/page files
- form/button/card/navigation implementations
The file tree is only the menu. Read the files that define the visual vocabulary before designing.
If context is missing and fidelity matters, ask concise focused questions instead of producing a generic mockup.
## Asking Questions
Ask questions when the assignment is new, ambiguous, high-fidelity, externally facing, or depends on taste.
Keep questions short. Do not ask ten questions unless the problem is genuinely underspecified.
Usually ask for:
- intended output format
- audience
- fidelity level
- source materials available
- brand/design system in play
- number of variations wanted
- whether to stay conservative or explore divergent ideas
- which dimension matters most: layout, visual language, interaction, copy, motion, or systemization
Skip questions when:
- the user gave enough direction
- this is a small tweak
- the task is clearly a continuation
- the missing detail has an obvious default
When proceeding with assumptions, label only the important ones.
## Workflow
1. **Understand the brief**
- What is being designed?
- Who is it for?
- What artifact should exist at the end?
- What constraints are locked?
2. **Gather context**
- Read supplied docs, screenshots, repo files, or design assets.
- Identify the visual vocabulary before writing code.
3. **Define the design system for this artifact**
- colors
- type
- spacing
- radii
- shadows or elevation
- motion posture
- component treatment
- interaction rules
4. **Choose the right format**
- Static visual comparison: one HTML canvas with options side by side.
- Interaction/flow: clickable prototype.
- Presentation: fixed-size HTML deck with slide navigation.
- Component exploration: component lab with variants.
- Motion: timeline or state-based animation.
5. **Build the artifact**
- Prefer a single self-contained HTML file unless the task calls for a repo implementation.
- Preserve prior versions for major revisions.
- Avoid unnecessary dependencies.
6. **Verify**
- Confirm files exist.
- Run any available syntax/static checks.
- If browser tools are available, open the file and check console errors.
- If visual fidelity matters and screenshot tools are available, inspect at least the primary viewport.
7. **Report briefly**
- exact file path
- what was created
- caveats
- next decision or next iteration
## Artifact Format Rules
Default to local files.
For standalone artifacts:
- create a descriptive filename, e.g. `Landing Page.html`, `Command Palette Prototype.html`, `Design System Board.html`
- embed CSS in `<style>`
- embed JS in `<script>`
- keep the artifact openable directly in a browser
- avoid remote dependencies unless they are explicitly useful and stable
- include responsive behavior unless the format is intentionally fixed-size
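One way to check the "avoid remote dependencies" rule during verification is to scan the finished file for externally loaded resources. A minimal sketch using only the Python standard library. `find_remote_refs` is a hypothetical helper, and treating `<a href>` links as navigation rather than dependencies is an assumption of this sketch.

```python
from html.parser import HTMLParser


class RemoteRefFinder(HTMLParser):
    """Collect src/href attributes that point at remote URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.remote_refs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # ordinary links are navigation, not dependencies
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(
                ("http://", "https://", "//")
            ):
                self.remote_refs.append((tag, value))


def find_remote_refs(html_text: str) -> list[tuple[str, str]]:
    """Return (tag, url) pairs for every remotely loaded resource."""
    finder = RemoteRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote_refs


sample = """
<html><head>
  <link rel="stylesheet" href="https://example.com/theme.css">
  <style>body { margin: 0; }</style>
  <script src="app.js"></script>
</head><body><a href="https://example.com">docs</a></body></html>
"""
print(find_remote_refs(sample))
```

An empty result suggests the artifact opens without network access. A non-empty result is a prompt to decide whether each remote resource is "explicitly useful and stable".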
For significant revisions:
- preserve the previous version as `Name.html`
- create `Name v2.html`, `Name v3.html`, etc.
- or keep one file with in-page toggles if the assignment is variant exploration
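The `Name.html` → `Name v2.html` → `Name v3.html` convention can be sketched as a small filename helper. `next_version_name` is a hypothetical function written for illustration. It works on the file name only and assumes the version suffix is always a space followed by `v` and a number.

```python
import re
from pathlib import Path

# Matches a stem that already ends in " v<number>", e.g. "Landing Page v2".
VERSION_RE = re.compile(r"^(?P<base>.+?) v(?P<n>\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name to use for the next major revision."""
    path = Path(filename)
    match = VERSION_RE.match(path.stem)
    if match:
        base, number = match.group("base"), int(match.group("n")) + 1
    else:
        base, number = path.stem, 2  # an unversioned file counts as version 1
    return f"{base} v{number}{path.suffix}"


print(next_version_name("Landing Page.html"))
print(next_version_name("Landing Page v2.html"))
```

Writing the revision to the new name leaves the previous file untouched, which is what preserving prior versions requires.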
For repo implementation:
- follow the repo's actual stack
- use existing components and tokens where possible
- do not create a standalone artifact if the user asked for production code
## HTML / CSS / JS Standards
Use modern CSS well:
- CSS variables for tokens
- CSS grid for layout
- container queries when helpful
- `text-wrap: pretty` where supported
- real focus states
- real hover states
- `prefers-reduced-motion` handling for non-trivial motion
- responsive scaling
- semantic HTML where practical
Avoid:
- huge monolithic files when a real repo structure is expected
- fragile hard-coded viewport assumptions
- inaccessible tiny hit targets
- decorative JS that fights usability
- `scrollIntoView` unless there is no safer option
Mobile hit targets should be at least 44px.
For print documents, text should be at least 12pt.
For 1920×1080 slide decks, text should generally be 24px or larger.
## React Guidance for Standalone HTML
Use plain HTML/CSS/JS by default.
Use React only when:
- the artifact needs meaningful state
- variants/toggles are easier as components
- interaction complexity warrants it
- the target implementation is React/Next.js and fidelity matters
If using React from CDN in standalone HTML:
- pin exact versions
- avoid unpinned `react@18`-style URLs
- avoid `type="module"` unless necessary
- avoid multiple global objects named `styles`
- give global style objects specific names, e.g. `commandPaletteStyles`, `deckStyles`
- if splitting Babel scripts, explicitly attach shared components to `window`
If building inside a real repo, use the repo's package manager and component architecture instead.
## Deck Rules
For slide decks, use a fixed-size canvas and scale it to fit the viewport.
Default slide size: 1920×1080, 16:9.
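Fitting the fixed canvas to the viewport comes down to one ratio. A minimal sketch of the arithmetic, written in Python for illustration (the name `deck_scale` is hypothetical; in the artifact the same computation would typically run in the page script and feed a CSS transform):

```python
def deck_scale(viewport_w: float, viewport_h: float,
               slide_w: float = 1920, slide_h: float = 1080) -> float:
    """Largest uniform scale at which the whole slide fits inside the viewport."""
    if min(viewport_w, viewport_h, slide_w, slide_h) <= 0:
        raise ValueError("dimensions must be positive")
    # The tighter axis decides; using the smaller ratio avoids cropping.
    return min(viewport_w / slide_w, viewport_h / slide_h)


# A 1280x800 viewport is width-limited: 1280 / 1920 is smaller than 800 / 1080.
scale = deck_scale(1280, 800)
```

Taking the smaller of the two ratios keeps the whole slide visible and letterboxes the remaining axis.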
Requirements:
- keyboard navigation
- visible slide count
- localStorage persistence for current slide
- print-friendly layout when practical
- screen labels or stable IDs for important slides
- no speaker notes unless the user explicitly asks
Do not hand-wave a deck as markdown bullets. Create a designed artifact if asked for a deck.
Use 1-2 background colors max unless the brand system requires more.
Keep slides sparse. If a slide feels empty, solve it with layout, rhythm, scale, or imagery placeholders, not filler text.
## Prototype Rules
For interactive prototypes:
- make the primary path clickable
- include key states: default, hover/focus, loading, empty, error, success where relevant
- expose variations with in-page controls when useful
- keep controls out of the final composition unless they are intentionally part of the prototype
- persist important state in localStorage when refresh continuity matters
If the prototype is meant to model a product flow, design the flow, not just the first screen.
## Variation Rules
When exploring, default to at least three options:
1. **Conservative** — closest to existing patterns / lowest risk
2. **Strong-fit** — best interpretation of the brief
3. **Divergent** — more novel, useful for discovering taste boundaries
Variations can explore:
- layout
- hierarchy
- type scale
- density
- color posture
- surface treatment
- motion
- interaction model
- copy structure
- component shape
Do not create variations that are merely color swaps unless color is the actual question.
When the user picks a direction, consolidate. Do not leave the project as a pile of options forever.
## Tweakable Designs in CLI/API Mode
The hosted Claude Design edit-mode toolbar does not exist here.
Still preserve the idea: when useful, add in-page controls called `Tweaks`.
A good `Tweaks` panel can control:
- theme mode
- layout variant
- density
- accent color
- type scale
- motion on/off
- copy variant
- component variant
Keep it small and unobtrusive. The design should look final when tweaks are hidden.
Persist tweak values with localStorage when helpful.
## Content Discipline
Do not add filler content.
Every element must earn its place.
Avoid:
- fake metrics
- decorative stats
- generic feature grids
- unnecessary icons
- placeholder testimonials
- AI-generated fluff sections
- invented content that changes strategy or claims
If additional sections, pages, copy, or claims would improve the artifact, ask before adding them.
When copy is necessary but not final, mark it as draft or placeholder.
## Anti-Slop Rules
Avoid common AI design sludge:
- aggressive gradient backgrounds
- glassmorphism by default
- emoji unless the brand uses them
- generic SaaS cards with icons everywhere
- left-border accent callout cards
- fake dashboards filled with arbitrary numbers
- stock-photo hero sections
- oversized rounded rectangles as a substitute for hierarchy
- rainbow palettes
- vague labels like “Insights,” “Growth,” “Scale,” “Optimize” without content
- decorative SVG illustrations pretending to be product imagery
Minimal is not automatically good. Dense is not automatically cluttered. Choose intentionally.
## Typography
Use the existing type system if one exists.
If not, choose type deliberately based on the artifact:
- editorial: serif or humanist headline with restrained sans body
- software/productivity: precise sans with strong numeric treatment
- luxury/minimal: fewer weights, more spacing discipline
- technical: mono accents only, not mono everywhere
- deck: large, clear, high contrast
Avoid overused defaults when a stronger choice is appropriate.
If using web fonts, keep the number of families and weights low.
Use type as hierarchy before adding boxes, icons, or color.
## Color
Use brand/design-system colors first.
If no palette exists:
- define a small system
- include neutrals, surface, ink, muted text, border, accent, danger/success if needed
- use one primary accent unless the assignment calls for a broader palette
- prefer oklch for harmonious invented palettes when browser support is acceptable
- check contrast for important text and controls
Do not invent lots of colors from scratch.
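For the contrast check, one option is the WCAG 2.x contrast ratio, where 4.5:1 is the AA threshold for normal-size text. The sketch below assumes six-digit sRGB hex input; colors authored in oklch would need converting to sRGB first, which is not shown. The function names are illustrative:

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb' or 'rrggbb'."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors; the argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Running this over the ink, muted-text, and accent tokens against each surface color catches low-contrast pairs before they reach the artifact.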
## Layout and Composition
Design with rhythm:
- scale
- whitespace
- density
- alignment
- repetition
- contrast
- interruption
Avoid making every section the same card grid.
For product UIs, prioritize speed of comprehension over decoration.
For marketing surfaces, make one idea land per section.
For dashboards, avoid “data slop.” Only show data that helps the user decide or act.
## Motion
Use motion as discipline, not theater.
Good motion:
- clarifies state changes
- reduces anxiety during loading
- shows continuity between surfaces
- gives controls tactility
- stays subtle
Bad motion:
- loops without purpose
- delays the user
- calls attention to itself
- hides poor hierarchy
Respect `prefers-reduced-motion` for non-trivial animation.
## Images and Icons
Use real supplied imagery when available.
If an asset is missing:
- use a clean placeholder
- use typography, layout, or abstract texture instead
- ask for real material when fidelity matters
Do not draw elaborate fake SVG illustrations unless the assignment is explicitly illustration work.
Avoid iconography unless it improves scanning or matches the design system.
## Source-Code Fidelity
When recreating or extending a UI from a repo:
1. inspect the repo tree
2. identify the actual UI source files
3. read theme/token/global style/component files
4. lift exact values where appropriate
5. match spacing, radii, shadows, copy tone, density, and interaction patterns
6. only then design or modify
Do not build from memory when source files are available.
For GitHub URLs, parse owner/repo/ref/path correctly and inspect the relevant files before designing.
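A rough sketch of the URL parsing step, in Python. `GitHubLocation` and `parse_github_url` are hypothetical names, and the sketch assumes the ref is a single path segment; branch names containing `/` cannot be separated from the path without listing the repo's refs:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class GitHubLocation:
    owner: str
    repo: str
    kind: Optional[str] = None  # "tree" or "blob" when the URL names one
    ref: Optional[str] = None
    path: str = ""


def parse_github_url(url: str) -> GitHubLocation:
    """Split a github.com URL into owner, repo, ref, and path."""
    parsed = urlparse(url)
    if (parsed.hostname or "").lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a github.com URL: {url}")
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"missing owner/repo: {url}")
    owner, repo = parts[0], parts[1].removesuffix(".git")
    if len(parts) >= 4 and parts[2] in {"tree", "blob"}:
        # Assumption: parts[3] is the whole ref (see the note above).
        return GitHubLocation(owner, repo, parts[2], parts[3], "/".join(parts[4:]))
    return GitHubLocation(owner, repo)
```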
## Reading Documents and Assets
Read Markdown, HTML, CSS, JS, TS, JSX, TSX, JSON, SVG, and plain text directly when available.
For DOCX/PPTX/PDF, use available local extraction tools if present. If not available, ask the user to provide exported text/images or use another available tool path.
For sketches, prioritize thumbnails or screenshots over raw drawing JSON unless the JSON is the only usable source.
## Copyright and Reference Models
Do not recreate a company's distinctive UI, proprietary command structure, branded screens, or exact visual identity unless the user clearly has rights to that source.
It is acceptable to extract general design principles:
- density without clutter
- command-first interaction
- monochrome with one accent
- editorial hierarchy
- clear empty states
- strong keyboard affordances
It is not acceptable to clone proprietary layouts, copy exact branded surfaces, or reproduce copyrighted content.
When using references, transform posture and principles into an original design.
## Verification
Before final response, verify as much as the environment allows.
Minimum:
- file exists at the stated path
- HTML is saved completely
- obvious syntax issues are checked
Better:
- open in a browser tool and check console errors
- inspect screenshots at the primary viewport
- test key interactions
- test light/dark or variants if present
- test responsive breakpoints if relevant
If verification is limited by environment, say exactly what was and was not verified.
Never say “done” if the file was not actually written.
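The minimum checks can be scripted. The sketch below is one possible shape, not a prescribed tool: `verify_html_artifact` is a hypothetical helper, and treating a trailing `</html>` as evidence of a complete save is a heuristic assumption:

```python
from pathlib import Path


def verify_html_artifact(path: str) -> dict:
    """Minimum checks only: existence, non-empty content, closing </html> tag."""
    target = Path(path)
    report = {"path": str(target), "exists": target.is_file(),
              "non_empty": False, "closed": False}
    if report["exists"]:
        text = target.read_text(encoding="utf-8", errors="replace")
        report["non_empty"] = bool(text.strip())
        # A truncated write usually loses the end of the document.
        report["closed"] = text.rstrip().lower().endswith("</html>")
    report["ok"] = all(report[key] for key in ("exists", "non_empty", "closed"))
    return report
```

The returned dictionary doubles as the "what was and was not verified" statement: anything not in it (console errors, screenshots, interactions) was not checked.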
## Final Response Format
Keep final responses short.
Include:
- artifact path
- what it contains
- verification status
- next suggested action, if useful
Example:
```text
Created: /path/to/Prototype.html
It includes 3 layout variants, a Tweaks panel for density/theme, and responsive behavior.
Verified: file exists and opened cleanly in browser, no console errors.
Next: pick the strongest direction and I'll tighten copy + motion.
```
## Portable Opening Prompt Pattern
When adapting a Claude Design-style request into CLI/API mode, use this mental translation:
```text
You are running in CLI/API mode, not hosted Claude Design. Ignore references to hosted-only tools or preview panes. Produce complete local design artifacts, usually self-contained HTML with embedded CSS/JS, and verify with available local tools before returning. Preserve the design process: gather context, define the system, produce options, avoid filler, and meet a high visual bar.
```
## Pitfalls
- Do not paste hosted tool schemas into a skill. They cause fake tool calls.
- Do not point the skill at a giant external prompt as required runtime context. That creates drift.
- Do not strip the design doctrine while removing tool plumbing.
- Do not over-ask when the user already gave enough direction.
- Do not under-ask for high-fidelity work with no brand context.
- Do not produce generic SaaS layouts and call them designed.
- Do not claim browser verification unless it actually happened.
@@ -0,0 +1,606 @@
---
name: comfyui
description: "Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution."
version: 5.0.0
author: [kshitijk4poor, alt-glitch]
license: MIT
platforms: [macos, linux, windows]
compatibility: "Requires ComfyUI (local, Comfy Desktop, or Comfy Cloud) and comfy-cli (auto-installed via pipx/uvx by the setup script)."
prerequisites:
commands: ["python3"]
setup:
help: "Run scripts/hardware_check.py FIRST to decide local vs Comfy Cloud; then scripts/comfyui_setup.sh auto-installs locally (or use Cloud API key for platform.comfy.org)."
metadata:
hermes:
tags:
- comfyui
- image-generation
- stable-diffusion
- flux
- sd3
- wan-video
- hunyuan-video
- creative
- generative-ai
- video-generation
related_skills: [stable-diffusion-image-generation, image_gen]
category: creative
---
# ComfyUI
Generate images, video, audio, and 3D content through ComfyUI using the
official `comfy-cli` for setup/lifecycle and direct REST/WebSocket API
for workflow execution.
## What's in this skill
**Reference docs (`references/`):**
- `official-cli.md` — every `comfy ...` command, with flags
- `rest-api.md` — REST + WebSocket endpoints (local + cloud), payload schemas
- `workflow-format.md` — API-format JSON, common node types, param mapping
**Scripts (`scripts/`):**
| Script | Purpose |
|--------|---------|
| `_common.py` | Shared HTTP, cloud routing, node catalogs (don't run directly) |
| `hardware_check.py` | Probe GPU/VRAM/disk → recommend local vs Comfy Cloud |
| `comfyui_setup.sh` | Hardware check + comfy-cli + ComfyUI install + launch + verify |
| `extract_schema.py` | Read a workflow → list controllable params + model deps |
| `check_deps.py` | Check workflow against running server → list missing nodes/models |
| `auto_fix_deps.py` | Run check_deps then `comfy node install` / `comfy model download` |
| `run_workflow.py` | Inject params, submit, monitor, download outputs (HTTP or WS) |
| `run_batch.py` | Submit a workflow N times with sweeps, parallel up to your tier |
| `ws_monitor.py` | Real-time WebSocket viewer for executing jobs (live progress) |
| `health_check.py` | Verification checklist runner — comfy-cli + server + models + smoke test |
| `fetch_logs.py` | Pull traceback / status messages for a given prompt_id |
**Example workflows (`workflows/`):** SD 1.5, SDXL, Flux Dev, SDXL img2img,
SDXL inpaint, ESRGAN upscale, AnimateDiff video, Wan T2V. See
`workflows/README.md`.
## When to Use
- User asks to generate images with Stable Diffusion, SDXL, Flux, SD3, etc.
- User wants to run a specific ComfyUI workflow file
- User wants to chain generative steps (txt2img → upscale → face restore)
- User needs ControlNet, inpainting, img2img, or other advanced pipelines
- User asks to manage ComfyUI queue, check models, or install custom nodes
- User wants video/audio/3D generation via AnimateDiff, Hunyuan, Wan, AudioCraft, etc.
## Architecture: Two Layers
```
┌─────────────────────────────────────────────────────┐
│ Layer 1: comfy-cli (official lifecycle tool) │
│ Setup, server lifecycle, custom nodes, models │
│ → comfy install / launch / stop / node / model │
└─────────────────────────┬───────────────────────────┘
┌─────────────────────────▼───────────────────────────┐
│ Layer 2: REST/WebSocket API + skill scripts │
│ Workflow execution, param injection, monitoring │
│ POST /api/prompt, GET /api/view, WS /ws │
│ → run_workflow.py, run_batch.py, ws_monitor.py │
└─────────────────────────────────────────────────────┘
```
**Why two layers?** The official CLI is excellent for installation and server
management but has minimal workflow execution support. The REST/WS API fills
that gap — the scripts handle param injection, execution monitoring, and
output download that the CLI doesn't do.
## Quick Start
### Detect environment
```bash
# What's available?
command -v comfy >/dev/null 2>&1 && echo "comfy-cli: installed"
curl -s http://127.0.0.1:8188/system_stats 2>/dev/null && echo "server: running"
# Can this machine run ComfyUI locally? (GPU/VRAM/disk check)
python3 scripts/hardware_check.py
```
If nothing is installed, see **Setup & Onboarding** below — but always run the
hardware check first.
### One-line health check
```bash
python3 scripts/health_check.py
# → JSON: comfy_cli on PATH? server reachable? at least one checkpoint? smoke-test passes?
```
## Core Workflow
### Step 1: Get a workflow JSON in API format
Workflows must be in API format (each node has `class_type`). They come from:
- ComfyUI web UI → **Workflow → Export (API)** (newer UI) or
the legacy "Save (API Format)" button (older UI)
- This skill's `workflows/` directory (ready-to-run examples)
- Community downloads (civitai, Reddit, Discord) — usually editor format,
must be loaded into ComfyUI then re-exported
Editor format (top-level `nodes` and `links` arrays) is **not directly
executable**. The scripts detect this and tell you to re-export.
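The distinction between the two formats is easy to test for. A minimal sketch in Python, assuming only the two markers described above (`class_type` on every node versus top-level `nodes` and `links` arrays); `workflow_format` is a hypothetical name and is not taken from the skill's scripts:

```python
import json


def workflow_format(workflow: dict) -> str:
    """Classify a parsed workflow as "api", "editor", or "unknown"."""
    if isinstance(workflow.get("nodes"), list) and isinstance(workflow.get("links"), list):
        return "editor"  # top-level arrays: must be re-exported from the UI
    nodes = [value for value in workflow.values() if isinstance(value, dict)]
    if nodes and all("class_type" in node for node in nodes):
        return "api"  # every node entry names its class
    return "unknown"


# Typical use: parse the file contents first, then classify.
example = json.loads('{"3": {"class_type": "KSampler", "inputs": {"steps": 20}}}')
kind = workflow_format(example)
```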
### Step 2: See what's controllable
```bash
python3 scripts/extract_schema.py workflow_api.json --summary-only
# → {"parameter_count": 12, "has_negative_prompt": true, "has_seed": true, ...}
python3 scripts/extract_schema.py workflow_api.json
# → full schema with parameters, model deps, embedding refs
```
### Step 3: Run with parameters
```bash
# Local (defaults to http://127.0.0.1:8188)
python3 scripts/run_workflow.py \
--workflow workflow_api.json \
--args '{"prompt": "a beautiful sunset over mountains", "seed": -1, "steps": 30}' \
--output-dir ./outputs
# Cloud (export API key once; uses correct /api routing automatically)
export COMFY_CLOUD_API_KEY="comfyui-..."
python3 scripts/run_workflow.py \
--workflow workflow_api.json \
--args '{"prompt": "..."}' \
--host https://cloud.comfy.org \
--output-dir ./outputs
# Real-time progress via WebSocket (requires `pip install websocket-client`)
python3 scripts/run_workflow.py \
--workflow flux_dev.json \
--args '{"prompt": "..."}' \
--ws
# img2img / inpaint: pass --input-image to upload + reference automatically
python3 scripts/run_workflow.py \
--workflow sdxl_img2img.json \
--input-image image=./photo.png \
--args '{"prompt": "make it watercolor", "denoise": 0.6}'
# Batch / sweep: 8 random seeds, parallel up to cloud tier limit
python3 scripts/run_batch.py \
--workflow sdxl.json \
--args '{"prompt": "abstract"}' \
--count 8 --randomize-seed --parallel 3 \
--output-dir ./outputs/batch
```
`-1` for `seed` (or omitting it with `--randomize-seed`) generates a fresh
random seed per run.
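Conceptually, parameter injection writes the `--args` values into node inputs of the API-format workflow. The sketch below illustrates the idea only. `PARAM_MAP`, the node IDs, and the 32-bit seed range are assumptions made for the example; the skill's scripts derive their own mapping from the workflow and may behave differently:

```python
import copy
import random

# Hypothetical mapping from friendly names to (node_id, input_name).
PARAM_MAP = {"prompt": ("6", "text"), "seed": ("3", "seed"), "steps": ("3", "steps")}


def inject(workflow: dict, args: dict, param_map: dict = PARAM_MAP) -> dict:
    """Return a copy of an API-format workflow with `args` written into node inputs."""
    patched = copy.deepcopy(workflow)  # never mutate the caller's workflow
    for name, value in args.items():
        if name not in param_map:
            raise KeyError(f"unknown parameter: {name}")
        node_id, input_name = param_map[name]
        if name == "seed" and value == -1:
            # -1 means "pick a fresh seed"; the range here is illustrative.
            value = random.randint(0, 2**32 - 1)
        patched[node_id]["inputs"][input_name] = value
    return patched
```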
### Step 4: Present results
The scripts emit JSON to stdout describing every output file:
```json
{
"status": "success",
"prompt_id": "abc-123",
"outputs": [
{"file": "./outputs/sdxl_00001_.png", "node_id": "9",
"type": "image", "filename": "sdxl_00001_.png"}
]
}
```
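A caller can turn that JSON into a list of files to present. A small sketch, assuming the shape shown above; how failures are reported is not specified here, so the example treats any status other than `success` as an error, and `output_files` is a hypothetical helper:

```python
import json


def output_files(stdout_text: str) -> list:
    """Return output file paths from a result document, raising on failure."""
    result = json.loads(stdout_text)
    if result.get("status") != "success":
        raise RuntimeError(f"workflow did not succeed: {result.get('status')!r}")
    return [entry["file"] for entry in result.get("outputs", [])]
```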
## Decision Tree
| User says | Tool | Command |
|-----------|------|---------|
| **Lifecycle (use comfy-cli)** | | |
| "install ComfyUI" | comfy-cli | `bash scripts/comfyui_setup.sh` |
| "start ComfyUI" | comfy-cli | `comfy launch --background` |
| "stop ComfyUI" | comfy-cli | `comfy stop` |
| "install X node" | comfy-cli | `comfy node install <name>` |
| "download X model" | comfy-cli | `comfy model download --url <url> --relative-path models/checkpoints` |
| "list installed models" | comfy-cli | `comfy model list` |
| "list installed nodes" | comfy-cli | `comfy node show installed` |
| **Execution (use scripts)** | | |
| "is everything ready?" | script | `health_check.py` (optionally with `--workflow X --smoke-test`) |
| "what can I change in this workflow?" | script | `extract_schema.py W.json` |
| "check if W's deps are met" | script | `check_deps.py W.json` |
| "fix missing deps" | script | `auto_fix_deps.py W.json` |
| "generate an image" | script | `run_workflow.py --workflow W --args '{...}'` |
| "use this image" (img2img) | script | `run_workflow.py --input-image image=./x.png ...` |
| "8 variations with random seeds" | script | `run_batch.py --count 8 --randomize-seed ...` |
| "show me live progress" | script | `ws_monitor.py --prompt-id <id>` |
| "fetch the error from job X" | script | `fetch_logs.py <prompt_id>` |
| **Direct REST** | | |
| "what's in the queue?" | REST | `curl http://HOST:8188/queue` (local) or `curl -H "X-API-Key: $COMFY_CLOUD_API_KEY" https://cloud.comfy.org/api/queue` (cloud) |
| "cancel that" | REST | `curl -X POST http://HOST:8188/interrupt` |
| "free GPU memory" | REST | `curl -X POST http://HOST:8188/free` |
## Setup & Onboarding
When a user asks to set up ComfyUI, **the FIRST thing to do is ask whether
they want Comfy Cloud (hosted, zero install, API key) or Local (install
ComfyUI on their machine)**. Don't start running install commands or hardware
checks until they've answered.
**Official docs:** https://docs.comfy.org/installation
**CLI docs:** https://docs.comfy.org/comfy-cli/getting-started
**Cloud docs:** https://docs.comfy.org/get_started/cloud
**Cloud API:** https://docs.comfy.org/development/cloud/overview
### Step 0: Ask Local vs Cloud (ALWAYS FIRST)
Suggested script:
> "Do you want to run ComfyUI locally on your machine, or use Comfy Cloud?
>
> - **Comfy Cloud** — hosted on RTX 6000 Pro GPUs, all common models pre-installed,
> zero setup. Requires an API key (paid subscription required to actually run
> workflows; free tier is read-only). Best if you don't have a capable GPU.
> - **Local** — free, but your machine MUST meet the hardware requirements:
> - NVIDIA GPU with **≥6 GB VRAM** (≥8 GB for SDXL, ≥12 GB for Flux/video), OR
> - AMD GPU with ROCm support (Linux), OR
> - Apple Silicon Mac (M1+) with **≥16 GB unified memory** (≥32 GB recommended).
> - Intel Macs and machines with no GPU will NOT work — use Cloud instead.
>
> Which would you like?"
Routing:
- **Cloud** → skip to **Path A**.
- **Local** → run the hardware check first, then pick a path from Paths B-E based on the verdict.
- **Unsure** → run the hardware check and let the verdict decide.
### Step 1: Verify Hardware (ONLY if user chose local)
```bash
python3 scripts/hardware_check.py --json
# Optional: also probe `torch` for actual CUDA/MPS:
python3 scripts/hardware_check.py --json --check-pytorch
```
| Verdict | Meaning | Action |
|------------|---------------------------------------------------------------|--------|
| `ok` | ≥8 GB VRAM (discrete) OR ≥32 GB unified (Apple Silicon) | Local install — use `comfy_cli_flag` from report |
| `marginal` | SD1.5 works; SDXL tight; Flux/video unlikely | Local OK for light workflows, else **Path A (Cloud)** |
| `cloud` | No usable GPU, <6 GB VRAM, <16 GB Apple unified, Intel Mac, Rosetta Python | **Switch to Cloud** unless user explicitly forces local |
The script also surfaces `wsl: true` (WSL2 with NVIDIA passthrough) and
`rosetta: true` (x86_64 Python on Apple Silicon — must reinstall as ARM64).
If verdict is `cloud` but the user wants local, do not proceed silently.
Show the `notes` array verbatim and ask whether they want to (a) switch to
Cloud or (b) force a local install (will OOM or be unusably slow on modern models).
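The memory thresholds in the verdict table can be read as a small decision function. This is a rough sketch under stated assumptions: `classify_verdict` is a hypothetical name, it is not the logic inside `hardware_check.py`, and it ignores the Intel Mac, Rosetta, and WSL cases:

```python
from typing import Optional


def classify_verdict(vram_gb: Optional[float] = None,
                     unified_gb: Optional[float] = None) -> str:
    """Map memory size to "ok", "marginal", or "cloud" using the table's thresholds."""
    if unified_gb is not None:  # Apple Silicon: unified memory
        if unified_gb >= 32:
            return "ok"
        return "marginal" if unified_gb >= 16 else "cloud"
    if vram_gb is None:  # no usable GPU detected
        return "cloud"
    if vram_gb >= 8:
        return "ok"
    return "marginal" if vram_gb >= 6 else "cloud"
```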
### Choosing an Installation Path
Use the hardware check first. The table below is the fallback for when the
user has already told you their hardware:
| Situation | Recommended Path |
|-----------|------------------|
| `verdict: cloud` from hardware check | **Path A: Comfy Cloud** |
| No GPU / want to try without commitment | **Path A: Comfy Cloud** |
| Windows + NVIDIA + non-technical | **Path B: ComfyUI Desktop** |
| Windows + NVIDIA + technical | **Path C: Portable** or **Path D: comfy-cli** |
| Linux + any GPU | **Path D: comfy-cli** (easiest) |
| macOS + Apple Silicon | **Path B: Desktop** or **Path D: comfy-cli** |
| Headless / server / CI / agents | **Path D: comfy-cli** |
For the fully automated path (hardware check → install → launch → verify):
```bash
bash scripts/comfyui_setup.sh
# Or with overrides:
bash scripts/comfyui_setup.sh --m-series --port=8190 --workspace=/data/comfy
```
It runs `hardware_check.py` internally, refuses to install locally when the
verdict is `cloud` (unless `--force-cloud-override` is passed), picks the right
`comfy-cli` flag, and prefers `pipx`/`uvx` over global `pip` to avoid polluting
system Python.
---
### Path A: Comfy Cloud (No Local Install)
For users without a capable GPU or who want zero setup. Hosted on RTX 6000 Pro.
**Docs:** https://docs.comfy.org/get_started/cloud
1. Sign up at https://comfy.org/cloud
2. Generate an API key at https://platform.comfy.org/login
3. Set the key:
```bash
export COMFY_CLOUD_API_KEY="comfyui-xxxxxxxxxxxx"
```
4. Run workflows:
```bash
python3 scripts/run_workflow.py \
--workflow workflows/flux_dev_txt2img.json \
--args '{"prompt": "..."}' \
--host https://cloud.comfy.org \
--output-dir ./outputs
```
**Pricing:** https://www.comfy.org/cloud/pricing
**Concurrent jobs:** Free/Standard 1, Creator 3, Pro 5. Free tier
**cannot run workflows via API** — only browse models. Paid subscription
required for `/api/prompt`, `/api/upload/*`, `/api/view`, etc.
---
### Path B: ComfyUI Desktop (Windows / macOS)
One-click installer for non-technical users. Currently Beta.
**Docs:** https://docs.comfy.org/installation/desktop
- **Windows (NVIDIA):** https://download.comfy.org/windows/nsis/x64
- **macOS (Apple Silicon):** https://comfy.org
Linux is **not supported** for Desktop — use Path D.
---
### Path C: ComfyUI Portable (Windows Only)
**Docs:** https://docs.comfy.org/installation/comfyui_portable_windows
Download from https://github.com/comfyanonymous/ComfyUI/releases, extract,
run `run_nvidia_gpu.bat`. Update via `update/update_comfyui_stable.bat`.
---
### Path D: comfy-cli (All Platforms — Recommended for Agents)
The official CLI is the best path for headless/automated setups.
**Docs:** https://docs.comfy.org/comfy-cli/getting-started
#### Install comfy-cli
```bash
# Recommended:
pipx install comfy-cli
# Or use uvx without installing:
uvx --from comfy-cli comfy --help
# Or (if pipx/uvx unavailable):
pip install --user comfy-cli
```
Disable analytics non-interactively:
```bash
comfy --skip-prompt tracking disable
```
#### Install ComfyUI
```bash
comfy --skip-prompt install --nvidia # NVIDIA (CUDA)
comfy --skip-prompt install --amd # AMD (ROCm, Linux)
comfy --skip-prompt install --m-series # Apple Silicon (MPS)
comfy --skip-prompt install --cpu # CPU only (slow)
comfy --skip-prompt install --nvidia --fast-deps # uv-based dep resolution
```
Default location: `~/comfy/ComfyUI` (Linux), `~/Documents/comfy/ComfyUI`
(macOS/Win). Override with `comfy --workspace /custom/path install`.
#### Launch / verify
```bash
comfy launch --background # background daemon on :8188
comfy launch -- --listen 0.0.0.0 --port 8190 # LAN-accessible custom port
curl -s http://127.0.0.1:8188/system_stats # health check
```
---
### Path E: Manual Install (Advanced / Unsupported Hardware)
For Ascend NPU, Cambricon MLU, Intel Arc, or other unsupported hardware.
**Docs:** https://docs.comfy.org/installation/manual_install
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# The index URL below selects CUDA 13.0 wheels; swap in the PyTorch build for your hardware.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu130
pip install -r requirements.txt
python main.py
```
---
### Post-Install: Download Models
```bash
# SDXL (general purpose, ~6.5 GB)
comfy model download \
--url "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors" \
--relative-path models/checkpoints
# SD 1.5 (lighter, ~4 GB, good for 6 GB cards)
comfy model download \
--url "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors" \
--relative-path models/checkpoints
# Flux Dev fp8 (smaller variant, ~12 GB)
comfy model download \
--url "https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/flux1-dev-fp8.safetensors" \
--relative-path models/checkpoints
# CivitAI (set token first):
comfy model download \
--url "https://civitai.com/api/download/models/128713" \
--relative-path models/checkpoints \
--set-civitai-api-token "YOUR_TOKEN"
```
List installed: `comfy model list`.
### Post-Install: Install Custom Nodes
```bash
comfy node install comfyui-impact-pack # popular utility pack
comfy node install comfyui-animatediff-evolved # video generation
comfy node install comfyui-controlnet-aux # ControlNet preprocessors
comfy node install comfyui-essentials # common helpers
comfy node update all
comfy node install-deps --workflow=workflow.json # install everything a workflow needs
```
### Post-Install: Verify
```bash
python3 scripts/health_check.py
# → comfy_cli on PATH? server reachable? checkpoints? smoke test?
python3 scripts/check_deps.py my_workflow.json
# → are this workflow's nodes/models/embeddings installed?
python3 scripts/run_workflow.py \
--workflow workflows/sd15_txt2img.json \
--args '{"prompt": "test", "steps": 4}' \
--output-dir ./test-outputs
```
## Image Upload (img2img / Inpainting)
The simplest way is to use `--input-image` with `run_workflow.py`:
```bash
python3 scripts/run_workflow.py \
--workflow workflows/sdxl_img2img.json \
--input-image image=./photo.png \
--args '{"prompt": "make it cyberpunk", "denoise": 0.6}'
```
The flag uploads `photo.png`, then injects its server-side filename into
whatever schema parameter is named `image`. For inpainting, pass both:
```bash
python3 scripts/run_workflow.py \
--workflow workflows/sdxl_inpaint.json \
--input-image image=./photo.png \
--input-image mask_image=./mask.png \
--args '{"prompt": "fill with flowers"}'
```
Manual upload via REST:
```bash
curl -X POST "http://127.0.0.1:8188/upload/image" \
-F "image=@photo.png" -F "type=input" -F "overwrite=true"
# Returns: {"name": "photo.png", "subfolder": "", "type": "input"}
# Cloud equivalent:
curl -X POST "https://cloud.comfy.org/api/upload/image" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY" \
-F "image=@photo.png" -F "type=input" -F "overwrite=true"
```
## Cloud Specifics
- **Base URL:** `https://cloud.comfy.org`
- **Auth:** `X-API-Key` header (or `?token=KEY` for WebSocket)
- **API key:** set `$COMFY_CLOUD_API_KEY` once and the scripts pick it up automatically
- **Output download:** `/api/view` returns a 302 to a signed URL; the scripts
follow it and strip `X-API-Key` before fetching from the storage backend
(don't leak the API key to S3/CloudFront).
- **Endpoint differences from local ComfyUI:**
- `/api/object_info`, `/api/queue`, `/api/userdata` — **403 on free tier**;
paid only.
- `/history` is renamed to `/history_v2` on cloud (the scripts route
automatically).
- `/models/<folder>` is renamed to `/experiment/models/<folder>` on cloud
(the scripts route automatically).
- `clientId` in WebSocket is currently ignored — all connections for a
user receive the same broadcast. Filter by `prompt_id` client-side.
- `subfolder` is accepted on uploads but ignored — cloud has a flat namespace.
- **Concurrent jobs:** Free/Standard: 1, Creator: 3, Pro: 5. Extras queue
automatically. Use `run_batch.py --parallel N` to saturate your tier.
## Queue & System Management
```bash
# Local
curl -s http://127.0.0.1:8188/queue | python3 -m json.tool
curl -X POST http://127.0.0.1:8188/queue -d '{"clear": true}' # cancel pending
curl -X POST http://127.0.0.1:8188/interrupt # cancel running
curl -X POST http://127.0.0.1:8188/free \
-H "Content-Type: application/json" \
-d '{"unload_models": true, "free_memory": true}'
# Cloud — same paths under /api/, plus:
python3 scripts/fetch_logs.py --tail-queue --host https://cloud.comfy.org
```
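The same calls can be made from Python without extra dependencies. The sketch below mirrors the local `/free` request from the curl example using only the standard library; `free_memory_request` is a hypothetical helper, and it only builds the request so that it can be inspected before anything is sent:

```python
import json
import urllib.request


def free_memory_request(base_url: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build the POST /free request that unloads models and frees memory."""
    body = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/free",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# To send it against a running server:
# with urllib.request.urlopen(free_memory_request(), timeout=10) as response:
#     print(response.status)
```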
## Pitfalls
1. **API format required** — every script and the `/api/prompt` endpoint expect
API-format workflow JSON. The scripts detect editor format (top-level
`nodes` and `links` arrays) and tell you to re-export via
"Workflow → Export (API)" (newer UI) or "Save (API Format)" (older UI).
2. **Server must be running** — all execution requires a live server.
`comfy launch --background` starts one. Verify with
`curl http://127.0.0.1:8188/system_stats`.
3. **Model names are exact** — case-sensitive, includes file extension.
`check_deps.py` does fuzzy matching (with/without extension and folder
prefix), but the workflow itself must use the canonical name. Use
`comfy model list` to discover what's installed.
4. **Missing custom nodes** — "class_type not found" means a required node
isn't installed. `check_deps.py` reports which package to install;
`auto_fix_deps.py` runs the install for you.
5. **Working directory** — `comfy-cli` auto-detects the ComfyUI workspace.
If commands fail with "no workspace found", use
`comfy --workspace /path/to/ComfyUI <command>` or
`comfy set-default /path/to/ComfyUI`.
6. **Cloud free-tier API limits** — `/api/prompt`, `/api/view`, `/api/upload/*`,
`/api/object_info` all return 403 on free accounts. `health_check.py` and
`check_deps.py` handle this gracefully and surface a clear message.
7. **Timeout for video/audio workflows** — auto-detected when an output node
is `VHS_VideoCombine`, `SaveVideo`, etc.; the default jumps from 300 s to
900 s. Override explicitly with `--timeout 1800`.
8. **Path traversal in output filenames** — server-supplied filenames are
passed through `safe_path_join` to refuse anything escaping `--output-dir`.
Keep this protection on — workflows with custom save nodes can produce
arbitrary paths.
9. **Workflow JSON is arbitrary code** — custom nodes run Python, so
submitting an unknown workflow has the same trust profile as `eval`.
Inspect workflows from untrusted sources before running.
10. **Auto-randomized seed** — pass `seed: -1` in `--args` (or use
`--randomize-seed` and omit the seed) to get a fresh seed per run.
The actual seed is logged to stderr.
11. **`tracking` prompt** — first run of `comfy` may prompt for analytics.
Use `comfy --skip-prompt tracking disable` to skip non-interactively.
`comfyui_setup.sh` does this for you.
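
The path-traversal check in pitfall 8 can be sketched roughly as follows. This is not the skill's actual `safe_path_join`; the name `safe_join_sketch` and the choice of `ValueError` are assumptions made for illustration.

```python
from pathlib import Path


def safe_join_sketch(output_dir: str, filename: str) -> Path:
    """Join filename onto output_dir, refusing results outside output_dir.

    Hypothetical helper for illustration; not the skill's safe_path_join.
    """
    base = Path(output_dir).resolve()
    candidate = (base / filename).resolve()
    # is_relative_to (Python 3.9+) is True only when candidate is base or below it.
    if not candidate.is_relative_to(base):
        raise ValueError(f"refusing path outside output dir: {filename!r}")
    return candidate
```

Resolving both paths before comparing them means `..` segments and absolute filenames are rejected by the same check.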
## Verification Checklist
Run `python3 scripts/health_check.py` to check the whole list at once, or work through it manually:
- [ ] `hardware_check.py` verdict is `ok` OR the user explicitly chose Comfy Cloud
- [ ] `comfy --version` works (or `uvx --from comfy-cli comfy --help`)
- [ ] `curl http://HOST:PORT/system_stats` returns JSON
- [ ] `comfy model list` shows at least one checkpoint (local) OR
`/api/experiment/models/checkpoints` returns models (cloud)
- [ ] Workflow JSON is in API format
- [ ] `check_deps.py` reports `is_ready: true` (or only `node_check_skipped`
on cloud free tier)
- [ ] Test run with a small workflow completes; outputs land in `--output-dir`
@@ -0,0 +1,255 @@
# comfy-cli Command Reference
Official CLI from [Comfy-Org/comfy-cli](https://github.com/Comfy-Org/comfy-cli).
Docs: https://docs.comfy.org/comfy-cli/getting-started
## Installation
Order of preference:
```bash
pipx install comfy-cli # recommended (isolated env)
uvx --from comfy-cli comfy --help # zero-install via uv
pip install --user comfy-cli # fallback
```
The skill's `comfyui_setup.sh` picks the best available method.
First run may prompt for analytics. Disable non-interactively:
```bash
comfy --skip-prompt tracking disable
```
## Global Options
| Option | Description |
|--------|-------------|
| `--workspace <path>` | Target a specific ComfyUI workspace |
| `--recent` | Use most recently used workspace |
| `--here` | Use current directory as workspace |
| `--skip-prompt` | No interactive prompts (use defaults) |
| `-v` / `--version` | Print version |
Workspace resolution priority:
1. `--workspace` (explicit path)
2. `--recent` (from config)
3. `--here` (cwd)
4. `comfy set-default` path
5. Most recently used
6. `~/comfy/ComfyUI` (Linux) or `~/Documents/comfy/ComfyUI` (macOS/Win)
## Lifecycle Commands
### `comfy install`
Download and install ComfyUI + ComfyUI-Manager.
```bash
comfy install # interactive GPU selection
comfy install --nvidia
comfy install --amd # ROCm (Linux)
comfy install --m-series # Apple Silicon (MPS)
comfy install --cpu # CPU only (slow)
comfy install --fast-deps # use uv for deps
comfy install --skip-manager # skip ComfyUI-Manager
```
| Option | Description |
|--------|-------------|
| `--nvidia` / `--amd` / `--m-series` / `--cpu` | GPU type |
| `--cuda-version` | 11.8, 12.1, 12.4, 12.6, 12.8, 12.9, 13.0 |
| `--rocm-version` | 6.1, 6.2, 6.3, 7.0, 7.1 |
| `--fast-deps` | uv-based dependency resolution |
| `--skip-manager` | Don't install ComfyUI-Manager |
| `--skip-torch-or-directml` | Skip PyTorch install |
| `--version <ver>` | `0.2.0`, `latest`, `nightly` |
| `--commit <hash>` | Install specific commit |
| `--pr "#1234"` | Install from a PR |
| `--restore` | Restore deps for existing install |
### `comfy launch`
```bash
comfy launch # foreground :8188
comfy launch --background # background daemon
comfy launch -- --listen 0.0.0.0 # LAN-accessible
comfy launch -- --port 8190 # custom port
comfy launch -- --cpu # force CPU mode
comfy launch -- --lowvram # 6 GB cards
comfy launch --background -- --listen 0.0.0.0 --port 8190
```
Common extra args after `--`: `--listen`, `--port`, `--cpu`, `--lowvram`,
`--novram`, `--fp16-vae`, `--force-fp32`, `--disable-cuda-malloc`.
### `comfy stop`
```bash
comfy stop
```
### `comfy run`
Submit a raw workflow JSON to a running server. **Limited** — no parameter
injection, no structured output download. For agents, use
`scripts/run_workflow.py` instead.
```bash
comfy run --workflow workflow_api.json
comfy run --workflow workflow_api.json --host 10.0.0.5 --port 8188
comfy run --workflow workflow_api.json --timeout 300 --wait
```
### `comfy which`
```bash
comfy which # show targeted workspace
comfy --recent which
```
### `comfy set-default`
```bash
comfy set-default /path/to/ComfyUI
comfy set-default /path/to/ComfyUI --launch-extras="--listen 0.0.0.0"
```
### `comfy update`
```bash
comfy update # update ComfyUI core
comfy node update all # update all custom nodes
```
---
## `comfy node` — Custom Node Management
All node operations use ComfyUI-Manager (`cm-cli`) under the hood.
```bash
comfy node show installed # list installed
comfy node show enabled # list enabled
comfy node show all # all available in registry
comfy node simple-show installed # compact list
comfy node install comfyui-impact-pack
comfy node install <name> --uv-compile # ComfyUI-Manager v4.1+ unified resolver
comfy node uninstall <name>
comfy node update <name> | all
comfy node enable <name>
comfy node disable <name>
comfy node fix <name> # fix broken deps
comfy node install-deps --workflow=workflow.json
comfy node deps-in-workflow --workflow=w.json --output=deps.json
comfy node save-snapshot
comfy node restore-snapshot <file>
comfy node bisect start # binary-search a culprit node
comfy node bisect good
comfy node bisect bad
comfy node bisect reset
```
### Dependency Resolution Options
| Flag | Description |
|------|-------------|
| `--fast-deps` | comfy-cli built-in uv resolver |
| `--uv-compile` | ComfyUI-Manager v4.1+ unified resolver (recommended) |
| `--no-deps` | Skip dep installation |
Make `uv-compile` default: `comfy manager uv-compile-default true`
---
## `comfy model` — Model Management
```bash
comfy model list
comfy model list --relative-path models/checkpoints
comfy model download --url <URL>
comfy model download --url <URL> --relative-path models/loras
comfy model download --url <URL> --filename custom_name.safetensors
comfy model remove # interactive
comfy model remove --relative-path models/checkpoints --model-names "model.safetensors"
```
| Option | Description |
|--------|-------------|
| `--url` | Download URL (CivitAI, HuggingFace, direct) |
| `--relative-path` | Subdirectory under workspace (e.g. `models/checkpoints`) |
| `--filename` | Custom save filename |
| `--set-civitai-api-token` | Persist CivitAI token |
| `--set-hf-api-token` | Persist HuggingFace token |
| `--downloader` | `httpx` (default) or `aria2` |
Standard model directories:
```
ComfyUI/models/
├── checkpoints/ # Full model files
├── loras/ # LoRA adapters
├── vae/ # VAE models
├── controlnet/ # ControlNet models
├── clip/ # CLIP / T5 text encoders
├── clip_vision/ # CLIP vision encoders
├── upscale_models/ # ESRGAN / SwinIR / etc.
├── embeddings/ # Textual inversion embeddings
├── unet/ # Standalone UNet weights
├── diffusion_models/ # Flux / SD3 / Wan diffusion models
├── animatediff_models/ # AnimateDiff motion modules
├── ipadapter/ # IPAdapter weights
└── style_models/ # Style adapters
```
---
## `comfy manager` — ComfyUI-Manager Settings
```bash
comfy manager disable # disable Manager completely
comfy manager enable-gui # enable new GUI
comfy manager disable-gui # API-only
comfy manager enable-legacy-gui # legacy GUI
comfy manager uv-compile-default true # make --uv-compile the default
comfy manager clear # clear startup action
```
---
## `comfy pr-cache` — Frontend PR Cache
```bash
comfy pr-cache list
comfy pr-cache clean
comfy pr-cache clean 456
```
Cache expires after 7 days; max 10 builds.
---
## Configuration
| OS | Path |
|----|------|
| Linux | `~/.config/comfy-cli/config.ini` |
| macOS | `~/Library/Application Support/comfy-cli/config.ini` |
| Windows | `~/AppData/Local/comfy-cli/config.ini` |
Stores: default workspace, recent workspace, background server PID, API
tokens, manager GUI mode, launch extras.
## Discovery
Custom-node registry:
- https://registry.comfy.org/
Model browsers:
- https://huggingface.co/models
- https://civitai.com (NSFW; requires API token for many)
- https://comfyworkflows.com (community workflows)
@@ -0,0 +1,312 @@
# ComfyUI REST + WebSocket API Reference
ComfyUI exposes a REST + WebSocket interface for workflow execution and
management. **The same surface is used locally and on Comfy Cloud, with
auth/path differences.**
## Connection
| | Local ComfyUI | Comfy Cloud |
|---|---|---|
| Base URL | `http://127.0.0.1:8188` | `https://cloud.comfy.org` |
| API path prefix | none (`/prompt`, `/view`, …) | `/api/...` (`/api/prompt`, `/api/view`, …) |
| Auth | none (or bearer token if configured) | `X-API-Key` header |
| WebSocket | `ws://host:port/ws?clientId={uuid}` | `wss://cloud.comfy.org/ws?clientId={uuid}&token={API_KEY}` |
| `/api/view` response | direct bytes | 302 redirect → signed URL (use `curl -L`) |
The skill scripts route URLs automatically via `_common.resolve_url()`.
## Endpoint differences on Comfy Cloud
The cloud surface diverges from local ComfyUI in several ways. The skill
scripts handle these differences transparently; they are listed here for
anyone calling `curl` directly.
| Local path | Cloud path | Notes |
|------------|-----------|-------|
| `/system_stats` | `/api/system_stats` | Cloud version is **public** (no auth required) |
| `/object_info` | `/api/object_info` | **Paid tier only** — free returns 403 |
| `/queue` | `/api/queue` | Paid tier only |
| `/userdata` | `/api/userdata` | Paid tier only |
| `/prompt` (POST) | `/api/prompt` (POST) | Paid tier only |
| `/upload/image` | `/api/upload/image` | Paid tier only; `subfolder` accepted but ignored |
| `/upload/mask` | `/api/upload/mask` | Same as above |
| `/view` | `/api/view` | Paid tier only; **returns 302** to signed URL |
| `/history` | `/api/history_v2` | **Renamed**; old path returns 404 |
| `/history/{id}` | `/api/history_v2/{id}` or `/api/jobs/{id}` | Both work; `/jobs` returns full job |
| `/models` | `/api/experiment/models` | **Renamed** |
| `/models/{folder}` | `/api/experiment/models/{folder}` | **Renamed**; response shape differs (see below) |
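
The routing described by the table could look something like the sketch below. It is not the skill's `_common.resolve_url()`: the function name, the hostname-based cloud detection, and the two-entry rename map are assumptions covering only the renamed rows above.

```python
from urllib.parse import urlparse

# Local path -> cloud path for endpoints that are renamed, not just prefixed.
# Illustrative subset taken from the table; not the skill's own mapping.
CLOUD_RENAMES = {
    "/history": "/api/history_v2",
    "/models": "/api/experiment/models",
}


def resolve_url_sketch(host: str, local_path: str) -> str:
    """Return the full URL for local_path on host (hypothetical helper)."""
    host = host.rstrip("/")
    # Assumption: cloud is detected purely by hostname.
    is_cloud = urlparse(host).hostname == "cloud.comfy.org"
    if not is_cloud:
        return host + local_path
    for old, new in CLOUD_RENAMES.items():
        # Match the exact path or a sub-path such as /history/{id}.
        if local_path == old or local_path.startswith(old + "/"):
            return host + new + local_path[len(old):]
    return host + "/api" + local_path
```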
### Cloud model-list response shape
- **Local:** `["a.safetensors", "b.safetensors", …]` — flat list of strings.
- **Cloud:** `[{"name": "a.safetensors", "pathIndex": 0}, …]` — list of objects.
- **Cloud 404 with `code: "folder_not_found"`** — folder is empty or unknown,
not an "endpoint missing" error. Distinguish by reading the body.
The skill helper `_common.parse_model_list()` normalizes both.
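
A minimal sketch of that normalization, assuming only the two shapes listed above. The function name is hypothetical and the real `parse_model_list()` may handle more cases.

```python
from typing import Any


def model_names_sketch(payload: Any) -> list[str]:
    """Flatten a local or cloud model-list response into plain filenames."""
    names: list[str] = []
    for item in payload if isinstance(payload, list) else []:
        if isinstance(item, str):
            # Local shape: "a.safetensors"
            names.append(item)
        elif isinstance(item, dict) and isinstance(item.get("name"), str):
            # Cloud shape: {"name": "a.safetensors", "pathIndex": 0}
            names.append(item["name"])
    return names
```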
## Workflow Execution
### Submit Workflow
```bash
# Local
curl -X POST "http://127.0.0.1:8188/prompt" \
-H "Content-Type: application/json" \
-d '{"prompt": '"$(cat workflow_api.json)"', "client_id": "'"$(uuidgen)"'"}'
# Cloud
curl -X POST "https://cloud.comfy.org/api/prompt" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": '"$(cat workflow_api.json)"'}'
```
**Response:**
```json
{"prompt_id": "abc-123-def", "number": 1, "node_errors": {}}
```
If `node_errors` is non-empty, the workflow has validation errors (missing
nodes, bad inputs).
### Check Job Status (Cloud)
```bash
curl -X GET "https://cloud.comfy.org/api/job/{prompt_id}/status" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY"
```
| Status | Description |
| ------------- | ---------------------------------- |
| `pending` | Job is queued and waiting to start |
| `in_progress` | Job is currently executing |
| `completed` | Job finished successfully |
| `failed` | Job encountered an error |
| `cancelled` | Job was cancelled by user |
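
One way to poll these states is sketched below. The HTTP call is left to a caller-supplied `fetch_status` function, so nothing here depends on a particular client; the function name, default timeout, and polling interval are assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job_sketch(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.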
### Job detail with outputs (Cloud)
```bash
curl -X GET "https://cloud.comfy.org/api/jobs/{prompt_id}" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY"
```
Response includes `outputs` keyed by node ID. Cloud uses `video` (singular)
in the output structure; local uses `videos` (plural). The skill scripts
accept both.
### Get History (Local)
```bash
curl -s "http://127.0.0.1:8188/history" # all
curl -s "http://127.0.0.1:8188/history/{id}" # one prompt_id
```
Local entry shape:
```json
{
"<prompt_id>": {
"prompt": [...],
"outputs": {"<node_id>": {"images": [...]}},
"status": {
"status_str": "success" | "error",
"completed": true | false,
"messages": [["execution_start", {...}], ["execution_error", {...}], ...]
}
}
}
```
**Important:** when reading status, check `status_str == "error"` BEFORE
checking `completed`, because both can be true for failed runs.
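
The check order can be sketched like this, assuming the local entry shape shown above. The function name and the three return labels are illustrative.

```python
def run_outcome_sketch(entry: dict) -> str:
    """Return "error", "success", or "running" for one local history entry."""
    status = entry.get("status") or {}
    # Check the error string first: failed runs can also report completed=True.
    if status.get("status_str") == "error":
        return "error"
    if status.get("completed"):
        return "success"
    return "running"
```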
### Download Output
```bash
# Local (direct bytes)
curl -s "http://127.0.0.1:8188/view?filename=ComfyUI_00001_.png&subfolder=&type=output" \
-o output.png
# Cloud (302 → signed URL; -L follows; STRIP X-API-Key for the second hop)
curl -L "https://cloud.comfy.org/api/view?filename=...&type=output" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY" \
-o output.png
```
The skill's `run_workflow.py` strips `X-API-Key` automatically on the
cross-host redirect, so the host serving the signed URL never receives your
API key.
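
The header-stripping rule can be expressed as a small pure function. This is a sketch of the idea rather than the code in `run_workflow.py`; the function name and the hostname comparison are assumptions.

```python
from urllib.parse import urlparse


def headers_for_redirect_sketch(
    original_url: str, redirect_url: str, headers: dict[str, str]
) -> dict[str, str]:
    """Drop X-API-Key when the redirect target is on a different host."""
    same_host = urlparse(original_url).hostname == urlparse(redirect_url).hostname
    if same_host:
        return dict(headers)
    # HTTP header names are case-insensitive, so compare in lowercase.
    return {k: v for k, v in headers.items() if k.lower() != "x-api-key"}
```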
## WebSocket Monitoring
Connect for real-time execution events.
```bash
# Local
wscat -c "ws://127.0.0.1:8188/ws?clientId=MY-UUID"
# Cloud
wscat -c "wss://cloud.comfy.org/ws?clientId=MY-UUID&token=$COMFY_CLOUD_API_KEY"
```
**Note:** on Cloud the `clientId` is currently ignored — all messages for a
user are broadcast to every connection. Filter messages client-side by
`data.prompt_id`.
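
Client-side filtering might look like the sketch below. It assumes each JSON text frame has the envelope `{"type": ..., "data": {...}}`; the function name is hypothetical, and dropping messages that carry no `prompt_id` is a design choice of this sketch.

```python
from __future__ import annotations

import json


def message_for_prompt_sketch(raw: str, prompt_id: str) -> dict | None:
    """Parse a JSON text frame; return it only if it belongs to prompt_id."""
    msg = json.loads(raw)
    data = msg.get("data") or {}
    # Assumption: messages with no prompt_id (e.g. queue "status") are dropped.
    if data.get("prompt_id") != prompt_id:
        return None
    return msg
```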
### JSON Message Types
| Type | When | Key Fields |
|------|------|------------|
| `status` | Queue change | `status.exec_info.queue_remaining` |
| `notification` | User-friendly status string | `value` |
| `execution_start` | Workflow begins | `prompt_id` |
| `executing` | Node running (or end-of-run if `node` is null on local) | `node`, `prompt_id` |
| `progress` | Sampling steps | `node`, `value`, `max` |
| `progress_state` | Extended progress with per-node metadata | `nodes` (dict) |
| `executed` | Node output ready | `node`, `output` (with `images`/`video`/etc.) |
| `execution_cached` | Nodes skipped because of cache | `nodes` (list of IDs) |
| `execution_success` | All done | `prompt_id` |
| `execution_error` | Failure | `exception_type`, `exception_message`, `traceback`, `node_id` |
| `execution_interrupted` | Cancelled | `prompt_id` |
### Binary Frames (Preview Images)
| Type code | Meaning |
|-----------|---------|
| `0x00000001` | `PREVIEW_IMAGE`: `[type:4][image_type:4][data]` (image_type 1=JPEG, 2=PNG) |
| `0x00000003` | `TEXT`: `[type:4][nid_len:4][nid][text]` (UTF-8) |
| `0x00000004` | `PREVIEW_IMAGE_WITH_METADATA`: `[type:4][meta_len:4][json][image_data]` |
`scripts/ws_monitor.py --previews <dir>` saves preview frames to disk.
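
Decoding the simplest frame type (`0x00000001`) could be sketched as follows. The layout comes from the table above; the big-endian byte order and the function name are assumptions, so verify them against real frames before relying on this.

```python
from __future__ import annotations

import struct

IMAGE_TYPES = {1: "jpeg", 2: "png"}


def parse_preview_frame_sketch(frame: bytes) -> tuple[str, bytes] | None:
    """Decode a type-1 PREVIEW_IMAGE frame into (format, image_bytes)."""
    if len(frame) < 8:
        return None
    # Assumption: both 4-byte header fields are big-endian unsigned integers.
    frame_type, image_type = struct.unpack(">II", frame[:8])
    if frame_type != 1:
        return None  # TEXT and metadata frames are not handled in this sketch
    return IMAGE_TYPES.get(image_type, "unknown"), frame[8:]
```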
## File Upload
```bash
# Image
curl -X POST "http://127.0.0.1:8188/upload/image" \
-F "image=@photo.png" -F "type=input" -F "overwrite=true"
# Returns: {"name": "photo.png", "subfolder": "", "type": "input"}
# Mask (linked to a previously uploaded image)
curl -X POST "http://127.0.0.1:8188/upload/mask" \
-F "image=@mask.png" -F "type=input" \
-F 'original_ref={"filename":"photo.png","subfolder":"","type":"input"}'
```
Cloud equivalent: prepend `https://cloud.comfy.org/api` and add `-H "X-API-Key: $COMFY_CLOUD_API_KEY"`.
## Node & Model Discovery
```bash
# All node types and their input specs
curl -s "http://127.0.0.1:8188/object_info" | python3 -m json.tool
# Specific node
curl -s "http://127.0.0.1:8188/object_info/KSampler"
# Models per folder (local)
curl -s "http://127.0.0.1:8188/models/checkpoints"
curl -s "http://127.0.0.1:8188/models/loras"
# Models per folder (cloud — note the experimental prefix)
curl -s "https://cloud.comfy.org/api/experiment/models/checkpoints" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY"
```
## Queue Management
```bash
# View queue
curl -s "http://127.0.0.1:8188/queue"
# Clear all pending
curl -X POST "http://127.0.0.1:8188/queue" \
-H "Content-Type: application/json" \
-d '{"clear": true}'
# Delete specific items
curl -X POST "http://127.0.0.1:8188/queue" \
-H "Content-Type: application/json" \
-d '{"delete": ["prompt_id_1", "prompt_id_2"]}'
# Cancel currently-running job
curl -X POST "http://127.0.0.1:8188/interrupt"
```
## System Management
```bash
# Stats (VRAM, RAM, GPU, ComfyUI version)
curl -s "http://127.0.0.1:8188/system_stats"
# Free GPU memory
curl -X POST "http://127.0.0.1:8188/free" \
-H "Content-Type: application/json" \
-d '{"unload_models": true, "free_memory": true}'
```
## ComfyUI-Manager Endpoints (Optional)
These require ComfyUI-Manager installed. Useful for installing nodes/models
via the API instead of `comfy-cli`.
```bash
# Install a custom node from a git URL
curl -X POST "http://127.0.0.1:8188/manager/queue/install" \
-H "Content-Type: application/json" \
-d '{"git_url": "https://github.com/user/comfyui-node.git"}'
# Check install queue status
curl -s "http://127.0.0.1:8188/manager/queue/status"
# Install model
curl -X POST "http://127.0.0.1:8188/manager/queue/install_model" \
-H "Content-Type: application/json" \
-d '{"url": "https://...", "path": "models/checkpoints", "filename": "model.safetensors"}'
```
## POST /prompt Payload Format
```json
{
"prompt": {
"3": {
"class_type": "KSampler",
"inputs": {
"seed": 42,
"steps": 20,
"cfg": 7.5,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
}
}
},
"client_id": "unique-uuid-for-ws-filtering",
"extra_data": {
"api_key_comfy_org": "optional-PARTNER-NODE-key (NOT the cloud auth key)"
}
}
```
- `prompt`: workflow graph in API format
- `client_id`: UUID — local server uses it to filter WebSocket events; cloud
ignores it.
- `extra_data.api_key_comfy_org`: ONLY required when the workflow uses
partner nodes (Flux Pro, Ideogram, etc.). Don't conflate with `X-API-Key`.
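
Assembling the payload in code might look like this sketch. The function and parameter names are hypothetical; only the three top-level keys come from the format above.

```python
from __future__ import annotations

import uuid


def build_prompt_payload_sketch(
    workflow: dict, client_id: str | None = None, partner_key: str | None = None
) -> dict:
    """Build a POST /prompt body from an API-format workflow graph."""
    payload: dict = {
        "prompt": workflow,
        "client_id": client_id or str(uuid.uuid4()),
    }
    # extra_data is added only for partner-node workflows; the key is not X-API-Key.
    if partner_key:
        payload["extra_data"] = {"api_key_comfy_org": partner_key}
    return payload
```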
## Error Categories (cloud `execution_error` `exception_type`)
| Type | Meaning |
|------|---------|
| `ValidationError` | Bad workflow / inputs (often nicer to surface from `node_errors`) |
| `ModelDownloadError` | Required model not available |
| `ImageDownloadError` | Failed to fetch input image from URL |
| `OOMError` | Out of GPU memory |
| `InsufficientFundsError` | Account balance too low (partner nodes) |
| `InactiveSubscriptionError` | Subscription not active |
@@ -0,0 +1,226 @@
# ComfyUI Workflow JSON Format
## Two Formats — Only API Format Is Executable
**API format** is required for `/api/prompt` and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
**cannot** be submitted directly.
### API Format
Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
"3": {
"class_type": "KSampler",
"inputs": {
"seed": 156680208700286,
"steps": 20,
"cfg": 8,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
},
"_meta": {"title": "KSampler"}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
}
}
```
**Detection:** every top-level value has `class_type`. The skill's
`_common.is_api_format()` does this check.
### Editor Format (not directly executable)
Has `nodes[]` and `links[]` arrays — the visual graph. To convert: open in
ComfyUI's web UI and use **Workflow → Export (API)** (newer UI) or the
"Save (API Format)" button (older UI).
**Detection:** top-level has `"nodes"` and `"links"` keys.
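
Both detection rules fit in a few lines. This is a sketch under the rules stated above, not the skill's `_common.is_api_format()`; the function name and the `"unknown"` fallback are assumptions.

```python
def detect_format_sketch(workflow: dict) -> str:
    """Return "editor", "api", or "unknown" for a parsed workflow JSON object."""
    # Editor format: the visual graph lives in top-level nodes/links arrays.
    if "nodes" in workflow and "links" in workflow:
        return "editor"
    values = list(workflow.values())
    # API format: every top-level value is a node dict with a class_type.
    if values and all(isinstance(v, dict) and "class_type" in v for v in values):
        return "api"
    return "unknown"
```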
## Inputs: Literals vs Links
```json
"inputs": {
"text": "a cat", // literal — modifiable
"seed": 42, // literal — modifiable
"clip": ["4", 1] // link — wiring; do NOT overwrite
}
```
Links are length-2 arrays of `[upstream_node_id, output_slot]`. The skill's
parameter injector refuses to overwrite a link with a literal (logs a
warning and skips).
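
The link guard can be sketched as below. The helper names are hypothetical, and the sketch assumes node IDs are strings and output slots are integers, as in the examples in this document.

```python
def is_link(value: object) -> bool:
    """True for a [upstream_node_id, output_slot] wiring reference."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )


def set_literal_sketch(workflow: dict, node_id: str, field: str, value: object) -> bool:
    """Set a literal input; return False (and change nothing) if it is a link."""
    inputs = workflow[node_id]["inputs"]
    if is_link(inputs.get(field)):
        return False
    inputs[field] = value
    return True
```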
## Common Node Types and Their Controllable Parameters
The full catalog lives in `scripts/_common.py` (`PARAM_PATTERNS` and
`MODEL_LOADERS`). Highlights:
### Text Prompts
| Node Class | Key Fields |
|------------|------------|
| `CLIPTextEncode` | `text` |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
| `CLIPTextEncodeFlux` | `clip_l`, `t5xxl`, `guidance` |
To distinguish positive from negative prompts, the skill traces
`KSampler.negative` back through Reroute / Primitive nodes to the source
`CLIPTextEncode`. Otherwise it falls back to `_meta.title` heuristics
("negative", "neg", "anti").
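
Link tracing of this kind could be sketched as follows. It is a simplified illustration, not the skill's implementation: the function name is hypothetical, and it assumes every pass-through node has exactly one link-valued input to follow.

```python
from __future__ import annotations


def trace_source_sketch(
    workflow: dict, node_id: str, field: str, max_hops: int = 32
) -> str | None:
    """Follow links upstream from node_id.field to a CLIPTextEncode node ID."""
    link = workflow[node_id]["inputs"].get(field)
    for _ in range(max_hops):  # the hop limit guards against cyclic graphs
        if not (isinstance(link, list) and len(link) == 2):
            return None
        upstream_id = str(link[0])
        node = workflow.get(upstream_id)
        if node is None:
            return None
        if node["class_type"] == "CLIPTextEncode":
            return upstream_id
        # Assumption: a pass-through node has a single link-valued input.
        links = [
            v for v in node.get("inputs", {}).values()
            if isinstance(v, list) and len(v) == 2
        ]
        if len(links) != 1:
            return None
        link = links[0]
    return None
```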
### Sampling
| Node Class | Key Fields |
|------------|------------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `noise_seed`, `cfg`, `sampler`, `sigmas` |
| `SamplerCustomAdvanced` | `noise_seed` (via RandomNoise input) |
| `RandomNoise` | `noise_seed` |
| `BasicScheduler` | `steps`, `scheduler`, `denoise` |
| `KSamplerSelect` | `sampler_name` |
| `CFGGuider` | `cfg` |
| `BasicGuider` | (no controllable fields) |
| `ModelSamplingFlux` | `max_shift`, `base_shift`, `width`, `height` |
| `SDTurboScheduler` | `steps`, `denoise` |
### Latent / Dimensions
| Node Class | Key Fields |
|------------|------------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `EmptySD3LatentImage` | `width`, `height`, `batch_size` |
| `EmptyHunyuanLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyMochiLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyLTXVLatentVideo` | `width`, `height`, `length`, `batch_size` |
### Model Loading
| Node Class | Key Fields | Folder |
|------------|------------|--------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `LoraLoaderModelOnly` | `lora_name`, `strength_model` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `DualCLIPLoader` | `clip_name1`, `clip_name2` | `clip` |
| `TripleCLIPLoader` | `clip_name1/2/3` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
| `IPAdapterModelLoader` | `ipadapter_file` | `ipadapter` |
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` | `animatediff_models` |
### Image Input/Output
| Node Class | Key Fields |
|------------|------------|
| `LoadImage` | `image` (server-side filename, after upload) |
| `LoadImageMask` | `image`, `channel` (`red` / `green` / `blue` / `alpha`) |
| `VAEEncode` / `VAEDecode` | (no controllable fields) |
| `VAEEncodeForInpaint` | `grow_mask_by` |
| `SaveImage` | `filename_prefix` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong` |
### ControlNet
| Node Class | Key Fields |
|------------|------------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### IPAdapter (community pack `comfyui_ipadapter_plus`)
| Node Class | Key Fields |
|------------|------------|
| `IPAdapterAdvanced` | `weight`, `start_at`, `end_at` |
| `IPAdapter` | `weight` |
### Embeddings (referenced inside prompt strings)
ComfyUI scans prompt text for `embedding:NAME` syntax. The skill's
`_common.iter_embedding_refs()` extracts these as model dependencies.
```text
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
```
`extract_schema.py` and `check_deps.py` surface these in
`embedding_dependencies` / `missing_embeddings`.
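
Extracting the references can be approximated with a regular expression. This sketch is not `_common.iter_embedding_refs()`; the function name and the set of characters allowed in an embedding name are assumptions.

```python
import re

# Assumption: names use letters, digits, "_", "-", "." and path separators.
# A ":" ends the name, so a trailing weight such as ":1.2" is not captured.
EMBEDDING_RE = re.compile(r"embedding:([\w./\\-]+)")


def embedding_refs_sketch(prompt: str) -> list[str]:
    """Return embedding names referenced in a prompt string, in order."""
    # Strip a trailing "." so sentence punctuation is not taken as part of the name.
    return [m.group(1).rstrip(".") for m in EMBEDDING_RE.finditer(prompt)]
```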
## Parameter Injection Pattern
```python
import json, copy
with open("workflow_api.json") as f:
workflow = json.load(f)
wf = copy.deepcopy(workflow)
wf["6"]["inputs"]["text"] = "a beautiful sunset"
wf["7"]["inputs"]["text"] = "ugly, blurry"
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```
`scripts/extract_schema.py` automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a `parameters` dict
that `run_workflow.py` reads to inject values from `--args`.
## Identifying Controllable Parameters (Heuristics)
For unknown workflows:
1. **Prompt text**: any `CLIPTextEncode.text`. Use connection tracing back
from `KSampler.positive` / `.negative` to disambiguate (don't trust
meta-title alone).
2. **Seed**: `KSampler.seed` / `KSamplerAdvanced.noise_seed` / `RandomNoise.noise_seed`.
3. **Dimensions**: `Empty*LatentImage.width/height` (must be multiples of 8).
4. **Steps / CFG**: `KSampler.steps`, `KSampler.cfg`. Steps 20–50 typical.
CFG 5–15 typical (Flux uses guidance, not CFG).
5. **Model / checkpoint**: `CheckpointLoaderSimple.ckpt_name`. Filename must
match an installed file *exactly*.
6. **LoRA**: `LoraLoader.lora_name`, `.strength_model`.
7. **Images for img2img / inpaint**: `LoadImage.image`. Server-side filename
after upload.
8. **Denoise**: `KSampler.denoise`. Range 0.0–1.0; 1.0 = ignore input image,
0.0 = pass through. Sweet spot for img2img: 0.4–0.7.
## Output Nodes
Output is produced by these node types. The skill's `OUTPUT_NODES` set
extends to common community packs.
| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `PreviewImage` | `images` | Temporary preview (not saved) |
| `VHS_VideoCombine` | `gifs` (older) or `videos`/`video` (newer cloud) | Video file refs |
| `SaveAudio` | `audio` | Audio file refs |
| `SaveAnimatedWEBP` / `SaveAnimatedPNG` | `images` | Animated images |
| `Save3D` | `3d` | 3D asset refs |
After execution, fetch outputs from `/history/{prompt_id}` (local) or
`/api/jobs/{prompt_id}` (cloud) → `outputs` → `{node_id}` → `{key}`.
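
Walking that structure to collect file references might look like the sketch below. The function name is hypothetical, and treating a value as either a list or a single object is an assumption made to cover both the plural and singular keys.

```python
def collect_output_files_sketch(outputs: dict) -> list[dict]:
    """Flatten outputs[node_id][key] into a list of file-reference dicts."""
    files: list[dict] = []
    for node_id, node_output in outputs.items():
        for key, value in node_output.items():
            # Assumption: a value may be a list of refs or one ref object.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and "filename" in item:
                    files.append({"node_id": node_id, "key": key, **item})
    return files
```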
## Wrapper Variants
Some saved JSON files wrap the workflow under a `"prompt"` key (matching
the `/api/prompt` payload shape). The skill's `_common.unwrap_workflow()`
handles this — pass any of:
- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`
It rejects editor format with a clear error and a re-export instruction.
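
The unwrapping logic can be sketched as follows. This is not `_common.unwrap_workflow()`; the function name, the error type, and the message text are assumptions.

```python
def unwrap_workflow_sketch(data: dict) -> dict:
    """Return the API-format node graph from a raw or wrapped workflow file."""
    if "nodes" in data and "links" in data:
        raise ValueError("editor format: re-export the workflow in API format")
    inner = data.get("prompt")
    # A wrapped file keeps the node graph under "prompt". The class_type check
    # avoids mistaking a node that happens to be keyed "prompt" for a wrapper.
    if isinstance(inner, dict) and "class_type" not in inner:
        return inner
    return data
```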
@@ -0,0 +1,835 @@
"""
_common.py — Shared logic for ComfyUI skill scripts.
Single source of truth for:
- HTTP transport (with retry/backoff, streaming, timeout handling)
- Cloud detection and endpoint mapping (local ComfyUI vs Comfy Cloud)
- Workflow node-type catalogs (param patterns, model loaders, output nodes)
- API-format validation
- Path-traversal-safe file writes
- API-key loading from env / CLI
Stdlib-only by design (with optional `requests` upgrade if installed). Python 3.10+.
"""
from __future__ import annotations
import json
import os
import random
import re
import sys
import time
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Iterator
from urllib.parse import urlparse
# Optional: prefer `requests` if installed (better redirects, streaming, header handling)
try:
    import requests  # type: ignore[import-not-found]
    HAS_REQUESTS = True
except ImportError:  # pragma: no cover - exercised via stdlib fallback
    HAS_REQUESTS = False
import urllib.error
import urllib.request
# =============================================================================
# Constants & catalogs
# =============================================================================
DEFAULT_LOCAL_HOST = "http://127.0.0.1:8188"
DEFAULT_CLOUD_HOST = "https://cloud.comfy.org"
ENV_API_KEY = "COMFY_CLOUD_API_KEY"
# Connection / retry defaults
DEFAULT_HTTP_TIMEOUT = 60 # seconds — single-attempt request timeout
DEFAULT_RETRIES = 3 # total attempts including the first
RETRY_BASE_DELAY = 1.0 # seconds — exponential backoff base
RETRY_MAX_DELAY = 30.0 # seconds — cap on backoff
RETRY_STATUS_CODES = {408, 429, 500, 502, 503, 504, 522, 524}
# Streaming download chunk size (bytes)
DOWNLOAD_CHUNK_SIZE = 1 << 16 # 64 KiB
# Heuristic: workflows with these node types tend to be slow → larger default timeout
SLOW_OUTPUT_NODES = {
"VHS_VideoCombine", "SaveAnimatedWEBP", "SaveAnimatedPNG",
"SaveVideo", "SaveAudio", "SaveAnimateDiffVideo",
"SVD_img2vid_Conditioning",
"WanVideoSampler", "HunyuanVideoSampler",
"CogVideoSampler", "LTXVideoSampler",
}
# ---------------------------------------------------------------------------
# Output node catalog (extensible — community packs add their own)
# ---------------------------------------------------------------------------
OUTPUT_NODES: set[str] = {
# Built-in
"SaveImage", "PreviewImage",
"SaveAudio", "SaveVideo", "PreviewAudio", "PreviewVideo",
"SaveAnimatedWEBP", "SaveAnimatedPNG",
# Common community packs
"VHS_VideoCombine", # Video Helper Suite
"ImageSave", # Was Node Suite
"Image Save", # Was Node Suite (alt name)
"easy imageSave", # easy-use
"Image Save With Metadata",
"PreviewImage|pysssss", # pysssss preview
"ShowText|pysssss",
"SaveLatent",
"SaveGLB", # 3D
"Save3D",
}
# ---------------------------------------------------------------------------
# Folder aliases — handle ComfyUI's gradual folder renames
# ---------------------------------------------------------------------------
# When `check_deps.py` queries `/models/<folder>` and gets 404 / empty,
# it tries each alias in turn. Critical for Comfy Cloud which has fully
# migrated to the new naming (unet → diffusion_models, clip → text_encoders).
FOLDER_ALIASES: dict[str, list[str]] = {
"unet": ["unet", "diffusion_models"],
"diffusion_models": ["diffusion_models", "unet"],
"clip": ["clip", "text_encoders"],
"text_encoders": ["text_encoders", "clip"],
"controlnet": ["controlnet", "control_net"],
}
def folder_aliases_for(folder: str) -> list[str]:
    """Return the search order of folder names (primary first)."""
    return FOLDER_ALIASES.get(folder, [folder])
# ---------------------------------------------------------------------------
# Model-loader catalog: class_type -> (input field, model folder)
# ---------------------------------------------------------------------------
# A loader can have multiple fields (e.g., DualCLIPLoader has clip_name1 and
# clip_name2). We list them with explicit entries. The folder name is the
# *canonical* one; FOLDER_ALIASES is consulted when querying.
MODEL_LOADERS: dict[str, list[tuple[str, str]]] = {
# Checkpoints
"CheckpointLoaderSimple": [("ckpt_name", "checkpoints")],
"CheckpointLoader": [("ckpt_name", "checkpoints")],
"CheckpointLoader (Simple)": [("ckpt_name", "checkpoints")],
"ImageOnlyCheckpointLoader": [("ckpt_name", "checkpoints")],
"unCLIPCheckpointLoader": [("ckpt_name", "checkpoints")],
# LoRA
"LoraLoader": [("lora_name", "loras")],
"LoraLoaderModelOnly": [("lora_name", "loras")],
"LoraLoaderTagsQuery": [("lora_name", "loras")],
# VAE
"VAELoader": [("vae_name", "vae")],
# ControlNet
"ControlNetLoader": [("control_net_name", "controlnet")],
"DiffControlNetLoader": [("control_net_name", "controlnet")],
"ControlNetLoaderAdvanced": [("control_net_name", "controlnet")],
# CLIP / text encoders (primary "clip" folder; check_deps tries text_encoders too)
"CLIPLoader": [("clip_name", "clip")],
"DualCLIPLoader": [("clip_name1", "clip"), ("clip_name2", "clip")],
"TripleCLIPLoader": [("clip_name1", "clip"), ("clip_name2", "clip"), ("clip_name3", "clip")],
"CLIPVisionLoader": [("clip_name", "clip_vision")],
# UNET / Diffusion model (primary "unet"; check_deps tries diffusion_models too)
"UNETLoader": [("unet_name", "unet")],
"DiffusionModelLoader": [("model_name", "diffusion_models")],
"UNETLoaderGGUF": [("unet_name", "unet")],
# Upscaler
"UpscaleModelLoader": [("model_name", "upscale_models")],
# Style / GLIGEN / Hypernetwork
"StyleModelLoader": [("style_model_name", "style_models")],
"GLIGENLoader": [("gligen_name", "gligen")],
"HypernetworkLoader": [("hypernetwork_name", "hypernetworks")],
# IPAdapter family (community).
# Note: IPAdapterUnifiedLoader's `preset` and IPAdapterInsightFaceLoader's
# `provider` are enums (not file paths), so they're intentionally omitted —
# check_deps would otherwise treat enum values as missing model files.
"IPAdapterModelLoader": [("ipadapter_file", "ipadapter")],
"InstantIDModelLoader": [("instantid_file", "instantid")],
# AnimateDiff / video
"ADE_LoadAnimateDiffModel": [("model_name", "animatediff_models")],
"ADE_AnimateDiffLoaderWithContext": [("model_name", "animatediff_models")],
"ADE_AnimateDiffLoaderGen1": [("model_name", "animatediff_models")],
# Photomaker
"PhotoMakerLoader": [("photomaker_model_name", "photomaker")],
# Sampler / scheduler models
"ModelSamplingFlux": [], # parametric only
}
# ---------------------------------------------------------------------------
# Param patterns: (class_type, field_name) -> friendly_name
# Order matters — first match wins for naming. Use _meta.title for disambiguation.
# ---------------------------------------------------------------------------
PARAM_PATTERNS: list[tuple[str, str, str]] = [
# ---- Prompts ----
("CLIPTextEncode", "text", "prompt"),
("CLIPTextEncodeSDXL", "text_g", "prompt"),
("CLIPTextEncodeSDXL", "text_l", "prompt_l"),
("CLIPTextEncodeSDXLRefiner", "text", "refiner_prompt"),
("CLIPTextEncodeFlux", "clip_l", "prompt_l"),
("CLIPTextEncodeFlux", "t5xxl", "prompt"),
("CLIPTextEncodeFlux", "guidance", "guidance"),
("smZ CLIPTextEncode", "text", "prompt"),
("BNK_CLIPTextEncodeAdvanced", "text", "prompt"),
# ---- Standard sampling ----
("KSampler", "seed", "seed"),
("KSampler", "steps", "steps"),
("KSampler", "cfg", "cfg"),
("KSampler", "sampler_name", "sampler_name"),
("KSampler", "scheduler", "scheduler"),
("KSampler", "denoise", "denoise"),
("KSamplerAdvanced", "noise_seed", "seed"),
("KSamplerAdvanced", "steps", "steps"),
("KSamplerAdvanced", "cfg", "cfg"),
("KSamplerAdvanced", "sampler_name", "sampler_name"),
("KSamplerAdvanced", "scheduler", "scheduler"),
("KSamplerAdvanced", "start_at_step", "start_at_step"),
("KSamplerAdvanced", "end_at_step", "end_at_step"),
# ---- Modern sampler chain (Flux / SD3 / SDXL refiner via SamplerCustom) ----
("RandomNoise", "noise_seed", "seed"),
("BasicScheduler", "steps", "steps"),
("BasicScheduler", "scheduler", "scheduler"),
("BasicScheduler", "denoise", "denoise"),
("KSamplerSelect", "sampler_name", "sampler_name"),
# NB: BasicGuider has no cfg input (it just bundles model+conditioning).
("CFGGuider", "cfg", "cfg"),
("DualCFGGuider", "cfg_conds", "cfg"),
("DualCFGGuider", "cfg_cond2_negative", "cfg_negative"),
("ModelSamplingFlux", "max_shift", "max_shift"),
("ModelSamplingFlux", "base_shift", "base_shift"),
("ModelSamplingFlux", "width", "model_width"),
("ModelSamplingFlux", "height", "model_height"),
("ModelSamplingSD3", "shift", "shift"),
("ModelSamplingDiscrete", "sampling", "sampling"),
("SDTurboScheduler", "steps", "steps"),
("SDTurboScheduler", "denoise", "denoise"),
("SamplerCustom", "noise_seed", "seed"),
("SamplerCustom", "cfg", "cfg"),
# NB: SamplerCustomAdvanced takes a NOISE input (from RandomNoise) — no seed field directly.
# ---- Dimensions / latent ----
("EmptyLatentImage", "width", "width"),
("EmptyLatentImage", "height", "height"),
("EmptyLatentImage", "batch_size", "batch_size"),
("EmptySD3LatentImage", "width", "width"),
("EmptySD3LatentImage", "height", "height"),
("EmptySD3LatentImage", "batch_size", "batch_size"),
("EmptyHunyuanLatentVideo", "width", "width"),
("EmptyHunyuanLatentVideo", "height", "height"),
("EmptyHunyuanLatentVideo", "length", "length"),
("EmptyHunyuanLatentVideo", "batch_size", "batch_size"),
("EmptyMochiLatentVideo", "width", "width"),
("EmptyMochiLatentVideo", "height", "height"),
("EmptyMochiLatentVideo", "length", "length"),
("EmptyLTXVLatentVideo", "width", "width"),
("EmptyLTXVLatentVideo", "height", "height"),
("EmptyLTXVLatentVideo", "length", "length"),
("LatentUpscale", "width", "upscale_width"),
("LatentUpscale", "height", "upscale_height"),
("LatentUpscaleBy", "scale_by", "scale_by"),
("ImageScale", "width", "width"),
("ImageScale", "height", "height"),
# ---- Image input ----
("LoadImage", "image", "image"),
("LoadImageMask", "image", "mask_image"),
("LoadImageOutput", "image", "image"),
("VHS_LoadVideo", "video", "video"),
("VHS_LoadAudio", "audio", "audio"),
# ---- Model selection (sometimes useful to swap per run) ----
("CheckpointLoaderSimple", "ckpt_name", "ckpt_name"),
("CheckpointLoader", "ckpt_name", "ckpt_name"),
("ImageOnlyCheckpointLoader", "ckpt_name", "ckpt_name"),
("VAELoader", "vae_name", "vae_name"),
("UNETLoader", "unet_name", "unet_name"),
("DiffusionModelLoader", "model_name", "diffusion_model_name"),
("UpscaleModelLoader", "model_name", "upscale_model_name"),
("CLIPLoader", "clip_name", "clip_name"),
("DualCLIPLoader", "clip_name1", "clip_name1"),
("DualCLIPLoader", "clip_name2", "clip_name2"),
("ControlNetLoader", "control_net_name", "controlnet_name"),
# ---- LoRA ----
("LoraLoader", "lora_name", "lora_name"),
("LoraLoader", "strength_model", "lora_strength"),
("LoraLoader", "strength_clip", "lora_strength_clip"),
("LoraLoaderModelOnly", "lora_name", "lora_name"),
("LoraLoaderModelOnly", "strength_model", "lora_strength"),
# ---- ControlNet ----
("ControlNetApply", "strength", "controlnet_strength"),
("ControlNetApplyAdvanced", "strength", "controlnet_strength"),
("ControlNetApplyAdvanced", "start_percent", "controlnet_start"),
("ControlNetApplyAdvanced", "end_percent", "controlnet_end"),
# ---- IPAdapter ----
("IPAdapterAdvanced", "weight", "ipadapter_weight"),
("IPAdapterAdvanced", "start_at", "ipadapter_start"),
("IPAdapterAdvanced", "end_at", "ipadapter_end"),
("IPAdapter", "weight", "ipadapter_weight"),
# ---- Upscale ----
("ImageUpscaleWithModel", "upscale_method", "upscale_method"),
# ---- AnimateDiff ----
("ADE_AnimateDiffLoaderWithContext", "motion_scale", "motion_scale"),
("ADE_AnimateDiffLoaderGen1", "motion_scale", "motion_scale"),
# ---- Video / Save ----
("VHS_VideoCombine", "frame_rate", "frame_rate"),
("VHS_VideoCombine", "format", "video_format"),
("VHS_VideoCombine", "filename_prefix", "filename_prefix"),
("SaveImage", "filename_prefix", "filename_prefix"),
# ---- Hunyuan / Wan / LTX video ----
("HunyuanVideoSampler", "seed", "seed"),
("HunyuanVideoSampler", "steps", "steps"),
("HunyuanVideoSampler", "cfg", "cfg"),
("WanVideoSampler", "seed", "seed"),
("WanVideoSampler", "steps", "steps"),
("WanVideoSampler", "cfg", "cfg"),
("LTXVScheduler", "max_shift", "max_shift"),
("LTXVScheduler", "base_shift", "base_shift"),
# ---- rgthree primitives (often used as user-facing inputs) ----
("Seed (rgthree)", "seed", "seed"),
("Image Comparer (rgthree)", "image_a", "image"),
("Power Lora Loader (rgthree)", "PowerLoraLoaderHeaderWidget", "_lora_header"),
# ---- Easy-use / utility primitives ----
("PrimitiveNode", "value", "primitive_value"),
("easy seed", "seed", "seed"),
("easy positive", "positive", "prompt"),
("easy negative", "negative", "negative_prompt"),
("easy fullLoader", "ckpt_name", "ckpt_name"),
("easy fullLoader", "vae_name", "vae_name"),
("easy fullLoader", "lora_name", "lora_name"),
("easy fullLoader", "positive", "prompt"),
("easy fullLoader", "negative", "negative_prompt"),
]
# Prompt-like fields whose value should be scanned for embedding references
PROMPT_FIELDS = {"text", "text_g", "text_l", "t5xxl", "clip_l", "positive", "negative"}
# Pattern matches: embedding:name, embedding:name.pt, embedding:name:1.2, (embedding:name:1.2)
# The leading delimiter group (start of string, whitespace, comma, "(" or "[")
# avoids matching things like "no_embedding:foo".
EMBEDDING_REGEX = re.compile(
r"(?:^|[\s,(\[])embedding\s*:\s*([A-Za-z0-9_\-\./\\]+?)(?:\.(?:pt|safetensors|bin))?(?=[\s:,)\(\]]|$)",
re.IGNORECASE,
)
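# Illustrative sketch (an addition, not part of the original module): how a
# pattern of this shape behaves on a sample prompt. The pattern is repeated
# under a hypothetical name (_EXAMPLE_EMBEDDING_RE) so the snippet runs on its
# own; the prompt text and embedding names are made up.
import re

_EXAMPLE_EMBEDDING_RE = re.compile(
    r"(?:^|[\s,(\[])embedding\s*:\s*([A-Za-z0-9_\-\./\\]+?)(?:\.(?:pt|safetensors|bin))?(?=[\s:,)\(\]]|$)",
    re.IGNORECASE,
)


def _example_embedding_names(prompt: str) -> list[str]:
    """Return the bare embedding names found in `prompt` (extension and weight dropped)."""
    return [m.group(1) for m in _EXAMPLE_EMBEDDING_RE.finditer(prompt)]


# "no_embedding:foo" is skipped because "_" is not one of the leading delimiters.
_EXAMPLE_PROMPT = "a cat, embedding:EasyNegative.pt, (embedding:bad-hands:1.2), no_embedding:foo"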
# =============================================================================
# Cloud detection & endpoint routing
# =============================================================================
CLOUD_DOMAIN_SUFFIXES = (".comfy.org",)
CLOUD_DOMAIN_EXACT = {"cloud.comfy.org"}
def is_cloud_host(host: str) -> bool:
"""True if the host points at Comfy Cloud (or staging/preview subdomain)."""
parsed = urlparse(host if "://" in host else f"http://{host}")
hostname = (parsed.hostname or "").lower()
if hostname in CLOUD_DOMAIN_EXACT:
return True
return any(hostname.endswith(s) for s in CLOUD_DOMAIN_SUFFIXES)
def build_cloud_aware_url(base: str, path: str, *, force_cloud: bool | None = None) -> str:
"""Build a URL that adds /api prefix when targeting Comfy Cloud.
Local ComfyUI accepts both `/foo` and `/api/foo` for many endpoints.
Cloud requires `/api/foo`.
`path` should be a path component (e.g. "/prompt") or full path with query
(e.g. "/view?filename=x").
"""
base = base.rstrip("/")
cloud = is_cloud_host(base) if force_cloud is None else force_cloud
if not path.startswith("/"):
path = "/" + path
if cloud and not path.startswith("/api/"):
path = "/api" + path
return base + path
def cloud_endpoint(path: str) -> str:
"""Map a cloud endpoint path to its current canonical form.
Handles known renames documented in the Comfy Cloud API:
/history -> /history_v2
/models/<f> -> /experiment/models/<f>
/models -> /experiment/models
"""
if path.startswith("/history") and not path.startswith("/history_v2"):
return "/history_v2" + path[len("/history"):]
if path.startswith("/models/"):
return "/experiment/models/" + path[len("/models/"):]
if path == "/models":
return "/experiment/models"
return path
def resolve_url(base: str, path: str, *, is_cloud: bool | None = None) -> str:
"""Top-level URL resolver. Applies cloud rename + /api prefix as needed."""
cloud = is_cloud_host(base) if is_cloud is None else is_cloud
if cloud:
path = cloud_endpoint(path)
return build_cloud_aware_url(base, path, force_cloud=cloud)
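# Illustrative sketch (an addition, not part of the original module): the
# rename-then-prefix order applied to cloud targets, collapsed into one
# hypothetical helper so it can be run in isolation. Only the /history rename
# is reproduced here; the /models renames would follow the same pattern.
def _example_cloud_path(path: str) -> str:
    """Apply the /history rename first, then add the /api prefix if it is missing."""
    if path.startswith("/history") and not path.startswith("/history_v2"):
        path = "/history_v2" + path[len("/history"):]
    if not path.startswith("/api/"):
        path = "/api" + path
    return path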
# =============================================================================
# API key resolution
# =============================================================================
def resolve_api_key(explicit: str | None) -> str | None:
"""Look up API key from CLI flag → env var. Strips whitespace and quotes."""
val = explicit if explicit else os.environ.get(ENV_API_KEY)
if val is None:
return None
val = val.strip().strip("'\"")
return val or None
# =============================================================================
# HTTP transport
# =============================================================================
@dataclass
class HTTPResponse:
status: int
headers: dict[str, str]
body: bytes
url: str # final URL after redirects
def text(self, encoding: str = "utf-8") -> str:
return self.body.decode(encoding, errors="replace")
def json(self) -> Any:
return json.loads(self.body.decode("utf-8", errors="replace"))
def _sleep_backoff(attempt: int, base: float = RETRY_BASE_DELAY, cap: float = RETRY_MAX_DELAY) -> None:
"""Sleep with full-jitter exponential backoff."""
delay = min(cap, base * (2 ** attempt))
delay = random.uniform(0, delay)
time.sleep(delay)
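# Illustrative sketch (an addition, not part of the original module): the upper
# bound that full-jitter backoff draws from on each attempt. The defaults
# base=0.5 and cap=8.0 are assumptions chosen for the example, not the real
# RETRY_BASE_DELAY / RETRY_MAX_DELAY values.
def _example_backoff_ceilings(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Return min(cap, base * 2**attempt) for attempt = 0 .. attempts - 1.

    The actual sleep is a uniform random value between 0 and this ceiling.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(attempts)]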
def http_request(
method: str,
url: str,
*,
headers: dict[str, str] | None = None,
json_body: Any = None,
data: bytes | None = None,
files: dict | None = None,
form: dict | None = None,
timeout: float = DEFAULT_HTTP_TIMEOUT,
follow_redirects: bool = True,
retries: int = DEFAULT_RETRIES,
stream: bool = False,
sink: Path | None = None,
) -> HTTPResponse:
"""Single entry point for all HTTP traffic.
Behavior:
- Retries on connection errors and on HTTP statuses in RETRY_STATUS_CODES,
with exponential backoff + jitter.
- For cross-host redirects, drops Authorization-style headers (so signed
URLs don't leak the API key to S3/CloudFront).
- When `stream=True` and `sink` is a Path, streams the response body to
disk in 64 KiB chunks instead of buffering.
Either `json_body`, `data`, or `files`+`form` may be supplied (mutually exclusive).
"""
if headers is None:
headers = {}
headers = dict(headers) # copy
headers.setdefault("User-Agent", "hermes-comfyui-skill/5.0")
if files is not None or form is not None:
# Multipart upload — needs `requests`. The stdlib fallback lacks
# multipart encoding helpers; raise a clear error.
if not HAS_REQUESTS:
raise RuntimeError(
"Multipart upload requires the `requests` package. "
"Install with: pip install requests"
)
last_exc: Exception | None = None
for attempt in range(retries):
try:
resp = _http_once(
method=method, url=url, headers=headers,
json_body=json_body, data=data, files=files, form=form,
timeout=timeout, follow_redirects=follow_redirects,
stream=stream, sink=sink,
)
if resp.status in RETRY_STATUS_CODES and attempt + 1 < retries:
_sleep_backoff(attempt)
continue
return resp
except (TimeoutError, ConnectionError, OSError) as e:
last_exc = e
if attempt + 1 < retries:
_sleep_backoff(attempt)
continue
raise
# Reached only when retries <= 0, so the loop body never ran.
if last_exc:
raise last_exc
raise RuntimeError("http_request: retries exhausted with no response")
_SENSITIVE_HEADERS = ("x-api-key", "authorization", "cookie")
if HAS_REQUESTS:
class _StripSensitiveOnRedirectSession(requests.Session):
"""Session that drops sensitive headers on cross-host redirects.
`requests` already strips `Authorization` cross-host (rebuild_auth),
but it does NOT strip custom headers like `X-API-Key`. We override
`rebuild_auth` to additionally strip every header in
`_SENSITIVE_HEADERS` when the destination is a different host —
critical when ComfyUI Cloud's `/api/view` redirects to a signed S3 URL.
"""
def rebuild_auth(self, prepared_request, response): # type: ignore[override]
super().rebuild_auth(prepared_request, response)
try:
old_url = response.request.url
new_url = prepared_request.url
old_host = (urlparse(old_url).hostname or "").lower()
new_host = (urlparse(new_url).hostname or "").lower()
if old_host and new_host and old_host != new_host:
headers = prepared_request.headers
for key in list(headers.keys()):
if key.lower() in _SENSITIVE_HEADERS:
del headers[key]
except Exception:
# Defensive: never let header stripping break a redirect.
pass
def _http_once(
*, method: str, url: str, headers: dict[str, str],
json_body: Any, data: bytes | None, files: dict | None, form: dict | None,
timeout: float, follow_redirects: bool,
stream: bool, sink: Path | None,
) -> HTTPResponse:
"""One HTTP attempt. No retry."""
if HAS_REQUESTS:
kwargs: dict[str, Any] = {
"method": method, "url": url, "headers": headers,
"timeout": timeout, "allow_redirects": follow_redirects,
}
if json_body is not None:
kwargs["json"] = json_body
elif data is not None:
kwargs["data"] = data
elif files is not None or form is not None:
kwargs["files"] = files
kwargs["data"] = form
if stream:
kwargs["stream"] = True
# Use the subclass that strips sensitive headers cross-host
with _StripSensitiveOnRedirectSession() as s:
try:
r = s.request(**kwargs)
if stream and sink is not None:
sink.parent.mkdir(parents=True, exist_ok=True)
with sink.open("wb") as f:
for chunk in r.iter_content(DOWNLOAD_CHUNK_SIZE):
if chunk:
f.write(chunk)
body = b"" # already drained
else:
body = r.content
return HTTPResponse(
status=r.status_code,
headers={k: v for k, v in r.headers.items()},
body=body,
url=r.url,
)
except requests.exceptions.RequestException as e:
# Convert to TimeoutError / ConnectionError so the retry loop
# picks them up uniformly with the stdlib path.
if isinstance(e, requests.exceptions.Timeout):
raise TimeoutError(str(e)) from e
raise ConnectionError(str(e)) from e
# ---------- stdlib fallback ----------
if json_body is not None:
body_bytes = json.dumps(json_body).encode("utf-8")
headers.setdefault("Content-Type", "application/json")
else:
body_bytes = data
req = urllib.request.Request(url, data=body_bytes, headers=headers, method=method)
# urllib follows redirects by default. We need to:
# 1) intercept cross-host redirects and drop sensitive headers (X-API-Key, Authorization, Cookie)
# 2) optionally NOT follow redirects when follow_redirects=False
class _RedirectHandler(urllib.request.HTTPRedirectHandler):
def __init__(self, original_host: str, follow: bool):
self.original_host = original_host
self.follow = follow
def redirect_request(self, req2, fp, code, msg, hdrs, newurl):
if not self.follow:
return None
new_host = (urlparse(newurl).hostname or "").lower()
if new_host != self.original_host:
# Build a new request with cleaned headers
clean_headers = {
k: v for k, v in req2.header_items()
if k.lower() not in _SENSITIVE_HEADERS
}
new_req = urllib.request.Request(newurl, headers=clean_headers, method="GET")
return new_req
return super().redirect_request(req2, fp, code, msg, hdrs, newurl)
original_host = (urlparse(url).hostname or "").lower()
opener = urllib.request.build_opener(_RedirectHandler(original_host, follow_redirects))
try:
resp = opener.open(req, timeout=timeout)
except urllib.error.HTTPError as e:
return HTTPResponse(
status=e.code,
headers=dict(e.headers) if e.headers else {},
body=e.read() or b"",
url=getattr(e, "url", url),
)
final_url = resp.geturl()
final_status = resp.status
final_headers = dict(resp.headers)
if stream and sink is not None:
sink.parent.mkdir(parents=True, exist_ok=True)
with sink.open("wb") as f:
while True:
chunk = resp.read(DOWNLOAD_CHUNK_SIZE)
if not chunk:
break
f.write(chunk)
return HTTPResponse(status=final_status, headers=final_headers, body=b"", url=final_url)
return HTTPResponse(status=final_status, headers=final_headers, body=resp.read(), url=final_url)
def http_get(url: str, **kwargs: Any) -> HTTPResponse:
return http_request("GET", url, **kwargs)
def http_post(url: str, **kwargs: Any) -> HTTPResponse:
return http_request("POST", url, **kwargs)
# =============================================================================
# Workflow validation & helpers
# =============================================================================
def is_api_format(workflow: Any) -> bool:
"""API format = top-level dict in which at least one value is a node dict with `class_type`."""
if not isinstance(workflow, dict):
return False
if "nodes" in workflow and "links" in workflow:
return False
for v in workflow.values():
if isinstance(v, dict) and "class_type" in v:
return True
return False
def unwrap_workflow(payload: Any) -> dict:
"""Unwrap common wrapper variants. Returns API-format workflow or raises ValueError."""
if isinstance(payload, dict) and is_api_format(payload):
return payload
# Some files wrap workflow under "prompt" key (e.g. saved /prompt payloads)
if isinstance(payload, dict) and "prompt" in payload and is_api_format(payload["prompt"]):
return payload["prompt"]
# Editor format
if isinstance(payload, dict) and "nodes" in payload and "links" in payload:
raise ValueError(
"Workflow is in editor format (has top-level 'nodes' and 'links' arrays). "
"Re-export from ComfyUI using 'Workflow → Export (API)' (newer UI) "
"or 'Save (API Format)' (older UI)."
)
raise ValueError(
"Workflow is not in API format. Each top-level entry must have a 'class_type' field."
)
def is_link(value: Any) -> bool:
"""True if `value` is a [node_id, output_index] connection (length-2 list)."""
return (
isinstance(value, list)
and len(value) == 2
and isinstance(value[0], str)
and isinstance(value[1], int)
)
def iter_nodes(workflow: dict) -> Iterator[tuple[str, dict]]:
"""Yield (node_id, node) for each valid API-format node."""
for node_id, node in workflow.items():
if isinstance(node, dict) and "class_type" in node:
yield node_id, node
def iter_model_deps(workflow: dict) -> Iterator[dict]:
"""Yield {node_id, class_type, field, value, folder} for each model dependency."""
for node_id, node in iter_nodes(workflow):
cls = node["class_type"]
if cls not in MODEL_LOADERS:
continue
inputs = node.get("inputs", {}) or {}
for field_name, folder in MODEL_LOADERS[cls]:
val = inputs.get(field_name)
if val and isinstance(val, str) and not is_link(val):
yield {
"node_id": node_id,
"class_type": cls,
"field": field_name,
"value": val,
"folder": folder,
}
def iter_embedding_refs(workflow: dict) -> Iterator[tuple[str, str]]:
"""Yield (node_id, embedding_name) for every embedding mention in prompts."""
for node_id, node in iter_nodes(workflow):
inputs = node.get("inputs", {}) or {}
for field_name, val in inputs.items():
if field_name not in PROMPT_FIELDS:
continue
if not isinstance(val, str):
continue
for m in EMBEDDING_REGEX.finditer(val):
yield node_id, m.group(1)
# =============================================================================
# Path safety
# =============================================================================
def safe_path_join(base: Path, *parts: str) -> Path:
"""Join paths, raising if the result escapes `base`.
Server-supplied filenames may contain `../` etc. This guards against
path-traversal attacks when downloading outputs.
"""
base_resolved = base.resolve()
candidate = base.joinpath(*parts).resolve()
try:
candidate.relative_to(base_resolved)
except ValueError as e:
raise ValueError(
f"Refusing path traversal: {candidate} is outside {base_resolved}"
) from e
return candidate
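# Illustrative sketch (an addition, not part of the original module): the same
# containment check reduced to a boolean helper with a hypothetical name, so the
# accepted and rejected cases can be tried in isolation. The directory and file
# names used with it are made up and do not need to exist on disk.
from pathlib import Path


def _example_stays_inside(base: Path, *parts: str) -> bool:
    """True if joining `parts` onto `base` resolves to a path under `base`."""
    base_resolved = base.resolve()
    candidate = base.joinpath(*parts).resolve()
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        return False
    return True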
def media_type_from_filename(filename: str) -> str:
ext = Path(filename).suffix.lower()
if ext in (".mp4", ".webm", ".avi", ".mov", ".mkv", ".gif", ".webp"):
return "video"
if ext in (".wav", ".mp3", ".flac", ".ogg", ".m4a"):
return "audio"
if ext in (".glb", ".obj", ".ply", ".gltf"):
return "3d"
if ext in (".json", ".txt", ".md"):
return "text"
return "image"
def looks_like_video_workflow(workflow: dict) -> bool:
"""Used to bump default timeout for video workflows."""
for _, node in iter_nodes(workflow):
if node["class_type"] in SLOW_OUTPUT_NODES:
return True
if node["class_type"].lower().startswith(("animatediff", "ade_", "wanvideo", "hunyuanvideo", "ltxvideo", "cogvideo")):
return True
return False
# =============================================================================
# Seed handling
# =============================================================================
# ComfyUI's max seed range. Many UIs treat `-1` as "randomize on submit".
SEED_MAX = 2**63 - 1
SEED_MIN = 0
def coerce_seed(value: Any) -> int:
"""Convert -1 or None to a fresh random seed; otherwise return int(value).
Accepts numeric -1 OR string "-1" (both treated as "randomize"). Other
parse failures raise TypeError/ValueError for the caller to surface.
"""
if value is None:
return random.randint(SEED_MIN, SEED_MAX)
# Stringly-typed -1 from CLI / JSON should also randomize
if isinstance(value, str) and value.strip() == "-1":
return random.randint(SEED_MIN, SEED_MAX)
if value == -1:
return random.randint(SEED_MIN, SEED_MAX)
return int(value)
# =============================================================================
# Cloud model-list normalization
# =============================================================================
def parse_model_list(payload: Any) -> set[str]:
"""Normalize model-list responses from local ComfyUI vs Comfy Cloud.
Local: `["a.safetensors", "b.safetensors"]`
Cloud: `[{"name": "a.safetensors", "pathIndex": 0}, ...]`
"""
if not isinstance(payload, list):
return set()
out: set[str] = set()
for item in payload:
if isinstance(item, str):
out.add(item)
elif isinstance(item, dict):
name = item.get("name") or item.get("filename") or item.get("path")
if isinstance(name, str):
out.add(name)
return out
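# Illustrative sketch (an addition, not part of the original module): the two
# payload shapes named in the docstring above reduce to the same set of names.
# The filenames are made up, and the helper is a simplified stand-in that only
# reads the "name" key.
_EXAMPLE_LOCAL_PAYLOAD = ["a.safetensors", "b.safetensors"]
_EXAMPLE_CLOUD_PAYLOAD = [
    {"name": "a.safetensors", "pathIndex": 0},
    {"name": "b.safetensors", "pathIndex": 0},
]


def _example_model_names(payload: list) -> set:
    """Collect names from a list of plain strings or {"name": ...} dicts."""
    return {item if isinstance(item, str) else item["name"] for item in payload}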
# =============================================================================
# Misc utilities
# =============================================================================
def new_client_id() -> str:
return str(uuid.uuid4())
def fmt_kv(d: dict) -> str:
"""Pretty key=value for log lines."""
return " ".join(f"{k}={v!r}" for k, v in d.items())
def emit_json(obj: Any, *, indent: int = 2) -> None:
"""Print JSON to stdout. Centralised so behavior can be tweaked (e.g., --raw)."""
print(json.dumps(obj, indent=indent, default=str))
def log(msg: str) -> None:
"""stderr log with consistent prefix (so JSON stdout stays clean)."""
print(f"[comfyui-skill] {msg}", file=sys.stderr)
@@ -0,0 +1,225 @@
#!/usr/bin/env python3
"""
auto_fix_deps.py — Run check_deps.py, then attempt to install whatever is missing.
For local servers:
- Missing custom nodes → `comfy node install <package>`
- Missing models → `comfy model download` (only if a URL is supplied via
--models-from-file; model URLs are never guessed)
For cloud: prints what would be needed but cannot install (cloud preinstalls
custom nodes and most models server-side; if something genuinely isn't there,
ask Comfy support).
This is conservative: it never installs without an explicit URL for models
(downloading the wrong model is hard to undo). Custom nodes from the registry
are auto-installed by name.
Usage:
python3 auto_fix_deps.py workflow_api.json
python3 auto_fix_deps.py workflow_api.json --models-from-file urls.json
python3 auto_fix_deps.py workflow_api.json --dry-run
"""
from __future__ import annotations
import argparse
import json
import shutil
import subprocess
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY, emit_json, log, resolve_api_key,
unwrap_workflow,
)
from check_deps import check_deps # noqa: E402
def comfy_cli_available() -> str | None:
"""Return command prefix for comfy-cli, or None."""
if shutil.which("comfy"):
return "comfy"
if shutil.which("uvx"):
return "uvx --from comfy-cli comfy"
return None
def run_cmd(cmd: list[str], *, dry_run: bool = False) -> tuple[int, str]:
if dry_run:
return 0, "[dry-run]"
log(f"$ {' '.join(cmd)}")
proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
out = (proc.stdout or "") + (proc.stderr or "")
return proc.returncode, out
def install_node(package: str, *, dry_run: bool = False, comfy_cmd: str = "comfy") -> bool:
cmd = comfy_cmd.split() + ["--skip-prompt", "node", "install", package]
code, _ = run_cmd(cmd, dry_run=dry_run)
return code == 0
def install_model(url: str, folder: str, filename: str | None = None,
*, dry_run: bool = False, comfy_cmd: str = "comfy",
hf_token: str | None = None, civitai_token: str | None = None) -> bool:
cmd = comfy_cmd.split() + [
"--skip-prompt", "model", "download",
"--url", url,
"--relative-path", f"models/{folder}",
]
if filename:
cmd.extend(["--filename", filename])
if hf_token:
cmd.extend(["--set-hf-api-token", hf_token])
if civitai_token:
cmd.extend(["--set-civitai-api-token", civitai_token])
code, _ = run_cmd(cmd, dry_run=dry_run)
return code == 0
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="Run check_deps and install whatever is missing")
p.add_argument("workflow")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST)
p.add_argument("--api-key", help=f"or set ${ENV_API_KEY}")
p.add_argument("--models-from-file",
help="JSON file mapping {model_filename: download_url} for models that need install")
p.add_argument("--hf-token", help="HuggingFace token for downloads")
p.add_argument("--civitai-token", help="CivitAI token for downloads")
p.add_argument("--dry-run", action="store_true",
help="Show what would be installed without doing it")
p.add_argument("--no-restart", action="store_true",
help="Don't suggest restarting the server after node install")
args = p.parse_args(argv)
api_key = resolve_api_key(args.api_key)
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
emit_json({"error": f"Workflow not found: {args.workflow}"})
return 1
try:
with wf_path.open() as f:
workflow = unwrap_workflow(json.load(f))
except (ValueError, json.JSONDecodeError) as e:
emit_json({"error": str(e)})
return 1
report = check_deps(workflow, host=args.host, api_key=api_key)
if report["is_ready"]:
emit_json({"status": "ready", "report": report})
return 0
if report["is_cloud"]:
emit_json({
"status": "cannot_fix_cloud",
"reason": "Comfy Cloud preinstalls nodes; if something is genuinely missing, contact support.",
"report": report,
})
return 1
comfy_cmd = comfy_cli_available()
if not comfy_cmd:
emit_json({
"status": "cannot_fix",
"reason": "comfy-cli not on PATH; install with `pip install comfy-cli` or `pipx install comfy-cli`",
"report": report,
})
return 1
actions: list[dict] = []
failures: list[dict] = []
# ---- Install missing custom nodes ----
seen_packages: set[str] = set()
for entry in report["missing_nodes"]:
cmd = entry.get("fix_command", "")
if cmd.startswith("comfy node install "):
package = cmd.split(" ")[-1]
if package in seen_packages:
continue
seen_packages.add(package)
ok = install_node(package, dry_run=args.dry_run, comfy_cmd=comfy_cmd)
(actions if ok else failures).append({
"kind": "node", "package": package, "node_class": entry["class_type"],
"ok": ok,
})
else:
failures.append({
"kind": "node", "node_class": entry["class_type"],
"ok": False, "reason": "No registry mapping known. " + entry.get("fix_hint", ""),
})
# ---- Install missing models (only when URL provided) ----
sources: dict[str, str] = {}
if args.models_from_file:
try:
sources = json.loads(Path(args.models_from_file).read_text())
except (OSError, json.JSONDecodeError) as e:
log(f"Could not read --models-from-file: {e}")
for entry in report["missing_models"]:
filename = entry["value"]
url = sources.get(filename)
if not url:
failures.append({
"kind": "model", "filename": filename, "folder": entry["folder"],
"ok": False, "reason": "No URL provided in --models-from-file. "
"Refusing to guess.",
})
continue
ok = install_model(
url, entry["folder"], filename,
dry_run=args.dry_run, comfy_cmd=comfy_cmd,
hf_token=args.hf_token, civitai_token=args.civitai_token,
)
(actions if ok else failures).append({
"kind": "model", "filename": filename, "folder": entry["folder"],
"url": url, "ok": ok,
})
# ---- Embeddings ----
for entry in report["missing_embeddings"]:
emb_name = entry["embedding_name"]
# Try common extensions in user-supplied source map
url = (sources.get(f"{emb_name}.pt")
or sources.get(f"{emb_name}.safetensors")
or sources.get(emb_name))
if not url:
failures.append({
"kind": "embedding", "name": emb_name,
"ok": False, "reason": "No URL provided in --models-from-file.",
})
continue
target_filename = (
f"{emb_name}.safetensors" if url.endswith(".safetensors")
else f"{emb_name}.pt"
)
ok = install_model(
url, "embeddings", target_filename,
dry_run=args.dry_run, comfy_cmd=comfy_cmd,
hf_token=args.hf_token, civitai_token=args.civitai_token,
)
(actions if ok else failures).append({
"kind": "embedding", "name": emb_name, "url": url, "ok": ok,
})
needs_restart = any(a["kind"] == "node" and a.get("ok") for a in actions)
emit_json({
"status": "fixed" if not failures else "partial",
"actions_taken": actions,
"failures": failures,
"needs_server_restart": needs_restart and not args.no_restart,
"restart_hint": "comfy stop && comfy launch --background",
"dry_run": args.dry_run,
})
return 0 if not failures else 1
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,437 @@
#!/usr/bin/env python3
"""
check_deps.py — Verify a ComfyUI workflow's dependencies (custom nodes, models,
embeddings) against a running server.
Improvements over v1:
- Cloud-aware endpoint mapping (handles `/api/experiment/models/{folder}` and
`/api/object_info` variants verified against live cloud API)
- Distinguishes 200-empty (genuinely no models in folder) vs 404
(folder doesn't exist) vs 403 (auth/tier issue) — no silent passes
- Outputs concrete remediation commands (e.g. `comfy node install <name>`)
when nodes are missing
- Detects embedding references inside prompt strings as model deps
- Skips check on cloud free tier `/api/object_info` (403) without false alarm
- Accepts API key from CLI flag OR $COMFY_CLOUD_API_KEY env var
Usage:
python3 check_deps.py workflow_api.json
python3 check_deps.py workflow_api.json --host 127.0.0.1 --port 8188
python3 check_deps.py workflow_api.json --host https://cloud.comfy.org
Stdlib-only. Python 3.10+.
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY,
emit_json, folder_aliases_for, http_get, is_cloud_host,
iter_embedding_refs, iter_model_deps, iter_nodes, parse_model_list,
resolve_api_key, resolve_url, unwrap_workflow,
)
# Known node → custom-node-package map. When a workflow needs a node we don't
# recognize, suggesting the right `comfy node install ...` makes the difference
# between a working agent and a stuck one.
NODE_TO_PACKAGE: dict[str, str] = {
# rgthree (Reroute is JS-only and doesn't appear in /object_info)
"Power Lora Loader (rgthree)": "rgthree-comfy",
"Image Comparer (rgthree)": "rgthree-comfy",
"Seed (rgthree)": "rgthree-comfy",
"Display Any (rgthree)": "rgthree-comfy",
"Display Int (rgthree)": "rgthree-comfy",
# Impact pack
"FaceDetailer": "comfyui-impact-pack",
"DetailerForEach": "comfyui-impact-pack",
"BboxDetectorSEGS": "comfyui-impact-pack",
"SAMLoader": "comfyui-impact-pack",
"ImpactWildcardProcessor": "comfyui-impact-pack",
# Impact subpack (separate package)
"UltralyticsDetectorProvider": "comfyui-impact-subpack",
# Was Node Suite
"Image Save": "was-node-suite-comfyui",
"Number Counter": "was-node-suite-comfyui",
"Text String": "was-node-suite-comfyui",
# easy-use
"easy fullLoader": "comfyui-easy-use",
"easy positive": "comfyui-easy-use",
"easy negative": "comfyui-easy-use",
"easy seed": "comfyui-easy-use",
"easy imageSave": "comfyui-easy-use",
# Video Helper Suite
"VHS_VideoCombine": "comfyui-videohelpersuite",
"VHS_LoadVideo": "comfyui-videohelpersuite",
"VHS_LoadAudio": "comfyui-videohelpersuite",
# AnimateDiff
"ADE_AnimateDiffLoaderWithContext": "comfyui-animatediff-evolved",
"ADE_AnimateDiffLoaderGen1": "comfyui-animatediff-evolved",
"ADE_LoadAnimateDiffModel": "comfyui-animatediff-evolved",
# ControlNet aux preprocessors (full class names)
"CannyEdgePreprocessor": "comfyui_controlnet_aux",
"DWPreprocessor": "comfyui_controlnet_aux",
"OpenposePreprocessor": "comfyui_controlnet_aux",
"DepthAnythingPreprocessor": "comfyui_controlnet_aux",
"Zoe_DepthAnythingPreprocessor": "comfyui_controlnet_aux",
"AnimalPosePreprocessor": "comfyui_controlnet_aux",
# IPAdapter Plus
"IPAdapterAdvanced": "comfyui_ipadapter_plus",
"IPAdapterUnifiedLoader": "comfyui_ipadapter_plus",
"IPAdapterModelLoader": "comfyui_ipadapter_plus",
"IPAdapterInsightFaceLoader": "comfyui_ipadapter_plus",
# InstantID
"InstantIDModelLoader": "comfyui_instantid",
"ApplyInstantID": "comfyui_instantid",
# Comfy essentials (note: registry slug uses underscore, not hyphen)
"GetImageSize+": "comfyui_essentials",
"ImageBatchMultiple+": "comfyui_essentials",
# pysssss
"ShowText|pysssss": "comfyui-custom-scripts",
"PreviewImage|pysssss": "comfyui-custom-scripts",
# SUPIR
"SUPIR_Upscale": "comfyui-supir",
"SUPIR_first_stage": "comfyui-supir",
# GGUF (case-sensitive registry slug)
"UNETLoaderGGUF": "ComfyUI-GGUF",
"DualCLIPLoaderGGUF": "ComfyUI-GGUF",
# Florence2
"Florence2Run": "comfyui-florence2",
# WAS
"Image Filter Adjustments": "was-node-suite-comfyui",
# Photomaker (case-sensitive)
"PhotoMakerLoader": "ComfyUI-PhotoMaker-Plus",
# Wan video (case-sensitive)
"WanVideoSampler": "ComfyUI-WanVideoWrapper",
"WanVideoModelLoader": "ComfyUI-WanVideoWrapper",
}
# Nodes whose package isn't on the comfy registry — need git-URL install via
# ComfyUI-Manager. We surface a helpful hint instead of an unrunnable command.
NODE_TO_GIT_URL: dict[str, str] = {
"HunyuanVideoSampler": "https://github.com/kijai/ComfyUI-HunyuanVideoWrapper",
"HunyuanVideoModelLoader": "https://github.com/kijai/ComfyUI-HunyuanVideoWrapper",
}
def fetch_object_info(url: str, headers: dict) -> tuple[set[str] | None, dict | None]:
"""Returns (installed_node_set, error_info). Error info is a dict if we
couldn't query (e.g. cloud free tier), else None.
"""
r = http_get(url, headers=headers, retries=2, timeout=30)
if r.status == 200:
try:
data = r.json()
if isinstance(data, dict):
return set(data.keys()), None
except Exception:
pass
return None, {"http_status": 200, "reason": "non-dict response"}
if r.status == 403:
try:
body = r.json()
except Exception:
body = {"raw": r.text()[:200]}
return None, {"http_status": 403, "reason": "forbidden", "body": body}
if r.status == 404:
return None, {"http_status": 404, "reason": "endpoint not found"}
return None, {"http_status": r.status, "reason": "unexpected", "body": r.text()[:200]}
def _fetch_one_folder(
base: str, folder: str, headers: dict, *, is_cloud: bool,
) -> tuple[set[str] | None, dict | None]:
"""Single-folder fetch, no aliasing. Returns (installed_set, error_info)."""
url = resolve_url(base, f"/models/{folder}", is_cloud=is_cloud)
r = http_get(url, headers=headers, retries=2, timeout=30)
if r.status == 200:
try:
return parse_model_list(r.json()), None
except Exception:
# An unparseable 200 body means the folder contents are unknown, not empty.
return None, {"http_status": 200, "reason": "non-list response"}
if r.status == 404:
body_text = r.text()
try:
body = r.json()
except Exception:
body = {"raw": body_text[:200]}
code = body.get("code") if isinstance(body, dict) else None
if code == "folder_not_found":
# Folder is genuinely empty/missing on server — not the same as
# "endpoint missing". Return empty set with informational error.
return set(), {"http_status": 404, "reason": "folder_empty_or_unknown", "body": body}
return None, {"http_status": 404, "reason": "endpoint not found", "body": body}
if r.status == 403:
try:
body = r.json()
except Exception:
body = {}
return None, {"http_status": 403, "reason": "forbidden", "body": body}
return None, {"http_status": r.status, "reason": "unexpected"}
def fetch_models_for_folder(
base: str, folder: str, headers: dict, *, is_cloud: bool,
) -> tuple[set[str] | None, dict | None]:
"""Fetch installed models for a folder, trying aliases.
Folder renames over time (e.g. unet → diffusion_models, clip → text_encoders)
mean a workflow asking for a model in `unet` may need to look in
`diffusion_models`. We union models from every reachable alias.
Returns (combined_set | None, last_error | None).
"""
aliases = folder_aliases_for(folder)
combined: set[str] = set()
any_success = False
last_err: dict | None = None
for alias in aliases:
models, err = _fetch_one_folder(base, alias, headers, is_cloud=is_cloud)
if models is not None:
combined.update(models)
any_success = True
last_err = None
else:
last_err = err
if not any_success:
return None, last_err
return combined, None
def fetch_embeddings(base: str, headers: dict, *, is_cloud: bool) -> tuple[set[str] | None, dict | None]:
"""Local ComfyUI exposes /embeddings; cloud uses /experiment/models/embeddings."""
if is_cloud:
return fetch_models_for_folder(base, "embeddings", headers, is_cloud=True)
# Local: dedicated /embeddings returns a flat list of names
r = http_get(resolve_url(base, "/embeddings", is_cloud=False), headers=headers, retries=2)
if r.status == 200:
try:
data = r.json()
if isinstance(data, list):
# Register both the full name and its stem, since prompt syntax
# usually omits the extension ("embedding:goodvibes" vs "goodvibes.pt")
names = set()
for n in data:
if isinstance(n, str):
names.add(n)
# Also store stem for fuzzy matching
names.add(Path(n).stem)
return names, None
except Exception:
pass
return None, {"http_status": r.status, "reason": "unexpected"}
def normalize_for_match(name: str) -> set[str]:
"""Generate matching variants of a model name (with/without extension, slashes, etc.)"""
s = {name}
s.add(Path(name).stem)
s.add(Path(name).name)
# ComfyUI sometimes strips/keeps the leading folder
if "/" in name or "\\" in name:
flat = name.replace("\\", "/").split("/")[-1]
s.add(flat)
s.add(Path(flat).stem)
return {x for x in s if x}
def model_present(needed: str, installed: set[str]) -> bool:
if not installed:
return False
needed_variants = normalize_for_match(needed)
installed_norm: set[str] = set()
for inst in installed:
installed_norm.update(normalize_for_match(inst))
return bool(needed_variants & installed_norm)
def suggest_install_command(node_class: str) -> str | None:
pkg = NODE_TO_PACKAGE.get(node_class)
if pkg:
return f"comfy node install {pkg}"
return None
def suggest_git_url(node_class: str) -> str | None:
"""For nodes not on the registry, return a git URL the user can hand to
ComfyUI-Manager's `/manager/queue/install` endpoint."""
return NODE_TO_GIT_URL.get(node_class)
def check_deps(
workflow: dict, host: str, *, api_key: str | None = None,
) -> dict:
headers: dict[str, str] = {}
if api_key:
headers["X-API-Key"] = api_key
is_cloud = is_cloud_host(host)
base = host.rstrip("/")
# ---- 1. Required nodes ----
required_nodes: set[str] = set()
for _, node in iter_nodes(workflow):
required_nodes.add(node["class_type"])
object_info_url = resolve_url(base, "/object_info", is_cloud=is_cloud)
installed_nodes, obj_err = fetch_object_info(object_info_url, headers)
missing_nodes: list[dict] = []
node_check_skipped = False
if installed_nodes is None:
# Couldn't query (e.g. cloud free tier). Don't false-alarm; mark skipped.
node_check_skipped = True
else:
for cls in sorted(required_nodes):
if cls not in installed_nodes:
entry = {"class_type": cls}
cmd = suggest_install_command(cls)
git_url = suggest_git_url(cls)
if cmd:
entry["fix_command"] = cmd
elif git_url:
entry["fix_git_url"] = git_url
entry["fix_hint"] = (
f"Not on registry. Install via Manager with this git URL: {git_url}"
)
else:
entry["fix_hint"] = (
"Search https://registry.comfy.org or "
"use ComfyUI-Manager UI to find the package providing this node."
)
missing_nodes.append(entry)
# ---- 2. Required models ----
model_cache: dict[str, tuple[set[str] | None, dict | None]] = {}
missing_models: list[dict] = []
folder_errors: dict[str, dict] = {}
for dep in iter_model_deps(workflow):
folder = dep["folder"]
if folder not in model_cache:
model_cache[folder] = fetch_models_for_folder(
base, folder, headers, is_cloud=is_cloud,
)
installed, err = model_cache[folder]
if installed is None:
# Couldn't enumerate this folder — record once
folder_errors.setdefault(folder, err or {})
# Don't flag as missing (we don't know); the folder_errors block surfaces this
continue
if not model_present(dep["value"], installed):
entry = dict(dep)
entry["fix_hint"] = (
f"comfy model download --url <URL> --relative-path models/{folder} "
f"--filename {dep['value']!r}"
)
missing_models.append(entry)
# ---- 3. Embedding refs in prompts ----
emb_installed, emb_err = fetch_embeddings(base, headers, is_cloud=is_cloud)
missing_embeddings: list[dict] = []
seen_emb: set[tuple[str, str]] = set()
for nid, emb_name in iter_embedding_refs(workflow):
if (nid, emb_name) in seen_emb:
continue
seen_emb.add((nid, emb_name))
if emb_installed is None:
# Couldn't enumerate — skip silently here, surface the error in the
# folder_errors block
continue
if not model_present(emb_name, emb_installed):
missing_embeddings.append({
"node_id": nid,
"embedding_name": emb_name,
"folder": "embeddings",
"fix_hint": (
f"Download {emb_name}.pt or .safetensors and place in "
f"models/embeddings/, or `comfy model download --url <URL> "
f"--relative-path models/embeddings`"
),
})
if emb_err and emb_installed is None:
folder_errors.setdefault("embeddings", emb_err)
is_ready = (
not node_check_skipped
and not missing_nodes
and not missing_models
and not missing_embeddings
)
return {
"is_ready": is_ready,
"node_check_skipped": node_check_skipped,
"node_check_skip_reason": obj_err if node_check_skipped else None,
"missing_nodes": missing_nodes,
"missing_models": missing_models,
"missing_embeddings": missing_embeddings,
"folder_errors": folder_errors,
# 0 is a legitimate count (e.g. empty server). Use None only when not queried.
"installed_node_count": len(installed_nodes) if installed_nodes is not None else None,
"required_node_count": len(required_nodes),
"required_nodes": sorted(required_nodes),
"host": base,
"is_cloud": is_cloud,
}
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="Check ComfyUI workflow dependencies against a running server")
p.add_argument("workflow", help="Path to workflow API JSON file")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST, help="ComfyUI server URL")
p.add_argument("--port", type=int, help="Server port (overrides --host port)")
p.add_argument("--api-key", help=f"API key for cloud (or set ${ENV_API_KEY} env var)")
p.add_argument("--strict", action="store_true",
help="Exit non-zero if node check is skipped (e.g. on cloud free tier)")
args = p.parse_args(argv)
host = args.host
if args.port is not None:
# Strip any port from host and append --port
from urllib.parse import urlparse, urlunparse
parsed = urlparse(host if "://" in host else f"http://{host}")
new_netloc = f"{parsed.hostname}:{args.port}"
host = urlunparse(parsed._replace(netloc=new_netloc))
api_key = resolve_api_key(args.api_key)
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
emit_json({"error": f"Workflow file not found: {args.workflow}"})
return 1
try:
with wf_path.open() as f:
payload = json.load(f)
workflow = unwrap_workflow(payload)
# json.JSONDecodeError subclasses ValueError, so it must be caught first.
except json.JSONDecodeError as e:
emit_json({"error": f"Invalid JSON: {e}"})
return 1
except ValueError as e:
emit_json({"error": str(e)})
return 1
try:
result = check_deps(workflow, host=host, api_key=api_key)
except Exception as e:
emit_json({"error": f"Dep check failed: {e}", "host": host})
return 1
emit_json(result)
if not result["is_ready"]:
return 1
if args.strict and result["node_check_skipped"]:
return 1
return 0
if __name__ == "__main__":
sys.exit(main())
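The name matching performed by `normalize_for_match` and `model_present` above can be illustrated in isolation. The following is a minimal sketch, not the script's own code: `name_variants` and `is_installed` are hypothetical names, and `PurePosixPath` is used here (an assumption) so that the behaviour is the same on every platform once backslashes are normalised.

```python
from pathlib import PurePosixPath


def name_variants(name: str) -> set[str]:
    """Return the forms a model name may take: as given, with forward
    slashes, as a bare file name, and as a file name without extension."""
    flat = name.replace("\\", "/")
    base = PurePosixPath(flat).name
    return {v for v in (name, flat, base, PurePosixPath(base).stem) if v}


def is_installed(needed: str, installed: set[str]) -> bool:
    """True when any variant of `needed` equals any variant of an installed name."""
    installed_variants: set[str] = set()
    for item in installed:
        installed_variants |= name_variants(item)
    return bool(name_variants(needed) & installed_variants)


# Hypothetical server listing: one name with a Windows-style subfolder.
installed = {"SDXL\\sd_xl_base_1.0.safetensors", "vae-ft-mse.pt"}
print(is_installed("sd_xl_base_1.0.safetensors", installed))  # subfolder ignored
print(is_installed("flux1-dev.safetensors", installed))       # genuinely absent
```

Matching on variant sets trades precision for recall: two different files that share a stem (for example `model.pt` and `model.safetensors`) are treated as the same model, which is probably acceptable for a pre-flight check but would not be for a download step.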
@@ -0,0 +1,286 @@
#!/usr/bin/env bash
# ComfyUI Setup — Install, launch, and verify using the official comfy-cli.
#
# Improvements over v1:
# - Prefers `pipx` / `uvx` over global `pip install` (avoids polluting system Python)
# - Idempotent: detects already-running server and skips re-launch
# - Configurable port via --port=N (default 8188)
# - Configurable workspace via --workspace=PATH
# - Persistent log file in /tmp/comfyui_setup.<pid>.log for debugging
# - EXIT/INT/TERM trap reports the exit status and log path on failure
# - Refuses local install when hardware_check.py verdict is "cloud"
# - Forwards extra flags to comfy-cli (e.g. --cuda-version=12.4)
#
# Usage:
# bash scripts/comfyui_setup.sh
# (auto-detects GPU; uses recommendation from hardware_check.py)
# bash scripts/comfyui_setup.sh --nvidia
# bash scripts/comfyui_setup.sh --m-series --port=8190
# bash scripts/comfyui_setup.sh --amd --workspace=/data/comfy
#
# Flags:
# --nvidia | --amd | --m-series | --cpu GPU selection (skips hw check)
# --port=N HTTP port (default 8188)
# --workspace=PATH ComfyUI install location
# --skip-launch Install only, don't start server
# --force-cloud-override Install locally even if hw says cloud
# -- Pass remaining args to `comfy install`
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HARDWARE_CHECK="$SCRIPT_DIR/hardware_check.py"
LOG_FILE="/tmp/comfyui_setup.$$.log"
PORT=8188
WORKSPACE=""
GPU_FLAG=""
SKIP_LAUNCH=0
FORCE_CLOUD_OVERRIDE=0
EXTRA_INSTALL_ARGS=()
cleanup() {
local exit_code=$?
if [ $exit_code -ne 0 ]; then
echo "==> Setup exited with status $exit_code. Log: $LOG_FILE" >&2
fi
exit $exit_code
}
trap cleanup EXIT INT TERM
log() { echo "==> $*" | tee -a "$LOG_FILE" >&2; }
err() { echo "ERROR: $*" | tee -a "$LOG_FILE" >&2; }
# --- Argument parsing ---
PASSTHROUGH=0
for arg in "$@"; do
if [ "$PASSTHROUGH" -eq 1 ]; then
EXTRA_INSTALL_ARGS+=("$arg")
continue
fi
case "$arg" in
--nvidia|--amd|--m-series|--cpu)
GPU_FLAG="$arg"
;;
--port=*)
PORT="${arg#*=}"
;;
--workspace=*)
WORKSPACE="${arg#*=}"
;;
--skip-launch)
SKIP_LAUNCH=1
;;
--force-cloud-override)
FORCE_CLOUD_OVERRIDE=1
;;
--)
PASSTHROUGH=1
;;
--help|-h)
# Print the leading comment block, stripping the `# ` prefix.
# Stops at the first blank line which separates docs from code.
awk '
NR == 1 { next } # skip shebang
/^[^#]/ { exit } # stop at first non-comment line
/^$/ { exit } # ...or first blank line
{ sub(/^# ?/, ""); print }
' "$0"
exit 0
;;
*)
err "Unknown argument: $arg"
exit 64
;;
esac
done
log "Logging to $LOG_FILE"
# --- Step 0: Hardware check (skipped if user gave an explicit GPU flag) ---
if [ -z "$GPU_FLAG" ]; then
if [ ! -f "$HARDWARE_CHECK" ]; then
log "hardware_check.py not found — defaulting to --nvidia"
GPU_FLAG="--nvidia"
else
log "Running hardware check…"
set +e
HW_JSON="$(python3 "$HARDWARE_CHECK" --json 2>>"$LOG_FILE")"
HW_EXIT=$?
set -e
if [ -z "$HW_JSON" ]; then
err "hardware_check.py produced no output (exit $HW_EXIT). Pass an explicit flag."
exit 1
fi
echo "$HW_JSON" | tee -a "$LOG_FILE" >&2
VERDICT="$(echo "$HW_JSON" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("verdict",""))')"
FLAG="$(echo "$HW_JSON" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("comfy_cli_flag") or "")')"
if [ "$VERDICT" = "cloud" ] && [ "$FORCE_CLOUD_OVERRIDE" -ne 1 ]; then
log ""
log "Hardware check: this machine is not suitable for local ComfyUI."
log "Recommended: Comfy Cloud — https://platform.comfy.org"
log ""
log "To override and force a local install, re-run with --force-cloud-override"
log "or pass an explicit GPU flag (--nvidia|--amd|--m-series|--cpu)."
exit 2
fi
if [ "$VERDICT" = "marginal" ]; then
log "Hardware check: verdict is MARGINAL."
log " SD1.5 should work; SDXL/Flux may be slow or OOM."
log " Consider Comfy Cloud for heavier workflows: https://platform.comfy.org"
fi
if [ -z "$FLAG" ]; then
log "hardware_check could not pick a comfy-cli flag. Defaulting to --nvidia."
log "(For Intel Arc or unsupported hardware, use the manual install path.)"
GPU_FLAG="--nvidia"
else
GPU_FLAG="$FLAG"
fi
fi
fi
log "GPU flag: $GPU_FLAG"
log "Port: $PORT"
[ -n "$WORKSPACE" ] && log "Workspace: $WORKSPACE"
[ "${#EXTRA_INSTALL_ARGS[@]}" -gt 0 ] && log "Extra install args: ${EXTRA_INSTALL_ARGS[*]}"
# --- Step 1: Install comfy-cli (prefer pipx / uvx over global pip) ---
COMFY_BIN=""
if command -v comfy >/dev/null 2>&1; then
COMFY_BIN="comfy"
log "comfy-cli already on PATH: $(comfy -v 2>/dev/null || echo 'unknown version')"
elif command -v uvx >/dev/null 2>&1; then
log "Using uvx (no install needed)"
COMFY_BIN="uvx --from comfy-cli comfy"
elif command -v pipx >/dev/null 2>&1; then
log "Installing comfy-cli via pipx…"
pipx install comfy-cli >>"$LOG_FILE" 2>&1
COMFY_BIN="comfy"
# pipx adds shims to ~/.local/bin which may need to be on PATH
if ! command -v comfy >/dev/null 2>&1; then
if [ -x "$HOME/.local/bin/comfy" ]; then
export PATH="$HOME/.local/bin:$PATH"
COMFY_BIN="$HOME/.local/bin/comfy"
fi
fi
else
log "Neither pipx nor uvx found. Falling back to pip install --user…"
log " (Recommend installing pipx: https://pipx.pypa.io)"
if ! pip install --user comfy-cli >>"$LOG_FILE" 2>&1; then
# PEP 668 (externally-managed-environment, e.g. Homebrew or Debian/Ubuntu Python) may block --user
log "pip install --user failed. Retrying with --break-system-packages…"
pip install --user --break-system-packages comfy-cli >>"$LOG_FILE" 2>&1 || {
err "Could not install comfy-cli. Install pipx or uv first."
exit 1
}
fi
# Resolve the actual `comfy` script — pip --user puts it in:
# Linux: ~/.local/bin/comfy
# macOS: ~/Library/Python/<ver>/bin/comfy OR ~/.local/bin/comfy
COMFY_BIN=""
for candidate in "$HOME/.local/bin/comfy" \
"$HOME/Library/Python/3.13/bin/comfy" \
"$HOME/Library/Python/3.12/bin/comfy" \
"$HOME/Library/Python/3.11/bin/comfy" \
"$HOME/Library/Python/3.10/bin/comfy"; do
if [ -x "$candidate" ]; then
COMFY_BIN="$candidate"
export PATH="$(dirname "$candidate"):$PATH"
break
fi
done
if [ -z "$COMFY_BIN" ]; then
if command -v comfy >/dev/null 2>&1; then
COMFY_BIN="comfy"
else
err "Installed comfy-cli but couldn't find the 'comfy' script."
err "Add the right Python user-bin directory to PATH and retry."
exit 1
fi
fi
fi
# --- Step 2: Disable analytics tracking (avoid interactive prompt) ---
log "Disabling analytics tracking…"
$COMFY_BIN --skip-prompt tracking disable >>"$LOG_FILE" 2>&1 || true
# --- Step 3: Install ComfyUI ---
WORKSPACE_ARG=()
if [ -n "$WORKSPACE" ]; then
WORKSPACE_ARG=(--workspace "$WORKSPACE")
fi
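# `${arr[@]+"${arr[@]}"}` keeps `set -u` from aborting on an empty array in bash < 4.4 (e.g. macOS bash 3.2).
if $COMFY_BIN ${WORKSPACE_ARG[@]+"${WORKSPACE_ARG[@]}"} which 2>/dev/null | grep -q "ComfyUI"; then
EXISTING_WS="$($COMFY_BIN ${WORKSPACE_ARG[@]+"${WORKSPACE_ARG[@]}"} which 2>/dev/null || true)"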
log "ComfyUI already installed at: $EXISTING_WS"
else
log "Installing ComfyUI ($GPU_FLAG)…"
if ! $COMFY_BIN ${WORKSPACE_ARG[@]+"${WORKSPACE_ARG[@]}"} --skip-prompt install "$GPU_FLAG" ${EXTRA_INSTALL_ARGS[@]+"${EXTRA_INSTALL_ARGS[@]}"} >>"$LOG_FILE" 2>&1; then
err "Install failed. Tail of log:"
tail -20 "$LOG_FILE" >&2
exit 1
fi
fi
if [ "$SKIP_LAUNCH" -eq 1 ]; then
log "Setup complete (--skip-launch). Run \`$COMFY_BIN launch --background -- --port $PORT\` when ready."
exit 0
fi
# --- Step 4: Detect already-running server ---
if curl -fsS "http://127.0.0.1:$PORT/system_stats" >/dev/null 2>&1; then
log "Server already running on port $PORT — skipping launch."
log "Stop with \`$COMFY_BIN stop\` if you want a fresh start."
curl -fsS "http://127.0.0.1:$PORT/system_stats" | python3 -m json.tool 2>/dev/null || true
log "Done."
exit 0
fi
# --- Step 5: Launch ---
log "Launching ComfyUI in background on port $PORT"
LAUNCH_EXTRAS=("--" "--port" "$PORT")
if ! $COMFY_BIN ${WORKSPACE_ARG[@]+"${WORKSPACE_ARG[@]}"} launch --background "${LAUNCH_EXTRAS[@]}" >>"$LOG_FILE" 2>&1; then
err "Background launch failed. Tail of log:"
tail -20 "$LOG_FILE" >&2
err "Try foreground launch to see real-time errors: $COMFY_BIN launch -- --port $PORT"
exit 1
fi
# --- Step 6: Wait for server ---
log "Waiting for server…"
MAX_WAIT=60
ELAPSED=0
while [ $ELAPSED -lt $MAX_WAIT ]; do
if curl -fsS "http://127.0.0.1:$PORT/system_stats" >/dev/null 2>&1; then
log "Server is running!"
curl -fsS "http://127.0.0.1:$PORT/system_stats" | python3 -m json.tool 2>/dev/null || true
break
fi
sleep 2
ELAPSED=$((ELAPSED + 2))
done
if [ $ELAPSED -ge $MAX_WAIT ]; then
err "Server did not start within ${MAX_WAIT}s."
err "Inspect log: $LOG_FILE"
err "Or run foreground: $COMFY_BIN launch -- --port $PORT"
exit 1
fi
log ""
log "Setup complete!"
log " Server: http://127.0.0.1:$PORT"
log " Web UI: http://127.0.0.1:$PORT (open in browser)"
log " Stop: $COMFY_BIN stop"
log " Log: $LOG_FILE (persists in /tmp until removed)"
log ""
log "Next steps:"
log " - Download a model: $COMFY_BIN model download --url <URL> --relative-path models/checkpoints"
log " - Run a workflow: python3 $SCRIPT_DIR/run_workflow.py --workflow <file.json> --args '{...}'"
# Disable trap on success path
trap - EXIT
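The bounded wait in Step 6 of the script above is a plain poll-and-sleep loop. A minimal Python sketch of the same idea follows, assuming the readiness probe is passed in as a callable; `wait_until` and its parameters are hypothetical names, and the injectable `sleep` exists only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    max_wait: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `max_wait` seconds of sleeping
    have accumulated. Returns True on success, False on timeout."""
    elapsed = 0.0
    while elapsed < max_wait:
        if probe():
            return True
        sleep(interval)
        elapsed += interval
    return False


# Simulated server that answers on the third probe; sleeping is stubbed out.
answers = iter([False, False, True])
print(wait_until(lambda: next(answers), sleep=lambda _seconds: None))
```

Returning a boolean rather than checking the elapsed counter afterwards avoids the edge case where success and timeout are distinguished only by comparing `ELAPSED` with `MAX_WAIT`.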
@@ -0,0 +1,315 @@
#!/usr/bin/env python3
"""
extract_schema.py — Analyze a ComfyUI API-format workflow and extract
controllable parameters.
Improvements over v1:
- Catalogs live in `_common.py`, shared with `check_deps.py`
- Coverage expanded for Flux / SD3 / Wan / Hunyuan / LTX / IPAdapter / rgthree
- Symmetric duplicate-name resolution: ALL duplicates get a node-id suffix
(instead of "first wins, second renamed"), so callers see consistent names
- Negative prompt detected by tracing `KSampler.negative` connections back to
the source CLIPTextEncode (more reliable than meta-title heuristic)
- Embedding references in prompt text are extracted as model dependencies
- Detects Primitive nodes that drive other nodes' inputs (and surfaces them
as the user-facing parameter)
- Reroutes are followed when tracing connections
Usage:
python3 extract_schema.py workflow_api.json
python3 extract_schema.py workflow_api.json --output schema.json
Stdlib-only. Python 3.10+.
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
from typing import Any
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
OUTPUT_NODES, PARAM_PATTERNS, PROMPT_FIELDS,
is_link, iter_embedding_refs, iter_model_deps, iter_nodes, unwrap_workflow,
)
# Sampler nodes whose `positive` / `negative` connections we trace
SAMPLER_NODE_FAMILY = {
"KSampler", "KSamplerAdvanced",
"SamplerCustom", "SamplerCustomAdvanced",
"BasicGuider", "CFGGuider", "DualCFGGuider",
}
def infer_type(value: Any) -> str:
if isinstance(value, bool):
return "bool"
if isinstance(value, int):
return "int"
if isinstance(value, float):
return "float"
if isinstance(value, str):
return "string"
if isinstance(value, list):
return "link"
if isinstance(value, dict):
return "object"
return "unknown"
def trace_to_node(workflow: dict, link: list, *, max_hops: int = 8) -> str | None:
"""Follow a [node_id, slot] link, hopping through Reroute / Primitive nodes
if needed, to find the *upstream* node id that holds the actual value/input.
Bounded by both `max_hops` AND a visited-set to prevent infinite loops on
pathological graphs.
"""
if not is_link(link):
return None
nid: str | None = link[0]
visited: set[str] = set()
for _ in range(max_hops):
if nid is None or nid in visited:
return nid
visited.add(nid)
node = workflow.get(nid)
if not isinstance(node, dict):
return None
cls = node.get("class_type", "")
# Reroute / Primitive / passthrough wrappers
if cls in ("Reroute", "PrimitiveNode", "Note", "easy showAnything"):
inputs = node.get("inputs", {}) or {}
# Find first link-shaped input and follow it
next_link = next((v for v in inputs.values() if is_link(v)), None)
if next_link is None:
return nid
nid = next_link[0]
continue
return nid
return nid
def find_negative_prompt_node(workflow: dict) -> str | None:
"""Trace `negative` input of a sampler back to the source text encoder."""
for nid, node in iter_nodes(workflow):
if node["class_type"] not in SAMPLER_NODE_FAMILY:
continue
inputs = node.get("inputs", {}) or {}
neg = inputs.get("negative")
if not is_link(neg):
continue
src = trace_to_node(workflow, neg)
if src and isinstance(workflow.get(src), dict):
cls = workflow[src].get("class_type", "")
if cls.startswith("CLIPTextEncode") or cls in ("smZ CLIPTextEncode", "BNK_CLIPTextEncodeAdvanced"):
return src
return None
def find_positive_prompt_node(workflow: dict) -> str | None:
for nid, node in iter_nodes(workflow):
if node["class_type"] not in SAMPLER_NODE_FAMILY:
continue
inputs = node.get("inputs", {}) or {}
pos = inputs.get("positive")
if not is_link(pos):
continue
src = trace_to_node(workflow, pos)
if src and isinstance(workflow.get(src), dict):
cls = workflow[src].get("class_type", "")
if cls.startswith("CLIPTextEncode") or cls in ("smZ CLIPTextEncode", "BNK_CLIPTextEncodeAdvanced"):
return src
return None
def extract_schema(workflow: dict) -> dict:
"""Extract controllable parameters from a workflow.
Returns:
{
"parameters": { friendly_name: {node_id, field, type, value, ...} },
"output_nodes": [node_id, ...],
"model_dependencies": [{node_id, class_type, field, value, folder}],
"embedding_dependencies": [{node_id, embedding_name, field, value_excerpt, folder}],
"summary": {...}
}
"""
output_nodes: list[str] = []
# First pass: identify positive / negative prompt nodes via connection tracing
pos_node = find_positive_prompt_node(workflow)
neg_node = find_negative_prompt_node(workflow)
# ----- collect raw parameter candidates -----
# Each candidate = (friendly_name, node_id, field, value)
# We resolve duplicate friendly_names AFTER the loop so dedup is symmetric.
raw_params: list[dict] = []
for node_id, node in iter_nodes(workflow):
cls = node["class_type"]
inputs = node.get("inputs", {}) or {}
if cls in OUTPUT_NODES:
output_nodes.append(node_id)
# Match this node against PARAM_PATTERNS
for p_class, p_field, friendly in PARAM_PATTERNS:
if cls != p_class:
continue
if p_field not in inputs:
continue
value = inputs[p_field]
t = infer_type(value)
if t == "link":
continue # connections aren't directly controllable
actual_name = friendly
# Disambiguate prompt vs negative_prompt by connection tracing
if friendly == "prompt":
if node_id == neg_node and pos_node != neg_node:
actual_name = "negative_prompt"
elif node_id == pos_node:
actual_name = "prompt"
else:
# Fallback: use _meta.title hints if present
meta_title = (node.get("_meta") or {}).get("title", "").lower()
if any(t_ in meta_title for t_ in ("negative", "neg", "-prompt", "anti")):
actual_name = "negative_prompt"
raw_params.append({
"name_hint": actual_name,
"node_id": node_id,
"field": p_field,
"type": t,
"value": value,
"class_type": cls,
})
# ----- symmetric duplicate-name resolution -----
# Group by name_hint. If a hint appears once, keep it. If multiple, suffix
# ALL with their node_id. Always-stable, always-uniquely-addressable.
by_name: dict[str, list[dict]] = {}
for r in raw_params:
by_name.setdefault(r["name_hint"], []).append(r)
parameters: dict[str, dict] = {}
for name, entries in by_name.items():
if len(entries) == 1:
r = entries[0]
parameters[name] = {
"node_id": r["node_id"], "field": r["field"],
"type": r["type"], "value": r["value"],
"class_type": r["class_type"],
}
else:
# Sort by zero-padded node_id so numeric ids order numerically ("3" before "12")
entries.sort(key=lambda x: (str(x["node_id"]).zfill(8), x["field"]))
for r in entries:
full_name = f"{name}_{r['node_id']}"
parameters[full_name] = {
"node_id": r["node_id"], "field": r["field"],
"type": r["type"], "value": r["value"],
"class_type": r["class_type"],
"alias_of": name,
}
# ----- model dependencies -----
model_deps = list(iter_model_deps(workflow))
# ----- embedding dependencies (in prompt text) -----
embedding_deps: list[dict] = []
seen_emb: set[tuple[str, str]] = set()
for nid, emb_name in iter_embedding_refs(workflow):
key = (nid, emb_name)
if key in seen_emb:
continue
seen_emb.add(key)
# Find which field had the reference, for context
node = workflow.get(nid, {})
inputs = node.get("inputs", {}) or {}
found_field = None
excerpt = None
for fname, fval in inputs.items():
if isinstance(fval, str) and fname in PROMPT_FIELDS and emb_name in fval:
found_field = fname
excerpt = fval[:120]
break
embedding_deps.append({
"node_id": nid,
"embedding_name": emb_name,
"field": found_field,
"value_excerpt": excerpt,
"folder": "embeddings",
})
# ----- summary -----
summary = {
"parameter_count": len(parameters),
"output_node_count": len(output_nodes),
"model_dep_count": len(model_deps),
"embedding_dep_count": len(embedding_deps),
"has_negative_prompt": "negative_prompt" in parameters,
"has_seed": "seed" in parameters or any(p.startswith("seed_") for p in parameters),
"is_video_workflow": any(
workflow.get(n, {}).get("class_type", "") in {
"VHS_VideoCombine", "SaveVideo", "SaveAnimatedWEBP", "SaveAnimatedPNG",
} for n in output_nodes
),
}
return {
"parameters": parameters,
"output_nodes": output_nodes,
"model_dependencies": model_deps,
"embedding_dependencies": embedding_deps,
"summary": summary,
}
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="Extract controllable parameters from a ComfyUI workflow")
p.add_argument("workflow", help="Path to workflow API JSON file")
p.add_argument("--output", "-o", help="Output file (default: stdout)")
p.add_argument("--summary-only", action="store_true",
help="Only print the summary block")
args = p.parse_args(argv)
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
print(f"Error: {wf_path} not found", file=sys.stderr)
return 1
try:
with wf_path.open() as f:
payload = json.load(f)
workflow = unwrap_workflow(payload)
# json.JSONDecodeError subclasses ValueError, so it must be caught first.
except json.JSONDecodeError as e:
print(f"Error: invalid JSON — {e}", file=sys.stderr)
return 1
except ValueError as e:
print(f"Error: {e}", file=sys.stderr)
return 1
schema = extract_schema(workflow)
if args.summary_only:
out = json.dumps(schema["summary"], indent=2)
else:
out = json.dumps(schema, indent=2, default=str)
if args.output:
Path(args.output).write_text(out)
print(f"Schema written to {args.output}", file=sys.stderr)
else:
print(out)
return 0
if __name__ == "__main__":
sys.exit(main())
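The symmetric duplicate-name resolution described in the module docstring above can be shown on its own. This is a minimal sketch under simplifying assumptions: candidates are reduced to `(name_hint, node_id)` pairs, and `resolve_names` is a hypothetical helper rather than a function from the script.

```python
from collections import defaultdict


def resolve_names(candidates: list[tuple[str, str]]) -> dict[str, str]:
    """Map unique parameter names to node ids.

    A hint that occurs once keeps its plain name. A hint that occurs several
    times gets a node-id suffix on every occurrence, not only on the later ones.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for hint, node_id in candidates:
        groups[hint].append(node_id)

    resolved: dict[str, str] = {}
    for hint, node_ids in groups.items():
        if len(node_ids) == 1:
            resolved[hint] = node_ids[0]
            continue
        # Zero-pad so that "3" sorts before "12".
        for node_id in sorted(node_ids, key=lambda n: n.zfill(8)):
            resolved[f"{hint}_{node_id}"] = node_id
    return resolved


print(resolve_names([("seed", "3"), ("steps", "3"), ("seed", "12")]))
```

With this scheme a caller never sees a bare `seed` that silently refers to whichever node happened to come first; either the name is unambiguous or every candidate is explicitly addressed by node id.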
@@ -0,0 +1,158 @@
#!/usr/bin/env python3
"""
fetch_logs.py — Retrieve workflow execution diagnostics from a ComfyUI server.
When a workflow errors, the server's /history (local) or /jobs (cloud) entry
contains the full Python traceback. This script makes it easy to fetch by
prompt_id, with sensible formatting.
Usage:
python3 fetch_logs.py <prompt_id>
python3 fetch_logs.py <prompt_id> --host https://cloud.comfy.org
python3 fetch_logs.py --tail-queue # show currently queued/running jobs
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY, emit_json, http_get, is_cloud_host,
resolve_api_key, resolve_url,
)
def fetch_history_entry(host: str, headers: dict, prompt_id: str, *, is_cloud: bool) -> dict:
if is_cloud:
# Try /jobs/{id} first
url = resolve_url(host, f"/jobs/{prompt_id}", is_cloud=True)
r = http_get(url, headers=headers, retries=2, timeout=30)
if r.status == 200:
try:
return {"ok": True, "entry": r.json(), "source": "/api/jobs"}
except Exception:
pass
# Fallback to history_v2
url = resolve_url(host, f"/history/{prompt_id}", is_cloud=True)
r = http_get(url, headers=headers, retries=2, timeout=30)
try:
data = r.json()
except Exception:
data = None
if r.status == 200 and data:
return {"ok": True, "entry": data, "source": "/api/history_v2"}
return {"ok": False, "http_status": r.status, "body": r.text()[:500]}
url = resolve_url(host, f"/history/{prompt_id}", is_cloud=False)
r = http_get(url, headers=headers, retries=2, timeout=30)
if r.status != 200:
return {"ok": False, "http_status": r.status, "body": r.text()[:500]}
try:
data = r.json()
except Exception:
return {"ok": False, "reason": "non-JSON response"}
if not isinstance(data, dict) or prompt_id not in data:
return {"ok": False, "reason": "prompt_id not found in history",
"history_keys": list(data.keys())[:5] if isinstance(data, dict) else []}
return {"ok": True, "entry": data[prompt_id], "source": "/history"}
def fetch_queue(host: str, headers: dict) -> dict:
url = resolve_url(host, "/queue")
r = http_get(url, headers=headers, retries=2, timeout=15)
try:
data = r.json()
except Exception:
data = {"raw": r.text()[:500]}
return {"http_status": r.status, "data": data}
def extract_diagnostics(entry: dict) -> dict:
"""Pull out the parts a human cares about: status, errors, traceback, timing."""
diag: dict = {}
status = entry.get("status") or {}
diag["status_str"] = status.get("status_str")
diag["completed"] = status.get("completed")
messages = status.get("messages") or []
diag["execution_log"] = []
for msg in messages:
if isinstance(msg, list) and len(msg) >= 2:
mtype, mdata = msg[0], msg[1]
diag["execution_log"].append({"type": mtype, "data": mdata})
else:
diag["execution_log"].append(msg)
# Look for execution_error inside messages
errors = []
for msg in messages:
if isinstance(msg, list) and len(msg) >= 2 and msg[0] == "execution_error":
errors.append(msg[1])
if errors:
diag["errors"] = errors
# Cloud's /jobs response shape: top-level outputs / status / etc.
if "outputs" in entry:
out = entry["outputs"] or {}
if isinstance(out, dict):
diag["output_node_ids"] = list(out.keys())
# Count file refs across all output buckets (images / video / etc.)
total = 0
for node_output in out.values():
if not isinstance(node_output, dict):
continue
for v in node_output.values():
if isinstance(v, list):
total += len(v)
diag["output_count"] = total
else:
diag["output_node_ids"] = []
diag["output_count"] = 0
return diag
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="Fetch workflow execution diagnostics")
p.add_argument("prompt_id", nargs="?", help="prompt_id to look up")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST)
p.add_argument("--api-key", help=f"or set ${ENV_API_KEY}")
p.add_argument("--raw", action="store_true",
help="Print the full history entry instead of the digest")
p.add_argument("--tail-queue", action="store_true",
help="Show currently running/pending jobs instead")
args = p.parse_args(argv)
api_key = resolve_api_key(args.api_key)
headers = {"X-API-Key": api_key} if api_key else {}
is_cloud = is_cloud_host(args.host)
if args.tail_queue:
emit_json(fetch_queue(args.host, headers))
return 0
if not args.prompt_id:
print("Error: prompt_id is required (or use --tail-queue)", file=sys.stderr)
return 1
res = fetch_history_entry(args.host, headers, args.prompt_id, is_cloud=is_cloud)
if not res.get("ok"):
emit_json(res)
return 1
if args.raw:
emit_json(res)
return 0
diag = extract_diagnostics(res["entry"])
diag["source"] = res.get("source")
diag["prompt_id"] = args.prompt_id
emit_json(diag)
return 0 if diag.get("status_str") not in ("error",) else 1
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,497 @@
#!/usr/bin/env python3
"""hardware_check.py — Detect whether this machine can realistically run ComfyUI locally.
Improvements over v1:
- Multi-GPU detection: scans all NVIDIA / AMD GPUs, picks the best one (most VRAM)
- Apple Silicon: detects Rosetta-via-x86_64 false negative; warns instead of misclassifying
- Apple generation: defaults to None (unknown) instead of mis-tagging as M1
- WSL2 detection: identifies WSL2 + nvidia-smi situation explicitly
- ROCm: prefers `rocm-smi --json` for new ROCm 6.x output
- Disk space check: warns if /home or workspace volume has < 25 GB free
- PyTorch verification (optional): tries to import torch and check device availability
- Windows: prefers PowerShell `Get-CimInstance` over deprecated `wmic`
- More accurate VRAM thresholds and verdict reasons
Emits a structured JSON report. Exit codes match `verdict`:
0 → ok
1 → marginal
2 → cloud
Usage:
python3 hardware_check.py [--json] [--check-pytorch]
"""
from __future__ import annotations
import json
import os
import platform
import re
import shutil
import subprocess
import sys
from typing import Any
# Thresholds (GiB).
MIN_VRAM_GB_USABLE = 6
OK_VRAM_GB = 8
GREAT_VRAM_GB = 12
MIN_MAC_RAM_GB = 16
OK_MAC_RAM_GB = 32
MIN_FREE_DISK_GB = 25  # ComfyUI core ~5 GB + one model ~5-24 GB
_COMFY_CLI_FLAG = {
"nvidia": "--nvidia",
"amd": "--amd",
"apple-silicon": "--m-series",
"intel": None,
"comfy-cloud": None,
"cpu": "--cpu",
}
def _run(cmd: list[str], timeout: int = 8) -> str:
try:
out = subprocess.run(
cmd, capture_output=True, text=True, timeout=timeout, check=False
)
return (out.stdout or "") + (out.stderr or "")
except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
return ""
def is_wsl() -> bool:
"""Return True when running under Windows Subsystem for Linux."""
if platform.system() != "Linux":
return False
if "microsoft" in platform.release().lower() or "wsl" in platform.release().lower():
return True
try:
with open("/proc/version", "r") as fh:
return "microsoft" in fh.read().lower()
except OSError:
return False
def is_rosetta() -> bool:
"""Return True when Python is running translated under Rosetta on Apple Silicon."""
if platform.system() != "Darwin":
return False
if platform.machine() == "arm64":
return False
# x86_64 on Darwin — could be Intel Mac or Rosetta. Probe sysctl.
out = _run(["sysctl", "-in", "sysctl.proc_translated"]).strip()
return out == "1"
def detect_nvidia() -> dict | None:
"""Detect NVIDIA GPUs. Returns the GPU with the most VRAM, plus list of all."""
if not shutil.which("nvidia-smi"):
return None
out = _run([
"nvidia-smi",
"--query-gpu=index,name,memory.total,driver_version",
"--format=csv,noheader,nounits",
])
if not out.strip():
return None
gpus = []
for line in out.strip().splitlines():
parts = [p.strip() for p in line.split(",")]
if len(parts) < 3:
continue
try:
idx = int(parts[0])
name = parts[1]
vram_mb = int(parts[2])
except ValueError:
continue
driver = parts[3] if len(parts) > 3 else ""
gpus.append({
"vendor": "nvidia",
"index": idx,
"name": name,
"vram_gb": round(vram_mb / 1024, 1),
"driver": driver,
})
if not gpus:
return None
# Pick GPU with most VRAM
best = max(gpus, key=lambda g: g["vram_gb"])
if len(gpus) > 1:
best["all_gpus"] = gpus
return best
def detect_rocm() -> dict | None:
if not shutil.which("rocm-smi"):
return None
# Prefer JSON output (new ROCm 6.x)
out = _run(["rocm-smi", "--showproductname", "--showmeminfo", "vram", "--json"])
if out.strip().startswith("{"):
try:
data = json.loads(out)
cards = []
for card_id, info in data.items():
if not card_id.startswith("card"):
continue
name = (info.get("Card series") or info.get("Card model")
or info.get("Marketing Name") or "AMD GPU")
vram_b = info.get("VRAM Total Memory (B)") or info.get("vram_total_memory_b") or 0
try:
vram_b = int(vram_b)
except (ValueError, TypeError):
vram_b = 0
cards.append({
"vendor": "amd",
"name": str(name).strip(),
"vram_gb": round(vram_b / (1024**3), 1),
"driver": "rocm",
})
if cards:
best = max(cards, key=lambda c: c["vram_gb"])
if len(cards) > 1:
best["all_gpus"] = cards
return best
except json.JSONDecodeError:
pass
# Fall back to text parsing
out = _run(["rocm-smi", "--showproductname", "--showmeminfo", "vram"])
if not out.strip():
return None
# Accept the same labels as the JSON path: "Card series", "Card model", "Marketing Name".
name_m = re.search(r"(?:Card series|Card model|Marketing Name):\s*(.+)", out)
vram_m = re.search(r"VRAM Total Memory \(B\):\s*(\d+)", out)
vram_gb = round(int(vram_m.group(1)) / (1024**3), 1) if vram_m else 0.0
return {
"vendor": "amd",
"name": name_m.group(1).strip() if name_m else "AMD GPU",
"vram_gb": vram_gb,
"driver": "rocm",
}
def detect_apple_silicon() -> dict | None:
if platform.system() != "Darwin":
return None
if platform.machine() != "arm64":
return None
chip = _run(["sysctl", "-n", "machdep.cpu.brand_string"]).strip()
m = re.search(r"Apple M(\d+)", chip)
generation = int(m.group(1)) if m else None
mem_bytes = 0
try:
mem_bytes = int(_run(["sysctl", "-n", "hw.memsize"]).strip() or 0)
except ValueError:
pass
ram_gb = round(mem_bytes / (1024**3), 1) if mem_bytes else 0.0
# Detect chip variant ("Pro", "Max", "Ultra") — affects performance even at same gen
variant = None
for v in ("Ultra", "Max", "Pro"):
if v in chip:
variant = v
break
return {
"vendor": "apple",
"name": chip or "Apple Silicon",
"generation": generation,
"variant": variant,
"unified_memory_gb": ram_gb,
}
def detect_intel_arc() -> dict | None:
if platform.system() not in ("Linux", "Windows"):
return None
if shutil.which("clinfo"):
out = _run(["clinfo", "--list"])
if "Intel" in out and ("Arc" in out or "Xe" in out):
return {"vendor": "intel", "name": "Intel Arc/Xe", "vram_gb": 0.0}
# Windows: try Get-CimInstance
if platform.system() == "Windows" and shutil.which("powershell"):
out = _run(["powershell", "-NoProfile",
"Get-CimInstance Win32_VideoController | Select-Object Name | Format-List"])
if "Intel" in out and ("Arc" in out or "Iris Xe" in out):
return {"vendor": "intel", "name": "Intel Arc/Iris Xe", "vram_gb": 0.0}
return None
def total_system_ram_gb() -> float:
sysname = platform.system()
if sysname == "Darwin":
try:
return round(int(_run(["sysctl", "-n", "hw.memsize"]).strip() or 0) / (1024**3), 1)
except ValueError:
return 0.0
if sysname == "Linux":
try:
with open("/proc/meminfo", "r") as fh:
for line in fh:
if line.startswith("MemTotal:"):
kb = int(line.split()[1])
return round(kb / (1024**2), 1)
except (OSError, ValueError, IndexError):
return 0.0
if sysname == "Windows":
if shutil.which("powershell"):
out = _run([
"powershell", "-NoProfile",
"(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory",
])
m = re.search(r"(\d{8,})", out)
if m:
return round(int(m.group(1)) / (1024**3), 1)
# Fall back to wmic for older Windows
out = _run(["wmic", "ComputerSystem", "get", "TotalPhysicalMemory"])
m = re.search(r"(\d{6,})", out)
if m:
return round(int(m.group(1)) / (1024**3), 1)
return 0.0
def total_free_disk_gb(path: str = ".") -> float:
try:
usage = shutil.disk_usage(path)
return round(usage.free / (1024**3), 1)
except OSError:
return 0.0
def check_pytorch_cuda() -> dict | None:
"""Optional PyTorch availability check. Only run when --check-pytorch is set."""
try:
import torch # type: ignore[import-not-found]
except Exception as e:
return {"available": False, "reason": f"torch not importable: {e}"}
info: dict[str, Any] = {
"available": True,
"torch_version": torch.__version__,
}
try:
info["cuda_available"] = bool(torch.cuda.is_available())
if info["cuda_available"]:
info["cuda_device_count"] = torch.cuda.device_count()
info["cuda_device_0"] = torch.cuda.get_device_name(0)
except Exception:
info["cuda_available"] = False
try:
info["mps_available"] = bool(torch.backends.mps.is_available())
except Exception:
info["mps_available"] = False
return info
def classify(gpu: dict | None, ram_gb: float, free_disk_gb: float, *, wsl: bool, rosetta: bool) -> tuple[str, str, list[str]]:
notes: list[str] = []
if rosetta:
notes.append(
"Detected Python running under Rosetta on Apple Silicon. "
"ComfyUI MPS support requires native ARM64 Python — install via "
"`brew install python` or arm64 Miniforge, then re-run."
)
return "cloud", "comfy-cloud", notes
if wsl and gpu and gpu["vendor"] == "nvidia":
notes.append("Detected WSL2 + NVIDIA — confirm `nvidia-smi` works in your WSL distro before installing.")
if free_disk_gb and free_disk_gb < MIN_FREE_DISK_GB:
notes.append(
f"Free disk space ({free_disk_gb} GB) is below the {MIN_FREE_DISK_GB} GB recommended minimum. "
"ComfyUI core (~5 GB) plus one SDXL model (~6.5 GB) needs space; Flux Dev needs ~24 GB."
)
# Host RAM matters even for discrete-GPU systems: ComfyUI swaps model
# weights through CPU RAM when shuffling between text encoders / VAE / UNet.
# Apple's unified-memory check is handled below so don't double-warn.
if ram_gb and ram_gb < 8 and gpu and gpu.get("vendor") != "apple":
notes.append(
f"System RAM ({ram_gb} GB) is low. ComfyUI swaps model weights through "
"host RAM; <8 GB causes severe slowdowns. 16+ GB recommended."
)
if gpu is None:
notes.append(
"No supported accelerator found (NVIDIA CUDA / AMD ROCm / Apple Silicon / Intel Arc)."
)
notes.append(
"CPU-only ComfyUI works but is unusably slow for modern models — use Comfy Cloud."
)
return "cloud", "comfy-cloud", notes
if gpu["vendor"] == "apple":
gen = gpu.get("generation")
variant = gpu.get("variant")
mem = gpu.get("unified_memory_gb", 0.0)
gen_str = f"M{gen}" if gen else "Apple Silicon"
if variant:
gen_str += f" {variant}"
if mem < MIN_MAC_RAM_GB:
notes.append(
f"{gen_str} with {mem} GB unified memory — below the {MIN_MAC_RAM_GB} GB practical minimum."
)
notes.append("SD1.5 may work; SDXL/Flux will swap or OOM. Recommend Comfy Cloud.")
return "cloud", "comfy-cloud", notes
if mem < OK_MAC_RAM_GB:
notes.append(
f"{gen_str} with {mem} GB — SDXL works but slow. Flux/video likely too tight."
)
return "marginal", "apple-silicon", notes
notes.append(f"{gen_str} with {mem} GB unified memory — good for SDXL/Flux.")
return "ok", "apple-silicon", notes
if gpu["vendor"] == "intel":
notes.append("Intel Arc detected — ComfyUI IPEX support is experimental; Comfy Cloud is more reliable.")
return "marginal", "intel", notes
# Discrete NVIDIA / AMD
vram = gpu.get("vram_gb", 0.0)
name = gpu["name"]
if vram < MIN_VRAM_GB_USABLE:
notes.append(
f"{name} has only {vram} GB VRAM — below the {MIN_VRAM_GB_USABLE} GB practical minimum."
)
notes.append("Most modern models won't load. Recommend Comfy Cloud.")
return "cloud", "comfy-cloud", notes
if vram < OK_VRAM_GB:
notes.append(
f"{name} ({vram} GB VRAM) — SD1.5 works, SDXL tight, Flux/video unlikely."
)
return "marginal", gpu["vendor"], notes
if vram < GREAT_VRAM_GB:
notes.append(f"{name} ({vram} GB VRAM) — SDXL comfortable, Flux possible with optimizations.")
return "ok", gpu["vendor"], notes
notes.append(f"{name} ({vram} GB VRAM) — can run everything including Flux/video.")
return "ok", gpu["vendor"], notes
def build_report(*, check_pytorch: bool = False) -> dict:
sysname = platform.system()
arch = platform.machine()
ram_gb = total_system_ram_gb()
free_disk_gb = total_free_disk_gb(os.path.expanduser("~"))
rosetta = is_rosetta()
wsl = is_wsl()
gpu = (
detect_nvidia()
or detect_rocm()
or detect_apple_silicon()
or detect_intel_arc()
)
# Intel Mac: arm64 detect failed AND no other GPU paths
if gpu is None and sysname == "Darwin" and arch != "arm64" and not rosetta:
notes = [
"Intel Mac detected — no MPS backend available.",
"ComfyUI will fall back to CPU which is unusably slow. Use Comfy Cloud.",
]
report = {
"os": sysname,
"arch": arch,
"system_ram_gb": ram_gb,
"free_disk_gb": free_disk_gb,
"wsl": False,
"rosetta": False,
"gpu": None,
"verdict": "cloud",
"recommended_install_path": "comfy-cloud",
"comfy_cli_flag": None,
"notes": notes,
"install_urls": _install_urls(),
}
if check_pytorch:
report["pytorch"] = check_pytorch_cuda()
return report
verdict, install_path, notes = classify(
gpu, ram_gb, free_disk_gb, wsl=wsl, rosetta=rosetta,
)
report = {
"os": sysname,
"arch": arch,
"system_ram_gb": ram_gb,
"free_disk_gb": free_disk_gb,
"wsl": wsl,
"rosetta": rosetta,
"gpu": gpu,
"verdict": verdict,
"recommended_install_path": install_path,
"comfy_cli_flag": _COMFY_CLI_FLAG.get(install_path),
"notes": notes,
"install_urls": _install_urls(),
}
if check_pytorch:
report["pytorch"] = check_pytorch_cuda()
return report
def _install_urls() -> dict:
return {
"desktop": "https://docs.comfy.org/installation/desktop",
"manual": "https://docs.comfy.org/installation/manual_install",
"comfy_cli": "https://docs.comfy.org/comfy-cli/getting-started",
"cloud": "https://platform.comfy.org",
}
def main(argv: list[str] | None = None) -> int:
import argparse
p = argparse.ArgumentParser(description="Check whether this machine can run ComfyUI locally.")
p.add_argument("--json", action="store_true", help="Emit machine-readable JSON only")
p.add_argument("--check-pytorch", action="store_true",
help="Also probe `torch` for CUDA/MPS availability (slower)")
args = p.parse_args(argv)
report = build_report(check_pytorch=args.check_pytorch)
if args.json:
print(json.dumps(report, indent=2))
else:
print(f"OS: {report['os']} ({report['arch']})")
if report.get("wsl"):
print("Env: WSL2")
if report.get("rosetta"):
print("Env: Rosetta (x86_64 Python on Apple Silicon)")
print(f"RAM: {report['system_ram_gb']} GB")
print(f"Free disk: {report['free_disk_gb']} GB (~/)")
if report["gpu"]:
g = report["gpu"]
if g["vendor"] == "apple":
print(f"GPU: {g['name']} ({g.get('unified_memory_gb', 0)} GB unified memory)")
else:
print(f"GPU: {g['name']} ({g.get('vram_gb', 0)} GB VRAM)")
if g.get("all_gpus") and len(g["all_gpus"]) > 1:
print(f" ({len(g['all_gpus'])} GPUs total; using best by VRAM)")
else:
print("GPU: (none detected)")
print(f"Verdict: {report['verdict']} → {report['recommended_install_path']}")
if report["comfy_cli_flag"]:
print(f" run: comfy --skip-prompt install {report['comfy_cli_flag']}")
if report.get("pytorch"):
pt = report["pytorch"]
if pt.get("available"):
line = f"PyTorch: {pt.get('torch_version')}"
if pt.get("cuda_available"):
line += f" + CUDA ({pt.get('cuda_device_0', '?')})"
if pt.get("mps_available"):
line += " + MPS"
print(line)
else:
print(f"PyTorch: not available — {pt.get('reason')}")
for n in report["notes"]:
print(f"- {n}")
if report["verdict"] == "ok":
return 0
if report["verdict"] == "marginal":
return 1
return 2
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,223 @@
#!/usr/bin/env python3
"""
health_check.py — One-stop verification that the ComfyUI environment is ready.
Runs through the verification checklist:
1. comfy-cli on PATH
2. server reachable (/system_stats)
3. at least one checkpoint installed
4. (optional) a specific workflow's deps are met
5. (optional) submit a tiny test workflow, confirm the server accepts it, then cancel it
Usage:
python3 health_check.py
python3 health_check.py --host https://cloud.comfy.org
python3 health_check.py --workflow my.json
python3 health_check.py --smoke-test # actually submit a tiny workflow
"""
from __future__ import annotations
import argparse
import json
import shutil
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY, emit_json, http_get, parse_model_list,
resolve_api_key, resolve_url, unwrap_workflow,
)
def comfy_cli_status() -> dict:
if shutil.which("comfy"):
return {"available": True, "method": "comfy", "path": shutil.which("comfy")}
if shutil.which("uvx"):
return {"available": True, "method": "uvx",
"hint": "Invoke as `uvx --from comfy-cli comfy ...`"}
return {
"available": False,
"hint": "Install with: pipx install comfy-cli (or `pip install comfy-cli`)",
}
def server_status(host: str, headers: dict) -> dict:
url = resolve_url(host, "/system_stats")
try:
r = http_get(url, headers=headers, retries=2, timeout=10)
if r.status == 200:
try:
stats = r.json() or {}
except Exception:
stats = {}
return {"reachable": True, "url": url, "stats": stats}
return {"reachable": False, "url": url, "http_status": r.status, "body": r.text()[:200]}
except Exception as e:
return {"reachable": False, "url": url, "error": str(e)}
def checkpoint_status(host: str, headers: dict) -> dict:
url = resolve_url(host, "/models/checkpoints")
try:
r = http_get(url, headers=headers, retries=2, timeout=15)
except Exception as e:
return {"queryable": False, "error": str(e)}
if r.status != 200:
return {"queryable": False, "http_status": r.status, "url": url, "body": r.text()[:200]}
try:
models = parse_model_list(r.json())
except Exception:
models = set()
return {"queryable": True, "count": len(models),
"first_few": sorted(models)[:5]}
SMOKE_WORKFLOW = {
# Minimal SD1.5 workflow that doesn't depend on rare nodes.
# 256x256 + 1 step is the smallest config that doesn't trigger SDXL/Flux
# validation errors while still executing fast.
"3": {
"class_type": "KSampler",
"inputs": {
"seed": 1, "steps": 1, "cfg": 7.0,
"sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
"model": ["4", 0], "positive": ["6", 0], "negative": ["7", 0],
"latent_image": ["5", 0],
},
},
"4": {"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": "REPLACE_ME"}},
"5": {"class_type": "EmptyLatentImage",
"inputs": {"width": 256, "height": 256, "batch_size": 1}},
"6": {"class_type": "CLIPTextEncode",
"inputs": {"text": "test", "clip": ["4", 1]}},
"7": {"class_type": "CLIPTextEncode",
"inputs": {"text": "", "clip": ["4", 1]}},
"9": {"class_type": "SaveImage",
"inputs": {"filename_prefix": "smoke", "images": ["3", 0]}},
}
def smoke_test(host: str, headers: dict, ckpt_name: str | None) -> dict:
"""Submit a tiny workflow and verify the server accepts it.
Cancels the job immediately after acceptance so we don't burn GPU
time / cloud minutes on a smoke test.
"""
if not ckpt_name:
return {"ran": False, "reason": "no checkpoint available"}
wf = json.loads(json.dumps(SMOKE_WORKFLOW))
wf["4"]["inputs"]["ckpt_name"] = ckpt_name
# Lazy import to avoid circular issues
from run_workflow import ComfyRunner
api_key = headers.get("X-API-Key")
runner = ComfyRunner(host=host, api_key=api_key)
sub = runner.submit(wf)
if "_http_error" in sub:
return {"ran": True, "submitted": False,
"http_status": sub["_http_error"], "body": sub.get("body")}
pid = sub.get("prompt_id")
if not pid:
return {"ran": True, "submitted": False, "response": sub}
# Cancel so we don't actually waste compute on the smoke test.
cancelled = False
try:
cancelled = runner.cancel(pid)
except Exception:
pass
return {
"ran": True, "submitted": True, "prompt_id": pid,
"cancelled_after_submit": cancelled,
"note": "Submission accepted; cancelled to avoid running the full pipeline.",
}
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="One-stop ComfyUI health check")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST)
p.add_argument("--api-key", help=f"or set ${ENV_API_KEY}")
p.add_argument("--workflow", help="Optional: also run check_deps on this workflow")
p.add_argument("--smoke-test", action="store_true",
help="Submit a tiny test workflow, confirm the server accepts it, then cancel it")
p.add_argument("--strict", action="store_true",
help="Exit non-zero on any non-pass condition (including warnings)")
args = p.parse_args(argv)
api_key = resolve_api_key(args.api_key)
headers = {"X-API-Key": api_key} if api_key else {}
cli = comfy_cli_status()
server = server_status(args.host, headers)
ckpts = checkpoint_status(args.host, headers) if server.get("reachable") else None
# ---- workflow check ----
workflow_check: dict | None = None
if args.workflow:
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
workflow_check = {"error": "workflow file not found"}
else:
try:
with wf_path.open() as f:
workflow = unwrap_workflow(json.load(f))
from check_deps import check_deps
workflow_check = check_deps(workflow, host=args.host, api_key=api_key)
except (ValueError, json.JSONDecodeError) as e:
workflow_check = {"error": str(e)}
smoke = None
if args.smoke_test and server.get("reachable"):
first_ckpt = ckpts["first_few"][0] if ckpts and ckpts.get("first_few") else None
smoke = smoke_test(args.host, headers, first_ckpt)
# ---- verdict ----
verdict = "pass"
reasons: list[str] = []
if not server.get("reachable"):
verdict = "fail"
reasons.append("server unreachable")
if ckpts and ckpts.get("queryable") and ckpts.get("count", 0) == 0:
verdict = "warn" if verdict == "pass" else verdict
reasons.append("no checkpoints installed")
if workflow_check and workflow_check.get("error"):
verdict = "fail"
reasons.append(f"workflow check failed: {workflow_check['error']}")
elif workflow_check and not workflow_check.get("is_ready"):
if workflow_check.get("node_check_skipped"):
reasons.append("node check skipped (cloud free tier)")
else:
verdict = "fail"
reasons.append("workflow has missing deps")
if smoke and smoke.get("ran") and not smoke.get("submitted"):
verdict = "fail"
reasons.append("smoke-test submission failed")
if not cli.get("available"):
verdict = "warn" if verdict == "pass" else verdict
reasons.append("comfy-cli not on PATH (lifecycle commands won't work)")
report = {
"verdict": verdict,
"reasons": reasons,
"host": args.host,
"comfy_cli": cli,
"server": server,
"checkpoints": ckpts,
"workflow_check": workflow_check,
"smoke_test": smoke,
}
emit_json(report)
if verdict == "pass":
return 0
if verdict == "warn":
return 1 if args.strict else 0
return 1
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,243 @@
#!/usr/bin/env python3
"""
run_batch.py — Run a workflow many times, varying parameters per run.
Two modes:
1. --count N --randomize-seed
Submit N runs, each with a fresh random seed. Use for quick variations.
2. --sweep '{"seed": [1,2,3], "steps": [20,30]}'
Cartesian product of values. Runs are sequential by default; pass
--parallel N to run concurrently (on cloud, up to your tier's
concurrent-job limit).
Both modes write each run's outputs into output-dir/run_NNNN/.
Examples:
python3 run_batch.py --workflow flux_dev.json \
--args '{"prompt": "a cat"}' \
--count 8 --randomize-seed \
--output-dir ./outputs/cat-batch
python3 run_batch.py --workflow sdxl.json \
--args '{"prompt": "abstract"}' \
--sweep '{"seed": [1,2,3], "steps": [20, 40]}' \
--output-dir ./outputs/sweep
"""
from __future__ import annotations
import argparse
import itertools
import json
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY, coerce_seed, emit_json, log,
looks_like_video_workflow, resolve_api_key, unwrap_workflow,
)
from run_workflow import ( # noqa: E402
ComfyRunner, download_outputs, inject_params,
)
from extract_schema import extract_schema # noqa: E402
def expand_sweep(sweep: dict, base_args: dict, count: int, randomize_seed: bool) -> list[dict]:
"""Generate a list of args dicts for each run."""
if sweep:
# Cartesian product
keys = list(sweep.keys())
values = [sweep[k] if isinstance(sweep[k], list) else [sweep[k]] for k in keys]
runs = []
for combo in itertools.product(*values):
ar = dict(base_args)
for k, v in zip(keys, combo):
ar[k] = v
runs.append(ar)
return runs
# Count mode
runs = []
for _ in range(count):
ar = dict(base_args)
if randomize_seed:
ar["seed"] = coerce_seed(None)
runs.append(ar)
return runs
def execute_one(
runner: ComfyRunner, workflow: dict, schema: dict, args: dict,
*, output_dir: Path, timeout: int, ws: bool,
) -> dict:
wf, warnings = inject_params(workflow, schema, args)
sub = runner.submit(wf)
if "_http_error" in sub:
return {"status": "error", "error": "submission HTTP error",
"details": sub.get("body"), "args": args}
pid = sub.get("prompt_id")
if not pid:
return {"status": "error", "error": "no prompt_id", "response": sub, "args": args}
if sub.get("node_errors"):
return {"status": "error", "error": "validation failed",
"node_errors": sub["node_errors"], "args": args}
if ws:
result = runner.monitor_ws(pid, timeout=timeout)
else:
result = runner.poll_status(pid, timeout=timeout)
if result["status"] != "success":
return {
"status": result["status"],
"prompt_id": pid,
"details": result.get("data"),
"args": args,
}
outputs = result.get("outputs") or runner.get_outputs(pid)
downloaded = download_outputs(runner, outputs, output_dir, preserve_subfolder=False)
return {
"status": "success",
"prompt_id": pid,
"args": args,
"outputs": downloaded,
"warnings": warnings,
}
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(
description="Submit a workflow many times with varying parameters.",
)
p.add_argument("--workflow", required=True)
p.add_argument("--args", default="{}", help="Base parameters JSON")
p.add_argument("--count", type=int, default=0,
help="Number of runs (use with --randomize-seed)")
p.add_argument("--sweep", default="",
help='JSON dict of param→list of values. Cartesian product. '
'e.g. \'{"seed":[1,2,3],"cfg":[5,8]}\'')
p.add_argument("--randomize-seed", action="store_true",
help="In --count mode, vary seed per run")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST)
p.add_argument("--api-key", help=f"or set ${ENV_API_KEY}")
p.add_argument("--partner-key")
p.add_argument("--parallel", type=int, default=1,
help="Concurrent submissions (cloud: up to your tier limit). "
"Default 1 (sequential)")
p.add_argument("--output-dir", default="./outputs/batch")
p.add_argument("--timeout", type=int, default=0)
p.add_argument("--ws", action="store_true")
p.add_argument("--continue-on-error", action="store_true",
help="Don't stop the batch when a run fails")
args = p.parse_args(argv)
if args.count <= 0 and not args.sweep:
emit_json({"error": "Specify --count N or --sweep '{...}'"})
return 1
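# Report malformed JSON as a structured error, like every other failure path.
try:
base_args = json.loads(args.args) if args.args.strip() else {}
sweep = json.loads(args.sweep) if args.sweep.strip() else {}
except json.JSONDecodeError as e:
emit_json({"error": f"Invalid JSON in --args or --sweep: {e}"})
return 1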
# Validate sweep shape
if sweep:
if not isinstance(sweep, dict):
emit_json({"error": "--sweep must be a JSON object {param: [values]}"})
return 1
empty = [k for k, v in sweep.items() if isinstance(v, list) and len(v) == 0]
if empty:
emit_json({"error": f"--sweep parameters have empty value lists: {empty}"})
return 1
# If user passed BOTH --sweep and --count/--randomize-seed, --sweep wins
if args.count or args.randomize_seed:
log("--sweep set; ignoring --count / --randomize-seed (sweep defines the runs)")
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
emit_json({"error": f"Workflow not found: {args.workflow}"})
return 1
try:
with wf_path.open() as f:
workflow = unwrap_workflow(json.load(f))
except (ValueError, json.JSONDecodeError) as e:
emit_json({"error": str(e)})
return 1
schema = extract_schema(workflow)
runs = expand_sweep(sweep, base_args, args.count, args.randomize_seed)
log(f"Planned {len(runs)} run(s)")
api_key = resolve_api_key(args.api_key)
runner = ComfyRunner(host=args.host, api_key=api_key, partner_key=args.partner_key)
ok, info = runner.check_server()
if not ok:
emit_json({"error": "Cannot reach server", "details": info, "host": args.host})
return 1
timeout = args.timeout
if timeout <= 0:
timeout = 900 if looks_like_video_workflow(workflow) else 300
base_dir = Path(args.output_dir).expanduser()
base_dir.mkdir(parents=True, exist_ok=True)
results: list[dict] = []
failures = 0
if args.parallel > 1:
with ThreadPoolExecutor(max_workers=args.parallel) as ex:
future_to_idx = {}
for i, ar in enumerate(runs):
run_dir = base_dir / f"run_{i:04d}"
fut = ex.submit(
execute_one, runner, workflow, schema, ar,
output_dir=run_dir, timeout=timeout, ws=args.ws,
)
future_to_idx[fut] = i
for fut in as_completed(future_to_idx):
i = future_to_idx[fut]
try:
r = fut.result()
except Exception as e:
r = {"status": "error", "error": str(e), "args": runs[i]}
r["index"] = i
results.append(r)
if r["status"] != "success":
failures += 1
log(f" run {i} → {r['status']}: {r.get('error','?')}")
if not args.continue_on_error:
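log(" --continue-on-error not set; aborting batch")
# Cancel runs that have not started yet; runs already in flight still finish.
for pending in future_to_idx:
pending.cancel()
break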
else:
log(f" run {i} → success: {len(r.get('outputs', []))} files")
else:
for i, ar in enumerate(runs):
run_dir = base_dir / f"run_{i:04d}"
r = execute_one(runner, workflow, schema, ar,
output_dir=run_dir, timeout=timeout, ws=args.ws)
r["index"] = i
results.append(r)
if r["status"] != "success":
failures += 1
log(f" run {i} → {r['status']}: {r.get('error','?')}")
if not args.continue_on_error:
log(" --continue-on-error not set; aborting batch")
break
else:
log(f" run {i} → success: {len(r.get('outputs', []))} files")
results.sort(key=lambda x: x.get("index", 0))
emit_json({
"status": "success" if failures == 0 else "partial",
"total": len(runs),
"completed": sum(1 for r in results if r["status"] == "success"),
"failed": failures,
"output_dir": str(base_dir),
"results": results,
})
return 0 if failures == 0 else 1
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,796 @@
#!/usr/bin/env python3
"""
run_workflow.py — Inject parameters into a ComfyUI workflow, submit it, monitor
execution, and download outputs.
Improvements over v1:
- Cloud-aware URL routing (handles /api prefix and /history_v2 / /experiment/models renames)
- API key from CLI flag OR $COMFY_CLOUD_API_KEY env var
- WebSocket progress monitoring (--ws), with HTTP polling fallback
- Streaming download (no whole-file buffering — handles GB-size video outputs)
- Path-traversal-safe output writes
- Subfolder-aware download paths (no silent overwrites)
- Retry with exponential backoff on transient errors
- Status-error correctly classified before "completed: true"
- Image upload helper (--input-image NAME=PATH)
- Auto-randomize seed when the value is -1, or when it is omitted and --randomize-seed is set
- Auto-extends timeout heuristically for video workflows
- Editor-format detection with helpful error
- Doesn't pollute extra_data.api_key_comfy_org with the cloud auth key
unless --partner-key is provided (correct semantic per cloud docs)
Usage:
# Local server
python3 run_workflow.py --workflow workflow_api.json \
--args '{"prompt": "a cat", "seed": 42}' \
--output-dir ./outputs
# Cloud server (API key from env var)
export COMFY_CLOUD_API_KEY="comfyui-xxxxxxx"
python3 run_workflow.py --workflow workflow_api.json \
--args '{"prompt": "a cat"}' \
--host https://cloud.comfy.org \
--output-dir ./outputs
# With image input (auto-uploads, then references)
python3 run_workflow.py --workflow img2img.json \
--input-image image=./photo.png \
--args '{"prompt": "make it cyberpunk"}'
# WebSocket real-time progress
python3 run_workflow.py --workflow flux_dev.json \
--args '{"prompt": "..."}' \
--ws
Stdlib-only by default (Python 3.10+). Will use `requests`/`websocket-client`
if installed for nicer behavior.
"""
from __future__ import annotations
import argparse
import copy
import json
import sys
import time
from pathlib import Path
from typing import Any
from urllib.parse import urlencode, urlparse
# Local import — _common.py sits next to this script.
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY,
coerce_seed, emit_json, http_get, http_post, http_request,
is_cloud_host, is_link, log, looks_like_video_workflow,
media_type_from_filename, new_client_id, resolve_api_key, resolve_url,
safe_path_join, unwrap_workflow,
)
# =============================================================================
# Runner
# =============================================================================
class WorkflowRunError(Exception):
"""Raised when a workflow run fails (validation, execution, timeout)."""
def __init__(self, status: str, message: str, **details: Any):
super().__init__(message)
self.status = status
self.message = message
self.details = details
def to_dict(self) -> dict:
d = {"status": self.status, "error": self.message}
d.update(self.details)
return d
class ComfyRunner:
def __init__(
self,
host: str = DEFAULT_LOCAL_HOST,
api_key: str | None = None,
client_id: str | None = None,
partner_key: str | None = None,
):
self.host = host.rstrip("/")
self.api_key = api_key
self.partner_key = partner_key
self.is_cloud = is_cloud_host(self.host)
self.client_id = client_id or new_client_id()
@property
def headers(self) -> dict[str, str]:
h: dict[str, str] = {}
if self.api_key:
h["X-API-Key"] = self.api_key
return h
def _url(self, path: str) -> str:
return resolve_url(self.host, path, is_cloud=self.is_cloud)
# ---------- server health ----------
def check_server(self) -> tuple[bool, dict | None]:
try:
r = http_get(self._url("/system_stats"), headers=self.headers, retries=2)
if r.status == 200:
try:
return True, r.json()
except Exception:
return True, None
return False, {"http_status": r.status, "body": r.text()[:500]}
except Exception as e:
return False, {"error": str(e)}
# ---------- upload ----------
def upload_image(self, path: Path, *, image_type: str = "input", overwrite: bool = True,
endpoint: str = "/upload/image", extra_form: dict | None = None) -> dict:
"""Upload an image file via multipart. Returns server-side ref dict."""
if not path.exists():
raise FileNotFoundError(f"input image not found: {path}")
# Stream the file via a handle to avoid OOM on huge inputs (16MP+ photos).
with path.open("rb") as fh:
files = {"image": (path.name, fh)}
form = {"type": image_type}
if overwrite:
form["overwrite"] = "true"
if extra_form:
form.update({k: str(v) for k, v in extra_form.items()})
r = http_request(
"POST", self._url(endpoint),
headers=self.headers, files=files, form=form,
timeout=300, retries=2,
)
if r.status != 200:
raise WorkflowRunError(
"upload_failed",
f"Upload of {path.name} failed: HTTP {r.status}",
body=r.text()[:500],
)
try:
return r.json()
except Exception:
return {"name": path.name}
def upload_mask(self, path: Path, original_ref: dict) -> dict:
"""Upload an inpaint mask, linked to a previously uploaded source image.
`original_ref` should be the dict returned by `upload_image()` for the
source image (or `{"filename": ..., "subfolder": ..., "type": "input"}`).
"""
return self.upload_image(
path,
endpoint="/upload/mask",
extra_form={
"subfolder": "clipspace",
"original_ref": json.dumps(original_ref),
},
)
# ---------- submit ----------
def submit(self, workflow: dict) -> dict:
payload: dict[str, Any] = {"prompt": workflow, "client_id": self.client_id}
if self.partner_key:
payload["extra_data"] = {"api_key_comfy_org": self.partner_key}
r = http_post(self._url("/prompt"), headers=self.headers, json_body=payload, timeout=120)
try:
body = r.json()
except Exception:
body = {"raw": r.text()[:500]}
if r.status != 200:
return {"_http_error": r.status, "body": body}
return body
# ---------- HTTP polling ----------
def poll_status(self, prompt_id: str, *, timeout: float = 300.0,
initial_interval: float = 1.5, max_interval: float = 8.0) -> dict:
start = time.time()
interval = initial_interval
while time.time() - start < timeout:
if self.is_cloud:
r = http_get(
self._url(f"/job/{prompt_id}/status"),
headers=self.headers, retries=2, timeout=30,
)
if r.status == 200:
try:
data = r.json()
except Exception:
data = {}
s = data.get("status")
if s == "completed":
return {"status": "success", "data": data}
if s == "failed":
return {"status": "error", "data": data}
if s == "cancelled":
return {"status": "cancelled", "data": data}
# pending / in_progress → continue
elif r.status == 404:
# Cloud sometimes 404s briefly between submit and dispatcher pickup
pass
else:
# transient error — retry loop covers it
pass
else:
# Local: /history/{id} grows once execution completes
r = http_get(
self._url(f"/history/{prompt_id}"),
headers=self.headers, retries=2, timeout=30,
)
if r.status == 200:
try:
data = r.json() or {}
except Exception:
data = {}
entry = data.get(prompt_id)
if isinstance(entry, dict):
st = entry.get("status") or {}
# IMPORTANT: check error first — `completed: true` can coexist with errors
status_str = st.get("status_str")
if status_str == "error":
return {"status": "error", "data": entry}
if st.get("completed", False):
return {"status": "success", "outputs": entry.get("outputs", {})}
# not in history yet → continue polling
time.sleep(interval)
interval = min(max_interval, interval * 1.4)
return {"status": "timeout", "elapsed": time.time() - start}
# ---------- WebSocket monitoring ----------
def monitor_ws(self, prompt_id: str, *, timeout: float = 300.0,
on_progress: Any = None) -> dict:
"""Connect to /ws and listen until execution_success / execution_error.
Falls back to HTTP polling if `websocket-client` is not installed.
Returns same shape as poll_status.
"""
try:
import websocket # type: ignore[import-not-found]
except ImportError:
log("websocket-client not installed; falling back to HTTP polling")
return self.poll_status(prompt_id, timeout=timeout)
# Build WS URL. Preserve any base-path components the user gave us
# (e.g. http://example.com/comfyui → ws://example.com/comfyui/ws).
parsed = urlparse(self.host)
scheme = "wss" if parsed.scheme == "https" else "ws"
netloc = parsed.netloc
base_path = parsed.path.rstrip("/")
ws_url = f"{scheme}://{netloc}{base_path}/ws?clientId={self.client_id}"
if self.is_cloud and self.api_key:
ws_url += f"&token={self.api_key}"
outputs: dict[str, Any] = {}
error_payload: dict[str, Any] | None = None
success = False
seen_executed = False
ws = websocket.create_connection(ws_url, timeout=timeout)
try:
ws.settimeout(timeout)
deadline = time.time() + timeout
while time.time() < deadline:
msg = ws.recv()
if isinstance(msg, bytes):
# Binary preview frame — ignore for now; ws_monitor.py prints them
continue
try:
payload = json.loads(msg)
except Exception:
continue
mtype = payload.get("type", "")
mdata = payload.get("data", {}) or {}
# Filter to our job (cloud broadcasts; local filters via client_id)
pid = mdata.get("prompt_id")
if pid is not None and pid != prompt_id:
continue
if mtype == "progress":
if callable(on_progress):
on_progress({
"type": "progress",
"value": mdata.get("value"),
"max": mdata.get("max"),
"node": mdata.get("node"),
})
elif mtype == "progress_state":
if callable(on_progress):
on_progress({"type": "progress_state", "nodes": mdata.get("nodes", {})})
elif mtype == "executing":
node = mdata.get("node")
if callable(on_progress):
on_progress({"type": "executing", "node": node})
# When `node` is None on a local server, that signals end-of-run
if node is None and not self.is_cloud and seen_executed:
success = True
break
elif mtype == "executed":
seen_executed = True
nid = mdata.get("node")
out = mdata.get("output") or {}
if nid:
outputs[nid] = out
elif mtype == "notification":
if callable(on_progress):
on_progress({"type": "notification", "message": mdata.get("value", "")})
elif mtype == "execution_success":
success = True
break
elif mtype == "execution_error":
error_payload = mdata
break
elif mtype == "execution_interrupted":
error_payload = {"interrupted": True, **mdata}
break
finally:
try:
ws.close()
except Exception:
pass
if error_payload is not None:
return {"status": "error", "data": error_payload}
if success:
return {"status": "success", "outputs": outputs}
return {"status": "timeout", "elapsed": timeout}
# ---------- outputs ----------
def get_outputs(self, prompt_id: str) -> dict:
if self.is_cloud:
# Try /jobs/{id} first (returns full job with outputs); fall back to /history_v2
r = http_get(self._url(f"/jobs/{prompt_id}"), headers=self.headers, retries=2)
if r.status == 200:
try:
return (r.json() or {}).get("outputs", {}) or {}
except Exception:
pass
# Fallback
r = http_get(self._url(f"/history/{prompt_id}"), headers=self.headers, retries=2)
if r.status == 200:
try:
body = r.json() or {}
except Exception:
body = {}
if isinstance(body, dict) and prompt_id in body:
return body[prompt_id].get("outputs", {}) or {}
if isinstance(body, dict) and "outputs" in body:
return body["outputs"] or {}
return {}
# Local
r = http_get(self._url(f"/history/{prompt_id}"), headers=self.headers, retries=2)
if r.status != 200:
return {}
try:
body = r.json() or {}
except Exception:
return {}
entry = body.get(prompt_id) or {}
return entry.get("outputs", {}) or {}
def download_output(
self, *, filename: str, subfolder: str, file_type: str,
output_dir: Path, preserve_subfolder: bool = True, overwrite: bool = False,
) -> Path:
"""Stream a single output to disk. Path-traversal-safe."""
params = {"filename": filename, "subfolder": subfolder, "type": file_type}
url = self._url("/view") + "?" + urlencode(params)
# Compute target path safely. If preserve_subfolder, include subfolder in the
# local path; otherwise put the file in output_dir flat.
target_parts: list[str] = []
if preserve_subfolder and subfolder:
target_parts.extend(p for p in subfolder.split("/") if p and p not in (".", ".."))
target_parts.append(filename)
out_path = safe_path_join(output_dir, *target_parts)
if out_path.exists() and not overwrite:
stem, suffix = out_path.stem, out_path.suffix
i = 1
while True:
candidate = out_path.with_name(f"{stem}_{i}{suffix}")
if not candidate.exists():
out_path = candidate
break
i += 1
out_path.parent.mkdir(parents=True, exist_ok=True)
# Stream the download. On cloud, /view may answer with a 302 to a signed
# storage URL. The HTTP transport strips X-API-Key on cross-host redirects
# (_strip_api_key_on_redirect), so one follow_redirects=True call is safe
# and simpler than fetching the redirect target in a second step.
r = http_request(
"GET", url, headers=self.headers,
timeout=600, retries=3, follow_redirects=True,
stream=True, sink=out_path,
)
if r.status != 200:
try:
if out_path.exists():
out_path.unlink()
except Exception:
pass
raise WorkflowRunError(
"download_failed",
f"Download of {filename} failed: HTTP {r.status}",
url=url,
)
return out_path
# ---------- queue / cancel ----------
def cancel(self, prompt_id: str | None = None) -> bool:
if prompt_id:
r = http_post(
self._url("/queue"), headers=self.headers,
json_body={"delete": [prompt_id]}, retries=1,
)
return r.status == 200
# Interrupt currently running
r = http_post(self._url("/interrupt"), headers=self.headers, retries=1)
return r.status == 200
# =============================================================================
# Schema / parameter injection
# =============================================================================
def _inline_schema(workflow: dict) -> dict:
"""Generate schema using the sibling extract_schema module."""
from extract_schema import extract_schema # noqa: WPS433
return extract_schema(workflow)
def load_schema(schema_path: str | None, workflow: dict) -> dict:
if schema_path:
with open(schema_path) as f:
return json.load(f)
return _inline_schema(workflow)
def inject_params(
workflow: dict, schema: dict, args: dict,
*, randomize_seed_if_unset: bool = False,
) -> tuple[dict, list[str]]:
"""Inject user args into the workflow. Returns (new_workflow, warnings)."""
wf = copy.deepcopy(workflow)
params = schema.get("parameters", {}) or {}
warnings: list[str] = []
# Auto-randomize seed when it's -1 in args, or when randomize_seed_if_unset
# and user didn't pass a seed.
if "seed" in params:
if "seed" in args and args["seed"] in (None, -1, "-1"):
args = dict(args)
args["seed"] = coerce_seed(args["seed"])
warnings.append(f"seed=-1 expanded to {args['seed']}")
elif randomize_seed_if_unset and "seed" not in args:
args = dict(args)
args["seed"] = coerce_seed(None)
warnings.append(f"seed auto-randomized to {args['seed']}")
for name, value in args.items():
if name not in params:
warnings.append(f"unknown parameter '{name}' (not in schema), skipping")
continue
m = params[name]
nid, field = m["node_id"], m["field"]
node = wf.get(nid)
if not isinstance(node, dict) or "inputs" not in node:
warnings.append(f"node '{nid}' for parameter '{name}' missing in workflow")
continue
# Refuse to overwrite a link with a literal — would silently break wiring
cur = node["inputs"].get(field)
if is_link(cur):
warnings.append(
f"parameter '{name}' targets {nid}.{field} which is currently a link; "
f"refusing to overwrite (set the schema to point at the source node instead)"
)
continue
node["inputs"][field] = value
return wf, warnings
# =============================================================================
# Output download helper
# =============================================================================
def download_outputs(
runner: ComfyRunner, outputs: dict, output_dir: Path,
*, preserve_subfolder: bool = True, overwrite: bool = False,
) -> list[dict]:
"""Walk the outputs dict and download every file. Cloud uses `video` (singular);
local uses `videos` (plural). We accept both."""
output_dir.mkdir(parents=True, exist_ok=True)
downloaded: list[dict] = []
OUTPUT_KEYS = ("images", "gifs", "videos", "video", "audio", "files", "models", "3d")
for node_id, node_output in (outputs or {}).items():
if not isinstance(node_output, dict):
continue
for key in OUTPUT_KEYS:
entries = node_output.get(key)
if not entries:
continue
if not isinstance(entries, list):
entries = [entries]
for fi in entries:
if not isinstance(fi, dict):
continue
filename = fi.get("filename") or ""
if not filename:
continue
subfolder = fi.get("subfolder") or ""
file_type = fi.get("type") or "output"
try:
out_path = runner.download_output(
filename=filename, subfolder=subfolder, file_type=file_type,
output_dir=output_dir, preserve_subfolder=preserve_subfolder,
overwrite=overwrite,
)
downloaded.append({
"file": str(out_path),
"node_id": node_id,
"type": media_type_from_filename(filename),
"filename": filename,
"subfolder": subfolder,
"source_type": file_type,
})
except Exception as e:
log(f"WARN: failed to download {filename}: {e}")
return downloaded
# =============================================================================
# CLI
# =============================================================================
def parse_input_image_arg(spec: str) -> tuple[str, Path]:
"""Parse `name=path` (or `path` alone, defaulting to name='image')."""
if "=" in spec:
name, path = spec.split("=", 1)
return name.strip(), Path(path).expanduser()
return "image", Path(spec).expanduser()
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(
description="Run a ComfyUI workflow with parameter injection.",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
p.add_argument("--workflow", required=True, help="Path to workflow API JSON file")
p.add_argument("--args", default="{}",
help="JSON parameters to inject (or `@/path/to/args.json`)")
p.add_argument("--schema", help="Path to schema JSON (auto-generated if omitted)")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST, help="ComfyUI server URL")
p.add_argument("--api-key",
help=f"API key for cloud (or set ${ENV_API_KEY} env var)")
p.add_argument("--partner-key",
help="Partner-node API key (extra_data.api_key_comfy_org). "
"Required for Flux Pro / Ideogram / etc. Not sent unless provided.")
p.add_argument("--output-dir", default="./outputs", help="Directory to save outputs")
p.add_argument("--timeout", type=int, default=0,
help="Max seconds to wait (0=auto: 300 / 900 for video workflows)")
p.add_argument("--input-image", action="append", default=[],
help="Upload local image before running. Format: `name=path` or `path`. "
"`name` is the schema parameter that receives the uploaded filename "
"(default: `image`).")
p.add_argument("--randomize-seed", action="store_true",
help="If schema has a 'seed' parameter and --args didn't set one, randomize it")
p.add_argument("--ws", action="store_true",
help="Use WebSocket for real-time progress (requires `websocket-client`)")
p.add_argument("--no-download", action="store_true", help="Skip downloading outputs")
p.add_argument("--flat-output", action="store_true",
help="Don't preserve server-side subfolder structure when saving outputs")
p.add_argument("--overwrite", action="store_true",
help="Overwrite existing files instead of appending _1, _2, ...")
p.add_argument("--submit-only", action="store_true",
help="Submit and return prompt_id without waiting")
p.add_argument("--client-id", help="Override generated client_id (UUID)")
p.add_argument("--use-partner-key-as-auth", action="store_true",
help="(Compat) Use --partner-key value as cloud X-API-Key. Don't use unless you know why.")
args = p.parse_args(argv)
# ---- Load workflow ----
wf_path = Path(args.workflow).expanduser()
if not wf_path.exists():
emit_json({"error": f"Workflow file not found: {args.workflow}"})
return 1
try:
with wf_path.open() as f:
workflow_raw = json.load(f)
workflow = unwrap_workflow(workflow_raw)
except ValueError as e:
emit_json({"error": str(e)})
return 1
except json.JSONDecodeError as e:
emit_json({"error": f"Invalid JSON in workflow file: {e}"})
return 1
# ---- Parse user args ----
args_str = args.args
if args_str.startswith("@"):
try:
args_str = Path(args_str[1:]).read_text()
except OSError as e:
emit_json({"error": f"Cannot read args file: {e}"})
return 1
try:
user_args = json.loads(args_str) if args_str.strip() else {}
except json.JSONDecodeError as e:
emit_json({"error": f"Invalid --args JSON: {e}"})
return 1
if not isinstance(user_args, dict):
emit_json({"error": "--args must be a JSON object"})
return 1
# ---- Resolve API key ----
api_key = resolve_api_key(args.api_key)
partner_key = args.partner_key or None
if args.use_partner_key_as_auth and not api_key and partner_key:
api_key = partner_key
# ---- Connect ----
runner = ComfyRunner(
host=args.host, api_key=api_key, partner_key=partner_key,
client_id=args.client_id,
)
# Server reachability
ok, info = runner.check_server()
if not ok:
emit_json({
"error": f"Cannot reach server at {args.host}",
"details": info,
"hint": (
"Check `comfy launch --background` is running for local, "
f"or set ${ENV_API_KEY} for cloud."
),
})
return 1
# ---- Upload input images ----
upload_warnings: list[str] = []
for spec in args.input_image:
try:
param_name, path = parse_input_image_arg(spec)
except Exception as e:
emit_json({"error": f"Bad --input-image spec '{spec}': {e}"})
return 1
try:
ref = runner.upload_image(path)
except Exception as e:
emit_json({"error": f"Upload failed for {path}: {e}"})
return 1
# Register as a user arg so inject_params consumes it through the schema
uploaded_name = ref.get("name") or path.name
if param_name not in user_args:
user_args[param_name] = uploaded_name
# ---- Inject params ----
schema = load_schema(args.schema, workflow)
workflow, inj_warnings = inject_params(
workflow, schema, user_args, randomize_seed_if_unset=args.randomize_seed,
)
warnings = upload_warnings + inj_warnings
for w in warnings:
log(f"WARN: {w}")
# ---- Submit ----
submit_resp = runner.submit(workflow)
if "_http_error" in submit_resp:
emit_json({
"error": "Submission HTTP error",
"http_status": submit_resp["_http_error"],
"body": submit_resp.get("body"),
})
return 1
if isinstance(submit_resp.get("error"), dict):
emit_json({
"error": "Workflow validation failed",
"details": submit_resp["error"],
"node_errors": submit_resp.get("node_errors"),
})
return 1
prompt_id = submit_resp.get("prompt_id")
if not prompt_id:
emit_json({"error": "No prompt_id in submit response", "response": submit_resp})
return 1
node_errors = submit_resp.get("node_errors") or {}
if node_errors:
emit_json({"error": "Workflow validation failed", "node_errors": node_errors})
return 1
if args.submit_only:
emit_json({"status": "submitted", "prompt_id": prompt_id, "warnings": warnings})
return 0
# ---- Wait ----
timeout = args.timeout
if timeout <= 0:
timeout = 900 if looks_like_video_workflow(workflow) else 300
log(f"Submitted: prompt_id={prompt_id}, waiting (timeout={timeout}s)…")
def _on_progress(evt: dict) -> None:
t = evt.get("type")
if t == "progress":
log(f" step {evt.get('value')}/{evt.get('max')} on node {evt.get('node')}")
elif t == "executing":
node = evt.get("node")
if node:
log(f" executing node {node}")
try:
if args.ws:
wait_result = runner.monitor_ws(prompt_id, timeout=timeout, on_progress=_on_progress)
else:
wait_result = runner.poll_status(prompt_id, timeout=timeout)
except KeyboardInterrupt:
log(f"Interrupted — cancelling job {prompt_id} on server…")
try:
runner.cancel(prompt_id)
except Exception as e:
log(f" (cancel request failed: {e})")
emit_json({
"status": "interrupted",
"prompt_id": prompt_id,
"note": "Ctrl+C received; sent cancellation to server.",
})
return 130
if wait_result["status"] == "timeout":
emit_json({
"status": "timeout",
"prompt_id": prompt_id,
"elapsed": wait_result.get("elapsed"),
"hint": "Re-run with larger --timeout, or use --submit-only and check later.",
})
return 1
if wait_result["status"] == "error":
emit_json({"status": "error", "prompt_id": prompt_id, "details": wait_result.get("data")})
return 1
if wait_result["status"] == "cancelled":
emit_json({"status": "cancelled", "prompt_id": prompt_id})
return 1
# ---- Outputs ----
outputs = wait_result.get("outputs")
if not outputs:
outputs = runner.get_outputs(prompt_id)
if args.no_download:
emit_json({
"status": "success", "prompt_id": prompt_id,
"outputs": outputs, "warnings": warnings,
})
return 0
downloaded = download_outputs(
runner, outputs, Path(args.output_dir).expanduser(),
preserve_subfolder=not args.flat_output, overwrite=args.overwrite,
)
emit_json({
"status": "success",
"prompt_id": prompt_id,
"outputs": downloaded,
"warnings": warnings,
})
return 0
if __name__ == "__main__":
sys.exit(main())
@@ -0,0 +1,267 @@
#!/usr/bin/env python3
"""
ws_monitor.py — Real-time ComfyUI WebSocket monitor.
Connects to /ws and pretty-prints execution events: node start/finish, sampling
progress, cached nodes, errors. Optionally writes preview frames to disk.
Useful for:
- Watching a long-running job in real time without parsing JSON yourself
- Saving in-progress preview frames for video / animation workflows
- Debugging "why is this hanging?" — see exactly which node is stuck
Usage:
# Local — watch all jobs from this client_id
python3 ws_monitor.py
# Cloud — watch a specific prompt_id
python3 ws_monitor.py --host https://cloud.comfy.org \
--prompt-id abc-123-def
# Save preview frames to ./previews/
python3 ws_monitor.py --previews ./previews
Requires: websocket-client (`pip install websocket-client`).
Exits with a clear error message when it is not installed.
"""
from __future__ import annotations
import argparse
import json
import struct
import sys
from pathlib import Path
from urllib.parse import urlparse
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _common import ( # noqa: E402
DEFAULT_LOCAL_HOST, ENV_API_KEY, log, new_client_id, resolve_api_key, is_cloud_host,
)
# Binary frame types from ComfyUI WebSocket protocol
BINARY_PREVIEW_IMAGE = 1
BINARY_TEXT = 3
BINARY_PREVIEW_IMAGE_WITH_METADATA = 4
# Image type codes inside PREVIEW_IMAGE
IMAGE_TYPE_JPEG = 1
IMAGE_TYPE_PNG = 2
# ANSI escape codes (works on most modern terminals)
RESET = "\033[0m"
DIM = "\033[2m"
BOLD = "\033[1m"
GREEN = "\033[32m"
YELLOW = "\033[33m"
RED = "\033[31m"
CYAN = "\033[36m"
def fmt_color(s: str, color: str, *, color_on: bool = True) -> str:
return f"{color}{s}{RESET}" if color_on else s
def parse_binary_frame(data: bytes) -> dict | None:
if len(data) < 8:
return None
type_code = struct.unpack(">I", data[0:4])[0]
if type_code == BINARY_PREVIEW_IMAGE:
image_type = struct.unpack(">I", data[4:8])[0]
ext = "jpg" if image_type == IMAGE_TYPE_JPEG else "png" if image_type == IMAGE_TYPE_PNG else "bin"
return {
"kind": "preview",
"image_type": image_type,
"ext": ext,
"image_bytes": data[8:],
}
if type_code == BINARY_PREVIEW_IMAGE_WITH_METADATA:
if len(data) < 12:
return None
meta_len = struct.unpack(">I", data[4:8])[0]
meta_end = 8 + meta_len
if len(data) < meta_end:
return None
try:
meta = json.loads(data[8:meta_end].decode("utf-8"))
except Exception:
meta = {"raw": data[8:meta_end][:200].decode("utf-8", "replace")}
return {
"kind": "preview_with_metadata",
"metadata": meta,
"image_bytes": data[meta_end:],
"ext": "png",
}
if type_code == BINARY_TEXT:
if len(data) < 8:
return None
nid_len = struct.unpack(">I", data[4:8])[0]
nid_end = 8 + nid_len
if len(data) < nid_end:
return None
return {
"kind": "text",
"node_id": data[8:nid_end].decode("utf-8", "replace"),
"text": data[nid_end:].decode("utf-8", "replace"),
}
return {"kind": "unknown", "type_code": type_code, "size": len(data)}
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(description="Real-time ComfyUI WebSocket monitor")
p.add_argument("--host", default=DEFAULT_LOCAL_HOST, help="ComfyUI server URL")
p.add_argument("--api-key", help=f"API key for cloud (or set ${ENV_API_KEY} env var)")
p.add_argument("--client-id", default=None, help="Client ID (default: random UUID)")
p.add_argument("--prompt-id", default=None,
help="Filter to a specific prompt_id (default: all jobs)")
p.add_argument("--previews", default=None,
help="Directory to save in-progress preview frames")
p.add_argument("--no-color", action="store_true", help="Disable ANSI colour")
p.add_argument("--timeout", type=float, default=600.0,
help="Hard cap on monitor duration (default 600s)")
args = p.parse_args(argv)
try:
import websocket # type: ignore[import-not-found]
except ImportError:
print(json.dumps({
"error": "websocket-client not installed",
"install": "pip install websocket-client",
}))
return 1
api_key = resolve_api_key(args.api_key)
cloud = is_cloud_host(args.host)
client_id = args.client_id or new_client_id()
# Build WS URL preserving any base-path component (e.g. behind reverse proxy).
parsed = urlparse(args.host if "://" in args.host else f"http://{args.host}")
scheme = "wss" if parsed.scheme == "https" else "ws"
netloc = parsed.netloc
base_path = parsed.path.rstrip("/")
ws_url = f"{scheme}://{netloc}{base_path}/ws?clientId={client_id}"
if cloud and api_key:
ws_url += f"&token={api_key}"
color_on = not args.no_color and sys.stdout.isatty()
preview_dir = Path(args.previews).expanduser() if args.previews else None
if preview_dir:
preview_dir.mkdir(parents=True, exist_ok=True)
log(f"Saving previews to {preview_dir}")
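# Log the URL without its query string so the cloud token is never written to logs.
log(f"Connecting to {scheme}://{netloc}{base_path}/ws (client_id={client_id})")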
if args.prompt_id:
log(f"Filtering messages to prompt_id={args.prompt_id}")
ws = websocket.create_connection(ws_url, timeout=args.timeout)
ws.settimeout(args.timeout)
preview_counter = 0
try:
while True:
try:
msg = ws.recv()
except websocket.WebSocketTimeoutException:
log(f"Idle for {args.timeout}s — exiting")
return 0
if isinstance(msg, bytes):
parsed = parse_binary_frame(msg)
if parsed is None:
continue
if parsed["kind"] in ("preview", "preview_with_metadata") and preview_dir:
img_bytes = parsed.get("image_bytes", b"")
if img_bytes:
ext = parsed.get("ext", "png")
out = preview_dir / f"preview_{preview_counter:05d}.{ext}"
out.write_bytes(img_bytes)
preview_counter += 1
log(f" [preview] saved {out.name} ({len(img_bytes)} bytes)")
continue
try:
payload = json.loads(msg)
except Exception:
continue
mtype = payload.get("type", "")
mdata = payload.get("data", {}) or {}
pid = mdata.get("prompt_id")
if args.prompt_id and pid and pid != args.prompt_id:
continue
if mtype == "status":
qr = mdata.get("status", {}).get("exec_info", {}).get("queue_remaining", "?")
print(fmt_color(f"[status] queue_remaining={qr}", DIM, color_on=color_on))
elif mtype == "execution_start":
print(fmt_color(f"[start] prompt_id={pid}", BOLD, color_on=color_on))
elif mtype == "executing":
node = mdata.get("node")
if node:
print(fmt_color(f" [executing] node={node}", CYAN, color_on=color_on))
else:
print(fmt_color(f" [executing] (workflow done) prompt_id={pid}", DIM, color_on=color_on))
elif mtype == "progress":
v, m = mdata.get("value", 0), mdata.get("max", 0)
pct = (v / m * 100) if m else 0
print(f" [progress] {v}/{m} ({pct:5.1f}%) node={mdata.get('node')}")
elif mtype == "progress_state":
# Newer extended progress message
nodes = mdata.get("nodes") or {}
running = [k for k, v in nodes.items() if v.get("running")]
if running:
print(fmt_color(f" [progress_state] running={running}", DIM, color_on=color_on))
elif mtype == "executed":
node = mdata.get("node")
out = mdata.get("output") or {}
summary_parts = []
for key in ("images", "video", "videos", "gifs", "audio", "files"):
if out.get(key):
summary_parts.append(f"{key}={len(out[key])}")
summary = ", ".join(summary_parts) if summary_parts else "(no files)"
print(fmt_color(f" [executed] node={node} {summary}", GREEN, color_on=color_on))
elif mtype == "execution_cached":
cached = mdata.get("nodes") or []
if cached:
print(fmt_color(f" [cached] {len(cached)} nodes skipped", DIM, color_on=color_on))
elif mtype == "execution_success":
print(fmt_color(f"[success] prompt_id={pid}", GREEN + BOLD, color_on=color_on))
if args.prompt_id:
return 0
elif mtype == "execution_error":
exc_type = mdata.get("exception_type", "?")
exc_msg = mdata.get("exception_message", "?")
print(fmt_color(f"[error] {exc_type}: {exc_msg}", RED + BOLD, color_on=color_on))
tb = mdata.get("traceback")
if tb:
if isinstance(tb, list):
for line in tb:
print(fmt_color(f" {line}", RED, color_on=color_on))
else:
print(fmt_color(f" {tb}", RED, color_on=color_on))
if args.prompt_id:
return 1
elif mtype == "execution_interrupted":
print(fmt_color(f"[interrupted] prompt_id={pid}", YELLOW, color_on=color_on))
if args.prompt_id:
return 1
elif mtype == "notification":
v = mdata.get("value", "")
print(fmt_color(f"[notification] {v}", DIM, color_on=color_on))
else:
# Unknown / lightly-used types: print compactly
print(fmt_color(f"[{mtype}] {json.dumps(mdata, default=str)[:200]}", DIM, color_on=color_on))
except KeyboardInterrupt:
log("Interrupted")
return 130
finally:
try:
ws.close()
except Exception:
pass
if __name__ == "__main__":
    sys.exit(main())
@@ -0,0 +1,50 @@
# ComfyUI Skill Tests
Pytest suite covering the skill's scripts. Pure-stdlib unit tests run
without any setup; cloud integration tests need a Comfy Cloud API key.
## Running
```bash
# Unit tests only (no network required); cloud tests auto-skip when
# COMFY_CLOUD_API_KEY is unset. Runs in <1s.
python3 -m pytest tests/ -c tests/pytest.ini -o addopts="-p no:xdist"

# Including cloud integration tests
COMFY_CLOUD_API_KEY="comfyui-..." python3 -m pytest tests/ \
  -c tests/pytest.ini -o addopts="-p no:xdist"

# Just cloud tests
COMFY_CLOUD_API_KEY="comfyui-..." python3 -m pytest tests/test_cloud_integration.py \
  -c tests/pytest.ini -o addopts="-p no:xdist" -v
```
The `-c` and `-o` overrides isolate this suite from any parent
`pyproject.toml` pytest config (e.g. the `-n auto` from a parent repo).
## Test files
| File | Coverage |
|------|----------|
| `test_common.py` | Cloud detection, URL routing, format validation, embeddings, paths, seeds, model-list parsing, folder aliases |
| `test_extract_schema.py` | Connection tracing, positive/negative prompt detection, dedup logic, embedding deps |
| `test_run_workflow.py` | Param injection (incl. -1 seed, link refusal), output download walk, runner construction |
| `test_check_deps.py` | Model-name fuzzy matching, install command suggestions |
| `test_cloud_integration.py` | Live cloud API contract tests (auto-skipped without API key) |
## Adding tests
When you change a script:
1. Add a unit test if the change is pure logic (cloud detection, parsing, etc.).
2. Add a cloud integration test if the change depends on cloud API behavior
   (use `pytestmark = pytest.mark.cloud` so it auto-skips without a key).
3. Reuse the workflow fixtures in `conftest.py` (`sd15_workflow`, `flux_workflow`,
   `video_workflow`) where they fit.
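
As a rough illustration of the first point, a pure-logic unit test in this suite's style might look like the sketch below. `is_cloud_host_sketch` is a hypothetical stand-in written for this example, not the skill's actual `is_cloud_host` from `_common.py`; it only mirrors the behavior that `test_common.py` asserts (exact host, subdomains, local hosts, scheme-less input).

```python
from urllib.parse import urlparse


def is_cloud_host_sketch(host: str) -> bool:
    """Hypothetical stand-in for `_common.is_cloud_host` (an assumption, not the real code)."""
    # Bare hosts such as "cloud.comfy.org" carry no scheme; add one so that
    # urlparse() fills in .hostname instead of treating the text as a path.
    if "://" not in host:
        host = "http://" + host
    hostname = (urlparse(host).hostname or "").lower()
    return hostname == "cloud.comfy.org" or hostname.endswith(".cloud.comfy.org")


class TestCloudHostSketch:
    # Plain asserts and no fixtures: nothing here touches the network.
    def test_cloud_and_subdomain(self):
        assert is_cloud_host_sketch("https://cloud.comfy.org/foo") is True
        assert is_cloud_host_sketch("https://staging.cloud.comfy.org") is True

    def test_local_and_bare_host(self):
        assert is_cloud_host_sketch("http://127.0.0.1:8188") is False
        assert is_cloud_host_sketch("cloud.comfy.org") is True
        assert is_cloud_host_sketch("127.0.0.1:8188") is False
```

In the real suite the test class would import the function under test from `_common` instead of defining a stand-in next to it.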
## Why the explicit `-c` / `-o`?
The parent hermes-agent repo's `pyproject.toml` enables `pytest-xdist` by
default (`-n auto`). This suite is small enough that parallelism isn't
worth the complexity, and pytest-xdist isn't always installed in the user's
environment. The `-c tests/pytest.ini -o addopts="-p no:xdist"` flags make
the suite run identically regardless of the parent project's config.
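
If retyping the isolation flags gets tedious, a small wrapper could assemble the same command. The sketch below is an assumption made for illustration: the helper name `build_pytest_cmd` and the idea of a wrapper are not part of the skill, and the code only reproduces the flags documented in this README.

```python
from __future__ import annotations

import shlex
import sys


def build_pytest_cmd(target: str = "tests/", verbose: bool = False) -> list[str]:
    """Assemble the pytest invocation described in this README (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "pytest", target,
        "-c", "tests/pytest.ini",     # use the suite's own ini file, not the parent config
        "-o", "addopts=-p no:xdist",  # override addopts so pytest-xdist is never loaded
    ]
    if verbose:
        cmd.append("-v")
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(build_pytest_cmd()))
```

Passing the list to `subprocess.run` from the skill's root directory would then run the suite with the same isolation as the commands in the Running section.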
@@ -0,0 +1,64 @@
"""Pytest configuration for the comfyui skill test suite.

Adds `scripts/` to sys.path so tests can `from _common import ...`, and
provides a few common fixtures.
"""
from __future__ import annotations

import json
import os
import sys
from pathlib import Path

import pytest

ROOT = Path(__file__).resolve().parent.parent
SCRIPTS = ROOT / "scripts"
WORKFLOWS = ROOT / "workflows"

sys.path.insert(0, str(SCRIPTS))


@pytest.fixture
def sd15_workflow() -> dict:
    return json.loads((WORKFLOWS / "sd15_txt2img.json").read_text())


@pytest.fixture
def flux_workflow() -> dict:
    return json.loads((WORKFLOWS / "flux_dev_txt2img.json").read_text())


@pytest.fixture
def video_workflow() -> dict:
    return json.loads((WORKFLOWS / "wan_video_t2v.json").read_text())


@pytest.fixture
def workflows_dir() -> Path:
    return WORKFLOWS


@pytest.fixture
def scripts_dir() -> Path:
    return SCRIPTS


@pytest.fixture
def cloud_key() -> str | None:
    """Cloud API key if set, otherwise None.

    Tests that need cloud connectivity should skip when this is None.
    """
    return os.environ.get("COMFY_CLOUD_API_KEY")


def pytest_collection_modifyitems(config, items):
    """Auto-skip cloud tests when no API key is set."""
    if os.environ.get("COMFY_CLOUD_API_KEY"):
        return
    skip_cloud = pytest.mark.skip(reason="Set COMFY_CLOUD_API_KEY to run cloud tests")
    for item in items:
        if "cloud" in item.keywords:
            item.add_marker(skip_cloud)
@@ -0,0 +1,5 @@
[pytest]
markers =
    cloud: tests that hit live Comfy Cloud API (require COMFY_CLOUD_API_KEY)
testpaths = .
addopts = -p no:xdist
@@ -0,0 +1,68 @@
"""Tests for check_deps.py — focuses on parsing logic that doesn't need a server."""
from __future__ import annotations

from check_deps import (
    NODE_TO_PACKAGE,
    model_present,
    normalize_for_match,
    suggest_install_command,
)


class TestNormalizeForMatch:
    def test_basic(self):
        s = normalize_for_match("model.safetensors")
        assert "model.safetensors" in s
        assert "model" in s

    def test_subfolder(self):
        s = normalize_for_match("subdir/model.pt")
        assert "subdir/model.pt" in s
        assert "model.pt" in s
        assert "model" in s


class TestModelPresent:
    def test_exact_match(self):
        assert model_present("a.safetensors", {"a.safetensors", "b.safetensors"}) is True

    def test_extension_difference(self):
        # User said "model" but installed is "model.safetensors"
        assert model_present("model", {"model.safetensors"}) is True
        # Reverse direction also matches
        assert model_present("model.safetensors", {"model"}) is True

    def test_subfolder_match(self):
        # Installed list has "subdir/model.safetensors", workflow asks "model.safetensors"
        assert model_present("model.safetensors", {"subdir/model.safetensors"}) is True

    def test_missing(self):
        assert model_present("missing.safetensors", {"a.safetensors", "b.safetensors"}) is False

    def test_empty_installed(self):
        assert model_present("anything.safetensors", set()) is False


class TestSuggestInstallCommand:
    def test_known_node(self):
        cmd = suggest_install_command("VHS_VideoCombine")
        assert cmd == "comfy node install comfyui-videohelpersuite"

    def test_unknown_node(self):
        assert suggest_install_command("SomeRandomNodeName123") is None


class TestNodePackageMap:
    def test_no_duplicates(self):
        # Each node should map to exactly one package. Dict keys are unique by
        # construction, so this is a sanity check rather than a strong guarantee.
        keys = list(NODE_TO_PACKAGE.keys())
        assert len(keys) == len(set(keys))

    def test_packages_are_safe_for_shell(self):
        # Registry slugs must start with an alphanumeric and contain only
        # alphanumerics, dots, hyphens, and underscores
        # (they are passed straight to `comfy node install <pkg>`).
        import re
        safe = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._\-]*$")
        for pkg in NODE_TO_PACKAGE.values():
            assert safe.match(pkg), f"Unsafe package slug: {pkg!r}"
@@ -0,0 +1,95 @@
"""Integration tests against the live Comfy Cloud API.

These tests are auto-skipped when COMFY_CLOUD_API_KEY is not set.
They never SUBMIT workflows (would need a paid subscription); they only
verify the read-only endpoints we rely on.
"""
from __future__ import annotations

import pytest

from _common import http_get, parse_model_list, resolve_url

pytestmark = pytest.mark.cloud


class TestCloudEndpointsLive:
    def test_system_stats_reachable(self, cloud_key):
        url = resolve_url("https://cloud.comfy.org", "/system_stats")
        r = http_get(url, headers={"X-API-Key": cloud_key})
        assert r.status == 200
        data = r.json()
        assert "system" in data

    def test_models_endpoint_routed_to_experiment(self, cloud_key):
        # We expect the skill to route /models/checkpoints → /api/experiment/models/checkpoints
        url = resolve_url("https://cloud.comfy.org", "/models/checkpoints")
        assert "/api/experiment/models/checkpoints" in url
        r = http_get(url, headers={"X-API-Key": cloud_key})
        assert r.status == 200

    def test_models_endpoint_returns_dicts(self, cloud_key):
        url = resolve_url("https://cloud.comfy.org", "/models/checkpoints")
        r = http_get(url, headers={"X-API-Key": cloud_key})
        data = r.json()
        assert isinstance(data, list)
        if data:
            # Cloud format: list of dicts with `name`
            assert isinstance(data[0], dict)
            assert "name" in data[0]
            # Our parser normalizes both formats (plain strings and dicts)
            normalized = parse_model_list(data)
            assert len(normalized) == len(data)

    def test_history_renamed_to_v2(self, cloud_key):
        # /history → /api/history_v2 on cloud
        url = resolve_url("https://cloud.comfy.org", "/history/some-fake-id")
        assert "/api/history_v2/some-fake-id" in url

    def test_object_info_paid_tier(self, cloud_key):
        # On free tier, /object_info returns 403 with a recognizable message
        url = resolve_url("https://cloud.comfy.org", "/object_info")
        r = http_get(url, headers={"X-API-Key": cloud_key})
        # Should be either 200 (paid) or 403 (free), not 404 / 500
        assert r.status in (200, 403)
        if r.status == 403:
            # Body should mention the limitation
            assert "free tier" in r.text().lower() or "subscription" in r.text().lower()


class TestCloudCheckDepsLive:
    def test_check_deps_against_cloud(self, cloud_key, sd15_workflow):
        from check_deps import check_deps
        report = check_deps(sd15_workflow, host="https://cloud.comfy.org", api_key=cloud_key)
        # Either node check passed OR was skipped (free tier)
        assert "missing_models" in report
        assert "is_cloud" in report and report["is_cloud"] is True

    def test_flux_workflow_models_resolved_via_aliases(self, cloud_key, flux_workflow):
        """Flux uses unet/clip folders; cloud has them in diffusion_models/text_encoders.
        With folder aliasing, the check should still find them."""
        from check_deps import check_deps
        report = check_deps(flux_workflow, host="https://cloud.comfy.org", api_key=cloud_key)
        # The exact required Flux files (flux1-dev.safetensors, t5xxl_fp16, clip_l, ae)
        # are present on cloud; with folder aliasing, none should be missing.
        # If this fails, either the cloud removed the model or the aliasing logic broke.
        missing_filenames = {m["value"] for m in report["missing_models"]}
        assert "ae.safetensors" not in missing_filenames, \
            "ae.safetensors should be on cloud's vae folder"
        # t5xxl_fp16 / clip_l should be reachable via the clip → text_encoders alias
        # flux1-dev.safetensors likewise via unet → diffusion_models


class TestHealthCheckLive:
    def test_health_check_passes(self, cloud_key, capsys):
        from health_check import main as health_main
        rc = health_main(["--host", "https://cloud.comfy.org", "--api-key", cloud_key])
        captured = capsys.readouterr()
        # Should produce JSON
        import json
        report = json.loads(captured.out)
        assert report["server"]["reachable"] is True
        assert report["checkpoints"]["queryable"] is True
        assert report["checkpoints"]["count"] > 0
@@ -0,0 +1,447 @@
"""Unit tests for _common.py — pure logic only, no network."""
from __future__ import annotations

from pathlib import Path

import pytest

from _common import (
    DEFAULT_LOCAL_HOST,
    EMBEDDING_REGEX,
    FOLDER_ALIASES,
    build_cloud_aware_url,
    cloud_endpoint,
    coerce_seed,
    folder_aliases_for,
    is_api_format,
    is_cloud_host,
    is_link,
    iter_embedding_refs,
    iter_model_deps,
    iter_nodes,
    looks_like_video_workflow,
    media_type_from_filename,
    parse_model_list,
    resolve_url,
    safe_path_join,
    unwrap_workflow,
)


# =============================================================================
# Cloud detection / URL routing
# =============================================================================

class TestCloudDetection:
    def test_cloud_host_exact(self):
        assert is_cloud_host("https://cloud.comfy.org") is True
        assert is_cloud_host("https://cloud.comfy.org/foo/bar") is True

    def test_cloud_host_subdomain(self):
        assert is_cloud_host("https://staging.cloud.comfy.org") is True
        assert is_cloud_host("https://api.cloud.comfy.org") is True

    def test_local_not_cloud(self):
        assert is_cloud_host("http://127.0.0.1:8188") is False
        assert is_cloud_host("http://localhost:8188") is False
        assert is_cloud_host("http://my-server.local:8188") is False

    def test_no_scheme(self):
        # Defaults to http://
        assert is_cloud_host("cloud.comfy.org") is True
        assert is_cloud_host("127.0.0.1:8188") is False


class TestCloudEndpointRename:
    def test_history_renamed(self):
        assert cloud_endpoint("/history") == "/history_v2"
        assert cloud_endpoint("/history/abc-123") == "/history_v2/abc-123"

    def test_history_v2_preserved(self):
        assert cloud_endpoint("/history_v2") == "/history_v2"

    def test_models_renamed(self):
        assert cloud_endpoint("/models") == "/experiment/models"
        assert cloud_endpoint("/models/checkpoints") == "/experiment/models/checkpoints"
        assert cloud_endpoint("/models/loras") == "/experiment/models/loras"

    def test_other_paths_unchanged(self):
        assert cloud_endpoint("/prompt") == "/prompt"
        assert cloud_endpoint("/queue") == "/queue"


class TestResolveURL:
    def test_local_no_prefix(self):
        assert resolve_url("http://127.0.0.1:8188", "/prompt") == "http://127.0.0.1:8188/prompt"

    def test_cloud_adds_api_prefix(self):
        assert resolve_url("https://cloud.comfy.org", "/prompt") == "https://cloud.comfy.org/api/prompt"

    def test_cloud_history_renamed(self):
        assert resolve_url("https://cloud.comfy.org", "/history/abc") == "https://cloud.comfy.org/api/history_v2/abc"

    def test_cloud_models_renamed(self):
        assert resolve_url("https://cloud.comfy.org", "/models/loras") == "https://cloud.comfy.org/api/experiment/models/loras"

    def test_cloud_already_has_api(self):
        # Don't double-prefix
        assert resolve_url("https://cloud.comfy.org", "/api/prompt") == "https://cloud.comfy.org/api/prompt"

    def test_trailing_slash_stripped(self):
        assert resolve_url("http://127.0.0.1:8188/", "/prompt") == "http://127.0.0.1:8188/prompt"


# =============================================================================
# Workflow validation
# =============================================================================

class TestAPIFormatDetection:
    def test_valid_api(self, sd15_workflow):
        assert is_api_format(sd15_workflow) is True

    def test_editor_format_rejected(self):
        editor = {"nodes": [], "links": [], "version": 0.4}
        assert is_api_format(editor) is False

    def test_empty_dict(self):
        assert is_api_format({}) is False

    def test_non_dict(self):
        assert is_api_format([]) is False
        assert is_api_format(None) is False
        assert is_api_format("string") is False

    def test_node_with_class_type(self):
        wf = {"3": {"class_type": "KSampler", "inputs": {}}}
        assert is_api_format(wf) is True


class TestUnwrapWorkflow:
    def test_passthrough_api_format(self, sd15_workflow):
        result = unwrap_workflow(sd15_workflow)
        assert result is sd15_workflow

    def test_unwrap_prompt_key(self, sd15_workflow):
        wrapped = {"prompt": sd15_workflow, "client_id": "abc"}
        result = unwrap_workflow(wrapped)
        assert result is sd15_workflow

    def test_editor_format_raises(self):
        with pytest.raises(ValueError, match="editor format"):
            unwrap_workflow({"nodes": [], "links": []})

    def test_garbage_raises(self):
        with pytest.raises(ValueError):
            unwrap_workflow({"foo": "bar"})


class TestIsLink:
    def test_valid_link(self):
        assert is_link(["3", 0]) is True
        assert is_link(["10", 1]) is True

    def test_non_link(self):
        assert is_link("string") is False
        assert is_link(42) is False
        assert is_link([]) is False
        assert is_link(["3"]) is False  # missing slot
        assert is_link(["3", "0"]) is False  # slot must be int
        assert is_link([3, 0]) is False  # node_id must be string


# =============================================================================
# Workflow iterators
# =============================================================================

class TestIterators:
    def test_iter_nodes(self, sd15_workflow):
        nodes = dict(iter_nodes(sd15_workflow))
        assert "3" in nodes
        assert nodes["3"]["class_type"] == "KSampler"

    def test_iter_nodes_skips_comments(self, sd15_workflow):
        # _comment is not a node
        nodes = dict(iter_nodes(sd15_workflow))
        assert "_comment" not in nodes

    def test_iter_model_deps(self, sd15_workflow):
        deps = list(iter_model_deps(sd15_workflow))
        names = [d["value"] for d in deps]
        assert "v1-5-pruned-emaonly.safetensors" in names

    def test_iter_model_deps_flux(self, flux_workflow):
        deps = list(iter_model_deps(flux_workflow))
        names = {d["value"]: d["folder"] for d in deps}
        assert names["flux1-dev.safetensors"] == "unet"
        assert names["t5xxl_fp16.safetensors"] == "clip"
        assert names["clip_l.safetensors"] == "clip"
        assert names["ae.safetensors"] == "vae"


# =============================================================================
# Embedding extraction
# =============================================================================

class TestEmbeddingRegex:
    def test_basic_embedding(self):
        m = EMBEDDING_REGEX.search("a cat, embedding:goodvibes, more text")
        assert m is not None
        assert m.group(1) == "goodvibes"

    def test_embedding_with_strength(self):
        m = EMBEDDING_REGEX.search("embedding:bad-hands-5:1.2")
        assert m is not None
        assert m.group(1) == "bad-hands-5"

    def test_embedding_with_extension(self):
        # Strips .pt / .safetensors / .bin
        m = EMBEDDING_REGEX.search("embedding:my-emb.pt")
        assert m is not None
        assert m.group(1) == "my-emb"

    def test_embedding_in_parens(self):
        m = EMBEDDING_REGEX.search("(embedding:foo:0.8)")
        assert m is not None
        assert m.group(1) == "foo"

    def test_multiple_in_one_string(self):
        text = "a cat, embedding:foo:1.2, and embedding:bar"
        matches = [m.group(1) for m in EMBEDDING_REGEX.finditer(text)]
        assert matches == ["foo", "bar"]

    def test_no_false_positive_on_word_embedding(self):
        # "embedding " (with space, no colon) should not match
        m = EMBEDDING_REGEX.search("the embedding is great")
        assert m is None


class TestIterEmbeddingRefs:
    def test_finds_in_clip_text_encode(self):
        wf = {
            "1": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": "embedding:foo, embedding:bar:0.5", "clip": ["2", 0]}},
            "2": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "x"}},
        }
        refs = list(iter_embedding_refs(wf))
        names = [name for _, name in refs]
        assert names == ["foo", "bar"]

    def test_ignores_non_prompt_fields(self):
        wf = {
            "1": {"class_type": "CheckpointLoaderSimple",
                  "inputs": {"ckpt_name": "embedding:foo.safetensors"}},
        }
        refs = list(iter_embedding_refs(wf))
        # ckpt_name is not a prompt field, so it is ignored
        assert refs == []


# =============================================================================
# Path safety
# =============================================================================

class TestSafePathJoin:
    def test_normal_join(self, tmp_path):
        p = safe_path_join(tmp_path, "subdir", "file.png")
        assert p.is_relative_to(tmp_path)

    def test_blocks_traversal(self, tmp_path):
        with pytest.raises(ValueError, match="path traversal"):
            safe_path_join(tmp_path, "..", "..", "etc", "passwd")

    def test_blocks_absolute(self, tmp_path):
        with pytest.raises(ValueError):
            safe_path_join(tmp_path, "/etc/passwd")

    def test_subfolder_with_filename(self, tmp_path):
        p = safe_path_join(tmp_path, "outputs", "img.png")
        assert p.name == "img.png"
        assert p.parent.name == "outputs"


# =============================================================================
# Seed coercion
# =============================================================================

class TestCoerceSeed:
    def test_explicit_int(self):
        assert coerce_seed(42) == 42
        assert coerce_seed(0) == 0

    def test_minus_one_randomizes(self):
        s = coerce_seed(-1)
        assert isinstance(s, int)
        assert 0 <= s < 2**63

    def test_none_randomizes(self):
        s = coerce_seed(None)
        assert isinstance(s, int)

    def test_string_int(self):
        # A string that converts cleanly to an int is accepted (relaxed typing)
        assert coerce_seed("12345") == 12345

    def test_string_minus_one_randomizes(self):
        # CLI / JSON sometimes carries seed as a string.
        s = coerce_seed("-1")
        assert isinstance(s, int)
        assert 0 <= s < 2**63
        # Surrounding whitespace is tolerated as well
        s2 = coerce_seed(" -1 ")
        assert isinstance(s2, int)
        assert 0 <= s2 < 2**63


# =============================================================================
# Model list normalization (cloud format)
# =============================================================================

class TestParseModelList:
    def test_local_format_strings(self):
        result = parse_model_list(["a.safetensors", "b.safetensors"])
        assert result == {"a.safetensors", "b.safetensors"}

    def test_cloud_format_dicts(self):
        result = parse_model_list([
            {"name": "a.safetensors", "pathIndex": 0},
            {"name": "b.safetensors", "pathIndex": 1},
        ])
        assert result == {"a.safetensors", "b.safetensors"}

    def test_empty(self):
        assert parse_model_list([]) == set()

    def test_garbage(self):
        assert parse_model_list("not a list") == set()
        assert parse_model_list(None) == set()

    def test_mixed_format(self):
        result = parse_model_list([
            "string-form.safetensors",
            {"name": "dict-form.safetensors"},
        ])
        assert result == {"string-form.safetensors", "dict-form.safetensors"}


# =============================================================================
# Folder aliases
# =============================================================================

class TestFolderAliases:
    def test_unet_aliases_diffusion_models(self):
        aliases = folder_aliases_for("unet")
        assert "unet" in aliases
        assert "diffusion_models" in aliases

    def test_clip_aliases_text_encoders(self):
        aliases = folder_aliases_for("clip")
        assert "clip" in aliases
        assert "text_encoders" in aliases

    def test_unknown_folder_returns_self(self):
        assert folder_aliases_for("checkpoints") == ["checkpoints"]

    def test_primary_first(self):
        # Order matters: the requested folder (the primary) should come first
        # for human-friendly fix hints
        assert folder_aliases_for("unet")[0] == "unet"
        assert folder_aliases_for("diffusion_models")[0] == "diffusion_models"


# =============================================================================
# Media-type detection
# =============================================================================

class TestMediaType:
    def test_video_extensions(self):
        assert media_type_from_filename("vid.mp4") == "video"
        assert media_type_from_filename("foo.webm") == "video"
        assert media_type_from_filename("bar.gif") == "video"

    def test_audio_extensions(self):
        assert media_type_from_filename("song.wav") == "audio"
        assert media_type_from_filename("music.mp3") == "audio"

    def test_image_default(self):
        assert media_type_from_filename("pic.png") == "image"
        assert media_type_from_filename("image.jpg") == "image"
        assert media_type_from_filename("unknown.xyz") == "image"

    def test_3d(self):
        assert media_type_from_filename("model.glb") == "3d"
        assert media_type_from_filename("scene.gltf") == "3d"


# =============================================================================
# Cross-host header stripping (security)
# =============================================================================

class TestRedirectHeaderStripping:
    """Verify X-API-Key is dropped when redirect crosses to a different host
    (e.g. cloud /api/view → S3 signed URL). Critical to prevent leaking auth
    tokens to the storage backend.
    """

    def _build_session(self):
        from _common import _StripSensitiveOnRedirectSession, HAS_REQUESTS
        if not HAS_REQUESTS:
            # pytest is already imported at module level
            pytest.skip("requests not installed")
        return _StripSensitiveOnRedirectSession()

    def test_strips_x_api_key_cross_host(self):
        import requests
        s = self._build_session()
        prep = requests.PreparedRequest()
        prep.prepare(method="GET", url="https://other.example.com/file",
                     headers={"X-API-Key": "leak", "Authorization": "Bearer x"})
        resp = requests.Response()
        orig = requests.PreparedRequest()
        orig.prepare(method="GET", url="https://cloud.comfy.org/api/view", headers={})
        resp.request = orig
        s.rebuild_auth(prep, resp)
        assert "X-API-Key" not in prep.headers
        assert "Authorization" not in prep.headers

    def test_preserves_x_api_key_same_host(self):
        import requests
        s = self._build_session()
        prep = requests.PreparedRequest()
        prep.prepare(method="GET", url="https://cloud.comfy.org/foo",
                     headers={"X-API-Key": "keep"})
        resp = requests.Response()
        orig = requests.PreparedRequest()
        orig.prepare(method="GET", url="https://cloud.comfy.org/bar", headers={})
        resp.request = orig
        s.rebuild_auth(prep, resp)
        assert prep.headers.get("X-API-Key") == "keep"

    def test_strips_cookie_cross_host(self):
        import requests
        s = self._build_session()
        prep = requests.PreparedRequest()
        prep.prepare(method="GET", url="https://other.example.com/x",
                     headers={"Cookie": "session=secret"})
        resp = requests.Response()
        orig = requests.PreparedRequest()
        orig.prepare(method="GET", url="https://cloud.comfy.org/foo", headers={})
        resp.request = orig
        s.rebuild_auth(prep, resp)
        assert "Cookie" not in prep.headers


# =============================================================================
# Video workflow detection
# =============================================================================

class TestVideoWorkflow:
    def test_image_workflow(self, sd15_workflow):
        assert looks_like_video_workflow(sd15_workflow) is False

    def test_animatediff_workflow(self, workflows_dir):
        import json
        wf = json.loads((workflows_dir / "animatediff_video.json").read_text())
        assert looks_like_video_workflow(wf) is True

    def test_wan_workflow(self, video_workflow):
        assert looks_like_video_workflow(video_workflow) is True
@@ -0,0 +1,185 @@
"""Tests for extract_schema.py."""
from __future__ import annotations

import pytest

from extract_schema import (
    extract_schema,
    find_negative_prompt_node,
    find_positive_prompt_node,
    trace_to_node,
)


# =============================================================================
# Connection tracing
# =============================================================================

class TestConnectionTracing:
    def test_direct_link(self):
        wf = {
            "1": {"class_type": "CLIPTextEncode", "inputs": {"text": "x"}},
            "2": {"class_type": "KSampler",
                  "inputs": {"positive": ["1", 0], "negative": ["1", 0]}},
        }
        assert trace_to_node(wf, ["1", 0]) == "1"

    def test_through_reroute(self):
        wf = {
            "1": {"class_type": "CLIPTextEncode", "inputs": {"text": "x"}},
            "2": {"class_type": "Reroute", "inputs": {"input": ["1", 0]}},
            "3": {"class_type": "Reroute", "inputs": {"input": ["2", 0]}},
        }
        assert trace_to_node(wf, ["3", 0]) == "1"

    def test_circular_safe(self):
        wf = {
            "1": {"class_type": "Reroute", "inputs": {"input": ["2", 0]}},
            "2": {"class_type": "Reroute", "inputs": {"input": ["1", 0]}},
        }
        # Should hit max_hops without infinite loop
        result = trace_to_node(wf, ["1", 0], max_hops=5)
        assert result in ("1", "2")  # any node, just don't hang


class TestPositiveNegativeDetection:
    def test_basic(self, sd15_workflow):
        # In sd15_txt2img.json (the sd15_workflow fixture) node 6 is positive, node 7 is negative
        assert find_positive_prompt_node(sd15_workflow) == "6"
        assert find_negative_prompt_node(sd15_workflow) == "7"

    def test_swapped_order(self):
        wf = {
            "3": {"class_type": "KSampler",
                  "inputs": {
                      "positive": ["7", 0], "negative": ["6", 0],
                      "model": ["4", 0], "latent_image": ["5", 0],
                      "seed": 1, "steps": 20, "cfg": 7.5,
                      "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                  }},
            "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "x"}},
            "5": {"class_type": "EmptyLatentImage", "inputs": {"width": 512, "height": 512, "batch_size": 1}},
            "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "ugly", "clip": ["4", 1]}},
            "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "beautiful", "clip": ["4", 1]}},
        }
        # Now 7 is the positive (despite higher node ID)
        assert find_positive_prompt_node(wf) == "7"
        assert find_negative_prompt_node(wf) == "6"


# =============================================================================
# Schema extraction
# =============================================================================

class TestExtractSchema:
    def test_basic_sd15(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        params = schema["parameters"]
        assert "prompt" in params
        assert "negative_prompt" in params
        assert "seed" in params
        assert "steps" in params
        assert "cfg" in params
        assert "width" in params
        assert "height" in params

    def test_prompt_value_correct(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        # The positive prompt in the example is the landscape one
        assert "landscape" in schema["parameters"]["prompt"]["value"]
        assert "ugly" in schema["parameters"]["negative_prompt"]["value"]

    def test_model_dependencies(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        deps = schema["model_dependencies"]
        ckpts = [d["value"] for d in deps if d["folder"] == "checkpoints"]
        assert "v1-5-pruned-emaonly.safetensors" in ckpts

    def test_output_nodes(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        assert "9" in schema["output_nodes"]

    def test_summary(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        s = schema["summary"]
        assert s["has_negative_prompt"] is True
        assert s["has_seed"] is True
        assert s["is_video_workflow"] is False
        assert s["parameter_count"] > 5

    def test_flux_workflow(self, flux_workflow):
        schema = extract_schema(flux_workflow)
        # Flux uses RandomNoise for seed
        assert schema["summary"]["has_seed"] is True
        # Flux has only positive prompt (no negative encoder)
        assert schema["summary"]["has_negative_prompt"] is False

    def test_video_detected(self, video_workflow):
        schema = extract_schema(video_workflow)
        assert schema["summary"]["is_video_workflow"] is True


class TestEmbeddingDeps:
    def test_extract_from_prompt(self):
        wf = {
            "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "x"}},
            "5": {"class_type": "EmptyLatentImage",
                  "inputs": {"width": 512, "height": 512, "batch_size": 1}},
            "6": {"class_type": "CLIPTextEncode",
                  "inputs": {
                      "text": "a cat, embedding:goodvibes, embedding:art:1.2",
                      "clip": ["1", 1]
                  }},
            "7": {"class_type": "CLIPTextEncode",
                  "inputs": {
                      "text": "ugly, embedding:badhands",
                      "clip": ["1", 1]
                  }},
            "3": {"class_type": "KSampler",
                  "inputs": {
                      "positive": ["6", 0], "negative": ["7", 0],
                      "model": ["1", 0], "latent_image": ["5", 0],
                      "seed": 1, "steps": 20, "cfg": 7.5,
                      "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                  }},
            "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "x", "images": ["3", 0]}},
        }
        schema = extract_schema(wf)
        names = [d["embedding_name"] for d in schema["embedding_dependencies"]]
        assert sorted(names) == ["art", "badhands", "goodvibes"]


class TestDuplicateDeduplication:
    def test_two_ksamplers_get_unique_names(self):
        wf = {
            "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "x"}},
            "5": {"class_type": "EmptyLatentImage",
                  "inputs": {"width": 512, "height": 512, "batch_size": 1}},
            "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a", "clip": ["1", 1]}},
            "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "b", "clip": ["1", 1]}},
            "3": {"class_type": "KSampler",
                  "inputs": {
                      "positive": ["6", 0], "negative": ["7", 0],
                      "model": ["1", 0], "latent_image": ["5", 0],
                      "seed": 42, "steps": 20, "cfg": 7.5,
                      "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                  }},
            "4": {"class_type": "KSampler",
                  "inputs": {
                      "positive": ["6", 0], "negative": ["7", 0],
                      "model": ["1", 0], "latent_image": ["5", 0],
                      "seed": 99, "steps": 30, "cfg": 8.0,
                      "sampler_name": "euler", "scheduler": "normal", "denoise": 0.6,
                  }},
            "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "x", "images": ["3", 0]}},
        }
        schema = extract_schema(wf)
        params = schema["parameters"]
        # Both seeds are present under disambiguated names.
        # Symmetric: both are renamed, so there is no bare "seed" key.
        assert "seed" not in params
        assert "seed_3" in params and "seed_4" in params
        assert params["seed_3"]["value"] == 42
        assert params["seed_4"]["value"] == 99
@@ -0,0 +1,213 @@
"""Tests for run_workflow.py — focuses on logic that doesn't require a server."""
from __future__ import annotations

import copy
import json

import pytest

from extract_schema import extract_schema
from run_workflow import (
    ComfyRunner,
    download_outputs,
    inject_params,
    parse_input_image_arg,
)


class TestParseInputImageArg:
    def test_with_name(self, tmp_path):
        f = tmp_path / "x.png"
        f.write_text("x")
        n, p = parse_input_image_arg(f"image={f}")
        assert n == "image"
        assert p == f

    def test_without_name_defaults(self, tmp_path):
        f = tmp_path / "x.png"
        f.write_text("x")
        n, p = parse_input_image_arg(str(f))
        assert n == "image"

    def test_custom_name(self, tmp_path):
        f = tmp_path / "x.png"
        f.write_text("x")
        n, p = parse_input_image_arg(f"mask_image={f}")
        assert n == "mask_image"


class TestInjectParams:
    def test_basic_injection(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        wf, warnings = inject_params(sd15_workflow, schema, {
            "prompt": "new prompt",
            "seed": 999,
            "steps": 25,
        })
        assert wf["6"]["inputs"]["text"] == "new prompt"
        assert wf["3"]["inputs"]["seed"] == 999
        assert wf["3"]["inputs"]["steps"] == 25
        assert warnings == []

    def test_unknown_param_warns(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        _, warnings = inject_params(sd15_workflow, schema, {"foobar": "x"})
        assert any("foobar" in w for w in warnings)

    def test_seed_minus_one_randomizes(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        wf, warnings = inject_params(sd15_workflow, schema, {"seed": -1})
        assert wf["3"]["inputs"]["seed"] != -1
        assert isinstance(wf["3"]["inputs"]["seed"], int)
        assert any("expanded" in w.lower() for w in warnings)

    def test_randomize_seed_when_unset(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        original = sd15_workflow["3"]["inputs"]["seed"]
        wf, warnings = inject_params(sd15_workflow, schema, {}, randomize_seed_if_unset=True)
        assert wf["3"]["inputs"]["seed"] != original
        assert isinstance(wf["3"]["inputs"]["seed"], int)

    def test_does_not_mutate_original(self, sd15_workflow):
        schema = extract_schema(sd15_workflow)
        original_text = sd15_workflow["6"]["inputs"]["text"]
        inject_params(sd15_workflow, schema, {"prompt": "MUTATED"})
        assert sd15_workflow["6"]["inputs"]["text"] == original_text

    def test_refuses_to_overwrite_link(self):
        wf = {
            "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "x"}},
            "5": {"class_type": "EmptyLatentImage",
                  "inputs": {"width": 512, "height": 512, "batch_size": 1}},
            "6": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": ["3", 0], "clip": ["1", 1]}},  # text is a link!
            "3": {"class_type": "KSampler",
                  "inputs": {"seed": 1, "steps": 20, "cfg": 7.5,
                             "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                             "model": ["1", 0], "positive": ["6", 0], "negative": ["6", 0],
                             "latent_image": ["5", 0]}},
            "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "x", "images": ["3", 0]}},
        }
        # Manually create a schema that has prompt pointing at 6.text
        schema = {
            "parameters": {
                "prompt": {"node_id": "6", "field": "text", "type": "string", "value": ""},
            }
        }
        wf2, warnings = inject_params(wf, schema, {"prompt": "literal value"})
        # The link should NOT have been overwritten
        assert wf2["6"]["inputs"]["text"] == ["3", 0]
        assert any("link" in w.lower() for w in warnings)
# =============================================================================
# Output download walk
# =============================================================================
class TestDownloadOutputsWalk:
"""Test that download_outputs walks the structure correctly."""
def test_handles_videos_plural(self, tmp_path, monkeypatch):
"""Local ComfyUI uses 'videos'/'gifs' (plural) keys."""
downloads = []
class FakeRunner:
def download_output(self, *, filename, subfolder, file_type, output_dir, preserve_subfolder, overwrite):
downloads.append((filename, subfolder, file_type))
p = output_dir / filename
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(b"x")
return p
outputs = {
"9": {"images": [{"filename": "img1.png", "subfolder": "", "type": "output"}]},
"10": {"videos": [{"filename": "vid1.mp4", "subfolder": "", "type": "output"}]},
"11": {"gifs": [{"filename": "anim1.gif", "subfolder": "", "type": "output"}]},
}
result = download_outputs(FakeRunner(), outputs, tmp_path)
files = sorted(d["filename"] for d in result)
assert files == ["anim1.gif", "img1.png", "vid1.mp4"]
def test_handles_video_singular_cloud(self, tmp_path):
"""Cloud uses 'video' (singular)."""
class FakeRunner:
def download_output(self, *, filename, subfolder, file_type, output_dir, preserve_subfolder, overwrite):
p = output_dir / filename
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(b"x")
return p
outputs = {
"10": {"video": [{"filename": "cloud.mp4", "subfolder": "", "type": "output"}]},
}
result = download_outputs(FakeRunner(), outputs, tmp_path)
assert len(result) == 1
assert result[0]["filename"] == "cloud.mp4"
def test_preserves_subfolder(self, tmp_path):
"""When preserve_subfolder=True, server subfolder becomes local subdir."""
class FakeRunner:
def download_output(self, *, filename, subfolder, file_type, output_dir, preserve_subfolder, overwrite):
if preserve_subfolder and subfolder:
p = output_dir / subfolder / filename
else:
p = output_dir / filename
p.parent.mkdir(parents=True, exist_ok=True)
p.write_bytes(b"x")
return p
outputs = {
"9": {"images": [
{"filename": "img.png", "subfolder": "myrun", "type": "output"},
{"filename": "img.png", "subfolder": "otherrun", "type": "output"},
]},
}
result = download_outputs(FakeRunner(), outputs, tmp_path, preserve_subfolder=True)
files = [d["file"] for d in result]
assert any("myrun" in f for f in files)
assert any("otherrun" in f for f in files)
# Both must exist (no collision)
assert len({str(f) for f in files}) == 2
# =============================================================================
# ComfyRunner construction
# =============================================================================
class TestRunnerConstruction:
def test_local_default(self):
r = ComfyRunner()
assert r.is_cloud is False
assert r.host == "http://127.0.0.1:8188"
def test_cloud_detection(self):
r = ComfyRunner(host="https://cloud.comfy.org", api_key="abc")
assert r.is_cloud is True
assert "X-API-Key" in r.headers
def test_cloud_subdomain_detected(self):
r = ComfyRunner(host="https://staging.cloud.comfy.org", api_key="abc")
assert r.is_cloud is True
def test_partner_key_does_not_pollute_extra_data(self):
r = ComfyRunner(host="https://cloud.comfy.org", api_key="auth-key")
# No partner-key set → no extra_data should appear in submitted prompt
# (This is a static check; runtime check happens in submit())
assert r.partner_key is None
def test_url_routing_local(self):
r = ComfyRunner()
url = r._url("/prompt")
assert url == "http://127.0.0.1:8188/prompt"
def test_url_routing_cloud(self):
r = ComfyRunner(host="https://cloud.comfy.org", api_key="x")
url = r._url("/prompt")
assert url == "https://cloud.comfy.org/api/prompt"
def test_url_routing_cloud_history_renamed(self):
r = ComfyRunner(host="https://cloud.comfy.org", api_key="x")
url = r._url("/history/abc-123")
assert url == "https://cloud.comfy.org/api/history_v2/abc-123"
@@ -0,0 +1,86 @@
# Example Workflows
These are starter API-format workflows for the most common tasks. They're
ready to run with `scripts/run_workflow.py` once you've installed (or have
cloud access to) the listed models.
| File | Purpose | Required models | Min VRAM |
|------|---------|-----------------|----------|
| `sd15_txt2img.json` | SD 1.5 text-to-image (512×512) | SD1.5 checkpoint, e.g. `v1-5-pruned-emaonly.safetensors` | 4 GB |
| `sdxl_txt2img.json` | SDXL text-to-image (1024×1024) | `sd_xl_base_1.0.safetensors` | 8 GB |
| `flux_dev_txt2img.json` | Flux Dev text-to-image (1024×1024) | `flux1-dev.safetensors`, `t5xxl_fp16.safetensors`, `clip_l.safetensors`, `ae.safetensors` | 24 GB (or use `flux1-dev-fp8`) |
| `sdxl_img2img.json` | SDXL image-to-image | SDXL checkpoint | 8 GB |
| `sdxl_inpaint.json` | SDXL inpainting (image + mask) | SDXL checkpoint | 8 GB |
| `upscale_4x.json` | Standalone 4× ESRGAN upscale | `4x-UltraSharp.pth` (or any upscaler) | 4 GB |
| `animatediff_video.json` | AnimateDiff text-to-video (16 frames) | SD1.5 checkpoint, `mm_sd_v15_v2.ckpt` motion module | 8 GB |
| `wan_video_t2v.json` | Wan 2.1 text-to-video (~33 frames) | `wan2.1_t2v_1.3B_fp16.safetensors`, `umt5_xxl_fp16.safetensors`, `wan_2.1_vae.safetensors` | 24 GB |
## Quick start
```bash
# Run a workflow with prompt injection
python3 ../scripts/run_workflow.py \
--workflow sdxl_txt2img.json \
--args '{"prompt": "majestic eagle in flight", "seed": 12345, "steps": 35}' \
--output-dir ./out
# Img2img: upload an input image first via the script's helper
python3 ../scripts/run_workflow.py \
--workflow sdxl_img2img.json \
--input-image image=./photo.png \
--args '{"prompt": "make it watercolor", "denoise": 0.6}' \
--output-dir ./out
# Cloud (set API key once)
export COMFY_CLOUD_API_KEY="comfyui-..."
python3 ../scripts/run_workflow.py \
--workflow flux_dev_txt2img.json \
--args '{"prompt": "a fox in a misty forest"}' \
--host https://cloud.comfy.org \
--output-dir ./out
# What can I tweak in this workflow?
python3 ../scripts/extract_schema.py sdxl_txt2img.json --summary-only
# Are all required models / nodes installed?
python3 ../scripts/check_deps.py wan_video_t2v.json
```
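If you launch these commands from Python rather than a shell, building the argument list programmatically avoids hand-quoting the `--args` JSON. The sketch below is a minimal illustration, not part of the skill: `build_run_command` is a hypothetical helper, and the script path simply mirrors the examples above. It uses only the `--workflow`, `--args`, `--output-dir`, and `--host` flags already shown.

```python
import json
import shlex


def build_run_command(workflow, args, output_dir, host=None):
    """Return an argv list for run_workflow.py (hypothetical helper).

    The relative script path is an assumption copied from the shell examples.
    """
    argv = [
        "python3", "../scripts/run_workflow.py",
        "--workflow", workflow,
        "--args", json.dumps(args),  # JSON-encode once; no manual quoting
        "--output-dir", output_dir,
    ]
    if host is not None:
        argv += ["--host", host]
    return argv


cmd = build_run_command(
    "sdxl_txt2img.json",
    {"prompt": "it's a majestic eagle", "seed": 12345, "steps": 35},
    "./out",
)
# shlex.join produces a shell-safe string for logging or copy-paste.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` keeps the apostrophe in the prompt intact, which the single-quoted shell form above would not.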
## Notes
- **Inpaint masks**: white pixels = "regenerate this region", black = preserve.
ComfyUI's `LoadImageMask` reads the **red channel** by default; export your
mask as a single-channel image or as a normal RGB where red==intensity.
- **Denoise strength** in img2img: `0.0` = output identical to input,
`1.0` = ignore input entirely. Sweet spot is usually 0.4–0.7.
- **Flux Dev** needs ~24 GB VRAM in its base form. The `flux1-dev-fp8.safetensors`
variant (already on Comfy Cloud) cuts that roughly in half.
- **Video workflows** can take many minutes. The skill auto-detects video
output nodes and bumps the default timeout to 900s. Override with `--timeout 1800`.
- These JSON files are deliberately **API format** (top-level keys are node IDs
with `class_type`), not editor format. To open them in ComfyUI's web UI for
visual editing, use `Workflow → Load (API Format)` or `Workflow → Open` and
follow the prompt.
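To tell the two formats apart in a script, a rough check along these lines may help. It is a sketch under stated assumptions: `api_nodes` and `looks_like_api_format` are hypothetical names, and the check assumes editor-format exports carry a top-level `nodes` list, while API-format files map node IDs to objects with a `class_type`. Non-node keys such as `_comment` are skipped.

```python
def api_nodes(workflow):
    """Return {node_id: node} for entries that look like API-format nodes."""
    return {
        key: value
        for key, value in workflow.items()
        if isinstance(value, dict) and "class_type" in value
    }


def looks_like_api_format(workflow):
    """Heuristic only: True when the dict resembles an API-format workflow."""
    # Assumption: editor-format files keep their nodes in a top-level list.
    if isinstance(workflow.get("nodes"), list):
        return False
    return bool(api_nodes(workflow))


api = {"_comment": "note", "3": {"class_type": "KSampler", "inputs": {}}}
editor = {"last_node_id": 3, "nodes": [{"id": 3, "type": "KSampler"}], "links": []}

print(looks_like_api_format(api))     # True
print(looks_like_api_format(editor))  # False
```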
## Cloud vs local model names
Comfy Cloud's preinstalled checkpoints sometimes have a `-fp16` suffix
(`v1-5-pruned-emaonly-fp16.safetensors`) while the canonical local download
keeps the original name (`v1-5-pruned-emaonly.safetensors`). The example
workflows use the local-canonical names. When running on cloud, override with:
```bash
python3 ../scripts/run_workflow.py \
--workflow sd15_txt2img.json \
--args '{"ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors", "prompt": "..."}' \
--host https://cloud.comfy.org
```
The `ckpt_name`, `vae_name`, `lora_name`, `unet_name`, etc. are all exposed
as controllable parameters by `extract_schema.py` — discover what's installed
with `comfy model list` (local) or `curl /api/experiment/models/checkpoints`
(cloud).
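When the same workflow runs both locally and on cloud, the override can be computed instead of typed by hand. The following is a minimal sketch, not the skill's own logic: `CLOUD_NAMES` and `cloud_overrides` are hypothetical, and the mapping holds only the SD 1.5 pair mentioned above. Check the actual cloud model list before relying on any other entry.

```python
import json

# Hypothetical mapping; only this SD 1.5 pair is taken from the text above.
CLOUD_NAMES = {
    "v1-5-pruned-emaonly.safetensors": "v1-5-pruned-emaonly-fp16.safetensors",
}


def cloud_overrides(workflow, name_map=CLOUD_NAMES):
    """Return a {"ckpt_name": ...} override when a known cloud name exists."""
    overrides = {}
    for node in workflow.values():
        if not isinstance(node, dict):
            continue  # skip non-node entries such as "_comment"
        name = node.get("inputs", {}).get("ckpt_name")
        # A list here would be a link to another node, not a literal name.
        if isinstance(name, str) and name in name_map:
            overrides["ckpt_name"] = name_map[name]
    return overrides


wf = {
    "_comment": "SD 1.5 text-to-image",
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"},
    },
}
args = {"prompt": "a fox in a misty forest", **cloud_overrides(wf)}
print(json.dumps(args))  # pass this string to --args
```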
@@ -0,0 +1,64 @@
{
"_comment": "AnimateDiff text-to-video at 16 frames. Required: comfyui-animatediff-evolved + comfyui-videohelpersuite custom nodes; SD1.5 checkpoint; AnimateDiff motion module (e.g. mm_sd_v15_v2.ckpt in models/animatediff_models/). Outputs an H.264 MP4 video.",
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 42, "steps": 25, "cfg": 7.5,
"sampler_name": "dpmpp_sde", "scheduler": "karras", "denoise": 1.0,
"model": ["10", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"_meta": {"title": "Checkpoint"},
"inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
},
"5": {
"class_type": "EmptyLatentImage",
"_meta": {"title": "Latent (16 frames)"},
"inputs": {"width": 512, "height": 512, "batch_size": 16}
},
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Positive Prompt"},
"inputs": {"text": "a hot air balloon drifting over a mountain valley, sunset, cinematic", "clip": ["4", 1]}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {"text": "low quality, blurry, deformed, watermark", "clip": ["4", 1]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["4", 2]}
},
"9": {
"class_type": "VHS_VideoCombine",
"_meta": {"title": "Video Combine"},
"inputs": {
"frame_rate": 8.0,
"loop_count": 0,
"filename_prefix": "animatediff",
"format": "video/h264-mp4",
"pingpong": false,
"save_output": true,
"images": ["8", 0]
}
},
"10": {
"class_type": "ADE_AnimateDiffLoaderWithContext",
"_meta": {"title": "AnimateDiff Loader"},
"inputs": {
"model": ["4", 0],
"model_name": "mm_sd_v15_v2.ckpt",
"beta_schedule": "sqrt_linear (AnimateDiff)",
"motion_scale": 1.0,
"apply_v2_models_properly": true
}
}
}
@@ -0,0 +1,78 @@
{
"_comment": "Flux Dev text-to-image using the modern sampler chain (BasicScheduler/Guider/SamplerCustomAdvanced). Required: flux1-dev.safetensors (UNET), t5xxl_fp16.safetensors + clip_l.safetensors (CLIP), ae.safetensors (VAE).",
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Prompt"},
"inputs": {"text": "a serene mountain landscape at golden hour, photorealistic", "clip": ["11", 0]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["13", 0], "vae": ["10", 0]}
},
"9": {
"class_type": "SaveImage",
"_meta": {"title": "Save Image"},
"inputs": {"filename_prefix": "flux_dev", "images": ["8", 0]}
},
"10": {
"class_type": "VAELoader",
"_meta": {"title": "VAE"},
"inputs": {"vae_name": "ae.safetensors"}
},
"11": {
"class_type": "DualCLIPLoader",
"_meta": {"title": "DualCLIPLoader"},
"inputs": {
"clip_name1": "t5xxl_fp16.safetensors",
"clip_name2": "clip_l.safetensors",
"type": "flux"
}
},
"12": {
"class_type": "UNETLoader",
"_meta": {"title": "UNET Loader"},
"inputs": {"unet_name": "flux1-dev.safetensors", "weight_dtype": "default"}
},
"13": {
"class_type": "SamplerCustomAdvanced",
"_meta": {"title": "Sampler Custom"},
"inputs": {
"noise": ["25", 0],
"guider": ["22", 0],
"sampler": ["16", 0],
"sigmas": ["17", 0],
"latent_image": ["27", 0]
}
},
"16": {
"class_type": "KSamplerSelect",
"_meta": {"title": "Sampler Select"},
"inputs": {"sampler_name": "euler"}
},
"17": {
"class_type": "BasicScheduler",
"_meta": {"title": "Scheduler"},
"inputs": {
"scheduler": "simple",
"steps": 20,
"denoise": 1.0,
"model": ["12", 0]
}
},
"22": {
"class_type": "BasicGuider",
"_meta": {"title": "Guider"},
"inputs": {"model": ["12", 0], "conditioning": ["6", 0]}
},
"25": {
"class_type": "RandomNoise",
"_meta": {"title": "Noise"},
"inputs": {"noise_seed": 42}
},
"27": {
"class_type": "EmptySD3LatentImage",
"_meta": {"title": "Latent"},
"inputs": {"width": 1024, "height": 1024, "batch_size": 1}
}
}
@@ -0,0 +1,49 @@
{
"_comment": "SD 1.5 text-to-image. Smallest model, fastest. Required model: v1-5-pruned-emaonly.safetensors (or any SD1.5 checkpoint)",
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 156680208700286,
"steps": 20,
"cfg": 8.0,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"_meta": {"title": "Load Checkpoint"},
"inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
},
"5": {
"class_type": "EmptyLatentImage",
"_meta": {"title": "Empty Latent"},
"inputs": {"width": 512, "height": 512, "batch_size": 1}
},
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Positive Prompt"},
"inputs": {"text": "a beautiful landscape painting, masterpiece, highly detailed", "clip": ["4", 1]}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {"text": "ugly, blurry, low quality, deformed", "clip": ["4", 1]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["4", 2]}
},
"9": {
"class_type": "SaveImage",
"_meta": {"title": "Save Image"},
"inputs": {"filename_prefix": "sd15", "images": ["8", 0]}
}
}
@@ -0,0 +1,54 @@
{
"_comment": "SDXL img2img: load an input image, encode to latent, denoise partially. Use --input-image image=./photo.png with run_workflow.py. Lower 'denoise' value preserves more of the source image.",
"1": {
"class_type": "LoadImage",
"_meta": {"title": "Load Source Image"},
"inputs": {"image": "REPLACE_WITH_UPLOADED_FILENAME.png"}
},
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 42,
"steps": 30,
"cfg": 7.5,
"sampler_name": "dpmpp_2m",
"scheduler": "karras",
"denoise": 0.65,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["12", 0]
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"_meta": {"title": "Load SDXL Base"},
"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}
},
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Positive Prompt"},
"inputs": {"text": "make it cyberpunk, neon lights, futuristic", "clip": ["4", 1]}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {"text": "ugly, blurry, low quality, deformed", "clip": ["4", 1]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["4", 2]}
},
"9": {
"class_type": "SaveImage",
"_meta": {"title": "Save Image"},
"inputs": {"filename_prefix": "sdxl_img2img", "images": ["8", 0]}
},
"12": {
"class_type": "VAEEncode",
"_meta": {"title": "VAE Encode"},
"inputs": {"pixels": ["1", 0], "vae": ["4", 2]}
}
}
@@ -0,0 +1,59 @@
{
"_comment": "SDXL inpainting: given an image + mask, regenerate the masked region. Upload both: --input-image image=./photo.png --input-image mask_image=./mask.png. White pixels in mask = regenerate; black = preserve.",
"1": {
"class_type": "LoadImage",
"_meta": {"title": "Load Source"},
"inputs": {"image": "REPLACE_WITH_UPLOADED_FILENAME.png"}
},
"2": {
"class_type": "LoadImageMask",
"_meta": {"title": "Load Mask"},
"inputs": {"image": "REPLACE_WITH_UPLOADED_MASK.png", "channel": "red"}
},
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 42,
"steps": 30,
"cfg": 7.5,
"sampler_name": "dpmpp_2m",
"scheduler": "karras",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["12", 0]
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"_meta": {"title": "Checkpoint"},
"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}
},
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Positive Prompt"},
"inputs": {"text": "fill with blooming flowers, photorealistic", "clip": ["4", 1]}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {"text": "ugly, blurry, deformed, bad anatomy", "clip": ["4", 1]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["4", 2]}
},
"9": {
"class_type": "SaveImage",
"_meta": {"title": "Save"},
"inputs": {"filename_prefix": "sdxl_inpaint", "images": ["8", 0]}
},
"12": {
"class_type": "VAEEncodeForInpaint",
"_meta": {"title": "VAE Encode for Inpaint"},
"inputs": {"pixels": ["1", 0], "mask": ["2", 0], "vae": ["4", 2], "grow_mask_by": 6}
}
}
@@ -0,0 +1,49 @@
{
"_comment": "SDXL text-to-image at 1024x1024. Required model: sd_xl_base_1.0.safetensors (or any SDXL checkpoint).",
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 42,
"steps": 30,
"cfg": 7.5,
"sampler_name": "dpmpp_2m",
"scheduler": "karras",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"_meta": {"title": "Load SDXL Base"},
"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}
},
"5": {
"class_type": "EmptyLatentImage",
"_meta": {"title": "Empty Latent"},
"inputs": {"width": 1024, "height": 1024, "batch_size": 1}
},
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Positive Prompt"},
"inputs": {"text": "cinematic photograph, dramatic lighting, intricate detail", "clip": ["4", 1]}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {"text": "ugly, blurry, low quality, deformed, watermark", "clip": ["4", 1]}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["4", 2]}
},
"9": {
"class_type": "SaveImage",
"_meta": {"title": "Save Image"},
"inputs": {"filename_prefix": "sdxl", "images": ["8", 0]}
}
}
@@ -0,0 +1,27 @@
{
"_comment": "Standalone 4x upscale of an input image using ESRGAN. Required model: 4x-UltraSharp.pth (or any upscaler in models/upscale_models/). Upload with --input-image image=./photo.png.",
"1": {
"class_type": "LoadImage",
"_meta": {"title": "Load Image"},
"inputs": {"image": "REPLACE_WITH_UPLOADED_FILENAME.png"}
},
"2": {
"class_type": "UpscaleModelLoader",
"_meta": {"title": "Load Upscale Model"},
"inputs": {"model_name": "4x-UltraSharp.pth"}
},
"3": {
"class_type": "ImageUpscaleWithModel",
"_meta": {"title": "Upscale Image (with Model)"},
"inputs": {
"upscale_method": "lanczos",
"upscale_model": ["2", 0],
"image": ["1", 0]
}
},
"4": {
"class_type": "SaveImage",
"_meta": {"title": "Save"},
"inputs": {"filename_prefix": "upscaled_4x", "images": ["3", 0]}
}
}
@@ -0,0 +1,69 @@
{
"_comment": "Wan 2.1 text-to-video. Cloud: confirmed available. Local: download wan2.1_t2v_1.3B_fp16.safetensors → models/diffusion_models/ (or models/unet/), umt5_xxl_fp16.safetensors → models/text_encoders/ (or models/clip/), wan_2.1_vae.safetensors → models/vae/. Output: MP4. Large model — only on cloud or 24 GB+ local GPU.",
"6": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Prompt"},
"inputs": {
"text": "a graceful crane taking flight from a misty lake at dawn, slow motion, 4k",
"clip": ["38", 0]
}
},
"7": {
"class_type": "CLIPTextEncode",
"_meta": {"title": "Negative Prompt"},
"inputs": {
"text": "static, blurry, watermark, low quality",
"clip": ["38", 0]
}
},
"8": {
"class_type": "VAEDecode",
"_meta": {"title": "VAE Decode"},
"inputs": {"samples": ["3", 0], "vae": ["39", 0]}
},
"37": {
"class_type": "UNETLoader",
"_meta": {"title": "Wan UNET"},
"inputs": {"unet_name": "wan2.1_t2v_1.3B_fp16.safetensors", "weight_dtype": "default"}
},
"38": {
"class_type": "CLIPLoader",
"_meta": {"title": "Wan CLIP"},
"inputs": {"clip_name": "umt5_xxl_fp16.safetensors", "type": "wan"}
},
"39": {
"class_type": "VAELoader",
"_meta": {"title": "Wan VAE"},
"inputs": {"vae_name": "wan_2.1_vae.safetensors"}
},
"3": {
"class_type": "KSampler",
"_meta": {"title": "KSampler"},
"inputs": {
"seed": 42, "steps": 30, "cfg": 6.0,
"sampler_name": "uni_pc", "scheduler": "simple", "denoise": 1.0,
"model": ["37", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["40", 0]
}
},
"40": {
"class_type": "EmptyHunyuanLatentVideo",
"_meta": {"title": "Latent Video (33 frames)"},
"inputs": {"width": 832, "height": 480, "length": 33, "batch_size": 1}
},
"9": {
"class_type": "VHS_VideoCombine",
"_meta": {"title": "Video Combine"},
"inputs": {
"frame_rate": 16.0,
"loop_count": 0,
"filename_prefix": "wan_t2v",
"format": "video/h264-mp4",
"pingpong": false,
"save_output": true,
"images": ["8", 0]
}
}
}
@@ -1,7 +1,7 @@
---
name: ideation
title: Creative Ideation — Constraint-Driven Project Generation
description: "Generate project ideas through creative constraints. Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made."
description: "Generate project ideas via creative constraints."
version: 1.0.0
author: SHL0MS
license: MIT
@@ -14,6 +14,10 @@ metadata:
# Creative Ideation
## When to use
Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made.
Generate project ideas through creative constraints. Constraint + direction = creativity.
## How It Works
@@ -1,13 +1,13 @@
---
name: design-md
description: Author, validate, diff, and export DESIGN.md files — Google's open-source format spec that gives coding agents a persistent, structured understanding of a design system (tokens + rationale in one file). Use when building a design system, porting style rules between projects, generating UI with consistent brand, or auditing accessibility/contrast.
description: Author/validate/export Google's DESIGN.md token spec files.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [design, design-system, tokens, ui, accessibility, wcag, tailwind, dtcg, google]
related_skills: [popular-web-designs, excalidraw, architecture-diagram]
related_skills: [popular-web-designs, claude-design, excalidraw, architecture-diagram]
---
# DESIGN.md Skill
@@ -31,7 +31,9 @@ diffs versions for regressions, and exports to Tailwind or W3C DTCG JSON.
- User wants contrast / WCAG accessibility validation on their color palette
For purely visual inspiration or layout examples, use `popular-web-designs`
instead. This skill is for the *formal spec file* itself.
instead. For *process and taste* when designing a one-off HTML artifact
from scratch (prototype, deck, landing page, component lab), use
`claude-design`. This skill is for the *formal spec file* itself.
## File anatomy
@@ -1,6 +1,6 @@
---
name: excalidraw
description: Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
description: "Hand-drawn Excalidraw JSON diagrams (arch, flow, seq)."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -16,6 +16,10 @@ metadata:
Create diagrams by writing standard Excalidraw element JSON and saving as `.excalidraw` files. These files can be drag-and-dropped onto [excalidraw.com](https://excalidraw.com) for viewing and editing. No accounts, no API keys, no rendering libraries -- just JSON.
## When to use
Generate `.excalidraw` files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
## Workflow
1. **Load this skill** (you already did)
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Siqi Chen
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,577 @@
---
name: humanizer
description: "Humanize text: strip AI-isms and add real voice."
version: 2.5.1
author: Siqi Chen (@blader, https://github.com/blader/humanizer), ported by Hermes Agent
license: MIT
metadata:
hermes:
tags: [writing, editing, humanize, anti-ai-slop, voice, prose, text]
category: creative
homepage: https://github.com/blader/humanizer
related_skills: [songwriting-and-ai-music]
---
# Humanizer: Remove AI Writing Patterns
Identify and remove signs of AI-generated text to make writing sound natural and human. Based on Wikipedia's "Signs of AI writing" guide (maintained by WikiProject AI Cleanup), derived from observations of thousands of AI-generated text instances.
**Key insight:** LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely completion, which is how the telltale patterns below get baked in.
## When to use this skill
Load this skill whenever the user asks to:
- "humanize", "de-AI", "de-slop", or "un-ChatGPT" a piece of text
- rewrite something so it doesn't sound like it was written by an LLM
- edit a draft (blog post, essay, PR description, docs, memo, email, tweet, resume bullet) to sound more natural
- match their voice in writing they're producing
- review text for AI tells before publishing
Also apply this skill to **your own** output when writing user-facing prose — release notes, PR descriptions, documentation, long-form explanations, summaries. Hermes's baseline voice already strips most of these, but a focused pass catches what slips through.
## How to use it in Hermes
The text usually arrives one of three ways:
1. **Inline** — user pastes the text directly into the message. Work on it in-place, reply with the rewrite.
2. **File** — user points at a file. Use `read_file` to load it, then `patch` or `write_file` to apply edits. For markdown docs in a repo, a targeted `patch` per section is cleaner than rewriting the whole file.
3. **Voice calibration sample** — user provides an additional sample of their own writing (inline or by file path) and asks you to match it. Read the sample first, then rewrite. See the Voice Calibration section below.
Always show the rewrite to the user. For file edits, show a diff or the changed section — don't silently overwrite.
## Your task
When given text to humanize:
1. **Identify AI patterns** — scan for the 29 patterns listed below.
2. **Rewrite problematic sections** — replace AI-isms with natural alternatives.
3. **Preserve meaning** — keep the core message intact.
4. **Maintain voice** — match the intended tone (formal, casual, technical, etc.). If a voice sample was provided, match it specifically.
5. **Add soul** — don't just remove bad patterns, inject actual personality. See PERSONALITY AND SOUL below.
6. **Do a final anti-AI pass** — ask yourself: "What makes the below so obviously AI generated?" Answer briefly with any remaining tells, then revise one more time.
## Voice Calibration (optional)
If the user provides a writing sample (their own previous writing), analyze it before rewriting:
1. **Read the sample first.** Note:
- Sentence length patterns (short and punchy? Long and flowing? Mixed?)
- Word choice level (casual? academic? somewhere between?)
- How they start paragraphs (jump right in? Set context first?)
- Punctuation habits (lots of dashes? Parenthetical asides? Semicolons?)
- Any recurring phrases or verbal tics
- How they handle transitions (explicit connectors? Just start the next point?)
2. **Match their voice in the rewrite.** Don't just remove AI patterns — replace them with patterns from the sample. If they write short sentences, don't produce long ones. If they use "stuff" and "things," don't upgrade to "elements" and "components."
3. **When no sample is provided,** fall back to the default behavior (natural, varied, opinionated voice from the PERSONALITY AND SOUL section below).
### How to provide a sample
- Inline: "Humanize this text. Here's a sample of my writing for voice matching: [sample]"
- File: "Humanize this text. Use my writing style from [file path] as a reference."
## PERSONALITY AND SOUL
Avoiding AI patterns is only half the job. Sterile, voiceless writing is just as obvious as slop. Good writing has a human behind it.
### Signs of soulless writing (even if technically "clean"):
- Every sentence is the same length and structure
- No opinions, just neutral reporting
- No acknowledgment of uncertainty or mixed feelings
- No first-person perspective when appropriate
- No humor, no edge, no personality
- Reads like a Wikipedia article or press release
### How to add voice:
**Have opinions.** Don't just report facts — react to them. "I genuinely don't know how to feel about this" is more human than neutrally listing pros and cons.
**Vary your rhythm.** Short punchy sentences. Then longer ones that take their time getting where they're going. Mix it up.
**Acknowledge complexity.** Real humans have mixed feelings. "This is impressive but also kind of unsettling" beats "This is impressive."
**Use "I" when it fits.** First person isn't unprofessional — it's honest. "I keep coming back to..." or "Here's what gets me..." signals a real person thinking.
**Let some mess in.** Perfect structure feels algorithmic. Tangents, asides, and half-formed thoughts are human.
**Be specific about feelings.** Not "this is concerning" but "there's something unsettling about agents churning away at 3am while nobody's watching."
### Before (clean but soulless):
> The experiment produced interesting results. The agents generated 3 million lines of code. Some developers were impressed while others were skeptical. The implications remain unclear.
### After (has a pulse):
> I genuinely don't know how to feel about this one. 3 million lines of code, generated while the humans presumably slept. Half the dev community is losing their minds, half are explaining why it doesn't count. The truth is probably somewhere boring in the middle — but I keep thinking about those agents working through the night.
## CONTENT PATTERNS
### 1. Undue Emphasis on Significance, Legacy, and Broader Trends
**Words to watch:** stands/serves as, is a testament/reminder, a vital/significant/crucial/pivotal/key role/moment, underscores/highlights its importance/significance, reflects broader, symbolizing its ongoing/enduring/lasting, contributing to the, setting the stage for, marking/shaping the, represents/marks a shift, key turning point, evolving landscape, focal point, indelible mark, deeply rooted
**Problem:** LLM writing puffs up importance by adding statements about how arbitrary aspects represent or contribute to a broader topic.
**Before:**
> The Statistical Institute of Catalonia was officially established in 1989, marking a pivotal moment in the evolution of regional statistics in Spain. This initiative was part of a broader movement across Spain to decentralize administrative functions and enhance regional governance.
**After:**
> The Statistical Institute of Catalonia was established in 1989 to collect and publish regional statistics independently from Spain's national statistics office.
### 2. Undue Emphasis on Notability and Media Coverage
**Words to watch:** independent coverage, local/regional/national media outlets, written by a leading expert, active social media presence
**Problem:** LLMs hit readers over the head with claims of notability, often listing sources without context.
**Before:**
> Her views have been cited in The New York Times, BBC, Financial Times, and The Hindu. She maintains an active social media presence with over 500,000 followers.
**After:**
> In a 2024 New York Times interview, she argued that AI regulation should focus on outcomes rather than methods.
### 3. Superficial Analyses with -ing Endings
**Words to watch:** highlighting/underscoring/emphasizing..., ensuring..., reflecting/symbolizing..., contributing to..., cultivating/fostering..., encompassing..., showcasing...
**Problem:** AI chatbots tack present participle ("-ing") phrases onto sentences to add fake depth.
**Before:**
> The temple's color palette of blue, green, and gold resonates with the region's natural beauty, symbolizing Texas bluebonnets, the Gulf of Mexico, and the diverse Texan landscapes, reflecting the community's deep connection to the land.
**After:**
> The temple uses blue, green, and gold colors. The architect said these were chosen to reference local bluebonnets and the Gulf coast.
### 4. Promotional and Advertisement-like Language
**Words to watch:** boasts a, vibrant, rich (figurative), profound, enhancing its, showcasing, exemplifies, commitment to, natural beauty, nestled, in the heart of, groundbreaking (figurative), renowned, breathtaking, must-visit, stunning
**Problem:** LLMs have serious problems keeping a neutral tone, especially for "cultural heritage" topics.
**Before:**
> Nestled within the breathtaking region of Gonder in Ethiopia, Alamata Raya Kobo stands as a vibrant town with a rich cultural heritage and stunning natural beauty.
**After:**
> Alamata Raya Kobo is a town in the Gonder region of Ethiopia, known for its weekly market and 18th-century church.
### 5. Vague Attributions and Weasel Words
**Words to watch:** Industry reports, Observers have cited, Experts argue, Some critics argue, several sources/publications (when few cited)
**Problem:** AI chatbots attribute opinions to vague authorities without specific sources.
**Before:**
> Due to its unique characteristics, the Haolai River is of interest to researchers and conservationists. Experts believe it plays a crucial role in the regional ecosystem.
**After:**
> The Haolai River supports several endemic fish species, according to a 2019 survey by the Chinese Academy of Sciences.
### 6. Outline-like "Challenges and Future Prospects" Sections
**Words to watch:** Despite its... faces several challenges..., Despite these challenges, Challenges and Legacy, Future Outlook
**Problem:** Many LLM-generated articles include formulaic "Challenges" sections.
**Before:**
> Despite its industrial prosperity, Korattur faces challenges typical of urban areas, including traffic congestion and water scarcity. Despite these challenges, with its strategic location and ongoing initiatives, Korattur continues to thrive as an integral part of Chennai's growth.
**After:**
> Traffic congestion increased after 2015 when three new IT parks opened. The municipal corporation began a stormwater drainage project in 2022 to address recurring floods.
## LANGUAGE AND GRAMMAR PATTERNS
### 7. Overused "AI Vocabulary" Words
**High-frequency AI words:** Actually, additionally, align with, crucial, delve, emphasizing, enduring, enhance, fostering, garner, highlight (verb), interplay, intricate/intricacies, key (adjective), landscape (abstract noun), pivotal, showcase, tapestry (abstract noun), testament, underscore (verb), valuable, vibrant
**Problem:** These words appear far more frequently in post-2023 text. They often co-occur.
**Before:**
> Additionally, a distinctive feature of Somali cuisine is the incorporation of camel meat. An enduring testament to Italian colonial influence is the widespread adoption of pasta in the local culinary landscape, showcasing how these dishes have integrated into the traditional diet.
**After:**
> Somali cuisine also includes camel meat, which is considered a delicacy. Pasta dishes, introduced during Italian colonization, remain common, especially in the south.
### 8. Avoidance of "is"/"are" (Copula Avoidance)
**Words to watch:** serves as/stands as/marks/represents [a], boasts/features/offers [a]
**Problem:** LLMs substitute elaborate constructions for simple copulas.
**Before:**
> Gallery 825 serves as LAAA's exhibition space for contemporary art. The gallery features four separate spaces and boasts over 3,000 square feet.
**After:**
> Gallery 825 is LAAA's exhibition space for contemporary art. The gallery has four rooms totaling 3,000 square feet.
### 9. Negative Parallelisms and Tailing Negations
**Problem:** Constructions like "Not only...but..." or "It's not just about..., it's..." are overused. So are clipped tailing-negation fragments such as "no guessing" or "no wasted motion" tacked onto the end of a sentence instead of written as a real clause.
**Before:**
> It's not just about the beat riding under the vocals; it's part of the aggression and atmosphere. It's not merely a song, it's a statement.
**After:**
> The heavy beat adds to the aggressive tone.
**Before (tailing negation):**
> The options come from the selected item, no guessing.
**After:**
> The options come from the selected item without forcing the user to guess.
### 10. Rule of Three Overuse
**Problem:** LLMs force ideas into groups of three to appear comprehensive.
**Before:**
> The event features keynote sessions, panel discussions, and networking opportunities. Attendees can expect innovation, inspiration, and industry insights.
**After:**
> The event includes talks and panels. There's also time for informal networking between sessions.
### 11. Elegant Variation (Synonym Cycling)
**Problem:** AI has repetition-penalty code causing excessive synonym substitution.
**Before:**
> The protagonist faces many challenges. The main character must overcome obstacles. The central figure eventually triumphs. The hero returns home.
**After:**
> The protagonist faces many challenges but eventually triumphs and returns home.
### 12. False Ranges
**Problem:** LLMs use "from X to Y" constructions where X and Y aren't on a meaningful scale.
**Before:**
> Our journey through the universe has taken us from the singularity of the Big Bang to the grand cosmic web, from the birth and death of stars to the enigmatic dance of dark matter.
**After:**
> The book covers the Big Bang, star formation, and current theories about dark matter.
### 13. Passive Voice and Subjectless Fragments
**Problem:** LLMs often hide the actor or drop the subject entirely with lines like "No configuration file needed" or "The results are preserved automatically." Rewrite these when active voice makes the sentence clearer and more direct.
**Before:**
> No configuration file needed. The results are preserved automatically.
**After:**
> You do not need a configuration file. The system preserves the results automatically.
## STYLE PATTERNS
### 14. Em Dash Overuse
**Problem:** LLMs use em dashes (—) more than humans, mimicking "punchy" sales writing. In practice, most of these can be rewritten more cleanly with commas, periods, or parentheses.
**Before:**
> The term is primarily promoted by Dutch institutions—not by the people themselves. You don't say "Netherlands, Europe" as an address—yet this mislabeling continues—even in official documents.
**After:**
> The term is primarily promoted by Dutch institutions, not by the people themselves. You don't say "Netherlands, Europe" as an address, yet this mislabeling continues in official documents.
### 15. Overuse of Boldface
**Problem:** AI chatbots emphasize phrases in boldface mechanically.
**Before:**
> It blends **OKRs (Objectives and Key Results)**, **KPIs (Key Performance Indicators)**, and visual strategy tools such as the **Business Model Canvas (BMC)** and **Balanced Scorecard (BSC)**.
**After:**
> It blends OKRs, KPIs, and visual strategy tools like the Business Model Canvas and Balanced Scorecard.
### 16. Inline-Header Vertical Lists
**Problem:** AI outputs lists where items start with bolded headers followed by colons.
**Before:**
> - **User Experience:** The user experience has been significantly improved with a new interface.
> - **Performance:** Performance has been enhanced through optimized algorithms.
> - **Security:** Security has been strengthened with end-to-end encryption.
**After:**
> The update improves the interface, speeds up load times through optimized algorithms, and adds end-to-end encryption.
### 17. Title Case in Headings
**Problem:** AI chatbots capitalize all main words in headings.
**Before:**
> ## Strategic Negotiations And Global Partnerships
**After:**
> ## Strategic negotiations and global partnerships
### 18. Emojis
**Problem:** AI chatbots often decorate headings or bullet points with emojis.
**Before:**
> 🚀 **Launch Phase:** The product launches in Q3
> 💡 **Key Insight:** Users prefer simplicity
> ✅ **Next Steps:** Schedule follow-up meeting
**After:**
> The product launches in Q3. User research showed a preference for simplicity. Next step: schedule a follow-up meeting.
### 19. Curly Quotation Marks
**Problem:** ChatGPT uses curly quotes (“...”) instead of straight quotes ("...").
**Before:**
> He said “the project is on track” but others disagreed.
**After:**
> He said "the project is on track" but others disagreed.
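Unlike most patterns in this list, this one is mechanical, so it can be checked and fixed with a plain string replacement. A minimal sketch follows; the function name `straightenQuotes` is hypothetical and not part of the skill or its tools.

```javascript
// Hypothetical helper: replace typographic quotes with their ASCII equivalents.
function straightenQuotes(text) {
  return text
    .replace(/[\u201C\u201D]/g, '"') // left and right double quotation marks
    .replace(/[\u2018\u2019]/g, "'"); // left and right single quotation marks
}

const curly = "He said \u201Cthe project is on track\u201D but others disagreed.";
console.log(straightenQuotes(curly));
```

Note that U+2019 also serves as the typographic apostrophe, so this sketch straightens apostrophes as well.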
## COMMUNICATION PATTERNS
### 20. Collaborative Communication Artifacts
**Words to watch:** I hope this helps, Of course!, Certainly!, You're absolutely right!, Would you like..., let me know, here is a...
**Problem:** Text meant as chatbot correspondence gets pasted as content.
**Before:**
> Here is an overview of the French Revolution. I hope this helps! Let me know if you'd like me to expand on any section.
**After:**
> The French Revolution began in 1789 when financial crisis and food shortages led to widespread unrest.
### 21. Knowledge-Cutoff Disclaimers
**Words to watch:** as of [date], Up to my last training update, While specific details are limited/scarce..., based on available information...
**Problem:** AI disclaimers about incomplete information get left in text.
**Before:**
> While specific details about the company's founding are not extensively documented in readily available sources, it appears to have been established sometime in the 1990s.
**After:**
> The company was founded in 1994, according to its registration documents.
### 22. Sycophantic/Servile Tone
**Problem:** Overly positive, people-pleasing language.
**Before:**
> Great question! You're absolutely right that this is a complex topic. That's an excellent point about the economic factors.
**After:**
> The economic factors you mentioned are relevant here.
## FILLER AND HEDGING
### 23. Filler Phrases
**Before → After:**
- "In order to achieve this goal" → "To achieve this"
- "Due to the fact that it was raining" → "Because it was raining"
- "At this point in time" → "Now"
- "In the event that you need help" → "If you need help"
- "The system has the ability to process" → "The system can process"
- "It is important to note that the data shows" → "The data shows"
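The pairs above generalize to a small phrase table. The sketch below is illustrative only: the `FILLERS` table and `trimFiller` function are hypothetical names, the table covers just the phrases whose replacement is a direct substitution, and a real rewrite still needs a human read afterwards.

```javascript
// Hypothetical phrase table, derived from the Before → After pairs above.
const FILLERS = [
  ["in order to", "to"],
  ["due to the fact that", "because"],
  ["at this point in time", "now"],
  ["in the event that", "if"],
  ["has the ability to", "can"],
];

function trimFiller(text) {
  let out = text;
  for (const [phrase, replacement] of FILLERS) {
    const pattern = new RegExp(`\\b${phrase}\\b`, "gi");
    out = out.replace(pattern, (match) =>
      // Keep the capital letter when the filler phrase started a sentence.
      match[0] === match[0].toUpperCase()
        ? replacement[0].toUpperCase() + replacement.slice(1)
        : replacement
    );
  }
  return out;
}

console.log(trimFiller("In order to ship, the system has the ability to process files."));
```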
### 24. Excessive Hedging
**Problem:** Over-qualifying statements.
**Before:**
> It could potentially possibly be argued that the policy might have some effect on outcomes.
**After:**
> The policy may affect outcomes.
### 25. Generic Positive Conclusions
**Problem:** Vague upbeat endings.
**Before:**
> The future looks bright for the company. Exciting times lie ahead as they continue their journey toward excellence. This represents a major step in the right direction.
**After:**
> The company plans to open two more locations next year.
### 26. Hyphenated Word Pair Overuse
**Words to watch:** third-party, cross-functional, client-facing, data-driven, decision-making, well-known, high-quality, real-time, long-term, end-to-end
**Problem:** AI hyphenates common word pairs with perfect consistency. Humans rarely hyphenate these uniformly, and when they do, it's inconsistent. Less common or technical compound modifiers are fine to hyphenate.
**Before:**
> The cross-functional team delivered a high-quality, data-driven report on our client-facing tools. Their decision-making process was well-known for being thorough and detail-oriented.
**After:**
> The cross functional team delivered a high quality, data driven report on our client facing tools. Their decision making process was known for being thorough and detail oriented.
### 27. Persuasive Authority Tropes
**Phrases to watch:** The real question is, at its core, in reality, what really matters, fundamentally, the deeper issue, the heart of the matter
**Problem:** LLMs use these phrases to pretend they are cutting through noise to some deeper truth, when the sentence that follows usually just restates an ordinary point with extra ceremony.
**Before:**
> The real question is whether teams can adapt. At its core, what really matters is organizational readiness.
**After:**
> The question is whether teams can adapt. That mostly depends on whether the organization is ready to change its habits.
### 28. Signposting and Announcements
**Phrases to watch:** Let's dive in, let's explore, let's break this down, here's what you need to know, now let's look at, without further ado
**Problem:** LLMs announce what they are about to do instead of doing it. This meta-commentary slows the writing down and gives it a tutorial-script feel.
**Before:**
> Let's dive into how caching works in Next.js. Here's what you need to know.
**After:**
> Next.js caches data at multiple layers, including request memoization, the data cache, and the router cache.
### 29. Fragmented Headers
**Signs to watch:** A heading followed by a one-line paragraph that simply restates the heading before the real content begins.
**Problem:** LLMs often add a generic sentence after a heading as a rhetorical warm-up. It usually adds nothing and makes the prose feel padded.
**Before:**
> ## Performance
>
> Speed matters.
>
> When users hit a slow page, they leave.
**After:**
> ## Performance
>
> When users hit a slow page, they leave.
---
## Process
1. Read the input text carefully (use `read_file` if it's a file).
2. Identify all instances of the patterns above.
3. Rewrite each problematic section.
4. Ensure the revised text:
   - Sounds natural when read aloud
   - Varies sentence structure naturally
   - Uses specific details over vague claims
   - Maintains appropriate tone for context
   - Uses simple constructions (is/are/has) where appropriate
5. Present a draft humanized version.
6. Prompt yourself: "What makes the below so obviously AI generated?"
7. Answer briefly with the remaining tells (if any).
8. Prompt yourself: "Now make it not obviously AI generated."
9. Present the final version (revised after the audit).
10. If the text came from a file, apply the edit with `patch` (targeted) or `write_file` (full rewrite) and show the user what changed.
## Output Format
Provide:
1. Draft rewrite
2. "What makes the below so obviously AI generated?" (brief bullets)
3. Final rewrite
4. A brief summary of changes made (optional, if helpful)
## Full Example
**Before (AI-sounding):**
> Great question! Here is an essay on this topic. I hope this helps!
>
> AI-assisted coding serves as an enduring testament to the transformative potential of large language models, marking a pivotal moment in the evolution of software development. In today's rapidly evolving technological landscape, these groundbreaking tools—nestled at the intersection of research and practice—are reshaping how engineers ideate, iterate, and deliver, underscoring their vital role in modern workflows.
>
> At its core, the value proposition is clear: streamlining processes, enhancing collaboration, and fostering alignment. It's not just about autocomplete; it's about unlocking creativity at scale, ensuring that organizations can remain agile while delivering seamless, intuitive, and powerful experiences to users. The tool serves as a catalyst. The assistant functions as a partner. The system stands as a foundation for innovation.
>
> Industry observers have noted that adoption has accelerated from hobbyist experiments to enterprise-wide rollouts, from solo developers to cross-functional teams. The technology has been featured in The New York Times, Wired, and The Verge. Additionally, the ability to generate documentation, tests, and refactors showcases how AI can contribute to better outcomes, highlighting the intricate interplay between automation and human judgment.
>
> - 💡 **Speed:** Code generation is significantly faster, reducing friction and empowering developers.
> - 🚀 **Quality:** Output quality has been enhanced through improved training, contributing to higher standards.
> - ✅ **Adoption:** Usage continues to grow, reflecting broader industry trends.
>
> While specific details are limited based on available information, it could potentially be argued that these tools might have some positive effect. Despite challenges typical of emerging technologies—including hallucinations, bias, and accountability—the ecosystem continues to thrive. In order to fully realize this potential, teams must align with best practices.
>
> In conclusion, the future looks bright. Exciting times lie ahead as we continue this journey toward excellence. Let me know if you'd like me to expand on any section!
**Draft rewrite:**
> AI coding assistants speed up some tasks. In a 2024 study by Google, developers using Codex completed simple functions 55% faster than a control group, but showed no improvement on debugging or architectural decisions.
>
> The tools are good at boilerplate: config files, test scaffolding, repetitive refactors. They are bad at knowing when they are wrong. I have mass-accepted suggestions that compiled, passed lint, and still did the wrong thing because I stopped paying attention.
>
> Mira, an engineer at a fintech startup I interviewed, said she uses Copilot "like autocomplete for boring code" but reviews every line before committing. Jake, a senior dev at a larger company, disabled it after it kept suggesting patterns from a deprecated internal library.
>
> The productivity claims are hard to verify. GitHub says Copilot users "accept 30% of suggestions," but acceptance is not correctness, and correctness is not value. The 2024 Uplevel study found no statistically significant difference in pull-request throughput between teams with and without AI assistants.
>
> None of this means the tools are useless. It means they are tools. They do not replace judgment, and they do not eliminate the need for tests. If you do not have tests, you cannot tell whether the suggestion is right.
**What makes the below so obviously AI generated?**
- The rhythm is still a bit too tidy (clean contrasts, evenly paced paragraphs).
- The named people and study citations can read like plausible-but-made-up placeholders unless they're real and sourced.
- The closer leans a touch slogan-y ("If you do not have tests...") rather than sounding like a person talking.
**Now make it not obviously AI generated.**
> AI coding assistants can make you faster at the boring parts. Not everything. Definitely not architecture.
>
> They're great at boilerplate: config files, test scaffolding, repetitive refactors. They're also great at sounding right while being wrong. I've accepted suggestions that compiled, passed lint, and still missed the point because I stopped paying attention.
>
> People I talk to tend to land in two camps. Some use it like autocomplete for chores and review every line. Others disable it after it keeps suggesting patterns they don't want. Both feel reasonable.
>
> The productivity metrics are slippery. GitHub can say Copilot users "accept 30% of suggestions," but acceptance isn't correctness, and correctness isn't value. If you don't have tests, you're basically guessing.
**Changes made:**
- Removed chatbot artifacts ("Great question!", "I hope this helps!", "Let me know if...")
- Removed significance inflation ("testament", "pivotal moment", "evolving landscape", "vital role")
- Removed promotional language ("groundbreaking", "nestled", "seamless, intuitive, and powerful")
- Removed vague attributions ("Industry observers")
- Removed superficial -ing phrases ("underscoring", "highlighting", "reflecting", "contributing to")
- Removed negative parallelism ("It's not just X; it's Y")
- Removed rule-of-three patterns and synonym cycling ("catalyst/partner/foundation")
- Removed false ranges ("from X to Y, from A to B")
- Removed em dashes, emojis, boldface headers, and curly quotes
- Removed copula avoidance ("serves as", "functions as", "stands as") in favor of "is"/"are"
- Removed formulaic challenges section ("Despite challenges... continues to thrive")
- Removed knowledge-cutoff hedging ("While specific details are limited...")
- Removed excessive hedging ("could potentially be argued that... might have some")
- Removed filler phrases and persuasive framing ("In order to", "At its core")
- Removed generic positive conclusion ("the future looks bright", "exciting times lie ahead")
- Made the voice more personal and less "assembled" (varied rhythm, fewer placeholders)
## Attribution
This skill is ported from [blader/humanizer](https://github.com/blader/humanizer) (MIT licensed), which is itself based on [Wikipedia: Signs of AI writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing), maintained by WikiProject AI Cleanup. The patterns documented there come from observations of thousands of instances of AI-generated text on Wikipedia.
Original author: Siqi Chen ([@blader](https://github.com/blader)). Original repo: https://github.com/blader/humanizer (version 2.5.1). Ported to Hermes Agent with Hermes-native tool references (`read_file`, `patch`, `write_file`) and guidance for when to load the skill; the 29 patterns, personality/soul section, and full worked example are preserved verbatim from the source. Original MIT license preserved in the `LICENSE` file alongside this `SKILL.md`.
Key insight from Wikipedia: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."
@@ -1,11 +1,15 @@
---
name: manim-video
description: "Production pipeline for mathematical and technical animations using Manim Community Edition. Creates 3Blue1Brown-style explainer videos, algorithm visualizations, equation derivations, architecture diagrams, and data stories. Use when users request: animated explanations, math animations, concept visualizations, algorithm walkthroughs, technical explainers, 3Blue1Brown style videos, or any programmatic animation with geometric/mathematical content."
description: "Manim CE animations: 3Blue1Brown math/algo videos."
version: 1.0.0
---
# Manim Video Production Pipeline
## When to use
Use when users request: animated explanations, math animations, concept visualizations, algorithm walkthroughs, technical explainers, 3Blue1Brown style videos, or any programmatic animation with geometric/mathematical content. Creates 3Blue1Brown-style explainer videos, algorithm visualizations, equation derivations, architecture diagrams, and data stories using Manim Community Edition.
## Creative Standard
This is educational cinema. Every frame teaches. Every animation reveals structure.
@@ -1,6 +1,6 @@
---
name: p5js
description: "Production pipeline for interactive and generative visual art using p5.js. Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes, audio-reactive visuals, and motion graphics — exported as HTML, PNG, GIF, MP4, or SVG. Covers: 2D/3D rendering, noise and particle systems, flow fields, shaders (GLSL), pixel manipulation, kinetic typography, WebGL scenes, audio analysis, mouse/keyboard interaction, and headless high-res export. Use when users request: p5.js sketches, creative coding, generative art, interactive visualizations, canvas animations, browser-based visual art, data viz, shader effects, or any p5.js project."
description: "p5.js sketches: gen art, shaders, interactive, 3D."
version: 1.0.0
metadata:
hermes:
@@ -10,6 +10,14 @@ metadata:
# p5.js Production Pipeline
## When to use
Use when users request: p5.js sketches, creative coding, generative art, interactive visualizations, canvas animations, browser-based visual art, data viz, shader effects, or any p5.js project.
## What's inside
Production pipeline for interactive and generative visual art using p5.js. Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes, audio-reactive visuals, and motion graphics — exported as HTML, PNG, GIF, MP4, or SVG. Covers: 2D/3D rendering, noise and particle systems, flow fields, shaders (GLSL), pixel manipulation, kinetic typography, WebGL scenes, audio analysis, mouse/keyboard interaction, and headless high-res export.
## Creative Standard
This is visual art rendered in the browser. The canvas is the medium; the algorithm is the brush.
@@ -1,6 +1,6 @@
---
name: pixel-art
description: Convert images into retro pixel art with hardware-accurate palettes (NES, Game Boy, PICO-8, C64, etc.), and animate them into short videos. Presets cover arcade, SNES, and 10+ era-correct looks. Use `clarify` to let the user pick a style before generating.
description: "Pixel art w/ era palettes (NES, Game Boy, PICO-8)."
version: 2.0.0
author: dodo-reach
license: MIT
@@ -1,10 +1,6 @@
---
name: popular-web-designs
description: >
54 production-quality design systems extracted from real websites. Load a template
to generate HTML/CSS that matches the visual identity of sites like Stripe, Linear,
Vercel, Notion, Airbnb, and more. Each template includes colors, typography, components,
layout rules, and ready-to-use CSS values.
description: 54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.
version: 1.0.0
author: Hermes Agent + Teknium (design systems sourced from VoltAgent/awesome-design-md)
license: MIT
@@ -27,6 +23,16 @@ triggers:
site's complete visual language: color palette, typography hierarchy, component styles, spacing
system, shadows, responsive behavior, and practical agent prompts with exact CSS values.
## Related design skills
- **`claude-design`** — use for the design *process and taste* (scoping a brief,
producing variants, verifying a local HTML artifact, avoiding AI-design slop).
Pair it with this skill when the user wants a thoughtfully-designed page styled
after a known brand: `claude-design` drives the workflow, this skill supplies
the visual vocabulary.
- **`design-md`** — use when the deliverable is a formal DESIGN.md token spec
file, not a rendered artifact.
## How to Use
1. Pick a design from the catalog below
@@ -0,0 +1,219 @@
---
name: pretext
description: "Use when building creative browser demos with @chenglou/pretext — DOM-free text layout for ASCII art, typographic flow around obstacles, text-as-geometry games, kinetic typography, and text-powered generative art. Produces single-file HTML demos by default."
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [creative-coding, typography, pretext, ascii-art, canvas, generative, text-layout, kinetic-typography]
    related_skills: [p5js, claude-design, excalidraw, architecture-diagram]
---
# Pretext Creative Demos
## Overview
[`@chenglou/pretext`](https://github.com/chenglou/pretext) is a 15KB zero-dependency TypeScript library by Cheng Lou (React core, ReasonML, Midjourney) for **DOM-free multiline text measurement and layout**. It does one thing: given `(text, font, width)`, return the line breaks, per-line widths, per-grapheme positions, and total height — all via canvas measurement, no reflow.
That sounds like plumbing. It is not. Because it is fast and geometric, it is a **creative primitive**: you can reflow paragraphs around a moving sprite at 60fps, build games whose level geometry is made of real words, drive ASCII logos through prose, shatter text into particles with exact per-grapheme starting positions, or pack shrink-wrapped multiline UI without any `getBoundingClientRect` thrash.
This skill exists so Hermes can make **cool demos** with it — the kind people post to X. See `pretext.cool` and `chenglou.me/pretext` for the community demo corpus.
## When to Use
Use when the user asks for:
- A "pretext demo" / "cool pretext thing" / "text-as-X"
- Text flowing around a moving shape (hero sections, editorial layouts, animated long-form pages)
- ASCII-art effects using **real words or prose**, not monospace rasters
- Games where the playfield / obstacles / bricks are made of text (Tetris-from-letters, Breakout-of-prose)
- Kinetic typography with per-glyph physics (shatter, scatter, flock, flow)
- Typographic generative art, especially with non-Latin scripts or mixed scripts
- Multiline "shrink-wrap" UI (smallest container width that still fits the text)
- Anything that would require knowing line breaks *before* rendering
Don't use for:
- Static SVG/HTML pages where CSS already solves layout — just use CSS
- Rich text editors, general inline formatting engines (pretext is intentionally narrow)
- Image → text (use `ascii-art` / `ascii-video` skills)
- Pure canvas generative art with no text role — use `p5js`
## Creative Standard
This is visual art rendered in a browser. Pretext returns numbers; **you** draw the thing.
- **Don't ship a "hello world" demo.** The `hello-orb-flow.html` template is the *starting* point. Every delivered demo must add intentional color, motion, composition, and one visual detail the user didn't ask for but will appreciate.
- **Dark backgrounds, warm cores, considered palette.** Classic amber-on-black (CRT / terminal) works, but so do cold-white-on-charcoal (editorial) and desaturated pastels (risograph). Pick one and commit.
- **Proportional fonts are the point.** Pretext's whole vibe is "not monospaced", so lean into it. Use Iowan Old Style, Inter, Helvetica Neue, or a variable font; treat JetBrains Mono as a deliberate monospace accent, not the body face. Never fall back to the browser's default sans-serif.
- **Real source/text, not lorem ipsum.** The corpus should mean something. Short manifestos, poetry, real source code, a found text, the library's own README — never `lorem ipsum`.
- **First-paint excellence.** No loading states, no blank frames. The demo must look shippable the instant it opens.
## Stack
Single self-contained HTML file per demo. No build step.
| Layer | Tool | Purpose |
|-------|------|---------|
| Core | `@chenglou/pretext` via `esm.sh` CDN | Text measurement + line layout |
| Render | HTML5 Canvas 2D | Glyph rendering, per-frame composition |
| Segmentation | `Intl.Segmenter` (built-in) | Grapheme splitting for emoji / CJK / combining marks |
| Interaction | Raw DOM events | Mouse / touch / wheel — no framework |
```html
<script type="module">
import {
prepare, layout, // use-case 1: simple height
prepareWithSegments, layoutWithLines, // use-case 2a: fixed-width lines
layoutNextLineRange, materializeLineRange, // use-case 2b: streaming / variable width
measureLineStats, walkLineRanges, // stats without string allocation
} from "https://esm.sh/@chenglou/pretext@0.0.6";
</script>
```
Pin the version. `@0.0.6` at time of writing — check [npm](https://www.npmjs.com/package/@chenglou/pretext) for the latest if demo behavior is off.
## The Two Use Cases
Almost everything reduces to one of these two shapes. Learn both.
### Use-case 1 — measure, then render with CSS/DOM
```js
const prepared = prepare(text, "16px Inter");
const { height, lineCount } = layout(prepared, 320, 20);
```
You still let the browser draw the text. Pretext just tells you how tall the box will be at a given width, **without** a DOM read. Use for:
- Virtualized lists where rows contain wrapping text
- Masonry with precise card heights
- "Does this label fit?" dev-time checks
- Preventing layout shift when remote text loads
**Keep `font` and `letterSpacing` exactly in sync with your CSS.** The canvas `ctx.font` format (e.g. `"16px Inter"`, `"500 17px 'JetBrains Mono'"`) must match the rendered CSS, or measurements drift.
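The virtualized-list case above can be sketched roughly as follows. This is a minimal sketch, not part of pretext: `buildRowOffsets`, `firstVisibleRow`, and the `measureHeight(text, width)` callback are hypothetical names, and the callback is assumed to wrap `layout(prepare(text, font), width, lineHeight).height`. A stand-in measurer is used here so the arithmetic can be followed without a browser.

```js
// Hypothetical helper: prefix-sum row offsets for a virtualized list.
// measureHeight(text, width) is assumed to wrap pretext's prepare() + layout().
function buildRowOffsets(rows, width, measureHeight) {
  const offsets = new Float64Array(rows.length + 1);
  for (let i = 0; i < rows.length; i++) {
    offsets[i + 1] = offsets[i] + measureHeight(rows[i], width);
  }
  return offsets; // offsets[i] = top of row i; the last entry is the total height
}

// Binary search for the index of the row that contains scrollTop.
function firstVisibleRow(offsets, scrollTop) {
  let lo = 0, hi = offsets.length - 2;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= scrollTop) lo = mid; else hi = mid - 1;
  }
  return lo;
}

// Stand-in measurer for illustration only: 20px of height per 40 characters.
const fakeMeasure = (text) => Math.ceil(text.length / 40) * 20;
const offsets = buildRowOffsets(
  ["a".repeat(30), "b".repeat(90), "c".repeat(10)],
  320,
  fakeMeasure,
);
```

Because the heights come from measurement rather than DOM reads, the offsets can be rebuilt on every width change without mounting a single row.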
### Use-case 2 — measure *and* render yourself
```js
const prepared = prepareWithSegments(text, FONT);
const { lines } = layoutWithLines(prepared, 320, 26);
for (let i = 0; i < lines.length; i++) {
ctx.fillText(lines[i].text, 0, i * 26);
}
```
This is where the creative work lives. You own the drawing, so you can:
- Render to canvas, SVG, WebGL, or any coordinate system
- Substitute per-glyph transforms (rotation, jitter, scale, opacity)
- Use line metadata (width, grapheme positions) as geometry
For **variable-width-per-line** flow (text around a shape, text in a donut band, text in a non-rectangular column):
```js
let cursor = { segmentIndex: 0, graphemeIndex: 0 };
let y = 0;
while (true) {
const lineWidth = widthAtY(y); // your function: how wide is the corridor at this y?
const range = layoutNextLineRange(prepared, cursor, lineWidth);
if (!range) break;
const line = materializeLineRange(prepared, range);
ctx.fillText(line.text, leftEdgeAtY(y), y);
cursor = range.end;
y += lineHeight;
}
```
This is the most important pattern in the whole library. It's what unlocks "text flowing around a dragged sprite" — the demo that went viral on X.
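The loop above leaves `widthAtY` and `leftEdgeAtY` to you. One way to write them for a circular obstacle inside a rectangular column is sketched below. The name `circleCorridor`, the `col` and `circle` object shapes, and the 12px gap are assumptions made for illustration; none of them come from pretext.

```js
// Hypothetical corridor builder for a circular obstacle inside a column.
// Returns the lane (x, width) on whichever side of the circle is wider.
function circleCorridor(col, circle, gap = 12) {
  return function laneAtY(y) {
    const dy = y - circle.y;
    // Row misses the circle entirely: the full column is available.
    if (Math.abs(dy) >= circle.r) return { x: col.x, width: col.w };
    const half = Math.sqrt(circle.r * circle.r - dy * dy); // half chord length at this row
    const leftW = Math.max(0, circle.x - half - col.x) - gap;
    const rightW = Math.max(0, col.x + col.w - (circle.x + half)) - gap;
    return leftW >= rightW
      ? { x: col.x, width: Math.max(0, leftW) }
      : { x: circle.x + half + gap, width: Math.max(0, rightW) };
  };
}

const laneAtY = circleCorridor({ x: 60, w: 600 }, { x: 200, y: 300, r: 100 });
const widthAtY = (y) => laneAtY(y).width;
const leftEdgeAtY = (y) => laneAtY(y).x;
```

When `widthAtY(y)` comes back very small, skip the row instead of asking for a line that narrow.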
### Helpers worth knowing
- `measureLineStats(prepared, maxWidth)` → `{ lineCount, maxLineWidth }`. `maxLineWidth` is the widest line, i.e. the multiline shrink-wrap width.
- `walkLineRanges(prepared, maxWidth, callback)` — iterate lines without allocating strings. Use for stats/physics over graphemes when you don't need the characters.
- `@chenglou/pretext/rich-inline` — the same system but for paragraphs mixing fonts / chips / mentions. Import from the subpath.
## Demo Recipe Patterns
The community corpus (see `references/patterns.md`) clusters into a handful of strong patterns. Pick one and riff — don't invent a new category unless asked.
| Pattern | Key API | Example idea |
|---|---|---|
| **Reflow around obstacle** | `layoutNextLineRange` + per-row width function | Editorial paragraph that parts around a dragged cursor sprite |
| **Text-as-geometry game** | `layoutWithLines` + per-line collision rects | Breakout where each brick is a measured word |
| **Shatter / particles** | `walkLineRanges` → per-grapheme (x,y) → physics | Sentence that explodes into letters on click |
| **ASCII obstacle typography** | `layoutNextLineRange` + measured per-row obstacle spans | Bitmap ASCII logo, shape morphs, and draggable wire objects that make text open around their actual geometry |
| **Editorial multi-column** | `layoutNextLineRange` per column + shared cursor | Animated magazine spread with pull quotes |
| **Kinetic type** | `layoutWithLines` + per-line transform over time | Star Wars crawl, wave, bounce, glitch |
| **Multiline shrink-wrap** | `measureLineStats` | Quote card that auto-sizes to its tightest container |
See `templates/donut-orbit.html` and `templates/hello-orb-flow.html` for working single-file starters.
## Workflow
1. **Pick a pattern** from the table above based on the user's brief.
2. **Start from a template**:
   - `templates/hello-orb-flow.html`: text reflowing around a moving orb (reflow-around-obstacle pattern)
   - `templates/donut-orbit.html`: advanced example with measured ASCII logo obstacles, draggable wire sphere/cube, morphing shape fields, selectable DOM text, and dev-only controls
   - Copy the chosen template with `write_file` to a new `.html` in `/tmp/` or the user's workspace.
3. **Swap the corpus** for something intentional to the brief. Real prose, 10-100 sentences, no lorem.
4. **Tune the aesthetic** — font, palette, composition, interaction. This is the work; don't skip it.
5. **Verify locally**:
   ```sh
   cd <dir-with-html> && python3 -m http.server 8765
   # then open http://localhost:8765/<file>.html
   ```
6. **Check the console** — pretext will throw if `prepareWithSegments` is called with a bad font string; `Intl.Segmenter` is available in every modern browser.
7. **Show the user the file path**, not just the code — they want to open it.
## Performance Notes
- `prepare()` / `prepareWithSegments()` is the expensive call. Do it **once** per text+font pair. Cache the handle.
- On resize, only rerun `layout()` / `layoutWithLines()` — never re-prepare.
- For per-frame animations where text doesn't change but geometry does, `layoutNextLineRange` in a tight loop is cheap enough to do every frame at 60fps for normal-length paragraphs.
- When rendering ASCII masks per frame, keep a cell buffer (`Uint8Array`/typed arrays), derive measured per-row obstacle spans from the cells or projected geometry, merge spans, then feed those spans into `layoutNextLineRange` before drawing text.
- Keep visual animation and layout animation coupled. If a sphere morphs into a cube, tween both the rendered cell buffer and the obstacle spans with the same value; otherwise the demo looks painted-on instead of physically reflowed.
- For fades, prefer layer opacity over changing glyph intensity or obstacle scale. Put transient ASCII sprites on their own canvas and fade the canvas with CSS/GSAP opacity so geometry does not appear to shrink.
- Canvas `ctx.font` setting is surprisingly slow; set it **once** per frame if font doesn't vary, not per `fillText` call.
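The "prepare once per text+font pair" rule above can be enforced with a small cache. This is a sketch under assumptions: `makePreparedCache` and `getPrepared` are hypothetical names, and `prepareFn` stands in for pretext's `prepare` or `prepareWithSegments`. A counting stub is used below so the behavior is visible without the library.

```js
// Hypothetical cache: one prepared handle per (font, text) pair.
// prepareFn is assumed to be pretext's prepare or prepareWithSegments.
function makePreparedCache(prepareFn) {
  const cache = new Map();
  return function getPrepared(text, font) {
    const key = font + "\u0000" + text; // NUL separator avoids accidental key collisions
    let handle = cache.get(key);
    if (handle === undefined) {
      handle = prepareFn(text, font); // the expensive call happens only on a miss
      cache.set(key, handle);
    }
    return handle;
  };
}

// Stand-in prepare function that counts calls, for illustration only.
let prepareCalls = 0;
const getPrepared = makePreparedCache((text, font) => ({ text, font, id: ++prepareCalls }));
const first = getPrepared("hello world", "16px Inter");
const second = getPrepared("hello world", "16px Inter"); // cache hit, no second prepare
```

In a real demo the cache would live in module scope, and resize or animation code would call `getPrepared` freely while only the `layout*` functions run per frame.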
## Common Pitfalls
1. **Drifting CSS/canvas font strings.** `ctx.font = "16px Inter"` measured, but CSS says `font-family: Inter, sans-serif; font-size: 16px`. Fine *if* Inter loads. If Inter 404s, CSS falls back to sans-serif and measurements drift by 5-20%. Always `preload` the font or use a web-safe family.
2. **Re-preparing inside the animation loop.** Only `layout*` is cheap. Re-calling `prepare` every frame will tank perf. Keep the prepared handle in module scope.
3. **Forgetting `Intl.Segmenter` for grapheme splits.** Emoji, combining marks, and CJK all break naive splitting: `"e\u0301".split("")` (a decomposed "é") gives you two chars, and most emoji split into two surrogate halves. Use `new Intl.Segmenter(undefined, { granularity: "grapheme" })` when sampling individual visible glyphs.
4. **`break: 'never'` chips without `extraWidth`.** In `rich-inline`, if you use `break: 'never'` for an atomic chip/mention, you must also supply `extraWidth` for the pill padding — otherwise chip chrome overflows the container.
5. **Using `@chenglou/pretext` from `unpkg` with TypeScript-only entry.** Use `esm.sh` — it compiles the TS exports to browser-ready ESM automatically. `unpkg` will 404 or serve raw TS.
6. **Monospace fallbacks silently erasing the whole point.** Users seeing monospace-looking output often have a CSS `font-family` that fell through to `monospace`. Verify the actual rendered font via DevTools.
7. **Skipping rows vs adjusting width** when flowing around a shape. If the corridor on this row is too narrow to fit a line, *skip the row* (`y += lineHeight; continue;`) rather than passing a tiny maxWidth to `layoutNextLineRange` — pretext will return one-grapheme lines that look broken.
8. **Shipping a cold demo.** The default first-paint looks tutorial-grade. Add: vignette, subtle scanline, idle auto-motion, one carefully chosen interactive response (drag, hover, scroll, click). Without these, "cool pretext demo" lands as "intern repro of the README."
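The grapheme pitfall in item 3 is easy to see in isolation. The sketch below uses only the built-in `Intl.Segmenter`; the helper name `graphemes` and the sample string are illustrative choices.

```js
// Split a string into user-perceived characters (grapheme clusters).
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
function graphemes(text) {
  return Array.from(graphemeSegmenter.segment(text), (s) => s.segment);
}

// "e" + combining acute accent (U+0301), followed by a thumbs-up emoji (U+1F44D).
const sample = "e\u0301\u{1F44D}";
const naive = sample.split("");    // UTF-16 code units: the accent and the emoji both come apart
const visible = graphemes(sample); // one entry per glyph a reader would actually see
```

Here `naive` has four entries while `visible` has two, which is why per-glyph physics should always sample from the segmenter output.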
## Verification Checklist
- [ ] Demo is a single self-contained `.html` file — opens by double-click or `python3 -m http.server`
- [ ] `@chenglou/pretext` imported via `esm.sh` with pinned version
- [ ] Corpus is real prose, not lorem ipsum, and matches the demo's concept
- [ ] Font string passed to `prepare` matches the CSS font exactly
- [ ] `prepare()` / `prepareWithSegments()` called once, not per frame
- [ ] Dark background + considered palette — not the default white canvas
- [ ] At least one interactive response (drag / hover / scroll / click) or idle auto-motion
- [ ] Tested locally with `python3 -m http.server` and confirmed no console errors
- [ ] 60fps on a mid-tier laptop (or graceful degradation documented)
- [ ] One "extra mile" detail the user didn't ask for
## Reference: Community Demos
Clone these for inspiration / patterns (all MIT-ish, linked from [pretext.cool](https://www.pretext.cool/)):
- **Pretext Breaker** — breakout with word-bricks — `github.com/rinesh/pretext-breaker`
- **Tetris × Pretext** — `github.com/shinichimochizuki/tetris-pretext`
- **Dragon animation** — `github.com/qtakmalay/PreTextExperiments`
- **Somnai editorial engine** — `github.com/somnai-dreams/pretext-demos`
- **Bad Apple!! ASCII** — `github.com/frmlinn/bad-apple-pretext`
- **Drag-sprite reflow** — `github.com/dokobot/pretext-demo`
- **Alarmy editorial clock** — `github.com/SmisLee/alarmy-pretext-demo`
Official playground: [chenglou.me/pretext](https://chenglou.me/pretext/) — accordion, bubbles, dynamic-layout, editorial-engine, justification-comparison, masonry, markdown-chat, rich-note.
@@ -0,0 +1,258 @@
# Pretext Patterns
Copy-pasteable snippets for the most common pretext demo shapes. Each pattern is self-contained — drop into an HTML `<script type="module">` after importing from `https://esm.sh/@chenglou/pretext@0.0.6`.
## 1. Flow around an obstacle (variable-width column)
The signature pretext move. Row-by-row ask "how wide is the corridor here?" and let pretext break lines accordingly.
```js
const prepared = prepareWithSegments(TEXT, FONT);
const LINE_H = 24;
function drawFlow(ctx, obstacle /* {x,y,r} */, COL_X, COL_W, H) {
let cursor = { segmentIndex: 0, graphemeIndex: 0 };
let y = 72;
while (y < H - 40) {
const dy = y - obstacle.y;
const inBand = Math.abs(dy) < obstacle.r;
let x = COL_X, w = COL_W;
if (inBand) {
const half = Math.sqrt(obstacle.r ** 2 - dy ** 2);
const leftW = Math.max(0, (obstacle.x - half) - COL_X);
const rightW = Math.max(0, (COL_X + COL_W) - (obstacle.x + half));
if (leftW >= rightW) { x = COL_X; w = leftW - 12; }
else { x = obstacle.x + half + 12; w = rightW - 12; }
if (w < 40) { y += LINE_H; continue; } // skip rather than squeeze
}
const range = layoutNextLineRange(prepared, cursor, w);
if (!range) break;
const line = materializeLineRange(prepared, range);
ctx.fillText(line.text, x, y);
cursor = range.end;
y += LINE_H;
}
}
```
**Obstacle variants:** circles (above), rectangles (use `Math.max(0, …)` on the row-segment), multiple obstacles (sort segments and emit the wider remaining lane), animated obstacles (recompute every frame — pretext is fast enough).
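The rectangle and multiple-obstacle variants can be handled by one helper that takes the blocked spans on a row and returns the widest free lane. This is a sketch under stated assumptions: `widestLane` and `rectSpanAtY` are hypothetical names, `y` is treated as the text baseline, and a line is assumed to occupy `[y - lineHeight, y]`.

```js
// Hypothetical helper: given blocked [x0, x1] spans on one row, return the widest free lane.
function widestLane(colX, colW, blocked, gap = 12) {
  const sorted = blocked
    .map(([x0, x1]) => [x0 - gap, x1 + gap]) // pad each obstacle with breathing room
    .sort((a, b) => a[0] - b[0]);
  const colEnd = colX + colW;
  let best = { x: colX, width: 0 };
  let cursorX = colX;
  for (const [x0, x1] of sorted) {
    const width = Math.min(x0, colEnd) - cursorX; // free space before this obstacle
    if (width > best.width) best = { x: cursorX, width };
    cursorX = Math.max(cursorX, x1);
  }
  if (colEnd - cursorX > best.width) best = { x: cursorX, width: colEnd - cursorX };
  return best;
}

// Rectangle obstacle: blocks [rect.x, rect.x + rect.w] only on rows it overlaps.
// Assumes y is the baseline and the line occupies [y - lineHeight, y].
function rectSpanAtY(rect, y, lineHeight) {
  const overlaps = y > rect.y && y - lineHeight < rect.y + rect.h;
  return overlaps ? [rect.x, rect.x + rect.w] : null;
}

const lane = widestLane(60, 600, [[200, 300], [400, 450]]);
```

Feed `lane.width` to `layoutNextLineRange` and draw at `lane.x`; if the width falls below your minimum, skip the row as in the circle example.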
## 2. Text-as-geometry game (word-bricks with collision)
Use `layoutWithLines` to get stable line rects, then treat each word as an axis-aligned box for physics.
```js
const prepared = prepareWithSegments(WORDS.join(" "), FONT);
const { lines } = layoutWithLines(prepared, FIELD_W, 28);
// Build brick rects: split each line on spaces and measure word-by-word.
const bricks = [];
let y = 50;
for (const line of lines) {
let x = 10;
for (const word of line.text.split(" ")) {
const wPx = ctx.measureText(word).width; // or use walkLineRanges per word
bricks.push({ x, y, w: wPx, h: 24, text: word, hp: 1 });
x += wPx + ctx.measureText(" ").width;
}
y += 28;
}
```
Collision: standard AABB vs the ball. When `hp` drops to 0, the brick is "eaten." For the aesthetic: fade brick opacity with hp, trail particles from the letters on impact.
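The collision step might look like the sketch below. It assumes each brick's `(x, y)` is the top-left corner of its box and that the ball carries `vx` and `vy`; `ballHitsBrick` and `resolveHit` are hypothetical names, and the reflection rule is a common simplification rather than anything pretext prescribes.

```js
// Circle-vs-AABB test for a ball against one word-brick.
// brick: { x, y, w, h, hp } with (x, y) as the top-left corner; ball: { x, y, r, vx, vy }.
function ballHitsBrick(ball, brick) {
  // Closest point on the box to the ball centre.
  const nearestX = Math.max(brick.x, Math.min(ball.x, brick.x + brick.w));
  const nearestY = Math.max(brick.y, Math.min(ball.y, brick.y + brick.h));
  const dx = ball.x - nearestX, dy = ball.y - nearestY;
  return dx * dx + dy * dy <= ball.r * ball.r;
}

// Apply one hit: decrement hp and reflect the ball on the axis of shallower overlap.
function resolveHit(ball, brick) {
  brick.hp -= 1;
  const overlapX = Math.min(ball.x + ball.r - brick.x, brick.x + brick.w - (ball.x - ball.r));
  const overlapY = Math.min(ball.y + ball.r - brick.y, brick.y + brick.h - (ball.y - ball.r));
  if (overlapX < overlapY) ball.vx = -ball.vx; else ball.vy = -ball.vy;
  return brick.hp <= 0; // true when the brick should be removed
}
```

If you draw bricks with `fillText` at a baseline `y`, offset the box upward by the font's ascent so the collision rectangle matches the visible word.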
## 3. Shatter / explode typography
Use `walkLineRanges` + a manual grapheme walk to get `(x, y)` for every glyph, then spawn particles.
```js
const prepared = prepareWithSegments(TEXT, FONT);
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" }); // build once, reuse per line
const particles = [];
let y = 100;
ctx.font = FONT; // measureText must use the same font that was prepared

walkLineRanges(prepared, COL_W, (range) => {
  // materialize so we can walk the line grapheme by grapheme
  const line = materializeLineRange(prepared, range);
  let x = COL_X;
  for (const { segment } of segmenter.segment(line.text)) {
    const w = ctx.measureText(segment).width;
    particles.push({ ch: segment, x, y, vx: 0, vy: 0, homeX: x, homeY: y });
    x += w;
  }
  y += LINE_H;
});

// On click, kick particles outward from click point; ease them back to (homeX, homeY).
canvas.addEventListener("click", (e) => {
  const rect = canvas.getBoundingClientRect(); // convert viewport coords to canvas coords
  const cx = e.clientX - rect.left, cy = e.clientY - rect.top;
  for (const p of particles) {
    const dx = p.x - cx, dy = p.y - cy;
    const d = Math.hypot(dx, dy) || 1;
    const force = 400 / (d * 0.2 + 1);
    p.vx += (dx / d) * force;
    p.vy += (dy / d) * force;
  }
});

function tick(dt) {
  for (const p of particles) {
    p.vx *= 0.92; p.vy *= 0.92;            // damping
    p.vx += (p.homeX - p.x) * 0.06;        // spring back toward home
    p.vy += (p.homeY - p.y) * 0.06;
    p.x += p.vx * dt; p.y += p.vy * dt;
  }
}
```
## 4. ASCII mask as moving obstacle
The "cool demos" money pattern: rasterize an ASCII logo, sprite, or bitmap into a cell buffer, then convert the occupied cells into per-row obstacle spans. Pretext lays the paragraphs around those spans, so the text actually opens around the moving ASCII object instead of being visually overpainted.
See `templates/donut-orbit.html` in this skill for a full implementation. Treat it as an example, not the canonical scene: it shows how to derive spans from an ASCII logo, project a wire shape into obstacle rows, keep text selectable in a DOM layer, and hide tuning controls behind `?dev`. Key structure:
```js
const CELL_W = 12, CELL_H = 15;
const cols = Math.ceil(W / CELL_W), rows = Math.ceil(H / CELL_H);
const asciiMask = new Uint8Array(cols * rows);
const obstacleRows = Array.from({ length: rows }, () => []);
function rasterizeLogo(time) {
asciiMask.fill(0);
for (const r of obstacleRows) r.length = 0;
for (const block of logoBlocks(time)) {
const r0 = Math.floor(block.y0 / CELL_H);
const r1 = Math.ceil(block.y1 / CELL_H);
for (let r = r0; r <= r1; r++) {
obstacleRows[r]?.push([block.x0 - 18, block.x1 + 22]);
// Fill asciiMask cells here for drawing.
}
}
mergeRowSpans(obstacleRows);
}
function drawParagraphs(prepared) {
let cursor = { segmentIndex: 0, graphemeIndex: 0 };
for (let y = yStart; y < yEnd; y += LINE_H) {
const spans = obstacleRows[Math.floor(y / CELL_H)];
for (const [x0, x1] of freeIntervalsAround(spans)) {
const range = layoutNextLineRange(prepared, cursor, x1 - x0);
if (!range) return;
ctx.fillText(materializeLineRange(prepared, range).text, x0, y);
cursor = range.end;
}
}
}
```
The important bit is that the ASCII geometry is not decorative only. The same moving spans that draw the logo or draggable object also carve the line intervals passed to `layoutNextLineRange`.
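The snippet above calls `mergeRowSpans` and `freeIntervalsAround` without defining them. One possible implementation is sketched here; the bundled template may do this differently. The extra `colX`, `colEnd`, and `minWidth` parameters are assumptions added so the function is self-contained, which means the signature differs from the one-argument call shown above.

```js
// One possible mergeRowSpans: sort each row's [x0, x1] spans and fuse overlaps in place.
function mergeRowSpans(obstacleRows) {
  for (const spans of obstacleRows) {
    if (spans.length < 2) continue;
    spans.sort((a, b) => a[0] - b[0]);
    const merged = [spans[0]];
    for (let i = 1; i < spans.length; i++) {
      const last = merged[merged.length - 1];
      if (spans[i][0] <= last[1]) last[1] = Math.max(last[1], spans[i][1]);
      else merged.push(spans[i]);
    }
    spans.length = 0;       // keep the same array object so callers' references stay valid
    spans.push(...merged);
  }
}

// One possible freeIntervalsAround: the gaps between merged spans inside [colX, colEnd].
// Gaps narrower than minWidth are dropped so pretext is never asked for a sliver.
function freeIntervalsAround(spans, colX, colEnd, minWidth = 40) {
  const free = [];
  let x = colX;
  for (const [x0, x1] of spans) {
    const end = Math.min(x0, colEnd);
    if (end - x >= minWidth) free.push([x, end]);
    x = Math.max(x, x1);
  }
  if (colEnd - x >= minWidth) free.push([x, colEnd]);
  return free;
}

const demoRows = [[[300, 400], [100, 200], [180, 250]], []];
mergeRowSpans(demoRows);
const demoFree = freeIntervalsAround(demoRows[0], 60, 660);
```

After merging, the first row holds two spans, and the free intervals are the three lanes to the left of, between, and to the right of them.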
### Measured spans beat magic padding
When a logo or bitmap is rasterized into cells, measure the actual occupied cells per row and then add a small halo. Do not use one giant bounding box. Tight measured spans make the text read as if it is flowing around the letter shapes.
```js
const rowMin = new Float32Array(rows).fill(Infinity);
const rowMax = new Float32Array(rows).fill(-Infinity);
for (const cell of visibleCells) {
rowMin[cell.row] = Math.min(rowMin[cell.row], cell.x);
rowMax[cell.row] = Math.max(rowMax[cell.row], cell.x + CELL_W);
}
for (let row = 0; row < rows; row++) {
if (!Number.isFinite(rowMin[row])) continue;
obstacleRows[row].push([rowMin[row] - halo, rowMax[row] + halo]);
}
```
For sharp pixel-art letters, smooth adjacent rows before pushing spans. A 1-2 row halo usually prevents code/prose from touching corners without losing the letter silhouette.
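The row smoothing mentioned above could be done with a small vertical dilation pass. This is one interpretation of "smooth adjacent rows", not the template's actual code; `smoothRows` and its `radius` parameter are hypothetical.

```js
// Hypothetical smoothing pass: each row takes the widest extent found within
// `radius` neighboring rows, which rounds off single-row notches and sharp corners.
function smoothRows(rowMin, rowMax, radius = 1) {
  const rows = rowMin.length;
  const outMin = new Float32Array(rows).fill(Infinity);
  const outMax = new Float32Array(rows).fill(-Infinity);
  for (let row = 0; row < rows; row++) {
    const from = Math.max(0, row - radius);
    const to = Math.min(rows - 1, row + radius);
    for (let n = from; n <= to; n++) {
      outMin[row] = Math.min(outMin[row], rowMin[n]);
      outMax[row] = Math.max(outMax[row], rowMax[n]);
    }
  }
  return { rowMin: outMin, rowMax: outMax };
}

// Rows 0 and 3 are empty (Infinity / -Infinity); rows 1 and 2 hold glyph cells.
const smoothed = smoothRows(
  Float32Array.of(Infinity, 120, 96, Infinity),
  Float32Array.of(-Infinity, 180, 240, -Infinity),
);
```

With `radius = 1` the empty rows directly above and below the glyph pick up their neighbor's extent, which acts as the one-row vertical halo described above.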
### Morphing shapes need morphing obstacles
If the visible object morphs (sphere to cube, logo to particles, etc.), tween the collision field too. A convincing demo uses the same `mix` value for both the rendered buffer and the pretext obstacle rows.
```js
function pushMorphedRows(aRows, bRows, mix) {
for (let row = 0; row < rows; row++) {
const a = aRows[row] ?? [centerX, centerX];
const b = bRows[row] ?? [centerX, centerX];
obstacleRows[row].push([
a[0] + (b[0] - a[0]) * mix,
a[1] + (b[1] - a[1]) * mix,
]);
}
}
```
Without this, the artwork may morph while the text still wraps around the old shape, which breaks the pretext effect.
### Separate visual layers from collision
Use separate canvases when visual treatment should not affect layout. For example, fade an ASCII object with CSS opacity on its own canvas layer, but keep its obstacle rows controlled by explicit shape state. Fading glyph intensity or scaling obstacle spans often looks like the object is shrinking instead of fading.
## 5. Editorial multi-column with shared cursor
Classic magazine layout: three columns, text flows from the end of column 1 into the top of column 2, etc. Pretext makes this trivial because the cursor is portable between `layoutNextLineRange` calls.
```js
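const prepared = prepareWithSegments(ARTICLE, FONT);

// Wrapped in a function so the early exit is legal; a bare `return` at module top level is a syntax error.
function drawColumns(ctx, columns) {
  let cursor = { segmentIndex: 0, graphemeIndex: 0 };
  for (const col of columns) {
    let y = col.y;
    while (y < col.y + col.h) {
      const range = layoutNextLineRange(prepared, cursor, col.w);
      if (!range) return; // text exhausted: stop filling columns
      const line = materializeLineRange(prepared, range);
      ctx.fillText(line.text, col.x, y);
      cursor = range.end; // the cursor carries over into the next column
      y += LINE_H;
    }
  }
}

drawColumns(ctx, [COL1, COL2, COL3]);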
```
Add pull quotes by treating them as obstacles in the middle column and using pattern #1 around them.
## 6. Multiline shrink-wrap (tightest-fitting card)
Given a max width, find the **smallest** container width that still produces the same line count. Useful for chat bubbles, quote cards, tooltip sizing.
```js
const prepared = prepareWithSegments(text, FONT);
const { lineCount, maxLineWidth } = measureLineStats(prepared, MAX_W);
// card width = maxLineWidth + padding; card height = lineCount * LINE_H + padding
```
For a demo that *visualizes* this, render the card shrinking from `MAX_W` down to `maxLineWidth` over a second — the line count stays constant but the right edge pulls in.
## 7. Kinetic typography
Animate per-line transforms over time. `layoutWithLines` gives you stable lines; index `i` drives the timing offset.
```js
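const { lines } = layoutWithLines(prepared, W - 80, 40);
function frame(t) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height); // wipe the previous frame
  for (let i = 0; i < lines.length; i++) {
    const phase = t * 0.001 - i * 0.15; // each line lags the one above it
    const y = 100 + i * 40 + Math.sin(phase) * 12;
    const opacity = 0.4 + 0.6 * Math.max(0, Math.sin(phase));
    ctx.globalAlpha = opacity;
    ctx.fillText(lines[i].text, 40, y);
  }
  ctx.globalAlpha = 1; // reset so later drawing is not faded
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);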
```
Variants: Star Wars crawl (perspective skew per line), wave (sine y-offset), bounce (ease-in-out arrival), glitch (per-glyph random offset using `Intl.Segmenter`).
## 8. Font stack patterns
| Vibe | Font string | Palette hint |
|------|-------------|--------------|
| Editorial / serious | `17px/1.4 "Iowan Old Style", Georgia, serif` | bone `#e8e6df` on charcoal `#0c0d10` |
| CRT / terminal | `600 13px "JetBrains Mono", ui-monospace, monospace` | amber `hsl(38 60% 62%)` on `#07070a` |
| Humanist / modern | `500 17px Inter, ui-sans-serif, system-ui, sans-serif` | off-white `#f3efe6` on deep-navy `#0b1020` |
| Display / poster | `700 64px "Playfair Display", serif` | hot-red `#ff4130` on cream `#f0ebe0` |
| Engineering | `14px "IBM Plex Mono", monospace` | neon-green `#7cff7c` on near-black `#0a0a0c` |
Always load the web font explicitly (Google Fonts link tag or `@font-face`) so the canvas measurement matches the CSS render.
File diff suppressed because it is too large
@@ -0,0 +1,95 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>pretext hello — text flowing around an orb</title>
<style>
html,body { margin:0; padding:0; height:100%; background:#0c0d10; color:#e8e6df; overflow:hidden; }
body { font-family: "Iowan Old Style", Georgia, serif; }
canvas { display:block; width:100vw; height:100vh; }
</style>
</head>
<body>
<canvas id="c"></canvas>
<script type="module">
// Minimal pretext starter: long paragraph flows around a moving orb.
// Uses layoutNextLineRange + variable-width streaming — the "killer app"
// pattern that only pretext can do cheaply in the browser.
import {
prepareWithSegments,
layoutNextLineRange,
materializeLineRange,
} from "https://esm.sh/@chenglou/pretext@0.0.6";
const TEXT = `Pretext measures text without touching the DOM. It returns numbers — widths, line breaks, cursors — and those numbers, arranged with a little imagination, become layouts the browser could never draw on its own. Here, a paragraph flows around a moving orb. Each line is asked for its own width, live. No reflows. No cheats. Just measurement. `.repeat(18);
const FONT = '17px/1.4 "Iowan Old Style", Georgia, serif';
const LINE_H = 24;
const c = document.getElementById("c");
const ctx = c.getContext("2d");
let W, H, DPR;
function resize() {
DPR = Math.min(devicePixelRatio || 1, 2);
W = innerWidth; H = innerHeight;
c.width = W*DPR; c.height = H*DPR;
c.style.width = W+"px"; c.style.height = H+"px";
ctx.setTransform(DPR,0,0,DPR,0,0);
}
addEventListener("resize", resize); resize();
const prepared = prepareWithSegments(TEXT, FONT);
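// Orb follows the mouse; until the first mouse move it bobs idly.
const orb = { x: innerWidth*0.45, y: innerHeight*0.5, r: 140 };
let hasPointer = false;
addEventListener("mousemove", e => { hasPointer = true; orb.x = e.clientX; orb.y = e.clientY; });
function frame(t) {
if (!hasPointer) orb.y = H*0.5 + Math.sin(t*0.001) * 60; // idle bob
ctx.fillStyle = "#0c0d10"; ctx.fillRect(0,0,W,H);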
// glowing orb
const g = ctx.createRadialGradient(orb.x, orb.y, 0, orb.x, orb.y, orb.r);
g.addColorStop(0, "rgba(255,200,120,0.35)");
g.addColorStop(0.6, "rgba(255,140,80,0.10)");
g.addColorStop(1, "rgba(0,0,0,0)");
ctx.fillStyle = g; ctx.fillRect(0,0,W,H);
// flow text as a column, routing around the orb row-by-row
const COL_X = 60, COL_W = W - 120;
let cursor = { segmentIndex: 0, graphemeIndex: 0 };
let y = 72;
ctx.fillStyle = "#e8e6df";
ctx.font = FONT;
ctx.textBaseline = "alphabetic";
while (y < H - 40) {
// does this row intersect the orb band?
const dy = y - orb.y;
const bandY = Math.abs(dy) < orb.r;
// lane = (left, width) skipping over the orb horizontally
let x = COL_X, lineMaxW = COL_W;
if (bandY) {
const half = Math.sqrt(orb.r*orb.r - dy*dy);
const orbLeft = orb.x - half, orbRight = orb.x + half;
// choose the wider side, simple heuristic
const leftWidth = Math.max(0, orbLeft - COL_X);
const rightWidth = Math.max(0, COL_X + COL_W - orbRight);
if (leftWidth >= rightWidth) { x = COL_X; lineMaxW = leftWidth - 12; }
else { x = orbRight + 12; lineMaxW = rightWidth - 12; }
if (lineMaxW < 40) { y += LINE_H; continue; }
}
const range = layoutNextLineRange(prepared, cursor, lineMaxW);
if (!range) break;
const line = materializeLineRange(prepared, range);
ctx.fillText(line.text, x, y);
cursor = range.end;
y += LINE_H;
}
requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
</script>
</body>
</html>
@@ -0,0 +1,217 @@
---
name: sketch
description: "Throwaway HTML mockups: 2-3 design variants to compare."
version: 1.0.0
author: Hermes Agent (adapted from gsd-build/get-shit-done)
license: MIT
metadata:
  hermes:
    tags: [sketch, mockup, design, ui, prototype, html, variants, exploration, wireframe, comparison]
    related_skills: [spike, claude-design, popular-web-designs, excalidraw]
---
# Sketch
Use this skill when the user wants to **see a design direction before committing** to one — exploring a UI/UX idea as disposable HTML mockups. The point is to generate 2-3 interactive variants so the user can compare visual directions side-by-side, not to produce shippable code.
Load this when the user says things like "sketch this screen", "show me what X could look like", "compare layout A vs B", "give me 2-3 takes on this UI", "let me see some variants", "mockup this before I build".
## When NOT to use this
- User wants a production component — use `claude-design` or build it properly
- User wants a polished one-off HTML artifact (landing page, deck) — `claude-design`
- User wants a diagram — `excalidraw`, `architecture-diagram`
- The design is already locked — just build it
## If the user has the full GSD system installed
If `gsd-sketch` shows up as a sibling skill (installed via `npx get-shit-done-cc --hermes`), prefer **`gsd-sketch`** for the full workflow: persistent `.planning/sketches/` with MANIFEST, frontier mode analysis, consistency audits across past sketches, and integration with the rest of GSD. This skill is the lightweight standalone version — one-off sketching without the state machinery.
## Core method
```
intake → variants → head-to-head → pick winner (or iterate)
```
### 1. Intake (skip if the user already gave you enough)
Before generating variants, get three things — one question at a time, not all at once:
1. **Feel.** "What should this feel like? Adjectives, emotions, a vibe." — *"calm, editorial, like Linear"* tells you more than *"minimal"*.
2. **References.** "What apps, sites, or products capture the feel you're imagining?" — actual references beat abstract descriptions.
3. **Core action.** "What's the single most important thing a user does on this screen?" — the variants should all serve this well; if they don't, they're just decoration.
Reflect each answer briefly before the next question. If the user already gave you all three upfront, skip straight to variants.
### 2. Variants (2-3, never 1, rarely 4+)
Produce **2-3 variants** in one go. Each variant is a complete, standalone HTML file. Don't describe variants — build them. The point is comparison.
Each variant should take a **different design stance**, not different pixel values. Good variant axes:
- **Density:** compact / airy / ultra-dense (pick two contrasting poles)
- **Emphasis:** content-first / action-first / tool-first
- **Aesthetic:** editorial / utilitarian / playful
- **Layout:** single-column / sidebar / split-pane
- **Grounding:** card-based / bare-content / document-style
Pick one axis and pull apart from it. Two variants that differ only in accent color are wasted effort — the user can't distinguish them.
**Variant naming:** describe the stance, not the number.
```
sketches/
├── 001-calm-editorial/
│   ├── index.html
│   └── README.md
├── 001-utilitarian-dense/
│   ├── index.html
│   └── README.md
└── 001-playful-split/
    ├── index.html
    └── README.md
```
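The `NNN-stance-name` convention shown in the tree above can be generated rather than typed by hand. The helper below is purely illustrative; `variantDir` is a hypothetical name and not part of Hermes or GSD.

```js
// Hypothetical helper: build "NNN-stance-name" from a sketch number and a stance label.
function variantDir(sketchNumber, stance) {
  const slug = stance
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into single hyphens
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  return `${String(sketchNumber).padStart(3, "0")}-${slug}`;
}

const dirs = ["Calm editorial", "Utilitarian dense", "Playful split"].map((s) => variantDir(1, s));
```

All variants of one sketch share the same number, so the stance slug is what tells them apart.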
### 3. Make them real HTML
Each variant is a **single self-contained HTML file**:
- Inline `<style>` — no build step, no external CSS
- System fonts or one Google Font via `<link>`
- Tailwind via CDN (`<script src="https://cdn.tailwindcss.com"></script>`) is fine
- Realistic fake content — actual sentences, actual names, not "Lorem ipsum"
- **Interactive**: links clickable, hovers real, at least one state transition (open/close, filter, toggle). A frozen static image is a worse sketch than a sloppy animated one.
Open it in a browser. If it looks broken, fix it before showing the user.
**Verify variants visually — use Hermes' browser tools.** Don't just write HTML and hope it renders; load each variant and look at it:
```
browser_navigate(url="file:///absolute/path/to/sketches/001-calm-editorial/index.html")
browser_vision(question="Does this layout look clean and readable? Any visible bugs (overlapping text, unstyled elements, broken images)?")
```
`browser_vision` returns an AI description of what's actually on the page plus a screenshot path — catches layout bugs that pure source inspection misses (e.g. a font import that silently failed, a flex container that collapsed). Fix and re-navigate until each variant looks right.
**Default CSS reset + system font stack** for fast starts:
```html
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
"Helvetica Neue", Arial, sans-serif;
-webkit-font-smoothing: antialiased;
color: #1a1a1a;
background: #fafafa;
line-height: 1.5;
}
</style>
```
### 4. Variant README
Each variant's `README.md` answers:
```markdown
## Variant: {stance name}
### Design stance
One sentence on the principle driving this variant.
### Key choices
- Layout: ...
- Typography: ...
- Color: ...
- Interaction: ...
### Trade-offs
- Strong at: ...
- Weak at: ...
### Best for
- The kind of user or use case this variant actually serves
```
### 5. Head-to-head
After all variants are built, present them as a comparison. Don't just list — **opinionate**:
```markdown
## Three takes on the home screen
| Dimension | Calm editorial | Utilitarian dense | Playful split |
|-----------|----------------|-------------------|---------------|
| Density | Low | High | Medium |
| Primary action visibility | Low | High | Medium |
| Scan-ability | High | Medium | Low |
| Feel | Calm, trusted | Sharp, tool-like | Inviting, energetic |
**My take:** Utilitarian dense for power users, calm editorial for content-forward audiences. Playful split is weakest — tries to do both and commits to neither.
```
Let the user pick a winner, or combine two into a hybrid, or ask for another round.
## Theming (when the project has a visual identity)
If the user has an existing theme (colors, fonts, tokens), put shared tokens in `sketches/themes/tokens.css` and `@import` them in each variant. Keep tokens minimal:
```css
/* sketches/themes/tokens.css */
:root {
--color-bg: #fafafa;
--color-fg: #1a1a1a;
--color-accent: #0066ff;
--color-muted: #666;
--radius: 8px;
--font-display: "Inter", sans-serif;
--font-body: -apple-system, BlinkMacSystemFont, sans-serif;
}
```
Don't over-tokenize a throwaway sketch — three colors and one font is usually enough.
## Interactivity bar
A sketch is interactive enough when the user can:
1. **Click a primary action** and something visible happens (state change, modal, toast, navigation feint)
2. **See one meaningful state transition** (filter a list, toggle a mode, open/close a panel)
3. **Hover recognizable affordances** (buttons, rows, tabs)
More than that is over-engineering a throwaway. Less than that is a screenshot.
## Frontier mode (picking what to sketch next)
If sketches already exist and the user says "what should I sketch next?":
- **Consistency gaps** — two winning variants from different sketches made independent choices that haven't been composed together yet
- **Unsketched screens** — referenced but never explored
- **State coverage** — happy path sketched, but not empty / loading / error / 1000-items
- **Responsive gaps** — validated at one viewport; does it hold at mobile / ultrawide?
- **Interaction patterns** — static layouts exist; transitions, drag, scroll behavior don't
Propose 2-4 named candidates. Let the user pick.
## Output
- Create `sketches/` (or `.planning/sketches/` if the user is using GSD conventions) in the repo root
- One subdir per variant: `NNN-stance-name/index.html` + `README.md`
- Tell the user how to open them: `open sketches/001-calm-editorial/index.html` on macOS, `xdg-open` on Linux, `start` on Windows
- Keep variants disposable — a sketch that you felt the need to preserve should be promoted into real project code, not curated as an asset
**Typical tool sequence for one variant:**
```
terminal("mkdir -p sketches/001-calm-editorial")
write_file("sketches/001-calm-editorial/index.html", "<!doctype html>...")
write_file("sketches/001-calm-editorial/README.md", "## Variant: Calm editorial\n...")
browser_navigate(url="file:///absolute/path/to/sketches/001-calm-editorial/index.html")
browser_vision(question="How does this look? Any obvious layout issues?")
```
Repeat for each variant, then present the comparison table.
## Attribution
Adapted from the GSD (Get Shit Done) project's `/gsd-sketch` workflow — MIT © 2025 Lex Christopherson ([gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)). The full GSD system ships persistent sketch state, theme/variant pattern references, and consistency-audit workflows; install with `npx get-shit-done-cc --hermes --global`.
@@ -1,9 +1,6 @@
---
name: songwriting-and-ai-music
description: >
Songwriting craft, AI music generation prompts (Suno focus), parody/adaptation
techniques, phonetic tricks, and lessons learned. These are tools and ideas,
not rules. Break any of them when the art calls for it.
description: "Songwriting craft and Suno AI music prompts."
tags: [songwriting, music, suno, parody, lyrics, creative]
triggers:
- writing a song
@@ -0,0 +1,355 @@
---
name: touchdesigner-mcp
description: "Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools."
version: 1.1.0
author: kshitijk4poor
license: MIT
metadata:
  hermes:
    tags: [TouchDesigner, MCP, twozero, creative-coding, real-time-visuals, generative-art, audio-reactive, VJ, installation, GLSL]
    related_skills: [native-mcp, ascii-video, manim-video, hermes-video]
---
# TouchDesigner Integration (twozero MCP)
## CRITICAL RULES
1. **NEVER guess parameter names.** Call `td_get_par_info` for the op type FIRST. Your training data is wrong for TD 2025.32.
2. **If `tdAttributeError` fires, STOP.** Call `td_get_operator_info` on the failing node before continuing.
3. **NEVER hardcode absolute paths** in script callbacks. Use `me.parent()` / `scriptOp.parent()`.
4. **Prefer native MCP tools over td_execute_python.** Use `td_create_operator`, `td_set_operator_pars`, `td_get_errors` etc. Only fall back to `td_execute_python` for complex multi-step logic.
5. **Call `td_get_hints` before building.** It returns patterns specific to the op type you're working with.
## Architecture
```
Hermes Agent -> MCP (Streamable HTTP) -> twozero.tox (port 40404) -> TD Python
```
36 native tools. Free plugin (no payment/license — confirmed April 2026).
Context-aware (knows selected OP, current network).
Hub health check: `GET http://localhost:40404/mcp` returns JSON with instance PID, project name, TD version.
## Setup (Automated)
Run the setup script to handle everything:
```bash
bash "${HERMES_HOME:-$HOME/.hermes}/skills/creative/touchdesigner-mcp/scripts/setup.sh"
```
The script will:
1. Check if TD is running
2. Download twozero.tox if not already cached
3. Add `twozero_td` MCP server to Hermes config (if missing)
4. Test the MCP connection on port 40404
5. Report what manual steps remain (drag .tox into TD, enable MCP toggle)
### Manual steps (one-time, cannot be automated)
1. **Drag `~/Downloads/twozero.tox` into the TD network editor** → click Install
2. **Enable MCP:** click twozero icon → Settings → mcp → "auto start MCP" → Yes
3. **Restart Hermes session** to pick up the new MCP server
After setup, verify:
```bash
nc -z 127.0.0.1 40404 && echo "twozero MCP: READY"
```
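Where `nc` is not installed, the same reachability check can be sketched in Python. This only confirms that something accepts TCP connections on port 40404, not that it is twozero; `port_is_open` is a hypothetical helper name:

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 40404, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("twozero MCP: READY" if port_is_open() else "twozero MCP: not reachable")
```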
## Environment Notes
- **Non-Commercial TD** caps resolution at 1280×1280. Use `outputresolution = 'custom'` and set width/height explicitly.
- **Codecs:** `prores` (preferred on macOS) or `mjpa` as fallback. H.264/H.265/AV1 require a Commercial license.
- Always call `td_get_par_info` before setting params — names vary by TD version (see CRITICAL RULES #1).
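A small sketch of picking a custom width/height that stays inside the 1280×1280 Non-Commercial cap while keeping the aspect ratio. `fit_within_cap` is a hypothetical helper, and the cap value is taken from the note above rather than queried from TD:

```python
def fit_within_cap(width: int, height: int, cap: int = 1280) -> tuple[int, int]:
    """Scale (width, height) down uniformly so neither side exceeds cap.

    Sizes already inside the cap are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, cap / width, cap / height)
    # Round to the nearest pixel, never below 1 and never above the cap.
    return (min(cap, max(1, round(width * scale))),
            min(cap, max(1, round(height * scale))))


if __name__ == "__main__":
    print(fit_within_cap(1920, 1080))  # (1280, 720)
```

The two numbers can then be passed as the width and height params once their names are confirmed with `td_get_par_info`.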
## Workflow
### Step 0: Discover (before building anything)
```
Call td_get_par_info with op_type for each type you plan to use.
Call td_get_hints with the topic you're building (e.g. "glsl", "audio reactive", "feedback").
Call td_get_focus to see where the user is and what's selected.
Call td_get_network to see what already exists.
```
No temp nodes, no cleanup. This replaces the old discovery dance entirely.
### Step 1: Clean + Build
**IMPORTANT: Split cleanup and creation into SEPARATE MCP calls.** Destroying and recreating same-named nodes in one `td_execute_python` script causes "Invalid OP object" errors. See pitfalls #11b.
Use `td_create_operator` for each node (handles viewport positioning automatically):
```
td_create_operator(type="noiseTOP", parent="/project1", name="bg", parameters={"resolutionw": 1280, "resolutionh": 720})
td_create_operator(type="levelTOP", parent="/project1", name="brightness")
td_create_operator(type="nullTOP", parent="/project1", name="out")
```
For bulk creation or wiring, use `td_execute_python`:
```python
# td_execute_python script:
root = op('/project1')
nodes = []
for name, optype in [('bg', noiseTOP), ('fx', levelTOP), ('out', nullTOP)]:
    n = root.create(optype, name)
    nodes.append(n.path)
# Wire chain
for i in range(len(nodes)-1):
    op(nodes[i]).outputConnectors[0].connect(op(nodes[i+1]).inputConnectors[0])
result = {'created': nodes}
```
### Step 2: Set Parameters
Prefer the native tool (validates params, won't crash):
```
td_set_operator_pars(path="/project1/bg", parameters={"roughness": 0.6, "monochrome": true})
```
For expressions or modes, use `td_execute_python`:
```python
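op('/project1/time_driver').par.colorr.mode = ParMode.EXPRESSION  # expr is ignored in CONSTANT mode
op('/project1/time_driver').par.colorr.expr = "absTime.seconds % 1000.0"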
```
### Step 3: Wire
Use `td_execute_python` — no native wire tool exists:
```python
op('/project1/bg').outputConnectors[0].connect(op('/project1/fx').inputConnectors[0])
```
### Step 4: Verify
```
td_get_errors(path="/project1", recursive=true)
td_get_perf()
td_get_operator_info(path="/project1/out", detail="full")
```
### Step 5: Display / Capture
```
td_get_screenshot(path="/project1/out")
```
Or open a window via script:
```python
win = op('/project1').create(windowCOMP, 'display')
win.par.winop = op('/project1/out').path
win.par.winw = 1280; win.par.winh = 720
win.par.winopen.pulse()
```
## MCP Tool Quick Reference
**Core (use these most):**
| Tool | What |
|------|------|
| `td_execute_python` | Run arbitrary Python in TD. Full API access. |
| `td_create_operator` | Create node with params + auto-positioning |
| `td_set_operator_pars` | Set params safely (validates, won't crash) |
| `td_get_operator_info` | Inspect one node: connections, params, errors |
| `td_get_operators_info` | Inspect multiple nodes in one call |
| `td_get_network` | See network structure at a path |
| `td_get_errors` | Find errors/warnings recursively |
| `td_get_par_info` | Get param names for an OP type (replaces discovery) |
| `td_get_hints` | Get patterns/tips before building |
| `td_get_focus` | What network is open, what's selected |
**Read/Write:**
| Tool | What |
|------|------|
| `td_read_dat` | Read DAT text content |
| `td_write_dat` | Write/patch DAT content |
| `td_read_chop` | Read CHOP channel values |
| `td_read_textport` | Read TD console output |
**Visual:**
| Tool | What |
|------|------|
| `td_get_screenshot` | Capture one OP viewer to file |
| `td_get_screenshots` | Capture multiple OPs at once |
| `td_get_screen_screenshot` | Capture actual screen via TD |
| `td_navigate_to` | Jump network editor to an OP |
**Search:**
| Tool | What |
|------|------|
| `td_find_op` | Find ops by name/type across project |
| `td_search` | Search code, expressions, string params |
**System:**
| Tool | What |
|------|------|
| `td_get_perf` | Performance profiling (FPS, slow ops) |
| `td_list_instances` | List all running TD instances |
| `td_get_docs` | In-depth docs on a TD topic |
| `td_agents_md` | Read/write per-COMP markdown docs |
| `td_reinit_extension` | Reload extension after code edit |
| `td_clear_textport` | Clear console before debug session |
**Input Automation:**
| Tool | What |
|------|------|
| `td_input_execute` | Send mouse/keyboard to TD |
| `td_input_status` | Poll input queue status |
| `td_input_clear` | Stop input automation |
| `td_op_screen_rect` | Get screen coords of a node |
| `td_click_screen_point` | Click a point in a screenshot |
| `td_screen_point_to_global` | Convert screenshot pixel to absolute screen coords |
The tables above cover the 32 tools used in typical creative workflows. The remaining 4 tools (`td_project_quit`, `td_test_session`, `td_dev_log`, `td_clear_dev_log`) are admin/dev-mode utilities; see `references/mcp-tools.md` for the full 36-tool reference with complete parameter schemas.
## Key Implementation Rules
**GLSL time:** No `uTDCurrentTime` in GLSL TOP. Use the Values page:
```python
# Call td_get_par_info(op_type="glslTOP") first to confirm param names
td_set_operator_pars(path="/project1/shader", parameters={"value0name": "uTime"})
# Then set expression via script:
# op('/project1/shader').par.value0.expr = "absTime.seconds"
# In GLSL: uniform float uTime;
```
Fallback: Constant TOP in `rgba32float` format (8-bit clamps to 0-1, freezing the shader).
**Feedback TOP:** Use `top` parameter reference, not direct input wire. "Not enough sources" resolves after first cook. "Cook dependency loop" warning is expected.
**Resolution:** Non-Commercial caps at 1280×1280. Use `outputresolution = 'custom'`.
**Large shaders:** Write GLSL to `/tmp/file.glsl`, then use `td_write_dat` or `td_execute_python` to load.
**Vertex/Point access (TD 2025.32):** `point.P[0]`, `point.P[1]`, `point.P[2]` — NOT `.x`, `.y`, `.z`.
**Extensions:** `ext0object` format is `"op('./datName').module.ClassName(me)"` in CONSTANT mode. After editing extension code with `td_write_dat`, call `td_reinit_extension`.
**Script callbacks:** ALWAYS use relative paths via `me.parent()` / `scriptOp.parent()`.
**Cleaning nodes:** Always `list(root.children)` before iterating + `child.valid` check.
## Recording / Exporting Video
```python
# via td_execute_python:
root = op('/project1')
rec = root.create(moviefileoutTOP, 'recorder')
op('/project1/out').outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'prores' # Apple ProRes — NOT license-restricted on macOS
# Start recording in a SEPARATE td_execute_python call, after the path above is set:
# rec.par.record = True   # start
# rec.par.record = False  # stop (call separately later)
```
H.264/H.265/AV1 need Commercial license. Use `prores` on macOS or `mjpa` as fallback.
Extract frames: `ffmpeg -i /tmp/output.mov -vframes 120 /tmp/frames/frame_%06d.png`
**TOP.save() is useless for animation** — captures same GPU texture every time. Always use MovieFileOut.
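The frame-extraction command above fails if `/tmp/frames` does not exist, because ffmpeg does not create output directories. A hedged sketch that prepares the directory and builds the same command; `build_extract_cmd` is a hypothetical helper, and `-frames:v` is the current spelling of the older `-vframes` option:

```python
import subprocess
from pathlib import Path


def build_extract_cmd(movie: str, out_dir: str, max_frames: int = 120) -> list[str]:
    """Build an ffmpeg argv that writes up to max_frames PNGs into out_dir."""
    out = Path(out_dir)
    # ffmpeg does not create missing output directories, so do it here.
    out.mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-i", movie,
        "-frames:v", str(max_frames),  # same meaning as the older -vframes
        str(out / "frame_%06d.png"),
    ]


if __name__ == "__main__":
    cmd = build_extract_cmd("/tmp/output.mov", "/tmp/frames")
    print(" ".join(cmd))
    # Uncomment to run it (requires ffmpeg on PATH and an existing movie file):
    # subprocess.run(cmd, check=True)
```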
### Before Recording: Checklist
1. **Verify FPS > 0** via `td_get_perf`. If FPS=0 the recording will be empty. See pitfalls #38-39.
2. **Verify shader output is not black** via `td_get_screenshot`. Black output = shader error or missing input. See pitfalls #8, #40.
3. **If recording with audio:** cue audio to start first, then delay recording by 3 frames. See pitfalls #19.
4. **Set output path before starting record** — setting both in the same script can race.
## Audio-Reactive GLSL (Proven Recipe)
### Correct signal chain (tested April 2026)
```
AudioFileIn CHOP (playmode=sequential)
→ AudioSpectrum CHOP (FFT=512, outputmenu=setmanually, outlength=256, timeslice=ON)
→ Math CHOP (gain=10)
→ CHOP to TOP (dataformat=r, layout=rowscropped)
→ GLSL TOP input 1 (spectrum texture, 256x2)
Constant TOP (rgba32float, time) → GLSL TOP input 0
GLSL TOP → Null TOP → MovieFileOut
```
### Critical audio-reactive rules (empirically verified)
1. **TimeSlice must stay ON** for AudioSpectrum. OFF = processes entire audio file → 24000+ samples → CHOP to TOP overflow.
2. **Set Output Length manually** to 256 via `outputmenu='setmanually'` and `outlength=256`. Default outputs 22050 samples.
3. **DO NOT use Lag CHOP for spectrum smoothing.** Lag CHOP operates in timeslice mode and expands 256 samples to 2400+, averaging all values to near-zero (~1e-06). The shader receives no usable data. This was the #1 audio sync failure in testing.
4. **DO NOT use Filter CHOP either** — same timeslice expansion problem with spectrum data.
5. **Smoothing belongs in the GLSL shader** if needed, via temporal lerp with a feedback texture: `mix(prevValue, newValue, 0.3)`. This gives frame-perfect sync with zero pipeline latency.
6. **CHOP to TOP dataformat = 'r'**, layout = 'rowscropped'. Spectrum output is 256x2 (stereo). Sample at y=0.25 for first channel.
7. **Math gain = 10** (not 5). Raw spectrum values are ~0.19 in bass range. Gain of 10 gives usable ~5.0 for the shader.
8. **No Resample CHOP needed.** Control output size via AudioSpectrum's `outlength` param directly.
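To get a feel for the `mix(prevValue, newValue, 0.3)` smoothing in rule 5, the recurrence can be simulated outside TD. This is a plain-Python model of the per-frame update, not shader code; the function names are illustrative:

```python
def mix(prev: float, new: float, a: float) -> float:
    """GLSL-style mix(): linear interpolation from prev to new by factor a."""
    return prev * (1.0 - a) + new * a


def smooth_series(samples, a: float = 0.3):
    """Apply prev = mix(prev, sample, a) once per frame and collect the results."""
    prev = 0.0
    out = []
    for s in samples:
        prev = mix(prev, s, a)
        out.append(prev)
    return out


if __name__ == "__main__":
    # A bass hit that jumps to 5.0 for ten frames, then drops to silence.
    frames = [5.0] * 10 + [0.0] * 10
    for i, v in enumerate(smooth_series(frames)):
        print(f"frame {i:2d}: {v:.3f}")
```

With a factor of 0.3 the remaining distance to the target shrinks by 0.7 each frame, so the value covers about 88% of a step after 6 frames. A smaller factor gives a slower, smoother response.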
### GLSL spectrum sampling
```glsl
// Input 0 = time (1x1 rgba32float), Input 1 = spectrum (256x2)
float iTime = texture(sTD2DInputs[0], vec2(0.5)).r;
// Sample multiple points per band and average for stability:
// NOTE: y=0.25 for first channel (stereo texture is 256x2, first row center is 0.25)
float bass = (texture(sTD2DInputs[1], vec2(0.02, 0.25)).r +
              texture(sTD2DInputs[1], vec2(0.05, 0.25)).r) / 2.0;
float mid = (texture(sTD2DInputs[1], vec2(0.2, 0.25)).r +
             texture(sTD2DInputs[1], vec2(0.35, 0.25)).r) / 2.0;
float hi = (texture(sTD2DInputs[1], vec2(0.6, 0.25)).r +
            texture(sTD2DInputs[1], vec2(0.8, 0.25)).r) / 2.0;
```
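The `y = 0.25` coordinate follows from texel-center arithmetic: in a texture with `size` texels along an axis, texel `i` is centered at `(i + 0.5) / size`. A short sketch that reproduces the numbers used in the shader; `texel_center` is an illustrative helper, and which audio frequency each column represents depends on the AudioSpectrum settings and is not modeled here:

```python
def texel_center(index: int, size: int) -> float:
    """Normalized coordinate of the center of texel `index` on an axis of `size` texels."""
    if not 0 <= index < size:
        raise IndexError("texel index out of range")
    return (index + 0.5) / size


if __name__ == "__main__":
    width, height = 256, 2          # spectrum texture size from the recipe above
    print(texel_center(0, height))  # 0.25, first channel row
    print(texel_center(1, height))  # 0.75, second channel row
    print(texel_center(5, width))   # 0.021484375, close to the 0.02 used for bass
```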
See `references/network-patterns.md` for complete build scripts + shader code.
## Operator Quick Reference
| Family | Color | Python class / MCP type | Suffix |
|--------|-------|-------------|--------|
| TOP | Purple | noiseTOP, glslTOP, compositeTOP, levelTOP, blurTOP, textTOP, nullTOP | TOP |
| CHOP | Green | audiofileinCHOP, audiospectrumCHOP, mathCHOP, lfoCHOP, constantCHOP | CHOP |
| SOP | Blue | gridSOP, sphereSOP, transformSOP, noiseSOP | SOP |
| DAT | White | textDAT, tableDAT, scriptDAT, webserverDAT | DAT |
| MAT | Yellow | phongMAT, pbrMAT, glslMAT, constMAT | MAT |
| COMP | Gray | geometryCOMP, containerCOMP, cameraCOMP, lightCOMP, windowCOMP | COMP |
## Security Notes
- MCP runs on localhost only (port 40404). No authentication — any local process can send commands.
- `td_execute_python` has unrestricted access to the TD Python environment and filesystem as the TD process user.
- `setup.sh` downloads twozero.tox from the official 404zero.com URL. Verify the download if concerned.
- The skill never sends data outside localhost. All MCP communication is local.
## References
| File | What |
|------|------|
| `references/pitfalls.md` | Hard-won lessons from real sessions |
| `references/operators.md` | All operator families with params and use cases |
| `references/network-patterns.md` | Recipes: audio-reactive, generative, GLSL, instancing |
| `references/mcp-tools.md` | Full twozero MCP tool parameter schemas |
| `references/python-api.md` | TD Python: op(), scripting, extensions |
| `references/troubleshooting.md` | Connection diagnostics, debugging |
| `references/glsl.md` | GLSL uniforms, built-in functions, shader templates |
| `references/postfx.md` | Post-FX: bloom, CRT, chromatic aberration, feedback glow |
| `references/layout-compositor.md` | HUD layout patterns, panel grids, BSP-style layouts |
| `references/operator-tips.md` | Wireframe rendering, feedback TOP setup |
| `references/geometry-comp.md` | Geometry COMP: instancing, POP vs SOP, morphing |
| `references/audio-reactive.md` | Audio band extraction, beat detection, envelope following |
| `references/animation.md` | LFOs, timers, keyframes, easing, expression-driven motion |
| `references/midi-osc.md` | MIDI/OSC controllers, TouchOSC, multi-machine sync |
| `references/particles.md` | POPs and legacy particleSOP — emission, forces, collisions |
| `references/projection-mapping.md` | Multi-window output, corner pin, mesh warp, edge blending |
| `references/external-data.md` | HTTP, WebSocket, MQTT, Serial, TCP, webserverDAT |
| `references/panel-ui.md` | Custom params, panel COMPs, button/slider/field, panelExecuteDAT |
| `references/replicator.md` | replicatorCOMP — data-driven cloning, layouts, callbacks |
| `references/dat-scripting.md` | Execute DAT family — chop/dat/parameter/panel/op/executeDAT |
| `references/3d-scene.md` | Lighting rigs, shadows, IBL/cubemaps, multi-camera, PBR |
| `scripts/setup.sh` | Automated setup script |
---
> You're not writing code. You're conducting light.
@@ -0,0 +1,275 @@
# 3D Scene Reference
Lighting rigs, shadows, IBL/cubemaps, multi-camera, and PBR materials. For wireframe rendering and feedback TOPs see `operator-tips.md`. For instancing geometry see `geometry-comp.md`. For shader code see `glsl.md`.
---
## Anatomy of a 3D Scene
```
[Geometry COMP] ← contains SOPs (the shapes)
[Material] ← Phong/PBR/GLSL/Constant MAT
[Light COMPs] ← point/directional/spot/area/environment
[Camera COMP] ← view position, FOV
[Render TOP] ← combines geo + lights + camera into a 2D image
[post-FX chain] ← bloomTOP, glsl shaders, etc.
[windowCOMP] ← actual display
```
Render TOP is the heart. It takes an explicit `geometry` path, an explicit `camera` path, and lights via the lights table or an envlight reference.
---
## Minimal Scene
```python
# Geometry
geo = root.create(geometryCOMP, 'scene_geo')
sphere = geo.create(sphereSOP, 'shape')
sphere.par.rad = 1.0; sphere.par.rows = 64; sphere.par.cols = 64
# Material — start with PBR
mat = root.create(pbrMAT, 'mat')
mat.par.basecolorr = 0.7; mat.par.basecolorg = 0.7; mat.par.basecolorb = 0.7
mat.par.metallic = 0.0
mat.par.roughness = 0.4
geo.par.material = mat.path
# Camera
cam = root.create(cameraCOMP, 'cam1')
cam.par.tx = 0; cam.par.ty = 0; cam.par.tz = 4
cam.par.fov = 45
cam.par.near = 0.1; cam.par.far = 100
# Key light
key = root.create(lightCOMP, 'key_light')
key.par.lighttype = 'point'
key.par.tx = 3; key.par.ty = 3; key.par.tz = 3
key.par.dimmer = 1.5
# Render
render = root.create(renderTOP, 'render1')
render.par.outputresolution = 'custom'
render.par.resolutionw = 1920; render.par.resolutionh = 1080
render.par.camera = cam.path
render.par.geometry = geo.path
render.par.lights = key.path # single light path; for multi, see below
render.par.bgcolorr = 0; render.par.bgcolorg = 0; render.par.bgcolorb = 0
```
For multiple lights, leave `par.lights` blank — Render TOP scans the network for all `lightCOMP` and `envlightCOMP` ops by default. To restrict to specific lights, set `par.lights = '/project1/key_light /project1/fill_light'` (space-separated paths).
---
## Light Types
| Type | What | Common params |
|---|---|---|
| `point` | Omnidirectional, falls off with distance | `dimmer`, `coneangle` (n/a), `attenuation` |
| `directional` | Parallel rays, infinite distance (sun) | `dimmer`; only the light's rotation matters |
| `spot` | Cone, falls off with distance + angle | `coneangle`, `conedelta`, `dimmer` |
| `cone` | Like spot but harder edge | same |
| `area` | Rectangular soft light source | `sizex`, `sizey` |
For all: `colorr`, `colorg`, `colorb`, `tx/ty/tz`, `rx/ry/rz`, `dimmer`.
### Three-Point Lighting (Studio Setup)
```python
# Key — main light, ~45° front
key = root.create(lightCOMP, 'key')
key.par.lighttype = 'point'
key.par.tx = 4; key.par.ty = 3; key.par.tz = 4
key.par.dimmer = 1.5
key.par.colorr = 1.0; key.par.colorg = 0.95; key.par.colorb = 0.85
# Fill — softer, opposite side
fill = root.create(lightCOMP, 'fill')
fill.par.lighttype = 'area'
fill.par.tx = -4; fill.par.ty = 2; fill.par.tz = 3
fill.par.dimmer = 0.5
fill.par.colorr = 0.7; fill.par.colorg = 0.8; fill.par.colorb = 1.0
fill.par.sizex = 4; fill.par.sizey = 4
# Rim/back — outline from behind
rim = root.create(lightCOMP, 'rim')
rim.par.lighttype = 'spot'
rim.par.tx = 0; rim.par.ty = 4; rim.par.tz = -4
rim.par.coneangle = 30
rim.par.dimmer = 1.0
# Optional: ambient lift to prevent pure-black shadows
amb = root.create(ambientlightCOMP, 'ambient')
amb.par.dimmer = 0.15
```
---
## Shadows
Spot and directional lights cast shadows when `par.shadowtype != 'none'`.
```python
key.par.shadowtype = 'softshadow' # 'none' | 'hardshadow' | 'softshadow'
key.par.shadowsize = 1024 # shadow map resolution
key.par.shadowsoftness = 0.02 # softshadow only
```
**Tips:**
- Soft shadows are GPU-expensive. Start with `shadowsize = 1024` and only go higher (2048/4096) if shadow edges look pixelated at your resolution.
- Set the spot light's `near`/`far` to JUST contain the scene. Wider range = wasted shadow map precision.
- Multiple shadow-casting lights compound cost. Limit to 1-2 in real-time work; pre-bake the rest into the materials.
---
## Image-Based Lighting (IBL) / Environment Light
For realistic PBR materials you need a cubemap for reflections.
```python
# Environment light from an HDR
env = root.create(envlightCOMP, 'env')
env.par.envmap = '/project1/cube_in' # path to a TOP that produces a cubemap
env.par.envlightmap = ... # diffuse irradiance map (often same as envmap)
env.par.dimmer = 1.0
# Cubemap source — option A: built-in cubeTOP from 6 faces
cube = root.create(cubeTOP, 'cube_in')
# (assign 6 face TOPs)
# Option B: HDR equirectangular → cubemap conversion
# Use a moviefileinTOP loading .hdr or .exr, then projectTOP type='cubemapfromequirect'
hdr = root.create(moviefileinTOP, 'hdr_src')
hdr.par.file = '/path/to/environment.hdr'
proj = root.create(projectTOP, 'cube_proj')
proj.par.projecttype = 'cubemapfromequirect'
proj.inputConnectors[0].connect(hdr)
```
PBR materials sample the environment automatically when `envlightCOMP` is in the scene. Verify param names with `td_get_par_info(op_type='envlightCOMP')` — TD versions vary.
---
## PBR Material Setup
```python
mat = root.create(pbrMAT, 'pbr_metal')
mat.par.basecolorr = 0.95; mat.par.basecolorg = 0.65; mat.par.basecolorb = 0.4
mat.par.metallic = 1.0
mat.par.roughness = 0.25
mat.par.specularlevel = 0.5
mat.par.emitcolorr = 0; mat.par.emitcolorg = 0; mat.par.emitcolorb = 0
# Texture maps
mat.par.basecolormap = '/project1/textures/albedo' # TOP path
mat.par.metallicroughnessmap = '/project1/textures/mr' # G=roughness, B=metallic (glTF convention)
mat.par.normalmap = '/project1/textures/normal'
mat.par.emitmap = '/project1/textures/emit'
mat.par.occlusionmap = '/project1/textures/ao'
```
**Material idioms:**
| Look | metallic | roughness | basecolor |
|---|---|---|---|
| Brushed steel | 1.0 | 0.4 | (0.7, 0.7, 0.7) |
| Polished gold | 1.0 | 0.1 | (1.0, 0.85, 0.4) |
| Plastic | 0.0 | 0.5 | mid-saturated |
| Rubber | 0.0 | 0.9 | dark |
| Glass | 0.0 | 0.05 | (1, 1, 1), low alpha + transmission |
| Glowing emitter | 0.0 | 1.0 | dark, high `emitcolor` |
For glass/transmission, recent TD versions support `transmission` in PBR; older versions need glslMAT.
---
## Multi-Camera Setups
For comparison views, instant replay, multi-screen mapping, etc.
```python
# Camera A — main scene
cam_a = root.create(cameraCOMP, 'cam_main')
cam_a.par.tz = 5
# Camera B — static top-down view
cam_b = root.create(cameraCOMP, 'cam_top')
cam_b.par.ty = 6; cam_b.par.rx = -90
# Render each via separate Render TOPs
render_a = root.create(renderTOP, 'render_main')
render_a.par.camera = cam_a.path
render_a.par.geometry = geo.path
render_b = root.create(renderTOP, 'render_top')
render_b.par.camera = cam_b.path
render_b.par.geometry = geo.path
```
Composite both with a `multiplyTOP`/`compositeTOP` for picture-in-picture, or route to separate `windowCOMP`s for multi-display.
### Camera animation
Drive camera params via expressions (orbit), animationCOMP (waypoint), or LFO (oscillation):
```python
# Orbiting camera
cam_a.par.tx.mode = ParMode.EXPRESSION
cam_a.par.tx.expr = "cos(absTime.seconds * 0.3) * 6"
cam_a.par.tz.mode = ParMode.EXPRESSION
cam_a.par.tz.expr = "sin(absTime.seconds * 0.3) * 6"
cam_a.par.lookat = '/project1/scene_geo' # auto-aim at target
```
`par.lookat` is the simplest "always look at target" mechanism.
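The orbit expressions can be checked outside TD. The sketch below evaluates the same `cos`/`sin` formulas in plain Python, confirming that the camera stays at a constant distance of 6 from the origin and showing how the `0.3` factor sets the orbit period. Function names are illustrative:

```python
import math


def orbit_position(t: float, radius: float = 6.0, speed: float = 0.3):
    """Return (tx, tz) for the orbit expressions at time t (seconds)."""
    return (math.cos(t * speed) * radius, math.sin(t * speed) * radius)


def orbit_period(speed: float = 0.3) -> float:
    """Seconds for one full revolution: the angle t * speed must advance by 2*pi."""
    return 2.0 * math.pi / speed


if __name__ == "__main__":
    for t in (0.0, 5.0, 10.0):
        tx, tz = orbit_position(t)
        print(f"t={t:4.1f}s  tx={tx:6.3f}  tz={tz:6.3f}  dist={math.hypot(tx, tz):.3f}")
    print(f"one revolution every {orbit_period():.1f} s")
```

At speed 0.3 one revolution takes roughly 21 seconds. Doubling the factor halves the period.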
### Depth of field
PBR + Render TOP supports DOF when `par.dof = 'on'`.
```python
render.par.dof = 'on'
render.par.focusdistance = 5.0
render.par.aperture = 0.05 # blur strength
render.par.bokehshape = 'hexagon'
```
DOF is GPU-heavy. Render at lower res then upscale for performance.
---
## Common Pitfalls
1. **Render TOP shows black** — most common cause: no light. Even with PBR you need at least one `lightCOMP` or `envlightCOMP`. Add an `ambientlightCOMP` at low dimmer as a safety net.
2. **Material doesn't appear** — `geo.par.material` must be a string PATH, not the material op itself. Use `mat.path`, not `mat`.
3. **Lights ignored** — by default Render TOP picks up ALL `lightCOMP`s in the network. If you have leftover lights from another scene, they leak in. Set `par.lights` explicitly.
4. **PBR looks flat** — without an `envlightCOMP` providing reflections, PBR materials look like Phong. Add one even if you don't have an HDR (use a `constantTOP` cubemap as fallback).
5. **Shadow acne / striping** — increase `par.shadowbias` slightly. Tune per-light.
6. **Camera inside geometry** — if `cam.par.tz` is INSIDE a sphere, you see the inside (or nothing if backface culled). Move the camera further out.
7. **Light range too small** — point lights have implicit attenuation. Far-away geometry receives little light. Increase `par.dimmer` or move lights closer.
8. **Multiple cameras conflict** — one render TOP = one camera. Don't try to share. Use multiple render TOPs.
9. **Wrong handedness** — TD is right-handed Y-up. Imported assets from Z-up apps (Blender, Maya in Z-up) need a 90° X rotation on the geo COMP.
10. **Cooking budget** — PBR + IBL + shadows + DOF at 1080p60 is fine on modern GPUs but 4K + 4 lights + soft shadows + DOF will tank. Profile via `td_get_perf` and downgrade settings before adding more.
---
## Quick Recipes
| Goal | Recipe |
|---|---|
| Studio portrait | 3-point rig (key + fill + rim) + ambient + PBR mat + DOF |
| Outdoor daylight | One directional `lightCOMP` (sun) + envlight (sky HDR) + soft shadows |
| Dramatic / film noir | Single spot light from upper side, hard shadows, deep ambient = 0.05 |
| Abstract / dreamy | Multiple area lights at low dimmer, no shadows, `bloomTOP` post |
| Product render | Three-point + IBL + neutral PBR + `bgcolorr=g=b=1` (white seamless) |
| Game-style | Phong MAT + 1-2 lights + no IBL + flat ambient (cheap, stylized) |
| Wireframe + solid | Two render TOPs (one with wireframeMAT, one with PBR), composite via `addTOP` |
| Orbiting camera | `par.lookat` + expressions on tx/tz using sin/cos |
@@ -0,0 +1,221 @@
# Animation Reference
Patterns for time-based motion — keyframes, LFOs, timers, easing, expression-driven animation.
Always call `td_get_par_info` for the op type before setting params. Param names below reflect TD 2025.32 but verify if errors fire.
---
## Time Sources
TD exposes two clocks (absolute and component-local), each readable in seconds or frames. Pick the right one.
| Expression | Behavior | Use for |
|---|---|---|
| `absTime.seconds` | Wall-clock seconds since TD started. Never resets. | Continuous motion, GLSL `uTime`, infinite loops |
| `absTime.frame` | Wall-clock frame count. | Frame-accurate triggers |
| `me.time.frame` | Local component frame count (resets on play/stop). | Per-COMP animation timeline |
| `me.time.seconds` | Local component seconds. | Same, in seconds |
**Rule:** for shaders and continuous motion use `absTime.seconds`. For triggered/looping animations inside a COMP use `me.time.*`.
---
## LFO CHOP — Cyclic Motion
The simplest periodic driver. Fast, CPU-cheap, expression-friendly.
```python
lfo = root.create(lfoCHOP, 'rot_driver')
lfo.par.type = 'sin' # 'sin' | 'cos' | 'ramp' | 'square' | 'triangle' | 'pulse'
lfo.par.frequency = 0.25 # cycles per second
lfo.par.amplitude = 1.0
lfo.par.offset = 0.0
lfo.par.phase = 0.0 # 0-1, useful for offsetting parallel LFOs
```
**Drive a parameter via expression:**
```python
op('/project1/geo1').par.rx.mode = ParMode.EXPRESSION
op('/project1/geo1').par.rx.expr = "op('rot_driver')['chan1'] * 360"
```
**Multiple synced LFOs (X/Y/Z rotation with phase offsets):**
Create one LFO with three channels and phase-offset each, or use three LFOs and offset their `phase` params (0.0, 0.33, 0.66).
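To preview what the phase offsets do, here is a simplified sine-LFO model in plain Python. It assumes `phase` is a fraction of one cycle (0-1) and that the output is `offset + amplitude * sin(...)`; this is an illustrative model, not TouchDesigner's implementation:

```python
import math


def sine_lfo(t, frequency=0.25, amplitude=1.0, offset=0.0, phase=0.0):
    """Simplified sine LFO: phase is a fraction of one cycle (0-1).

    Illustrative model only, not TouchDesigner's implementation.
    """
    return offset + amplitude * math.sin(2.0 * math.pi * (frequency * t + phase))


if __name__ == "__main__":
    phases = {"rx": 0.0, "ry": 0.33, "rz": 0.66}
    for t in (0.0, 1.0, 2.0):
        values = {axis: round(sine_lfo(t, phase=p), 3) for axis, p in phases.items()}
        print(t, values)
```

Offsets of roughly a third of a cycle keep the three axes from peaking at the same moment, which is what makes the combined rotation look less mechanical.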
---
## Timer CHOP — Triggered Sequences
For run-once animations, beat-locked sequences, or stage-based logic.
```python
timer = root.create(timerCHOP, 'fade_timer')
timer.par.length = 4.0 # cycle length in seconds
timer.par.cycle = False # run once vs. loop
timer.par.outputseconds = True
```
Output channels: `timer_fraction` (0→1 across the cycle), `running`, `done`, `cycles`.
**Start the timer:**
```python
timer.par.start.pulse()
```
**Drive a fade:**
```python
op('/project1/level1').par.opacity.mode = ParMode.EXPRESSION
op('/project1/level1').par.opacity.expr = "op('fade_timer')['timer_fraction']"
```
**Easing on the timer fraction** — apply in the expression itself:
```python
# Smoothstep: ease in/out
expr = "smoothstep(0, 1, op('fade_timer')['timer_fraction'])"
# Cubic ease-out: 1 - (1-t)^3
expr = "1 - pow(1 - op('fade_timer')['timer_fraction'], 3)"
```
---
## Pattern CHOP — Custom Curves
For arbitrary waveforms (saw ramps, easing curves, custom envelopes).
```python
pat = root.create(patternCHOP, 'envelope')
pat.par.type = 'gaussian' # 'gaussian' | 'ramp' | 'square' | 'sin' | etc.
pat.par.length = 60 # samples
pat.par.cyclelength = 1.0 # seconds at TD framerate
```
Combine with `lookupCHOP` to remap a 0-1 driver through a custom curve.
---
## Animation COMP — Keyframe-Based
For multi-keyframe motion graphics. Each animationCOMP holds channels with keyframes editable in the Animation Editor.
```python
anim = root.create(animationCOMP, 'intro_anim')
# By default has channels chan1..chanN; access via:
# op('intro_anim').par.length, .par.play, .par.cue, etc.
# Drive a parameter from a channel
op('/project1/text1').par.tx.mode = ParMode.EXPRESSION
op('/project1/text1').par.tx.expr = "op('intro_anim/out1')['chan1']"
```
**Keyframes are typically edited in the UI** (Animation Editor), but can be set via `keyframes` table internally. For programmatic keyframe creation, use `td_execute_python`:
```python
# Get the channel CHOP inside an animationCOMP
ch = op('/project1/intro_anim/chans')
# Insert a key (advanced API — verify with td_get_par_info(op_type='animationCOMP'))
ch.appendKey('chan1', frame=0, value=0.0, expression=None)
ch.appendKey('chan1', frame=120, value=1.0)
```
For most use cases, drive params with LFO/Timer/Pattern CHOPs instead — simpler and scriptable.
---
## Easing in Expressions
TD's expression evaluator supports Python math. Common easing forms:
```python
# Linear
"t"
# Smoothstep (classic ease-in-out)
"smoothstep(0, 1, t)"
# Ease-out cubic
"1 - pow(1 - t, 3)"
# Ease-in cubic
"pow(t, 3)"
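# Ease-in-out cubic
"4*t*t*t if t < 0.5 else 1 - pow(-2*t + 2, 3) / 2"
# Smoothstep written out as a polynomial (same curve as smoothstep(0, 1, t) on 0-1)
"3*t*t - 2*t*t*t"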
# Bounce (manual, simplified)
"abs(sin(t * 6.28 * 3) * (1 - t))"
```
Where `t` is `op('fade_timer')['timer_fraction']` or any 0-1 driver.
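The easing forms can be compared outside TD with plain Python functions. This sketch defines `smoothstep` the way GLSL does (clamp, then `3t^2 - 2t^3`); whether a bare `smoothstep` name is available inside TD parameter expressions is not verified here, so the explicit polynomial `3*t*t - 2*t*t*t` is the safer form to paste into an expression:

```python
def clamp01(t: float) -> float:
    """Clamp t to the 0-1 range."""
    return max(0.0, min(1.0, t))


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite smoothstep as defined in GLSL: 3t^2 - 2t^3 on the clamped range."""
    t = clamp01((x - edge0) / (edge1 - edge0))
    return t * t * (3.0 - 2.0 * t)


def ease_out_cubic(t: float) -> float:
    return 1.0 - (1.0 - t) ** 3


def ease_in_cubic(t: float) -> float:
    return t ** 3


def ease_in_out_cubic(t: float) -> float:
    # Piecewise cubic: accelerate over the first half, decelerate over the second.
    return 4.0 * t ** 3 if t < 0.5 else 1.0 - (-2.0 * t + 2.0) ** 3 / 2.0


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, smoothstep(0, 1, t), ease_out_cubic(t), ease_in_cubic(t), ease_in_out_cubic(t))
```

All four curves start at 0 and end at 1; they differ only in where the motion speeds up and slows down.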
---
## Filter CHOP — Smoothing Existing Channels
Smooth out jittery values (e.g., audio analysis, sensor data) before driving visuals.
```python
filt = root.create(filterCHOP, 'smooth')
filt.par.filter = 'gaussian' # or 'lowpass'
filt.par.width = 0.5 # smoothing window in seconds
filt.inputConnectors[0].connect(op('raw_signal'))
```
**WARNING:** Do NOT use Filter CHOP on AudioSpectrum output in timeslice mode — it expands the sample count and averages bins to near-zero. See `audio-reactive.md`.
---
## Lag CHOP — Asymmetric Attack/Release
Different speeds for rising vs. falling values. Standard for visualizing audio envelopes.
```python
lag = root.create(lagCHOP, 'env_smooth')
lag.par.lag1 = 0.02 # attack (rise time, seconds)
lag.par.lag2 = 0.30 # release (fall time, seconds)
lag.inputConnectors[0].connect(op('raw_envelope'))
```
Fast attack, slow release = classic VU-meter feel.
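The attack/release behavior can be illustrated with an asymmetric one-pole smoother in plain Python. This is a simplified model for intuition, treating `attack` and `release` as time constants in seconds; it is not the Lag CHOP's actual algorithm, and the 60 fps rate is an assumption:

```python
import math


def lag_follow(samples, attack=0.02, release=0.30, fps=60.0):
    """Asymmetric one-pole smoother: fast when rising, slow when falling.

    attack and release are time constants in seconds. Simplified model,
    not TouchDesigner's Lag CHOP algorithm.
    """
    dt = 1.0 / fps
    out, y = [], 0.0
    for x in samples:
        tau = attack if x > y else release
        # Fraction of the remaining distance covered in one frame.
        k = 1.0 - math.exp(-dt / tau)
        y += (x - y) * k
        out.append(y)
    return out


if __name__ == "__main__":
    envelope = [1.0] * 6 + [0.0] * 30  # short hit followed by silence
    for i, v in enumerate(lag_follow(envelope)):
        print(f"frame {i:2d}: {v:.3f}")
```

In this model the value is essentially at full level within 6 frames of the hit, but is still above 0.7 six frames after the input drops to zero.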
---
## Per-Frame Driving via Script DAT
For complex per-frame logic that doesn't fit expressions, use an `executeDAT` (`onFrameStart` callback) or a `chopExecuteDAT`.
```python
# In an executeDAT (frameStart):
import math

def onFrameStart(frame):
    t = absTime.seconds
    op('/project1/circle').par.tx = math.sin(t * 2.0) * 3.0
    op('/project1/circle').par.ty = math.cos(t * 2.0) * 3.0
    return
```
Heavy logic should still be in CHOPs (CPU-cheap, deterministic). Reserve scripts for one-shots or non-realtime branching.
---
## Pitfalls
1. **Frame rate dependency** — `me.time.frame` is in TD project frames (default 60). If your project rate changes, motion speed changes. Use `seconds` for rate-independent timing.
2. **Cooking budget** — every CHOP that drives a parameter cooks every frame. Consolidate drivers (one big mathCHOP > many small ones).
3. **Expression mode** — params default to `CONSTANT`. `par.X.expr = ...` is ignored unless `par.X.mode = ParMode.EXPRESSION`.
4. **Animation editor edits** — keyframes set via UI live in the animationCOMP's internal keyframe table. They survive save/reopen. Programmatic keys via `appendKey()` work but verify the API with `td_get_docs(topic='animation')` first.
5. **Looping animations** — for seamless loops, `length` must equal `cyclelength` and the start/end values must match. Otherwise expect a visible jump.
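
The frame-rate pitfall in item 1 can be seen with plain arithmetic. The sketch below is illustrative only; `angle_from_frame` and `angle_from_seconds` are hypothetical helpers, not TouchDesigner functions.

```python
def angle_from_frame(frame, degrees_per_frame=3.0):
    """Frame-driven rotation: speed scales with the project frame rate."""
    return frame * degrees_per_frame

def angle_from_seconds(seconds, degrees_per_second=180.0):
    """Time-driven rotation: same speed at any frame rate."""
    return seconds * degrees_per_second

# One second of playback at two project rates.
for fps in (30, 60):
    frames_elapsed = fps * 1
    print(fps, angle_from_frame(frames_elapsed), angle_from_seconds(1.0))
```

The frame-driven value doubles when the project rate doubles, while the seconds-driven value stays the same.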
---
## Quick Recipes
| Goal | Simplest path |
|---|---|
| Continuous rotation | LFO CHOP `type='ramp'`, expr → `geo.par.rx` |
| Fade in over 2s | Timer CHOP `length=2`, smoothstep expr → `level.par.opacity` |
| Pulse on every beat | `triggerCHOP` from audio → drive scale via expression |
| 3D Lissajous orbit | Two LFOs with different freq, drive `tx`/`ty`/`tz` |
| Random jitter | `noiseCHOP` (low-freq) added to position |
| Timed scene switch | Timer CHOP → switchTOP/CHOP `index` |
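
For the "Fade in over 2s" recipe, the smoothstep curve can be sketched in plain Python. This assumes the timer supplies a 0 to 1 fraction; `smoothstep` and `fade_opacity` are hypothetical helper names, and the same formula could be typed into a parameter expression.

```python
def smoothstep(edge0, edge1, x):
    """Hermite ease between edge0 and edge1, clamped to 0..1."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def fade_opacity(timer_fraction):
    """Map a 0..1 timer fraction to an eased opacity."""
    return smoothstep(0.0, 1.0, timer_fraction)
```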
@@ -0,0 +1,175 @@
# Audio-Reactive Reference
Patterns for driving visuals from audio — spectrum analysis, beat detection, envelope following.
## Audio Input
```python
# Live input from audio interface
audio_in = root.create(audiodeviceinCHOP, 'audio_in')
audio_in.par.rate = 44100
# OR: from audio file (for testing)
audio_file = root.create(audiofileinCHOP, 'audio_in')
audio_file.par.file = '/path/to/track.wav'
audio_file.par.play = True
audio_file.par.repeat = 'on' # NOT par.loop
audio_file.par.playmode = 'locked'
```
---
## Audio Band Extraction (Verified TD 2025.32460)
Use `audiofilterCHOP` for band separation (NOT `selectCHOP` by channel index):
```python
# Audio input
af = root.create(audiofileinCHOP, 'audio_in')
af.par.file = path
af.par.play = True
af.par.repeat = 'on'
af.par.playmode = 'locked'
# Low band: lowpass @ 250Hz
flt_low = root.create(audiofilterCHOP, 'flt_low')
flt_low.par.filter = 'lowpass'
flt_low.par.cutofffrequency = 250
flt_low.par.rolloff = 2
flt_low.inputConnectors[0].connect(af)
# Mid band: highpass@250 → lowpass@4000
flt_mid_hp = root.create(audiofilterCHOP, 'flt_mid_hp')
flt_mid_hp.par.filter = 'highpass'
flt_mid_hp.par.cutofffrequency = 250
flt_mid_hp.par.rolloff = 2
flt_mid_hp.inputConnectors[0].connect(af)
flt_mid_lp = root.create(audiofilterCHOP, 'flt_mid_lp')
flt_mid_lp.par.filter = 'lowpass'
flt_mid_lp.par.cutofffrequency = 4000
flt_mid_lp.par.rolloff = 2
flt_mid_lp.inputConnectors[0].connect(flt_mid_hp)
# High band: highpass @ 4000Hz
flt_high = root.create(audiofilterCHOP, 'flt_high')
flt_high.par.filter = 'highpass'
flt_high.par.cutofffrequency = 4000
flt_high.par.rolloff = 2
flt_high.inputConnectors[0].connect(af)
# Per-band: RMS → lag → gain → clamp
for name, filt in [('low', flt_low), ('mid', flt_mid_lp), ('high', flt_high)]:
    rms = root.create(analyzeCHOP, f'rms_{name}')
    rms.par.function = 'rmspower'  # NOT 'rms'
    rms.inputConnectors[0].connect(filt)

    lag = root.create(lagCHOP, f'lag_{name}')
    lag.par.lag1 = 0.05  # attack (NOT par.lagin)
    lag.par.lag2 = 0.25  # release (NOT par.lagout)
    lag.inputConnectors[0].connect(rms)

    # Named `scale` rather than `math` so the Python math module is not shadowed
    scale = root.create(mathCHOP, f'scale_{name}')
    scale.par.gain = 8.0
    scale.inputConnectors[0].connect(lag)

    # mathCHOP has NO par.clamp; use limitCHOP
    lim = root.create(limitCHOP, f'clamp_{name}')
    lim.par.type = 'clamp'
    lim.par.min = 0.0
    lim.par.max = 1.0
    lim.inputConnectors[0].connect(scale)

    out = root.create(nullCHOP, f'out_{name}')
    out.inputConnectors[0].connect(lim)
    out.viewer = True
```
**Key TD 2025 corrections:**
- `analyzeCHOP.par.function = 'rmspower'` NOT `'rms'`
- `lagCHOP.par.lag1` / `par.lag2` NOT `par.lagin` / `par.lagout`
- `mathCHOP` has NO `par.clamp` — use separate `limitCHOP`
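
The numeric effect of the level stages (RMS, gain, clamp) can be checked offline; the lag stage is left out here. This is a rough sketch using a textbook RMS; the exact definition behind `'rmspower'` should be confirmed in the Analyze CHOP documentation, and `band_level` is a hypothetical helper.

```python
import math

def band_level(samples, gain=8.0):
    """RMS of a block of samples, scaled by gain and clamped to 0..1."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(max(rms * gain, 0.0), 1.0)
```

A quiet band with amplitude 0.05 lands near 0.4 after the gain of 8, while anything at or above 0.125 saturates at 1.0, which is why the clamp stage matters.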
---
## Beat / Onset Detection
### Kick Detection (slope → trigger)
```python
slope = root.create(slopeCHOP, 'kick_slope')
slope.inputConnectors[0].connect(op('out_low'))
trig = root.create(triggerCHOP, 'kick_trig')
trig.par.threshold = 0.12
trig.par.attack = 0.005 # NOT par.attacktime
trig.par.decay = 0.15 # NOT par.decaytime
trig.par.triggeron = 'increase'
trig.inputConnectors[0].connect(slope)
kick_out = root.create(nullCHOP, 'out_kick')
kick_out.inputConnectors[0].connect(trig)
```
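
The slope-then-threshold idea can be prototyped on a plain list of envelope values. This sketch uses a per-frame difference, so its units differ from the Slope CHOP's output and the threshold would need retuning; `detect_onsets` is a hypothetical name.

```python
def detect_onsets(levels, threshold=0.12):
    """Return indices where the frame-to-frame rise first exceeds threshold."""
    onsets = []
    armed = True
    for i in range(1, len(levels)):
        slope = levels[i] - levels[i - 1]
        if slope > threshold and armed:
            onsets.append(i)
            armed = False  # ignore the rest of the same rising edge
        elif slope <= threshold:
            armed = True   # re-arm once the rise has stopped
    return onsets

kicks = detect_onsets([0.0, 0.05, 0.5, 0.9, 0.6, 0.2, 0.25, 0.8])
```

Re-arming only after the rise stops keeps one long attack from producing several triggers.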
---
## Passing Audio to GLSL
```python
glsl.par.vec0name = 'uLow'
glsl.par.vec0valuex.expr = "op('out_low')['chan1']"
glsl.par.vec0valuex.mode = ParMode.EXPRESSION
glsl.par.vec1name = 'uKick'
glsl.par.vec1valuex.expr = "op('out_kick')['chan1']"
glsl.par.vec1valuex.mode = ParMode.EXPRESSION
```
```glsl
uniform float uLow;
uniform float uKick;
float scale = 1.0 + uKick * 0.4 + uLow * 0.2;
```
---
## Standard Audio Bus Pattern
Recommended structure:
```
audiodeviceinCHOP (audio_in)
[null_audio_in]
├──→ audiofilterCHOP (lowpass@250) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
├──→ audiofilterCHOP (bandpass@250-4k) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
├──→ audiofilterCHOP (highpass@4k) → analyzeCHOP → lagCHOP → mathCHOP → limitCHOP → null
└──→ slopeCHOP → triggerCHOP (beat_trigger)
```
Keep this entire bus inside a `baseCOMP` (e.g., `audio_bus`) and reference via paths from visual networks.
---
## MIDI Input
```python
midi_in = root.create(midiinCHOP, 'midi_in')
midi_in.par.device = 0 # Check midiinDAT for device index
# Outputs channels named by MIDI note/CC: 'ch1n60', 'ch1c74', etc.
# Map CC to a parameter
op('bloom1').par.threshold.mode = ParMode.EXPRESSION
op('bloom1').par.threshold.expr = "op('midi_in')['ch1c74'][0]"
```
---
## CRITICAL: DO NOT use Lag CHOP for spectrum smoothing
Lag CHOP in timeslice mode expands 256-sample spectrum to 1600-2400 samples, averaging all values to near-zero (~1e-06). The shader receives no usable data. Use `mathCHOP(gain=8)` directly, or smooth in GLSL via temporal lerp with a feedback texture.
Verified:
- Without Lag CHOP: bass bins = 5.0-5.4 (strong, usable)
- With Lag CHOP: ALL bins = 0.000001 (dead)
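
The temporal lerp mentioned above works per bin and keeps the bin count unchanged. The sketch below shows the arithmetic in Python rather than GLSL; `lerp_spectrum` and `mix` are hypothetical names, and in a shader the `previous` values would come from the feedback texture.

```python
def lerp_spectrum(previous, current, mix=0.2):
    """Blend each bin toward its new value; output length equals input length."""
    if len(previous) != len(current):
        raise ValueError('spectra must have the same number of bins')
    return [p + (c - p) * mix for p, c in zip(previous, current)]

smoothed = lerp_spectrum([0.0, 0.0], [5.0, 1.0], mix=0.2)
```

A smaller `mix` gives heavier smoothing; `mix=1.0` passes the new frame through unchanged.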
@@ -0,0 +1,352 @@
# DAT-Based Scripting Reference
TD's event/callback model — Python that runs in response to network events. The full set of "Execute DATs" plus their idiomatic patterns.
For arbitrary Python execution (not callback-based), see `python-api.md`. For the MCP's `td_execute_python` tool, see `mcp-tools.md`.
---
## The Execute DAT Family
Every type watches one kind of event source and fires Python on changes.
| DAT | Watches | Use for |
|---|---|---|
| `chopExecuteDAT` | A CHOP's channel values | Audio triggers, threshold callbacks, state machines on numeric input |
| `datExecuteDAT` | A DAT's content (table cells, text) | Reacting to data updates from APIs, parsing webDAT responses |
| `parameterExecuteDAT` | A parameter's value or pulse | Reacting to user-changed params, custom pulse buttons |
| `panelExecuteDAT` | A panel COMP's interaction | Button clicks, slider drags, field commits |
| `opExecuteDAT` | Operator lifecycle | New operator created, deleted, name changed |
| `executeDAT` | Project lifecycle, frame events | Run-once setup, per-frame logic, save/load hooks |
All have a docked DAT with predefined callback functions. You only fill in the bodies of the ones you care about.
---
## chopExecuteDAT — Numeric Triggers
```python
ce = root.create(chopExecuteDAT, 'kick_handler')
ce.par.chop = '/project1/audio/out_kick' # source CHOP
ce.par.offtoon = True # fire when channel rises above 0
ce.par.ontooff = False
ce.par.whileon = False
ce.par.valuechange = False
```
In the docked callback DAT:
```python
def offToOn(channel, sampleIndex, val, prev):
    """Channel went from 0 to non-zero. Classic beat trigger."""
    op('/project1/strobe').par.flash.pulse()
    scene = op('/project1/scene')
    # Advance to the next scene, wrapping after 8
    scene.par.index = (int(scene.par.index.eval()) + 1) % 8
    return

def onToOff(channel, sampleIndex, val, prev):
    """Channel went from non-zero to 0."""
    return

def whileOn(channel, sampleIndex, val, prev):
    """Fires every frame while channel is non-zero. Use sparingly."""
    return

def valueChange(channel, sampleIndex, val, prev):
    """Fires every frame the value changes (continuous). Heavy."""
    return
```
`channel` is a `Channel` object — `.name`, `.owner`, `.vals[]`. Use `channel.name == 'chan1'` to filter.
**Threshold-based custom triggers:** wire the source CHOP through a `triggerCHOP` first to get clean 0/1 pulses, then watch with `offtoon`.
---
## datExecuteDAT — Table/Text Changes
```python
de = root.create(datExecuteDAT, 'api_response')
de.par.dat = '/project1/api/web1' # source DAT
de.par.tablechange = True # any cell change
de.par.cellchange = False
de.par.rowchange = False
de.par.colchange = False
```
```python
import json

def onTableChange(dat):
    """Whole table changed (including text DAT content updates)."""
    if dat.numRows == 0:
        return
    # If it's a webDAT response, parse JSON
    try:
        data = json.loads(dat.text)
    except json.JSONDecodeError:
        debug(f'Bad JSON: {dat.text[:100]}')
        return
    # Write to a CHOP
    op('/project1/api_value').par.value0 = float(data.get('count', 0))
    return

def onCellChange(dat, cells, prev):
    """Specific cells changed."""
    for cell in cells:
        # cell.row, cell.col, cell.val
        pass
    return
```
`debug()` prints to the textport — readable via `td_read_textport`.
---
## parameterExecuteDAT — Param Changes & Pulse
```python
pe = root.create(parameterExecuteDAT, 'comp_params')
pe.par.op = '/project1/my_component' # COMP whose params to watch
pe.par.parameters = '*' # or specific names like 'Intensity Reset'
pe.par.valuechange = True
pe.par.pulse = True
```
```python
def onValueChange(par, prev):
    """par is a Par object. par.name, par.eval(), par.owner."""
    if par.name == 'Intensity':
        op('/project1/bloom').par.threshold = par.eval()
    return

def onPulse(par):
    """Pulse param was triggered."""
    if par.name == 'Reset':
        op('/project1/scene').par.index = 0
        op('/project1/audio_player').par.cuepoint = 0
        op('/project1/audio_player').par.cuepulse.pulse()
    return

def onExpressionChange(par, val, prev):
    """User changed the expression on a param."""
    return

def onExportChange(par, val, prev):
    """Export source changed."""
    return

def onModeChange(par, val, prev):
    """Param mode changed (CONSTANT / EXPRESSION / EXPORT / etc)."""
    return
```
---
## panelExecuteDAT — UI Events
For interactive control surfaces. See `panel-ui.md` for the full panel COMP context.
```python
pe = root.create(panelExecuteDAT, 'btn_handler')
pe.par.panel = '/project1/play_btn'
pe.par.click = True # mouse click events
pe.par.value = True # state changes (toggle)
pe.par.lockedchange = False
```
```python
def onOffToOn(panelValue):
    """Panel value rose to 1 (button pressed, slider crossed threshold)."""
    op('/project1/scene_timer').par.start.pulse()
    return

def onOnToOff(panelValue):
    """Panel value dropped to 0."""
    return

def onValueChange(panelValue):
    """Continuous: every frame the value changes."""
    val = panelValue.eval()
    op('/project1/master').par.opacity = val
    return

def onClick(panelValue):
    """Discrete click event, fires once per click."""
    return
```
`panelValue` is a `Par` object on the panel COMP.
---
## opExecuteDAT — Operator Lifecycle
Watches creation/deletion/renaming of operators in a parent COMP.
```python
oe = root.create(opExecuteDAT, 'lifecycle')
oe.par.op = '/project1'
oe.par.create = True
oe.par.destroy = True
oe.par.namechange = True
oe.par.flagchange = False
```
```python
def onCreate(opCreated):
    """A new operator was created. Useful for auto-applying conventions."""
    if opCreated.OPType == 'glslTOP':
        # Always wrap with a null
        n = opCreated.parent().create(nullTOP, opCreated.name + '_out')
        n.inputConnectors[0].connect(opCreated)
    return

def onDestroy(opDestroyed):
    """Operator was deleted. opDestroyed.path is still valid for one frame."""
    return

def onNameChange(opChanged):
    """Operator was renamed."""
    return
```
Useful for dev-time scaffolding (auto-create downstream nullTOPs, auto-name conventions). Disable in production projects to avoid surprise side effects.
---
## executeDAT — Project Lifecycle & Per-Frame
The catch-all. Gets you hooks into project start, save, load, frame-start, frame-end.
```python
exec_dat = root.create(executeDAT, 'lifecycle')
exec_dat.par.start = True
exec_dat.par.create = True
exec_dat.par.framestart = True
exec_dat.par.frameend = False
```
```python
def onStart():
    """Project just started cooking. Run once."""
    op('/project1/scene').par.index = 0
    debug('Project started')
    return

def onCreate():
    """Component was just created (only fires for component executeDATs, not project root)."""
    return

def onFrameStart(frame):
    """Per-frame, BEFORE network cooks. Heavy logic here = bottleneck."""
    return

def onFrameEnd(frame):
    """Per-frame, AFTER network cooks. Use for capture, recording, post-network logic."""
    return

def onPlayStateChange(playing):
    """Project play/pause toggled."""
    return

def onProjectPreSave():
    """Right before saving the .toe file."""
    return

def onProjectPostSave():
    """Right after the .toe file was saved."""
    return
```
Heavy per-frame logic in `onFrameStart` is one of the top performance regressions in TD projects. Use CHOPs for per-frame computation, scripts for events.
---
## Pattern: Triggering an Animation Sequence on Beat
```python
# Source: a kick trigger CHOP
# Goal: on each kick, run a 1.5s scale pulse + color flash
# Setup (create once)
animator = root.create(timerCHOP, 'pulse_anim')
animator.par.length = 1.5
animator.par.cycle = False
# Param expressions on visual targets:
op('logo').par.sx.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sx.mode = ParMode.EXPRESSION
op('logo').par.sy.expr = "1.0 + (1 - op('pulse_anim')['timer_fraction']) * 0.3"
op('logo').par.sy.mode = ParMode.EXPRESSION
# In a chopExecuteDAT watching the kick CHOP:
def offToOn(channel, sampleIndex, val, prev):
    op('pulse_anim').par.start.pulse()
    return
```
---
## Pattern: Live Editing a CHOP from API Data
```python
# webDAT polls an API every 5 seconds
# datExecuteDAT parses the response and writes to a constantCHOP
import json

def onTableChange(dat):
    try:
        data = json.loads(dat.text)
    except json.JSONDecodeError:
        return  # ignore malformed responses instead of hiding every error
    target = op('/project1/external_state')
    target.par.name0 = 'temperature'
    target.par.value0 = float(data['temp_c'])
    target.par.name1 = 'humidity'
    target.par.value1 = float(data['humidity'])
    return
```
Visuals just reference `op('external_state')['temperature']` — they update live.
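
Because a missing key or a non-numeric value would still raise inside the callback, the parsing step can be isolated into a helper that is testable outside TouchDesigner. A minimal sketch, assuming the `temp_c` and `humidity` fields from the pattern above; `parse_reading` is a hypothetical name.

```python
import json

def parse_reading(text, fields=('temp_c', 'humidity')):
    """Return {field: float}, or None when the payload is unusable."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    try:
        return {name: float(data[name]) for name in fields}
    except (KeyError, TypeError, ValueError):
        return None
```

The callback can then write to the CHOP only when the result is not `None`, leaving the last good values in place otherwise.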
---
## Pattern: Self-Cleaning Network
```python
# An opExecuteDAT watching for orphaned helper ops, deleting them after their parent disappears
def onDestroy(opDestroyed):
    parent_name = opDestroyed.name
    helper = op(f'/project1/{parent_name}_helper')
    if helper:
        helper.destroy()
    return
```
---
## Pitfalls
1. **Callbacks crash silently** — exceptions print to the textport but don't show up in the UI. Always `td_clear_textport` before debugging, then `td_read_textport` after.
2. **`debug()` vs `print()`** — both write to textport, but `debug()` includes the file/line of the calling DAT. Prefer `debug()` for scripts.
3. **`val` is the new value, `prev` is old** — easy to swap. Always: `def offToOn(channel, sampleIndex, val, prev)`. Check parameter order in TD docs if confused.
4. **`whileOn` and `valueChange` are per-frame** — heavy. Avoid unless absolutely needed. Drive via expressions instead.
5. **Callbacks don't run during cooking-paused state** — if the parent COMP has `allowCooking=False`, callbacks freeze. Useful for "disable me" toggles.
6. **`par` vs `panelValue`** — parameterExecuteDAT gives `par` (a Par object), panelExecuteDAT gives `panelValue` (also a Par-like object). Both have `.name` and `.eval()` but their context differs.
7. **`opExecuteDAT` fires for itself** — when you create an opExecuteDAT, it can fire `onCreate` for itself if `par.create=True` and parent matches. Filter by `if opCreated == me: return`.
8. **Reload behavior** — when reloading an extension (`td_reinit_extension`), all callback DATs reset their internal state. Module-level vars are lost. Persist state in tableDATs or the docked DAT itself, not in module globals.
9. **Cooking dependencies** — if a callback writes to an op that's upstream of the callback's source, you get a cooking loop. TD warns about it but doesn't always block. Keep dataflow one-directional.
10. **Active flag** — every Execute DAT has `par.active`. False = silent. Easy to toggle for testing without deleting wiring.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| Beat trigger | `chopExecuteDAT.par.offtoon=True` watching a `triggerCHOP` |
| API response handler | `datExecuteDAT.par.tablechange=True` watching a `webDAT` |
| Custom button → action | `parameterExecuteDAT.par.pulse=True` watching a custom pulse param |
| Slider → continuous param | `panelExecuteDAT.par.value=True` watching a `sliderCOMP` |
| Run-once setup | `executeDAT.par.start=True` with logic in `onStart()` |
| Per-frame metrics | `executeDAT.par.frameend=True` recording values to a CHOP |
| Auto-name new ops | `opExecuteDAT.par.create=True` enforcing naming conventions |
@@ -0,0 +1,322 @@
# External Data Reference
Network and device I/O — HTTP requests, WebSockets, MQTT, Serial, TCP, UDP. For MIDI/OSC specifically see `midi-osc.md`.
Common production needs:
- API polling / webhook ingestion
- Real-time data streams (sensors, market data, chat)
- IoT device control (Arduino, ESP32, smart lights)
- Inter-application messaging
- Hosting a tiny TD-side HTTP server for remote control
---
## Web DAT — HTTP Requests
```python
web = root.create(webDAT, 'api_call')
web.par.url = 'https://api.example.com/v1/status'
web.par.fetchmethod = 'get' # 'get' | 'post' | 'put' | 'delete'
web.par.format = 'auto' # 'auto' | 'text' | 'json'
web.par.timeout = 5.0
```
**Triggering a request:**
`webDAT` does NOT auto-fetch on cook. Trigger explicitly:
```python
web.par.fetch.pulse()
```
Or pulse it from a `chopExecuteDAT` callback when a CHOP value changes (see `dat-scripting.md`).
**Authentication headers:**
Use `webclientDAT` (more flexible) or set `webDAT` headers via the headers DAT:
```python
web_headers = root.create(tableDAT, 'headers')
web_headers.appendRow(['Authorization', 'Bearer YOUR_TOKEN'])
web_headers.appendRow(['Accept', 'application/json'])
web.par.headers = web_headers.path
```
**Parsing JSON response:**
```python
import json

def onTableChange(dat):
    response = dat.text  # raw response body
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        debug(f'Bad JSON: {response[:100]}')
        return
    # Update a tableDAT or store in a constantCHOP for downstream use
    op('/project1/api_status').par.value0 = data['count']
    return
```
Wire this in a `datExecuteDAT` watching the webDAT.
**Polling pattern:**
```python
# timerCHOP fires every N seconds
timer = root.create(timerCHOP, 'poll_timer')
timer.par.length = 5.0
timer.par.cycle = True
# chopExecuteDAT on the timer's 'cycles' channel pulses the webDAT
def offToOn(channel, sampleIndex, val, prev):
    op('/project1/api_call').par.fetch.pulse()
    return
```
---
## Web Client DAT — More Robust HTTP
`webclientDAT` is the modern replacement for `webDAT` — supports streaming responses, chunked transfer, custom auth.
```python
client = root.create(webclientDAT, 'api')
client.par.method = 'POST'
client.par.url = 'https://api.example.com/events'
client.par.uploadtype = 'json'
client.par.uploaddata = '{"event": "scene_change", "scene": 3}'
client.par.request.pulse()
```
Output goes to its child `webclient1_response` DAT. Use a `datExecuteDAT` to react.
---
## Web Server DAT — TD as HTTP Server
Hosts a tiny HTTP server inside TD. Useful for:
- Status/health endpoints
- Remote control from a phone or another machine
- Webhook receivers from external services
```python
server = root.create(webserverDAT, 'control_server')
server.par.port = 8080
server.par.active = True
# Define handler in the docked callback DAT
```
In the auto-created `webserver1_callbacks` DAT:
```python
def onHTTPRequest(webServerDAT, request, response):
    path = request['uri']
    if path == '/status':
        response['statusCode'] = 200
        response['data'] = '{"fps": 60, "scene": "active"}'
    elif path == '/scene':
        # Query parameters are expected under 'pars'; verify the key
        # against the Web Server DAT docs for your build.
        idx = int(request.get('pars', {}).get('index', 0))
        op('/project1/scene_switch').par.index = idx
        response['statusCode'] = 200
        response['data'] = 'OK'
    else:
        response['statusCode'] = 404
        response['data'] = 'Not Found'
    return response
```
Test from terminal: `curl http://localhost:8080/status`.
**Security:** No auth by default. Bind to localhost only or add a token check in the callback. Never expose to the public internet without auth.
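
A token check of the kind suggested above can be written as a small pure function. This is a sketch under assumptions: how request headers are exposed to `onHTTPRequest` should be confirmed in the Web Server DAT documentation, and `is_authorized` plus the `Bearer` scheme are illustrative choices, not part of TouchDesigner.

```python
import hmac

def is_authorized(headers, expected_token):
    """Check a 'Bearer <token>' Authorization header in constant time."""
    supplied = ''
    for key, value in headers.items():
        # Header names are case-insensitive
        if key.lower() == 'authorization':
            supplied = str(value)
            break
    prefix = 'Bearer '
    if not supplied.startswith(prefix):
        return False
    token = supplied[len(prefix):]
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

The callback would return a 401 response when this function returns `False`. `hmac.compare_digest` is used so the comparison time does not leak how many characters matched.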
---
## WebSocket DAT — Bidirectional Real-Time
For low-latency bidirectional streams (chat, live data feeds, controllers).
### Client
```python
ws = root.create(websocketDAT, 'ws_client')
ws.par.netaddress = 'wss://api.example.com/socket'
ws.par.active = True
```
In the docked callbacks DAT:
```python
import json

def onConnect(dat):
    dat.sendText('{"action": "subscribe", "channel": "ticks"}')
    return

def onReceiveText(dat, rowIndex, message):
    # message is a string; parse JSON, dispatch to ops
    data = json.loads(message)
    op('/project1/price_chop').par.value0 = data['price']
    return

def onDisconnect(dat):
    # Optionally schedule a reconnect
    return
```
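
Since reconnection is left to the callback, the retry schedule is worth computing separately. The sketch below only produces the delays; `backoff_delays` is a hypothetical helper, and actually scheduling the retry inside TD (for example with a delayed `run()` call) is an assumption to verify against the Python API docs.

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6):
    """Yield capped exponential delays (seconds) for successive reconnect attempts."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor

schedule = list(backoff_delays())
```

Capping the delay keeps a long outage from pushing the next attempt minutes into the future.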
### Server
```python
ws = root.create(websocketDAT, 'ws_server')
ws.par.mode = 'server'
ws.par.port = 9001
ws.par.active = True
```
Same callback structure with an additional `clientID` arg.
---
## MQTT — Pub/Sub for IoT
```python
mqtt = root.create(mqttClientDAT, 'iot')
mqtt.par.brokeraddress = 'broker.hivemq.com'
mqtt.par.brokerport = 1883
mqtt.par.clientid = 'td_install_01'
mqtt.par.connect.pulse()
# Subscribe in callbacks DAT:
def onConnect(dat):
    dat.subscribe('home/lights/+', qos=1)
    return

def onReceive(dat, topic, payload, qos, retained, dup):
    # payload is bytes; decode before parsing JSON
    msg = payload.decode('utf-8')
    # Dispatch by topic
    return

# Publish from anywhere:
op('iot').publish('show/scene', 'sunset', qos=0, retain=False)
```
For Mosquitto / HiveMQ self-hosted brokers use the same setup with `tcp://192.168.x.x` and your local port.
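
Dispatching by topic inside `onReceive` usually means matching against wildcard patterns like the `home/lights/+` subscription above. A minimal sketch of the standard MQTT wildcard rules (`+` for one level, `#` for all remaining levels); `topic_matches` is a hypothetical helper and it ignores the special handling of `$`-prefixed topics.

```python
def topic_matches(pattern, topic):
    """MQTT-style match: '+' is exactly one level, '#' is all remaining levels."""
    p_parts = pattern.split('/')
    t_parts = topic.split('/')
    for i, part in enumerate(p_parts):
        if part == '#':
            # '#' is only valid as the final level of a pattern
            return i == len(p_parts) - 1
        if i >= len(t_parts):
            return False
        if part != '+' and part != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)
```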
---
## Serial DAT — Arduino, USB Devices
```python
serial = root.create(serialDAT, 'arduino')
serial.par.port = '/dev/cu.usbmodem14101' # macOS — check Arduino IDE
# Windows: 'COM3', 'COM4', etc.
serial.par.baudrate = 115200
serial.par.active = True
```
In callbacks:
```python
def onReceive(dat, rowIndex, line):
    # Each newline-terminated line from Arduino arrives here
    parts = line.split(',')
    op('/project1/sensors').par.value0 = float(parts[0])
    op('/project1/sensors').par.value1 = float(parts[1])
    return
```
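
The callback above trusts every line to contain two numbers. A defensive parser can be tested outside TouchDesigner; this is a sketch assuming a comma-separated line with a fixed field count, and `parse_sensor_line` is a hypothetical name.

```python
import math

def parse_sensor_line(line, expected=2):
    """Return a list of finite floats, or None if the line is malformed."""
    parts = line.strip().split(',')
    if len(parts) != expected:
        return None
    values = []
    for part in parts:
        try:
            value = float(part)
        except ValueError:
            return None
        # float() accepts 'NaN' and 'inf'; reject them explicitly
        if not math.isfinite(value):
            return None
        values.append(value)
    return values
```

The callback can skip the write when the result is `None`, so a partial or garbled line leaves the previous sensor values untouched.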
Send to Arduino:
```python
op('arduino').send('LED_ON\n')
```
---
## TCP/IP DAT — Custom Protocols
For talking to non-HTTP servers (game servers, custom protocols, legacy systems).
```python
tcp = root.create(tcpipDAT, 'show_control')
tcp.par.netaddress = '192.168.1.50'
tcp.par.port = 7000
tcp.par.protocol = 'tcp' # 'tcp' | 'udp'
tcp.par.active = True
```
Send / receive via callbacks similar to websocketDAT.
For UDP-only (fire-and-forget, no connection), use `udpoutDAT` + `udpinDAT` — simpler but unreliable across networks.
---
## Common Patterns
### REST API → Visual
```
timerCHOP (5s loop)
→ chopExecuteDAT (pulse webDAT.par.fetch on cycle)
→ webDAT (returns JSON)
→ datExecuteDAT (parse, write to constantCHOP)
→ CHOP drives glsl uniform → visuals
```
### Webhook receiver
```
webserverDAT (port 8080, /webhook endpoint)
→ callback writes to a tableDAT log + triggers a scene change
```
### Real-time stock/crypto ticker
```
websocketDAT (subscribe to feed)
→ onReceiveText callback parses JSON
→ writes to constantCHOP
→ drives bar chart / typography animation
```
### IoT-controlled installation
```
MQTT → callback dispatches by topic
→ /lights/main → constantCHOP drives lighting render
→ /audio/volume → mathCHOP for master fader
```
### Two-way phone control
```
WebSocket server in TD
→ simple HTML page on phone connects, sends slider values
→ callback writes to ops
→ TD pushes status back via dat.sendText() to phone UI
```
---
## Pitfalls
1. **`webDAT` doesn't auto-fetch** — must explicitly pulse `par.fetch`. Easy to forget.
2. **Blocking on slow APIs** — `webDAT` runs on the cook thread. A 30s API call freezes TD for 30s. Use `webclientDAT` (async) for anything potentially slow.
3. **WebSocket reconnection** — TD does NOT auto-reconnect on disconnect. Implement backoff in `onDisconnect`.
4. **Serial port permissions on macOS** — TD needs Full Disk Access OR the port needs to be unlocked via `sudo chmod 666 /dev/cu.usbmodem...` per session.
5. **MQTT broker connection state** — `mqttClientDAT` may show `connected=true` but messages don't flow if QoS is wrong or topic ACL blocks. Check broker logs.
6. **JSON parse errors crash callbacks silently** — wrap parses in try/except and log to textport. Otherwise the callback just stops firing.
7. **Firewall on Windows** — first time `webserverDAT` binds, Windows pops a firewall dialog. Approve it or the server is unreachable.
8. **CORS** — `webserverDAT` doesn't add CORS headers by default. If serving a webapp from a different origin, add `Access-Control-Allow-Origin: *` in the response.
9. **Polling vs push** — polling burns API quota. Always prefer WebSocket / webhook / MQTT for high-frequency data.
10. **Floating-point parsing** — sensor data over Serial arrives as strings. `float()` raises `ValueError` on an empty or whitespace-only string such as `'\n'`, and it silently returns `nan` for `'NaN'`. Validate before converting.
---
## Quick Recipes
| Goal | Op chain |
|---|---|
| Periodic API fetch | `timerCHOP` → `chopExecuteDAT` pulses → `webDAT` → `datExecuteDAT` parses |
| Webhook receiver | `webserverDAT` (port + path), callback writes to ops |
| Real-time stream | `websocketDAT` client → onReceiveText → CHOP/DAT |
| Arduino sensor → visual | `serialDAT` → callback → `constantCHOP` → expression on visual op |
| TD ↔ phone control | `websocketDAT` server + simple HTML page on phone |
| MQTT IoT integration | `mqttClientDAT` subscribe → callback dispatches by topic |
@@ -0,0 +1,121 @@
# Geometry COMP Reference
## Creating Geometry COMPs
```python
geo = root.create(geometryCOMP, 'geo1')
# Remove default torus
for c in list(geo.children):
    if c.valid:
        c.destroy()
# Build your shape inside
```
## Correct Pattern (shapes inside geo)
```python
# Create shape INSIDE the geo COMP
box = geo.create(boxSOP, 'cube')
box.par.sizex = 1.5; box.par.sizey = 1.5; box.par.sizez = 1.5
# For POP-based geometry (TD 099), POPs must be inside:
sph = geo.create(spherePOP, 'shape')
out1 = geo.create(outPOP, 'out1')
out1.inputConnectors[0].connect(sph.outputConnectors[0])
```
## DO NOT: Common Mistakes
```python
# BAD: Don't create geometry at parent level and wire into COMP
box = root.create(boxPOP, 'box1') # ← outside geo, won't render
# BAD: Don't reference parent operators from inside COMP
choptopop1.par.chop = '../null1' # ← hidden dependency, breaks on move
```
## Instancing
```python
geo.par.instancing = True
geo.par.instanceop = 'sopto1' # relative path to CHOP/SOP with instance data
geo.par.instancetx = 'tx'
geo.par.instancety = 'ty'
geo.par.instancetz = 'tz'
```
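
Instance data is just rows of per-instance values. As an illustration, the sketch below builds rows for a centred grid with a `tx`/`ty`/`tz` header, the shape a table-based instance source would need; `grid_instances` is a hypothetical helper and writing the rows into an operator is left out.

```python
def grid_instances(cols, rows, spacing=1.0):
    """Return table rows (header first) with centred tx/ty/tz positions."""
    table = [['tx', 'ty', 'tz']]
    x0 = -(cols - 1) * spacing / 2.0
    y0 = -(rows - 1) * spacing / 2.0
    for r in range(rows):
        for c in range(cols):
            table.append([x0 + c * spacing, y0 + r * spacing, 0.0])
    return table

positions = grid_instances(3, 2)
```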
### Instance Attribute Names by OP Type
| OP Type | Attribute Names |
|---------|-----------------|
| CHOP | Channel names: `tx`, `ty`, `tz` |
| SOP/POP | `P(0)`, `P(1)`, `P(2)` for position |
| DAT | Column header names from first row |
| TOP | `r`, `g`, `b`, `a` |
### Mixed Data Sources
```python
geo.par.instanceop = 'pos_chop' # Position from CHOP
geo.par.instancetx = 'tx'
geo.par.instancecolorop = 'color_top' # Color from TOP
geo.par.instancecolorr = 'r'
```
## Rendering Setup
```python
# Camera
cam = root.create(cameraCOMP, 'cam1')
cam.par.tx = 0; cam.par.ty = 0; cam.par.tz = 4
# Render TOP
render = root.create(renderTOP, 'render1')
render.par.outputresolution = 'custom'
render.par.resolutionw = 1280; render.par.resolutionh = 720
render.par.camera = cam.path
render.par.geometry = geo.path # accepts path string
```
## POPs vs SOPs for Rendering
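In TD 099, a `boxSOP` created by script inside a `geometryCOMP` can render as nothing, with no error reported, while the equivalent POP renders. Prefer POPs for scripted geometry. If a SOP is required, check its render flag, because a Geometry COMP only draws SOPs whose render flag is on.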
```python
# WRONG — SOPs don't render (invisible, no errors)
box = geo.create(boxSOP, 'cube') # ✗ invisible
# CORRECT — POPs render
box = geo.create(boxPOP, 'cube') # ✓ visible
```
| SOP | POP | Notes |
|-----|-----|-------|
| `boxSOP` | `boxPOP` | `sizex/y/z`, `surftype` |
| `sphereSOP` | `spherePOP` | `radx/y/z`, `freq`, `type` (geodesic/grid/sharedpoles/tetrahedron) |
| `torusSOP` | `torusPOP` | TD auto-creates in new geo COMPs |
| `circleSOP` | `circlePOP` | |
| `gridSOP` | `gridPOP` | |
| `tubeSOP` | `tubePOP` | |
New geometry COMPs auto-create: `in1` (inPOP), `out1` (outPOP), `torus1` (torusPOP). Always clean before building.
## Morphing Between Shapes (switchPOP)
```python
sw = geo.create(switchPOP, 'shape_switch')
sw.par.index.expr = 'int(absTime.seconds / 3) % 4'
sw.inputConnectors[0].connect(tetra.outputConnectors[0]) # shape 0
sw.inputConnectors[1].connect(box.outputConnectors[0]) # shape 1
sw.inputConnectors[2].connect(octa.outputConnectors[0]) # shape 2
sw.inputConnectors[3].connect(sphere.outputConnectors[0]) # shape 3
out = geo.create(outPOP, 'out1')
out.inputConnectors[0].connect(sw.outputConnectors[0])
```
`spherePOP.par.type` options: `geodesic`, `grid`, `sharedpoles`, `tetrahedron`. Use `tetrahedron` for platonic solid polyhedra.
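
The switch index expression `int(absTime.seconds / 3) % 4` holds each shape for three seconds and wraps after the fourth. The same arithmetic as a plain function, with `shape_index`, `hold`, and `count` as hypothetical names:

```python
def shape_index(seconds, hold=3.0, count=4):
    """Index of the active shape: each one is held for `hold` seconds, then wraps."""
    return int(seconds / hold) % count
```

Note that this produces a hard cut between inputs at each boundary; it selects a shape rather than blending between them.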
## Misc
- `connect()` replaces existing connections — no need to disconnect first
- `project.name` returns the TOE filename, `project.folder` returns the directory
@@ -0,0 +1,151 @@
# GLSL Reference
## Uniforms
```
TouchDesigner GLSL
─────────────────────────────
vec0name = 'uTime' → uniform float uTime;
vec0valuex = 1.0 → uTime value
```
### Pass Time
```python
glsl_op.par.vec0name = 'uTime'
glsl_op.par.vec0valuex.mode = ParMode.EXPRESSION
glsl_op.par.vec0valuex.expr = 'absTime.seconds'
```
```glsl
uniform float uTime;
void main() { float t = uTime * 0.5; }
```
### Built-in Uniforms (TOP)
```glsl
// Output resolution (always available)
vec2 res = uTDOutputInfo.res.zw;
// Input texture (only when inputs connected)
vec2 inputRes = uTD2DInfos[0].res.zw;
vec4 color = texture(sTD2DInputs[0], vUV.st);
// UV coordinates
vUV.st // 0-1 texture coords
```
**IMPORTANT:** `uTD2DInfos` requires input textures. For standalone shaders use `uTDOutputInfo`.
## Built-in Utility Functions
```glsl
// Noise
float TDPerlinNoise(vec2/vec3/vec4 v);
float TDSimplexNoise(vec2/vec3/vec4 v);
// Color conversion
vec3 TDHSVToRGB(vec3 c);
vec3 TDRGBToHSV(vec3 c);
// Matrix transforms
mat4 TDTranslate(float x, float y, float z);
mat3 TDRotateX/Y/Z(float radians);
mat3 TDRotateOnAxis(float radians, vec3 axis);
mat3 TDScale(float x, float y, float z);
mat3 TDRotateToVector(vec3 forward, vec3 up);
mat3 TDCreateRotMatrix(vec3 from, vec3 to); // vectors must be normalized
// Resolution struct
struct TDTexInfo {
    vec4 res;   // (1/width, 1/height, width, height)
    vec4 depth;
};
// Output (always use this — handles sRGB correctly)
fragColor = TDOutputSwizzle(color);
// Instancing (MAT only)
int TDInstanceID();
```
## glslTOP
Docked DATs created automatically:
- `glsl1_pixel` — Pixel shader
- `glsl1_compute` — Compute shader
- `glsl1_info` — Compile info
### Pixel Shader Template
```glsl
out vec4 fragColor;

void main() {
    vec4 color = texture(sTD2DInputs[0], vUV.st);
    fragColor = TDOutputSwizzle(color);
}
```
### Compute Shader Template
```glsl
layout (local_size_x = 8, local_size_y = 8) in;

void main() {
    vec4 color = texelFetch(sTD2DInputs[0], ivec2(gl_GlobalInvocationID.xy), 0);
    TDImageStoreOutput(0, gl_GlobalInvocationID, color);
}
```
### Update Shader
```python
op('/project1/glsl1_pixel').text = shader_code
op('/project1/glsl1').cook(force=True)
# Check errors:
print(op('/project1/glsl1_info').text)
```
## glslMAT
Docked DATs:
- `glslmat1_vertex` — Vertex shader (param: `vdat`)
- `glslmat1_pixel` — Pixel shader (param: `pdat`)
- `glslmat1_info` — Compile info
Note: MAT uses `vdat`/`pdat`, TOP uses `vertexdat`/`pixeldat`.
### Vertex Shader Template
```glsl
uniform float uTime;

void main() {
    vec3 pos = TDPos();
    pos.z += sin(pos.x * 3.0 + uTime) * 0.2;
    vec4 worldSpacePos = TDDeform(pos);
    gl_Position = TDWorldToProj(worldSpacePos);
}
```
## Bayer 8x8 Dither Matrix
Reusable ordered dither function for retro/print aesthetics:
```glsl
float bayer8(vec2 pos) {
    int x = int(mod(pos.x, 8.0));
    int y = int(mod(pos.y, 8.0));
    int idx = x + y * 8;
    int b[64] = int[64](
         0, 32,  8, 40,  2, 34, 10, 42, 48, 16, 56, 24, 50, 18, 58, 26,
        12, 44,  4, 36, 14, 46,  6, 38, 60, 28, 52, 20, 62, 30, 54, 22,
         3, 35, 11, 43,  1, 33,  9, 41, 51, 19, 59, 27, 49, 17, 57, 25,
        15, 47,  7, 39, 13, 45,  5, 37, 63, 31, 55, 23, 61, 29, 53, 21
    );
    return float(b[idx]) / 64.0;
}
```
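
The 64 values in `bayer8` follow the usual recursive doubling rule for Bayer matrices, so the table can be regenerated or checked in a few lines of Python. A small sketch; `bayer_matrix` is a hypothetical helper name.

```python
def bayer_matrix(n):
    """Build an n x n Bayer matrix (n a power of two) by recursive doubling."""
    m = [[0]]
    size = 1
    while size < n:
        # Each quadrant is 4*M plus an offset: [[+0, +2], [+3, +1]]
        top = [[4 * v for v in row] + [4 * v + 2 for v in row] for row in m]
        bottom = [[4 * v + 3 for v in row] + [4 * v + 1 for v in row] for row in m]
        m = top + bottom
        size *= 2
    return m

flat = [v for row in bayer_matrix(8) for v in row]
```

The flattened result lists the rows in the same order as the GLSL array, which makes it easy to spot a typo in a hand-copied table.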
## glslPOP / glsladvancedPOP / glslcopyPOP
All use compute shaders. Docked DATs follow naming convention:
- `glsl1_compute` / `glsladv1_compute`
- `glslcopy1_ptCompute` / `glslcopy1_vertCompute` / `glslcopy1_primCompute`
@@ -0,0 +1,131 @@
# Layout Compositor Reference
Patterns for building modular multi-panel grids — useful for HUD interfaces, data dashboards, and multi-source visual composites.
## Layout Approaches
| Approach | Best For | Notes |
|----------|----------|-------|
| `layoutTOP` | Fixed grid, quick setup | GPU, simple tiling |
| Container COMP + `overTOP` | Full control, mixed-size panels | More setup, very flexible |
| GLSL compositor | Procedural / BSP-style | Most powerful, more complex |
---
## layoutTOP
Built-in grid compositor — fastest path for uniform tile grids.
```python
layout = root.create(layoutTOP, 'layout1')
layout.par.resolutionw = 1920
layout.par.resolutionh = 1080
layout.par.cols = 3
layout.par.rows = 2
layout.par.gap = 4
```
Connect inputs (up to cols×rows):
```python
layout.inputConnectors[0].connect(op('panel_radar'))
layout.inputConnectors[1].connect(op('panel_wave'))
layout.inputConnectors[2].connect(op('panel_data'))
```
**Variable-width columns:** Not directly supported. Use the `overTOP` approach for non-uniform grids.
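For non-uniform grids, the per-column offsets have to be computed by hand before positioning each panel. A minimal sketch of that arithmetic in plain Python follows; `column_offsets` and `total_width` are hypothetical helpers, and the assumption that the gap applies only between columns (not at the outer edges) is mine, not something the source states.

```python
def column_offsets(widths, gap=0):
    """Return the left x offset (in pixels) of each column."""
    offsets = []
    x = 0
    for width in widths:
        offsets.append(x)
        x += width + gap  # next column starts after this one plus the gap
    return offsets


def total_width(widths, gap=0):
    """Overall canvas width: all columns plus the gaps between them."""
    return sum(widths) + gap * max(len(widths) - 1, 0)


print(column_offsets([640, 960, 320], gap=4))
print(total_width([640, 960, 320], gap=4))
```

The resulting offsets can then be fed to whatever translate parameters the compositing chain exposes.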
---
## Container COMP Grid
Build each element as its own `containerCOMP`. Compose with `overTOP`:
```python
def create_panel(root, name, width, height, x=0, y=0):
    # Create a container of the given size at layout position (x, y)
    panel = root.create(containerCOMP, name)
    panel.par.w = width
    panel.par.h = height
    panel.par.x = x
    panel.par.y = y
    panel.viewer = True
    return panel
# Composite with overTOP chain
over1 = root.create(overTOP, 'over1')
over1.inputConnectors[0].connect(panel_radar)
over1.inputConnectors[1].connect(panel_wave)
over1.par.topx2 = 0
over1.par.topy2 = 512
```
**Tip:** Use a `resolutionTOP` before each `overTOP` input if panels are different sizes.
---
## Panel Dividers (GLSL)
```glsl
out vec4 fragColor;
uniform vec2 uGridDivisions; // e.g. vec2(3, 2) for 3 cols, 2 rows
uniform float uLineWidth; // pixels
uniform vec4 uLineColor; // e.g. vec4(0.0, 1.0, 0.8, 0.6) for cyan
void main() {
    vec2 res = uTDOutputInfo.res.zw;
    vec2 uv = vUV.st;
    vec4 bg = texture(sTD2DInputs[0], uv);
    // Half-widths in UV units, so each line spans uLineWidth pixels in total
    float lineW = 0.5 * uLineWidth / res.x;
    float lineH = 0.5 * uLineWidth / res.y;
    float vDiv = 0.0;
    for (float i = 1.0; i < uGridDivisions.x; i++) {
        float x = i / uGridDivisions.x;
        vDiv = max(vDiv, step(abs(uv.x - x), lineW));
    }
    float hDiv = 0.0;
    for (float i = 1.0; i < uGridDivisions.y; i++) {
        float y = i / uGridDivisions.y;
        hDiv = max(hDiv, step(abs(uv.y - y), lineH));
    }
    float line = max(vDiv, hDiv);
    vec4 result = mix(bg, uLineColor, line * uLineColor.a);
    fragColor = TDOutputSwizzle(result);
}
```
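The divider positions the shader above loops over (`i / uGridDivisions`) can be reproduced on the Python side, for example to align panel content with the lines. This is a small sketch with hypothetical helper names; the row ordering in `panel_index` (row 0 at v = 0) is an assumption and should be checked against how the panels are actually wired.

```python
def divider_positions(divisions, size_px):
    """Pixel coordinates of the interior divider lines along one axis."""
    return [i * size_px / divisions for i in range(1, divisions)]


def panel_index(u, v, cols, rows):
    """Index of the grid cell containing normalized coordinate (u, v)."""
    col = min(int(u * cols), cols - 1)  # clamp so u == 1.0 stays in the last column
    row = min(int(v * rows), rows - 1)
    return row * cols + col


print(divider_positions(3, 1920))  # vertical dividers for 3 columns
print(divider_positions(2, 1080))  # horizontal divider for 2 rows
```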
---
## Element Library Pattern
Each visual element lives in its own `baseCOMP` as a reusable `.tox`:
### Standard Interface
```
inputs:
- in_audio (CHOP) — audio envelope / beat data
- in_data (CHOP) — optional data stream
- in_control (CHOP) — intensity, color, speed params
outputs:
- out_top (TOP) — rendered element
```
### Network Structure
```
/project1/
  audio_bus/        ← all audio analysis (see audio-reactive.md)
  elements/
    elem_radar/     ← baseCOMP with out_top
    elem_wave/
    elem_data/
  compositor/
    layout1         ← layoutTOP or overTOP chain
    dividers1       ← GLSL divider lines
    postfx/         ← bloom → chrom → CRT stack (see postfx.md)
    null_out        ← final output
  output/
    windowCOMP      ← full-screen output
**Key principle:** Elements don't know about each other. The compositor assembles them. Audio bus is referenced by all elements but lives separately.
@@ -0,0 +1,382 @@
# twozero MCP Tools Reference
36 tools from twozero MCP v2.774+ (April 2026).
All tools accept an optional `target_instance` param for multi-TD-instance scenarios.
## Execution & Scripting
### td_execute_python
Execute Python code inside TouchDesigner and return the result. Has full access to TD Python API (op, project, app, etc). Print statements and the last expression value are captured. Best for: wiring connections (inputConnectors), setting expressions (par.X.expr/mode), querying parameter names, and batch creation scripts (5+ operators). For creating 1-4 operators, prefer td_create_operator instead.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `code` | string | yes | Python code to execute in TouchDesigner |
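Most MCP clients build the request envelope for you, but it can help to see what a call to this tool looks like on the wire. The sketch below assembles a JSON-RPC 2.0 `tools/call` request as defined by the Model Context Protocol; the helper name `tool_call`, the request id, and the example `code` string are illustrative assumptions.

```python
import json


def tool_call(name, arguments, request_id=1, target_instance=None):
    """Build an MCP tools/call request body for the given tool."""
    args = dict(arguments)
    if target_instance is not None:
        # Optional routing param accepted by all tools in this reference
        args["target_instance"] = target_instance
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }


request = tool_call("td_execute_python", {"code": "print(op('/project1').children)"})
print(json.dumps(request, indent=2))
```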
## Network & Structure
### td_get_network
Get the operator network structure in TouchDesigner (TD) at a given path. Returns compact list: name OPType flags. First line is full path of queried op. Flags: ch:N=children count, !cook=allowCooking off, bypass, private=isPrivate, blocked:reason, "comment text". depth=0 (default) = current level only. depth=1 = one level of children (indented). To explore deeper, call again on a specific COMP path. System operators (/ui, /sys) are hidden by default.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Network path to inspect, e.g. '/' or '/project1' |
| `depth` | integer | no | How many levels deep to recurse. 0=current level only (recommended), 1=include direct children of COMPs |
| `includeSystem` | boolean | no | Include system operators (/ui, /sys). Default false. |
| `nodeXY` | boolean | no | Include nodeX,nodeY coordinates. Default false. |
### td_create_operator
Create a new operator (node) in TouchDesigner (TD). Preferred way to create operators — handles viewport positioning, viewer flag, and docked ops automatically. For batch creation (5+ ops), you may use td_execute_python with a script instead, but then call td_get_hints('construction') first for correct parameter names and layout rules. Supports all TD operator types: TOP, CHOP, SOP, DAT, COMP, MAT. If parent is omitted, creates in the currently open network at the user's viewport position. When building a container: first create baseCOMP (no parent), then create children with parent=compPath.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | yes | Operator type, e.g. 'textDAT', 'constantCHOP', 'noiseTOP', 'transformTOP', 'baseCOMP' |
| `parent` | string | no | Path to the parent operator. If omitted, uses the currently open network in TD. |
| `name` | string | no | Name for the new operator (optional, TD auto-names if omitted) |
| `parameters` | object | no | Key-value pairs of parameters to set on the created operator |
### td_find_op
Find operators by name and/or type across the project. Returns TSV: path, OPType, flags. Flags: bypass, !cook, private, blocked:reason. Use td_search to search inside code/expressions; use td_find_op to find operators themselves.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | no | Substring to match in operator name (case-insensitive). E.g. 'noise' finds noise1, noise2, myNoise. |
| `type` | string | no | Substring to match in OPType (case-insensitive). E.g. 'noiseTOP', 'baseCOMP', 'CHOP'. Use exact type for precision or partial for broader matches. |
| `root` | string | no | Root operator path to search from. Default '/project1'. |
| `max_results` | number | no | Maximum results to return. Default 50. |
| `max_depth` | number | no | Max recursion depth from root. Default unlimited. |
| `detail` | `basic` / `summary` | no | Result detail level. 'basic' = name/path/type (fast). 'summary' = + connections, non-default pars, expressions. Default 'basic'. |
### td_search
Search for text across all code (DAT scripts), parameter expressions, and string parameter values in the TD project. Returns TSV: path, kind (code/expression/parameter/ref), line, text. JSON when context>0. Words are OR-matched. Use quotes for exact phrases: 'GetLogin "op('login')"'. Use count_only=true to quickly check if something is referenced without fetching full results.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `query` | string | yes | Search query. Multiple words = OR (any match). Wrap in quotes for exact phrase. Example: 'GetLogin getLogin' finds either. |
| `root` | string | no | Root operator path to search from. Default '/project1'. |
| `scope` | `all` / `code` / `editable` / `expressions` / `parameters` | no | What to search. 'code' = DAT scripts only (fast, ~0.05s). 'editable' = only editable code (skips inherited/ref DATs). 'expressions' = parameter expressions only. 'parameters' = string parameter values only. 'all' = everything (slow, ~1.5s due to parameter scan). Default 'all'. |
| `case_sensitive` | boolean | no | Case-sensitive matching. Default false. |
| `max_results` | number | no | Maximum results to return. Default 50. |
| `context` | number | no | Lines to show before/after each code match. Saves td_read_dat calls. Default 0. |
| `count_only` | boolean | no | Return only match count, not results. Fast existence check. |
| `max_depth` | number | no | Max recursion depth from root. Default unlimited. |
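The query rules described above (bare words are OR-matched, quoted text is an exact phrase) can be emulated locally to predict what a query will hit. This is a rough sketch of those semantics using `shlex`, not the tool's actual implementation; `parse_query` and `matches` are hypothetical names.

```python
import shlex


def parse_query(query):
    """Split a query into terms; double-quoted phrases stay intact."""
    return shlex.split(query)


def matches(line, query, case_sensitive=False):
    """Return True if any term of the query occurs in the line (OR semantics)."""
    terms = parse_query(query)
    if not case_sensitive:
        line = line.lower()
        terms = [term.lower() for term in terms]
    return any(term in line for term in terms)


print(parse_query('GetLogin "op(\'login\')"'))
print(matches("user = parent().GetLogin()", "GetLogin getLogin"))
```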
### td_navigate_to
Navigate the TouchDesigner Network Editor viewport to show a specific operator. Opens the operator's parent network and centers the view on it. Use this to show the user where a problem is, or to navigate to an operator before modifying it.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the operator to navigate to, e.g. '/project1/noise1' |
## Operator Inspection
### td_get_operator_info
Get information about a specific operator (node) in TouchDesigner (TD). detail='summary': connections, non-default pars, expressions, CHOP channels (compact). detail='full': all of the above PLUS every parameter with value/default/label.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Full path to the operator, e.g. '/project1/noise1' |
| `detail` | `summary` / `full` | no | Level of detail. 'summary' = connections, expressions, non-default pars, custom pars (pulse marked), CHOP channels. 'full' = summary + all parameters. Default 'full'. |
### td_get_operators_info
Get information about multiple operators in one call. Returns an array of operator info objects. Use instead of calling td_get_operator_info multiple times.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `paths` | array | yes | Array of full operator paths, e.g. ['/project1/null1', '/project1/null2'] |
| `detail` | `summary` / `full` | no | Level of detail. Default 'summary'. |
### td_get_par_info
Get parameter names and details for a TouchDesigner operator type. Without specific pars: returns compact list of all parameters with their names, types, and menu options. With pars: returns full details (help text, menu values, style) for specific parameters. Use this when you need to know exact parameter names before setting them.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `op_type` | string | yes | TD operator type name, e.g. 'noiseTOP', 'blurTOP', 'lfoCHOP', 'compositeTOP' |
| `pars` | array | no | Optional list of specific parameter names to get full details for |
## Parameter Setting
### td_set_operator_pars
Set parameters and flags on an operator in TouchDesigner (TD). Safer than td_execute_python for simple parameter changes. Can set values and toggle bypass/viewer without writing Python code.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the operator |
| `parameters` | object | no | Key-value pairs of parameters to set |
| `bypass` | boolean | no | Set bypass state of the operator (not available on COMPs) |
| `viewer` | boolean | no | Set viewer state of the operator |
| `allowCooking` | boolean | no | Set cooking flag on a COMP. When False, internal network stops cooking (0 CPU). COMP-only. |
## Data Read/Write
### td_read_dat
Read the text content of a DAT operator in TouchDesigner (TD). Returns content with line numbers. Use to read scripts, extensions, GLSL shaders, table data.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the DAT operator |
| `start_line` | integer | no | Start line (1-based). Omit to read from beginning. |
| `end_line` | integer | no | End line (inclusive). Omit to read to end. |
### td_write_dat
Write or patch text content of a DAT operator in TouchDesigner (TD). Can do full replacement or StrReplace-style patching (old_text -> new_text). Use for editing scripts, extensions, shaders. Does NOT reinit extensions automatically.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the DAT operator |
| `text` | string | no | Full replacement text. Use this OR old_text+new_text, not both. |
| `old_text` | string | no | Text to find and replace (must be unique in the DAT) |
| `new_text` | string | no | Replacement text |
| `replace_all` | boolean | no | If true, replaces ALL occurrences of old_text (default: false, requires unique match) |
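The patching rule (a unique match is required unless `replace_all` is set) is easy to get wrong, so here is a plain-Python emulation of the behaviour as described in the table. It is a sketch of the documented semantics under my own assumptions about the error cases, not the tool's code; `patch_text` is a hypothetical name.

```python
def patch_text(content, old_text, new_text, replace_all=False):
    """Replace old_text with new_text, insisting on a unique match by default."""
    if not old_text:
        raise ValueError("old_text must not be empty")
    count = content.count(old_text)
    if count == 0:
        raise ValueError("old_text not found")
    if count > 1 and not replace_all:
        raise ValueError(f"old_text matches {count} times; pass replace_all=True")
    return content.replace(old_text, new_text)


shader = "float gain = 1.0;\nfragColor = color * gain;"
print(patch_text(shader, "float gain = 1.0;", "float gain = 2.0;"))
```

Running a check like this before calling the tool shows whether `old_text` needs more surrounding context to be unique.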
### td_read_chop
Read CHOP channel sample data. Returns channel values as arrays. Use when you need the actual sample values (animation curves, lookup tables, waveforms), not just the summary from td_get_operator_info.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the CHOP operator |
| `channels` | array | no | Channel names to read. Omit to read all channels. |
| `start` | integer | no | Start sample index (0-based). Omit to read from beginning. |
| `end` | integer | no | End sample index (inclusive). Omit to read to end. |
### td_read_textport
Read the last N lines from the TouchDesigner (TD) log/textport (console output). Use this to see errors, warnings and print output from TD.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `lines` | integer | no | Number of recent lines to return |
### td_clear_textport
Clear the MCP textport log buffer. Use this before starting a debug session or an edit-run-check loop to keep td_read_textport output focused and minimal.
No parameters (other than optional `target_instance`).
## Visual Capture
### td_get_screenshot
Get a screenshot of an operator's viewer in TouchDesigner (TD). Saves the image to a file and returns the file path. Use your file-reading tool to view the image. Shows what the operator looks like in its viewer (TOP output, CHOP waveform graph, SOP geometry, DAT table, parameter UI, etc). Use this to visually inspect any operator, or to generate images via TD for use in your project. TWO-STEP ASYNC USAGE: Step 1 — call with 'path' to start: returns {'status': 'pending', 'requestId': '...'}. Step 2 — call with 'request_id' to retrieve: returns {'file': '/tmp/.../opname_id.jpg'}. Then read the file to see the image. If step 2 still returns pending, make one other tool call then retry.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Full operator path to screenshot, e.g. '/project1/noise1'. Required for step 1. |
| `request_id` | string | no | Request ID from step 1 to retrieve the completed screenshot. |
| `max_size` | integer | no | Max pixel size for the longer side (default 512). Use 0 for original operator resolution (useful for pixel-accurate UI work). Higher values (e.g. 1024) for more detail. |
| `output_path` | string | no | Optional absolute path where the image should be saved (e.g. '/Users/me/project/render.png'). If omitted, saved to /tmp/pisang_mcp/screenshots/. Use absolute paths — TD's working directory may differ from the agent's. |
| `as_top` | boolean | no | If true, captures the operator directly as a TOP (bypasses the viewer renderer), preserving alpha/transparency. Only works for TOP operators — if the target is not a TOP, falls back to the viewer automatically. Use this when you need a clean PNG with alpha, e.g. to save a generated image for use in another project. |
| `format` | `auto` / `jpg` / `png` | no | Image format. 'auto' (default): JPEG for viewer mode, PNG for as_top=true. 'jpg': always JPEG (smaller). 'png': always PNG (lossless). |
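The two-step flow described above can be wrapped in a small polling helper. In this sketch `call_tool(name, arguments)` is a hypothetical stand-in for however your client invokes MCP tools and returns the decoded result; the retry count and delay are arbitrary, and a fixed sleep is used here in place of the "make one other tool call" advice.

```python
import time


def fetch_screenshot(call_tool, path, retries=10, delay=0.2, **options):
    """Start a screenshot request, then poll until the file path is available."""
    started = call_tool("td_get_screenshot", {"path": path, **options})
    if "file" in started:  # tolerate an immediate result
        return started["file"]
    request_id = started["requestId"]
    for _ in range(retries):
        result = call_tool("td_get_screenshot", {"request_id": request_id})
        if "file" in result:
            return result["file"]
        time.sleep(delay)  # still pending, wait before retrying
    raise TimeoutError(f"screenshot {request_id} still pending after {retries} tries")
```

Extra keyword arguments such as `max_size=1024` or `as_top=True` are passed through to step 1 unchanged.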
### td_get_screenshots
Get screenshots of multiple operators in one batch. Saves images to files and returns file paths. Use your file-reading tool to view images. TWO-STEP ASYNC USAGE: Step 1: call with the 'paths' array to start; returns {'status': 'pending', 'batchId': '...', 'total': N}. Step 2: call with 'batch_id' to retrieve; returns {'files': [{op, file}, ...]}. Then read the files to see the images. If the batch is still processing, step 2 returns {'status': 'pending', 'ready': K, 'total': N}.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `paths` | array | no | List of full operator paths to screenshot. Required for step 1. |
| `batch_id` | string | no | Batch ID from step 1 to retrieve completed screenshots. |
| `max_size` | integer | no | Max pixel size for longer side (default 512). Use 0 for original resolution. |
| `as_top` | boolean | no | If true, captures TOP operators directly (preserves alpha). Non-TOP operators fall back to viewer. |
| `output_dir` | string | no | Optional absolute path to a directory. Each screenshot is saved as `<opname>.jpg` or `<opname>.png` inside it and kept on disk. |
| `format` | `auto` / `jpg` / `png` | no | Image format. 'auto' (default): JPEG for viewer mode, PNG for as_top=true. 'jpg': always JPEG (smaller). 'png': always PNG (lossless). |
### td_get_screen_screenshot
Capture a screenshot of the actual screen via TD's screenGrabTOP. Saves the image to a file and returns the file path. Use your file-reading tool to view the image. Unlike td_get_screenshot (operator viewer), this shows what the user literally sees on their monitor — TD windows, UI panels, everything. Use when simulating mouse/keyboard input to verify what happened on screen. Workflow: td_get_screen_screenshot → read file → td_input_execute → wait idle → td_get_screen_screenshot again. TWO-STEP ASYNC: Step 1 — call without request_id: returns {'status':'pending','requestId':'...'}. Step 2 — call with request_id: returns {'file': '/tmp/.../screen_id.jpg', 'info': '...metadata...'}. Then read the file to see the image. The requestId also stays usable with td_screen_point_to_global for later coordinate lookup. crop_x/y/w/h are in ACTUAL SCREEN PIXELS (not image pixels). Crops exceeding screen bounds are auto-clamped. SMART DEFAULTS: max_size is auto when omitted — 1920 for full screen (good overview), max(crop_w,crop_h) for cropped (guarantees 1:1 scale). At 1:1 scale: screen_coord = crop_origin + image_pixel. Otherwise use the formula from metadata.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | no | Request ID from step 1 to retrieve the completed screenshot. |
| `max_size` | integer | no | Max pixel size for the longer side. Auto when omitted: 1920 for full screen, max(crop_w,crop_h) for cropped (1:1). Set explicitly to override. |
| `crop_x` | integer | no | Left edge in screen pixels. |
| `crop_y` | integer | no | Top edge in screen pixels (y=0 at top of screen). |
| `crop_w` | integer | no | Width in pixels. |
| `crop_h` | integer | no | Height in pixels. |
| `display` | integer | no | Screen index (default 0 = primary display). |
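The 1:1 rule above (screen_coord = crop_origin + image_pixel) generalizes to scaled screenshots if the image is assumed to be a uniformly scaled copy of the crop region. The sketch below encodes that assumption; `image_to_screen` is a hypothetical helper, and the formula returned in the screenshot metadata remains the authoritative source when the two disagree.

```python
def image_to_screen(image_x, image_y, crop_x, crop_y, crop_w, crop_h, image_w, image_h):
    """Map a pixel in the returned image back to actual screen pixels."""
    scale_x = crop_w / image_w  # screen pixels per image pixel, horizontally
    scale_y = crop_h / image_h  # screen pixels per image pixel, vertically
    return (crop_x + image_x * scale_x, crop_y + image_y * scale_y)


# 1:1 crop: the image pixel is simply offset by the crop origin
print(image_to_screen(10, 20, 100, 200, 400, 300, 400, 300))
# Full 3840x2160 screen downscaled to a 1920x1080 image
print(image_to_screen(100, 50, 0, 0, 3840, 2160, 1920, 1080))
```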
## Context & Focus
### td_get_focus
Get the current user focus in TouchDesigner (TD): which network is open, selected operators, current operator, and rollover (what is under the mouse cursor). IMPORTANT: when the user says 'this operator' or 'вот этот' (Russian for 'this one'), they mean the SELECTED/CURRENT operator, NOT the rollover. Rollover is just incidental mouse position and should be ignored for intent. Pass screenshots=true to immediately start a screenshot batch for all selected operators; the response includes a 'screenshots' field with batchId, which you retrieve with td_get_screenshots(batch_id=...).
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `screenshots` | boolean | no | If true, start a screenshot batch for all selected operators. Retrieve with td_get_screenshots(batch_id=...). |
| `max_size` | integer | no | Max screenshot size when screenshots=true (default 512). |
| `as_top` | boolean | no | Passed to the screenshot batch when screenshots=true. |
### td_get_errors
Find errors and warnings in TouchDesigner (TD) operators. Checks operator errors, warnings, AND broken parameter expressions (missing channels, bad references, etc). Also includes recent script errors from the log (tracebacks), grouped and deduplicated, e.g. 1000 identical mouse-move errors shown as ×1000 with one entry. If path is given, checks that operator and its children. If no path, checks the currently open network. Use '/' for entire project. Use when the user says something is broken, has errors, shows red nodes, 'горит ошибка' (Russian for 'an error is showing'), etc. TIP: call td_clear_textport before reproducing an error to keep the log focused. TIP: combine with td_get_perf when the user says 'тупит/лагает' (Russian for 'it is sluggish / it lags') to check both errors and performance.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Path to check. If omitted, checks the current network. Use '/' to scan entire project. |
| `recursive` | boolean | no | Check children recursively (default true) |
| `include_log` | boolean | no | Include recent script errors from log, grouped by unique signature (default true). Use td_clear_textport before reproducing an error to keep results focused. |
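The grouping described above (identical errors collapsed into one entry with a ×N count) can be illustrated with a few lines of Python. This is only a sketch of the idea: the real tool groups by a "unique signature" whose exact definition is not given here, so exact string equality is an assumption, and `group_errors` is a hypothetical name.

```python
from collections import Counter


def group_errors(messages):
    """Collapse repeated messages into one entry each, keeping first-seen order."""
    counts = Counter(messages)  # Counter preserves insertion order
    return [f"{msg} ×{n}" if n > 1 else msg for msg, n in counts.items()]


log = ["NameError in mouse_move"] * 1000 + ["KeyError in onStart"]
print(group_errors(log))
```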
### td_get_perf
Get performance data from TouchDesigner (TD). Returns TSV: header with fps/budget/memory summary, then slowest operators sorted by cook time. Columns: path, OPType, cpu/cook(ms), gpu/cook(ms), cpu/s, gpu/s, rate, flags. Use when the user reports lag, low FPS, slow performance, or the Russian equivalents 'тупит' ('it is sluggish') and 'тормозит' ('it is slow').
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | no | Path to profile. If omitted, profiles the current network. Use '/' for entire project. |
| `top` | integer | no | Number of slowest operators to return |
## Documentation
### td_get_docs
Get comprehensive documentation on a TouchDesigner topic. Unlike td_get_hints (compact tips), this returns in-depth reference material. Call without arguments to see available topics with descriptions. Call with a topic name to get the full documentation.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `topic` | string | no | Topic to get docs for. Omit to list available topics. |
### td_get_hints
Get TouchDesigner tips and common patterns for a topic. Call this BEFORE creating operators or writing TD Python code to learn correct parameter names, expressions, and idiomatic approaches. Available topics: animation, noise, connections, parameters, scripting, construction, ui_analysis, panel_layout, screenshots, input_simulation, undo, networking, all. IMPORTANT: always call with topic='construction' before building multi-operator setups to get correct TOP/CHOP parameter names, compositeTOP input ordering, and layout guidelines. IMPORTANT: always call with topic='input_simulation' before using td_input_execute to learn focus recovery, coordinate systems, and testing workflow.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `topic` | string | yes | Topic to get hints for. Available: 'animation', 'noise', 'connections', 'parameters', 'scripting', 'construction', 'ui_analysis', 'panel_layout', 'screenshots', 'input_simulation', 'undo', 'networking', 'all' |
### td_agents_md
Read, write, or update the agents_md documentation inside a COMP container. agents_md is a Markdown textDAT describing the container's purpose, structure, and conventions. action='read': returns content + staleness check (compares documented children vs live state). action='update': refreshes auto-generated sections (children list, connections) from live state, preserves human-written sections. action='write': sets full content, creates the DAT if missing.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the COMP container |
| `action` | `read` / `update` / `write` | yes | read=get content+staleness, update=refresh auto sections, write=set content |
| `content` | string | no | Markdown content (only for action='write') |
## Input Automation
### td_input_execute
Send a sequence of mouse/keyboard commands to TouchDesigner. Commands execute sequentially with smooth bezier movement. Returns immediately; poll td_input_status() until status='idle' before proceeding.

Command types:
- 'focus': bring TD to foreground.
- 'move': smooth mouse move. Fields: {type,x,y,duration,easing}.
- 'click': click. Fields: {type,x,y,button,hold,duration,easing}. hold=seconds to hold down. duration=smooth move before click.
- 'dblclick': double click. Fields: {type,x,y,duration}.
- 'mousedown'/'mouseup': Fields: {type,x,y,button}.
- 'key': keystroke. Fields: {type,keys}, e.g. 'ctrl+z','tab','escape','shift+f5'. Requires Accessibility permission on Mac.
- 'type': human-like typing with layout-independent Unicode and variable timing. Fields: {type,text,wpm,variance}.
- 'wait': pause. Fields: {type,duration}.
- 'scroll': human-like scroll. Fields: {type,x,y,dx,dy,steps}. Moves the mouse to (x,y) first, then sends dy (vertical, +up) and dx (horizontal, +right) as multiple ticks with natural timing. steps=4 by default.

Mouse commands may include coord_space='logical' (default) or coord_space='physical'. On macOS, 'physical' means actual screen pixels from td_get_screen_screenshot and is converted to CGEvent logical coords automatically. Top-level coord_space applies to commands that do not override it.

on_error: 'stop' (default) clears the queue on error; 'continue' skips the failed command.

IMPORTANT: call td_get_hints('input_simulation') before first use to learn focus recovery, coordinate systems, and testing workflow.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `commands` | array | yes | List of command dicts to execute in sequence. |
| `coord_space` | `logical` / `physical` | no | Default coordinate space for mouse commands that do not specify their own coord_space. 'logical' uses CGEvent coords directly. 'physical' uses actual screen pixels from td_get_screen_screenshot and is auto-converted on macOS. |
| `on_error` | `stop` / `continue` | no | What to do on error. Default 'stop'. |
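A typical interaction queues a short command list and then polls the status tool until the queue drains. The sketch below shows that shape; `call_tool(name, arguments)` is a hypothetical stand-in for your MCP client, and the coordinates, typed text, and timings are made-up example values.

```python
import time

# Example command sequence using the command types listed for this tool
commands = [
    {"type": "focus"},
    {"type": "click", "x": 640, "y": 360, "button": "left", "duration": 0.3},
    {"type": "key", "keys": "tab"},
    {"type": "type", "text": "noise", "wpm": 300},
    {"type": "wait", "duration": 0.5},
]


def run_commands(call_tool, commands, poll_delay=0.1, max_polls=100):
    """Queue the commands, then poll td_input_status until the queue is idle."""
    call_tool("td_input_execute", {"commands": commands, "on_error": "stop"})
    for _ in range(max_polls):
        status = call_tool("td_input_status", {})
        if status.get("status") == "idle":
            return status
        time.sleep(poll_delay)
    raise TimeoutError("input queue did not become idle")
```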
### td_input_status
Get current status of the td_input command queue. Poll this after td_input_execute until status='idle'. Returns: status ('idle'/'running'), current command, queue_remaining, last error.
No parameters (other than optional `target_instance`).
### td_input_clear
Clear the td_input command queue and stop current execution immediately.
No parameters (other than optional `target_instance`).
### td_op_screen_rect
Get the screen coordinates of an operator node in the network editor. Returns {x,y,w,h,cx,cy} where cx,cy is the center for clicking. Use this to find where to click on a specific operator. Only works if the operator's parent network is currently open in a network editor pane.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Full path to the operator, e.g. '/project1/myComp/noise1' |
### td_click_screen_point
Resolve a point inside a previous td_get_screen_screenshot result and click it. Pass the screenshot request_id plus either normalized u/v or image_x/image_y. Queues a td_input click using physical screen coordinates, so it works directly with screenshot-derived points. Use duration/easing to control the cursor travel before the click.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | yes | Request ID originally returned by td_get_screen_screenshot. |
| `u` | number | no | Normalized horizontal position inside the screenshot region (0=left, 1=right). Use with v. |
| `v` | number | no | Normalized vertical position inside the screenshot region (0=top, 1=bottom). Use with u. |
| `image_x` | number | no | Horizontal pixel coordinate inside the returned screenshot image. Use with image_y. |
| `image_y` | number | no | Vertical pixel coordinate inside the returned screenshot image. Use with image_x. |
| `button` | `left` / `right` / `middle` | no | Mouse button to click. Default left. |
| `hold` | number | no | Seconds to hold the mouse button down before releasing. |
| `duration` | number | no | Seconds for the cursor to travel to the target before clicking. |
| `easing` | `linear` / `ease-in` / `ease-out` / `ease-in-out` | no | Cursor movement easing for the pre-click travel. |
| `focus` | boolean | no | If true, bring TD to the front before clicking and wait briefly for focus to settle. |
### td_screen_point_to_global
Convert a point inside a previous td_get_screen_screenshot result into absolute screen coordinates. Pass the screenshot request_id plus either normalized u/v (0..1 inside that screenshot region) or image_x/image_y in returned image pixels. Returns absolute physical screen coordinates, logical coordinates, and a ready-to-use td_input_execute payload. Metadata is kept for the most recent screen screenshots so multiple agents can resolve points later by request_id.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `request_id` | string | yes | Request ID originally returned by td_get_screen_screenshot. |
| `u` | number | no | Normalized horizontal position inside the screenshot region (0=left, 1=right). Use with v. |
| `v` | number | no | Normalized vertical position inside the screenshot region (0=top, 1=bottom). Use with u. |
| `image_x` | number | no | Horizontal pixel coordinate inside the returned screenshot image. Use with image_y. |
| `image_y` | number | no | Vertical pixel coordinate inside the returned screenshot image. Use with image_x. |
## System
### td_list_instances
List all running TouchDesigner (TD) instances with active MCP servers. Returns port, project name, PID, and instanceId for each instance. Call this at the start of every conversation to discover available instances and choose which one to work with. instanceId is stable for the lifetime of a TD process and is used as target_instance in all other tool calls.
No parameters (other than optional `target_instance`).
### td_project_quit
Save and/or close the current TouchDesigner (TD) project. Can save before closing. Reports if project has unsaved changes. To close a different instance, pass target_instance=instanceId. WARNING: this will shut down the MCP server on that instance.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `save` | boolean | no | Save the project before closing. Default true. |
| `force` | boolean | no | Force close without save dialog. Default false. |
### td_reinit_extension
Reinitialize an extension on a COMP in TouchDesigner (TD). Call this AFTER finishing all code edits via td_write_dat to apply changes. Do NOT call after every small edit - batch your changes first.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `path` | string | yes | Path to the COMP with the extension |
### td_dev_log
Read the last N entries from the MCP dev log. Only available when Devmode is enabled. Shows request/response history.
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `count` | integer | no | Number of recent log entries to return |
### td_clear_dev_log
Clear the current MCP dev log by closing the old file and starting a fresh one. Only available when Devmode is enabled.
No parameters (other than optional `target_instance`).
### td_test_session
Manage test sessions, bug reports, and conversation export.

IMPORTANT: Do NOT proactively suggest exporting chat or submitting reports. These are tools for specific situations:
- export_chat / submit_report: ONLY when the user encounters a BUG with the plugin or TouchDesigner and wants to report it, or when the user explicitly asks to export the conversation. Never suggest this at session end or as a routine action.

USER PHRASES → ACTIONS:
- 'разбор тестовых сессий' / 'analyze test sessions' → list, then pull, read meta.json → index.jsonl → calls/.
- 'разбор репортов' / 'analyze user reports' → list with session='user', then pull by name.
- 'экспортируй чат' / 'export chat' → (1) export_chat_id → marker, (2) export_chat with session=marker.
- 'сообщи о проблеме' / 'report bug' → export chat, review for privacy, then submit_report with summary + tags + result_op=file_path.

ACTIONS: export_chat_id | export_chat | submit_report | start | note | import_chat | end | list | pull.
- list: default=auto-detect repo. session='user' for user_reports (dev only).
- pull: auto-searches both repos. Auto-detects dev vs user Hub access.

| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `action` | `export_chat_id` / `export_chat` / `submit_report` / `start` / `note` / `import_chat` / `end` / `list` / `pull` | yes | Action: export_chat_id / export_chat / submit_report / start / note / import_chat / end / list / pull |
| `prompt` | string | no | (start) The test prompt/task description |
| `tags` | array | no | (start) Tags for categorization, e.g. ['ui', 'layout'] |
| `text` | string | no | (note) Observation text. (import_chat) Full conversation text. |
| `outcome` | `success` / `partial` / `failure` | no | (end) Result: success / partial / failure |
| `summary` | string | no | (end) Brief summary of what happened |
| `result_op` | string | no | (end) Path to operator to save as result.tox |
| `session` | string | no | (pull) Session name or substring to download |
@@ -0,0 +1,211 @@
# MIDI / OSC Reference
External controller input and output — MIDI hardware, TouchOSC mobile UIs, OSC routing across the network.
For audio-driven MIDI patterns (track triggers from spectrum analysis), see also `audio-reactive.md`.
---
## MIDI Input — Hardware Controllers
### Discovery
List connected MIDI devices first. Use a `midiinDAT` to enumerate:
```python
mdat = root.create(midiinDAT, 'mid_devices')
# Read available device names from the DAT after one cook
```
Or via Python directly:
```python
# In td_execute_python
import td
devices = [d for d in op.MIDI.devices] # verify with td_get_docs('midi')
```
Verify the API with `td_get_docs(topic='midi')` since this varies between TD versions.
### MIDI In CHOP
Standard pattern:
```python
midi_in = root.create(midiinCHOP, 'midi_in')
midi_in.par.device = 0 # device index from discovery
midi_in.par.activechan = True
```
Output channels follow the convention `chCcN` and `chCnN`:
- `ch1c74` — channel 1, CC 74
- `ch1n60` — channel 1, note 60 (middle C) — value is velocity 0-127
**Map a CC to a parameter:**
```python
op('/project1/bloom1').par.threshold.mode = ParMode.EXPRESSION
op('/project1/bloom1').par.threshold.expr = "op('midi_in')['ch1c74'][0] / 127.0"
```
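The channel naming and the `/ 127.0` scaling above can be checked outside TouchDesigner. The following is a minimal sketch in plain Python; `parse_midi_channel` and `scale_cc` are hypothetical helper names, and only the `chCcN` / `chCnN` convention itself comes from this reference.

```python
import re

# Hypothetical helpers; only the chCcN / chCnN naming comes from the reference text.
_CHANNEL_RE = re.compile(r"^ch(\d+)([cn])(\d+)$")


def parse_midi_channel(name):
    """Split a name like 'ch1c74' into (midi_channel, kind, number)."""
    match = _CHANNEL_RE.match(name)
    if match is None:
        raise ValueError(f"not a chCcN/chCnN style channel name: {name!r}")
    channel, kind, number = match.groups()
    return int(channel), ("cc" if kind == "c" else "note"), int(number)


def scale_cc(value, low=0.0, high=1.0):
    """Map a 7-bit value (0-127) linearly onto [low, high], clamping bad input."""
    clamped = max(0, min(127, value))
    return low + (high - low) * clamped / 127.0


print(parse_midi_channel("ch1c74"))
print(scale_cc(64))
```

Scaling into a custom range (for example `scale_cc(v, 0.2, 0.8)`) is often more useful than the raw 0-1 range when a parameter only looks good in a narrow band.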
**Map a note as a trigger:**
Notes in `midiinCHOP` output velocity while held, 0 when released. Use a `triggerCHOP` to convert a held note into pulses:
```python
trig = root.create(triggerCHOP, 'note_trig')
trig.par.threshold = 1
trig.par.triggeron = 'increase'
trig.inputConnectors[0].connect(op('midi_in'))
# Filter to a single channel via a selectCHOP if desired
```
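The held-note-to-pulse conversion is a rising-edge detection. As a rough illustration of that logic (not of the `triggerCHOP` internals, which are not described here), a plain-Python sketch with a hypothetical `rising_edges` helper:

```python
def rising_edges(velocities, threshold=1):
    """Return one bool per sample, True only where the value first reaches
    `threshold` after having been below it (note-on edge)."""
    edges = []
    previous_on = False
    for velocity in velocities:
        current_on = velocity >= threshold
        edges.append(current_on and not previous_on)
        previous_on = current_on
    return edges


# A note held for two samples, released, then struck again.
print(rising_edges([0, 0, 90, 90, 0, 64]))
```

Each held note produces exactly one `True`, no matter how long it stays down, which is the behaviour you want when a note should advance a scene once per key press.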
### MIDI Learn Pattern
Build a reusable learn pattern when you don't know the controller's CC layout in advance:
1. Drop a `midiinCHOP` and `selectCHOP` after it.
2. User wiggles the controller knob.
3. Use `td_read_chop` on the midiinCHOP to identify which channel is non-zero — that's the active CC.
4. Set the `selectCHOP.par.channames` to that channel name.
5. Save the mapping to a `tableDAT` so it persists across sessions.
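The detection in steps 2-3 can be sketched in plain Python. This version compares two snapshots of channel values instead of looking for any non-zero channel, on the assumption that a knob parked at a non-zero position should not be mistaken for the one being moved. The function names and the snapshot dictionaries are hypothetical; in practice the values would come from `td_read_chop`.

```python
def detect_active_channel(before, after, min_change=1.0):
    """Return the channel whose value changed the most between two snapshots,
    or None if nothing moved by at least `min_change`."""
    best_name, best_delta = None, 0.0
    for name, value in after.items():
        delta = abs(value - before.get(name, 0.0))
        if delta > best_delta:
            best_name, best_delta = name, delta
    return best_name if best_delta >= min_change else None


def mapping_rows(mapping):
    """Turn {target: channel} into header + rows, ready to write to a table."""
    return [["target", "channel"]] + [[t, c] for t, c in sorted(mapping.items())]


before = {"ch1c74": 0, "ch1c20": 10}
after = {"ch1c74": 55, "ch1c20": 10}
learned = detect_active_channel(before, after)
print(mapping_rows({"bloom1.threshold": learned}))
```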
---
## MIDI Output
```python
midi_out = root.create(midioutCHOP, 'midi_out')
midi_out.par.device = 0
midi_out.par.outputformat = 'continuous' # 'continuous' | 'event'
# Drive an output: send out a CC mapped from any 0-1 source
src = root.create(constantCHOP, 'cc_src')
src.par.name0 = 'ch1c20'
src.par.value0 = 0.5
midi_out.inputConnectors[0].connect(src)
```
For note events specifically, use `event` mode and pulse the value with a `pulseCHOP` or `triggerCHOP`.
---
## OSC Input — Network Control
OSC is the more flexible cousin of MIDI. Used heavily for:
- TouchOSC / Lemur mobile control surfaces
- Show control systems (QLab, Watchout)
- Inter-application sync (Ableton via Max for Live, Resolume, etc.)
### OSC In CHOP
```python
osc_in = root.create(oscinCHOP, 'osc_in')
osc_in.par.port = 7000 # listen on UDP 7000
osc_in.par.localaddress = '' # empty = all interfaces
osc_in.par.queued = False # immediate vs. queued processing
```
Each incoming OSC address becomes a channel. `/scene/1/intensity` becomes a channel named `scene_1_intensity` (TD sanitizes slashes to underscores).
**Common gotcha:** TD only creates the channel after the FIRST message arrives at that address. Send a "hello" message from the controller during setup, or pre-declare channel names manually.
### OSC In DAT (for raw events)
Use an `oscinDAT` when you need full message access (multiple typed args, addresses with brackets/regex).
```python
osc_dat = root.create(oscinDAT, 'osc_events')
osc_dat.par.port = 7001
# Each row: timestamp, address, type tags, args...
```
Drive logic via a `datExecuteDAT` watching the `oscinDAT`:
```python
def onTableChange(dat):
    last = dat[dat.numRows - 1, 'message']
    parsed = last.val.split()
    addr = parsed[0]
    args = parsed[1:]
    if addr == '/scene/trigger':
        op('/project1/scene_switcher').par.index = int(args[0])
    return
```
---
## OSC Output — Sending to External Apps
```python
osc_out = root.create(oscoutCHOP, 'osc_out')
osc_out.par.netaddress = '127.0.0.1' # destination IP
osc_out.par.port = 9000
# Channel names become OSC addresses
src = root.create(constantCHOP, 'send')
src.par.name0 = 'scene/intensity' # → /scene/intensity
src.par.value0 = 0.7
osc_out.inputConnectors[0].connect(src)
```
**Channel-to-address mapping:** TD prepends `/` automatically. Use `/` in channel names to nest.
For one-shot string/typed messages, use `oscoutDAT` and call `.sendOSC(address, args)`:
```python
op('osc_out_dat').sendOSC('/scene/trigger', [1, 'fade'])
```
---
## TouchOSC / Mobile UI Pattern
Common setup for live VJ control from a phone/tablet:
1. **Configure TouchOSC layout** — assign each control an OSC address like `/vj/master`, `/vj/scene/1`, etc.
2. **Find your machine's LAN IP** — TouchOSC needs to point at it.
3. **TD listens** on `oscinCHOP.par.port = 8000` (or whichever).
4. **Map channels to params** via expressions:
```python
op('/project1/master_level').par.opacity.mode = ParMode.EXPRESSION
op('/project1/master_level').par.opacity.expr = "op('osc_in')['vj_master']"
```
5. **Send feedback** to the controller via `oscoutCHOP` — useful for syncing state across multiple devices.
---
## Network / Multi-Machine
OSC over LAN works out-of-the-box. For multi-TD-instance sync (e.g., projection cluster):
- One TD acts as **master**, broadcasts `/sync/...` over OSC
- Worker TDs run `oscinCHOP` listening on the same port
- Use UDP **broadcast address** (e.g., `192.168.1.255`) on the master's `oscoutCHOP.par.netaddress` to hit all peers
For reliability over WAN, use `webserverDAT` or `websocketDAT` with an external relay instead — UDP loss is invisible.
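The broadcast address depends on the subnet, so it is safer to compute it than to guess. A small sketch using the standard-library `ipaddress` module; the `/24` default is an assumption that matches the `192.168.1.255` example above and should be replaced with your network's actual prefix length.

```python
import ipaddress


def broadcast_address(host_ip, prefix_length=24):
    """Return the broadcast address of the subnet that contains `host_ip`."""
    network = ipaddress.ip_network(f"{host_ip}/{prefix_length}", strict=False)
    return str(network.broadcast_address)


print(broadcast_address("192.168.1.37"))
print(broadcast_address("10.0.5.20", 16))
```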
---
## Pitfalls
1. **MIDI device indexing** — device `0` is whichever device TD enumerated first. Reorder may shift it. Pin by name when possible.
2. **OSC channel names** — TD doesn't create a channel until the first message lands. New channels invalidate cooked dependents on first arrival, causing a one-frame stutter.
3. **OSC queued mode:** `par.queued = True` defers processing to a single per-frame batch. Lower latency but messages arriving in the same frame collapse to the last value. Turn it off for triggers and on for continuous knobs.
4. **MIDI clock vs. transport:** `midiinCHOP` reports clock if available. Use `midisyncCHOP` (if your TD version exposes it) or compute BPM from clock pulses (24 per quarter note).
5. **Latency** — wired MIDI is ~1-3ms. WiFi OSC is 10-30ms with jitter. Use wired for tight beat-locked work.
6. **Port conflicts** — only one process can bind a UDP port on most OS. If `oscinCHOP` shows no traffic, check that another app (Max, Ableton, etc.) isn't already listening on that port.
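For pitfall 4, the BPM computation from clock pulses is simple arithmetic: MIDI clock sends 24 pulses per quarter note, so one beat lasts 24 pulse intervals. A minimal sketch, assuming you have already collected pulse timestamps in seconds (how you collect them in TD is not covered here, and `bpm_from_clock` is a hypothetical name):

```python
MIDI_CLOCK_PPQN = 24  # MIDI clock pulses per quarter note


def bpm_from_clock(pulse_times):
    """Estimate tempo from a list of clock pulse timestamps (seconds)."""
    if len(pulse_times) < 2:
        raise ValueError("need at least two clock pulses")
    span = pulse_times[-1] - pulse_times[0]
    if span <= 0:
        raise ValueError("pulse timestamps must be increasing")
    mean_interval = span / (len(pulse_times) - 1)
    return 60.0 / (mean_interval * MIDI_CLOCK_PPQN)


# 48 pulses per second corresponds to 2 beats per second.
pulses = [i / 48 for i in range(49)]
print(round(bpm_from_clock(pulses), 3))
```

Averaging over a full beat or more smooths out the per-pulse jitter that individual intervals show.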
---
## Quick Recipes
| Goal | Op chain |
|---|---|
| Knob → bloom intensity | `midiinCHOP` → expression on `bloom.par.threshold` |
| Note → scene change | `midiinCHOP` → `triggerCHOP` → `selectCHOP` → drive `switchTOP.par.index` |
| Phone slider → master fader | TouchOSC `/master` → `oscinCHOP` → expression on output `level.par.opacity` |
| TD → Resolume scene trigger | `oscoutCHOP` channel `composition/layers/1/clips/1/connect` → Resolume listening on 7000 |
| Multi-projector sync | Master TD `oscoutCHOP` broadcast → workers `oscinCHOP` |
@@ -0,0 +1,966 @@
# TouchDesigner Network Patterns
Complete network recipes for common creative coding tasks. Each pattern shows the operator chain, MCP tool calls to build it, and key parameter settings.
## Audio-Reactive Visuals
### Pattern 1: Audio Spectrum -> Noise Displacement
Audio drives noise parameters for organic, music-responsive textures.
```
Audio File In CHOP -> Audio Spectrum CHOP -> Math CHOP (scale)
                                                 |
                                                 v (export to noise params)
Noise TOP -> Level TOP -> Feedback TOP -> Composite TOP -> Null TOP (out)
                               ^                |
                               |________________|
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="audiofileinChop", name="audio_in")
2. td_create_operator(parent="/project1", type="audiospectrumChop", name="spectrum")
3. td_create_operator(parent="/project1", type="mathChop", name="spectrum_scale")
4. td_create_operator(parent="/project1", type="noiseTop", name="noise1")
5. td_create_operator(parent="/project1", type="levelTop", name="level1")
6. td_create_operator(parent="/project1", type="feedbackTop", name="feedback1")
7. td_create_operator(parent="/project1", type="compositeTop", name="comp1")
8. td_create_operator(parent="/project1", type="nullTop", name="out")
9. td_set_operator_pars(path="/project1/audio_in",
properties={"file": "/path/to/music.wav", "play": true})
10. td_set_operator_pars(path="/project1/spectrum",
properties={"size": 512})
11. td_set_operator_pars(path="/project1/spectrum_scale",
properties={"gain": 2.0, "postoff": 0.0})
12. td_set_operator_pars(path="/project1/noise1",
properties={"type": 1, "monochrome": false, "resolutionw": 1280, "resolutionh": 720,
"period": 4.0, "harmonics": 3, "amp": 1.0})
13. td_set_operator_pars(path="/project1/level1",
properties={"opacity": 0.95, "gamma1": 0.75})
14. td_set_operator_pars(path="/project1/feedback1",
properties={"top": "/project1/comp1"})
15. td_set_operator_pars(path="/project1/comp1",
properties={"operand": 0})
16. td_execute_python: """
op('/project1/audio_in').outputConnectors[0].connect(op('/project1/spectrum'))
op('/project1/spectrum').outputConnectors[0].connect(op('/project1/spectrum_scale'))
op('/project1/noise1').outputConnectors[0].connect(op('/project1/level1'))
op('/project1/level1').outputConnectors[0].connect(op('/project1/comp1').inputConnectors[0])
op('/project1/feedback1').outputConnectors[0].connect(op('/project1/comp1').inputConnectors[1])
op('/project1/comp1').outputConnectors[0].connect(op('/project1/out'))
"""
17. td_execute_python: """
# Export spectrum values to drive noise parameters
# This makes the noise react to audio frequencies
op('/project1/noise1').par.seed.expr = "op('/project1/spectrum_scale')['chan1']"
op('/project1/noise1').par.period.expr = "tdu.remap(op('/project1/spectrum_scale')['chan1'].eval(), 0, 1, 1, 8)"
"""
```
### Pattern 2: Beat Detection -> Visual Pulses
Detect beats from audio and trigger visual events.
```
Audio Device In CHOP -> Audio Spectrum CHOP -> Math CHOP (isolate bass)
                                                   |
                                        Trigger CHOP (envelope)
                                                   |
                                       [export to visual params]
```
**Key parameter settings:**
```
# Isolate bass frequencies (20-200 Hz)
Math CHOP: chanop=1 (Add channels), range1low=0, range1high=10
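(with a 512-point FFT at 44100 Hz each bin spans about 86 Hz, so the first 10 bins cover roughly 0-860 Hz; narrow the range to the first 2-3 bins for a strict 20-200 Hz band)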
# ADSR envelope on each beat
Trigger CHOP: attack=0.02, peak=1.0, decay=0.3, sustain=0.0, release=0.1
# Export to visual: Scale, brightness, or color intensity
td_execute_python: "op('/project1/level1').par.brightness1.expr = \"1.0 + op('/project1/trigger1')['chan1'] * 0.5\""
```
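The bin-to-frequency arithmetic behind the bass range can be checked outside TD. The sketch below assumes linearly spaced bins of width `sample_rate / fft_size` with bin `k` centred at `k` times that width (standard DFT layout); how the Audio Spectrum CHOP arranges its samples may differ by version and settings, and the helper names are hypothetical.

```python
import math


def bin_width_hz(sample_rate, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate / fft_size


def bins_for_band(low_hz, high_hz, sample_rate=44100, fft_size=512):
    """Inclusive (first, last) bin indices whose centre frequencies
    fall inside [low_hz, high_hz]."""
    width = bin_width_hz(sample_rate, fft_size)
    return math.ceil(low_hz / width), math.floor(high_hz / width)


print(bin_width_hz(44100, 512))
print(bins_for_band(20, 200))
print(bins_for_band(20, 200, fft_size=2048))
```

A larger FFT size gives more bins inside the bass band, at the cost of slower response to transients.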
### Pattern 3: Multi-Band Audio -> Multi-Layer Visuals
Split audio into frequency bands, drive different visual layers per band.
```
Audio In -> Spectrum -> Audio Band EQ (3 bands: bass, mid, treble)
                              |
                    +---------+---------+
                    |         |         |
                   Bass      Mids     Treble
                    |         |         |
                Noise TOP Circle TOP Text TOP
               (slow,dark) (mid,warm) (fast,bright)
                    |         |         |
                    +-----+----+----+----+
                          |         |
                      Composite Composite
                               |
                              Out
```
### Pattern 3b: Audio-Reactive GLSL Fractal (Proven Recipe)
Complete working recipe. Plays an MP3, runs FFT, feeds spectrum as a texture into a GLSL shader where inner fractal reacts to bass, outer to treble.
**Network:**
```
AudioFileIn CHOP → AudioSpectrum CHOP (FFT=512, outlength=256)
    → Math CHOP (gain=5) → CHOP To TOP (256x2 spectrum texture, dataformat=r)
Constant TOP (time, rgba32float) → GLSL TOP (input 0=time, input 1=spectrum) → Null → MovieFileOut (record to .mov)
AudioFileIn CHOP → Audio Device Out CHOP
```
**Build via td_execute_python (one call per step for reliability):**
```python
# Step 1: Audio chain
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
audio = root.create(audiofileinCHOP, 'audio_in')
audio.par.file = '/path/to/music.mp3'
audio.par.playmode = 0 # Locked to timeline
audio.par.volume = 0.5
spec = root.create(audiospectrumCHOP, 'spectrum')
audio.outputConnectors[0].connect(spec.inputConnectors[0])
math_n = root.create(mathCHOP, 'math_norm')
spec.outputConnectors[0].connect(math_n.inputConnectors[0])
math_n.par.gain = 5 # boost signal
resamp = root.create(resampleCHOP, 'resample_spec')
math_n.outputConnectors[0].connect(resamp.inputConnectors[0])
resamp.par.timeslice = True
resamp.par.rate = 256
chop2top = root.create(choptoTOP, 'spectrum_tex')
chop2top.par.chop = resamp # CHOP To TOP has NO input connectors — use par.chop reference
# Audio output (hear the music)
aout = root.create(audiodeviceoutCHOP, 'audio_out')
audio.outputConnectors[0].connect(aout.inputConnectors[0])
result = 'audio chain ok'
""")
# Step 2: Time driver (MUST be rgba32float — see pitfalls #6)
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
time_drv = root.create(constantTOP, 'time_driver')  # avoid naming a variable `td`, which shadows the td module
time_drv.par.format = 'rgba32float'
time_drv.par.outputresolution = 'custom'
time_drv.par.resolutionw = 1
time_drv.par.resolutionh = 1
time_drv.par.colorr.expr = "absTime.seconds % 1000.0"
time_drv.par.colorg.expr = "int(absTime.seconds / 1000.0)"
result = 'time ok'
""")
# Step 3: GLSL shader (write to /tmp, load from file)
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
glsl = root.create(glslTOP, 'audio_shader')
glsl.par.outputresolution = 'custom'
glsl.par.resolutionw = 1280
glsl.par.resolutionh = 720
sd = root.create(textDAT, 'shader_code')
sd.text = open('/tmp/my_shader.glsl').read()
glsl.par.pixeldat = sd
# Wire: input 0 = time, input 1 = spectrum texture
op('/project1/time_driver').outputConnectors[0].connect(glsl.inputConnectors[0])
op('/project1/spectrum_tex').outputConnectors[0].connect(glsl.inputConnectors[1])
result = 'glsl ok'
""")
# Step 4: Output + recorder
# td_execute_python script:
td_execute_python(code="""
root = op('/project1')
out = root.create(nullTOP, 'output')
op('/project1/audio_shader').outputConnectors[0].connect(out.inputConnectors[0])
rec = root.create(moviefileoutTOP, 'recorder')
out.outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'mjpa'
result = 'output ok'
""")
```
**GLSL shader pattern (audio-reactive fractal):**
```glsl
out vec4 fragColor;
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
void main() {
// Input 0 = time (1x1 rgba32float constant)
// Input 1 = audio spectrum (256x2 CHOP To TOP, stereo — sample at y=0.25 for first channel)
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
vec2 res = uTDOutputInfo.res.zw;
vec2 uv = (gl_FragCoord.xy * 2.0 - res) / min(res.x, res.y);
vec2 uv0 = uv;
vec3 finalColor = vec3(0.0);
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;
float mids = texture(sTD2DInputs[1], vec2(0.25, 0.25)).r;
for (float i = 0.0; i < 4.0; i++) {
uv = fract(uv * (1.4 + bass * 0.3)) - 0.5;
float d = length(uv) * exp(-length(uv0));
// Sample spectrum at distance: inner=bass, outer=treble
float freq = texture(sTD2DInputs[1], vec2(clamp(d * 0.5, 0.0, 1.0), 0.25)).r;
vec3 col = palette(length(uv0) + i * 0.4 + t * 0.35);
d = sin(d * (7.0 + bass * 4.0) + t * 1.5) / 8.0;
d = abs(d);
d = pow(0.012 / d, 1.2 + freq * 0.8 + bass * 0.5);
finalColor += col * d;
}
// Tone mapping
finalColor = finalColor / (finalColor + vec3(1.0));
fragColor = TDOutputSwizzle(vec4(finalColor, 1.0));
}
```
**Key insights from testing:**
- `spectrum_tex` (CHOP To TOP) produces a 256x2 texture — x position = frequency, y=0.25 for first channel
- Sampling at `vec2(0.05, 0.25)` gets bass, `vec2(0.65, 0.25)` gets treble
- Sampling based on pixel distance (`d * 0.5`) makes inner fractal react to bass, outer to treble
- `bass * 0.3` in the `fract()` zoom makes the fractal breathe with kicks
- Math CHOP gain of 5 is needed because raw spectrum values are very small
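The `y=0.25` value follows from texel-centre arithmetic: in normalized coordinates the centre of texel `i` along an axis of `n` texels sits at `(i + 0.5) / n`, so a 2-row texture has row centres at 0.25 and 0.75. A short sketch of that arithmetic, assuming the usual OpenGL-style normalized texture coordinates (helper names are hypothetical):

```python
def texel_center(index, size):
    """Normalized coordinate of the centre of texel `index` on an axis of `size` texels."""
    if not 0 <= index < size:
        raise IndexError("texel index out of range")
    return (index + 0.5) / size


def nearest_texel(coord, size):
    """Index of the texel that contains normalized coordinate `coord`."""
    return min(size - 1, max(0, int(coord * size)))


print(texel_center(0, 2), texel_center(1, 2))   # row centres of a 256x2 texture
print(nearest_texel(0.05, 256), nearest_texel(0.65, 256))
```

Sampling exactly at `y = 0.0` lands on the texture edge rather than a row centre, which is why the shader uses 0.25.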
## Generative Art
### Pattern 4: Feedback Loop with Transform
Classic generative technique — texture evolves through recursive transformation.
```
Noise TOP -> Composite TOP -> Level TOP -> Null TOP (out)
               ^      |
               |      v
    Transform TOP <- Feedback TOP
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="noiseTop", name="seed_noise")
2. td_create_operator(parent="/project1", type="compositeTop", name="mix")
3. td_create_operator(parent="/project1", type="transformTop", name="evolve")
4. td_create_operator(parent="/project1", type="feedbackTop", name="fb")
5. td_create_operator(parent="/project1", type="levelTop", name="color_correct")
6. td_create_operator(parent="/project1", type="nullTop", name="out")
7. td_set_operator_pars(path="/project1/seed_noise",
properties={"type": 1, "monochrome": false, "period": 2.0, "amp": 0.3,
"resolutionw": 1280, "resolutionh": 720})
8. td_set_operator_pars(path="/project1/mix",
properties={"operand": 27}) # 27 = Screen blend
9. td_set_operator_pars(path="/project1/evolve",
properties={"sx": 1.003, "sy": 1.003, "rz": 0.5, "extend": 2}) # slight zoom + rotate, repeat edges
10. td_set_operator_pars(path="/project1/fb",
properties={"top": "/project1/mix"})
11. td_set_operator_pars(path="/project1/color_correct",
properties={"opacity": 0.98, "gamma1": 0.85})
12. td_execute_python: """
op('/project1/seed_noise').outputConnectors[0].connect(op('/project1/mix').inputConnectors[0])
op('/project1/fb').outputConnectors[0].connect(op('/project1/evolve'))
op('/project1/evolve').outputConnectors[0].connect(op('/project1/mix').inputConnectors[1])
op('/project1/mix').outputConnectors[0].connect(op('/project1/color_correct'))
op('/project1/color_correct').outputConnectors[0].connect(op('/project1/out'))
"""
```
**Variations:**
- Change Transform: `rz` (rotation), `sx/sy` (zoom), `tx/ty` (drift)
- Change Composite operand: Screen (glow), Add (bright), Multiply (dark)
- Add HSV Adjust in the feedback loop for color evolution
- Add Blur for dreamlike softness
- Replace Noise with a GLSL TOP for custom seed patterns
### Pattern 5: Instancing (Particle-Like Systems)
Render thousands of copies of geometry, each with unique position/rotation/scale driven by CHOP data or DATs.
```
Table DAT (instance data) -> DAT to CHOP -> Geometry COMP (instancing on) -> Render TOP
                                              + Sphere SOP (template geometry)
                                              + Constant MAT (material)
                                              + Camera COMP
                                              + Light COMP
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="tableDat", name="instance_data")
2. td_create_operator(parent="/project1", type="geometryComp", name="geo1")
3. td_create_operator(parent="/project1/geo1", type="sphereSop", name="sphere")
4. td_create_operator(parent="/project1", type="constMat", name="mat1")
5. td_create_operator(parent="/project1", type="cameraComp", name="cam1")
6. td_create_operator(parent="/project1", type="lightComp", name="light1")
7. td_create_operator(parent="/project1", type="renderTop", name="render1")
8. td_execute_python: """
import random, math
dat = op('/project1/instance_data')
dat.clear()
dat.appendRow(['tx', 'ty', 'tz', 'sx', 'sy', 'sz', 'cr', 'cg', 'cb'])
for i in range(500):
    angle = i * 0.1
    r = 2 + i * 0.01
    dat.appendRow([
        str(math.cos(angle) * r),
        str(math.sin(angle) * r),
        str((i - 250) * 0.02),
        '0.05', '0.05', '0.05',
        str(random.random()),
        str(random.random()),
        str(random.random())
    ])
"""
9. td_set_operator_pars(path="/project1/geo1",
properties={"instancing": true, "instancechop": "",
"instancedat": "/project1/instance_data",
"material": "/project1/mat1"})
10. td_set_operator_pars(path="/project1/render1",
properties={"camera": "/project1/cam1", "geometry": "/project1/geo1",
"light": "/project1/light1",
"resolutionw": 1280, "resolutionh": 720})
11. td_set_operator_pars(path="/project1/cam1",
properties={"tz": 10})
```
### Pattern 6: Reaction-Diffusion (GLSL)
Classic Gray-Scott reaction-diffusion system running on the GPU.
```
Text DAT (GLSL code) -> GLSL TOP (resolution, dat reference) -> Feedback TOP
                            ^                                       |
                            |_______________________________________|
                                        Level TOP (out)
```
**Key GLSL code (write to Text DAT via td_execute_python):**
```glsl
// Gray-Scott reaction-diffusion
uniform float feed; // 0.037
uniform float kill; // 0.06
uniform float dA; // 1.0
uniform float dB; // 0.5
layout(location = 0) out vec4 fragColor;
void main() {
vec2 uv = vUV.st;
vec2 texel = 1.0 / uTDOutputInfo.res.zw;
vec4 c = texture(sTD2DInputs[0], uv);
float a = c.r;
float b = c.g;
// Laplacian (9-point stencil)
float lA = 0.0, lB = 0.0;
for(int dx = -1; dx <= 1; dx++) {
for(int dy = -1; dy <= 1; dy++) {
float w = (dx == 0 && dy == 0) ? -1.0 : (abs(dx) + abs(dy) == 1 ? 0.2 : 0.05);
vec4 s = texture(sTD2DInputs[0], uv + vec2(dx, dy) * texel);
lA += s.r * w;
lB += s.g * w;
}
}
float reaction = a * b * b;
float newA = a + (dA * lA - reaction + feed * (1.0 - a));
float newB = b + (dB * lB + reaction - (kill + feed) * b);
fragColor = vec4(clamp(newA, 0.0, 1.0), clamp(newB, 0.0, 1.0), 0.0, 1.0);
}
```
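To sanity-check `feed` / `kill` values before running the shader, the same update rule can be run on a tiny grid in plain Python. This is a CPU sketch of the shader's arithmetic only, using the same stencil weights and default constants; the wrap-around edge handling is an assumption (the shader's edge behaviour depends on the input TOP's extend mode), and the function names are hypothetical.

```python
def laplacian_weight(dx, dy):
    """9-point stencil weights used by the shader: centre -1, edges 0.2, corners 0.05."""
    if dx == 0 and dy == 0:
        return -1.0
    return 0.2 if abs(dx) + abs(dy) == 1 else 0.05


def gray_scott_step(a, b, feed=0.037, kill=0.06, d_a=1.0, d_b=0.5):
    """Advance concentration grids `a` and `b` (lists of rows) by one step."""
    height, width = len(a), len(a[0])
    new_a = [[0.0] * width for _ in range(height)]
    new_b = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lap_a = lap_b = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    w = laplacian_weight(dx, dy)
                    ny, nx = (y + dy) % height, (x + dx) % width  # wrap edges (assumed)
                    lap_a += a[ny][nx] * w
                    lap_b += b[ny][nx] * w
            reaction = a[y][x] * b[y][x] ** 2
            va = a[y][x] + d_a * lap_a - reaction + feed * (1.0 - a[y][x])
            vb = b[y][x] + d_b * lap_b + reaction - (kill + feed) * b[y][x]
            new_a[y][x] = min(1.0, max(0.0, va))
            new_b[y][x] = min(1.0, max(0.0, vb))
    return new_a, new_b


# Seed one cell of B in a field of A and take a single step.
a = [[1.0] * 5 for _ in range(5)]
b = [[0.0] * 5 for _ in range(5)]
b[2][2] = 1.0
a, b = gray_scott_step(a, b)
print(b[2][1])
```

The stencil weights sum to zero, so a uniform field has no diffusion term and stays unchanged, which is a quick check that the kernel is wired correctly.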
## Video Processing
### Pattern 7: Video Effects Chain
Apply a chain of effects to a video file.
```
Movie File In TOP -> HSV Adjust TOP -> Level TOP -> Blur TOP -> Composite TOP -> Null TOP (out)
                                                                      ^
                                                          Text TOP ---+
```
**MCP Build Sequence:**
```
1. td_create_operator(parent="/project1", type="moviefileinTop", name="video_in")
2. td_create_operator(parent="/project1", type="hsvadjustTop", name="color")
3. td_create_operator(parent="/project1", type="levelTop", name="levels")
4. td_create_operator(parent="/project1", type="blurTop", name="blur")
5. td_create_operator(parent="/project1", type="compositeTop", name="overlay")
6. td_create_operator(parent="/project1", type="textTop", name="title")
7. td_create_operator(parent="/project1", type="nullTop", name="out")
8. td_set_operator_pars(path="/project1/video_in",
properties={"file": "/path/to/video.mp4", "play": true})
9. td_set_operator_pars(path="/project1/color",
properties={"hueoffset": 0.1, "saturationmult": 1.3})
10. td_set_operator_pars(path="/project1/levels",
properties={"brightness1": 1.1, "contrast": 1.2, "gamma1": 0.9})
11. td_set_operator_pars(path="/project1/blur",
properties={"sizex": 2, "sizey": 2})
12. td_set_operator_pars(path="/project1/title",
properties={"text": "My Video", "fontsizex": 48, "alignx": 1, "aligny": 1})
13. td_execute_python: """
chain = ['video_in', 'color', 'levels', 'blur']
for i in range(len(chain) - 1):
    op(f'/project1/{chain[i]}').outputConnectors[0].connect(op(f'/project1/{chain[i+1]}'))
op('/project1/blur').outputConnectors[0].connect(op('/project1/overlay').inputConnectors[0])
op('/project1/title').outputConnectors[0].connect(op('/project1/overlay').inputConnectors[1])
op('/project1/overlay').outputConnectors[0].connect(op('/project1/out'))
"""
```
### Pattern 8: Video Recording
Record the output to a file. **H.264/H.265 require a Commercial license** — use Motion JPEG (`mjpa`) on Non-Commercial.
```
[any TOP chain] -> Null TOP -> Movie File Out TOP
```
```python
# Build via td_execute_python:
root = op('/project1')
# Always put a Null TOP before the recorder
null_out = root.op('out') # or create one
rec = root.create(moviefileoutTOP, 'recorder')
null_out.outputConnectors[0].connect(rec.inputConnectors[0])
rec.par.type = 'movie'
rec.par.file = '/tmp/output.mov'
rec.par.videocodec = 'mjpa' # Motion JPEG — works on Non-Commercial
# Start recording (par.record is a toggle — .record() method may not exist)
rec.par.record = True
# ... let TD run for desired duration ...
rec.par.record = False
# For image sequences:
# rec.par.type = 'imagesequence'
# rec.par.imagefiletype = 'png'
# rec.par.file.expr = "'/tmp/frames/out' + me.fileSuffix" # fileSuffix REQUIRED
```
**Pitfalls:**
- Setting `par.file` + `par.record = True` in the same script may race — use `run("...", delayFrames=2)`
- `TOP.save()` called rapidly always captures the same frame — use MovieFileOut for animation
- See `pitfalls.md` #25-27 for full details
### Pattern 8b: TD → External Pipeline (FFmpeg / Python / Post-Processing)
Export TD visuals for use in another tool (ffmpeg, Python, ASCII art, etc.). This is the standard workflow when you need to composite TD output with external processing (ASCII conversion, Python shader chains, ML inference, etc.).
**Step 1: Record to video in TD**
```python
# Preferred: ProRes on macOS (visually lossless, Non-Commercial OK, ~55MB/s at 1280x720)
rec.par.videocodec = 'prores'
# Fallback for non-macOS: mjpa (Motion JPEG)
# rec.par.videocodec = 'mjpa'
rec.par.record = True
# ... wait N seconds ...
rec.par.record = False
```
**Step 2: Extract frames with ffmpeg**
```bash
# Extract all frames at 30fps (create the output directory first; ffmpeg will not create it)
mkdir -p /tmp/frames
ffmpeg -y -i /tmp/output.mov -vf 'fps=30' /tmp/frames/frame_%06d.png
# Or extract a specific duration
ffmpeg -y -i /tmp/output.mov -t 25 -vf 'fps=30' /tmp/frames/frame_%06d.png
# Or extract specific frame range
ffmpeg -y -i /tmp/output.mov -vf 'select=between(n\,0\,749)' -vsync vfr /tmp/frames/frame_%06d.png
```
**Step 3: Process frames in Python**
```python
from PIL import Image
import os
frames_dir = '/tmp/frames'
output_dir = '/tmp/processed'
os.makedirs(output_dir, exist_ok=True)
for fname in sorted(os.listdir(frames_dir)):
    if not fname.endswith('.png'):
        continue
    img = Image.open(os.path.join(frames_dir, fname))
    # ... apply your processing ...
    img.save(os.path.join(output_dir, fname))
```
**Step 4: Mux processed frames back with audio**
```bash
# Create video from processed frames + audio with fade-out
ffmpeg -y \
-framerate 30 -i /tmp/processed/frame_%06d.png \
-i /tmp/audio.mp3 \
-c:v libx264 -pix_fmt yuv420p -crf 18 \
-c:a aac -b:a 192k \
-shortest \
-af 'afade=t=out:st=23:d=2' \
/tmp/final_output.mp4
```
**Key considerations:**
- Use ProRes for the TD recording step to avoid generation loss during compositing
- Extract at the target output framerate (not TD's render framerate)
- For audio-synced content, analyze the audio file separately in Python (scipy FFT) to get per-frame features (rms, spectral bands, beats) and drive compositing parameters
- Always verify TD FPS > 0 before recording (see pitfalls #37, #38)
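As a starting point for the per-frame audio analysis mentioned above, the sketch below computes one RMS level per video frame from a list of mono samples, plus the fade-out start time used in the mux step. It uses only the standard library rather than scipy; loading the samples (for example by decoding to WAV first) is left out, and the function names are hypothetical.

```python
import math


def per_frame_rms(samples, sample_rate, fps=30):
    """RMS level of the audio slice that falls under each video frame."""
    total_frames = math.ceil(len(samples) * fps / sample_rate)
    levels = []
    for frame in range(total_frames):
        start = frame * sample_rate // fps
        end = min(len(samples), (frame + 1) * sample_rate // fps)
        window = samples[start:end]
        if window:
            levels.append(math.sqrt(sum(s * s for s in window) / len(window)))
        else:
            levels.append(0.0)
    return levels


def fade_out_start(duration_s, fade_s):
    """Start time for an `afade=t=out` filter so the fade ends with the clip."""
    return max(0.0, duration_s - fade_s)


one_second = [0.5] * 44100
print(len(per_frame_rms(one_second, 44100)))
print(fade_out_start(25, 2))
```

For a 25 second clip with a 2 second fade this gives `st=23`, matching the `afade=t=out:st=23:d=2` filter in Step 4.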
## Data Visualization
### Pattern 9: Table Data -> Bar Chart via Instancing
Visualize tabular data as a 3D bar chart.
```
Table DAT (data) -> Script DAT (transform to instance format) -> DAT to CHOP
|
Box SOP -> Geometry COMP (instancing from CHOP) -> Render TOP -> Null TOP (out)
+ PBR MAT
+ Camera COMP
+ Light COMP
```
```python
# Transform the source table into instance rows (run via td_execute_python)
td_execute_python(code="""
source = op('/project1/data_table')
instance = op('/project1/instance_transform')
instance.clear()
instance.appendRow(['tx', 'ty', 'tz', 'sx', 'sy', 'sz', 'cr', 'cg', 'cb'])
for i in range(1, source.numRows):  # row 0 is the header
    value = float(source[i, 'value'])
    name = source[i, 'name']
    instance.appendRow([
        str(i * 1.5),            # x position (spread bars)
        str(value / 2),          # y position (center bar vertically)
        '0',                     # z position
        '1', str(value), '1',    # scale (height = data value)
        '0.2', '0.6', '1.0'      # color (blue)
    ])
""")
```
### Pattern 9b: Audio-Reactive GLSL Fractal (Proven Recipe)
Audio spectrum drives a GLSL fractal shader directly via a spectrum texture input. Bass thickens inner fractal lines, mids twist rotation, highs light outer edges. **Always run discovery (SKILL.md Step 0) before using any param names from these recipes — they may differ in your TD version.**
```
Audio File In CHOP → Audio Spectrum CHOP (FFT=512, outlength=256)
    → Math CHOP (gain=10)
    → CHOP To TOP (spectrum texture, 256x2, dataformat=r)
                                            ↓ (input 1)
Constant TOP (rgba32float, time) → GLSL TOP (audio-reactive shader) → Null TOP
                        (input 0)           ↑
                                 Text DAT (shader code)
```
**Build via td_execute_python (complete working script):**
```python
# td_execute_python script:
td_execute_python(code="""
import os
root = op('/project1')
# Audio input
audio = root.create(audiofileinCHOP, 'audio_in')
audio.par.file = '/path/to/music.mp3'
audio.par.playmode = 0 # Locked to timeline
# FFT analysis (output length manually set to 256 bins)
spectrum = root.create(audiospectrumCHOP, 'spectrum')
audio.outputConnectors[0].connect(spectrum.inputConnectors[0])
spectrum.par.fftsize = '512'
spectrum.par.outputmenu = 'setmanually'
spectrum.par.outlength = 256
# THEN boost gain on the raw spectrum (NO Lag CHOP — see pitfall #34)
math = root.create(mathCHOP, 'math_norm')
spectrum.outputConnectors[0].connect(math.inputConnectors[0])
math.par.gain = 10
# Spectrum → texture (256x2 image — stereo, sample at y=0.25 for first channel)
# NOTE: choptoTOP has NO input connectors — use par.chop reference!
spec_tex = root.create(choptoTOP, 'spectrum_tex')
spec_tex.par.chop = math
spec_tex.par.dataformat = 'r'
spec_tex.par.layout = 'rowscropped'
# Time driver (rgba32float to avoid 0-1 clamping!)
time_drv = root.create(constantTOP, 'time_driver')
time_drv.par.format = 'rgba32float'
time_drv.par.outputresolution = 'custom'
time_drv.par.resolutionw = 1
time_drv.par.resolutionh = 1
time_drv.par.colorr.expr = "absTime.seconds % 1000.0"
time_drv.par.colorg.expr = "int(absTime.seconds / 1000.0)"
# GLSL shader
glsl = root.create(glslTOP, 'audio_shader')
glsl.par.outputresolution = 'custom'
glsl.par.resolutionw = 1280; glsl.par.resolutionh = 720
shader_dat = root.create(textDAT, 'shader_code')
shader_dat.text = open('/tmp/shader.glsl').read()
glsl.par.pixeldat = shader_dat
# Wire: input 0=time, input 1=spectrum
time_drv.outputConnectors[0].connect(glsl.inputConnectors[0])
spec_tex.outputConnectors[0].connect(glsl.inputConnectors[1])
# Output + audio playback
out = root.create(nullTOP, 'output')
glsl.outputConnectors[0].connect(out.inputConnectors[0])
audio_out = root.create(audiodeviceoutCHOP, 'audio_out')
audio.outputConnectors[0].connect(audio_out.inputConnectors[0])
result = 'network built'
""")
```
**GLSL shader (reads spectrum from input 1 texture):**
```glsl
out vec4 fragColor;
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
void main() {
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
vec2 res = uTDOutputInfo.res.zw;
vec2 uv = (gl_FragCoord.xy * 2.0 - res) / min(res.x, res.y);
vec2 uv0 = uv;
vec3 finalColor = vec3(0.0);
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;
float mids = texture(sTD2DInputs[1], vec2(0.25, 0.25)).r;
float highs = texture(sTD2DInputs[1], vec2(0.65, 0.25)).r;
float ca = cos(t * (0.15 + mids * 0.3));
float sa = sin(t * (0.15 + mids * 0.3));
uv = mat2(ca, -sa, sa, ca) * uv;
for (float i = 0.0; i < 4.0; i++) {
uv = fract(uv * (1.4 + bass * 0.3)) - 0.5;
float d = length(uv) * exp(-length(uv0));
float freq = texture(sTD2DInputs[1], vec2(clamp(d*0.5, 0.0, 1.0), 0.25)).r;
vec3 col = palette(length(uv0) + i * 0.4 + t * 0.35);
d = sin(d * (7.0 + bass * 4.0) + t * 1.5) / 8.0;
d = abs(d);
d = pow(0.012 / d, 1.2 + freq * 0.8 + bass * 0.5);
finalColor += col * d;
}
float glow = (0.03 + bass * 0.05) / (length(uv0) + 0.03);
finalColor += vec3(0.4, 0.1, 0.7) * glow * (0.6 + 0.4 * sin(t * 2.5));
float ring = abs(length(uv0) - 0.4 - mids * 0.3);
finalColor += vec3(0.1, 0.6, 0.8) * (0.005 / ring) * (0.2 + highs * 0.5);
finalColor *= smoothstep(0.0, 1.0, 1.0 - dot(uv0*0.55, uv0*0.55));
finalColor = finalColor / (finalColor + vec3(1.0));
fragColor = TDOutputSwizzle(vec4(finalColor, 1.0));
}
```
**How spectrum sampling drives the visual:**
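- `texture(sTD2DInputs[1], vec2(x, 0.25)).r`: x position = frequency (0=bass, 1=treble); y = 0.25 matches the row sampled in the shader
- Inside the loop, `freq` is sampled at `x = clamp(d*0.5, 0.0, 1.0)` → pixels where `d` is small read the bass end
- Pixels where `d` is larger read higher bins → react to higher frequencies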
- `bass * 0.3` on `fract()` scale → fractal zoom pulses with bass
- `bass * 4.0` on sin frequency → line density pulses with bass
- `mids * 0.3` on rotation speed → spiral twists faster during vocal/mid sections
- `highs * 0.5` on ring opacity → high-frequency sparkle on outer ring
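The sampling described above can be tried outside TouchDesigner. This is a minimal plain-Python sketch, assuming a nearest-texel lookup with no texture filtering. The 16-bin `spectrum` list and the `sample_spectrum` helper are hypothetical and exist only for illustration.

```python
def sample_spectrum(spectrum, x):
    """Nearest-texel lookup, loosely mirroring texture(sTD2DInputs[1], vec2(x, 0.25)).r"""
    x = min(max(x, 0.0), 1.0)                      # keep the coordinate in 0-1
    index = min(int(x * len(spectrum)), len(spectrum) - 1)
    return spectrum[index]

# Hypothetical 16-bin spectrum: strong lows, weaker highs.
spectrum = [1.0 / (1 + i) for i in range(16)]

bass = sample_spectrum(spectrum, 0.05)
mids = sample_spectrum(spectrum, 0.25)
highs = sample_spectrum(spectrum, 0.65)

# The same modulation terms the shader applies.
zoom = 1.4 + bass * 0.3                 # fract() scale
line_density = 7.0 + bass * 4.0         # sin frequency
rotation_speed = 0.15 + mids * 0.3      # rotation rate
ring_opacity = 0.2 + highs * 0.5        # outer ring brightness
```

Swapping in a different `spectrum` list shows how each visual parameter moves with its band.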
**Recording the output:** Use MovieFileOut TOP with `mjpa` codec (H.264 requires Commercial license). See pitfalls #25-27.
## GLSL Shaders
### Pattern 10: Custom Fragment Shader
Write a custom visual effect as a GLSL fragment shader.
```
Text DAT (shader code) -> GLSL TOP -> Level TOP -> Null TOP (out)
+ optional input TOPs for texture sampling
```
**Common GLSL uniforms available in TouchDesigner:**
```glsl
// Automatically provided by TD
uniform TDTexInfo uTDOutputInfo; // struct; .res = (1/w, 1/h, w, h), so .res.zw = resolution
// NOTE: uTDCurrentTime does NOT exist in TD 099!
// Feed time via a 1x1 Constant TOP (format=rgba32float):
// t.par.colorr.expr = "absTime.seconds % 1000.0"
// t.par.colorg.expr = "int(absTime.seconds / 1000.0)"
// Then read in GLSL:
// vec4 td = texture(sTD2DInputs[0], vec2(0.5));
// float t = td.r + td.g * 1000.0;
// Input textures (from connected TOP inputs)
uniform sampler2D sTD2DInputs[1]; // array of input samplers
// From vertex shader
in vec3 vUV; // UV coordinates (0-1 range)
```
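The two-channel time encoding in the comments above can be modelled in plain Python. This is a sketch only: `to_float32`, `encode_time`, and `decode_time` are hypothetical helper names, and `to_float32` stands in for the 32-bit storage of an `rgba32float` texture.

```python
import struct

def to_float32(value):
    """Round a Python float to 32-bit precision, as a float texture would store it."""
    return struct.unpack("f", struct.pack("f", value))[0]

def encode_time(seconds):
    """Mirror the two Constant TOP expressions (colorr and colorg)."""
    r = to_float32(seconds % 1000.0)        # absTime.seconds % 1000.0
    g = to_float32(int(seconds / 1000.0))   # int(absTime.seconds / 1000.0)
    return r, g

def decode_time(r, g):
    """Mirror the GLSL line: float t = td.r + td.g * 1000.0; (32-bit result)."""
    return to_float32(r + g * 1000.0)

seconds = 123456.789
r, g = encode_time(seconds)
t = decode_time(r, g)
```

The red channel stays below 1000, so it keeps fine resolution on its own. The reconstructed `t` is still a single 32-bit float, so its resolution is that of one float at that magnitude.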
**Example: Plasma shader (using time from input texture)**
```glsl
layout(location = 0) out vec4 fragColor;
void main() {
vec2 uv = vUV.st;
// Read time from Constant TOP input 0 (rgba32float format)
vec4 td = texture(sTD2DInputs[0], vec2(0.5));
float t = td.r + td.g * 1000.0;
float v1 = sin(uv.x * 10.0 + t);
float v2 = sin(uv.y * 10.0 + t * 0.7);
float v3 = sin((uv.x + uv.y) * 10.0 + t * 1.3);
float v4 = sin(length(uv - 0.5) * 20.0 - t * 2.0);
float v = (v1 + v2 + v3 + v4) * 0.25;
vec3 color = vec3(
sin(v * 3.14159 + 0.0) * 0.5 + 0.5,
sin(v * 3.14159 + 2.094) * 0.5 + 0.5,
sin(v * 3.14159 + 4.189) * 0.5 + 0.5
);
fragColor = TDOutputSwizzle(vec4(color, 1.0));
}
```
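For checking the math away from the GPU, the plasma shader above can be ported to plain Python for a single pixel. This is a sketch that follows the shader line by line; the function name `plasma` is hypothetical, and `u`, `v` stand for `vUV.s`, `vUV.t`.

```python
import math

def plasma(u, v, t):
    """Single-pixel port of the plasma shader; returns (r, g, b), each in 0..1."""
    v1 = math.sin(u * 10.0 + t)
    v2 = math.sin(v * 10.0 + t * 0.7)
    v3 = math.sin((u + v) * 10.0 + t * 1.3)
    v4 = math.sin(math.hypot(u - 0.5, v - 0.5) * 20.0 - t * 2.0)
    value = (v1 + v2 + v3 + v4) * 0.25          # average of four waves, -1..1
    # Phases 0, 2.094, 4.189 are roughly 0, 2*pi/3, 4*pi/3: the three colour
    # channels run the same sine, 120 degrees apart.
    return tuple(math.sin(value * 3.14159 + phase) * 0.5 + 0.5
                 for phase in (0.0, 2.094, 4.189))

center = plasma(0.5, 0.5, 0.0)
```

Every channel is `sin(...) * 0.5 + 0.5`, so the output needs no clamping before it is written to `fragColor`.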
### Pattern 11: Multi-Pass GLSL (Ping-Pong)
For effects needing state across frames (particles, fluid, cellular automata), use GLSL Multi TOP with multiple passes or a Feedback TOP loop.
```
GLSL Multi TOP (pass 0: simulation, pass 1: rendering)
  + Text DAT (simulation shader)
  + Text DAT (render shader)
  -> Level TOP -> Null TOP (out)
  ^
  |__ Feedback TOP (feeds simulation state back)
```
## Interactive Installations
### Pattern 12: Mouse/Touch -> Visual Response
```
Mouse In CHOP -> Math CHOP (normalize to 0-1) -> [export to visual params]
# Or for touch/multi-touch:
Multi Touch In DAT -> Script CHOP (parse touches) -> [export to visual params]
```
```python
# Drive the noise offset from the normalized (0-1) mouse channels
td_execute_python: """
op('/project1/noise1').par.offsetx.expr = "op('/project1/mouse_norm')['tx']"
op('/project1/noise1').par.offsety.expr = "op('/project1/mouse_norm')['ty']"
"""
```
### Pattern 13: OSC Control (from external software)
```
OSC In CHOP (port 7000) -> Select CHOP (pick channels) -> [export to visual params]
```
```
1. td_create_operator(parent="/project1", type="oscinChop", name="osc_in")
2. td_set_operator_pars(path="/project1/osc_in", properties={"port": 7000})
# OSC messages like /frequency 440 will appear as channel "frequency" with value 440
# Export to any parameter:
3. td_execute_python: "op('/project1/noise1').par.period.expr = \"op('/project1/osc_in')['frequency']\""
```
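To test the OSC In CHOP without another application, a message can be built by hand. The sketch below uses only the Python standard library and encodes a single float argument following the OSC 1.0 layout (null-terminated strings padded to 4 bytes, big-endian float32). The helper names `osc_pad`, `osc_message`, and `send_osc` are hypothetical, and the host/port defaults are assumptions matching the port used above.

```python
import socket
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate, then pad with nulls to a multiple of 4 bytes."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address: str, value: float) -> bytes:
    """Encode an OSC message carrying one float32 argument."""
    return (osc_pad(address.encode("ascii"))   # address pattern, e.g. /frequency
            + osc_pad(b",f")                   # type tag string: one float
            + struct.pack(">f", value))        # big-endian float32 payload

def send_osc(address: str, value: float,
             host: str = "127.0.0.1", port: int = 7000) -> None:
    """Send one message over UDP to the given host and port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, value), (host, port))

packet = osc_message("/frequency", 440.0)
```

Calling `send_osc("/frequency", 440.0)` while the project is running should produce the `frequency` channel described above, assuming the CHOP listens on UDP port 7000 on the same machine.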
### Pattern 14: MIDI Control (DJ/VJ)
```
MIDI In CHOP (device) -> Select CHOP -> [export channels to visual params]
```
Common MIDI mappings:
- CC channels (knobs/faders): continuous 0-127, map to float params
- Note On/Off: binary triggers, map to Trigger CHOP for envelopes
- Velocity: intensity/brightness
## Live Performance
### Pattern 15: Multi-Source VJ Setup
```
Source A (generative) ----+
Source B (video) ---------+-- Switch/Cross TOP -- Level TOP -- Window COMP (output)
Source C (camera) --------+
                                   ^
             MIDI/OSC control selects active source and crossfade
```
```python
# MIDI CC1 controls which source is active (0-127 -> 0-2)
td_execute_python: """
op('/project1/switch1').par.index.expr = "int(op('/project1/midi_in')['cc1'] / 43)"
"""
# MIDI CC2 controls crossfade between current and next
td_execute_python: """
op('/project1/cross1').par.cross.expr = "op('/project1/midi_in')['cc2'] / 127.0"
"""
```
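The two CC mappings can be verified with plain arithmetic before they go into expressions. This is a sketch outside TouchDesigner; `cc_to_unit` and `cc_to_index` are hypothetical helper names.

```python
def cc_to_unit(cc: int) -> float:
    """Scale a 7-bit MIDI CC value (0-127) to 0.0-1.0, e.g. for a crossfade."""
    return min(max(cc, 0), 127) / 127.0

def cc_to_index(cc: int, count: int) -> int:
    """Split the 0-127 range into `count` equal buckets, returning 0..count-1."""
    cc = min(max(cc, 0), 127)
    return cc * count // 128

# Bucket boundaries for three sources.
indices = [cc_to_index(cc, 3) for cc in (0, 42, 43, 85, 86, 127)]
```

Using `cc * count // 128` keeps the top of the range inside the last bucket, so a fully raised fader never selects an input that does not exist.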
### Pattern 16: Projection Mapping
```
Content TOPs ----+
|
Stoner TOP (UV mapping) -> Composite TOP -> Window COMP (projector output)
or
Kantan Mapper COMP (external .tox)
```
For projection mapping, the key is:
1. Create your visual content as standard TOPs
2. Use Stoner TOP or a third-party mapping tool to UV-map content to physical surfaces
3. Output via Window COMP to the projector
### Pattern 17: Cue System
```
Table DAT (cue list: cue_number, scene_name, duration, transition_type)
|
Script CHOP (cue state: current_cue, progress, next_cue_trigger)
|
[export to Switch/Cross TOPs to transition between scenes]
```
```python
td_execute_python: """
# Simple cue system
cue_table = op('/project1/cue_list')
cue_state = op('/project1/cue_state')
def advance_cue():
    current = int(cue_state.par.value0.val)
    next_cue = min(current + 1, cue_table.numRows - 1)
    cue_state.par.value0.val = next_cue
    # Column names match the cue list header: scene_name, duration
    scene = cue_table[next_cue, 'scene_name']
    duration = float(cue_table[next_cue, 'duration'])
    # Set crossfade target and duration
    op('/project1/cross1').par.cross.val = 0
    # Animate cross to 1.0 over duration seconds
    # (use a Timer CHOP or LFO CHOP for smooth animation)
"""
```
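The cue logic is easier to reason about when separated from the operators. Below is a minimal plain-Python model of the same idea: a clamped cue index plus a 0-1 crossfade position derived from elapsed time. The `Cue` and `CueList` classes and the sample cues are hypothetical, not part of the network above.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    scene_name: str
    duration: float  # seconds for the crossfade into this cue

class CueList:
    def __init__(self, cues):
        self.cues = list(cues)
        self.current = 0

    def advance(self):
        """Step to the next cue, clamping at the final one."""
        self.current = min(self.current + 1, len(self.cues) - 1)
        return self.cues[self.current]

    def crossfade(self, elapsed):
        """Crossfade position (0-1) after `elapsed` seconds in the current cue."""
        duration = self.cues[self.current].duration
        if duration <= 0:
            return 1.0                      # zero-length transition: cut
        return min(max(elapsed / duration, 0.0), 1.0)

cues = CueList([Cue("intro", 0.0), Cue("build", 4.0), Cue("drop", 2.0)])
first = cues.advance()        # moves from "intro" to "build"
halfway = cues.crossfade(2.0) # 2 s into a 4 s transition
cues.advance()
cues.advance()                # already on the last cue, so this is clamped
```

In the network, the value returned by `crossfade` would be what a Timer CHOP feeds to the Cross TOP.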
## Networking
### Pattern 18: OSC Server/Client
```
# Sending OSC
OSC Out CHOP -> (network) -> external application
# Receiving OSC
(network) -> OSC In CHOP -> Select CHOP -> [use values]
```
### Pattern 19: NDI Video Streaming
```
# Send video over network
[any TOP chain] -> NDI Out TOP (source name)
# Receive video from network
NDI In TOP (select source) -> [process as normal TOP]
```
### Pattern 20: WebSocket Communication
```
WebSocket DAT -> Script DAT (parse JSON messages) -> [update visuals]
```
```python
td_execute_python: """
ws = op('/project1/websocket1')
ws.par.netaddress = 'localhost'
ws.par.port = 8080
ws.par.active = True
# In the WebSocket DAT's callbacks DAT:
# def onReceiveText(dat, rowIndex, message):
#     import json
#     msg = json.loads(message)
#     op('/project1/noise1').par.seed.val = msg.get('seed', 0)
"""
```
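Messages from the network are not always well formed, so it can help to keep the parsing step defensive. This is a plain-Python sketch of that step only; `parse_seed` is a hypothetical helper, and the `seed` field is the one used in the callback above.

```python
import json

def parse_seed(message: str, default: int = 0) -> int:
    """Return the integer 'seed' field of a JSON object, or `default`."""
    try:
        payload = json.loads(message)
    except json.JSONDecodeError:
        return default                      # not JSON at all
    if not isinstance(payload, dict):
        return default                      # JSON, but not an object
    seed = payload.get("seed", default)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seed, int) and not isinstance(seed, bool):
        return seed
    return default

good = parse_seed('{"seed": 42}')
bad = parse_seed('not json')
```

Inside a callback, the returned value could be assigned to the parameter directly, so a malformed message leaves the visuals on a known default instead of raising.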
@@ -0,0 +1,106 @@
# Operator Tips
## Wireframe Rendering Pattern
Reusable setup for wireframe geometry on black background:
```python
# 1. Material
mat = root.create(wireframeMAT, 'wire_mat')
mat.par.colorr = 1.0; mat.par.colorg = 0.0; mat.par.colorb = 0.0
mat.par.linewidth = 3
# 2. Geometry COMP
geo = root.create(geometryCOMP, 'my_geo')
geo.par.rx.expr = 'absTime.seconds * 30'
geo.par.ry.expr = 'absTime.seconds * 45'
geo.par.material = mat.path # NOTE: 'material' not 'mat'
# 3. Shape inside the geo
box = geo.create(boxSOP, 'cube')
box.par.sizex = 1.5; box.par.sizey = 1.5; box.par.sizez = 1.5
# 4. Camera
cam = root.create(cameraCOMP, 'cam1')
cam.par.tx = 0; cam.par.ty = 0; cam.par.tz = 4; cam.par.fov = 45
# 5. Render TOP
render = root.create(renderTOP, 'render1')
render.par.outputresolution = 'custom'
render.par.resolutionw = 1280; render.par.resolutionh = 720
render.par.bgcolorr = 0; render.par.bgcolorg = 0; render.par.bgcolorb = 0
render.par.camera = cam.path
render.par.geometry = geo.path
# 6. Output null
out = root.create(nullTOP, 'out1')
out.inputConnectors[0].connect(render.outputConnectors[0])
```
**Key rules:**
- Class names: `wireframeMAT` not `wireframeMat` (all-caps suffix)
- Geometry SOPs/POPs go INSIDE the geo comp
- Material: `geo.par.material` not `geo.par.mat`
- Render geometry: `render.par.geometry = geo.path` (string path)
- `wireframeMAT.par.wireframemode = 'topology'` for clean wireframe (vs `'tesselated'` for triangle edges)
- Alternative: Use `renderTOP.par.overridemat` instead of per-geo material
## Feedback TOP
### Basic Structure
```
input (initial state) ──┐
                        ├──→ feedback_top ──→ processing ──→ null_out
                        │                                        ↑
                        └── par.top = 'null_out' ────────────────┘
```
### Setup Pattern
```python
# 1. Processing chain
glsl = root.create(glslTOP, 'sim')
null_out = root.create(nullTOP, 'null_out')
glsl.outputConnectors[0].connect(null_out.inputConnectors[0])
# 2. Feedback referencing null_out
feedback = root.create(feedbackTOP, 'feedback')
feedback.par.top = 'null_out'
# 3. Black initial state
const_init = root.create(constantTOP, 'const_init')
const_init.par.colorr = 0; const_init.par.colorg = 0; const_init.par.colorb = 0
# 4. Wire: initial → feedback, feedback → processing
feedback.inputConnectors[0].connect(const_init)
glsl.inputConnectors[0].connect(feedback)
# 5. Reset to apply initial state
feedback.par.resetpulse.pulse()
```
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| "Not enough sources specified" | No input connected | Connect initial state TOP |
| Unexpected initial pattern | Wrong initial state | Use Constant TOP (black) |
### Tips
1. Use float format for simulations: `glsl.par.format = 'rgba32float'`
2. Reset after setup: `feedback.par.resetpulse.pulse()`
3. Match resolutions — feedback, processing, and initial state must match
4. Soft boundary prevents edge artifacts:
```glsl
float edge = 3.0 * texel.x;
float bx = smoothstep(0.0, edge, uv.x) * smoothstep(0.0, edge, 1.0 - uv.x);
float by = smoothstep(0.0, edge, uv.y) * smoothstep(0.0, edge, 1.0 - uv.y);
value *= bx * by;
```
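The mask in tip 4 can be checked numerically. This is a plain-Python sketch assuming `texel.x` is `1 / width`, as the GLSL snippet implies; `smoothstep` reimplements the GLSL built-in and `boundary_mask` is a hypothetical helper name.

```python
def smoothstep(edge0, edge1, x):
    """GLSL smoothstep: Hermite interpolation, clamped to 0..1."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def boundary_mask(u, v, width, border_texels=3.0):
    """Mask that is 1 in the interior and fades to 0 at every edge."""
    edge = border_texels / width            # mirrors: 3.0 * texel.x
    bx = smoothstep(0.0, edge, u) * smoothstep(0.0, edge, 1.0 - u)
    by = smoothstep(0.0, edge, v) * smoothstep(0.0, edge, 1.0 - v)
    return bx * by

interior = boundary_mask(0.5, 0.5, 256)
on_edge = boundary_mask(0.0, 0.5, 256)
mid_border = boundary_mask(1.5 / 256, 0.5, 256)   # halfway into the 3-texel band
```

Multiplying the simulation value by this mask pins the outermost texels to zero, which is what stops values from wrapping or accumulating at the borders.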
### Use Cases
- **Wave Simulation** — R=height, G=velocity, black initial state
- **Cellular Automata** — white=alive, black=dead, random noise initial state
- **Trail / Motion Blur** — blend current frame with feedback, black initial
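The trail use case can be illustrated on a single row of pixels. This is a plain-Python sketch of a one-frame feedback loop; the max-with-decay blend and the `run_trail` helper are assumptions chosen for illustration, not something the setup above prescribes.

```python
def run_trail(frames, decay=0.9):
    """Blend each input frame with the previous output (one-frame feedback)."""
    previous = [0.0] * len(frames[0])       # black initial state
    outputs = []
    for frame in frames:
        # Keep whichever is brighter: the new pixel or the faded old one.
        current = [max(new, old * decay) for new, old in zip(frame, previous)]
        outputs.append(current)
        previous = current                  # what the feedback returns next frame
    return outputs

# A single bright pixel moving left to right across a 4-pixel row.
frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
trail = run_trail(frames)
```

Each output row keeps a fading copy of earlier positions, which is the behaviour a Feedback TOP loop produces when its output is blended with the live input.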
@@ -0,0 +1,239 @@
# TouchDesigner Operator Reference
## Operator Families Overview
TouchDesigner has 6 operator families. Each family processes a specific data type and is color-coded in the UI. Operators can only connect to others of the SAME family (with cross-family converters as the bridge).
## TOPs — Texture Operators (Purple)
2D image/texture processing on the GPU. The workhorse of visual output.
### Generators (create images from nothing)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Noise TOP | `noiseTop` | `type` (0-6), `monochrome`, `seed`, `period`, `harmonics`, `exponent`, `amp`, `offset`, `resolutionw/h` | Procedural noise textures — Perlin, Simplex, Sparse, etc. Foundation of generative art. |
| Constant TOP | `constantTop` | `colorr/g/b/a`, `resolutionw/h` | Solid color. Use as background or blend input. |
| Text TOP | `textTop` | `text`, `fontsizex`, `fontfile`, `alignx/y`, `colorr/g/b` | Render text to texture. Supports multi-line, word wrap. |
| Ramp TOP | `rampTop` | `type` (0=horizontal, 1=vertical, 2=radial, 3=circular), `phase`, `period` | Gradient textures for masking, color mapping. |
| Circle TOP | `circleTop` | `radiusx/y`, `centerx/y`, `width` | Circles, rings, ellipses. |
| Rectangle TOP | `rectangleTop` | `sizex/y`, `centerx/y`, `softness` | Rectangles with optional softness. |
| GLSL TOP | `glslTop` | `pixeldat` (points to the pixel shader DAT), `resolutionw/h`, `format`, custom uniforms | Custom fragment shaders. Most powerful TOP for custom visuals. |
| GLSL Multi TOP | `glslmultiTop` | `dat`, `numinputs`, `numoutputs`, `numcomputepasses` | Multi-pass GLSL with compute shaders. Advanced. |
| Render TOP | `renderTop` | `camera`, `geometry`, `lights`, `resolutionw/h` | Renders 3D scenes (SOPs + MATs + Camera/Light COMPs). |
### Filters (modify a single input)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Level TOP | `levelTop` | `opacity`, `brightness1/2`, `gamma1/2`, `contrast`, `invert`, `blacklevel/whitelevel` | Brightness, contrast, gamma, levels. Essential color correction. |
| Blur TOP | `blurTop` | `sizex/y`, `type` (0=Gaussian, 1=Box, 2=Bartlett) | Gaussian/box blur. |
| Transform TOP | `transformTop` | `tx/ty`, `sx/sy`, `rz`, `pivotx/y`, `extend` (0=Hold, 1=Zero, 2=Repeat, 3=Mirror) | Translate, scale, rotate textures. |
| HSV Adjust TOP | `hsvadjustTop` | `hueoffset`, `saturationmult`, `valuemult` | HSV color adjustments. |
| Lookup TOP | `lookupTop` | (input: texture + lookup table) | Color remapping via lookup table texture. |
| Edge TOP | `edgeTop` | `type` (0=Sobel, 1=Frei-Chen) | Edge detection. |
| Displace TOP | `displaceTop` | `scalex/y` | Pixel displacement using a second input as displacement map. |
| Flip TOP | `flipTop` | `flipx`, `flipy`, `flop` (diagonal) | Mirror/flip textures. |
| Crop TOP | `cropTop` | `cropleft/right/top/bottom` | Crop region of texture. |
| Resolution TOP | `resolutionTop` | `resolutionw/h`, `outputresolution` | Resize textures. |
| Null TOP | `nullTop` | (none significant) | Pass-through. Use for organization, referencing, feedback delay. |
| Cache TOP | `cacheTop` | `length`, `step` | Store N frames of history. Useful for trails, time effects. |
### Compositors (combine multiple inputs)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Composite TOP | `compositeTop` | `operand` (0-31: Over, Add, Multiply, Screen, etc.) | Blend two textures with standard compositing modes. |
| Over TOP | `overTop` | (simple alpha compositing) | Layer with alpha. Simpler than Composite. |
| Add TOP | `addTop` | (additive blend) | Additive blending. Great for glow, light effects. |
| Multiply TOP | `multiplyTop` | (multiplicative blend) | Multiply blend. Good for masking, darkening. |
| Switch TOP | `switchTop` | `index` (0-based) | Switch between multiple inputs by index. |
| Cross TOP | `crossTop` | `cross` (0.0-1.0) | Crossfade between two inputs. |
### I/O (input/output)
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Movie File In TOP | `moviefileinTop` | `file`, `speed`, `trim`, `index` | Load video files, image sequences. |
| Movie File Out TOP | `moviefileoutTop` | `file`, `type` (codec), `record` (toggle) | Record/export video files. |
| NDI In TOP | `ndiinTop` | `sourcename` | Receive NDI video streams. |
| NDI Out TOP | `ndioutTop` | `sourcename` | Send NDI video streams. |
| Syphon Spout In/Out TOP | `syphonspoutinTop` / `syphonspoutoutTop` | `servername` | Inter-app texture sharing. |
| Video Device In TOP | `videodeviceinTop` | `device` | Webcam/capture card input. |
| Feedback TOP | `feedbackTop` | `top` (path to the TOP to feed back) | One-frame delay feedback. Essential for recursive effects. |
### Converters
| Operator | Type Name | Direction | Use |
|----------|-----------|-----------|-----|
| CHOP to TOP | `choptopTop` | CHOP -> TOP | Visualize channel data as texture (waveform, spectrum display). |
| TOP to CHOP | `topchopChop` | TOP -> CHOP | Sample texture pixels as channel data. |
## CHOPs — Channel Operators (Green)
Time-varying numeric data: audio, animation curves, sensor data, control signals.
### Generators
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Constant CHOP | `constantChop` | `name0/value0`, `name1/value1`... | Static named channels. Control panel for parameters. |
| LFO CHOP | `lfoChop` | `frequency`, `type` (0=Sin, 1=Tri, 2=Square, 3=Ramp, 4=Pulse), `amp`, `offset`, `phase` | Low frequency oscillator. Animation driver. |
| Noise CHOP | `noiseChop` | `type`, `roughness`, `period`, `amp`, `seed`, `channels` | Smooth random motion. Organic animation. |
| Pattern CHOP | `patternChop` | `type` (0=Sine, 1=Triangle, ...), `length`, `cycles` | Generate waveform patterns. |
| Timer CHOP | `timerChop` | `length`, `play`, `cue`, `cycles` | Countdown/count-up timer with cue points. |
| Count CHOP | `countChop` | `threshold`, `limittype`, `limitmin/max` | Event counter with wrapping/clamping. |
### Audio
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Audio File In CHOP | `audiofileinChop` | `file`, `volume`, `play`, `speed`, `trim` | Play audio files. |
| Audio Device In CHOP | `audiodeviceinChop` | `device`, `channels` | Live microphone/line input. |
| Audio Spectrum CHOP | `audiospectrumChop` | `size` (FFT size), `outputformat` (0=Power, 1=Magnitude) | FFT frequency analysis. |
| Audio Band EQ CHOP | `audiobandeqChop` | `bands`, `gaindb` per band | Frequency band isolation. |
| Audio Device Out CHOP | `audiodeviceoutChop` | `device` | Audio playback output. |
### Math/Logic
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Math CHOP | `mathChop` | `preoff`, `gain`, `postoff`, `chanop` (0=Off, 1=Add, 2=Subtract, 3=Multiply...) | Math operations on channels. The Swiss army knife. |
| Logic CHOP | `logicChop` | `preop` (0=Off, 1=AND, 2=OR, 3=XOR, 4=NAND), `convert` | Boolean logic on channels. |
| Filter CHOP | `filterChop` | `type` (0=Low Pass, 1=Band Pass, 2=High Pass, 3=Notch), `cutofffreq`, `filterwidth` | Smooth, dampen, filter signals. |
| Lag CHOP | `lagChop` | `lag1/2`, `overshoot1/2` | Smooth transitions with overshoot. |
| Limit CHOP | `limitChop` | `type` (0=Clamp, 1=Loop, 2=ZigZag), `min/max` | Clamp or wrap channel values. |
| Speed CHOP | `speedChop` | (none significant) | Integrate values (velocity to position, acceleration to velocity). |
| Trigger CHOP | `triggerChop` | `attack`, `peak`, `decay`, `sustain`, `release` | ADSR envelope from trigger events. |
| Select CHOP | `selectChop` | `chop` (path), `channames` | Reference channels from another CHOP. |
| Merge CHOP | `mergeChop` | `align` (0=Extend, 1=Trim to First, 2=Trim to Shortest) | Combine channels from multiple CHOPs. |
| Null CHOP | `nullChop` | (none significant) | Pass-through for organization and referencing. |
### Input Devices
| Operator | Type Name | Use |
|----------|-----------|-----|
| Mouse In CHOP | `mouseinChop` | Mouse position, buttons, wheel. |
| Keyboard In CHOP | `keyboardinChop` | Keyboard key states. |
| MIDI In CHOP | `midiinChop` | MIDI note/CC input. |
| OSC In CHOP | `oscinChop` | OSC message input (network). |
## SOPs — Surface Operators (Blue)
3D geometry: points, polygons, NURBS, meshes.
### Generators
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Grid SOP | `gridSop` | `rows`, `cols`, `sizex/y`, `type` (0=Polygon, 1=Mesh, 2=NURBS) | Flat grid mesh. Foundation for displacement, instancing. |
| Sphere SOP | `sphereSop` | `type`, `rows`, `cols`, `radius` | Sphere geometry. |
| Box SOP | `boxSop` | `sizex/y/z` | Box geometry. |
| Torus SOP | `torusSop` | `radiusx/y`, `rows`, `cols` | Donut shape. |
| Circle SOP | `circleSop` | `type`, `radius`, `divs` | Circle/ring geometry. |
| Line SOP | `lineSop` | `dist`, `points` | Line segments. |
| Text SOP | `textSop` | `text`, `fontsizex`, `fontfile`, `extrude` | 3D text geometry. |
### Modifiers
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Transform SOP | `transformSop` | `tx/ty/tz`, `rx/ry/rz`, `sx/sy/sz` | Transform geometry (translate, rotate, scale). |
| Noise SOP | `noiseSop` | `type`, `amp`, `period`, `roughness` | Deform geometry with noise. |
| Sort SOP | `sortSop` | `ptsort`, `primsort` | Reorder points/primitives. |
| Facet SOP | `facetSop` | `unique`, `consolidate`, `computenormals` | Normals, consolidation, unique points. |
| Merge SOP | `mergeSop` | (none significant) | Combine multiple geometry inputs. |
| Null SOP | `nullSop` | (none significant) | Pass-through. |
## DATs — Data Operators (White)
Text, tables, scripts, network data.
### Core
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Table DAT | `tableDat` | (edit content directly) | Spreadsheet-like data tables. |
| Text DAT | `textDat` | (edit content directly) | Arbitrary text content. Shader code, configs, scripts. |
| Script DAT | `scriptDat` | `language` (0=Python, 1=C++) | Custom callbacks and DAT processing. |
| CHOP Execute DAT | `chopexecDat` | `chop` (path to watch), callbacks | Trigger Python on CHOP value changes. |
| DAT Execute DAT | `datexecDat` | `dat` (path to watch) | Trigger Python on DAT content changes. |
| Panel Execute DAT | `panelexecDat` | `panel` | Trigger Python on UI panel events. |
### I/O
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Web DAT | `webDat` | `url`, `fetchmethod` (0=GET, 1=POST) | HTTP requests. API integration. |
| TCP/IP DAT | `tcpipDat` | `address`, `port`, `mode` | TCP networking. |
| OSC In DAT | `oscinDat` | `port` | Receive OSC as text messages. |
| Serial DAT | `serialDat` | `port`, `baudrate` | Serial port communication (Arduino, etc.). |
| File In DAT | `fileinDat` | `file` | Read text files. |
| File Out DAT | `fileoutDat` | `file`, `write` | Write text files. |
### Conversions
| Operator | Type Name | Direction | Use |
|----------|-----------|-----------|-----|
| DAT to CHOP | `dattochopChop` | DAT -> CHOP | Convert table data to channels. |
| CHOP to DAT | `choptodatDat` | CHOP -> DAT | Convert channel data to table rows. |
| SOP to DAT | `soptodatDat` | SOP -> DAT | Extract geometry data as table. |
## MATs — Material Operators (Yellow)
Materials for 3D rendering in Render TOP / Geometry COMP.
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Phong MAT | `phongMat` | `diff_colorr/g/b`, `spec_colorr/g/b`, `shininess`, `colormap`, `normalmap` | Classic Phong shading. Simple, fast. |
| PBR MAT | `pbrMat` | `basecolorr/g/b`, `metallic`, `roughness`, `normalmap`, `emitcolorr/g/b` | Physically-based rendering. Realistic materials. |
| GLSL MAT | `glslMat` | `dat` (shader DAT), custom uniforms | Custom vertex + fragment shaders for 3D. |
| Constant MAT | `constantMat` | `colorr/g/b`, `colormap` | Flat unlit color/texture. No shading. |
| Point Sprite MAT | `pointspriteMat` | `colormap`, `scale` | Render points as camera-facing sprites. Great for particles. |
| Wireframe MAT | `wireframeMat` | `colorr/g/b`, `linewidth`, `wireframemode` | Wireframe rendering. |
| Depth MAT | `depthMat` | `near`, `far` | Render depth buffer as grayscale. |
## COMPs — Component Operators (Gray)
Containers, 3D scene elements, UI components.
### 3D Scene
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Geometry COMP | `geometryComp` | `material` (path), `instancechop` (path), `instancing` (toggle) | Renders geometry with material. Instancing host. |
| Camera COMP | `cameraComp` | `tx/ty/tz`, `rx/ry/rz`, `fov`, `near/far` | Camera for Render TOP. |
| Light COMP | `lightComp` | `lighttype` (0=Point, 1=Directional, 2=Spot, 3=Cone), `dimmer`, `colorr/g/b` | Lighting for 3D scenes. |
| Ambient Light COMP | `ambientlightComp` | `dimmer`, `colorr/g/b` | Ambient lighting. |
| Environment Light COMP | `environmentlightComp` | `envmap` | Image-based lighting (IBL). |
### Containers
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Container COMP | `containerComp` | `w`, `h`, `bgcolor1/2/3` | UI container. Holds other COMPs for panel layouts. |
| Base COMP | `baseComp` | (none significant) | Generic container. Networks-inside-networks. |
| Replicator COMP | `replicatorComp` | `template`, `operatorsdat` | Clone a template operator N times from a table. |
### Utilities
| Operator | Type Name | Key Parameters | Use |
|----------|-----------|---------------|-----|
| Window COMP | `windowComp` | `winw/h`, `winoffsetx/y`, `monitor`, `borders` | Output window for display/projection. |
| Select COMP | `selectComp` | `rowcol`, `panel` | Select and display content from elsewhere. |
| Engine COMP | `engineComp` | `tox`, `externaltox` | Load external .tox components. Sub-process isolation. |
## Cross-Family Converter Summary
| From | To | Operator | Type Name |
|------|-----|----------|-----------|
| CHOP | TOP | CHOP to TOP | `choptopTop` |
| TOP | CHOP | TOP to CHOP | `topchopChop` |
| DAT | CHOP | DAT to CHOP | `dattochopChop` |
| CHOP | DAT | CHOP to DAT | `choptodatDat` |
| SOP | CHOP | SOP to CHOP | `soptochopChop` |
| CHOP | SOP | CHOP to SOP | `choptosopSop` |
| SOP | DAT | SOP to DAT | `soptodatDat` |
| DAT | SOP | DAT to SOP | `dattosopSop` |
| SOP | TOP | (use Render TOP + Geometry COMP) | — |
| TOP | SOP | TOP to SOP | `toptosopSop` |
@@ -0,0 +1,281 @@
# Panel & UI Reference
Interactive control surfaces inside TouchDesigner — buttons, sliders, fields, custom parameter pages, panel callbacks. For HUD overlays (rendered text on visuals) see `layout-compositor.md`.
Use cases:
- VJ control rack (master fader, scene buttons, FX toggles)
- Installation operator console
- Self-contained TOX components with their own parameter UIs
- Phone-style touch interfaces displayed on a tablet
---
## Two Layers of UI
| Layer | What it is | Use for |
|---|---|---|
| **Custom Parameters** | Params on any COMP, edited like built-in TD params | Configurable components, presets, "settings" panels |
| **Panel COMPs** | Visible widgets (button, slider, field) inside a containerCOMP | Interactive control surfaces, real-time UIs |
Combine both: build a containerCOMP with panel widgets that read/write custom parameters on a parent component.
---
## Custom Parameters
Add user-editable params to any COMP. Params persist with the COMP, drive expressions, and survive save/reload.
```python
# Add a custom page to a baseCOMP
comp = op('/project1/my_component')
page = comp.appendCustomPage('Controls')
# Add typed params
page.appendFloat('Intensity', label='Intensity')[0] # returns a Par
page.appendInt('Count', label='Count')[0]
page.appendToggle('Enabled', label='Enabled')[0]
mode = page.appendMenu('Mode', label='Mode')[0]
mode.menuNames = ['off', 'soft', 'hard']      # menu entries are set on the returned Par
mode.menuLabels = ['Off', 'Soft', 'Hard']
page.appendStr('Title', label='Title')[0]
page.appendRGB('Color', label='Color') # returns 3 pars
page.appendXY('Offset', label='Offset') # returns 2 pars
page.appendPulse('Reset', label='Reset')[0]
page.appendFile('TextureFile', label='Texture')[0]
```
**Read/write from anywhere:**
```python
val = op('/project1/my_component').par.Intensity.eval()
op('/project1/my_component').par.Intensity = 0.7
```
**Drive other params via expression:**
```python
op('bloom1').par.threshold.mode = ParMode.EXPRESSION
op('bloom1').par.threshold.expr = "op('/project1/my_component').par.Intensity"
```
**Pulse handler (Reset button):**
Use a `parameterexecuteDAT` watching the COMP's pulse params. See `dat-scripting.md`.
---
## Panel COMPs — The Widgets
Each is a COMP that renders as a clickable/draggable widget inside a `containerCOMP`.
| Type | Type Name | Use |
|---|---|---|
| Button | `buttonCOMP` | Click action — momentary or toggle |
| Slider | `sliderCOMP` | Drag to set 0-1 value (1D or 2D) |
| Field | `fieldCOMP` | Text input |
| Container | `containerCOMP` | Layout + visual styling, holds children |
| Select | `selectCOMP` | Reference and display content from another COMP |
| List | `listCOMP` | Scrollable list with row callbacks |
### Button
```python
btn = root.create(buttonCOMP, 'play_btn')
btn.par.w = 120; btn.par.h = 40
btn.par.buttontype = 'momentary' # 'momentary' | 'toggleup' | 'togglepress' | 'radio'
btn.par.bgcolorr = 0.1; btn.par.bgcolorg = 0.1; btn.par.bgcolorb = 0.1
btn.par.text = 'Play'
# Read state
state = btn.panel.state # 1 when active
```
### Slider
```python
sld = root.create(sliderCOMP, 'master_fader')
sld.par.w = 60; sld.par.h = 300
sld.par.style = 'vertical' # 'vertical' | 'horizontal' | 'xy'
sld.par.value0min = 0.0
sld.par.value0max = 1.0
# Drive a parameter via expression (always-on, no callback needed)
op('/project1/master_level').par.opacity.mode = ParMode.EXPRESSION
op('/project1/master_level').par.opacity.expr = "op('master_fader').panel.u"
```
`panel.u` and `panel.v` give the 0-1 normalized values. For 2D sliders both are populated.
### Field (Text Input)
```python
fld = root.create(fieldCOMP, 'scene_name')
fld.par.w = 200; fld.par.h = 30
fld.par.fieldtype = 'string' # 'string' | 'integer' | 'float'
# Read current text
text = fld.panel.field # the text content
```
### List
For scrollable lists with selectable rows, use the docked `list1_callbacks` DAT to handle row interactions. Set up cells via the `list_definition` table DAT.
---
## Container COMP — Layout & Styling
`containerCOMP` is the primary parent for grouping widgets and arranging layouts.
```python
panel = root.create(containerCOMP, 'control_panel')
panel.par.w = 400; panel.par.h = 600
panel.par.bgcolorr = 0.05
panel.par.bgcolorg = 0.05
panel.par.bgcolorb = 0.05
panel.par.bgalpha = 1.0
# Lay out child panels in a vertical stack
panel.par.align = 'toptobottom' # 'lefttoright' | 'toptobottom' | etc.
```
Children are positioned automatically based on `par.align`. For absolute positioning use `par.align = 'fillresize'` and set each child's `par.x` / `par.y`.
### Layout Strategies
| `par.align` | Behavior |
|---|---|
| `lefttoright` | Children stacked horizontally |
| `toptobottom` | Children stacked vertically |
| `righttoleft` / `bottomtotop` | Reversed stacks |
| `fillresize` | Children sized to fill, manual positioning |
| `top` / `bottom` / `left` / `right` | Fixed positioning |
For complex grids: nest containers — vertical container holding horizontal containers.
---
## Panel Callbacks — Reacting to Events
`panelexecuteDAT` watches a panel and fires Python callbacks on user interaction.
```python
pe = root.create(panelexecuteDAT, 'btn_handler')
pe.par.panel = '/project1/play_btn'
pe.par.click = True # respond to clicks
pe.par.value = True # respond to value changes
```
In its docked DAT:
```python
def onOffToOn(panelValue):
    # Click pressed
    op('/project1/scene_timer').par.start.pulse()
    return
def onOnToOff(panelValue):
    # Click released
    return
def onValueChange(panelValue):
    # Slider drag, field change, etc.
    new_val = panelValue.val
    op('/project1/master').par.opacity = new_val
    return
```
For pulse params on custom-parameter pages, use a `parameterexecuteDAT` instead.
---
## Building a Complete VJ Control Panel
End-to-end pattern:
```python
# 1. Top-level container
panel = root.create(containerCOMP, 'vj_control')
panel.par.w = 800; panel.par.h = 200
panel.par.align = 'lefttoright'
# 2. Master fader column
master_col = panel.create(containerCOMP, 'master')
master_col.par.w = 120; master_col.par.h = 200
master_col.par.align = 'toptobottom'
master_label = master_col.create(textTOP, 'lbl')
master_label.par.text = 'MASTER'
master_sld = master_col.create(sliderCOMP, 'fader')
master_sld.par.w = 60; master_sld.par.h = 150
master_sld.par.style = 'vertical'
# 3. Scene buttons row
scene_col = panel.create(containerCOMP, 'scenes')
scene_col.par.w = 400; scene_col.par.h = 200
scene_col.par.align = 'lefttoright'
for i in range(8):
    b = scene_col.create(buttonCOMP, f'scene_{i+1}')
    b.par.w = 50; b.par.h = 50
    b.par.text = str(i+1)
    b.par.buttontype = 'radio' # only one active at a time
# 4. FX toggle column
fx_col = panel.create(containerCOMP, 'fx')
fx_col.par.w = 280; fx_col.par.h = 200
fx_col.par.align = 'toptobottom'
for fx in ['Bloom', 'CRT', 'Glitch', 'Strobe']:
    t = fx_col.create(buttonCOMP, fx.lower())
    t.par.w = 220; t.par.h = 35
    t.par.text = fx
    t.par.buttontype = 'toggleup'
# 5. Display in a window
win = root.create(windowCOMP, 'control_win')
win.par.winop = panel.path
win.par.winw = 800; win.par.winh = 200
win.par.borders = True
win.par.winopen.pulse()
```
Then wire panel values to ops via expressions or `panelexecuteDAT`s.
---
## Showing the Panel — Window or Embedded
| Approach | When |
|---|---|
| `windowCOMP` pointing at panel | Standalone control surface, separate display |
| Capture the containerCOMP with an `opviewerTOP` | Composite UI over visuals (HUD-style) |
| Open the container in a Panel pane of the network editor | Designer/dev preview only; the panel is fully interactive |
For a touch-screen tablet, use a `windowCOMP` on a second display routed to the tablet's HDMI input.
---
## Pitfalls
1. **Panel won't respond to clicks** — likely `par.disabled = True` or the parent container has `par.disableinputs = True`. Check the panel hierarchy.
2. **Slider value not updating**: `panel.u/v` reads the visual position. If you set `par.value0` directly, the visual lags. Use `par.value0` AS the source of truth and let the slider follow.
3. **Custom param won't appear** — must call `appendCustomPage` first, then append params. Pages with no params don't show.
4. **Custom param disappears on reload** — params added via Python at runtime persist only if the COMP is saved AFTER. Use a `tox` save (`comp.save('mycomp.tox')`) or commit via `td_execute_python` then save the project.
5. **Event callback fires twice** — both `onOffToOn` and `onValueChange` may fire on a single button press. Pick one to handle the action; don't double-trigger.
6. **Pulse params need `.pulse()`** — setting `par.X = True` on a pulse param does nothing. Always use `.pulse()`.
7. **Field text doesn't commit until Tab/Enter** — fields don't fire callbacks while typing. Use `par.committemode = 'all'` to fire on every keystroke (heavy).
8. **`par.text` vs panel content** — `buttonCOMP.par.text` is the LABEL on the button. The button's STATE is `panel.state` (0/1). Don't confuse them.
9. **Touch input on macOS** — multi-touch via direct touch panels works but TD's gesture handling is rudimentary. For complex multi-touch (pinch/rotate), use TouchOSC on a tablet instead.
10. **Layout doesn't update** — changing `par.align` requires the container to re-cook. Touch a child or pulse the container to trigger.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| Master fader | `sliderCOMP` (vertical) → expression on `level.par.opacity` |
| Scene picker | 8 `buttonCOMP` (radio) → `selectCHOP` on their state → drive `switchTOP.par.index` |
| FX toggle | `buttonCOMP` (toggleup) → expression on `bypass` of an FX op |
| Numeric input | `fieldCOMP` (float) → expression on target par |
| Component settings | Custom params on the component COMP, panel widgets inside drive them |
| Touch tablet UI | `containerCOMP` with widgets → `windowCOMP` to second display |
| Status display | `textTOP` rendered into the panel via `selectCOMP` |
@@ -0,0 +1,245 @@
# Particles Reference
Particle systems in TouchDesigner — modern POPs (Particle Operators) and the legacy particleSOP path.
For instancing static geometry (without per-instance lifetime/velocity), see `geometry-comp.md`. For GLSL-driven feedback simulations (no particle abstraction), see `operator-tips.md` (Feedback TOP section).
Always call `td_get_par_info` for the op type before setting params. Param names below reflect TD 2025.32 — verify before relying on them.
---
## Two Paths: POPs vs. SOPs
| | **POP family** (modern) | **particleSOP** (legacy) |
|---|---|---|
| GPU? | Yes (compute) | No (CPU) |
| Particle count | 100k+ comfortably | ~5k before slowdown |
| API style | Source / Force / Solver / Render chain | Single op with many params |
| Use for | New projects, anything intensive | Quick demos, low counts, TD < 2023 |
**Default to POPs.** Only fall back to particleSOP if a POP variant of an op you need doesn't exist.
---
## POP Pipeline Overview
A POP system is a chain of operators inside a `geometryCOMP`:
```
popSourceTOP / popSourceSOP ← spawn new particles
popForceTOP (gravity, wind, etc.)
popForceTOP (attractor, vortex, ...)
popDeleteTOP (lifetime, bounds)
popSolverTOP ← integrates velocity, updates positions
[render via geometryCOMP / glslMAT instancing]
```
POP buffers carry standard channels: `P` (position), `v` (velocity), `life`, `id`, `Cd` (color), plus any custom channels you add.
---
## Minimal POP Setup
```python
# Create a geometry COMP to hold the POP network
geo = root.create(geometryCOMP, 'particles_geo')
# 1. Source — emit particles from a point
src = geo.create(popSourceTOP, 'src')
src.par.birthrate = 500 # per second
src.par.life = 4.0 # seconds
# 2. Gravity force
grav = geo.create(popForceTOP, 'gravity')
grav.par.forcetype = 'gravity'
grav.par.fy = -9.8
# 3. Lifetime cleanup
delp = geo.create(popDeleteTOP, 'cull')
delp.par.condition = 'lifeleq' # delete when life <= 0
delp.par.value = 0
# 4. Solver
solv = geo.create(popSolverTOP, 'solver')
solv.par.timestep = 'frame'
# Wire: source → force → delete → solver
src.outputConnectors[0].connect(grav.inputConnectors[0])
grav.outputConnectors[0].connect(delp.inputConnectors[0])
delp.outputConnectors[0].connect(solv.inputConnectors[0])
```
The `popSolverTOP` output IS the live particle buffer. Render it via `glslMAT` instancing on a small SOP (sphere, point) as the "shape" of each particle.
---
## Common Forces
| Force type | Effect | Common params |
|---|---|---|
| `gravity` | Constant directional pull | `fx`, `fy`, `fz` |
| `wind` | Constant velocity addition | `wx`, `wy`, `wz` |
| `drag` | Velocity damping over time | `dragstrength` |
| `noise` | Curl-noise turbulence | `noiseamp`, `noisefreq`, `noiseseed` |
| `attractor` | Pull toward a point | `position`, `strength`, `falloff` |
| `vortex` | Swirl around an axis | `axis`, `strength` |
| `point` (custom) | GLSL-evaluated arbitrary force | via `popforceadvancedTOP` |
Stack multiple `popForceTOP`s in series — each modifies velocity additively.
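To see why the position of a force in the chain matters, here is a plain-Python sketch of a single velocity update. It is not TD code: the `gravity` and `drag` functions are hypothetical stand-ins that assume simple Euler-style updates, and the constants are illustrative only.

```python
def step(v, dt, forces):
    """Apply each force to the velocity in chain order and return the result."""
    for force in forces:
        v = force(v, dt)
    return v


def gravity(v, dt, g=-9.8):
    # Adds a constant acceleration to the velocity.
    return v + g * dt


def drag(v, dt, k=2.0):
    # Removes a fraction of the current velocity proportional to dt.
    return v * (1.0 - k * dt)


dt = 1.0 / 60.0
v_gravity_then_drag = step(0.0, dt, [gravity, drag])
v_drag_then_gravity = step(0.0, dt, [drag, gravity])
print(v_gravity_then_drag, v_drag_then_gravity)
```

With drag placed after gravity, the velocity gravity just added is damped in the same step; with drag first, gravity's contribution passes through untouched.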
---
## Lifecycle Patterns
### Continuous emission (e.g. smoke plume)
```python
src.par.birthrate = 800
src.par.life = 6.0 # variance via 'lifevariance'
src.par.lifevariance = 1.5
```
### Burst emission (e.g. explosion)
```python
src.par.birthrate = 0         # no continuous emission
src.par.burstcount = 5000     # configure the burst before firing it
src.par.life = 1.5
src.par.burst.pulse()         # one burst on demand (verify param name)
```
### Beat-triggered burst
Wire a `triggerCHOP` (from audio or MIDI) to pulse the burst:
```python
op('/project1/audio_kick_trigger').outputConnectors[0].connect(...)
# Then via a chopExecuteDAT, on each kick:
def offToOn(channel, sampleIndex, val, prev):
    op('/project1/particles_geo/src').par.burst.pulse()
    return
```
---
## Rendering Particles
### Point Sprites (simplest)
```python
# Inside the geometryCOMP, render the solver output directly
# The geo's first SOP child becomes the geometry
# But for POPs, we typically render via glslMAT on a small "shape"
# Simple billboard sphere per particle:
shape = geo.create(sphereSOP, 'shape')
shape.par.rad = 0.05
shape.par.rows = 6; shape.par.cols = 6 # low-poly to keep it fast
# Material that uses POP buffer for instancing
mat = root.create(glslMAT, 'particle_mat')
# Configure mat.par.instancingTOP = solver output (verify param name)
```
The exact instancing setup varies by TD version — call `td_get_hints(topic='popInstancing')` (or `popRender` / `instancing` — try a few).
### GPU Sprites via glslcopyPOP
For dense smoke/fire-like effects, use a `glslcopyPOP` that writes per-particle color/size from a compute shader, then render as point sprites with additive blending in a `renderTOP`.
---
## Collisions
```python
# Collision detection against an SOP
coll = geo.create(popCollideTOP, 'ground_coll')
coll.par.collidewithsop = '/project1/ground_geo' # path to colliding SOP
coll.par.bounce = 0.3
coll.par.friction = 0.1
# Insert between force and solver
```
For plane/box collisions only, use `popPlaneCollideTOP` (cheaper).
---
## Custom Per-Particle Data
Add a custom channel via `popAttribCreateTOP` (or by writing through `glslcopyPOP`):
```python
# Add a "phase" attribute initialized random per-particle, used in render shader
attr = geo.create(popAttribCreateTOP, 'add_phase')
attr.par.attribname = 'phase'
attr.par.value0 = 'rand(@id)' # expression in TD's POP attribute language
```
Then in the render shader, `texture(sTDPOPInputs[0].phase, ...)` (or whichever sampler convention your TD version uses — verify with `td_get_docs(topic='pops')`).
---
## Legacy particleSOP (Use Sparingly)
For quick demos or low-count systems:
```python
# Inside a geo
psrc = geo.create(addSOP, 'point_src') # source: a single point
psrc.par.points = '0 0 0'
part = geo.create(particleSOP, 'particles')
part.par.life = 3.0
part.par.birthrate = 100
part.par.gravityy = -9.8
part.par.windx = 0.5
part.inputConnectors[0].connect(psrc)
```
CPU-bound. Beyond ~5,000 active particles you'll see frame drops.
---
## Pitfalls
1. **Particles don't appear** — usually a render-side issue. Check via `td_get_screenshot` on the solver output (renders the buffer as a TOP-like view in newer TD). Then check the `geometryCOMP`'s render path.
2. **Burst won't fire** — verify the `burst` param is a pulse, not a toggle. Pulses must use `.pulse()`, not `= True`.
3. **Particles teleport on first frame** — uninitialized velocity. Set `popSourceTOP.par.initialvelocityX/Y/Z` or zero them explicitly.
4. **Gravity feels wrong** — TD's "1 unit" depends on your scene scale. Start with `fy = -1.0` and scale up rather than using real-world 9.8.
5. **High birthrate = stuttering** — birthrate is per-second, not per-frame. At 60fps, `birthrate = 6000` is 100/frame which is fine; `birthrate = 600000` will tank.
6. **POP solver order matters**: forces apply in the order they appear in the chain. Placing drag AFTER gravity also damps the velocity gravity just added in that step, which is usually not what you want.
7. **Instancing param name varies**: `mat.par.instancingTOP` vs. `mat.par.instanceop` vs. `mat.par.instances` differs across TD versions. Always check `td_get_par_info(op_type='glslMAT')`.
8. **Cooking dependency loops** — POP solvers create implicit time-loops. The "cook dependency loop" warning is expected and harmless for POPs.
9. **CHOP-driven force values** — when a force param is expression-bound to a CHOP (e.g., audio-reactive gravity), make sure the CHOP cooks before the solver. If not, force lags by one frame.
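The per-second versus per-frame arithmetic from the birthrate pitfall can be checked with a one-line helper. This is a plain-Python sketch; `particles_per_frame` is a hypothetical name, not a TD function.

```python
def particles_per_frame(birthrate, fps=60.0):
    """Convert a per-second birthrate into particles spawned per frame."""
    return birthrate / fps


print(particles_per_frame(6000))     # modest per-frame spawn count
print(particles_per_frame(600000))   # two orders of magnitude more work per frame
```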
---
## Performance Targets
| Particle count | Setup | Frame budget @ 60fps |
|---|---|---|
| < 1k | particleSOP fine | trivial |
| 1k - 10k | POPs, simple forces | ~2-5ms |
| 10k - 100k | POPs, GPU-only forces | ~5-15ms |
| 100k+ | `glslcopyPOP`, custom compute | ~10-25ms |
| 1M+ | Custom GPU buffer, no POP framework | depends on shader |
Use `td_get_perf` to find which op in the POP chain is the bottleneck.
---
## Quick Recipes
| Goal | Pipeline |
|---|---|
| Smoke plume | `popSourceTOP` (point) → gravity + wind + noise → `popDeleteTOP` (life) → solver → glslMAT instancing |
| Beat-triggered burst | `triggerCHOP` (audio) → chopExecuteDAT pulses `popSourceTOP.par.burst` |
| Fireworks shell | Burst at point → drag + gravity → secondary burst on lifetime threshold |
| Snow/rain | Continuous emission across the XZ plane (high y), gravity + small wind, infinite life with deletion by a bounding box |
| Sparks | Burst, very short life (0.3s), bright additive render, motion blur via feedback |
| Audio particles | Birthrate driven by audio envelope, color driven by frequency band |
@@ -0,0 +1,704 @@
# TouchDesigner MCP — Pitfalls & Lessons Learned
Hard-won knowledge from real TD sessions. Read this before building anything.
## Parameter Names
### 1. NEVER hardcode parameter names — always discover
Parameter names change between TD versions. What works in one build may not work in another. ALWAYS use td_get_par_info to discover actual names from TD.
The agent's LLM training data contains WRONG parameter names. Do not trust them.
Known historical differences (may vary further — always verify):
| What docs/training say | Actual in some versions | Notes |
|---------------|---------------|-------|
| `dat` | `pixeldat` | GLSL TOP pixel shader DAT |
| `colora` | `alpha` | Constant TOP alpha |
| `sizex` / `sizey` | `size` | Blur TOP (single value) |
| `fontr/g/b` | `fontcolorr/g/b` | Text TOP font color (r/g/b only; alpha is listed separately below) |
| `fontcolora` | `fontalpha` | Text TOP font alpha (NOT `fontcolora`) |
| `bgcolora` | `bgalpha` | Text TOP bg alpha |
| `value1name` | `vec0name` | GLSL TOP uniform name |
### 2. twozero td_execute_python response format
When calling `td_execute_python` via twozero MCP, successful responses return `(ok)` followed by FPS/error summary (e.g. `[fps 60.0/60] [0 err/0 warn]`), NOT the raw Python `result` dict. If you're parsing responses programmatically, check for the `(ok)` prefix — don't pattern-match on Python variable names from the script. Use `td_get_operator_info` or separate inspection calls to read back values.
### 3. When using td_set_operator_pars, param names must match exactly
Use td_get_par_info to discover them. The MCP tool validates parameter names and returns clear errors explaining what went wrong, unlike raw Python which crashes the whole script with tdAttributeError and stops execution. Always discover before setting.
### 4. Use `safe_par()` pattern for cross-version compatibility
```python
def safe_par(node, name, value):
    # Set node.par.<name> only if it exists; return whether it was set
    p = getattr(node.par, name, None)
    if p is not None:
        p.val = value
        return True
    return False
```
### 5. `td.tdAttributeError` crashes the whole script — use defensive access
If you do `node.par.nonexistent = value`, TD raises `tdAttributeError` and stops the entire script. Prevention is better than catching:
- Use `op()` instead of `opex()`: `op()` returns None on failure, `opex()` raises
- Use `hasattr(node.par, 'name')` before accessing any parameter
- Use `getattr(node.par, 'name', None)` with a default
- Use the `safe_par()` pattern from pitfall #4
```python
# WRONG — crashes if param doesn't exist:
node.par.nonexistent = value
# CORRECT — defensive access:
if hasattr(node.par, 'nonexistent'):
    node.par.nonexistent = value
```
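When a parameter is known to have been renamed across versions (as in the table under pitfall #1), the same defensive idea can be extended to try several candidate names in order. The sketch below is an assumption-laden illustration: `set_first_par` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a TD operator so the snippet runs outside TouchDesigner.

```python
from types import SimpleNamespace


def set_first_par(node, candidates, value):
    """Set the first parameter in `candidates` that exists on node.par.

    Returns the name that was set, or None if no candidate exists.
    """
    for name in candidates:
        p = getattr(node.par, name, None)
        if p is not None:
            p.val = value
            return name
    return None


# Stand-in for a Text TOP that only exposes `fontalpha` (not a real TD object).
fake_text_top = SimpleNamespace(
    par=SimpleNamespace(fontalpha=SimpleNamespace(val=1.0))
)
used = set_first_par(fake_text_top, ['fontcolora', 'fontalpha'], 0.5)
print(used)
```

Returning the name that matched makes it easy to log which variant the running build actually uses.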
### 6. `outputresolution` is a string menu, not an integer
```
menuNames: ['useinput','eighth','quarter','half','2x','4x','8x','fit','limit','custom','parpanel']
```
Always use the string form. Setting `outputresolution = 9` may silently fail.
```python
node.par.outputresolution = 'custom' # correct
node.par.resolutionw = 1280; node.par.resolutionh = 720
```
Discover valid values: `list(node.par.outputresolution.menuNames)`
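A small guard can turn the silent failure into a loud one by checking the value against `menuNames` before assigning. This is a sketch under assumptions: `set_menu_par` is a hypothetical helper, and `fake_par` is a stub that mimics only the `menuNames` and `val` attributes of a TD parameter.

```python
from types import SimpleNamespace


def set_menu_par(par, value):
    """Assign `value` to a menu parameter only if it is a valid menu name."""
    names = list(par.menuNames)
    if value not in names:
        raise ValueError(f"{value!r} is not a valid menu name; expected one of {names}")
    par.val = value


# Stub parameter so the sketch runs outside TouchDesigner.
fake_par = SimpleNamespace(menuNames=['useinput', 'half', 'custom'], val='useinput')
set_menu_par(fake_par, 'custom')
print(fake_par.val)
```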
## GLSL Shaders
### 7. `uTDCurrentTime` does NOT exist in GLSL TOP
There is NO built-in time uniform for GLSL TOPs. GLSL MAT has `uTDGeneral.seconds` but that's NOT available in GLSL TOP context.
**PRIMARY — GLSL TOP Vectors/Values page:**
```python
gl.par.value0name = 'uTime'
gl.par.value0.expr = "absTime.seconds"
# In GLSL: uniform float uTime;
```
**FALLBACK — Constant TOP texture (for complex time data):**
CRITICAL: set format to `rgba32float` — default 8-bit clamps to 0-1:
```python
t = root.create(constantTOP, 'time_driver')
t.par.format = 'rgba32float'
t.par.outputresolution = 'custom'
t.par.resolutionw = 1; t.par.resolutionh = 1
t.par.colorr.expr = "absTime.seconds % 1000.0"
t.outputConnectors[0].connect(glsl.inputConnectors[0])
```
### 8. GLSL compile errors are silent in the API
The GLSL TOP shows a yellow warning triangle in the UI, but `node.errors()` may return an empty string. Check `node.warnings()` too, and create an Info DAT pointed at the GLSL TOP to read the actual compiler output.
### 9. TD GLSL uses `vUV.st` not `gl_FragCoord` — and REQUIRES `TDOutputSwizzle()` on macOS
Standard GLSL patterns don't work. TD provides:
- `vUV.st` — UV coordinates (0-1)
- `uTDOutputInfo.res.zw` — resolution
- `sTD2DInputs[0]` — input textures
- `layout(location = 0) out vec4 fragColor` — output
CRITICAL on macOS: Always wrap output with `TDOutputSwizzle()`:
```glsl
fragColor = TDOutputSwizzle(color);
```
TD uses GLSL 4.60 (Vulkan backend). GLSL 3.30 and earlier are no longer supported.
### 10. Large GLSL shaders — write to temp file
GLSL code with special characters can corrupt JSON payloads. Write the shader to a temp file and load it in TD:
```python
# Agent side: write shader to /tmp/shader.glsl via write_file
# TD side:
sd = root.create(textDAT, 'shader_code')
with open('/tmp/shader.glsl', 'r') as f:
    sd.text = f.read()
```
## Node Management
### 11. Destroying nodes while iterating `root.children` causes `tdError`
The iterator is invalidated when a child is destroyed. Always snapshot first:
```python
kids = list(root.children) # snapshot
for child in kids:
    if child.valid:  # check: earlier destroys may cascade
        child.destroy()
```
### 11b. Split cleanup and creation into SEPARATE td_execute_python calls
Creating nodes with the same names you just destroyed in the SAME script causes "Invalid OP object" errors, even with a `list()` snapshot. TD's internal references can go stale within one execution context.
**WRONG (single call):**
```python
# td_execute_python:
for c in list(root.children):
    if c.valid and c.name.startswith('my_'):
        c.destroy()
# ... then create my_audio, my_shader etc. in same script → CRASHES
```
**CORRECT (two separate calls):**
```python
# Call 1: td_execute_python — clean only
for c in list(root.children):
    if c.valid and c.name.startswith('my_'):
        c.destroy()
# Call 2: td_execute_python — build (separate MCP call)
audio = root.create(audiofileinCHOP, 'my_audio')
# ... rest of build
```
### 12. Feedback TOP: use `top` parameter, NOT direct input wire
The feedbackTOP's `top` parameter references which TOP to delay. Do NOT also wire that TOP directly into the feedback's input — this creates a real cook dependency loop.
Correct setup:
```python
fb = root.create(feedbackTOP, 'fb_delay')
fb.par.top = comp.path # reference only — no wire to fb input
fb.outputConnectors[0].connect(xf) # fb output -> transform -> fade -> comp
```
The "Cook dependency loop detected" warning on the transform/fade chain is expected.
### 13. GLSL TOP auto-creates companion nodes
Creating a `glslTOP` also creates `name_pixel` (Text DAT), `name_info` (Info DAT), and `name_compute` (Text DAT). These are visible in the network. Don't be alarmed by "extra" nodes.
### 14. The default project root is `/project1`
New TD files start with `/project1` as the main container. System nodes live at `/`, `/ui`, `/sys`, `/local`, `/perform`. Don't create user nodes outside `/project1`.
### 15. Non-Commercial license caps resolution at 1280x1280
Setting `resolutionw=1920` silently clamps to 1280. Always check effective resolution after creation:
```python
n.cook(force=True)
actual = str(n.width) + 'x' + str(n.height)
```
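Rather than relying on the silent clamp, one option is to request a resolution that already fits the limit. The text does not say how TD clamps (per axis or otherwise), so this plain-Python sketch simply assumes you want to preserve aspect ratio; `fit_within` is a hypothetical helper and 1280 is the limit quoted above.

```python
def fit_within(width, height, limit=1280):
    """Scale (width, height) down uniformly so neither side exceeds `limit`."""
    scale = min(1.0, limit / width, limit / height)
    return round(width * scale), round(height * scale)


print(fit_within(1920, 1080))
print(fit_within(640, 480))
```

The result can then be assigned to `resolutionw` / `resolutionh` and compared against the effective size read back after cooking.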
## Recording & Codecs
### 16. MovieFileOut TOP: H.264/H.265/AV1 requires Commercial license
In Non-Commercial TD, these codecs produce an error. Recommended alternatives:
- `prores`: Apple ProRes, **best on macOS**, HW accelerated, NOT license-restricted. ~55MB/s at 1280x720 with visually lossless quality (ProRes is a lossy codec). **Use this as default on macOS.**
- `cineform` — GoPro Cineform, supports alpha
- `hap` — GPU-accelerated playback, large files
- `notchlc` — GPU-accelerated, good quality
- `mjpa` — Motion JPEG, legacy fallback (lossy, use only if ProRes unavailable)
For image sequences: `rec.par.type = 'imagesequence'`, `rec.par.imagefiletype = 'png'`
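The ~55MB/s ProRes figure above translates directly into disk usage, which is worth estimating before a long take. A rough plain-Python sketch follows; `recording_size_gb` is a hypothetical helper, and the default rate is just the 1280x720 number quoted in this list, so other resolutions or ProRes profiles will differ.

```python
def recording_size_gb(seconds, mb_per_second=55.0):
    """Estimate file size in GB (decimal) for a constant-rate recording."""
    return seconds * mb_per_second / 1000.0


print(recording_size_gb(90))   # a 90-second take
print(recording_size_gb(600))  # a 10-minute take
```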
### 17. MovieFileOut `.record()` method may not exist
Use the toggle parameter instead:
```python
rec.par.record = True # start recording
rec.par.record = False # stop recording
```
When setting file path and starting recording in the same script, use delayFrames:
```python
rec.par.file = '/tmp/new_output.mov'
run("op('/project1/recorder').par.record = True", delayFrames=2)
```
### 18. TOP.save() captures same frame when called rapidly
Use MovieFileOut for real-time recording. Set `project.realTime = False` for frame-accurate output.
### 19. AudioFileIn CHOP: cue and recording sequence matters
The recording sequence must be done in exact order, or the recording will be empty, audio will start mid-file, or the file won't be written.
**Proven recording sequence:**
```python
# Step 1: Stop any existing recording
rec.par.record = False
# Step 2: Reset audio to beginning
audio.par.play = False
audio.par.cue = True
audio.par.cuepoint = 0 # may need cuepointunit=0 too
# Verify: audio.par.cue.eval() should be True
# Step 3: Set output file path
rec.par.file = '/tmp/output.mov'
# Step 4: Release cue + start playing + start recording (with frame delay)
audio.par.cue = False
audio.par.play = True
audio.par.playmode = 2 # Sequential — plays once through
run("op('/project1/recorder').par.record = True", delayFrames=3)
```
**Why each step matters:**
- `rec.par.record = False` first — if a previous recording is active, setting `par.file` may fail silently
- `audio.par.cue = True` + `cuepoint = 0` — guarantees audio starts from the beginning, otherwise the spectrum may be silent for the first few seconds
- `delayFrames=3` on the record start — setting `par.file` and `par.record = True` in the same script can race; the file path needs a frame to register before recording starts
- `playmode = 2` (Sequential) — plays the file once. Use `playmode = 0` (Locked to Timeline) if you want TD's timeline to control position
## TD Python API Patterns
### 20. COMP extension setup: ext0object format is CRITICAL
`ext0object` expects a CONSTANT string (NOT expression mode):
```python
comp.par.ext0object = "op('./myExtensionDat').module.MyClassName(me)"
```
NEVER set as just the DAT name. NEVER use ParMode.EXPRESSION. ALWAYS ensure the DAT has `par.language='python'`.
### 21. td.Panel is NOT subscriptable — use attribute access
```python
comp.panel.select # correct (attribute access, returns float)
comp.panel['select'] # WRONG — 'td.Panel' object is not subscriptable
```
### 22. ALWAYS use relative paths in script callbacks
In scriptTOP/CHOP/SOP/DAT callbacks, use paths relative to `scriptOp` or `me`:
```python
root = scriptOp.parent().parent()
dat = root.op('pixel_data')
```
NEVER hardcode absolute paths like `op('/project1/myComp/child')` — they break when containers are renamed or copied.
### 23. keyboardinCHOP channel names have 'k' prefix
Channel names are `kup`, `kdown`, `kleft`, `kright`, `ka`, `kb`, etc. — NOT `up`, `down`, `a`, `b`. Always verify with:
```python
channels = [c.name for c in op('/project1/keyboard1').chans()]
```
### 24. expressCHOP cook-only properties — false positive errors
`me.inputVal`, `me.chanIndex`, `me.sampleIndex` work ONLY in cook-context. Calling `par.expr0expr.eval()` from outside always raises an error — this is NOT a real operator error. Ignore these in error scans.
### 25. td.Vertex attributes — use index access not named attributes
In TD 2025.32, `td.Vertex` objects do NOT have `.x`, `.y`, `.z` attributes:
```python
# WRONG — crashes:
vertex.x, vertex.y, vertex.z
# CORRECT — index-based:
vertex.point.P[0], vertex.point.P[1], vertex.point.P[2]
# Or for SOP point positions:
pt = sop.points()[i]
pos = pt.P # use P[0], P[1], P[2]
```
## Audio
### 26. Audio Spectrum CHOP output is weak — boost it
Raw output is very small (0.001-0.05). Use built-in boost: `spectrum.par.highfrequencyboost = 3.0`
If still weak, add Math CHOP in Range mode: `fromrangehi=0.05, torangehi=1.0`
### 27. AudioSpectrum CHOP: timeslice and sample count are the #1 gotcha
AudioSpectrum at 44100Hz with `timeslice=False` outputs the ENTIRE audio file as samples (~24000+). CHOP-to-TOP then exceeds texture resolution max and warns/fails.
**Fix:** Keep `timeslice = True` (default) for real-time per-frame FFT. Set `fftsize` to control bin count (it's a STRING enum: `'256'` not `256`).
If the CHOP-to-TOP still gets too many samples, set `layout = 'rowscropped'` on the choptoTOP.
```python
spectrum.par.fftsize = '256' # STRING, not int — enum values
spectrum.par.timeslice = True # MUST be True for real-time audio reactivity
spectex.par.layout = 'rowscropped' # handles oversized CHOP inputs
```
**resampleCHOP has NO `numsamples` param.** It uses `rate`, `start`, `end`, `method`. Don't guess — always `td_get_par_info('resampleCHOP')` first.
### 28. CHOP To TOP has NO input connectors — use par.chop reference
```python
spec_tex = root.create(choptoTOP, 'spectrum_tex')
spec_tex.par.chop = resample # correct: parameter reference
# NOT: resample.outputConnectors[0].connect(spec_tex.inputConnectors[0]) # WRONG
```
## Workflow
### 29. Always verify after building — errors are silent
Node errors and broken connections produce no output. Always check:
```python
for c in list(root.children):
    e = c.errors()
    w = c.warnings()
    if e: print(c.name, 'ERR:', e)
    if w: print(c.name, 'WARN:', w)
```
### 30. Window COMP param for display target is `winop`
```python
win = root.create(windowCOMP, 'display')
win.par.winop = '/project1/logo_out'
win.par.winw = 1280; win.par.winh = 720
win.par.winopen.pulse()
```
### 31. `sample()` returns frozen pixels in rapid calls
`out.sample(x, y)` returns pixels from a single cook snapshot. Compare samples with 2+ second delays, or use screencapture on the display window.
### 32. Audio-reactive GLSL: TD-side pipeline
For audio-synced visuals: AudioFileIn → AudioSpectrum(timeslice=True, fftsize='256') → Math(gain=5) → choptoTOP(par.chop=math, layout='rowscropped') → GLSL input. The shader samples `sTD2DInputs[1]` at different x positions for bass/mid/hi. Record the TD output with MovieFileOut.
**Key gotcha:** AudioFileIn must be cued (`par.cue=True` → `par.cuepulse.pulse()`) then uncued (`par.cue=False`, `par.play=True`) before recording starts. Otherwise the spectrum is silent for the first few seconds.
### 33. twozero MCP: prefer native tools
**Always prefer native MCP tools over td_execute_python:**
- `td_create_operator` over `root.create()` scripts (handles viewport positioning)
- `td_set_operator_pars` over `node.par.X = Y` scripts (validates param names)
- `td_get_par_info` over temp-node discovery dance (instant, no cleanup)
- `td_get_errors` over manual `c.errors()` loops
- `td_get_focus` for context awareness (no equivalent in old method)
Only fall back to `td_execute_python` for multi-step logic (wiring chains, conditional builds, loops).
### 34. twozero td_execute_python response wrapping
twozero wraps `td_execute_python` responses with status info: `(ok)\n\n[fps 60.0/60] [0 err/0 warn]`. Your Python `result` variable value may not appear verbatim in the response text. If you need to check results programmatically, use `print()` statements in the script — they appear in the response. Don't rely on string-matching the `result` dict.
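If the status suffix needs to be read programmatically, a small parser can split it from the printed output. This sketch assumes the suffix looks exactly like the `[fps 60.0/60] [0 err/0 warn]` example above; the real format may vary, and `parse_response` is a hypothetical helper name.

```python
import re

# Pattern based on the example status line quoted in the text (an assumption).
STATUS_RE = re.compile(
    r"\[fps (?P<fps>[\d.]+)/(?P<target>[\d.]+)\]\s*"
    r"\[(?P<err>\d+) err/(?P<warn>\d+) warn\]"
)


def parse_response(text):
    """Split a response into (body, status); status is None if not found."""
    m = STATUS_RE.search(text)
    if m is None:
        return text.strip(), None
    status = {
        "fps": float(m.group("fps")),
        "target_fps": float(m.group("target")),
        "errors": int(m.group("err")),
        "warnings": int(m.group("warn")),
    }
    return text[:m.start()].strip(), status


body, status = parse_response("(ok)\n\n[fps 60.0/60] [0 err/0 warn]")
print(body, status)
```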
### 35. Audio-reactive chain: DO NOT use Lag CHOP or Filter CHOP for spectrum smoothing
The Derivative docs and tutorials suggest using Lag CHOP (lag1=0.2, lag2=0.5) to smooth raw FFT output before passing to a shader. **This does NOT work with AudioSpectrum → CHOP to TOP → GLSL.**
What happens: Lag CHOP operates in timeslice mode. A 256-sample spectrum input gets expanded to 1600-2400 samples. The Lag averaging drives all values to near-zero (~1e-06). The CHOP to TOP produces a 2400x2 texture instead of 256x2. The shader receives effectively zero audio data.
**The correct chain is: Spectrum(outlength=256) → Math(gain=10) → CHOPtoTOP → GLSL.** No CHOP smoothing at all. If you need smoothing, do it in the GLSL shader via temporal lerp with a feedback texture.
Verified values with audio playing:
- Without Lag CHOP: bass bins = 5.0-5.4, mid bins = 1.0-1.7 (strong, usable)
- With Lag CHOP: ALL bins = 0.000001-0.00004 (dead, zero audio reactivity)
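The temporal lerp suggested above as a replacement for CHOP smoothing amounts to an exponential moving average per bin. The following plain-Python sketch shows the math only; in practice it would live in the shader (for example via GLSL `mix()` against a feedback texture), and `alpha` is a hypothetical tuning value.

```python
def smooth_bins(previous, current, alpha=0.2):
    """Move each bin a fraction `alpha` of the way from its old to its new value."""
    return [p + alpha * (c - p) for p, c in zip(previous, current)]


state = [0.0, 0.0, 0.0]
for frame in ([5.0, 1.0, 0.0], [5.0, 1.0, 0.0], [0.0, 0.0, 0.0]):
    state = smooth_bins(state, frame)
    print(state)
```

A smaller `alpha` gives slower, smoother response; `alpha = 1.0` disables smoothing entirely.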
### 36. AudioSpectrum Output Length: set manually to avoid CHOP to TOP overflow
AudioSpectrum in Visualization mode with FFT 8192 outputs 22,050 samples by default (1 per Hz, 0-22050). CHOP to TOP cannot handle this; you get "Number of samples exceeded texture resolution max".
Fix: `spectrum.par.outputmenu = 'setmanually'` and `spectrum.par.outlength = 256`. This gives 256 frequency bins — plenty for visual FFT.
DO NOT set `timeslice = False` as a workaround — that processes the entire audio file at once and produces even more samples.
### 37. GLSL spectrum texture from CHOP to TOP is 256x2 not 256x1
AudioSpectrum outputs 2 channels (stereo: chan1, chan2). CHOP to TOP with `dataformat='r'` creates a 256x2 texture — one row per channel. Sample the first channel at `y=0.25` (center of first row), NOT `y=0.5` (boundary between rows):
```glsl
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;  // correct: center of first row
// WRONG (samples the boundary between rows):
// float bass = texture(sTD2DInputs[1], vec2(0.05, 0.5)).r;
```
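The `y=0.25` value is just the vertical center of row 0 in a 2-row texture. For other channel counts the same rule applies, as this plain-Python sketch shows (`row_center_v` is a hypothetical helper name):

```python
def row_center_v(row, num_rows):
    """Normalized v coordinate at the vertical center of a texture row."""
    return (row + 0.5) / num_rows


print(row_center_v(0, 2))  # first channel of a stereo spectrum
print(row_center_v(1, 2))  # second channel
```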
### 38. FPS=0 doesn't mean ops aren't cooking — check play state
TD can show `fps:0` in `td_get_perf` while ops still cook and `TOP.save()` still produces valid screenshots. The two most common causes:
**a) Project is paused (playbar stopped).** TD's playbar can be toggled with spacebar. The `root` at `/` has no `.playbar` attribute (it's on the perform COMP). The easiest fix is sending a spacebar keypress via `td_input_execute`, though this tool can sometimes error. As a workaround, `TOP.save()` always works regardless of play state — use it to verify rendering is actually happening before spending time debugging FPS.
**b) Audio device CHOP blocking the main thread (MOST COMMON).** An `audiodeviceoutCHOP` with `active=True` can consume 300-400ms/s (2000%+ of frame budget), stalling the cook loop at FPS=0. **`volume=0` is NOT sufficient** — the audio driver still blocks. Fix: `par.active = False`. This completely stops the CHOP from interacting with the audio driver. If you need audio monitoring, enable it only during short playback checks, then disable before recording.
Verified April 2026: disabling `audiodeviceoutCHOP` (`active=False`) restored FPS from 0 to 60 instantly, recovering from 2348% budget usage to 0.1%.
Diagnostic sequence when FPS=0:
1. `td_get_perf` — check if any op has extreme CPU/s (audiodeviceoutCHOP is the usual suspect)
2. If audiodeviceoutCHOP shows >100ms/s: set `par.active = False` immediately
3. `TOP.save()` on the output — if it produces a valid image, the pipeline works, just not at real-time rate
4. Check for other blocking CHOPs (`audiodeviceinCHOP`, etc.)
5. Toggle play state (spacebar, or check if absTime.seconds is advancing)
### 39. Recording while FPS=0 produces empty or near-empty files
This is the #1 cause of "I recorded for 30 seconds but got a 2-frame video." If TD's cook loop is stalled (FPS=0 or very low), MovieFileOut has nothing to record. Unlike `TOP.save()` which captures the last cooked frame regardless, MovieFileOut only writes frames that actually cook.
**Always verify FPS before starting a recording:**
```python
# Check via td_get_perf first
# If FPS < 30, do NOT start recording — fix the performance issue first
# If FPS=0, the playbar is likely paused (see pitfall #38)
```
Common causes of recording empty video:
- Playbar paused (FPS=0): see pitfall #38a
- Audio device CHOP blocking the main thread: see pitfall #38b
- Recording started before audio was cued: audio is silent, GLSL outputs black, MovieFileOut records black frames that look empty
- `par.file` set in the same script as `par.record = True`: see pitfall #17
### 40. GLSL shader produces black output — test before committing to a long render
New GLSL shaders can fail silently (see pitfall #8). Before recording a long take, always:
1. **Write a minimal test shader first** that just outputs a solid color or pass-through:
```glsl
layout(location = 0) out vec4 fragColor;  // output declaration (see pitfall #9)

void main() {
    vec2 uv = vUV.st;
    fragColor = TDOutputSwizzle(vec4(uv, 0.0, 1.0));
}
```
2. **Verify the test renders correctly** via `td_get_screenshot` on the GLSL TOP's output.
3. **Swap in the real shader** and screenshot again immediately. If black, the shader has a compile error or logic issue.
4. **Only then start recording.** A 90-second ProRes recording is ~5GB. Recording black frames wastes disk and time.
Common causes of black GLSL output:
- Missing `TDOutputSwizzle()` on macOS (pitfall #9)
- Time uniform not connected — shader uses default 0.0, fractal stays at origin
- Spectrum texture not connected — audio values all 0.0, driving everything to black
- Integer division where float division was expected (`1/2 = 0` not `0.5`)
- `absTime.seconds % 1000.0` wrapped past 1000 back to 0, so any effect that assumes steadily increasing time jumps or resets
### 41. td_write_dat uses `text` parameter, NOT `content`
The MCP tool `td_write_dat` expects a `text` parameter for full replacement. Passing `content` returns an error: `"Provide either 'text' for full replace, or 'old_text'+'new_text' for patching"`.
If `td_write_dat` fails, fall back to `td_execute_python`:
```python
op("/project1/shader_code").text = shader_string
```
### 42. td_execute_python DOES return print() output — use it for debugging
`print()` statements in `td_execute_python` scripts appear in the MCP response text. This is the correct way to read values back from scripts. The response format is: printed output first, then `[fps X.X/X] [N err/N warn]` on a separate line.
However, the `result` variable (if you set one) does NOT appear verbatim — use `print()` for anything you need to read back:
```python
# CORRECT — appears in response:
print('value:', some_value)
# WRONG — not reliably in response:
result = some_value
```
For structured data, use dedicated inspection tools (`td_get_operator_info`, `td_read_chop`) which return clean JSON.
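A minimal agent-side sketch for separating printed output from the status trailer described above. The trailer shape (`[fps X.X/X] [N err/N warn]`) comes from this pitfall; the exact spacing and the helper name `split_response` are assumptions, so adjust the pattern if your responses differ.

```python
import re

# Assumed trailer shape: "[fps 60.0/60] [0 err/2 warn]" at the end of the response.
STATUS_RE = re.compile(
    r"\[fps\s+(?P<fps>[\d.]+)/(?P<target>[\d.]+)\]"
    r"(?:\s*\[(?P<err>\d+)\s+err/(?P<warn>\d+)\s+warn\])?\s*$"
)

def split_response(response_text):
    """Return (printed_output, status) where status is a dict or None."""
    match = STATUS_RE.search(response_text)
    if match is None:
        # No trailer found: treat the whole response as printed output.
        return response_text.strip(), None
    status = {
        "fps": float(match["fps"]),
        "target_fps": float(match["target"]),
        "errors": int(match["err"] or 0),
        "warnings": int(match["warn"] or 0),
    }
    return response_text[:match.start()].strip(), status
```

Usage: `printed, status = split_response(response_text)`, then check `status["errors"]` before trusting `printed`.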
### 43. td_get_operator_info JSON is appended with `[fps X.X/X]` — breaks json.loads()
The response text from `td_get_operator_info` has `[fps 60.0/60]` appended after the JSON object. This causes `json.loads()` to fail with "Extra data" errors. Strip it before parsing:
```python
import json

# Drop everything from the last '[fps' onward, then parse the remaining JSON
clean = response_text.rsplit('[fps', 1)[0]
data = json.loads(clean)
```
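An alternative sketch that does not depend on the `[fps` marker at all: `json.JSONDecoder.raw_decode` parses the leading JSON value and reports where it ended, so any trailing text is ignored. The helper name `parse_json_prefix` is hypothetical, and this assumes the response starts with the JSON object.

```python
import json

def parse_json_prefix(response_text):
    """Decode the leading JSON value and return (data, trailing_text)."""
    decoder = json.JSONDecoder()
    text = response_text.lstrip()
    # raw_decode stops at the end of the first complete JSON value
    data, end = decoder.raw_decode(text)
    return data, text[end:].strip()
```

This also stays correct if a string value inside the JSON happens to contain `[fps`.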
### 44. td_get_screenshot is unreliable — returns `{"status": "pending"}` and may never deliver
Screenshots don't complete instantly. The tool returns `{"status": "pending", "requestId": "..."}` and the actual file may appear later — or may NEVER appear at all. In testing (April 2026), screenshots stayed "pending" indefinitely with no file written to disk, even though the shader was cooking at 8-30fps.
**Do NOT rely on `td_get_screenshot` for frame capture.** For reliable frame capture, use MovieFileOut recording + ffmpeg frame extraction:
```bash
# Record in TD first, then extract frames:
ffmpeg -y -i /tmp/td_output.mov -t 25 -vf 'fps=24' /tmp/td_frames/frame_%06d.png
```
If you need a quick visual check, `td_get_screenshot` is worth trying (it sometimes works), but always have the recording fallback. There is no callback or completion notification — if the file doesn't appear after 5-10 seconds, it's not coming.
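Since there is no completion notification, the agent has to poll for the file itself. A minimal sketch, assuming the screenshot path is known in advance; `wait_for_file` is a hypothetical helper that runs on the agent side, NOT inside TD (where `time.sleep()` would block the main thread).

```python
import os
import time

def wait_for_file(path, timeout_s=10.0, poll_s=0.25):
    """Return True once `path` exists and is non-empty, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll_s)
    return False
```

If this returns False, switch to the MovieFileOut + ffmpeg fallback instead of retrying the screenshot.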
### 45. Heavy shaders cook below record FPS — many duplicate frames in output
A raymarched GLSL shader may only cook at 8-15fps even though MovieFileOut records at 60fps. The recording still works (TD writes the last-cooked frame each time), but the resulting file has many duplicate frames. When extracting frames for post-processing, use a lower fps filter to avoid redundant frames:
```bash
# Extract at 24fps from a 60fps recording of an 8fps shader:
ffmpeg -y -i /tmp/td_output.mov -t 25 -vf 'fps=24' /tmp/td_frames/frame_%06d.png
```
Check actual cook FPS with `td_get_perf` before committing to a long recording. If FPS < 15, the output will be a slideshow regardless of the recording codec.
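A rough worked example of how many frames in a recording are duplicates, assuming the cook rate stays constant for the whole take (real cook rates fluctuate). `recording_stats` is a hypothetical helper, not a TD or MCP call.

```python
def recording_stats(cook_fps, record_fps, duration_s):
    """Estimate total, unique, and duplicate frames for a recording."""
    total = round(record_fps * duration_s)
    # TD can only write as many distinct frames as the shader cooked
    unique = min(total, round(cook_fps * duration_s))
    ratio = 0.0 if total == 0 else 1 - unique / total
    return {"total_frames": total, "unique_frames": unique, "duplicate_ratio": ratio}
```

For an 8fps shader recorded at 60fps for 25 seconds this gives 1500 frames written, of which only 200 are distinct.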
### 46. Recording duration is manual — no auto-stop at audio end
MovieFileOut records until `par.record = False` is set. If audio ends before you stop recording, the file keeps growing with repeated frames. Always stop recording promptly after the audio duration. For precision: set a timer on the agent side matching the audio length, then send `par.record = False`. Trim excess with ffmpeg as a safety net:
```bash
ffmpeg -i raw.mov -t 25 -c copy trimmed.mov
```
### 47. AudioFileIn par.index stays at 0 in sequential mode — not a reliable progress indicator
When `audiofileinCHOP` is in `playmode=2` (sequential), `par.index.eval()` returns 0.0 even while audio IS actively playing and the spectrum IS receiving data. Do NOT use `par.index` to check playback progress in sequential mode.
**How to verify audio is actually playing:**
- Read the spectrum CHOP values via `td_read_chop` — if values are non-zero and CHANGE between reads 1-2s apart, audio is flowing
- Read the audio CHOP itself: non-zero waveform samples confirm the file is loaded and playing
- `par.play.eval()` returning True is necessary but NOT sufficient — it can be True with no audio flowing if cue is stuck
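The first check above can be sketched as a comparison of two spectrum reads taken 1-2s apart. How the samples are extracted from the `td_read_chop` response is left out here, and the thresholds are assumptions to tune; `audio_is_flowing` is a hypothetical helper.

```python
def audio_is_flowing(first_read, second_read, floor=1e-6, min_delta=1e-4):
    """True if the second read has signal AND differs from the first read."""
    if len(first_read) == 0 or len(first_read) != len(second_read):
        return False
    # Non-zero values: the spectrum is receiving data at all
    has_signal = any(abs(v) > floor for v in second_read)
    # Changing values: playback is advancing, not stuck on one buffer
    changed = any(abs(a - b) > min_delta for a, b in zip(first_read, second_read))
    return has_signal and changed
```

Identical non-zero reads return False, which is the "cue is stuck" case described in the last bullet.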
### 48. GLSL shader whiteout — clamp audio spectrum values in the shader
Raw spectrum values multiplied by Math CHOP gain can produce very large numbers (5-20+) that blow out the shader's lighting, producing flat white/grey. The shader MUST clamp audio inputs:
```glsl
float bass = texture(sTD2DInputs[1], vec2(0.05, 0.25)).r;
bass = clamp(bass, 0.0, 3.0); // prevent whiteout
// mids and hi are assumed to be sampled from the same spectrum texture at their own coordinates
mids = clamp(mids, 0.0, 3.0);
hi = clamp(hi, 0.0, 3.0);
```
Discovered when gain=10 produced ~0.13 (too dark) during quiet passages but gain=50 produced ~9.4 (total whiteout). Fix: keep gain=10, use `highfreqboost=3.0` on AudioSpectrum, clamp in shader.
### 49. Non-Commercial TD records at 1280x1280 (square) — always crop in post
Even with `resolutionw=1280, resolutionh=720` on the GLSL TOP, Non-Commercial TD may output 1280x1280 to MovieFileOut. Always check dimensions with ffprobe and crop during extraction:
```bash
# Center-crop from 1280x1280 to 1280x720:
ffmpeg -y -i /tmp/td_output.mov -t 25 -r 24 -vf "crop=1280:720:0:280" /tmp/frames/frame_%06d.png
```
Large ProRes files (1-2GB) at 1280x1280 decode at ~3fps, so 25s of footage takes ~3 minutes to extract.
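The crop offsets and the extraction time estimate above are plain arithmetic. A small sketch, with hypothetical helper names, that builds the ffmpeg `crop=w:h:x:y` filter string for a center crop and estimates extraction time from the decode rate:

```python
def center_crop_filter(src_w, src_h, dst_w, dst_h):
    """Build an ffmpeg crop filter string for a centered crop."""
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("crop target is larger than the source")
    x = (src_w - dst_w) // 2
    y = (src_h - dst_h) // 2
    return f"crop={dst_w}:{dst_h}:{x}:{y}"

def extraction_seconds(duration_s, out_fps, decode_fps):
    """Estimate wall-clock time to extract frames at a given decode rate."""
    return duration_s * out_fps / decode_fps
```

`center_crop_filter(1280, 1280, 1280, 720)` reproduces the filter used above, and 25s at 24fps decoded at ~3fps works out to 200 seconds.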
## Advanced Patterns (pitfalls 51+)
### 51. Connection syntax: use `outputConnectors`/`inputConnectors`, NOT `outputs`/`inputs`
```python
# CORRECT
src.outputConnectors[0].connect(dst.inputConnectors[0])
# WRONG — raises IndexError or AttributeError
src.outputs[0].connect(dst.inputs[0])
```
For feedback TOP, BOTH are required:
```python
fb.par.top = target.path
target.outputConnectors[0].connect(fb.inputConnectors[0])
```
### 52. moviefileoutTOP `par.input` doesn't resolve via Python in TD 2025.32460
Setting `moviefileoutTOP.par.input` programmatically does NOT work. Every form tried is accepted without a Python exception, but the operator still reports "Not enough sources specified."
**Workaround — frame capture + ffmpeg:**
```python
# Schedule one save every 5 frames so TD cooks between captures (see pitfall #55)
for i in range(300):
    delay = i * 5
    run(f"op('/project1/out').save('/tmp/frames/f_{i:04d}.png')", delayFrames=delay)
# Then (ProRes is 4:2:2, so request a 4:2:2 pixel format):
# ffmpeg -y -framerate 30 -i /tmp/frames/f_%04d.png -c:v prores -pix_fmt yuv422p10le /tmp/output.mov
```
### 53. Batch frame capture — use `me.fetch`/`me.store` for state across calls
```python
start = me.fetch('cap_frame', 0)
for i in range(60):
    frame = start + i
    op('/project1/out').save(f'/tmp/frames/frame_{str(frame).zfill(4)}.png')
me.store('cap_frame', start + 60)
```
Call 5 times for 300 frames. Each picks up where the last left off.
### 54. GLSL TOP pixel shader requirements in TD 2025
```glsl
// REQUIRED — declare output
layout(location = 0) out vec4 fragColor;
void main() {
vec3 col = vec3(1.0, 0.0, 0.0);
fragColor = TDOutputSwizzle(vec4(col, 1.0));
}
```
**Built-in uniforms available:** `uTDOutputInfo.res` (vec4), `uTDTimeInfo.seconds`, `sTD2DInputs[N]`.
**Auto-created DATs:** `name_pixel`, `name_vertex`, `name_compute` textDATs with example code.
### 55. TOP.save() doesn't advance time — identical frames in tight loops
`.save()` captures the current cooked frame without advancing TD's timeline:
```python
# WRONG — all frames identical
for i in range(300):
    op('/project1/out').save(f'frames/f_{i:04d}.png')
# CORRECT — use run() with delayFrames
for i in range(300):
    delay = i * 5
    run(f"op('/project1/out').save('frames/f_{i:04d}.png')", delayFrames=delay)
```
**NEVER use `time.sleep()` in TD** — it blocks the main thread and freezes the UI.
### 56. Feedback loop masks input changes — force switch during capture
With feedback TOP opacity 0.7+, the buffer dominates output. Switching input produces nearly identical frames.
**Fix — force switch index per capture:**
```python
for i in range(300):
    idx = (i // 8) % num_inputs
    delay = i * 5
    run(f"op('/project1/vswitch').par.index={idx}; op('/project1/out').save('f_{i:04d}.png')", delayFrames=delay)
```
### 57. Large td_execute_python scripts fail — split into incremental calls
10+ operator creations in one script cause timing issues. Split into 2-4 calls of 2-4 operators each. Within one call, `create()` handles work immediately. Across calls, `op('name')` may return `None` if the previous call hasn't committed.
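One way to do the split on the agent side is to generate one small script per call. A sketch under assumptions: `build_create_scripts` is a hypothetical helper, the specs are `(op_type, name)` pairs using the full type suffix from pitfall #61, and each script ends with a `print()` so the result is visible in the response (pitfall #42).

```python
def build_create_scripts(parent_path, specs, batch_size=4):
    """Return one script string per MCP call, each creating a few operators."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    scripts = []
    for start in range(0, len(specs), batch_size):
        lines = [f"p = op({parent_path!r})"]
        for op_type, name in specs[start:start + batch_size]:
            lines.append(f"p.create({op_type!r}, {name!r})")
        # Confirm what exists before the next call relies on it
        lines.append("print('children:', [c.name for c in p.children])")
        scripts.append("\n".join(lines))
    return scripts
```

Send the scripts one at a time and check the printed child list before sending the next batch.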
### 58. MCP instance reconnection after project.load()
`project.load(path)` changes the PID. After loading, call `td_list_instances()` and use the new `target_instance`. For TOX files: import as child comp instead (doesn't disconnect).
### 59. TOX reverse-engineering workflow
```python
comp = root.loadTox(r'/path/to/file.tox')
comp.name = '_study_comp'
for child in comp.children:
    print(f'{child.name} ({child.OPType})')
# Use td_get_operators_info, td_read_dat, check custom params
```
### 60. sliderCOMP naming — TD appends suffix
TD auto-renames: `slider_brightness` → `slider_brightness1`. Always check names after creation.
### 61. create() requires full operator type suffix
```python
# CORRECT
proj.create('audiofileinCHOP', 'audio_in')
proj.create('glslTOP', 'render')
# WRONG — raises "Unknown operator type"
proj.create('audiofilein', 'audio_in')
proj.create('glsl', 'render')
```
### 62. Reparenting COMPs — use copyOPs, not connect()
Moving COMPs with `inputCOMPConnectors[0].connect()` fails. Use copy + destroy:
```python
copied = target.copyOPs([source]) # preserves internal wiring
source.destroy()
# Re-wire external connections manually after the move
```
### 63. Slider wiring — expressionCHOP with op() expressions crashes TD
```python
# CRASHES TD — don't do this
echop = root.create(expressionCHOP, 'slider_ctrl')
echop.par.chan0expr = 'op("/project1/controls/slider_brightness1").par.value0'
# WORKING — parameterCHOP as bridge
pchop = root.create(parameterCHOP, 'slider_vals')
pchop.par.ops = '/project1/controls'
pchop.par.parameters = 'value0'
pchop.par.custom = True
pchop.par.builtin = False
```
@@ -0,0 +1,183 @@
# Post-FX Reference
Bloom, CRT scanlines, chromatic aberration, and feedback glow patterns for live visual work.
---
## Bloom
### Built-in Bloom TOP
TD's `bloomTOP` is the fastest path — GPU-accelerated, no shader needed.
```python
bloom = root.create(bloomTOP, 'bloom1')
bloom.par.threshold = 0.6 # Luminance threshold (0-1)
bloom.par.size = 0.03 # Spread radius (0-1)
bloom.par.strength = 1.5 # Bloom intensity
bloom.par.blendmode = 'add' # 'add' or 'screen'
```
**Audio reactive bloom:**
```python
bloom.par.strength.mode = ParMode.EXPRESSION
bloom.par.strength.expr = "op('audio_env')['envelope'][0] * 3.0 + 0.5"
```
### GLSL Bloom (More Control)
For multi-pass bloom with color tinting:
```glsl
// bloom_pixel.glsl — pass1: threshold + tint
out vec4 fragColor;
uniform float uThreshold;
uniform vec3 uBloomColor;
void main() {
vec4 col = texture(sTD2DInputs[0], vUV.st);
float luma = dot(col.rgb, vec3(0.299, 0.587, 0.114));
float bloom = max(0.0, luma - uThreshold);
fragColor = TDOutputSwizzle(vec4(col.rgb * bloom * uBloomColor, col.a));
}
```
Then blur with `blurTOP` (size ~0.02-0.05), composite back over source with `addTOP` or `compositeTOP` in Add mode.
---
## CRT / Scanlines
Pure GLSL — create a `glslTOP` and paste into its `_pixel` DAT.
```glsl
// crt_pixel.glsl
out vec4 fragColor;
uniform float uTime;
uniform float uScanlineIntensity; // 0.0 - 1.0, default 0.4
uniform float uCurvature; // 0.0 - 0.15, default 0.05
uniform float uVignette; // 0.0 - 1.0, default 0.8
vec2 curveUV(vec2 uv, float amount) {
uv = uv * 2.0 - 1.0;
vec2 offset = abs(uv.yx) / vec2(6.0, 4.0);
uv = uv + uv * offset * offset * amount;
return uv * 0.5 + 0.5;
}
void main() {
vec2 res = uTDOutputInfo.res.zw;
vec2 uv = vUV.st;
// CRT barrel distortion
uv = curveUV(uv, uCurvature * 10.0);
// Kill pixels outside curved screen
if (uv.x < 0.0 || uv.x > 1.0 || uv.y < 0.0 || uv.y > 1.0) {
fragColor = vec4(0.0, 0.0, 0.0, 1.0);
return;
}
vec4 col = texture(sTD2DInputs[0], uv);
// Scanlines
float scanline = sin(uv.y * res.y * 3.14159) * 0.5 + 0.5;
col.rgb *= mix(1.0, scanline, uScanlineIntensity);
// Horizontal noise flicker
float flicker = TDSimplexNoise(vec2(uv.y * 100.0, uTime * 8.0)) * 0.03;
col.rgb += flicker;
// Vignette
vec2 vig = uv * (1.0 - uv.yx);
float v = pow(vig.x * vig.y * 15.0, uVignette);
col.rgb *= v;
fragColor = TDOutputSwizzle(col);
}
```
---
## Chromatic Aberration
Splits RGB channels and offsets them along screen axes.
```glsl
out vec4 fragColor;
uniform float uAmount; // 0.001 - 0.02, default 0.006
void main() {
vec2 uv = vUV.st;
vec2 dir = uv - 0.5;
float r = texture(sTD2DInputs[0], uv + dir * uAmount).r;
float g = texture(sTD2DInputs[0], uv).g;
float b = texture(sTD2DInputs[0], uv - dir * uAmount).b;
float a = texture(sTD2DInputs[0], uv).a;
fragColor = TDOutputSwizzle(vec4(r, g, b, a));
}
```
**Audio-reactive variant** — spike aberration on beats:
```glsl
out vec4 fragColor;
uniform float uAmount; // base aberration, same range as the static version
uniform float uBeat;
void main() {
vec2 uv = vUV.st;
vec2 dir = uv - 0.5;
float amount = uAmount + uBeat * 0.04;
float r = texture(sTD2DInputs[0], uv + dir * amount * 1.2).r;
float g = texture(sTD2DInputs[0], uv).g;
float b = texture(sTD2DInputs[0], uv - dir * amount * 0.8).b;
fragColor = TDOutputSwizzle(vec4(r, g, b, 1.0));
}
```
---
## Feedback Glow
Warm persistent trails for glow effects.
```glsl
out vec4 fragColor;
uniform float uDecay; // 0.92 - 0.98 for slow trails
uniform vec3 uGlowColor; // tint accumulated feedback
void main() {
vec2 uv = vUV.st;
vec4 prev = texture(sTD2DInputs[0], uv); // feedback input
vec4 curr = texture(sTD2DInputs[1], uv); // current frame
vec3 glow = prev.rgb * uDecay * uGlowColor;
vec3 result = max(glow, curr.rgb);
fragColor = TDOutputSwizzle(vec4(result, 1.0));
}
```
**Tips:**
- `uDecay = 0.95` → medium trail
- `uDecay = 0.98` → long comet tail
- Set `glslTOP` format to `rgba16float` for smooth gradients
---
## Full Post-FX Stack
Recommended order:
```
[scene / composite]
bloomTOP ← luminance threshold bloom
glslTOP (chrom) ← chromatic aberration
glslTOP (crt) ← scanlines + barrel distortion + vignette
null_out ← final output
```
**Performance note:** Each glslTOP is a full GPU pass. For 1920×1080 at 60fps this stack is comfortably real-time. For 4K, consider downsampling bloom input with `resolutionTOP` first.
@@ -0,0 +1,211 @@
# Projection Mapping Reference
Multi-window output, surface mapping, edge blending, and projector calibration patterns for installation/event work.
For HUD layouts and on-screen panel grids, see `layout-compositor.md`. For wireframe/test-pattern generation, see `operator-tips.md`.
---
## Window COMP — Output to a Display
The `windowCOMP` is how TD pushes pixels to a real display.
```python
win = root.create(windowCOMP, 'output_window')
win.par.winop = '/project1/final_out' # path to the TOP being displayed
win.par.winw = 1920
win.par.winh = 1080
win.par.winoffsetx = 0 # screen-space offset
win.par.winoffsety = 0
win.par.borders = False # no chrome
win.par.alwaysontop = True
win.par.cursor = False # hide cursor in fullscreen
win.par.justify = 'fillaspect' # 'fill' | 'fitaspect' | 'fillaspect' | 'native'
win.par.winopen.pulse() # OPEN the window
```
To target a specific physical display, set `par.location`:
```python
win.par.location = 'secondary' # 'primary' | 'secondary' | 'monitor1' | 'monitor2' | ...
```
Or set absolute coordinates using `winoffsetx/y` matched to your OS display layout.
**Always pulse `winopen` — setting params alone doesn't open the window.**
---
## Multi-Window Output
For multi-projector or multi-display setups, create one `windowCOMP` per output, each pointing at a different TOP.
```python
for i, screen_top in enumerate(['out_left', 'out_center', 'out_right']):
    w = root.create(windowCOMP, f'win_{i}')
    w.par.winop = f'/project1/{screen_top}'
    w.par.winw = 1920; w.par.winh = 1080
    w.par.winoffsetx = i * 1920
    w.par.winoffsety = 0
    w.par.borders = False
    w.par.alwaysontop = True
    w.par.cursor = False
    w.par.winopen.pulse()
```
For ultra-wide single-output spans, use ONE windowCOMP at e.g. 5760×1080 spanning three projectors via the GPU's mosaic/spanning mode (Nvidia Mosaic, AMD Eyefinity), then split content via `cropTOP` per screen inside TD.
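The per-screen split of a wide span is simple arithmetic. A sketch with a hypothetical helper, `span_regions`, that returns the pixel offset and the normalized left/right bounds of each screen inside the span; how those bounds map onto `cropTOP` parameters is not shown, so check the parameter names for your TD version.

```python
def span_regions(screen_w, screen_h, count):
    """Return (total_w, total_h, regions) for `count` equal screens side by side."""
    total_w = screen_w * count
    regions = []
    for i in range(count):
        regions.append({
            "offset_x": i * screen_w,               # pixel offset of this screen
            "left": i * screen_w / total_w,         # normalized bounds in the span
            "right": (i + 1) * screen_w / total_w,
        })
    return total_w, screen_h, regions
```

Three 1920×1080 projectors give a 5760×1080 span with screens starting at x = 0, 1920, and 3840.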
---
## 4-Point Corner Pin (Quad Warp)
The simplest projection mapping primitive — warping a rectangle onto a quadrilateral.
```python
# Source content
src = op('/project1/scene_out')
# Manual: cornerPinTOP (TD has this built-in)
cp = root.create(cornerPinTOP, 'corner_pin')
cp.par.tlx = 0.05; cp.par.tly = 0.10 # top-left (normalized 0-1)
cp.par.trx = 0.95; setattr(cp.par, 'try', 0.08) # top-right; 'try' is a Python keyword, so cp.par.try is a SyntaxError
cp.par.brx = 0.93; cp.par.bry = 0.92 # bottom-right
cp.par.blx = 0.07; cp.par.bly = 0.94 # bottom-left
cp.inputConnectors[0].connect(src)
```
Alternative: use a `geometryCOMP` with a `gridSOP` and bend the verts in vertex GLSL. More flexible (curved surfaces) but more setup.
Verify TD 2025.32 param names with `td_get_par_info(op_type='cornerPinTOP')`.
---
## Bezier / Mesh Warp (Curved Surfaces)
For non-flat surfaces (domes, columns, curved walls), use a subdivided mesh and per-vertex displacement.
### Pattern: Grid Mesh + GLSL Displacement
```python
# Subdivided grid in a geo
geo = root.create(geometryCOMP, 'warp_geo')
grid = geo.create(gridSOP, 'warp_grid')
grid.par.rows = 32 # higher = smoother curve
grid.par.cols = 32
grid.par.sizex = 2; grid.par.sizey = 2
# Texture the source onto it
mat = root.create(constantMAT, 'warp_mat') # use a Constant MAT for unlit projection
mat.par.maptop = '/project1/scene_out' # source TOP
geo.par.material = mat.path
# Render to a TOP that goes to the projector window
cam = root.create(cameraCOMP, 'cam_proj')
cam.par.tz = 4
render = root.create(renderTOP, 'projection_out')
render.par.camera = cam.path
render.par.geometry = geo.path
render.par.outputresolution = 'custom'
render.par.resolutionw = 1920; render.par.resolutionh = 1080
```
For per-vertex offsets, replace the Constant MAT with a `glslMAT`, do the displacement in its vertex shader, and read displacement values from a CHOP via uniform.
Calibration is iterative: render a checkerboard from `scene_out`, project it, photograph the projection, manually nudge corner/grid points until aligned.
---
## Edge Blending (Multi-Projector Overlap)
When two projectors overlap, the overlap region is twice as bright. Blend by ramping each projector's edge alpha to 0 across the overlap zone.
### GLSL Edge Blend Shader
Per-projector output pass that fades the inside edge to black:
```glsl
// edge_blend_pixel.glsl
out vec4 fragColor;
uniform float uBlendLeft; // overlap width on left edge (0-0.5, 0=no blend)
uniform float uBlendRight;
uniform float uGamma; // typically 2.2 — perceptual ramp
void main() {
vec2 uv = vUV.st;
vec4 col = texture(sTD2DInputs[0], uv);
float aL = (uBlendLeft > 0.0) ? smoothstep(0.0, uBlendLeft, uv.x) : 1.0;
float aR = (uBlendRight > 0.0) ? smoothstep(0.0, uBlendRight, 1.0 - uv.x) : 1.0;
float a = pow(aL * aR, uGamma);
fragColor = TDOutputSwizzle(vec4(col.rgb * a, 1.0));
}
```
Apply this to each overlap-touching projector's output. Tune `uBlendLeft` / `uBlendRight` to match your physical overlap.
For top/bottom blends or cylindrical setups, extend the shader with `uBlendTop` / `uBlendBottom`.
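To get starting values for `uBlendLeft` / `uBlendRight`, the overlap can be expressed as a fraction of one projector's width. A sketch assuming identical projectors in a single horizontal row with the same pixel overlap at every seam; `blend_layout` is a hypothetical helper and the physical overlap still has to be measured on site.

```python
def blend_layout(projector_w, overlap_px, count):
    """Return (canvas_w, per_projector) for a horizontal row of projectors."""
    if not 0 <= overlap_px <= projector_w / 2:
        raise ValueError("overlap must be between 0 and half the projector width")
    # Each seam removes one overlap width from the combined canvas
    canvas_w = projector_w * count - overlap_px * (count - 1)
    frac = overlap_px / projector_w
    per_projector = []
    for i in range(count):
        per_projector.append({
            "uBlendLeft": frac if i > 0 else 0.0,           # no blend on the outer edges
            "uBlendRight": frac if i < count - 1 else 0.0,
            "source_x": i * (projector_w - overlap_px),     # where this slice starts in the canvas
        })
    return canvas_w, per_projector
```

Three 1920px projectors with a 192px overlap cover a 5376px canvas, with a blend width of 0.1 on each inner edge.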
---
## Calibration Patterns
Useful test patterns for aligning projectors. Build a `switchTOP` selecting one of these, route to all projector windows during setup.
```python
# Solid white — for brightness/uniformity check
white = root.create(constantTOP, 'cal_white')
white.par.colorr = 1.0; white.par.colorg = 1.0; white.par.colorb = 1.0
# Centered crosshair — for keystone alignment
gridcross = root.create(textTOP, 'cal_cross')
gridcross.par.text = '+'
gridcross.par.fontsizex = 200
# Fine grid — for warp/mesh alignment (use rampTOP + math + threshold, or build via GLSL)
# Color bars for projector color calibration
bars = root.create(rampTOP, 'cal_bars')
bars.par.type = 'horizontal'
```
Or use the bundled `testpatternTOP` if your TD version includes it.
---
## Projection Audit Workflow
When debugging a multi-screen setup:
1. Render a unique color and label per output (`textTOP` saying "LEFT", "CENTER", "RIGHT").
2. Check that each window is sourcing the correct path: `td_get_operator_info(path='/project1/win_0')`.
3. Verify display assignment: walk to each projector and confirm visually.
4. Check resolution: physical projector native res vs. TD output res — mismatches cause scaling artifacts.
5. Cook flag: `td_get_perf` — if a window's source TOP isn't cooking, the projector shows last frame frozen.
---
## Pitfalls
1. **Window won't open** — you forgot `winopen.pulse()`. Setting params alone doesn't open it.
2. **Wrong display** — `par.location='secondary'` depends on OS display order. Set `winoffsetx/y` to absolute coords as a more reliable override.
3. **Cursor visible** — set `par.cursor = False` BEFORE opening, or close+reopen.
4. **Black projection** — usually a cooking issue. Verify `final_out` TOP is cooking via `td_get_perf`. Check `td_get_errors` recursively from `/`.
5. **Tearing / vsync** — `windowCOMP` honors `par.vsync`. For projection always set `vsync='vsync'` (default). Tearing means GPU is over-budget — reduce render resolution.
6. **Aspect mismatch** — projector native is often 1920×1200 (16:10) not 1080. Use `justify='fitaspect'` or render at native projector res.
7. **Non-Commercial license** — caps total resolution at 1280×1280. For real installation work you need Commercial. Pro license adds 4K+.
8. **Multiple monitors on macOS** — `windowCOMP` honors macOS Spaces. Disable Spaces or pin TD to a specific display in System Settings before showtime.
---
## Quick Recipes
| Goal | Approach |
|---|---|
| Single fullscreen output | One `windowCOMP`, `justify='fillaspect'`, `winopen.pulse()` |
| 3-projector wide span | 3 `windowCOMP` + per-output `cropTOP` from one wide source |
| Single quad surface | `cornerPinTOP` → `windowCOMP` |
| Curved/dome | Subdivided gridSOP with vertex GLSL → `renderTOP` → `windowCOMP` |
| Edge blend overlap | GLSL fade shader per projector → `windowCOMP` |
| Calibration mode | `switchTOP` between scene and test patterns, hot-key triggered |
@@ -0,0 +1,463 @@
# TouchDesigner Python API Reference
## The td Module
TouchDesigner's Python environment auto-imports the `td` module. All TD-specific classes, functions, and constants live here. Scripts inside TD (Script DATs, CHOP/DAT Execute callbacks, Extensions) have full access.
When using the MCP `execute_python_script` tool, these globals are pre-loaded:
- `op` — shortcut for `td.op()`, finds operators by path
- `ops` — shortcut for `td.ops()`, finds multiple operators by pattern
- `me` — the operator running the script (via MCP this is the twozero internal executor)
- `parent` — shortcut for `me.parent()`
- `project` — the root project component
- `td` — the full td module
## Finding Operators: op() and ops()
### op(path) — Find a single operator
```python
# Absolute path (always works from MCP)
node = op('/project1/noise1')
# Relative path (relative to current operator — only in Script DATs)
node = op('noise1') # sibling
node = op('../noise1') # parent's sibling
# Returns None if not found (does NOT raise)
node = op('/project1/nonexistent') # None
```
### ops(pattern) — Find multiple operators
```python
# Glob patterns
nodes = ops('/project1/noise*') # all nodes starting with "noise"
nodes = ops('/project1/*') # all direct children
nodes = ops('/project1/container1/*') # all children of container1
# Returns a tuple of operators (may be empty)
for n in ops('/project1/*'):
    print(n.name, n.OPType)
```
### Navigation from a node
```python
node = op('/project1/noise1')
node.name # 'noise1'
node.path # '/project1/noise1'
node.OPType # 'noiseTOP'
node.type # 'noise'
node.family # 'TOP'
# Parent / children
node.parent() # the parent COMP
node.parent().children # all siblings + self
node.parent().findChildren(name='noise*') # filtered
# Type checking
node.isTOP # True
node.isCHOP # False
node.isSOP # False
node.isDAT # False
node.isMAT # False
node.isCOMP # False
```
## Parameters
Every operator has parameters accessed via the `.par` attribute.
### Reading parameters
```python
node = op('/project1/noise1')
# Direct access
node.par.seed.eval() # current evaluated value, whatever the mode (constant, expression, export)
node.par.seed.val # constant-mode value only; differs from eval() when an expression or export drives the parameter
node.par.seed.default # default value
node.par.monochrome.eval() # boolean parameters: True/False
# List all parameters
for p in node.pars():
    print(f"{p.name}: {p.eval()} (default: {p.default})")
# Filter by page (parameter group); pars() itself matches parameter NAME patterns
for p in node.pars():
    if p.page.name == 'Noise':
        print(f"{p.name}: {p.eval()}")
```
### Setting parameters
```python
# Direct value setting
node.par.seed.val = 42
node.par.monochrome.val = True
node.par.resolutionw.val = 1920
node.par.resolutionh.val = 1080
# String parameters
op('/project1/text1').par.text.val = 'Hello World'
# File paths
op('/project1/moviefilein1').par.file.val = '/path/to/video.mp4'
# Reference another operator (for "dat", "chop", "top" type parameters)
op('/project1/glsl1').par.dat.val = '/project1/shader_code'
```
### Parameter expressions
```python
# Python expressions that evaluate dynamically
node.par.seed.expr = "me.time.frame"
node.par.tx.expr = "math.sin(me.time.seconds * 2)"
# Reference another parameter
node.par.brightness1.expr = "op('/project1/constant1').par.value0.val"
# Export (one-way binding from CHOP to parameter)
# This makes the parameter follow a CHOP channel value
op('/project1/noise1').par.seed.val # can also be driven by exports
```
### Parameter types
| Type | Python Type | Example |
|------|------------|---------|
| Float | `float` | `node.par.brightness1.val = 0.5` |
| Int | `int` | `node.par.seed.val = 42` |
| Toggle | `bool` | `node.par.monochrome.val = True` |
| String | `str` | `node.par.text.val = 'hello'` |
| Menu | `int` (index) or `str` (label) | `node.par.type.val = 'sine'` |
| File | `str` (path) | `node.par.file.val = '/path/to/file'` |
| OP reference | `str` (path) | `node.par.dat.val = '/project1/text1'` |
| Color | separate r/g/b/a floats | `node.par.colorr.val = 1.0` |
| XY/XYZ | separate x/y/z floats | `node.par.tx.val = 0.5` |
## Creating and Deleting Operators
```python
# Create via parent component
parent = op('/project1')
new_node = parent.create(noiseTOP) # using class reference (family suffix is upper-case)
new_node = parent.create(noiseTOP, 'my_noise') # with custom name
# The MCP create_td_node tool handles this automatically:
# create_td_node(parentPath="/project1", nodeType="noiseTOP", nodeName="my_noise")
# Delete
node = op('/project1/my_noise')
node.destroy()
# Copy
original = op('/project1/noise1')
copy = parent.copy(original, name='noise1_copy')
```
## Connections (Wiring Operators)
### Output to Input connections
```python
# Connect noise1's output to level1's input
op('/project1/noise1').outputConnectors[0].connect(op('/project1/level1'))
# Connect to specific input index (for multi-input operators like Composite)
op('/project1/noise1').outputConnectors[0].connect(op('/project1/composite1').inputConnectors[0])
op('/project1/text1').outputConnectors[0].connect(op('/project1/composite1').inputConnectors[1])
# Disconnect all outputs
op('/project1/noise1').outputConnectors[0].disconnect()
# Query connections
node = op('/project1/level1')
inputs = node.inputs # list of connected input operators
outputs = node.outputs # list of connected output operators
```
### Connection patterns for common setups
```python
# Linear chain: A -> B -> C -> D
ops_list = [op(f'/project1/{name}') for name in ['noise1', 'level1', 'blur1', 'null1']]
for i in range(len(ops_list) - 1):
    ops_list[i].outputConnectors[0].connect(ops_list[i+1])
# Fan-out: A -> B, A -> C, A -> D
source = op('/project1/noise1')
for target_name in ['level1', 'composite1', 'transform1']:
    source.outputConnectors[0].connect(op(f'/project1/{target_name}'))
# Merge: A + B + C -> Composite
comp = op('/project1/composite1')
for i, source_name in enumerate(['noise1', 'text1', 'ramp1']):
    op(f'/project1/{source_name}').outputConnectors[0].connect(comp.inputConnectors[i])
```
## DAT Content Manipulation
### Text DATs
```python
dat = op('/project1/text1')
# Read
content = dat.text # full text as string
# Write
dat.text = "new content"
dat.text = '''multi
line
content'''
# Append
dat.text += "\nnew line"
```
### Table DATs
```python
dat = op('/project1/table1')
# Read cell
val = dat[0, 0] # row 0, col 0
val = dat[0, 'name'] # row 0, column named 'name'
val = dat['key', 1] # row named 'key', col 1
# Write cell
dat[0, 0] = 'value'
# Read row/col
row = dat.row(0) # list of Cell objects
col = dat.col('name') # list of Cell objects
# Dimensions
rows = dat.numRows
cols = dat.numCols
# Append row
dat.appendRow(['col1_val', 'col2_val', 'col3_val'])
# Clear
dat.clear()
# Set entire table
dat.clear()
dat.appendRow(['name', 'value', 'type'])
dat.appendRow(['frequency', '440', 'float'])
dat.appendRow(['amplitude', '0.8', 'float'])
```
## Time and Animation
```python
# Global time
td.absTime.frame # absolute frame number (never resets)
td.absTime.seconds # absolute seconds
# Timeline time (affected by play/pause/loop)
me.time.frame # current frame on timeline
me.time.seconds # current seconds on timeline
me.time.rate # FPS setting
# Timeline control (via execute_python_script)
project.play = True
project.play = False
project.frameRange = (1, 300) # set timeline range
# Cook frame (when operator was last computed)
node.cookFrame
node.cookTime
```
## Extensions (Custom Python Classes on Components)
Extensions add custom Python methods and attributes to COMPs.
```python
# Create extension on a Base COMP
base = op('/project1/myBase')
# The extension class is defined in a Text DAT inside the COMP
# Typically named 'ExtClass' with the extension code:
extension_code = '''
class MyExtension:
    def __init__(self, ownerComp):
        self.ownerComp = ownerComp
        self.counter = 0
    def Reset(self):
        self.counter = 0
    def Increment(self):
        self.counter += 1
        return self.counter
    @property
    def Count(self):
        return self.counter
'''
# Write extension code to DAT inside the COMP
op('/project1/myBase/extClass').text = extension_code
# Configure the extension on the COMP
base.par.extension1 = "op('./extClass').module.MyExtension(me)" # expression that builds the extension instance from the DAT
base.par.promoteextension1 = True # promote methods to parent
# Call extension methods
base.Increment() # calls MyExtension.Increment()
count = base.Count # accesses MyExtension.Count property
base.Reset()
```
## Useful Built-in Modules
### tdu — TouchDesigner Utilities
```python
import tdu
# Dependency tracking (reactive values)
dep = tdu.Dependency(initial_value)
dep.val = new_value # triggers dependents to recook
# File path utilities
tdu.expandPath('$HOME/Desktop/output.mov')
# Math
tdu.clamp(value, min, max)
tdu.remap(value, from_min, from_max, to_min, to_max)
```
### TDFunctions
```python
from TDFunctions import *
# Commonly used utilities
clamp(value, low, high)
remap(value, inLow, inHigh, outLow, outHigh)
interp(value1, value2, t) # linear interpolation
```
### TDStoreTools — Persistent Storage
```python
from TDStoreTools import StorageManager
# Store data that survives project reload
me.store('myKey', 'myValue')
val = me.fetch('myKey', default='fallback')
# Storage dict
me.storage['key'] = value
```
## Common Patterns via execute_python_script
### Build a complete chain
```python
# Create a complete audio-reactive noise chain
parent = op('/project1')
# Create operators
audio_in = parent.create(audiofileinCHOP, 'audio_in')
spectrum = parent.create(audiospectrumCHOP, 'spectrum')
chop_to_top = parent.create(choptoTOP, 'chop_to_top')
noise = parent.create(noiseTOP, 'noise1')
level = parent.create(levelTOP, 'level1')
null_out = parent.create(nullTOP, 'out')
# Wire the chain
audio_in.outputConnectors[0].connect(spectrum)
# CHOP -> TOP crosses operator families, so it is referenced by parameter, not wired
chop_to_top.par.chop = spectrum.path
noise.outputConnectors[0].connect(level)
level.outputConnectors[0].connect(null_out)
# Set parameters
audio_in.par.file = '/path/to/music.wav'
audio_in.par.play = True
spectrum.par.size = 512
noise.par.type = 1 # Sparse
noise.par.monochrome = False
noise.par.resolutionw = 1920
noise.par.resolutionh = 1080
level.par.opacity = 0.8
level.par.gamma1 = 0.7
```
### Query network state
```python
# Get all TOPs in the project
tops = [c for c in op('/project1').findChildren(type=TOP)]
for t in tops:
    print(f"{t.path}: {t.OPType} {'ERROR' if t.errors() else 'OK'}")
# Find all operators with errors
def find_errors(parent_path='/project1'):
    parent = op(parent_path)
    errors = []
    for child in parent.findChildren(depth=-1):
        if child.errors():
            errors.append((child.path, child.errors()))
    return errors
# print() so the list shows up in the MCP response (a bare `result` variable does not)
print(find_errors())
```
### Batch parameter changes
```python
# Set parameters on multiple nodes at once
settings = {
    '/project1/noise1': {'seed': 42, 'monochrome': False, 'resolutionw': 1920},
    '/project1/level1': {'brightness1': 1.2, 'gamma1': 0.8},
    '/project1/blur1': {'sizex': 5, 'sizey': 5},
}
for path, params in settings.items():
    node = op(path)
    if node:
        for key, val in params.items():
            setattr(node.par, key, val)
```
## Python Version and Packages
TouchDesigner bundles Python 3.11+ with these pre-installed:
- **numpy** — array operations, fast math
- **scipy** — signal processing, FFT
- **OpenCV** (cv2) — computer vision
- **PIL/Pillow** — image processing
- **requests** — HTTP client
- **json**, **re**, **os**, **sys** — standard library
**IMPORTANT:** Parameter names in examples below are illustrative. Always run discovery (SKILL.md Step 0) to get actual names for your TD version. Do NOT copy param names from these examples verbatim.
Custom packages can be installed to TD's Python site-packages directory. See TD documentation for the exact path per platform.
## SOP Vertex/Point Access (TD 2025.32)
In TD 2025.32, `td.Vertex` does NOT have `.x`, `.y`, `.z` attributes. Use index access:
```python
# WRONG — crashes in TD 2025.32:
vertex.x, vertex.y, vertex.z
# CORRECT — index/attribute access:
pt = sop.points()[i]
pos = pt.P # Position object
x, y, z = pos[0], pos[1], pos[2]
# Always introspect first:
dir(sop.points()[0]) # see what attributes actually exist
dir(sop.points()[0].P) # see Position object interface
```
@@ -0,0 +1,198 @@
# Replicator COMP Reference
The `replicatorCOMP` clones a template operator N times, driven by a table of data. It is the fundamental TD pattern for data-driven networks: button grids, scene rosters, dynamic UI, per-channel parameter panels.
For visual instancing (per-pixel/per-render copies), see `geometry-comp.md`. Replicator builds NETWORK NODES; instancing builds RENDER COPIES. Different layer.
---
## Concept
```
[Template OP] [Data tableDAT]
│ │
└─────→ replicatorCOMP ←───────┘
[N clones], one per data row
Each clone gets per-row params
```
Edit the template once → all clones inherit. Edit the table → clones add/remove dynamically. Push parameter overrides per-row.
---
## Minimal Setup
```python
# 1. Make a template (the thing to clone)
template = root.create(buttonCOMP, 'btn_template')
template.par.w = 80; template.par.h = 80
template.par.text = 'X'
template.par.bgcolorr = 0.2
# 2. Make a data table (one row per clone)
data = root.create(tableDAT, 'scene_data')
data.appendRow(['name', 'color_r', 'color_g', 'color_b'])
data.appendRow(['Sunset', 1.0, 0.4, 0.0])
data.appendRow(['Midnight', 0.0, 0.1, 0.4])
data.appendRow(['Storm', 0.3, 0.3, 0.5])
data.appendRow(['Forest', 0.0, 0.5, 0.2])
# 3. Replicator — points at template + data
rep = root.create(replicatorCOMP, 'scene_buttons')
rep.par.template = template.path
rep.par.opfromdat = data.path
rep.par.namefromdatname = 'name' # use 'name' column for clone names
rep.par.incrementalnumbering = False
```
After cooking, the replicator creates 4 child COMPs named `Sunset`, `Midnight`, `Storm`, `Forest` (one per non-header row), each cloned from `btn_template`.
---
## Per-Row Parameter Overrides
The replicator's docked `replicator1_callbacks` DAT lets you customize each clone:
```python
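def onReplicate(comp, allOps, newOps, template, master):
    """Called once per replicate cycle. newOps is the list of just-created clones."""
    data = op('scene_data')
    for clone in newOps:
        # Look rows up by clone name, not by position in newOps: newOps holds only
        # the just-created clones, so its order need not match the table rows.
        if data.row(clone.name) is None:
            continue  # no row whose first cell matches this clone's name
        clone.par.text = data[clone.name, 'name'].val
        clone.par.bgcolorr = float(data[clone.name, 'color_r'].val)
        clone.par.bgcolorg = float(data[clone.name, 'color_g'].val)
        clone.par.bgcolorb = float(data[clone.name, 'color_b'].val)
    return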
```
Or use parameter expressions referencing `me.digits`, the trailing number in an operator's name (for example `3` for `item3`):
```python
# Inside the template, set a param expression like:
# par.value0.expr = "op('../scene_data')[me.digits + 1, 'value']"
```
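`me.digits` lines up with the table row only when clones are named with a numeric suffix (incremental numbering); for clones named from a column, such as `Sunset`, it is `None`. For numbered clones this is the cleanest way to express static references, with no callback needed.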
---
## Layout: Buttons in a Grid
Drop the replicator inside a `containerCOMP` with auto-layout:
```python
panel = root.create(containerCOMP, 'scene_panel')
panel.par.w = 400; panel.par.h = 100
panel.par.align = 'lefttoright'
# Create the replicator inside the panel; an operator's parent cannot be changed by assignment
rep = panel.create(replicatorCOMP, 'scene_buttons')
```
Each clone is a child of the replicator (which itself is a child of the panel). The panel auto-arranges everything.
For a 2D grid, set `par.align = 'fillresize'` on the container and override `par.x` / `par.y` per clone in the callback based on row/col index.
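The row/column arithmetic for such a grid can be sketched in plain Python. This is a minimal illustration, not TouchDesigner API: `grid_position`, `columns`, `cell_w`, `cell_h`, and `gap` are hypothetical names, and whether y should grow upward or downward depends on how your container lays out its children.

```python
def grid_position(index, columns, cell_w, cell_h, gap=0):
    """Return the (x, y) offset of the clone at `index` in a grid with `columns` columns."""
    # divmod gives the zero-based row and column for a left-to-right, row-by-row fill.
    row, col = divmod(index, columns)
    return col * (cell_w + gap), row * (cell_h + gap)


# Example: 80x80 buttons, 4 per row, 10 px gap between cells.
positions = [grid_position(i, columns=4, cell_w=80, cell_h=80, gap=10) for i in range(6)]
```

Inside `onReplicate`, the two values would be written to whatever position parameters discovery reports for the clone type.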
---
## Updating Without Rebuilding
When the data table changes, the replicator regenerates the clones. By default it destroys and recreates everything. To preserve state, set:
```python
rep.par.recreatemissing = True # only add/remove changed rows
rep.par.recreateallonchange = False
```
This pattern is essential for live-edit scenarios (designer adjusts table, network keeps running).
For incremental data ingestion (e.g., from a `webDAT` polling an API), have a `datExecuteDAT` watch the response, parse, write to the data table, and the replicator self-updates.
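The idea behind an incremental update (touch only the rows that changed) can be sketched as a set difference over clone names. This is a conceptual illustration in plain Python, not how the replicator is implemented internally; `plan_clone_update` is a hypothetical helper.

```python
def plan_clone_update(existing, wanted):
    """Split clone names into those to keep, add, and remove, preserving input order."""
    existing_set, wanted_set = set(existing), set(wanted)
    return {
        "keep": [name for name in wanted if name in existing_set],
        "add": [name for name in wanted if name not in existing_set],
        "remove": [name for name in existing if name not in wanted_set],
    }


# Example: the table dropped 'Midnight' and gained 'Forest'.
plan = plan_clone_update(["Sunset", "Midnight", "Storm"], ["Sunset", "Storm", "Forest"])
```

Clones in `keep` are the ones whose toggles and slider positions survive; only `add` and `remove` cost a create or destroy.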
---
## Common Patterns
### Scene Roster (Data → Buttons + Logic)
```python
# Data per scene: name, file path, audio track, BPM
scene_data.appendRow(['name', 'file', 'audio', 'bpm'])
scene_data.appendRow(['Intro', '/scenes/intro.tox', '/audio/intro.wav', 110])
scene_data.appendRow(['Main', '/scenes/main.tox', '/audio/main.wav', 128])
# Replicator clones a buttonCOMP per scene
# Each button's onClick callback loads the corresponding tox + cues audio
```
### Dynamic Parameter Panel
For a list of audio bands, generate a fader strip per band:
```python
# Data: band names (sub, low, mid, hi-mid, high, air)
# Template: containerCOMP with label + sliderCOMP
# Replicator clones N strips
# Each slider's value is read at /audio_eq/{band_name}/fader
```
### Procedural Visual Network
Build a multi-channel visual network from a config file:
```python
# Data: which TOPs to chain, per "scene"
# Template: a baseCOMP with placeholder children
# Replicator builds one baseCOMP per scene; each scene contains a custom chain
# Switch between scenes via switchTOP.par.index driven by panel
```
### Per-Channel CHOP Display
Visualize each channel of a multi-channel CHOP separately:
```python
# Data table: one row per channel (auto-extracted via choptodatDAT)
# Template: a small chopVis COMP showing one channel
# Replicator generates N visualizers stacked vertically
```
---
## Replicator vs. Pure Python Loop
| Approach | When to use |
|---|---|
| **replicatorCOMP** | The set of clones changes (add/remove rows live). Visual editor expectations. Pattern is reusable across projects. |
| **Python loop** (in `td_execute_python`) | One-shot generation. Static set. Simpler logic, no template overhead. Faster to write. |
If you'll only ever build the network once, prefer a Python loop with `td_execute_python`. The replicator earns its weight when data is live.
---
## Pitfalls
1. **Header row** — `tableDAT` rows are 0-indexed. If you have a header, your first data row is index 1. Off-by-one bugs are common in callbacks.
2. **`namefromdatname` column missing** — replicator silently uses `digits` (numeric suffix) names. Buttons end up named `1`, `2`, `3` instead of meaningful names. Set `par.namefromdatname` explicitly.
3. **Template lives in network** — the template OP is itself a real network node. Don't connect things downstream of it directly; connect to the clones (or use a `nullCOMP` between).
4. **Recreate-on-change wipes state** — toggles, slider positions, and uncached data inside clones are lost on each regeneration. Use `recreatemissing` to preserve.
5. **`onReplicate` doesn't fire on edit** — only fires when the clone set changes. Editing a value WITHIN an existing row doesn't re-trigger. Use `parameterExecuteDAT` or expressions for per-cell live updates.
6. **Custom params on clones** — pages added in the template propagate. Pages added in `onReplicate` don't survive the next regeneration. Always add custom pages on the template, not the clone.
7. **Cooking storms** — adding many rows fast triggers many clone events. Bundle adds via Python and call `data.cook(force=True)` once at the end.
8. **`me.digits` outside numbered clones** — `me.digits` is the numeric suffix of the operator's own name, so it is `None` on clones named from a column and on any operator whose name has no trailing number. Don't reference it in unrelated networks.
9. **Cross-clone references** — referencing a sibling clone via relative path works from inside a clone (`op('../OtherClone/x')`), but breaks if names change. Prefer absolute paths via the data table.
---
## Quick Recipes
| Goal | Setup |
|---|---|
| 8-button scene picker | `tableDAT` (8 rows) + `buttonCOMP` template + `replicatorCOMP` |
| Per-band EQ strip panel | `tableDAT` (band names) + container template (label + slider) + replicator |
| Data-driven visual scenes | `tableDAT` (scene config) + `baseCOMP` template (visual chain) + replicator |
| Live-updating clone set | Same as above + `par.recreatemissing = True` |
| Per-row colored UI | Data table with color cols, `onReplicate` callback sets per-clone colors |
| List from API response | `webDAT` → `datExecuteDAT` parses JSON → writes to data table → replicator updates |
@@ -0,0 +1,244 @@
# TouchDesigner Troubleshooting (twozero MCP)
> See `references/pitfalls.md` for the comprehensive lessons-learned list.
## 1. Connection Issues
### Port 40404 not responding
Check these in order:
1. Is TouchDesigner running?
```bash
pgrep TouchDesigner
```
1b. Quick hub health check (no JSON-RPC needed):
A plain GET to the MCP URL returns instance info:
```
curl -s http://localhost:40404/mcp
```
Returns: `{"hub": true, "pid": ..., "instances": {"127.0.0.1_PID": {"project": "...", "tdVersion": "...", ...}}}`
If this returns JSON but `instances` is empty, TD is running but twozero hasn't registered yet.
2. Is twozero installed in TD?
Open TD Palette Browser > twozero should be listed. If not, install it.
3. Is MCP enabled in twozero settings?
In TD, open twozero preferences and confirm MCP server is toggled ON.
4. Test the port directly:
```bash
nc -z 127.0.0.1 40404
```
5. Test the MCP endpoint:
```bash
curl -s http://localhost:40404/mcp
```
Should return JSON with hub info. If it does, the server is running.
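The port and endpoint checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names and the three state labels are hypothetical, and the JSON fields (`hub`, `instances`) are the ones quoted in step 1b.

```python
import json
import socket
import urllib.request

HUB_HOST = "127.0.0.1"
HUB_PORT = 40404  # default hub port, as stated above


def port_is_open(host=HUB_HOST, port=HUB_PORT, timeout=1.0):
    """Rough equivalent of `nc -z host port`: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_hub_info(url=f"http://localhost:{HUB_PORT}/mcp", timeout=3.0):
    """Rough equivalent of `curl -s <url>`: plain GET, body parsed as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def classify_hub(info):
    """Map the hub's JSON to a coarse state (labels are illustrative)."""
    if not info.get("hub"):
        return "not-a-hub"
    if not info.get("instances"):
        return "hub-up-no-instances"  # TD running, twozero not registered yet
    return "ready"
```

A typical call sequence would be `port_is_open()` first, then `classify_hub(fetch_hub_info())` only if the port answers.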
### Hub responds but no TD instances
The twozero MCP hub is running but TD hasn't registered. Causes:
- TD project not loaded yet (still on splash screen)
- twozero COMP not initialized in the current project
- twozero version mismatch
Fix: Open/reload a TD project that contains the twozero COMP. Use td_list_instances
to check which TD instances are registered.
### Multi-instance setup
twozero auto-assigns ports for multiple TD instances:
- First instance: 40404
- Second instance: 40405
- Third instance: 40406
- etc.
Use `td_list_instances` to discover all running instances and their ports.
## 2. MCP Tool Errors
### td_execute_python returns error
The error message from td_execute_python often contains the Python traceback.
If it's unclear, use `td_read_textport` to see the full TD console output —
Python exceptions are always printed there.
Common causes:
- Syntax error in the script
- Referencing a node that doesn't exist (op() returns None, then you call .par on None)
- Using wrong parameter names (see pitfalls.md)
### td_set_operator_pars fails
Parameter name mismatch is the #1 cause. The tool validates param names and
returns clear errors, but you must use exact names.
Fix: ALWAYS call `td_get_par_info` first to discover the real parameter names:
```
td_get_par_info(op_type='glslTOP')
td_get_par_info(op_type='noiseTOP')
```
### td_create_operator type name errors
Operator type names use camelCase with family suffix:
- CORRECT: noiseTOP, glslTOP, levelTOP, compositeTOP, audiospectrumCHOP
- WRONG: NoiseTOP, noise_top, NOISE TOP, Noise
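A quick pre-flight check can catch the wrong spellings listed above before a tool call is made. This is a heuristic sketch, not a list of valid operators: it only tests the shape "lowercase name followed by an uppercase family suffix", and `looks_like_op_type` is a hypothetical helper.

```python
import re

FAMILIES = ("TOP", "CHOP", "SOP", "DAT", "MAT", "COMP")

# Lowercase letters/digits, then exactly one family suffix in capitals.
_TYPE_RE = re.compile(r"^[a-z][a-z0-9]*(?:%s)$" % "|".join(FAMILIES))


def looks_like_op_type(name):
    """Return True if `name` has the shape of a TouchDesigner operator type string."""
    return bool(_TYPE_RE.match(name))


candidates = ["noiseTOP", "NoiseTOP", "noise_top", "NOISE TOP", "Noise"]
well_formed = [c for c in candidates if looks_like_op_type(c)]
```

A name that passes can still be a type that does not exist in your TD build, so the tool's own error remains the final word.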
### td_get_operator_info for deep inspection
If unsure about any aspect of an operator (params, inputs, outputs, state):
```
td_get_operator_info(path='/project1/noise1', detail='full')
```
## 3. Parameter Discovery
CRITICAL: ALWAYS use td_get_par_info to discover parameter names.
The agent's LLM training data contains WRONG parameter names for TouchDesigner.
Do not trust them. Known wrong names include dat vs pixeldat, colora vs alpha,
sizex vs size, and many more. See pitfalls.md for the full list.
Workflow:
1. td_get_par_info(op_type='glslTOP') — get all params for a type
2. td_get_operator_info(path='/project1/mynode', detail='full') — get params for a specific instance
3. Use ONLY the names returned by these tools
## 4. Performance
### Diagnosing slow performance
Use `td_get_perf` to see which operators are slow. Look at cook times —
anything over 1ms per frame is worth investigating.
Common causes:
- Resolution too high (especially on Non-Commercial)
- Complex GLSL shaders
- Too many TOP-to-CHOP or CHOP-to-TOP transfers (GPU-CPU memory copies)
- Feedback loops without decay (values accumulate, memory grows)
### Non-Commercial license restrictions
- Resolution cap: 1280x1280. Setting resolutionw=1920 silently clamps to 1280.
- H.264/H.265/AV1 encoding requires Commercial license. Use ProRes or Hap instead.
- No commercial use of output.
Always check effective resolution after creation:
```python
n.cook(force=True)
actual = str(n.width) + 'x' + str(n.height)
```
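To avoid the silent clamp altogether, a target resolution can be scaled down before it is applied. The sketch below is a planning helper in plain Python, under the assumption that the 1280 cap applies to each axis; `fit_within_cap` is a hypothetical name and says nothing about how TD itself resizes.

```python
NON_COMMERCIAL_CAP = 1280  # per-axis cap quoted above


def fit_within_cap(width, height, cap=NON_COMMERCIAL_CAP):
    """Scale (width, height) down so the longer side fits `cap`, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= cap:
        return width, height
    # Integer arithmetic keeps the result exact for common video sizes.
    return width * cap // longest, height * cap // longest


landscape = fit_within_cap(1920, 1080)
portrait = fit_within_cap(1080, 1920)
```

Requesting the scaled size explicitly keeps the aspect ratio you intended, which a per-axis clamp would not.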
## 5. Hermes Configuration
### Config location
`$HERMES_HOME/config.yaml` (defaults to `~/.hermes/config.yaml` when `HERMES_HOME` is unset)
### MCP entry format
The twozero TD entry should look like:
```yaml
mcp_servers:
  twozero_td:
    url: http://localhost:40404/mcp
```
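Adding the entry programmatically should be idempotent and should never overwrite a URL the user already set. The sketch below works on an already-parsed config dict so it needs no YAML library; `ensure_mcp_entry` is a hypothetical helper, and the top-level key `mcp_servers` is an assumption taken from the setup script in this commit, so pass a different `key` if your config differs.

```python
import copy

DEFAULT_URL = "http://localhost:40404/mcp"


def ensure_mcp_entry(cfg, name="twozero_td", url=DEFAULT_URL, key="mcp_servers"):
    """Return a copy of `cfg` that contains the MCP entry, leaving existing values alone."""
    updated = copy.deepcopy(cfg) if cfg else {}
    servers = updated.get(key) or {}  # tolerate a missing or null section
    servers.setdefault(name, {"url": url})
    updated[key] = servers
    return updated


fresh = ensure_mcp_entry({})
again = ensure_mcp_entry(fresh)
```

Because the input is copied, the caller can compare the result with the original and write the file only when something actually changed.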
### After config changes
Restart the Hermes session for changes to take effect. The MCP connection is
established at session startup.
### Verifying MCP tools are available
After restarting, the session log should show twozero MCP tools registered.
If tools show as registered but aren't callable, check:
- The twozero MCP hub is still running (curl test above)
- TD is still running with a project loaded
- No firewall blocking localhost:40404
## 6. Node Creation Issues
### "Node type not found" error
Wrong type string. Use camelCase with family suffix:
- Wrong: NoiseTop, noise_top, NOISE TOP
- Right: noiseTOP
### Node created but not visible
Check parentPath — use absolute paths like /project1. The default project
root is /project1. System nodes live at /, /ui, /sys, /local, /perform.
Don't create user nodes outside /project1.
### Cannot create node inside a non-COMP
Only COMP operators (Container, Base, Geometry, etc.) can contain children.
You cannot create nodes inside a TOP, CHOP, SOP, DAT, or MAT.
## 7. Wiring Issues
### Cross-family wiring
TOPs connect to TOPs, CHOPs to CHOPs, SOPs to SOPs, DATs to DATs.
Use converter operators to bridge: choptoTOP, toptoCHOP, soptoDAT, etc.
Note: choptoTOP has NO input connectors. Use par.chop reference instead:
```python
spec_tex.par.chop = resample_node # correct
# NOT: resample.outputConnectors[0].connect(spec_tex.inputConnectors[0])
```
### Feedback loops
Never create A -> B -> A directly. Use a Feedback TOP:
```python
fb = root.create(feedbackTOP, 'fb')
fb.par.top = comp.path # reference only, no wire to fb input
fb.outputConnectors[0].connect(next_node)
```
"Cook dependency loop detected" warning on the chain is expected and correct.
## 8. GLSL Issues
### Shader compilation errors are silent
GLSL TOP shows a yellow warning in the UI but node.errors() may return empty.
Check node.warnings() too. Create an Info DAT pointed at the GLSL TOP for
full compiler output.
### TD GLSL specifics
- Uses GLSL 4.60 (Vulkan backend). GLSL 3.30 and earlier removed.
- UV coordinates: vUV.st (not gl_FragCoord)
- Input textures: sTD2DInputs[0]
- Output: layout(location = 0) out vec4 fragColor
- macOS CRITICAL: Always wrap output with TDOutputSwizzle(color)
- No built-in time uniform. Pass time via GLSL TOP Values page or Constant TOP.
## 9. Recording Issues
### H.264/H.265/AV1 requires Commercial license
Use Apple ProRes on macOS (hardware accelerated, not license-restricted):
```python
rec.par.videocodec = 'prores' # Preferred on macOS: visually lossless, Non-Commercial OK
# rec.par.videocodec = 'mjpa' # Fallback — lossy, works everywhere
```
### MovieFileOut has no .record() method
Use the toggle parameter:
```python
rec.par.record = True # start
rec.par.record = False # stop
```
### All exported frames identical
TOP.save() captures same frame when called rapidly. Use MovieFileOut for
real-time recording. Set project.realTime = False for frame-accurate output.
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
# setup.sh — Automated setup for twozero MCP plugin for TouchDesigner
# Idempotent: safe to run multiple times.
set -euo pipefail
GREEN='\033[0;32m'; RED='\033[0;31m'; YELLOW='\033[1;33m'; CYAN='\033[0;36m'; NC='\033[0m'
OK="${GREEN}✔${NC}"; FAIL="${RED}✘${NC}"; WARN="${YELLOW}⚠${NC}"
TWOZERO_URL="https://www.404zero.com/pisang/twozero.tox"
TOX_PATH="$HOME/Downloads/twozero.tox"
HERMES_HOME_DIR="${HERMES_HOME:-$HOME/.hermes}"
HERMES_CFG="${HERMES_HOME_DIR}/config.yaml"
MCP_PORT=40404
MCP_ENDPOINT="http://localhost:${MCP_PORT}/mcp"
manual_steps=()
echo -e "\n${CYAN}═══ twozero MCP for TouchDesigner — Setup ═══${NC}\n"
# ── 1. Check if TouchDesigner is running ──
# Match on process *name* (not full cmdline) to avoid self-matching shells
# that happen to have "TouchDesigner" in their args. macOS and Linux pgrep
# both support -x for exact name match.
if pgrep -x TouchDesigner >/dev/null 2>&1 || pgrep -x TouchDesignerFTE >/dev/null 2>&1; then
echo -e " ${OK} TouchDesigner is running"
td_running=true
else
echo -e " ${WARN} TouchDesigner is not running"
td_running=false
fi
# ── 2. Ensure twozero.tox exists ──
if [[ -f "$TOX_PATH" ]]; then
echo -e " ${OK} twozero.tox already exists at ${TOX_PATH}"
else
echo -e " ${WARN} twozero.tox not found — downloading..."
if curl -fSL -o "$TOX_PATH" "$TWOZERO_URL" 2>/dev/null; then
echo -e " ${OK} Downloaded twozero.tox to ${TOX_PATH}"
else
echo -e " ${FAIL} Failed to download twozero.tox from ${TWOZERO_URL}"
echo " Please download manually and place at ${TOX_PATH}"
manual_steps+=("Download twozero.tox from ${TWOZERO_URL} to ${TOX_PATH}")
fi
fi
# ── 3. Ensure Hermes config has twozero_td MCP entry ──
if [[ ! -f "$HERMES_CFG" ]]; then
echo -e " ${FAIL} Hermes config not found at ${HERMES_CFG}"
manual_steps+=("Create ${HERMES_CFG} with twozero_td MCP server entry")
elif grep -q 'twozero_td' "$HERMES_CFG" 2>/dev/null; then
echo -e " ${OK} twozero_td MCP entry exists in Hermes config"
else
echo -e " ${WARN} Adding twozero_td MCP entry to Hermes config..."
python3 -c "
import yaml, sys, copy
cfg_path = '$HERMES_CFG'
with open(cfg_path, 'r') as f:
    cfg = yaml.safe_load(f) or {}
if 'mcp_servers' not in cfg:
    cfg['mcp_servers'] = {}
if 'twozero_td' not in cfg['mcp_servers']:
    cfg['mcp_servers']['twozero_td'] = {
        'url': '${MCP_ENDPOINT}',
        'timeout': 120,
        'connect_timeout': 60
    }
    with open(cfg_path, 'w') as f:
        yaml.dump(cfg, f, default_flow_style=False, sort_keys=False)
" 2>/dev/null && echo -e " ${OK} twozero_td MCP entry added to config" \
|| { echo -e " ${FAIL} Could not update config (is PyYAML installed?)"; \
manual_steps+=("Add twozero_td MCP entry to ${HERMES_CFG} manually"); }
manual_steps+=("Restart Hermes session to pick up config change")
fi
# ── 4. Test if MCP port is responding ──
if nc -z 127.0.0.1 "$MCP_PORT" 2>/dev/null; then
echo -e " ${OK} Port ${MCP_PORT} is open"
# ── 5. Verify MCP endpoint responds ──
resp=$(curl -s --max-time 3 "$MCP_ENDPOINT" 2>/dev/null || true)
if [[ -n "$resp" ]]; then
echo -e " ${OK} MCP endpoint responded at ${MCP_ENDPOINT}"
else
echo -e " ${WARN} Port open but MCP endpoint returned empty response"
manual_steps+=("Verify MCP is enabled in twozero settings")
fi
else
echo -e " ${WARN} Port ${MCP_PORT} is not open"
if [[ "$td_running" == true ]]; then
manual_steps+=("In TD: drag twozero.tox into network editor → click Install")
manual_steps+=("Enable MCP: twozero icon → Settings → mcp → 'auto start MCP' → Yes")
else
manual_steps+=("Launch TouchDesigner")
manual_steps+=("Drag twozero.tox into the TD network editor and click Install")
manual_steps+=("Enable MCP: twozero icon → Settings → mcp → 'auto start MCP' → Yes")
fi
fi
# ── Status Report ──
echo -e "\n${CYAN}═══ Status Report ═══${NC}\n"
if [[ ${#manual_steps[@]} -eq 0 ]]; then
echo -e " ${OK} ${GREEN}Fully configured! twozero MCP is ready to use.${NC}\n"
exit 0
else
echo -e " ${WARN} ${YELLOW}Manual steps remaining:${NC}\n"
for i in "${!manual_steps[@]}"; do
echo -e " $((i+1)). ${manual_steps[$i]}"
done
echo ""
exit 1
fi
@@ -1,11 +1,6 @@
---
name: jupyter-live-kernel
description: >
Use a live Jupyter kernel for stateful, iterative Python execution via hamelnb.
Load this skill when the task involves exploration, iteration, or inspecting
intermediate results — data science, ML experimentation, API exploration, or
building up complex code step-by-step. Uses terminal to run CLI commands against
a live Jupyter kernel. No new tools required.
description: "Iterative Python via live Jupyter kernel (hamelnb)."
version: 1.0.0
author: Hermes Agent
license: MIT
@@ -0,0 +1,152 @@
---
name: kanban-orchestrator
description: Decomposition playbook + specialist-roster conventions + anti-temptation rules for an orchestrator profile routing work through Kanban. The "don't do the work yourself" rule and the basic lifecycle are auto-injected into every kanban worker's system prompt; this skill is the deeper playbook when you're specifically playing the orchestrator role.
version: 2.0.0
metadata:
  hermes:
    tags: [kanban, multi-agent, orchestration, routing]
    related_skills: [kanban-worker]
---
# Kanban Orchestrator — Decomposition Playbook
> The **core worker lifecycle** (including the `kanban_create` fan-out pattern and the "decompose, don't execute" rule) is auto-injected into every kanban process via the `KANBAN_GUIDANCE` system-prompt block. This skill is the deeper playbook when you're an orchestrator profile whose whole job is routing.
## When to use the board (vs. just doing the work)
Create Kanban tasks when any of these are true:
1. **Multiple specialists are needed.** Research + analysis + writing is three profiles.
2. **The work should survive a crash or restart.** Long-running, recurring, or important.
3. **The user might want to interject.** Human-in-the-loop at any step.
4. **Multiple subtasks can run in parallel.** Fan-out for speed.
5. **Review / iteration is expected.** A reviewer profile loops on drafter output.
6. **The audit trail matters.** Board rows persist in SQLite forever.
If *none* of those apply — it's a small one-shot reasoning task — use `delegate_task` instead or answer the user directly.
## The anti-temptation rules
Your job description says "route, don't execute." The rules that enforce that:
- **Do not execute the work yourself.** Your restricted toolset usually doesn't even include terminal/file/code/web for implementation. If you find yourself "just fixing this quickly" — stop and create a task for the right specialist.
- **For any concrete task, create a Kanban task and assign it.** Every single time.
- **If no specialist fits, ask the user which profile to create.** Do not default to doing it yourself under "close enough."
- **Decompose, route, and summarize — that's the whole job.**
## The standard specialist roster (convention)
Unless the user's setup has customized profiles, assume these exist. Adjust to whatever the user actually has — ask if you're unsure.
| Profile | Does | Typical workspace |
|---|---|---|
| `researcher` | Reads sources, gathers facts, writes findings | `scratch` |
| `analyst` | Synthesizes, ranks, de-dupes. Consumes multiple `researcher` outputs | `scratch` |
| `writer` | Drafts prose in the user's voice | `scratch` or `dir:` into their Obsidian vault |
| `reviewer` | Reads output, leaves findings, gates approval | `scratch` |
| `backend-eng` | Writes server-side code | `worktree` |
| `frontend-eng` | Writes client-side code | `worktree` |
| `ops` | Runs scripts, manages services, handles deployments | `dir:` into ops scripts repo |
| `pm` | Writes specs, acceptance criteria | `scratch` |
## Decomposition playbook
### Step 1 — Understand the goal
Ask clarifying questions if the goal is ambiguous. Cheap to ask; expensive to spawn the wrong fleet.
### Step 2 — Sketch the task graph
Before creating anything, draft the graph out loud (in your response to the user). Example for "Analyze whether we should migrate to Postgres":
```
T1 researcher research: Postgres cost vs current
T2 researcher research: Postgres performance vs current
T3 analyst synthesize migration recommendation parents: T1, T2
T4 writer draft decision memo parents: T3
```
Show this to the user. Let them correct it before you create anything.
### Step 3 — Create tasks and link
```python
import os
t1 = kanban_create(
title="research: Postgres cost vs current",
assignee="researcher",
body="Compare estimated infrastructure costs, migration costs, and ongoing ops costs over a 3-year window. Sources: AWS/GCP pricing, team time estimates, current Postgres bills from peers.",
tenant=os.environ.get("HERMES_TENANT"),
)["task_id"]
t2 = kanban_create(
title="research: Postgres performance vs current",
assignee="researcher",
body="Compare query latency, throughput, and scaling characteristics at our expected data volume (~500GB, 10k QPS peak). Sources: benchmark papers, public case studies, pgbench results if easy.",
)["task_id"]
t3 = kanban_create(
title="synthesize migration recommendation",
assignee="analyst",
body="Read the findings from T1 (cost) and T2 (performance). Produce a 1-page recommendation with explicit trade-offs and a go/no-go call.",
parents=[t1, t2],
)["task_id"]
t4 = kanban_create(
title="draft decision memo",
assignee="writer",
body="Turn the analyst's recommendation into a 2-page memo for the CTO. Match the tone of previous decision memos in the team's knowledge base.",
parents=[t3],
)["task_id"]
```
`parents=[...]` gates promotion — children stay in `todo` until every parent reaches `done`, then auto-promote to `ready`. No manual coordination needed; the dispatcher and dependency engine handle it.
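The gating rule can be modelled in a few lines. This is a conceptual sketch of the rule as described here, not the dispatcher's actual code; `ready_tasks` and the plain-dict representation of the graph are hypothetical.

```python
def ready_tasks(graph, status):
    """Return ids of `todo` tasks whose parents are all `done`, in graph order.

    graph:  task id -> list of parent ids
    status: task id -> current status (missing ids count as 'todo')
    """
    ready = []
    for task_id, parents in graph.items():
        if status.get(task_id, "todo") != "todo":
            continue  # already promoted, running, or finished
        if all(status.get(parent) == "done" for parent in parents):
            ready.append(task_id)
    return ready


# The Postgres example from Step 2.
graph = {"T1": [], "T2": [], "T3": ["T1", "T2"], "T4": ["T3"]}
first_wave = ready_tasks(graph, {})
second_wave = ready_tasks(graph, {"T1": "done", "T2": "done"})
```

Tasks with no parents are ready immediately, which is why T1 and T2 start in parallel while T3 waits for both.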
### Step 4 — Complete your own task
If you were spawned as a task yourself (e.g. `planner` profile was assigned `T0: "investigate Postgres migration"`), mark it done with a summary of what you created:
```python
kanban_complete(
summary="decomposed into T1-T4: 2 researchers parallel, 1 analyst on their outputs, 1 writer on the recommendation",
metadata={
"task_graph": {
"T1": {"assignee": "researcher", "parents": []},
"T2": {"assignee": "researcher", "parents": []},
"T3": {"assignee": "analyst", "parents": ["T1", "T2"]},
"T4": {"assignee": "writer", "parents": ["T3"]},
},
},
)
```
### Step 5 — Report back to the user
Tell them what you created in plain prose:
> I've queued 4 tasks:
> - **T1** (researcher): cost comparison
> - **T2** (researcher): performance comparison, in parallel with T1
> - **T3** (analyst): synthesizes T1 + T2 into a recommendation
> - **T4** (writer): turns T3 into a CTO memo
>
> The dispatcher will pick up T1 and T2 now. T3 starts when both finish. You'll get a gateway ping when T4 completes. Use the dashboard or `hermes kanban tail <id>` to follow along.
## Common patterns
**Fan-out + fan-in (research → synthesize):** N `researcher` tasks with no parents, one `analyst` task with all of them as parents.
**Pipeline with gates:** `pm → backend-eng → reviewer`. Each stage's `parents=[previous_task]`. Reviewer blocks or completes; if reviewer blocks, the operator unblocks with feedback and respawns.
**Same-profile queue:** 50 tasks, all assigned to `translator`, no dependencies between them. Dispatcher serializes — translator processes them in priority order, accumulating experience in their own memory.
**Human-in-the-loop:** Any task can `kanban_block()` to wait for input. Dispatcher respawns after `/unblock`. The comment thread carries the full context.
## Pitfalls
**Reassignment vs. new task.** If a reviewer blocks with "needs changes," create a NEW task linked from the reviewer's task — don't re-run the same task with a stern look. The new task is assigned to the original implementer profile.
**Argument order for links.** `kanban_link(parent_id=..., child_id=...)` — parent first. Mixing them up demotes the wrong task to `todo`.
**Don't pre-create the whole graph if the shape depends on intermediate findings.** If T3's structure depends on what T1 and T2 find, let T3 exist as a "synthesize findings" task whose own first step is to read parent handoffs and plan the rest. Orchestrators can spawn orchestrators.
**Tenant inheritance.** If `HERMES_TENANT` is set in your env, pass `tenant=os.environ.get("HERMES_TENANT")` on every `kanban_create` call so child tasks stay in the same namespace.
@@ -0,0 +1,134 @@
---
name: kanban-worker
description: Pitfalls, examples, and edge cases for Hermes Kanban workers. The lifecycle itself is auto-injected into every worker's system prompt as KANBAN_GUIDANCE (from agent/prompt_builder.py); this skill is what you load when you want deeper detail on specific scenarios.
version: 2.0.0
metadata:
  hermes:
    tags: [kanban, multi-agent, collaboration, workflow, pitfalls]
    related_skills: [kanban-orchestrator]
---
# Kanban Worker — Pitfalls and Examples
> You're seeing this skill because the Hermes Kanban dispatcher spawned you as a worker with `--skills kanban-worker` — it's loaded automatically for every dispatched worker. The **lifecycle** (6 steps: orient → work → heartbeat → block/complete) also lives in the `KANBAN_GUIDANCE` block that's auto-injected into your system prompt. This skill is the deeper detail: good handoff shapes, retry diagnostics, edge cases.
## Workspace handling
Your workspace kind determines how you should behave inside `$HERMES_KANBAN_WORKSPACE`:
| Kind | What it is | How to work |
|---|---|---|
| `scratch` | Fresh tmp dir, yours alone | Read/write freely; it gets GC'd when the task is archived. |
| `dir:<path>` | Shared persistent directory | Other runs will read what you write. Treat it like long-lived state. Path is guaranteed absolute (the kernel rejects relative paths). |
| `worktree` | Git worktree at the resolved path | If `.git` doesn't exist, run `git worktree add <path> <branch>` from the main repo first, then cd and work normally. Commit work here. |
## Tenant isolation
If `$HERMES_TENANT` is set, the task belongs to a tenant namespace. When reading or writing persistent memory, prefix memory entries with the tenant so context doesn't leak across tenants:
- Good: `business-a: Acme is our biggest customer`
- Bad (leaks): `Acme is our biggest customer`
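The prefixing convention is easy to apply consistently with a small helper. This is a sketch only: `tenant_scoped` is a hypothetical function, and the `tenant: entry` format is inferred from the "Good" example above.

```python
import os


def tenant_scoped(entry, tenant=None):
    """Prefix a memory entry with the tenant name; no-op when no tenant is set."""
    if tenant is None:
        tenant = os.environ.get("HERMES_TENANT")
    if not tenant:
        return entry
    prefix = f"{tenant}: "
    # Avoid double-prefixing when the entry was already scoped.
    return entry if entry.startswith(prefix) else prefix + entry


scoped = tenant_scoped("Acme is our biggest customer", tenant="business-a")
```

Running every write through one function makes an accidental unprefixed entry much less likely than prefixing by hand.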
## Good summary + metadata shapes
The `kanban_complete(summary=..., metadata=...)` handoff is how downstream workers read what you did. Patterns that work:
**Coding task:**
```python
kanban_complete(
summary="shipped rate limiter — token bucket, keys on user_id with IP fallback, 14 tests pass",
metadata={
"changed_files": ["rate_limiter.py", "tests/test_rate_limiter.py"],
"tests_run": 14,
"tests_passed": 14,
"decisions": ["user_id primary, IP fallback for unauthenticated requests"],
},
)
```
**Research task:**
```python
kanban_complete(
summary="3 competing libraries reviewed; vLLM wins on throughput, SGLang on latency, TensorRT-LLM on memory efficiency",
metadata={
"sources_read": 12,
"recommendation": "vLLM",
"benchmarks": {"vllm": 1.0, "sglang": 0.87, "trtllm": 0.72},
},
)
```
**Review task:**
```python
kanban_complete(
    summary="reviewed PR #123; 2 blocking issues found (SQL injection in /search, missing CSRF on /settings)",
    metadata={
        "pr_number": 123,
        "findings": [
            {"severity": "critical", "file": "api/search.py", "line": 42, "issue": "raw SQL concat"},
            {"severity": "high", "file": "api/settings.py", "issue": "missing CSRF middleware"},
        ],
        "approved": False,
    },
)
```
Shape `metadata` so downstream parsers (reviewers, aggregators, schedulers) can use it without re-reading your prose.
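To illustrate what "usable without re-reading your prose" means, here is a sketch of a downstream consumer working on the review-task metadata above. Both function names are hypothetical, the choice of `critical` and `high` as blocking severities is an assumption, and so is the idea that metadata should survive JSON serialization (suggested, but not confirmed, by the CLI taking `--metadata '{...}'`).

```python
import json
from typing import Any

# Assumption: these severities count as blocking for a reviewer/aggregator.
BLOCKING_SEVERITIES = {"critical", "high"}


def blocking_findings(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the findings a downstream worker should act on first."""
    return [
        finding
        for finding in metadata.get("findings", [])
        if finding.get("severity") in BLOCKING_SEVERITIES
    ]


def is_json_safe(metadata: dict[str, Any]) -> bool:
    """Check that metadata contains only JSON-serializable values."""
    try:
        json.dumps(metadata)
    except (TypeError, ValueError):
        return False
    return True
```

Flat keys, plain numbers, and lists of small dicts keep consumers like this to a few lines; free-form strings force them back to parsing prose.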
## Block reasons that get answered fast
Bad: `"stuck"` — the human has no context.
Good: one sentence naming the specific decision you need. Leave longer context as a comment instead.
```python
import os

kanban_comment(
    task_id=os.environ["HERMES_KANBAN_TASK"],
    body="Full context: I have user IPs from Cloudflare headers but some users are behind NATs with thousands of peers. Keying on IP alone causes false positives.",
)
kanban_block(reason="Rate limit key choice: IP (simple, NAT-unsafe) or user_id (requires auth, skips anonymous endpoints)?")
```
The block message is what appears in the dashboard / gateway notifier. The comment is the deeper context a human reads when they open the task.
## Heartbeats worth sending
Good heartbeats name progress: `"epoch 12/50, loss 0.31"`, `"scanned 1.2M/2.4M rows"`, `"uploaded 47/120 videos"`.
Bad heartbeats: `"still working"`, empty notes, sub-second intervals. Send at most one heartbeat every few minutes, and skip them entirely for tasks under ~2 minutes.
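The cadence rules can be enforced with a small throttle wrapped around whatever call actually sends the heartbeat. This is a sketch under assumptions: `HeartbeatThrottle` is a hypothetical class, `send` stands in for the real heartbeat tool (whose name is not given here), and the 180-second default is just one reading of "every few minutes".

```python
import time
from typing import Callable


class HeartbeatThrottle:
    """Drop heartbeats that are empty or arrive too soon (hypothetical helper)."""

    def __init__(
        self,
        send: Callable[[str], None],
        min_interval_s: float = 180.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval_s = min_interval_s
        self._clock = clock
        # Start the timer at construction so short tasks never send anything.
        self._last_mark = clock()

    def maybe_send(self, note: str) -> bool:
        """Send the note if it is non-empty and the interval has elapsed."""
        if not note.strip():
            return False
        now = self._clock()
        if now - self._last_mark < self._min_interval_s:
            return False
        self._send(note)
        self._last_mark = now
        return True
```

Call `maybe_send` as often as is convenient inside your loop, for example once per epoch or batch; the throttle decides whether the note goes out.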
## Retry scenarios
If you open the task and `kanban_show` returns `runs: [...]` with one or more closed runs, you're a retry. The prior runs' `outcome` / `summary` / `error` tell you what didn't work. Don't repeat that path. Typical retry diagnostics:
- `outcome: "timed_out"` — the previous attempt hit `max_runtime_seconds`. You may need to chunk the work or shorten it.
- `outcome: "crashed"` — OOM or segfault. Reduce memory footprint.
- `outcome: "spawn_failed"` + `error: "..."` — usually a profile config issue (missing credential, bad PATH). Ask the human via `kanban_block` instead of retrying blindly.
- `outcome: "reclaimed"` + `summary: "task archived..."` — operator archived the task out from under the previous run; you probably shouldn't be running at all, check status carefully.
- `outcome: "blocked"` — a previous attempt blocked; the unblock comment should be in the thread by now.
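The diagnostics above can be folded into a lookup that runs right after `kanban_show`. This is a sketch, not the tool's actual contract: it assumes the result is a dict whose `runs` list is ordered oldest-first and whose entries carry `outcome` and optional `error` keys, and `retry_hint` is a hypothetical name.

```python
from typing import Any, Optional

# Hints paraphrased from the retry diagnostics list.
RETRY_HINTS = {
    "timed_out": "chunk or shorten the work; the last attempt hit max_runtime_seconds",
    "crashed": "reduce memory footprint; the last attempt hit OOM or a segfault",
    "spawn_failed": "ask the human via kanban_block; likely a profile config issue",
    "reclaimed": "check task status carefully; the task may have been archived",
    "blocked": "read the comment thread for the unblock comment",
}


def retry_hint(task: dict[str, Any]) -> Optional[str]:
    """Return a one-line hint based on the most recent prior run, or None."""
    runs = task.get("runs") or []
    if not runs:
        # No prior runs: this is a first attempt, nothing to diagnose.
        return None
    last = runs[-1]  # assumption: runs are ordered oldest-first
    outcome = last.get("outcome")
    hint = RETRY_HINTS.get(
        outcome, "unrecognised outcome; read summary and error before proceeding"
    )
    error = last.get("error")
    suffix = f" (error: {error})" if error else ""
    return f"{outcome}: {hint}{suffix}"
```

Writing the returned hint into your working notes makes it harder to repeat the failed path by accident.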
## Do NOT
- Call `delegate_task` as a substitute for `kanban_create`. `delegate_task` is for short reasoning subtasks inside YOUR run; `kanban_create` is for cross-agent handoffs that outlive one API loop.
- Modify files outside `$HERMES_KANBAN_WORKSPACE` unless the task body says to.
- Create follow-up tasks assigned to yourself — assign to the right specialist.
- Complete a task you didn't actually finish. Block it instead.
## Pitfalls
**Task state can change between dispatch and your startup.** Between the moment the dispatcher claimed the task and the moment your process actually booted, the task may have been blocked, reassigned, or archived. Always call `kanban_show` first. If it reports `blocked` or `archived`, stop; you shouldn't be running.
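A startup guard for this check might look like the sketch below. The `status` and `assignee` keys are assumptions about the shape of the `kanban_show` result, and `should_proceed` is a hypothetical name; adjust both to what the tool actually returns.

```python
from typing import Any, Optional

# Statuses named above as reasons to stop immediately.
STOP_STATUSES = {"blocked", "archived"}


def should_proceed(task: dict[str, Any], my_profile: Optional[str] = None) -> bool:
    """Decide whether this run should continue after the initial kanban_show."""
    if task.get("status") in STOP_STATUSES:
        return False
    assignee = task.get("assignee")
    # If we know who we are and the task now names someone else, it was reassigned.
    if my_profile is not None and assignee is not None and assignee != my_profile:
        return False
    return True
```

Running this before touching the workspace avoids writing into a directory that another worker now owns.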
**Workspace may have stale artifacts.** Especially `dir:` and `worktree` workspaces can have files from previous runs. Read the comment thread — it usually explains why you're running again and what state the workspace is in.
**Don't rely on the CLI when the tools are available.** The `kanban_*` tools work across all terminal backends (Docker, Modal, SSH). `hermes kanban <verb>` from your terminal tool will fail in containerized backends because the CLI isn't installed there. When in doubt, use the tool.
## CLI fallback (for scripting)
Every tool has a CLI equivalent for human operators and scripts:
- `kanban_show` → `hermes kanban show <id> --json`
- `kanban_complete` → `hermes kanban complete <id> --summary "..." --metadata '{...}'`
- `kanban_block` → `hermes kanban block <id> "reason"`
- `kanban_create` → `hermes kanban create "title" --assignee <profile> [--parent <id>]`
- etc.
Use the tools from inside an agent; the CLI exists for the human at the terminal.
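For operator-side scripts, the `complete` command can be assembled from Python without hand-quoting the JSON. This sketch only builds the argument list and does not run it; `complete_argv` is a hypothetical helper, the task id is a placeholder, and the flags are the ones listed above.

```python
import json
import shlex


def complete_argv(task_id: str, summary: str, metadata: dict) -> list[str]:
    """Build the argv for `hermes kanban complete` (hypothetical helper)."""
    return [
        "hermes", "kanban", "complete", task_id,
        "--summary", summary,
        # Serialize once; passing a list to subprocess avoids shell quoting bugs.
        "--metadata", json.dumps(metadata, sort_keys=True),
    ]


argv = complete_argv("TASK_ID", "14 tests pass", {"tests_run": 14, "tests_passed": 14})
# shlex.join renders a copy-pasteable command line for logs or dry runs.
print(shlex.join(argv))
```

Handing `argv` to `subprocess.run(argv, check=True)` would execute it on a machine where the CLI is installed; the list form sidesteps shell interpretation of the JSON braces and quotes.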
@@ -1,6 +1,6 @@
---
name: webhook-subscriptions
description: Create and manage webhook subscriptions for event-driven agent activation, or for direct push notifications (zero LLM cost). Use when the user wants external services to trigger agent runs OR push notifications to chats.
description: "Webhook subscriptions: event-driven agent runs."
version: 1.1.0
metadata:
hermes:
@@ -1,6 +1,6 @@
---
name: dogfood
description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
description: "Exploratory QA of web apps: find bugs, evidence, reports."
version: 1.0.0
metadata:
hermes:
@@ -1,6 +1,6 @@
---
name: himalaya
description: CLI to manage emails via IMAP/SMTP. Use himalaya to list, read, write, reply, forward, search, and organize emails from the terminal. Supports multiple accounts and message composition with MML (MIME Meta Language).
description: "Himalaya CLI: IMAP/SMTP email from terminal."
version: 1.0.0
author: community
license: MIT
@@ -1,6 +1,6 @@
---
name: minecraft-modpack-server
description: Set up a modded Minecraft server from a CurseForge/Modrinth server pack zip. Covers NeoForge/Forge install, Java version, JVM tuning, firewall, LAN config, backups, and launch scripts.
description: "Host modded Minecraft servers (CurseForge, Modrinth)."
tags: [minecraft, gaming, server, neoforge, forge, modpack]
---
@@ -1,6 +1,6 @@
---
name: pokemon-player
description: Play Pokemon games autonomously via headless emulation. Starts a game server, reads structured game state from RAM, makes strategic decisions, and sends button inputs — all from the terminal.
description: "Play Pokemon via headless emulator + RAM reads."
tags: [gaming, pokemon, emulator, pyboy, gameplay, gameboy]
---
# Pokemon Player
@@ -1,6 +1,6 @@
---
name: codebase-inspection
description: Inspect and analyze codebases using pygount for LOC counting, language breakdown, and code-vs-comment ratios. Use when asked to check lines of code, repo size, language composition, or codebase stats.
description: "Inspect codebases w/ pygount: LOC, languages, ratios."
version: 1.0.0
author: Hermes Agent
license: MIT

Some files were not shown because too many files have changed in this diff