New commands:
- `transcribe`: Uses Whisper to generate word-level timestamps from video
- `align`: Matches manuscript markers to the transcript and outputs transcript.csv
Workflow:
1. gnommo transcribe video.mov → video.transcript.json
2. gnommo align project/ → transcript.csv with markers at aligned times
Alignment takes the first phrase after each marker in the manuscript
and locates it in the transcript using fuzzy text matching. It then
applies a configurable offset (default -1s) so slides appear before
the matching speech.
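
A minimal sketch of the alignment step described above, not the actual
implementation. It assumes the transcript is available as a list of
(word, start_seconds) pairs and uses `difflib.SequenceMatcher` as a
stand-in for whatever fuzzy matcher the tool uses. The names
`find_phrase` and `marker_time`, the 0.8 threshold, and the clamp at
0 seconds are all hypothetical.

```python
from difflib import SequenceMatcher


def find_phrase(words, phrase, threshold=0.8):
    """Return the start time of the best fuzzy match for `phrase`, or None.

    `words` is a list of (text, start_seconds) pairs (assumed shape).
    """
    target = " ".join(phrase.lower().split())
    n = len(target.split())
    best_score, best_time = 0.0, None
    # Slide a window of n words over the transcript and score each one.
    for i in range(len(words) - n + 1):
        window = " ".join(w.strip().lower() for w, _ in words[i:i + n])
        score = SequenceMatcher(None, target, window).ratio()
        if score > best_score:
            best_score, best_time = score, words[i][1]
    return best_time if best_score >= threshold else None


def marker_time(words, phrase, offset=-1.0):
    """Apply the offset so the slide appears before the phrase is spoken."""
    t = find_phrase(words, phrase)
    if t is None:
        return None
    # Clamping at 0 is an assumption; the source does not say how
    # negative times are handled.
    return max(0.0, t + offset)
```

With a transcript where "today we cover" starts at 2.0s, `marker_time`
returns 1.0 under the default offset.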
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
A code-first, declarative video editing system that compiles text
documents into rendered video via FFmpeg. Uses a compiler-style
ETL pipeline: Extract (parse inputs) → Validate → Transform
(build timeline) → Render (FFmpeg).
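
The first three stages could be sketched roughly as below. This is an
illustration under assumptions, not the project's code: the function
names, the `Overlay` type, the (time, text) row shape, and the rule
that each slide lasts until the next marker are all hypothetical.

```python
import re
from dataclasses import dataclass

MARKER = re.compile(r"\[S(\d+)\]")


@dataclass
class Overlay:
    slide: str
    start: float
    end: float


def extract(rows):
    """Extract: parse (time, text) rows into (slide_id, time) pairs."""
    found = []
    for time, text in rows:
        for m in MARKER.finditer(text):
            found.append((f"S{m.group(1)}", float(time)))
    return found


def validate(markers, duration):
    """Validate: fail fast by raising on the first problem found."""
    last = -1.0
    for slide, t in markers:
        if not 0 <= t <= duration:
            raise ValueError(f"{slide}: time {t} is outside 0..{duration}")
        if t <= last:
            raise ValueError(f"{slide}: marker times must strictly increase")
        last = t
    return markers


def transform(markers, duration):
    """Transform: build the timeline; each slide ends at the next marker."""
    ends = [t for _, t in markers[1:]] + [duration]
    return [Overlay(s, t, e) for (s, t), e in zip(markers, ends)]


def compile_timeline(rows, duration):
    return transform(validate(extract(rows), duration), duration)
```

Keeping validation as its own stage means a bad marker stops the run
before any FFmpeg work starts.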
Features:
- Text-based project definition (manuscript, transcript, JSON configs)
- Slide markers ([S1], [S2]) in the transcript map to timed overlays
- Strict validation with fail-fast error reporting
- FFmpeg filter_complex generation with time-based enables
- CLI with validate/render/dry-run modes
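
As an illustration of the filter_complex feature, the sketch below
chains one FFmpeg `overlay` filter per slide and gates each with the
timeline option `enable='between(t,start,end)'`. The function name
`build_filter_complex`, the input ordering (input 0 is the base video,
inputs 1..N are slide images), and the `[vN]` labels are assumptions,
not the project's actual generator.

```python
def build_filter_complex(overlays):
    """Return (filter_graph, final_label) for a list of (start, end) pairs.

    Assumes input 0 is the base video and input i is the image for the
    i-th overlay (hypothetical ordering).
    """
    parts = []
    prev = "[0:v]"
    for i, (start, end) in enumerate(overlays, start=1):
        out = f"[v{i}]"
        # The overlay is drawn only while start <= t <= end.
        parts.append(
            f"{prev}[{i}:v]overlay=0:0:"
            f"enable='between(t,{start:g},{end:g})'{out}"
        )
        prev = out
    return ";".join(parts), prev


graph, label = build_filter_complex([(0, 5), (5, 12)])
print(graph)
```

The returned graph would be passed to `-filter_complex` and the final
label to `-map`. With no overlays the graph is empty, so a caller
would skip both options in that case.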
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>