Adding changes version 1

2026-02-06 17:56:05 +01:00
parent 93fa820275
commit fdd275ac0e
30 changed files with 7068 additions and 888 deletions
+317
@@ -0,0 +1,317 @@
# Partial Rendering Specification
## Overview
Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
- Faster iteration during development
- Re-rendering specific sections after fixes
- Parallel rendering of segments that can be concatenated later
## Scope (v1)
**In scope:**
- Camera state tracking (cumulative state must be computed from t=0)
- Time offset adjustment for all events
- Slide range filtering
- Input video seeking
**Out of scope (v1):**
- Audio events crossing range boundaries
- Triggered video duration edge cases
- Carry-over handling: every event is assumed to begin at its marker timestamp and never "carries over" into a later range
## Current Architecture Analysis
### 1. Camera State Management
**Current behavior** (`transformer.py:250-332`):
- Camera state is **cumulative** across the transcript
- `_extract_camera_events()` walks through ALL markers sequentially
- Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
- Example: `[Zoom2]` then `[TiltLeft]` = both zoom AND tilt active
**Problem for partial rendering**:
If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.
**Solution**:
Separate "state computation" from "event generation":
1. Always walk through ALL transcript markers to compute cumulative state
2. Track the "initial state" at the start of the render range
3. Only emit CameraEvents for markers WITHIN the render range
4. First event in partial render must transition FROM the computed initial state
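The four steps above can be sketched as a fold over the markers that precede the render range. This is a minimal illustration, not the code in `transformer.py`: `PRESETS`, `apply_preset`, and `state_before` are hypothetical names, the markers are assumed to be sorted by time, and the preset values follow the marker tables (Zoom2 = 125%, TiltLeft = -15 degrees).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


# Hypothetical preset table: each preset lists only the fields it overrides,
# so applying it leaves every other property untouched.
PRESETS = {
    "Zoom2": {"zoom": 1.25},
    "TiltLeft": {"rotation": -15.0},
    "Reset": {"zoom": 1.0, "rotation": 0.0, "pan_x": 0.0, "pan_y": 0.0},
}


def apply_preset(state: CameraState, marker_id: str) -> CameraState:
    """Return a new state with only the preset's fields changed."""
    return replace(state, **PRESETS[marker_id])


def state_before(markers: list[tuple[float, str]], start_time: float) -> CameraState:
    """Fold every camera marker earlier than start_time into one state."""
    state = CameraState()
    for time, marker_id in markers:  # assumed sorted by time
        if time >= start_time:
            break
        if marker_id in PRESETS:
            state = apply_preset(state, marker_id)
    return state


markers = [(1.0, "Zoom2"), (4.0, "TiltLeft"), (12.0, "Reset")]
print(state_before(markers, 10.0))
# CameraState(zoom=1.25, rotation=-15.0, pan_x=0.0, pan_y=0.0)
```

A render that starts at t=10.0 therefore begins with both the zoom and the tilt applied, even though neither marker falls inside the range.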
### 2. Timestamp Adjustment
**Current behavior**:
All timing uses absolute timestamps from `transcript.csv`:
- `SlideEvent.start_time/end_time`
- `VideoEvent.start_time/end_time`
- `AudioEvent.start_time`
- `CameraEvent.time`
- FFmpeg expressions: `enable=between(t, start, end)`
- Camera animation: `if(between(t, 1.000, 1.200), ...)`
**Problem for partial rendering**:
If slide 10 starts at t=10.0s and we render from there, FFmpeg expects t=0 at the start of output.
**Solution**:
Apply a `time_offset` to all events after extraction:
```
new_time = original_time - time_offset
```
Where `time_offset` = start time of first slide/event in range.
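As a worked example of the subtraction above, the sketch below rebuilds a `between(t, start, end)` expression for a shifted event. `shifted_enable` is a hypothetical helper, not an existing function in the renderer; clamping at zero is an assumption borrowed from how audio start times are treated elsewhere in this spec.

```python
def shifted_enable(start: float, end: float, time_offset: float) -> str:
    """Build an FFmpeg between() expression relative to the partial render's t=0."""
    # Clamp at zero so an event that begins before the range never yields a negative time.
    new_start = max(0.0, start - time_offset)
    new_end = max(0.0, end - time_offset)
    return f"between(t,{new_start:.3f},{new_end:.3f})"


# Slide shown from 10.0s to 14.5s in the full video, rendered with time_offset=10.0
print(shifted_enable(10.0, 14.5, 10.0))  # between(t,0.000,4.500)
```

The returned string is the bare expression; quoting or escaping it for `filter_complex` is left to the caller.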
### 3. Input Video Seeking
**Current behavior**:
- Always-visible videos (talking head) start from the beginning
- FFmpeg processes entire input duration
**Problem for partial rendering**:
Need to seek into source videos to the correct position.
**Solution**:
Add `-ss <seek_time>` before input files for always-visible videos:
```
ffmpeg -ss 10.0 -i talking_head.mov ...
```
---
## Proposed API
### Command Line Interface
```bash
# Render full video (current behavior)
gnommo render example/project.json output.mp4
# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5: # S5 to end
# Render specific time range (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
```
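One way the `--slides` and `--time` values could be turned into the tuples used by the internal API is sketched below. `parse_slide_range` and `parse_time_range` are hypothetical helpers; the sketch assumes a start value is always required (only the `S5:` open-ended form appears in the examples above) and that `--time` values are plain seconds.

```python
from typing import Optional


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """Parse 'S1:S10' or 'S5:' into (start_slide, end_slide or None)."""
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """Parse '60:120' or '60:' into (start_time, end_time or None), in seconds."""
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return float(start), (float(end) if end else None)


print(parse_slide_range("S5:"))    # ('S5', None)
print(parse_time_range("60:120"))  # (60.0, 120.0)
```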
### Internal API
New parameters for `build_render_plan()`:
```python
def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:
```
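Because the camera extraction works on times, a `slide_range` has to be resolved to a time range at some point. The sketch below shows one possible resolution step. `resolve_time_range` and the `slide_starts` mapping are hypothetical, and the end slide is treated as exclusive (the range stops where the end slide starts), which is an assumption that matches the half-open `start_time <= t < end_time` check used for camera events.

```python
import math
from typing import Optional


def resolve_time_range(
    slide_starts: dict[str, float],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, float]:
    """Map (start_slide, end_slide) to a half-open (start_time, end_time) range."""
    start_id, end_id = slide_range
    for slide_id in (start_id, end_id):
        if slide_id is not None and slide_id not in slide_starts:
            raise KeyError(f"unknown slide marker: {slide_id}")
    start_time = slide_starts[start_id]
    # An open-ended range ("S5:") runs to the end of the video.
    end_time = slide_starts[end_id] if end_id is not None else math.inf
    if end_time <= start_time:
        raise ValueError("end slide must start after the start slide")
    return start_time, end_time


starts = {"S1": 0.0, "S5": 12.5, "S10": 40.0}
print(resolve_time_range(starts, ("S5", "S10")))  # (12.5, 40.0)
```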
New fields on `RenderPlan`:
```python
@dataclass
class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
```
---
## Implementation Details
### Phase 1: Compute Full State, Filter Events
Modify `_extract_camera_events()` to accept a time range:
```python
def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, float]] = None,  # (start, end)
) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
        - List of CameraEvents within time_range
        - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, float('inf'))
    found_start = False
    for timed_word in transcript:
        if not timed_word.is_marker:
            continue
        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue
        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)
        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True
        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))
        current_state = new_state
    # No camera marker at or after start_time: the range inherits the
    # state accumulated from all earlier markers instead of the default.
    if not found_start:
        initial_state = current_state
    return events, initial_state
```
### Phase 2: Apply Time Offset
After extracting events, apply offset to all timestamps:
```python
def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""
    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0, event.start_time - offset)
    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset
    # Adjust total duration
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset
    return plan
```
### Phase 3: FFmpeg Seeking
Modify `build_ffmpeg_command()` to add seeking:
```python
def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]
    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            cmd.extend(["-ss", str(plan.input_seek_time)])  # Seek BEFORE -i
        cmd.extend(["-i", str(video_path)])
    ...
```
### Phase 4: Initial Camera State Handling
If `initial_camera_state` is not default, inject a "virtual" camera event at t=0:
```python
def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
```
---
## FFmpeg Optimization
**Only emit filters for events within range.**
When rendering a partial range, the `RenderPlan` should only contain events within that range. This means:
- Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
- Fewer overlay filters in filter_complex
- Fewer `between(t, start, end)` enable expressions to evaluate per frame
Example: Full video has 50 slides, rendering S40:S50 only:
- **Before**: 50 slide inputs, 50 overlay filters
- **After**: 10 slide inputs, 10 overlay filters
This is achieved naturally by filtering events in `build_render_plan()` before constructing the plan - the renderer already only processes events present in the plan.
---
## Edge Cases (v1 Simplified)
### 1. Camera state from before range
If rendering S5:S10 but there's a camera event at the S4 marker:
- Camera state from S4 must be captured as `initial_camera_state`
- Rendered output starts with that state already applied at t=0
### 2. Events filter by marker position
All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
- Events beginning outside range are excluded
- No "carry over" or boundary-crossing logic needed
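The start-marker rule above reduces to a single half-open comparison. The sketch below is illustrative only: `TimedEvent` is a hypothetical stand-in for `SlideEvent` and `VideoEvent`, and `filter_by_start` is not an existing function.

```python
from dataclasses import dataclass


@dataclass
class TimedEvent:  # hypothetical stand-in for SlideEvent / VideoEvent
    name: str
    start_time: float
    end_time: float


def filter_by_start(events: list[TimedEvent], start: float, end: float) -> list[TimedEvent]:
    """Keep events whose START falls in [start, end); end_time is deliberately ignored."""
    return [event for event in events if start <= event.start_time < end]


events = [
    TimedEvent("S4", 8.0, 12.0),   # starts before the range: excluded, even though it overlaps
    TimedEvent("S5", 12.0, 20.0),  # starts inside the range: kept
    TimedEvent("S10", 40.0, 45.0), # starts exactly at the end: excluded
]
print([event.name for event in filter_by_start(events, 12.0, 40.0)])  # ['S5']
```

Note that `S4` is dropped although it is still on screen at t=12.0; that is the v1 simplification, not an oversight in the sketch.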
---
## Testing Strategy
### Unit Tests
1. Camera state computation maintains state across full transcript
2. Time offset correctly shifts all event types
3. Initial camera state correctly captured at boundary
### Integration Tests
1. Render slides 1-5, then 5-10, concatenate, compare to full render
2. Camera state continuity across segment boundaries
3. Audio alignment after seeking
### Manual Verification
1. Visual inspection of camera state at segment boundaries
2. Audio sync verification
---
## Future Enhancements
### Parallel Rendering Pipeline
```bash
# Render in parallel, then concatenate
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait
ffmpeg -f concat -i segments.txt -c copy final.mp4
```
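The `segments.txt` file consumed by the concat step could be generated with a short helper like the one below. `write_concat_list` is a hypothetical name; the sketch writes absolute paths, one `file '...'` line per segment, and escapes single quotes the way FFmpeg's concat list syntax expects.

```python
from pathlib import Path


def write_concat_list(segments: list[Path], list_path: Path) -> None:
    """Write an FFmpeg concat list with one quoted absolute path per segment."""
    lines = []
    for segment in segments:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(segment.resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


write_concat_list([Path("seg1.mp4"), Path("seg2.mp4"), Path("seg3.mp4")], Path("segments.txt"))
```

Because the list contains absolute paths, the concat demuxer will likely need `-safe 0` (`ffmpeg -f concat -safe 0 -i segments.txt -c copy final.mp4`).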
### Smart Re-rendering
Track which slides changed and only re-render affected segments.
### Preview Mode
Quick low-quality render of specific section for review.
+265
@@ -0,0 +1,265 @@
# Virtual Camera Effects
Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
These effects are triggered by markers in the manuscript, just like slides.
## Zoom Effects
| Marker | Description |
|--------|-------------|
| `[Zoom1]` | Zoom to 110% - subtle emphasis |
| `[Zoom2]` | Zoom to 125% - moderate emphasis |
| `[Zoom3]` | Zoom to 150% - strong emphasis |
| `[Zoom0]` | Return to 100% (default) |
| `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |
**Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.
## Tilt/Rotation Effects
| Marker | Description |
|--------|-------------|
| `[TiltLeft]` | Rotate -15 degrees |
| `[TiltRight]` | Rotate +15 degrees |
| `[NoTilt]` | Return to 0 degrees |
| `[TiltShake]` | Quick left-right shake (confusion/emphasis) |
**Use case:** Tilt when saying something "off" or wrong, return to flat for correction.
## Pan/Position Effects
| Marker | Description |
|--------|-------------|
| `[PanLeft]` | Shift frame left (subject moves right) |
| `[PanRight]` | Shift frame right (subject moves left) |
| `[PanUp]` | Shift frame up |
| `[PanDown]` | Shift frame down |
| `[PanCenter]` | Return to center |
**Use case:** Pan to make room for a slide appearing on one side.
## Shake/Movement Effects
| Marker | Description |
|--------|-------------|
| `[Shake]` | Brief screen shake (impact, surprise) |
| `[ShakeHard]` | Intense shake (explosion, error) |
| `[Wobble]` | Gentle continuous wobble |
| `[NoWobble]` | Stop wobble |
**Use case:** Shake on "WRONG!" or when something crashes/fails.
## Speed/Rhythm Effects
| Marker | Description |
|--------|-------------|
| `[Beat]` | Single visual pulse (scale bump) |
| `[BeatStart]` | Start pulsing to rhythm |
| `[BeatStop]` | Stop pulsing |
**Use case:** Rhythmic emphasis during lists or key points.
## Transition Effects
| Marker | Description |
|--------|-------------|
| `[Flash]` | Quick white flash |
| `[Blackout]` | Brief black frame |
| `[Glitch]` | Digital glitch effect |
**Use case:** Transition between topics or for "record scratch" moments.
## Picture-in-Picture Variations
| Marker | Description |
|--------|-------------|
| `[PipGrow]` | Enlarge talking head cutout |
| `[PipShrink]` | Shrink talking head cutout |
| `[PipHide]` | Temporarily hide talking head |
| `[PipShow]` | Restore talking head |
| `[PipMove:corner]` | Move pip to different corner |
**Use case:** Shrink self when showing important diagram, grow when making personal point.
## Combination Presets
| Marker | Description |
|--------|-------------|
| `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
| `[Surprise]` | Quick zoom + shake |
| `[Sarcasm]` | Slow zoom + tilt |
| `[Reset]` | Return all effects to default |
---
## Architecture: The Camera Abstraction
### The Core Insight
All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
The **camera** views the scene. When the camera zooms, tilts, or pans - everything
moves together, just like a real camera filming a physical set.
```
┌─────────────────────────────────────────────────────────┐
│                          SCENE                          │
│  ┌─────────────────────────────────────────────────┐    │
│  │                Background Layer                 │    │
│  │  ┌─────────────┐                                │    │
│  │  │ Talking Head│    ┌──────────────────┐        │    │
│  │  │  (cutout)   │    │      Slide       │        │    │
│  │  └─────────────┘    │   (from .png)    │        │    │
│  │                     └──────────────────┘        │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
                      ┌─────────────┐
                      │   CAMERA    │
                      │ zoom: 1.25  │
                      │ tilt: -15°  │
                      │ pan: 0, 0   │
                      └─────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Final Output   │
                    │   (1920x1080)   │
                    └─────────────────┘
```
### Why This Matters
**Keynote slides are designed for a specific frame.** If you create a slide with
an arrow pointing at where the talking head cutout will be, that spatial
relationship must be preserved when the camera zooms or tilts.
If we zoomed only the background and not the slides, the arrow would point to
the wrong place. The camera abstraction ensures everything transforms together.
### Camera Properties
```python
@dataclass
class CameraState:
    zoom: float = 1.0      # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0  # degrees, positive = clockwise
    pan_x: float = 0.0     # -1.0 to 1.0, fraction of frame width
    pan_y: float = 0.0     # -1.0 to 1.0, fraction of frame height

@dataclass
class CameraKeyframe:
    time: float            # timestamp in seconds
    state: CameraState
    easing: str = "linear"  # linear, ease-in, ease-out, ease-in-out
```
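To make the `easing` field concrete, here is a small sketch of how a single property could be interpolated between two keyframes. The quadratic curves are an assumption (the spec names the easing modes but not their formulas), and `ease` and `interpolate` are hypothetical helpers.

```python
def ease(progress: float, easing: str = "linear") -> float:
    """Map linear progress in [0, 1] to eased progress (quadratic curves assumed)."""
    p = min(1.0, max(0.0, progress))
    if easing == "linear":
        return p
    if easing == "ease-in":
        return p * p
    if easing == "ease-out":
        return 1.0 - (1.0 - p) ** 2
    if easing == "ease-in-out":
        return 2.0 * p * p if p < 0.5 else 1.0 - 2.0 * (1.0 - p) ** 2
    raise ValueError(f"unknown easing: {easing}")


def interpolate(start_value: float, end_value: float, t: float,
                t0: float, duration: float, easing: str = "linear") -> float:
    """Value of one camera property at time t for a transition starting at t0."""
    if duration <= 0:
        # duration 0 means an instant snap to the target
        return end_value if t >= t0 else start_value
    return start_value + (end_value - start_value) * ease((t - t0) / duration, easing)


# Halfway through a 0.2s zoom from 1.0 to 1.25 with ease-out
print(interpolate(1.0, 1.25, 1.1, 1.0, 0.2, "ease-out"))
```

The same curves would still need to be translated into FFmpeg expression syntax before they can drive the filter graph.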
### Rendering Pipeline (Updated)
```
Current Pipeline:
  Parse → Validate → Transform → Render
                                 build_filter_complex()
                                 [bg] → overlays → [vout]

New Pipeline:
  Parse → Validate → Transform → Render
                     Extract camera
                     keyframes from
                     markers
                                 build_filter_complex()
                                 [bg] → overlays → [scene]
                                 apply_camera_transform()
                                 [scene] → zoom/rotate/pan → [vout]
```
### FFmpeg Implementation
The camera transform is a **final filter stage** applied to the composed scene:
```
# Compose scene (existing code)
[0:v]scale=1920:1080[bg];
[bg][slide1]overlay=...[s1];
[s1][talkinghead]overlay=...[scene];
# Camera transform (new)
[scene]scale=iw*{zoom}:ih*{zoom},
rotate={rotation}*PI/180:fillcolor=black,
crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
```
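The `{zoom}` and `{rotation}` placeholders in the static chain above could be filled in from Python roughly as follows. `camera_filter` is a hypothetical helper that only mirrors the filter text shown here; it is not the renderer's actual `apply_camera_transform()`, and the `zoom >= 1.0` guard is an assumption (the centre crop cannot work on a frame smaller than the output).

```python
def camera_filter(zoom: float, rotation_deg: float,
                  width: int = 1920, height: int = 1080) -> str:
    """Build the static scale -> rotate -> crop chain for one camera state."""
    if zoom < 1.0:
        # A scaled frame smaller than the output cannot be centre-cropped to width x height.
        raise ValueError("zoom must be >= 1.0 for a centre crop")
    return (
        f"[scene]scale=iw*{zoom}:ih*{zoom},"
        f"rotate={rotation_deg}*PI/180:fillcolor=black,"
        f"crop={width}:{height}:(iw-{width})/2:(ih-{height})/2[vout]"
    )


print(camera_filter(1.25, -15.0))
```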
For smooth animated zoom (using expressions):
```
[scene]zoompan=z='if(between(time,5,8), 1+0.25*(time-5)/3, 1)':
    x='iw/2-(iw/zoom/2)':
    y='ih/2-(ih/zoom/2)':
    d=1:s=1920x1080:fps=30[vout]
```
### Camera Events in Timeline
New model for camera changes:
```python
@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0  # 0 = instant snap
    easing: str = "ease-out"
```
Markers map to camera events:
- `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
- `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
- `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`
### Considerations
1. **Overscan**: When zoomed in, we're cropping. The scene must be rendered
larger than output (e.g., 2x) to have room for zoom without quality loss.
2. **Rotation center**: Rotate around frame center, not corner.
3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
are both active. `[Reset]` clears all.
4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
transform naturally with the camera. No special handling needed.
5. **Slides stay synced**: Keynote exports are positioned for the base frame.
Camera zoom/tilt transforms them identically to everything else.
---
## Implementation Plan
### Phase 1: Camera Data Model ✓
- [x] Add `CameraState` and `CameraEvent` to models.py
- [x] Add camera effect markers to transformer.py
- [x] Generate camera keyframes from markers
### Phase 2: Render Pipeline ✓
- [x] Modify renderer to compose to `[scene]` instead of `[vout]`
- [x] Add camera transform stage after composition
- [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now
### Phase 3: Smooth Animation (partial)
- [x] Support animated transitions between keyframes (linear interpolation)
- [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
- [ ] Test with rapid zoom sequences
### Phase 4: Effect Presets ✓
- [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
- [x] Presets defined in `CAMERA_PRESETS` dict in models.py
- [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement
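For the parameterized-marker item, a parser might look like the sketch below. The `[Name:number]` grammar is an assumption extrapolated from the single `[Zoom:1.35]` example, and `parse_param_marker` is a hypothetical function; non-numeric forms such as `[PipMove:corner]` are deliberately left unmatched.

```python
import re
from typing import Optional

# Assumed grammar: [Name:number], e.g. [Zoom:1.35] or [Tilt:-10]
_PARAM_MARKER = re.compile(r"^\[(?P<name>[A-Za-z]+):(?P<value>-?\d+(?:\.\d+)?)\]$")


def parse_param_marker(token: str) -> Optional[tuple[str, float]]:
    """Return (name, value) for a numeric parameterized marker, else None."""
    match = _PARAM_MARKER.match(token)
    if match is None:
        return None
    return match.group("name"), float(match.group("value"))


print(parse_param_marker("[Zoom:1.35]"))  # ('Zoom', 1.35)
print(parse_param_marker("[Zoom2]"))      # None
```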
+10
@@ -0,0 +1,10 @@
[
{
"reference": "Gnommo Documentation - https://github.com/example/gnommo",
"context": ""
},
{
"reference": "FFmpeg Documentation - https://ffmpeg.org/documentation.html",
"context": ""
}
]
+17 -3
@@ -1,5 +1,19 @@
Welcome to GnommoEditor, a code-first video editing system. [S1]
[S1]
This is the first slide. It appears immediately. [cite:Gnommo Documentation - https://github.com/example/gnommo]
In this example, we demonstrate how slides appear at specific timestamps based on markers in the transcript. [S2]
[S2]
However, this is the second slide. It should appear 1 second prior to when I say "however"
And that's the end of our demo.
[S3]
[video:Zoomin_MontageZoom]
This is me talking alongside a video. The video is constrained within the red square. Notice how the video stops immediately when we make the transition to the next slide. [cite:FFmpeg Documentation - https://ffmpeg.org/documentation.html]
[S4]
I will continue to talk without pause, but in the finished recording - there will be a pause before the narration continues. Now a video will play that pauses the narration
[S5]
[video:gnommologo]
Notice how my voice continues after the video finished.
[S6]
+26
View File
@@ -0,0 +1,26 @@
{
"S1": {
"image": "example.001.png",
"type": "fullscreen"
},
"S2": {
"image": "example.002.png",
"type": "fullscreen"
},
"S3": {
"image": "example.003.png",
"type": "fullscreen"
},
"S4": {
"image": "example.004.png",
"type": "fullscreen"
},
"S5": {
"image": "example.005.png",
"type": "fullscreen"
},
"S6": {
"image": "example.006.png",
"type": "fullscreen"
}
}
@@ -0,0 +1,2 @@
file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/talking_head_batch0.mov'
file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/segments/segment_0002.mov'
@@ -0,0 +1,497 @@
[
{
"word": "This",
"start": 10.72,
"end": 11.4
},
{
"word": "is",
"start": 11.4,
"end": 11.6
},
{
"word": "the",
"start": 11.6,
"end": 11.78
},
{
"word": "first",
"start": 11.78,
"end": 11.98
},
{
"word": "slide.",
"start": 11.98,
"end": 12.44
},
{
"word": "It",
"start": 13.02,
"end": 13.3
},
{
"word": "appears",
"start": 13.3,
"end": 13.66
},
{
"word": "immediately.",
"start": 13.66,
"end": 14.3
},
{
"word": "However,",
"start": 15.34,
"end": 16.02
},
{
"word": "this",
"start": 16.34,
"end": 16.46
},
{
"word": "is",
"start": 16.46,
"end": 16.58
},
{
"word": "the",
"start": 16.58,
"end": 16.76
},
{
"word": "second",
"start": 16.76,
"end": 17.04
},
{
"word": "slide.",
"start": 17.04,
"end": 17.4
},
{
"word": "It",
"start": 17.74,
"end": 17.96
},
{
"word": "should",
"start": 17.96,
"end": 18.2
},
{
"word": "appear",
"start": 18.2,
"end": 18.54
},
{
"word": "one",
"start": 18.54,
"end": 18.98
},
{
"word": "second",
"start": 18.98,
"end": 19.46
},
{
"word": "prior",
"start": 19.46,
"end": 19.88
},
{
"word": "to",
"start": 19.88,
"end": 20.1
},
{
"word": "the",
"start": 20.1,
"end": 20.22
},
{
"word": "word",
"start": 20.22,
"end": 20.52
},
{
"word": "to",
"start": 20.52,
"end": 21.14
},
{
"word": "say",
"start": 21.14,
"end": 21.42
},
{
"word": "whoever",
"start": 21.42,
"end": 21.8
},
{
"word": "the",
"start": 21.8,
"end": 22.16
},
{
"word": "first",
"start": 22.16,
"end": 22.4
},
{
"word": "time.",
"start": 22.4,
"end": 22.68
},
{
"word": "This",
"start": 24.28,
"end": 24.96
},
{
"word": "is",
"start": 24.96,
"end": 25.12
},
{
"word": "me",
"start": 25.12,
"end": 25.36
},
{
"word": "taking,",
"start": 25.36,
"end": 25.74
},
{
"word": "talking",
"start": 26.12,
"end": 27.12
},
{
"word": "alongside",
"start": 27.12,
"end": 27.64
},
{
"word": "a",
"start": 27.64,
"end": 27.88
},
{
"word": "video.",
"start": 27.88,
"end": 28.16
},
{
"word": "The",
"start": 28.16,
"end": 28.92
},
{
"word": "video",
"start": 28.92,
"end": 29.18
},
{
"word": "is",
"start": 29.18,
"end": 29.36
},
{
"word": "constrained",
"start": 29.36,
"end": 29.76
},
{
"word": "within",
"start": 29.76,
"end": 30.14
},
{
"word": "the",
"start": 30.14,
"end": 30.32
},
{
"word": "red",
"start": 30.32,
"end": 30.48
},
{
"word": "square.",
"start": 30.48,
"end": 30.9
},
{
"word": "Notice",
"start": 31.26,
"end": 31.44
},
{
"word": "how",
"start": 31.44,
"end": 31.74
},
{
"word": "the",
"start": 31.74,
"end": 31.92
},
{
"word": "video",
"start": 31.92,
"end": 32.14
},
{
"word": "stops",
"start": 32.14,
"end": 32.44
},
{
"word": "immediately",
"start": 32.44,
"end": 32.94
},
{
"word": "when",
"start": 32.94,
"end": 33.36
},
{
"word": "we",
"start": 33.36,
"end": 33.54
},
{
"word": "make",
"start": 33.54,
"end": 33.74
},
{
"word": "the",
"start": 33.74,
"end": 33.94
},
{
"word": "transition",
"start": 33.94,
"end": 34.38
},
{
"word": "to",
"start": 34.38,
"end": 34.68
},
{
"word": "the",
"start": 34.68,
"end": 34.8
},
{
"word": "next",
"start": 34.8,
"end": 35.02
},
{
"word": "slide.",
"start": 35.02,
"end": 35.48
},
{
"word": "I",
"start": 37.18,
"end": 37.72
},
{
"word": "will",
"start": 37.72,
"end": 37.78
},
{
"word": "continue",
"start": 37.78,
"end": 38.08
},
{
"word": "to",
"start": 38.08,
"end": 38.32
},
{
"word": "talk",
"start": 38.32,
"end": 38.56
},
{
"word": "without",
"start": 38.56,
"end": 38.88
},
{
"word": "pause,",
"start": 38.88,
"end": 39.24
},
{
"word": "but",
"start": 39.46,
"end": 39.56
},
{
"word": "in",
"start": 39.56,
"end": 39.68
},
{
"word": "the",
"start": 39.68,
"end": 39.74
},
{
"word": "finished",
"start": 39.74,
"end": 39.98
},
{
"word": "recording",
"start": 39.98,
"end": 40.46
},
{
"word": "there",
"start": 40.46,
"end": 41.18
},
{
"word": "will",
"start": 41.18,
"end": 41.36
},
{
"word": "be",
"start": 41.36,
"end": 41.54
},
{
"word": "a",
"start": 41.54,
"end": 41.64
},
{
"word": "pause",
"start": 41.64,
"end": 41.92
},
{
"word": "before",
"start": 41.92,
"end": 42.28
},
{
"word": "the",
"start": 42.28,
"end": 42.5
},
{
"word": "narration",
"start": 42.5,
"end": 43.0
},
{
"word": "continues.",
"start": 43.0,
"end": 43.64
},
{
"word": "Now",
"start": 44.38,
"end": 44.52
},
{
"word": "a",
"start": 44.52,
"end": 44.68
},
{
"word": "video",
"start": 44.68,
"end": 44.9
},
{
"word": "will",
"start": 44.9,
"end": 45.08
},
{
"word": "play",
"start": 45.08,
"end": 45.36
},
{
"word": "that",
"start": 45.36,
"end": 45.76
},
{
"word": "pauses",
"start": 45.76,
"end": 46.52
},
{
"word": "the",
"start": 46.52,
"end": 46.76
},
{
"word": "narration.",
"start": 46.76,
"end": 47.2
},
{
"word": "Notice",
"start": 48.64,
"end": 49.18
},
{
"word": "how",
"start": 49.18,
"end": 49.42
},
{
"word": "my",
"start": 49.42,
"end": 49.58
},
{
"word": "voice",
"start": 49.58,
"end": 49.8
},
{
"word": "continues",
"start": 49.8,
"end": 50.36
},
{
"word": "after",
"start": 50.36,
"end": 50.84
},
{
"word": "the",
"start": 50.84,
"end": 51.02
},
{
"word": "video",
"start": 51.02,
"end": 51.24
},
{
"word": "finished.",
"start": 51.24,
"end": 51.76
}
]
+39
@@ -0,0 +1,39 @@
{
"talking_head": {
"source_file": "talking_head.mov",
"output_file": "talking_head_processed.mov",
"cutout": "talkinghead",
"always_visible": true,
"filter": [
{
"type": "chroma_key",
"color": [131, 177, 83],
"similarity": 0.04,
"blend": 0.025,
"spill": 0.05
},
{
"type": "mask",
"left": 0.05,
"right": 0.10
}
]
},
"gnommologo": {
"source_file": "Logo.mov",
"is_shared": true,
"cutout": "fullscreen",
"pause_narration": 0 ,
"take": 10,
"skip": 0
},
"Zoomin_MontageZoom": {
"description": "Montage zoom",
"source_file": "MontageZoom.mp4",
"output_file": "MontageZoom.mp4",
"pause_narration":3,
"cutout": "square",
"is_shared": true,
"filter": []
}
}
+31 -7
@@ -1,11 +1,35 @@
{
"id": "VideoExample",
"name": "Example",
"description": "In this video, I demonstrate the Gnommo video editing pipeline - a code-first approach to creating presenter-mode videos from Keynote presentations.",
"footer": "Subscribe for more tutorials!\nTwitter: @example",
"resolution": [1920, 1080],
"fps": 30,
"talkinghead": {
"x": 50,
"y": 600,
"targetheight": 400
},
"defaultSlideType": "square",
"background_video": ""
"gnommo_scratch": null,
"defaultSlideType": "fullscreen",
"keynote_file": "media/example.key",
"transcript": "media/videos/talking_head.transcript.json",
"background": "shared_assets/solarpunk.png",
"videos": "media/videos/videos.json",
"slides": "media/slides/Example/slides.json",
"audio": "media/audio/audio.json",
"main_video": "talking_head",
"cutouts": {
"talkinghead": {
"x": "-10%",
"y": "40%",
"height": "60%"
},
"square": {
"x": "45%",
"y": "3%",
"width": "53%",
"height": "94%"
},
"fullscreen": {
"x": "0%",
"y": "0%",
"height": "100%"
}
}
}
-10
@@ -1,10 +0,0 @@
{
"S1": {
"image": "S1.png",
"type": "square"
},
"S2": {
"image": "S2.png",
"type": "square"
}
}
-8
@@ -1,8 +0,0 @@
t,word
0.00,Hello
0.30,world
0.60,[S1]
1.50,Second
1.80,slide
2.00,[S2]
2.50,End
-6
View File
@@ -1,6 +0,0 @@
{
"talking_head": {
"file": "media/talking_head.mp4",
"preprocess": []
}
}
+6 -139
@@ -1,154 +1,21 @@
#!/bin/bash
#
# GnommoEditor - Code-first video editing pipeline
# This is a thin wrapper that activates the venv and runs the Python CLI.
#
# Usage:
# gnommo.sh -p <project> Render project
# gnommo.sh -p <project> import Generate slides.json from image files
# gnommo.sh -p <project> validate Validate only
# gnommo.sh -p <project> preprocess Apply video preprocessing filters
# gnommo.sh -p <project> transcribe Transcribe video
# gnommo.sh -p <project> align Align markers to transcript
# gnommo.sh -p <project> all Full pipeline: transcribe → align → render
# Usage: gnommo -p <project> [action] [options]
# Run with -h for full help.
#
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_PYTHON="$SCRIPT_DIR/venv/bin/python"
# Check for venv
if [[ ! -f "$VENV_PYTHON" ]]; then
echo "Error: Virtual environment not found at $SCRIPT_DIR/venv"
echo "Create it with: python -m venv venv && ./venv/bin/pip install openai-whisper"
echo "Create it with: python -m venv venv && ./venv/bin/pip install -e . openai-whisper"
exit 1
fi
# Parse arguments
PROJECT=""
COMMAND="render"
VERBOSE=""
FORCE=""
usage() {
echo "Usage: gnommo.sh -p <project> [command] [options]"
echo ""
echo "Commands:"
echo " render Render video (default)"
echo " import Generate slides.json from image files"
echo " validate Validate project only"
echo " preprocess Apply video preprocessing filters (chroma key, etc.)"
echo " transcribe Transcribe video audio"
echo " align Align manuscript to transcript"
echo " all Full pipeline: transcribe → align → render"
echo ""
echo "Options:"
echo " -p <dir> Project directory (required)"
echo " -v Verbose output"
echo " -f Force overwrite existing files"
echo " -h Show this help"
echo ""
echo "Examples:"
echo " gnommo.sh -p video1 # Render video1 project"
echo " gnommo.sh -p video1 import # Generate slides.json"
echo " gnommo.sh -p video1 import -f # Force overwrite slides.json"
echo " gnommo.sh -p video1 validate # Validate only"
echo " gnommo.sh -p video1 all # Full pipeline"
exit 0
}
while [[ $# -gt 0 ]]; do
case $1 in
-p|--project)
PROJECT="$2"
shift 2
;;
-v|--verbose)
VERBOSE="-v"
shift
;;
-f|--force)
FORCE="-f"
shift
;;
-h|--help)
usage
;;
import|validate|render|preprocess|transcribe|align|all)
COMMAND="$1"
shift
;;
*)
echo "Unknown option: $1"
usage
;;
esac
done
# Validate project argument
if [[ -z "$PROJECT" ]]; then
echo "Error: Project directory required (-p <project>)"
echo ""
usage
fi
if [[ ! -d "$PROJECT" ]]; then
echo "Error: Project directory not found: $PROJECT"
exit 1
fi
if [[ ! -f "$PROJECT/project.json" ]]; then
echo "Error: project.json not found in $PROJECT"
exit 1
fi
# Run commands using new CLI interface
run_gnommo() {
"$VENV_PYTHON" -m gnommo -p "$PROJECT" -a "$1" $VERBOSE
}
run_gnommo_import() {
"$VENV_PYTHON" -m gnommo -p "$PROJECT" -a validate -i $FORCE $VERBOSE
}
case $COMMAND in
import)
echo "=== Importing assets for $PROJECT ==="
run_gnommo_import
;;
validate)
echo "=== Validating $PROJECT ==="
run_gnommo validate
;;
transcribe)
echo "=== Transcribing $PROJECT ==="
run_gnommo transcribe
;;
align)
echo "=== Aligning $PROJECT ==="
run_gnommo align
;;
render)
echo "=== Rendering $PROJECT ==="
run_gnommo render
;;
preprocess)
echo "=== Preprocessing $PROJECT ==="
run_gnommo preprocess
;;
all)
echo "=== Full Pipeline: $PROJECT ==="
run_gnommo all
;;
*)
echo "Unknown command: $COMMAND"
usage
;;
esac
# Pass all arguments directly to the Python CLI
exec "$VENV_PYTHON" -m gnommo "$@"
-199
@@ -1,199 +0,0 @@
"""Alignment stage: match manuscript markers to transcript timestamps."""
import csv
import re
from dataclasses import dataclass
from pathlib import Path
from .errors import GnommoError
from .transcriber import TranscribedWord
class AlignmentError(GnommoError):
    """Error during alignment."""
    pass

@dataclass
class MarkerAlignment:
    """A marker with its aligned timestamp."""
    marker_id: str
    timestamp: float
    matched_phrase: str
    confidence: float  # 0-1, how confident the match is
def extract_marker_contexts(manuscript_text: str) -> list[tuple[str, str]]:
    """
    Extract markers and the text immediately following them.
    Returns:
        List of (marker_id, following_text) tuples
    """
    # Split by markers, keeping the markers
    parts = re.split(r"\[([A-Za-z0-9_]+)\]", manuscript_text)
    # parts will be: [text_before, marker1, text_after1, marker2, text_after2, ...]
    contexts = []
    for i in range(1, len(parts), 2):
        marker_id = parts[i]
        if i + 1 < len(parts):
            following_text = parts[i + 1].strip()
            # Get first sentence or first N words
            following_text = _get_first_phrase(following_text)
            contexts.append((marker_id, following_text))
    return contexts
def _get_first_phrase(text: str, max_words: int = 10) -> str:
    """Extract first phrase (up to first sentence end or max_words)."""
    # Clean up the text
    text = text.replace("\n", " ").strip()
    # Find first sentence boundary
    match = re.search(r"[.!?]", text)
    if match and match.start() < 200:
        text = text[: match.start()]
    # Limit to max_words
    words = text.split()[:max_words]
    return " ".join(words)
def normalize_text(text: str) -> str:
    """Normalize text for matching (lowercase, remove punctuation)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
def find_phrase_in_transcript(
    phrase: str,
    transcript: list[TranscribedWord],
    start_from: int = 0,
) -> tuple[int, float]:
    """
    Find a phrase in the transcript and return the word index and timestamp.

    Uses sliding window matching with normalization.

    Returns:
        Tuple of (word_index, timestamp) or (-1, 0.0) if not found
    """
    phrase_normalized = normalize_text(phrase)
    phrase_words = phrase_normalized.split()
    if not phrase_words:
        return -1, 0.0

    # Try increasingly shorter prefixes of the phrase, stopping at three words
    for length in range(len(phrase_words), 2, -1):
        target = " ".join(phrase_words[:length])
        # Sliding window through transcript
        for i in range(start_from, len(transcript) - length + 1):
            window_words = [normalize_text(transcript[j].word) for j in range(i, i + length)]
            window_text = " ".join(window_words)
            if target in window_text or window_text in target:
                return i, transcript[i].start

    # Fallback: look for the first two phrase words anywhere in a five-word window
    if len(phrase_words) >= 2:
        for i in range(start_from, len(transcript) - 2):
            window_words = [normalize_text(transcript[j].word) for j in range(i, min(i + 5, len(transcript)))]
            window_text = " ".join(window_words)
            if phrase_words[0] in window_text and phrase_words[1] in window_text:
                return i, transcript[i].start

    # Single-word phrases and phrases without any match end up here
    return -1, 0.0
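The prefix-shortening search above can be illustrated in isolation. This is a minimal sketch, not the module's code: `Word` is a hypothetical stand-in for `TranscribedWord`, `find_prefix` and `min_len` are made-up names, and it compares words for exact equality where the function above uses substring containment.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:  # hypothetical stand-in for TranscribedWord
    word: str
    start: float


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def find_prefix(phrase, transcript, start_from=0, min_len=3):
    """Return (index, start) of the longest matching phrase prefix, or (-1, 0.0)."""
    target_words = normalize(phrase).split()
    for length in range(len(target_words), min_len - 1, -1):
        target = target_words[:length]
        for i in range(start_from, len(transcript) - length + 1):
            window = [normalize(w.word) for w in transcript[i:i + length]]
            if window == target:  # assumption: exact match, stricter than the original
                return i, transcript[i].start
    return -1, 0.0


transcript = [
    Word("Hello,", 0.0),
    Word("welcome", 0.4),
    Word("to", 0.8),
    Word("the", 0.9),
    Word("show.", 1.1),
]
result = find_prefix("Welcome to the programme", transcript)
print(result)
```

The four-word phrase does not occur in the transcript, so the search falls back to its three-word prefix, which matches at index 1.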
def align_markers(
manuscript_text: str,
transcript: list[TranscribedWord],
offset_seconds: float = -1.0,
) -> list[MarkerAlignment]:
"""
Align manuscript markers to transcript timestamps.
Args:
manuscript_text: Full manuscript text with [S1], [S2] etc.
transcript: Word-level transcript with timestamps
offset_seconds: Offset to apply to found timestamps (default -1.0)
Returns:
List of MarkerAlignment with timestamps
"""
contexts = extract_marker_contexts(manuscript_text)
alignments: list[MarkerAlignment] = []
last_index = 0
for marker_id, following_text in contexts:
idx, timestamp = find_phrase_in_transcript(
following_text, transcript, start_from=last_index
)
if idx >= 0:
# Apply offset (e.g., -1 second before the word)
adjusted_time = max(0.0, timestamp + offset_seconds)
alignments.append(MarkerAlignment(
marker_id=marker_id,
timestamp=adjusted_time,
matched_phrase=following_text[:50],
confidence=1.0,
))
last_index = idx
else:
# Could not find match - report but continue
alignments.append(MarkerAlignment(
marker_id=marker_id,
timestamp=-1.0, # Indicates not found
matched_phrase=following_text[:50],
confidence=0.0,
))
return alignments
def save_aligned_transcript(
alignments: list[MarkerAlignment],
transcript: list[TranscribedWord],
output_path: Path,
) -> None:
"""
Save aligned transcript as CSV compatible with gnommo's transcript.csv format.
Format:
t,word
0.00,Hello
1.50,[S1]
1.51,This
...
"""
# Build list of (timestamp, word) including markers
entries: list[tuple[float, str]] = []
# Add all words from transcript
for word in transcript:
entries.append((word.start, word.word))
# Add markers at their aligned positions
for alignment in alignments:
if alignment.timestamp >= 0:
entries.append((alignment.timestamp, f"[{alignment.marker_id}]"))
# Sort by timestamp
entries.sort(key=lambda x: x[0])
# Write CSV
with open(output_path, "w", encoding="utf-8", newline="") as f:
writer = csv.writer(f)
writer.writerow(["t", "word"])
for timestamp, word in entries:
writer.writerow([f"{timestamp:.2f}", word])
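As a rough illustration of the merge performed by `save_aligned_transcript`, the sketch below works on plain tuples and writes to an in-memory buffer instead of a file. The helper name `merge_markers` and the sample data are assumptions made for this example only.

```python
import csv
import io


def merge_markers(words, markers):
    """Merge (start, word) pairs with (timestamp, marker_id) pairs, sorted by time."""
    entries = list(words)
    # Markers that could not be aligned carry a negative timestamp and are skipped
    entries += [(t, f"[{marker_id}]") for t, marker_id in markers if t >= 0]
    entries.sort(key=lambda entry: entry[0])
    return entries


words = [(0.00, "Hello"), (1.51, "This"), (1.90, "is")]
markers = [(1.50, "S1"), (-1.0, "S2")]  # S2 was not aligned
rows = merge_markers(words, markers)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["t", "word"])
for t, word in rows:
    writer.writerow([f"{t:.2f}", word])
print(buf.getvalue())
```

Because Python's sort is stable and markers are appended after the words, a marker that shares its timestamp with a word is written after that word.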
+894 -152
File diff suppressed because it is too large
+359
@@ -0,0 +1,359 @@
"""Description generator: Create YouTube description with chapters, citations, and attributions."""
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
from .models import (
Attribution,
Citation,
ProjectConfig,
SlideDefinition,
VideoSource,
)
from .transcriber import TranscribedWord
@dataclass
class ChapterMarker:
"""A chapter marker with timestamp and title."""
slide_id: str
timestamp: float
title: str
def _format_timestamp(seconds: float) -> str:
"""Format seconds as M:SS or H:MM:SS for YouTube chapters."""
if seconds < 0:
return "0:00"
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = int(seconds % 60)
if hours > 0:
return f"{hours}:{minutes:02d}:{secs:02d}"
else:
return f"{minutes}:{secs:02d}"
def _extract_chapter_title(
manuscript_text: str, slide_id: str, slides: dict[str, SlideDefinition]
) -> str:
"""
Extract a chapter title for a slide.
Tries to find meaningful title from:
1. First sentence/line after the slide marker
2. Falls back to slide ID if nothing useful found
"""
# Find the marker and text after it
pattern = rf"\[{re.escape(slide_id)}\]\s*(.+?)(?=\[S\d+\]|\[video:|\[narration:|\Z)"
match = re.search(pattern, manuscript_text, re.DOTALL)
if match:
text = match.group(1).strip()
# Remove any other markers from the text
text = re.sub(r"\[[^\]]+\]", "", text).strip()
if text:
# Take first line or first sentence
first_line = text.split("\n")[0].strip()
# Truncate if too long
if len(first_line) > 50:
# Try to break at word boundary
truncated = first_line[:47]
last_space = truncated.rfind(" ")
if last_space > 30:
truncated = truncated[:last_space]
first_line = truncated + "..."
if first_line:
return first_line
# Fallback to slide number
slide_num = slide_id[1:] if slide_id.startswith("S") else slide_id
return f"Section {slide_num}"
def _align_citation_to_transcription(
citation: Citation,
transcription: list[TranscribedWord],
manuscript_text: str,
) -> float:
"""
Align a citation to the transcription to find its timestamp.
Uses the context text following the citation to find the approximate
position in the audio.
Returns timestamp in seconds, or -1 if not found.
"""
if not transcription or not citation.context:
return -1.0
# Get more context from the manuscript for better matching
# Find the citation in the manuscript and get surrounding text
pattern = rf"\[cite:{re.escape(citation.reference)}\]\s*(.{{0,200}})"
match = re.search(pattern, manuscript_text, re.DOTALL)
if not match:
return -1.0
context_text = match.group(1).strip()
# Clean up: remove markers, normalize whitespace
context_text = re.sub(r"\[[^\]]+\]", "", context_text)
context_text = " ".join(context_text.split())
if not context_text:
return -1.0
# Normalize for matching
context_words = context_text.lower().split()[:10] # Use up to 10 words
if not context_words:
return -1.0
# Build normalized transcription
trans_words = [(w.word.lower(), w.start) for w in transcription]
# Simple sliding window match
best_match_score = 0
best_match_time = -1.0
for i in range(len(trans_words) - len(context_words) + 1):
matches = 0
for j, ctx_word in enumerate(context_words):
trans_word = trans_words[i + j][0]
# Allow partial matches for longer words
if ctx_word == trans_word:
matches += 1
elif len(ctx_word) >= 4 and (
ctx_word in trans_word or trans_word in ctx_word
):
matches += 0.5
score = matches / len(context_words)
if score > best_match_score and score >= 0.5:
best_match_score = score
best_match_time = trans_words[i][1]
return best_match_time
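The per-window scoring used by the citation aligner can be tried out on its own. This is a sketch under the same rules as above (1 point for an exact word match, 0.5 for a substring match on context words of four or more characters); the function name `window_score` and the sample words are hypothetical.

```python
def window_score(context_words, window_words):
    """Fraction of context words matched by the aligned transcript window."""
    matches = 0.0
    for ctx_word, trans_word in zip(context_words, window_words):
        if ctx_word == trans_word:
            matches += 1
        elif len(ctx_word) >= 4 and (ctx_word in trans_word or trans_word in ctx_word):
            matches += 0.5  # partial credit for longer words
    return matches / len(context_words)


context = ["studies", "show", "that", "sleep"]
exact = window_score(context, ["studies", "show", "that", "sleep"])
fuzzy = window_score(context, ["study", "shows", "that", "sleeping"])
print(exact, fuzzy)
```

The second window only just reaches the 0.5 acceptance threshold: "study" is not a substring of "studies" (or the reverse), so that word contributes nothing.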
def generate_chapters(
manuscript_text: str,
slides: dict[str, SlideDefinition],
marker_timings: list, # List of MarkerTiming from transformer
min_chapter_duration: float = 30.0,
) -> list[ChapterMarker]:
"""
Generate chapter markers from slide timings.
Args:
manuscript_text: The manuscript content
slides: Slide definitions
marker_timings: Aligned marker timings from the transformer
min_chapter_duration: Minimum seconds between chapters (merges short ones)
Returns:
List of ChapterMarker objects
"""
chapters = []
# Build timing lookup
timing_lookup = {t.marker_id: t.timestamp for t in marker_timings if t.timestamp >= 0}
# Process slides in order
slide_ids = sorted(
[s for s in slides.keys() if s.startswith("S")],
key=lambda x: int(x[1:]) if x[1:].isdigit() else 0,
)
for slide_id in slide_ids:
if slide_id not in timing_lookup:
continue
timestamp = timing_lookup[slide_id]
title = _extract_chapter_title(manuscript_text, slide_id, slides)
# Check if we should merge with previous chapter (too short)
if chapters and (timestamp - chapters[-1].timestamp) < min_chapter_duration:
continue # Skip this chapter, previous one covers it
chapters.append(
ChapterMarker(
slide_id=slide_id,
timestamp=timestamp,
title=title,
)
)
# Ensure first chapter starts at 0:00
if chapters and chapters[0].timestamp > 0:
chapters[0] = ChapterMarker(
slide_id=chapters[0].slide_id,
timestamp=0.0,
title=chapters[0].title,
)
return chapters
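The chapter selection above can be sketched on plain tuples. This is an illustration only: `pick_chapters` and `format_timestamp` are hypothetical stand-alone names, and the timings are made-up sample data.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS or H:MM:SS."""
    if seconds < 0:
        return "0:00"
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def pick_chapters(timings, min_gap=30.0):
    """timings: list of (slide_id, timestamp) in slide order."""
    kept = []
    for slide_id, timestamp in timings:
        # Drop chapters that start too soon after the previously kept one
        if kept and timestamp - kept[-1][1] < min_gap:
            continue
        kept.append((slide_id, timestamp))
    # The first chapter is moved to 0:00 after the gap checks have run
    if kept and kept[0][1] > 0:
        kept[0] = (kept[0][0], 0.0)
    return kept


timings = [("S1", 4.0), ("S2", 20.0), ("S3", 65.0), ("S4", 3725.0)]
chapters = pick_chapters(timings)
for slide_id, timestamp in chapters:
    print(format_timestamp(timestamp), slide_id)
```

Note that the gap for S2 is measured against S1's original timestamp (4.0 s), not against the 0:00 it is later moved to.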
def collect_attributions(
    videos: dict[str, VideoSource],
    video_events: Optional[list] = None,
) -> list[tuple[str, Attribution]]:
    """
    Collect all video attributions.

    Returns list of (video_id, Attribution) tuples for videos that have attribution.
    Only includes videos that are actually used in the project (via video_events)
    or videos from shared assets that have attribution.
    """
    # IDs of videos referenced by at least one event
    used_video_ids = {event.video_id for event in video_events or []}

    attributions = []
    for video_id, video_source in videos.items():
        if not video_source.attribution:
            continue
        # Include if used in video or if it's a shared asset
        if video_id in used_video_ids or video_source.is_shared:
            attributions.append((video_id, video_source.attribution))
    return attributions
def generate_description(
config: ProjectConfig,
manuscript_text: str,
slides: dict[str, SlideDefinition],
videos: dict[str, VideoSource],
marker_timings: list,
transcription: Optional[list[TranscribedWord]] = None,
video_events: Optional[list] = None,
citations: Optional[list[Citation]] = None,
include_chapters: bool = True,
include_citations: bool = True,
include_attributions: bool = True,
) -> str:
"""
Generate complete YouTube description.
Combines:
- Video description from project.json
- Chapter markers (optional)
- Citations from manuscript (optional)
- Stock footage attributions (optional)
- Footer from project.json
Returns formatted description text.
"""
sections = []
# 1. Video description
if config.description:
sections.append(config.description.strip())
# 2. Chapters
if include_chapters:
chapters = generate_chapters(manuscript_text, slides, marker_timings)
if chapters:
chapter_lines = ["CHAPTERS", ""]
for ch in chapters:
chapter_lines.append(f"{_format_timestamp(ch.timestamp)} {ch.title}")
sections.append("\n".join(chapter_lines))
# 3. Citations/References
if include_citations:
citations = citations or []
if citations and transcription:
# Align citations to get timestamps
for citation in citations:
citation.timestamp = _align_citation_to_transcription(
citation, transcription, manuscript_text
)
if citations:
ref_lines = ["REFERENCES", ""]
for citation in citations:
if citation.timestamp >= 0:
ref_lines.append(
f"{_format_timestamp(citation.timestamp)} - {citation.reference}"
)
else:
ref_lines.append(f"- {citation.reference}")
sections.append("\n".join(ref_lines))
# 4. Stock footage attributions
if include_attributions:
attributions = collect_attributions(videos, video_events)
if attributions:
attr_lines = ["STOCK FOOTAGE", ""]
for video_id, attr in attributions:
# Format: "Description by Creator via Source: URL"
line = f"{video_id.replace('_', ' ').title()} by {attr.creator} via {attr.source.title()}"
if attr.url:
line += f": {attr.url}"
attr_lines.append(line)
sections.append("\n".join(attr_lines))
# 5. Footer
if config.footer:
sections.append(config.footer.strip())
# Join sections with double newlines
return "\n\n".join(sections)
def write_description_file(
output_path: Path,
config: ProjectConfig,
manuscript_text: str,
slides: dict[str, SlideDefinition],
videos: dict[str, VideoSource],
marker_timings: list,
transcription: Optional[list[TranscribedWord]] = None,
video_events: Optional[list] = None,
citations: Optional[list[Citation]] = None,
) -> str:
"""
Generate and write YouTube description to file.
Args:
output_path: Path to write description (e.g., out/description_youtube.txt)
config: Project configuration
manuscript_text: Manuscript content
slides: Slide definitions
videos: Video definitions
marker_timings: Aligned marker timings
transcription: Word-level transcription (optional, for citation timestamps)
video_events: Video events from render plan (optional, for attribution filtering)
citations: Pre-extracted citations (optional, loaded from citations.json)
Returns:
The generated description text
"""
description = generate_description(
config=config,
manuscript_text=manuscript_text,
slides=slides,
videos=videos,
marker_timings=marker_timings,
transcription=transcription,
video_events=video_events,
citations=citations,
)
# Ensure output directory exists
output_path.parent.mkdir(parents=True, exist_ok=True)
# Write description
output_path.write_text(description, encoding="utf-8")
return description
+15 -3
@@ -7,12 +7,14 @@ from typing import Optional
class GnommoError(Exception):
"""Base exception for all GnommoEditor errors."""
pass
@dataclass
class ValidationIssue:
"""A single validation issue with location context."""
message: str
file: Optional[Path] = None
line: Optional[int] = None
@@ -30,7 +32,9 @@ class ValidationIssue:
class ParseError(GnommoError):
"""Error during parsing of input files."""
def __init__(self, message: str, file: Optional[Path] = None, line: Optional[int] = None):
def __init__(
self, message: str, file: Optional[Path] = None, line: Optional[int] = None
):
self.issue = ValidationIssue(message, file, line)
super().__init__(str(self.issue))
@@ -48,7 +52,9 @@ class ValidationError(GnommoError):
class RenderError(GnommoError):
"""Error during rendering stage."""
def __init__(self, message: str, command: Optional[str] = None, stderr: Optional[str] = None):
def __init__(
self, message: str, command: Optional[str] = None, stderr: Optional[str] = None
):
self.command = command
self.stderr = stderr
full_message = message
@@ -62,7 +68,13 @@ class RenderError(GnommoError):
class PreprocessError(GnommoError):
"""Error during preprocessing stage."""
def __init__(self, message: str, filter_type: Optional[str] = None, command: Optional[str] = None, stderr: Optional[str] = None):
def __init__(
self,
message: str,
filter_type: Optional[str] = None,
command: Optional[str] = None,
stderr: Optional[str] = None,
):
self.filter_type = filter_type
self.command = command
self.stderr = stderr
+74
@@ -0,0 +1,74 @@
ObjC.import('stdlib');
ObjC.import('Foundation');
function toAbsolutePath(p) {
// Expand ~ and make absolute relative to current working directory
var s = $(String(p)).stringByExpandingTildeInPath;
if (!s.isAbsolutePath) {
var cwd = $.NSFileManager.defaultManager.currentDirectoryPath;
s = cwd.stringByAppendingPathComponent(s);
}
return s.stringByStandardizingPath.js;
}
function fileExists(p) {
return $.NSFileManager.defaultManager.fileExistsAtPath($(p));
}
function getNotes(slide) {
try { return slide.presenterNotes(); } catch (e) {}
try { return slide.speakerNotes(); } catch (e) {}
return "";
}
function run(argv) {
if (!argv || argv.length < 1) throw new Error("Usage: script.js <file.key> [slides_output_dir]");
var abs = toAbsolutePath(argv[0]);
var slidesDir = argv.length >= 2 ? toAbsolutePath(argv[1]) : null;
if (!fileExists(abs)) {
throw new Error("File not found: " + abs);
}
var Keynote = Application('Keynote');
Keynote.activate();
// Keynote is happiest when given a Path() made from an absolute POSIX path
var doc = Keynote.open(Path(abs));
// Export slides as PNG if output directory is provided
if (slidesDir) {
// Create directory if it doesn't exist
var fm = $.NSFileManager.defaultManager;
if (!fm.fileExistsAtPath($(slidesDir))) {
fm.createDirectoryAtPathWithIntermediateDirectoriesAttributesError(
$(slidesDir), true, $(), $()
);
}
// Export using AppleScript (more reliable than JXA for Keynote export)
var app = Application.currentApplication();
app.includeStandardAdditions = true;
// Build the osascript command, one -e flag per AppleScript line.
// Note: slidesDir is interpolated without escaping, so a path that contains
// a quote character will break the command.
var cmd = '/usr/bin/osascript' +
' -e \'tell application "Keynote"\'' +
' -e \'export front document to POSIX file "' + slidesDir + '" as slide images with properties {image format:PNG}\'' +
' -e \'end tell\'';
app.doShellScript(cmd);
}
var slides = doc.slides();
var out = [];
for (var i = 0; i < slides.length; i++) {
out.push({
slide_index: i + 1,
notes: String(getNotes(slides[i]) || "")
});
}
doc.close({ saving: 'no' });
return JSON.stringify(out, null, 2);
}
+94
@@ -0,0 +1,94 @@
#!/usr/bin/env python3
"""
Extract presenter notes from a Keynote .key file.

Usage:
    python extract_keynote_notes.py

Notes:
- Input and output paths are currently hard-coded in main()
  (video1/video1.key, video1/manuscript.json, video1/manuscript.txt).
- The notes are read by running the JXA script gnommo/extract_keynote_notes.js
  through osascript, so this requires macOS with Keynote installed.
"""
import json
import subprocess
from pathlib import Path
def write_manuscript(data: Path, out_path: Path) -> None:
    """Convert the extractor's JSON output into manuscript.txt."""
    # The JSON file holds a list of {"slide_index": int, "notes": str}
    items = json.loads(data.read_text(encoding="utf-8"))
    lines = []
    for item in items:
        idx = item.get("slide_index")
        # Log the 1-based slide index so the message matches the [S<n>] marker
        print(f"Writing notes for slide {idx} to file")
        notes = (item.get("notes") or "").rstrip()
        lines.append(f"[S{idx}]")
        lines.append(notes)
        lines.append("")  # blank line between slides
    out_path.write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")
    print(f"Wrote {out_path}")
def main():
keynote_file = Path("video1/video1.key").expanduser().resolve()
if not keynote_file.exists():
raise FileNotFoundError(f"Keynote file not found: {keynote_file}")
script_file = Path("gnommo/extract_keynote_notes.js").expanduser().resolve()
if not script_file.exists():
raise FileNotFoundError(f"Extractor script not found: {script_file}")
presenter_notes_json_file = Path("video1/manuscript.json").expanduser().resolve()
# Run JXA extractor
proc = subprocess.run(
[
"osascript",
"-l",
"JavaScript",
str(script_file),
str(keynote_file),
],
capture_output=True,
text=True,
)
if proc.returncode != 0:
raise RuntimeError(
"Failed to extract presenter notes:\n"
f"STDERR:\n{proc.stderr}\n"
f"STDOUT:\n{proc.stdout}"
)
# Write JSON output
presenter_notes_json_file.write_text(proc.stdout, encoding="utf-8")
if not presenter_notes_json_file.exists():
raise FileNotFoundError(
f"Failed to extract presenter notes to {presenter_notes_json_file}"
)
# Convert JSON → manuscript.txt
write_manuscript(
presenter_notes_json_file, out_path=keynote_file.parent / "manuscript.txt"
)
if __name__ == "__main__":
main()
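To make the manuscript layout concrete, here is a small sketch of the JSON-to-text conversion on in-memory data. `render_manuscript` is a hypothetical helper that mirrors the formatting in `write_manuscript` without touching the file system; the sample items are invented.

```python
def render_manuscript(items):
    """Render extractor items as manuscript text with [S<n>] markers."""
    lines = []
    for item in items:
        lines.append(f"[S{item.get('slide_index')}]")
        lines.append((item.get("notes") or "").rstrip())
        lines.append("")  # blank line between slides
    return "\n".join(lines).rstrip() + "\n"


items = [
    {"slide_index": 1, "notes": "Welcome.\n"},
    {"slide_index": 2, "notes": None},  # slide without presenter notes
]
text = render_manuscript(items)
print(text)
```

A trailing slide without notes ends up as a bare marker line, because the final `rstrip()` removes the empty lines after it.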
+366 -36
@@ -6,31 +6,64 @@ from typing import Optional
@dataclass
class TalkingHeadConfig:
"""Configuration for talking head video positioning."""
x: int
y: int
target_height: int # in pixels, or -1 for percentage-based
target_height_percent: float = 0.0 # percentage (0.0-1.0) if target_height is -1
file: Optional[str] = None # Path to video or metadata JSON file
class CutoutDefinition:
"""Definition of a named zone for placing video content.
All positioning values support both pixels (int) and percentages (str like "50%").
Percentage values are stored as floats (0.0-1.0) with pixel value set to -1.
Videos placed in cutouts are cropped to fit the cutout dimensions.
"""
x: int # in pixels, or -1 for percentage-based
y: int # in pixels, or -1 for percentage-based
height: int # in pixels, or -1 for percentage-based
width: int = (
-1
) # in pixels, or -1 for percentage-based (defaults to height for square)
x_percent: float = 0.0 # percentage (0.0-1.0) if x is -1
y_percent: float = 0.0 # percentage (0.0-1.0) if y is -1
height_percent: float = 0.0 # percentage (0.0-1.0) if height is -1
width_percent: float = 0.0 # percentage (0.0-1.0) if width is -1
# Backwards compatibility alias
TalkingHeadConfig = CutoutDefinition
@dataclass
class ProjectConfig:
"""Global project configuration from project.json."""
resolution: tuple[int, int]
fps: int
talking_head: TalkingHeadConfig
default_slide_type: str
cutouts: dict[str, CutoutDefinition] = field(
default_factory=dict
) # Named zones for video placement
background: str = "" # Background image or video path (in shared_assets/)
background_video: str = "" # Deprecated: use background instead
slides_path: str = "slides.json" # path to slides.json relative to project
videos_path: str = "videos.json" # path to videos.json relative to project
audio_path: str = "audio.json" # path to audio.json relative to project
audio_source: Optional[str] = None # defaults to talking head
main_video: Optional[str] = None # ID of main video (e.g., talking head)
gnommo_scratch: Optional[
str
] = None # directory for intermediate files (e.g., external SSD)
# Outro sequence - plays after narration ends (not marker-triggered)
outro: list[str] = field(
default_factory=list
) # List of video IDs to play in sequence after narration
# YouTube description fields
description: str = "" # Video description text for YouTube
footer: str = "" # Footer text (social links, subscribe CTA, etc.)
@dataclass
class SlideDefinition:
"""Definition of a single slide from slides.json."""
image: str
type: str # "fullscreen" | "square"
@@ -38,25 +71,170 @@ class SlideDefinition:
@dataclass
class ChromaKeyConfig:
"""Configuration for chroma key (green screen) filter."""
color: tuple[int, int, int] = (0, 255, 0) # RGB color to key out
similarity: float = 0.15 # Color similarity threshold (0.0-1.0)
blend: float = 0.1 # Edge blend/feathering (0.0-1.0)
spill: float = 0.0 # Spill suppression amount (0.0-1.0)
similarity: float = (
0.4 # Color similarity threshold (0.0-1.0), higher = more aggressive
)
blend: float = 0.08 # Edge blend/feathering (0.0-1.0), lower = tighter edges
spill: float = 0.1 # Spill suppression amount (0.0-1.0)
edge_erode: int = 0 # Pixels to erode from alpha edge (0-5), removes green fringe
# Color protection - restore opacity for colors that shouldn't be keyed
protect_color: Optional[tuple[int, int, int]] = None # RGB color to protect from keying
protect_tolerance: float = (
0.15 # How much variation from protect_color to allow (0-1)
)
@dataclass
class GnommoKeyConfig:
"""Configuration for gnommokey filter - Keylight-style color-difference keyer.
Uses YCbCr color-difference keying (like Keylight/Ultimatte) instead of
simple Euclidean distance. This handles lighting variation much better
than basic chromakey.
"""
# Screen color (the green/blue screen color to key out)
screen_color: tuple[int, int, int] = (0, 177, 64) # RGB of the screen
# Key extraction strength (default 100, higher = more aggressive)
# Values 80-150 are typical. Maps to Keylight's Screen Gain.
screen_gain: float = 100.0
# Balance between chrominance and luminance in key calculation (0-100)
# 0 = pure color-difference, 100 = luminance weighted
# Maps to Keylight's Screen Balance.
screen_balance: float = 50.0
# Alpha/matte adjustments
clip_black: float = 0.0 # Crush blacks (0-100). Higher = more transparent areas
clip_white: float = 100.0 # Crush whites (0-100). Lower = more opaque areas
# Despill: color to shift green spill toward (RGB)
# Typical values: skin tone [217, 200, 180] or neutral [200, 200, 200]
despill_bias: Optional[tuple[int, int, int]] = None
# How aggressively to apply despill (0-1)
despill_strength: float = 0.5
# Alpha bias: influences edge treatment (RGB)
# Can help with edge color contamination
alpha_bias: Optional[tuple[int, int, int]] = None
# Edge refinement
edge_erode: int = 0 # Pixels to erode from alpha edge (0-5)
edge_soften: float = 0.0 # Blur the alpha edge (0-5 pixels)
@dataclass
class ColorGradeConfig:
"""Configuration for color grading filter.
Applies color balance, contrast curves, and saturation adjustments
while preserving the alpha channel.
"""
# Color balance (range: -1.0 to 1.0, 0 = no change)
# Midtones
rm: float = 0.0 # Red midtones adjustment
gm: float = 0.0 # Green midtones adjustment
bm: float = 0.0 # Blue midtones adjustment
# Highlights
rh: float = 0.0 # Red highlights adjustment
gh: float = 0.0 # Green highlights adjustment
bh: float = 0.0 # Blue highlights adjustment
# Shadows
rs: float = 0.0 # Red shadows adjustment
gs: float = 0.0 # Green shadows adjustment
bs: float = 0.0 # Blue shadows adjustment
# Curves preset (none, lighter, darker, increase_contrast, medium_contrast, etc.)
curves_preset: str = "none"
# EQ adjustments
contrast: float = 1.0 # Contrast multiplier (0.0-2.0, 1.0 = no change)
brightness: float = 0.0 # Brightness adjustment (-1.0 to 1.0, 0 = no change)
saturation: float = 1.0 # Saturation multiplier (0.0-3.0, 1.0 = no change)
# Custom curves for lift/gamma/gain control
# Format: "0/0 0.5/0.56 1/1" means (input/output) control points
curves_r: str = "" # Red channel curve
curves_g: str = "" # Green channel curve
curves_b: str = "" # Blue channel curve
curves_master: str = "" # Master (luminance) curve
@dataclass
class AudioNormalizeConfig:
"""Configuration for audio normalization filter.
Applies noise reduction, compression, and loudness normalization
to improve audio quality and consistency.
"""
# Noise reduction (afftdn filter)
denoise: bool = True # Enable noise reduction
noise_floor: float = -25.0 # Noise floor in dB (default -25, lower = more aggressive)
# Compression (acompressor filter)
compress: bool = True # Enable dynamic range compression
threshold: float = -20.0 # Compression threshold in dB
ratio: float = 4.0 # Compression ratio (4:1 default)
attack: float = 5.0 # Attack time in ms
release: float = 50.0 # Release time in ms
makeup: float = 2.0 # Makeup gain in dB
# Loudness normalization (loudnorm filter - EBU R128)
normalize: bool = True # Enable loudness normalization
target_lufs: float = -16.0 # Target integrated loudness (YouTube recommends -14 to -16)
target_lra: float = 11.0 # Target loudness range
target_tp: float = -1.5 # Target true peak in dB
@dataclass
class FilterConfig:
"""Base configuration for a preprocessing filter."""
type: str
# Type-specific config stored in subclasses or as dict
@dataclass
class Attribution:
"""Attribution information for stock footage (e.g., Pexels)."""
source: str # Source platform (e.g., "pexels", "pixabay", "unsplash")
creator: str # Creator/photographer name
url: Optional[str] = None # URL to the original content
@dataclass
class VideoSource:
"""Video source definition from videos.json."""
file: str
preprocess: list[dict] = field(default_factory=list) # List of filter config dicts
output_file: Optional[str] = None # Path to preprocessed output (if any)
source_file: str # Source video filename (relative to videos.json location or shared_assets/)
filter: list[dict] = field(default_factory=list) # List of filter config dicts
output_file: Optional[
str
] = None # Path to preprocessed output (relative to videos.json)
take: Optional[
float
] = None # Max duration to play (seconds). Default: until next slide or end of clip
skip: float = 0.0 # Skip this many seconds at start of video (seek point)
zoom: float = (
1.0 # Scale factor for video (1.0 = fit to cutout height, >1 = enlarge)
)
cutout: Optional[
str
] = None # Name of cutout to place video in (from project.json cutouts)
always_visible: bool = False # If True, video is always shown (like talking head)
is_shared: bool = False # If True, source_file is relative to shared_assets/
pause_narration: float = (
0.0 # Seconds to pause narration during this video (0 = no pause)
)
attribution: Optional[Attribution] = None # Attribution for stock footage
use_audio_channels: str = "both" # Audio channel selection: "both", "left", or "right"
@dataclass
@@ -67,50 +245,202 @@ class VideoMetadata:
This allows defining preprocessing steps separately from videos.json,
enabling per-video preprocessing configuration.
"""
source_file: str # Original source video file
preprocess: list[dict] = field(default_factory=list) # Preprocessing filters
output: Optional[dict] = None # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
@dataclass
class TimedWord:
"""A word or marker with its timestamp from transcript.csv."""
time: float
word: str
@property
def is_marker(self) -> bool:
"""Check if this is a slide marker like [S1]."""
return self.word.startswith("[") and self.word.endswith("]")
@property
def marker_id(self) -> Optional[str]:
"""Extract marker ID (e.g., 'S1' from '[S1]')."""
if self.is_marker:
return self.word[1:-1]
return None
output: Optional[
dict
] = None # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
@dataclass
class SlideEvent:
"""A resolved slide event with timing information."""
slide_id: str
start_time: float
end_time: float
slide_def: SlideDefinition
@dataclass
class AudioDefinition:
"""Definition of an audio clip from audio.json."""
file: str # Audio filename (relative to audio.json location)
volume: float = 1.0 # Volume multiplier (0.0-1.0)
loop: bool = False # If True, loop for entire duration from trigger point
ignore_pauses: bool = False # If True, audio continues playing during narration pauses
@dataclass
class Citation:
"""A citation extracted from manuscript.txt [cite:...] markers."""
reference: str # The literal reference text after cite:
marker_id: str # The full marker (e.g., "cite:Smith et al...")
timestamp: float = -1.0 # Aligned timestamp (-1 if not aligned)
context: str = "" # Text following the citation for alignment
@dataclass
class AudioEvent:
"""A resolved audio event with timing information."""
audio_id: str
start_time: float # When to start playing (marker time - offset)
audio_def: AudioDefinition
@dataclass
class VideoEvent:
"""A resolved video event with timing information."""
video_id: str
start_time: float
end_time: float
video_source: "VideoSource"
cutout: "CutoutDefinition"
@dataclass
class CameraState:
"""State of the virtual camera at a point in time.
The camera transforms the entire composed scene (background, slides, cutouts).
This ensures all elements stay spatially synchronized when zooming/tilting.
"""
zoom: float = 1.0 # 1.0 = 100%, 1.25 = 125%, etc.
rotation: float = 0.0 # degrees, positive = clockwise
pan_x: float = 0.0 # -1.0 to 1.0, fraction of frame width
pan_y: float = 0.0 # -1.0 to 1.0, fraction of frame height
focal_x: float = 0.5 # 0.0 to 1.0, zoom focal point X (0.5 = center)
focal_y: float = 0.5 # 0.0 to 1.0, zoom focal point Y (0.5 = center)
def __post_init__(self):
# Clamp values to reasonable ranges
self.zoom = max(0.5, min(3.0, self.zoom))
self.rotation = max(-45.0, min(45.0, self.rotation))
self.pan_x = max(-1.0, min(1.0, self.pan_x))
self.pan_y = max(-1.0, min(1.0, self.pan_y))
self.focal_x = max(0.0, min(1.0, self.focal_x))
self.focal_y = max(0.0, min(1.0, self.focal_y))
def is_default(self) -> bool:
"""Check if this is the default camera state (no transform)."""
return (
self.zoom == 1.0
and self.rotation == 0.0
and self.pan_x == 0.0
and self.pan_y == 0.0
and self.focal_x == 0.5
and self.focal_y == 0.5
)
@dataclass
class CameraEvent:
"""A camera state change at a specific time.
Camera events can be instant (duration=0) or animated (duration>0).
When animated, the camera smoothly transitions from its current state
to the target state over the specified duration using the easing function.
"""
time: float # timestamp in seconds
target_state: CameraState
duration: float = 0.2 # transition duration (0 = instant snap)
easing: str = "ease-out" # linear, ease-in, ease-out, ease-in-out
# Camera effect presets - map marker names to camera states
# Effect strengths are intentionally subtle for professional look
CAMERA_PRESETS: dict[str, CameraState] = {
# Zoom levels (halved for subtlety)
"Zoom0": CameraState(zoom=1.0),
"Zoom1": CameraState(zoom=1.05),
"Zoom2": CameraState(zoom=1.125),
"Zoom3": CameraState(zoom=1.25),
# Tilt/rotation (halved)
"TiltLeft": CameraState(rotation=-7.5),
"TiltRight": CameraState(rotation=7.5),
"NoTilt": CameraState(), # Full reset to default state
# Pan (halved)
"PanLeft": CameraState(pan_x=-0.1),
"PanRight": CameraState(pan_x=0.1),
"PanUp": CameraState(pan_y=-0.075),
"PanDown": CameraState(pan_y=0.075),
"PanCenter": CameraState(pan_x=0.0, pan_y=0.0),
# Reset all
"Reset": CameraState(),
}
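The specification describes camera state as cumulative: a zoom marker followed by a tilt marker leaves both active. One way to compute the state at the start of a partial render is sketched below. This is an assumption-laden illustration, not the transformer's code: `Cam` is a reduced stand-in for `CameraState`, and the `DELTAS` table (which fields each marker owns) and `state_at` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cam:  # hypothetical, reduced stand-in for CameraState
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0


# Assumed grouping: each marker lists only the fields it is allowed to change
DELTAS = {
    "Zoom2": {"zoom": 1.125},
    "TiltLeft": {"rotation": -7.5},
    "PanRight": {"pan_x": 0.1},
    "Reset": {"zoom": 1.0, "rotation": 0.0, "pan_x": 0.0},
}


def state_at(markers, render_start):
    """Fold all markers before render_start into one cumulative state.

    markers: list of (time, marker_name) sorted by time.
    """
    state = Cam()
    for time, name in markers:
        if time >= render_start:
            break
        state = replace(state, **DELTAS[name])
    return state


markers = [(5.0, "Zoom2"), (12.0, "TiltLeft"), (40.0, "PanRight")]
initial = state_at(markers, render_start=30.0)
print(initial)
```

A partial render starting at t=30 would then begin zoomed and tilted, while the pan marker at t=40 is still emitted as a regular event inside the range.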
@dataclass
class NarrationPause:
"""A pause in the narration timeline for an interstitial video."""
output_time: float # When the pause starts in the OUTPUT timeline
narration_time: float # Where we are in the NARRATION source when pause starts
duration: float # How long the pause lasts
video_id: str # The video that plays during the pause
@dataclass
class OutroEvent:
"""A video that plays as part of the outro sequence (after narration ends)."""
video_id: str
start_time: float # When this outro video starts (in output timeline)
end_time: float # When this outro video ends
video_source: "VideoSource"
cutout: Optional["CutoutDefinition"] = None # None = fullscreen
@dataclass
class RenderPlan:
"""Complete plan for rendering the final video."""
project_path: Path
config: ProjectConfig
talking_head: VideoSource
slide_events: list[SlideEvent]
total_duration: float
slides: dict[str, SlideDefinition]
videos: dict[str, VideoSource] = field(default_factory=dict)
video_events: list[VideoEvent] = field(
default_factory=list
) # Triggered video overlays
narration_videos: list[tuple[str, VideoSource, CutoutDefinition]] = field(
default_factory=list
) # (video_id, source, cutout)
slides_dir: Optional[Path] = None # directory containing slide images
talking_head_path: Optional[Path] = None # Resolved path to actual video file
videos_dir: Optional[Path] = None # directory containing videos.json and video files
audio_events: list[AudioEvent] = field(default_factory=list)
audio: dict[str, AudioDefinition] = field(default_factory=dict)
audio_dir: Path = None # directory containing audio.json and audio files
camera_events: list[CameraEvent] = field(
default_factory=list
) # Virtual camera keyframes
# Partial rendering support
time_offset: float = (
0.0 # Offset subtracted from all timestamps (for partial render)
)
initial_camera_state: "CameraState" = (
None # Camera state at render start (for partial render)
)
input_seek_time: float = 0.0 # Seek position for input videos (for partial render)
# Shared assets support
shared_assets_dir: Path = None # Directory containing shared assets (pexels, etc.)
# Narration pause support
narration_pauses: list[NarrationPause] = field(
default_factory=list
) # Gaps in narration for interstitial videos
# Outro sequence (plays after narration ends)
outro_events: list["OutroEvent"] = field(
default_factory=list
) # Videos that play after narration ends
narration_end_time: float = 0.0 # When narration ends (before outro starts)
# Slide layout configurations (hardcoded for POC)
+207 -67
@@ -1,6 +1,5 @@
"""Extract stage: parse all input files."""
import csv
import json
import re
from pathlib import Path
@@ -8,21 +7,28 @@ from typing import Any, Optional
from .errors import ParseError
from .models import (
Attribution,
AudioDefinition,
Citation,
CutoutDefinition,
ProjectConfig,
SlideDefinition,
TalkingHeadConfig,
TimedWord,
VideoMetadata,
VideoSource,
)
def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int, str]]]:
def parse_manuscript(
project_path: Path,
) -> tuple[str, list[str], list[tuple[int, str]], list[Citation]]:
"""
Parse manuscript.txt and extract text content and slide markers.
Strips [cite:...] markers from the returned text so they never pollute
alignment contexts. Citations are extracted and returned separately.
Returns:
Tuple of (full text, list of marker IDs found, list of malformed markers as (line_num, text))
Tuple of (full text, list of marker IDs found, list of malformed markers, list of citations)
"""
manuscript_path = project_path / "manuscript.txt"
@@ -31,8 +37,15 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
text = manuscript_path.read_text(encoding="utf-8")
# Extract all valid slide markers like [S1], [S2], etc.
markers = re.findall(r"\[([A-Za-z0-9_]+)\]", text)
# Extract citations before stripping them
citations = parse_citations(text)
# Strip [cite:...] markers from text so they don't pollute alignment
text = re.sub(r"\[cite:[^\]]+\]", "", text)
# Extract all valid markers like [S1], [video:demo], [Zoom2], etc.
# Include . in pattern to catch markers with file extensions (so validator can warn about them)
markers = re.findall(r"\[([A-Za-z0-9_:.]+)\]", text)
# Find malformed markers (missing brackets, extra spaces, etc.)
malformed: list[tuple[int, str]] = []
@@ -56,48 +69,75 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
for match in spaced:
malformed.append((line_num, match))
return text, markers, malformed
return text, markers, malformed, citations
def parse_transcript(project_path: Path) -> list[TimedWord]:
def parse_citations(manuscript_text: str) -> list[Citation]:
"""
Parse transcript.csv into a list of timed words.
Extract all [cite:...] markers from manuscript text.
Expected format:
t,word
0.00,This
0.42,is
...
The text after 'cite:' is the literal reference that should appear
in the video description.
Returns:
List of Citation objects with reference text and context for alignment.
"""
transcript_path = project_path / "transcript.csv"
citations = []
if not transcript_path.exists():
raise ParseError("transcript.csv not found", transcript_path)
# Match [cite:...] markers - content can include any characters except ]
# Use a more permissive pattern that handles multi-word citations
pattern = r"\[cite:([^\]]+)\]"
timed_words = []
for match in re.finditer(pattern, manuscript_text):
reference = match.group(1).strip()
marker_id = f"cite:{reference}"
with open(transcript_path, "r", encoding="utf-8") as f:
reader = csv.DictReader(f)
# Extract context: text following the citation (for alignment)
# Get up to 150 chars after the marker; trimmed below to the next marker or blank line
end_pos = match.end()
context_text = manuscript_text[end_pos : end_pos + 150]
if reader.fieldnames is None or "t" not in reader.fieldnames or "word" not in reader.fieldnames:
raise ParseError(
"transcript.csv must have columns: t, word",
transcript_path
# Clean up context: take text until next marker or double newline
context_match = re.match(r"([^\[]*?)(?:\[|\n\n|$)", context_text)
context = context_match.group(1).strip() if context_match else ""
# Truncate context to ~50 chars for display
if len(context) > 50:
context = context[:47] + "..."
citations.append(
Citation(
reference=reference,
marker_id=marker_id,
context=context,
)
)
for line_num, row in enumerate(reader, start=2): # start=2 because line 1 is header
try:
time = float(row["t"])
word = row["word"].strip()
timed_words.append(TimedWord(time=time, word=word))
except (ValueError, KeyError) as e:
raise ParseError(
f"Invalid row: {e}",
transcript_path,
line_num
)
return citations
return timed_words
def save_citations(citations: list[Citation], path: Path) -> None:
"""Save citations to a JSON file."""
data = [
{"reference": c.reference, "context": c.context}
for c in citations
]
path.write_text(json.dumps(data, indent=2), encoding="utf-8")
def load_citations(path: Path) -> list[Citation]:
"""Load citations from a JSON file."""
if not path.exists():
return []
data = json.loads(path.read_text(encoding="utf-8"))
return [
Citation(
reference=item["reference"],
marker_id=f"cite:{item['reference']}",
context=item.get("context", ""),
)
for item in data
]
def parse_project_config(project_path: Path) -> ProjectConfig:
@@ -112,16 +152,27 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
except json.JSONDecodeError as e:
raise ParseError(f"Invalid JSON: {e}", config_path)
# Parse talking head config
th_data = data.get("talkinghead", {})
th_height, th_height_pct = _parse_dimension(th_data.get("targetheight", 200))
talking_head = TalkingHeadConfig(
x=th_data.get("x", 100),
y=th_data.get("y", 100),
target_height=th_height,
target_height_percent=th_height_pct,
file=th_data.get("file"),
)
# Parse cutouts (named zones for video placement)
cutouts: dict[str, CutoutDefinition] = {}
cutouts_data = data.get("cutouts", {})
for cutout_name, cutout_data in cutouts_data.items():
x, x_pct = _parse_dimension(cutout_data.get("x", 0))
y, y_pct = _parse_dimension(cutout_data.get("y", 0))
height, height_pct = _parse_dimension(cutout_data.get("height", 200))
# Width defaults to same as height (square) if not specified
width, width_pct = _parse_dimension(
cutout_data.get("width", cutout_data.get("height", 200))
)
cutouts[cutout_name] = CutoutDefinition(
x=x,
y=y,
height=height,
width=width,
x_percent=x_pct,
y_percent=y_pct,
height_percent=height_pct,
width_percent=width_pct,
)
# Parse resolution
resolution = data.get("resolution", [1920, 1080])
@@ -131,12 +182,19 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
return ProjectConfig(
resolution=tuple(resolution),
fps=data.get("fps", 30),
talking_head=talking_head,
default_slide_type=data.get("defaultSlideType", "square"),
cutouts=cutouts,
background=data.get("background", ""),
background_video=data.get("background_video", ""), # Deprecated
slides_path=data.get("slides", "slides.json"),
videos_path=data.get("videos", "videos.json"),
audio_path=data.get("audio", "audio.json"),
audio_source=data.get("audio_source"),
main_video=data.get("main_video"),
gnommo_scratch=data.get("gnommo_scratch"),
outro=data.get("outro", []),
description=data.get("description", ""),
footer=data.get("footer", ""),
)
@@ -157,7 +215,9 @@ def _parse_dimension(value: Any) -> tuple[int, float]:
return 200, 0.0 # default
def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str, SlideDefinition]:
def parse_slides(
project_path: Path, config: ProjectConfig = None
) -> dict[str, SlideDefinition]:
"""Parse slides.json into slide definitions."""
if config and config.slides_path:
slides_path = project_path / config.slides_path
@@ -176,8 +236,7 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
for slide_id, slide_data in data.items():
if "image" not in slide_data:
raise ParseError(
f"Slide '{slide_id}' missing required field 'image'",
slides_path
f"Slide '{slide_id}' missing required field 'image'", slides_path
)
slides[slide_id] = SlideDefinition(
image=slide_data["image"],
@@ -187,12 +246,67 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
return slides
def parse_videos(project_path: Path) -> dict[str, VideoSource]:
"""Parse videos.json into video source definitions."""
videos_path = project_path / "videos.json"
def parse_audio(
project_path: Path, config: Optional[ProjectConfig] = None
) -> tuple[dict[str, AudioDefinition], Path]:
"""
Parse audio.json into audio definitions.
Returns:
Tuple of (audio dict, audio_dir) where audio_dir is the directory
containing audio.json (for resolving relative file paths).
"""
if config and config.audio_path:
audio_path = project_path / config.audio_path
else:
audio_path = project_path / "audio.json"
# Audio is optional - return empty dict if not found
if not audio_path.exists():
return {}, project_path
audio_dir = audio_path.parent
try:
data = json.loads(audio_path.read_text(encoding="utf-8"))
except json.JSONDecodeError as e:
raise ParseError(f"Invalid JSON: {e}", audio_path)
audio = {}
for audio_id, audio_data in data.items():
if "file" not in audio_data:
raise ParseError(
f"Audio '{audio_id}' missing required field 'file'", audio_path
)
audio[audio_id] = AudioDefinition(
file=audio_data["file"],
volume=float(audio_data.get("volume", 1.0)),
loop=bool(audio_data.get("loop", False)),
ignore_pauses=bool(audio_data.get("ignore_pauses", False)),
)
return audio, audio_dir
def parse_videos(
project_path: Path, config: Optional[ProjectConfig] = None
) -> tuple[dict[str, VideoSource], Path]:
"""
Parse videos.json into video source definitions.
Returns:
Tuple of (videos dict, videos_dir) where videos_dir is the directory
containing videos.json (for resolving relative file paths).
"""
if config and config.videos_path:
videos_path = project_path / config.videos_path
else:
videos_path = project_path / "videos.json"
if not videos_path.exists():
raise ParseError("videos.json not found", videos_path)
raise ParseError(f"videos.json not found: {videos_path}", videos_path)
videos_dir = videos_path.parent
try:
data = json.loads(videos_path.read_text(encoding="utf-8"))
@@ -201,18 +315,37 @@ def parse_videos(project_path: Path) -> dict[str, VideoSource]:
videos = {}
for video_id, video_data in data.items():
if "file" not in video_data:
if "source_file" not in video_data:
raise ParseError(
f"Video '{video_id}' missing required field 'file'",
videos_path
f"Video '{video_id}' missing required field 'source_file'", videos_path
)
# Parse attribution if present
attribution = None
if "attribution" in video_data:
attr_data = video_data["attribution"]
attribution = Attribution(
source=attr_data.get("source", "unknown"),
creator=attr_data.get("creator", "Unknown"),
url=attr_data.get("url"),
)
videos[video_id] = VideoSource(
file=video_data["file"],
preprocess=video_data.get("preprocess", []),
source_file=video_data["source_file"],
filter=video_data.get("filter", []),
output_file=video_data.get("output_file"),
take=video_data.get("take"),
skip=video_data.get("skip", 0.0),
zoom=video_data.get("zoom", 1.0),
cutout=video_data.get("cutout"),
always_visible=video_data.get("always_visible", False),
is_shared=video_data.get("is_shared", False),
pause_narration=float(video_data.get("pause_narration", 0)),
attribution=attribution,
use_audio_channels=video_data.get("use_audio_channels", "both"),
)
return videos
return videos, videos_dir
def get_video_duration(video_path: Path) -> float:
@@ -221,10 +354,13 @@ def get_video_duration(video_path: Path) -> float:
cmd = [
"ffprobe",
"-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
str(video_path)
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
str(video_path),
]
result = subprocess.run(cmd, capture_output=True, text=True)
@@ -261,7 +397,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
raise ParseError(f"Invalid JSON: {e}", metadata_path)
if "source_file" not in data:
raise ParseError("Video metadata missing required field 'source_file'", metadata_path)
raise ParseError(
"Video metadata missing required field 'source_file'", metadata_path
)
return VideoMetadata(
source_file=data["source_file"],
@@ -270,7 +408,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
)
def resolve_video_file(project_path: Path, file_ref: str) -> tuple[Path, Optional[VideoMetadata]]:
def resolve_video_file(
project_path: Path, file_ref: str
) -> tuple[Path, Optional[VideoMetadata]]:
"""
Resolve a video file reference, which can be either:
1. A direct path to a video file
+1445 -64
File diff suppressed because it is too large
+840 -71
File diff suppressed because it is too large
+11 -11
@@ -11,6 +11,7 @@ from .errors import GnommoError
@dataclass
class TranscribedWord:
"""A word with its timestamp from transcription."""
word: str
start: float
end: float
@@ -18,6 +19,7 @@ class TranscribedWord:
class TranscriptionError(GnommoError):
"""Error during transcription."""
pass
@@ -57,21 +59,20 @@ def transcribe_video(video_path: Path, model: str = "base") -> list[TranscribedW
for segment in result.get("segments", []):
for word_info in segment.get("words", []):
words.append(TranscribedWord(
word=word_info["word"].strip(),
start=word_info["start"],
end=word_info["end"],
))
words.append(
TranscribedWord(
word=word_info["word"].strip(),
start=word_info["start"],
end=word_info["end"],
)
)
return words
def save_transcript(words: list[TranscribedWord], output_path: Path) -> None:
"""Save transcribed words to a JSON file."""
data = [
{"word": w.word, "start": w.start, "end": w.end}
for w in words
]
data = [{"word": w.word, "start": w.start, "end": w.end} for w in words]
with open(output_path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
@@ -86,6 +87,5 @@ def load_transcript(transcript_path: Path) -> list[TranscribedWord]:
data = json.load(f)
return [
TranscribedWord(word=w["word"], start=w["start"], end=w["end"])
for w in data
TranscribedWord(word=w["word"], start=w["start"], end=w["end"]) for w in data
]
+929 -57
File diff suppressed because it is too large
+140 -55
@@ -3,7 +3,13 @@
from pathlib import Path
from .errors import ValidationError, ValidationIssue
from .models import ProjectConfig, SlideDefinition, VideoSource, SLIDE_LAYOUTS
from .models import (
ProjectConfig,
SlideDefinition,
VideoSource,
SLIDE_LAYOUTS,
CAMERA_PRESETS,
)
def validate_project(
@@ -12,6 +18,7 @@ def validate_project(
config: ProjectConfig,
slides: dict[str, SlideDefinition],
videos: dict[str, VideoSource],
videos_dir: Path,
malformed_markers: list[tuple[int, str]] = None,
) -> None:
"""
@@ -30,19 +37,59 @@ def validate_project(
# Check for malformed markers first (these are likely typos)
if malformed_markers:
for line_num, marker_text in malformed_markers:
issues.append(ValidationIssue(
f"Malformed marker: {marker_text}",
project_path / "manuscript.txt",
line_num
))
issues.append(
ValidationIssue(
f"Malformed marker: {marker_text}",
project_path / "manuscript.txt",
line_num,
)
)
# Check all manuscript markers have corresponding slides
# Check all manuscript markers have corresponding slides or videos
for marker in manuscript_markers:
# Skip camera effect markers (Zoom0, TiltLeft, Reset, etc.)
if marker in CAMERA_PRESETS:
continue
# Skip audio markers (start with 'A' followed by audio id, e.g., Awoosh)
if marker.startswith("A") and len(marker) > 1 and marker[1:].isalnum():
continue
# Validate video trigger markers (video:xxx) - slide-like videos
if marker.startswith("video:"):
video_id = marker[6:] # Remove 'video:' prefix
if video_id not in videos:
# Check if it's a file extension mismatch
hint = ""
if "." in video_id:
base_name = video_id.rsplit(".", 1)[0]
if base_name in videos:
hint = f" (Did you mean [video:{base_name}]? Don't include file extensions in markers)"
issues.append(
ValidationIssue(
f"Video marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json{hint}",
project_path / "manuscript.txt",
)
)
continue
# Validate narration trigger markers (narration:xxx) - continuous videos
if marker.startswith("narration:"):
video_id = marker[10:] # Remove 'narration:' prefix
if video_id not in videos:
issues.append(
ValidationIssue(
f"Narration marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json",
project_path / "manuscript.txt",
)
)
continue
if marker not in slides:
issues.append(ValidationIssue(
f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
project_path / "manuscript.txt"
))
issues.append(
ValidationIssue(
f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
project_path / "manuscript.txt",
)
)
# Check all slide images exist
# Slides are in the same directory as the slides.json file
@@ -52,37 +99,68 @@ def validate_project(
for slide_id, slide_def in slides.items():
image_path = slides_dir / slide_def.image
if not image_path.exists():
issues.append(ValidationIssue(
f"Slide image not found: {slide_def.image}",
slides_json_path
))
issues.append(
ValidationIssue(
f"Slide image not found: {slide_def.image}", slides_json_path
)
)
# Check slide type is valid
if slide_def.type not in SLIDE_LAYOUTS:
issues.append(ValidationIssue(
f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
project_path / "slides.json"
))
issues.append(
ValidationIssue(
f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
project_path / "slides.json",
)
)
# Check all video files exist (paths relative to videos_dir or shared_assets)
videos_json_path = project_path / config.videos_path
# Find shared_assets directory
shared_assets_dir = None
if (project_path / "shared_assets").exists():
shared_assets_dir = project_path / "shared_assets"
elif (project_path.parent / "shared_assets").exists():
shared_assets_dir = project_path.parent / "shared_assets"
# Check all video files exist
for video_id, video_source in videos.items():
video_path = project_path / video_source.file
if not video_path.exists():
issues.append(ValidationIssue(
f"Video file not found: {video_source.file}",
project_path / "videos.json"
))
# Determine base directory based on is_shared flag
if video_source.is_shared:
if shared_assets_dir:
base_dir = shared_assets_dir
else:
issues.append(
ValidationIssue(
f"Video '{video_id}' has is_shared=true but shared_assets directory not found",
videos_json_path,
)
)
continue
else:
base_dir = videos_dir
# Check preprocessed output exists if preprocessing is defined
if video_source.preprocess and video_source.output_file:
output_path = project_path / video_source.output_file
video_path = base_dir / video_source.source_file
if not video_path.exists():
issues.append(
ValidationIssue(
f"Video file not found: {video_source.source_file}",
videos_json_path,
)
)
# Check preprocessed output exists if filters are defined
if video_source.filter and video_source.output_file:
output_path = base_dir / video_source.output_file
if not output_path.exists():
issues.append(ValidationIssue(
f"Preprocessed output not found: {video_source.output_file}. "
f"Run with -a preprocess first.",
project_path / "videos.json"
))
issues.append(
ValidationIssue(
f"Preprocessed output not found: {video_source.output_file}. "
f"Run with -a preprocess first.",
videos_json_path,
)
)
# Check background exists (image or video)
# Try 'background' first, fall back to deprecated 'background_video'
@@ -94,38 +172,45 @@ def validate_project(
# Try parent directory (shared_assets at repo root)
bg_path = project_path.parent / bg_file
if not bg_path.exists():
issues.append(ValidationIssue(
f"Background not found: {bg_file}",
project_path / "project.json"
))
issues.append(
ValidationIssue(
f"Background not found: {bg_file}", project_path / "project.json"
)
)
# Check we have at least one video source
if not videos:
issues.append(ValidationIssue(
"No video sources defined in videos.json",
project_path / "videos.json"
))
issues.append(
ValidationIssue(
"No video sources defined in videos.json", project_path / "videos.json"
)
)
# Check resolution is reasonable
width, height = config.resolution
if width < 100 or height < 100:
issues.append(ValidationIssue(
f"Resolution too small: {width}x{height}",
project_path / "project.json"
))
issues.append(
ValidationIssue(
f"Resolution too small: {width}x{height}", project_path / "project.json"
)
)
if width > 7680 or height > 4320:
issues.append(ValidationIssue(
f"Resolution too large: {width}x{height} (max 8K)",
project_path / "project.json"
))
issues.append(
ValidationIssue(
f"Resolution too large: {width}x{height} (max 8K)",
project_path / "project.json",
)
)
# Check FPS is reasonable
if config.fps < 1 or config.fps > 120:
issues.append(ValidationIssue(
f"Invalid FPS: {config.fps} (must be 1-120)",
project_path / "project.json"
))
issues.append(
ValidationIssue(
f"Invalid FPS: {config.fps} (must be 1-120)",
project_path / "project.json",
)
)
# If any issues, raise ValidationError
if issues:
+6
@@ -0,0 +1,6 @@
import gnommo
if __name__ == "__main__":
print("This is the main module.")
gnommo.main()
+2
@@ -0,0 +1,2 @@
openai-whisper
+476
@@ -0,0 +1,476 @@
# Gnommo Feature Development Roadmap
## Overview
Features to standardize the Keynote-to-YouTube workflow, so that once the presentation is complete, only a standardized recording session stands between you and a finished video.
---
## 1. Video Description Generator
**Command:** `gnommo -p <project> description`
Generate a complete YouTube description with citations, attributions, and chapters.
---
### 1.1 Manuscript Citations (`[cite:...]`)
Citations embedded in the manuscript represent sources, references, or links mentioned during narration. The text after `cite:` is the **literal reference** that should appear in the description.
**Format in manuscript.txt:**
```
[cite:Reference text exactly as it should appear]
```
**Examples:**
```
[S3]
According to this study [cite:Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper],
the effect is significant.
[S7]
I'm using [cite:Keynote by Apple - https://apple.com/keynote] for all my presentations.
[S12]
This technique was pioneered by [cite:Dr. Jane Doe, MIT Media Lab].
```
**Output in description:**
```
SOURCES & REFERENCES
━━━━━━━━━━━━━━━━━━━━
1:23 - Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper
4:56 - Keynote by Apple - https://apple.com/keynote
8:30 - Dr. Jane Doe, MIT Media Lab
```
**Requirements:**
- Parse `[cite:...]` markers from manuscript.txt
- Extract the literal text after `cite:` as the reference
- Align citations to timestamps (same fuzzy matching as other markers)
- List citations in order of appearance
- Citations are NOT aligned for rendering (ignored by renderer) but ARE timestamped for description
**Note:** `[cite:...]` markers should not affect video rendering or narration alignment - they are metadata-only markers for description generation.
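
The requirements above could be sketched roughly as follows. The regular expression is the one `parse_citations` uses in this commit; `format_timestamp` and `format_references` are hypothetical helper names, and the timestamps are assumed to come from the existing alignment step, which is not shown here.

```python
import re

# Same pattern as parse_citations: everything after "cite:" up to the closing bracket.
CITE_PATTERN = re.compile(r"\[cite:([^\]]+)\]")


def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once the video passes one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def extract_references(manuscript_text: str) -> list[str]:
    """Return the literal reference text of each [cite:...] marker, in order."""
    return [m.group(1).strip() for m in CITE_PATTERN.finditer(manuscript_text)]


def format_references(timed_refs: list[tuple[float, str]]) -> str:
    """Render (time, reference) pairs as description lines.

    Assumption: the pairs are already in order of appearance.
    """
    return "\n".join(f"{format_timestamp(t)} - {ref}" for t, ref in timed_refs)
```

Other bracketed markers such as `[S3]` do not match the pattern, so slide and camera markers are left for the regular marker pipeline.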
---
### 1.2 Pexels/Stock Footage Attribution
Attribution for Pexels content is **not legally required** but is appreciated and professional.
**Official Pexels attribution format:**
```
by [Contributor Name] via Pexels
```
**Implementation:**
- Extend `videos.json` to include attribution metadata:
```json
{
"beach_waves": {
"source_file": "pexels/beach.mp4",
"is_shared": true,
"attribution": {
"source": "pexels",
"creator": "John Doe",
"url": "https://pexels.com/video/12345"
}
}
}
```
- Auto-detect Pexels videos from `shared_assets/pexels/` folder
- Support Pexels metadata JSON files (if downloaded with video)
- Generate attribution section for video description:
```
STOCK FOOTAGE
━━━━━━━━━━━━━
Beach waves by John Doe via Pexels: https://pexels.com/video/12345
City timelapse by Jane Smith via Pexels: https://pexels.com/video/67890
```
**Pexels License Notes** (from pexels.com/license):
- Free for personal and commercial use
- Attribution not required but appreciated
- Cannot sell unaltered copies
- Cannot redistribute on other stock platforms
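
A minimal sketch of how the attribution lines might be built from parsed `videos.json` data. The function name `format_attributions` is hypothetical, and deriving the display title from the video id (`beach_waves` becomes "Beach waves") is an assumption; the roadmap does not say where titles come from.

```python
def format_attributions(videos: dict[str, dict]) -> list[str]:
    """Build one STOCK FOOTAGE line per video that carries attribution metadata."""
    lines = []
    for video_id, entry in videos.items():
        attribution = entry.get("attribution")
        if not attribution:
            continue  # videos without attribution metadata are skipped
        # Assumption: the human-readable title is derived from the video id.
        title = video_id.replace("_", " ").capitalize()
        creator = attribution.get("creator", "Unknown")
        source = attribution.get("source", "unknown").capitalize()
        line = f"{title} by {creator} via {source}"
        url = attribution.get("url")
        if url:
            line += f": {url}"
        lines.append(line)
    return lines
```

The defaults for `creator` and `source` mirror the fallbacks used when `parse_videos` builds an `Attribution` in this commit.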
### 1.3 Complete Description Output
**Output file:** `out/description_youtube.txt`
Combine all elements into a ready-to-paste YouTube description.
**Structure:**
```
[Video description from project.json "description" field]
CHAPTERS
━━━━━━━━
0:00 Introduction
1:23 Topic One
3:45 Topic Two
...
REFERENCES
━━━━━━━━━━
1:23 - Smith et al. (2024) "AI Study" - https://example.com
4:56 - Keynote by Apple - https://apple.com/keynote
...
STOCK FOOTAGE
━━━━━━━━━━━━━
Beach waves by John Doe via Pexels: https://pexels.com/video/12345
...
[Optional footer from project.json "footer" field - social links, subscribe CTA, etc.]
```
**project.json additions:**
```json
{
"description": "In this video, I walk through the complete Gnommo workflow for creating YouTube videos from Keynote presentations.",
"footer": "Subscribe for more tutorials: https://youtube.com/@channel\nTwitter: https://twitter.com/handle"
}
```
**Requirements:**
- Pull video description from `project.json` "description" field
- Generate chapters from slide markers (see Section 2)
- Collect all `[cite:...]` references with timestamps
- Collect all Pexels/stock attributions from `videos.json`
- Append optional footer from `project.json` "footer" field
- Output to `out/description_youtube.txt`
- Sections with no content are omitted (e.g., no STOCK FOOTAGE section if none used)
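
One way the assembly step could look, as a sketch under stated assumptions: `build_description` is a hypothetical name, each section arrives as a list of already formatted lines, and the heading rule is drawn as one `━` per heading character, matching the examples above.

```python
RULE_CHAR = "━"


def _section(title: str, lines: list[str]) -> str:
    """Return a titled section, or an empty string when there is no content."""
    if not lines:
        return ""
    return "\n".join([title, RULE_CHAR * len(title), *lines])


def build_description(
    description: str,
    chapters: list[str],
    references: list[str],
    attributions: list[str],
    footer: str = "",
) -> str:
    """Join the description, sections, and footer, omitting empty parts."""
    parts = [
        description.strip(),
        _section("CHAPTERS", chapters),
        _section("REFERENCES", references),
        _section("STOCK FOOTAGE", attributions),
        footer.strip(),
    ]
    return "\n\n".join(part for part in parts if part)
```

Filtering empty strings in the final join is what drops a section such as STOCK FOOTAGE when no attributed footage is used.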
---
## 2. YouTube Chapter Markers
**Command:** `gnommo -p <project> chapters`
Auto-generate chapter timestamps from slide markers.
**Requirements:**
- Extract chapter titles from:
- Keynote slide titles (via presenter notes import)
- First sentence after each `[SN]` marker
- Optional `[chapter:Title]` markers for explicit chapter names
- Calculate timestamps from aligned marker timings
- Output copy-paste ready format:
```
CHAPTERS
━━━━━━━━
0:00 Introduction
1:23 What is Gnommo?
3:45 Setting Up Your Project
7:12 Recording Tips
10:30 Rendering Your Video
12:45 Outro
```
- Option to merge small chapters (minimum duration threshold)
- Support for nested chapters (main topics + subtopics)
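
The minimum-duration merge could be sketched as below. This is only one possible policy, assumed for illustration: a chapter shorter than the threshold is folded into the chapter before it. The names `merge_short_chapters` and `format_chapters` are hypothetical, and chapters are assumed to be `(start_seconds, title)` pairs sorted by time.

```python
def merge_short_chapters(
    chapters: list[tuple[float, str]],
    min_duration: float,
    total_duration: float,
) -> list[tuple[float, str]]:
    """Fold chapters shorter than min_duration into the preceding chapter."""
    merged: list[tuple[float, str]] = []
    for index, (start, title) in enumerate(chapters):
        if index + 1 < len(chapters):
            next_start = chapters[index + 1][0]
        else:
            next_start = total_duration
        # The first chapter is always kept so the list still starts at 0:00.
        if merged and next_start - start < min_duration:
            continue
        merged.append((start, title))
    return merged


def format_chapters(chapters: list[tuple[float, str]]) -> str:
    """Render chapters as 'M:SS Title' lines (minutes are not wrapped into hours)."""
    lines = []
    for start, title in chapters:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"{minutes}:{seconds:02d} {title}")
    return "\n".join(lines)
```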
---
## 3. Subtitle/Caption Export
**Command:** `gnommo -p <project> subtitles`
Generate subtitle files from Whisper transcription.
**Requirements:**
- Export formats: SRT, VTT, TXT
- Use existing word-level timestamps from transcription
- Smart line breaking (max characters per line, break at punctuation)
- Speaker diarization support (future: multiple speakers)
- Options:
- `--format srt|vtt|txt`
- `--max-chars 42` (characters per line)
- `--max-duration 5` (seconds per subtitle block)
**Example output (SRT):**
```
1
00:00:01,500 --> 00:00:04,200
Hello and welcome to this tutorial
on video editing with Gnommo.
2
00:00:04,500 --> 00:00:07,800
Today we're going to cover
the complete workflow.
```
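
A rough sketch of SRT generation from word-level timestamps. `Word` is a stand-in that mirrors the fields of `TranscribedWord` from this commit; `srt_time` and `to_srt` are hypothetical names. For brevity the sketch emits one text line per block and treats `max_chars` as the block limit, so it does not produce the two-line layout or punctuation-aware breaks shown in the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in for TranscribedWord: a word with start/end times in seconds."""
    word: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words: list[Word], max_chars: int = 42, max_duration: float = 5.0) -> str:
    """Group words into subtitle blocks bounded by length and duration."""
    blocks: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        candidate = " ".join(w.word for w in current + [word])
        too_long = len(candidate) > max_chars
        too_slow = bool(current) and (word.end - current[0].start) > max_duration
        if current and (too_long or too_slow):
            blocks.append(current)
            current = []
        current.append(word)
    if current:
        blocks.append(current)

    entries = []
    for number, block in enumerate(blocks, start=1):
        text = " ".join(w.word for w in block)
        entries.append(
            f"{number}\n{srt_time(block[0].start)} --> {srt_time(block[-1].end)}\n{text}\n"
        )
    return "\n".join(entries)
```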
---
## 4. Thumbnail Generation
**Command:** `gnommo -p <project> thumbnail`
Auto-generate thumbnail candidates from slides.
**Requirements:**
- Designate thumbnail slides with `[thumbnail]` marker
- If no marker, use slide 1 or title slide
- Apply text overlays from config:
```json
{
"thumbnail": {
"title_text": "Episode ${episode_number}",
"subtitle_text": "${title}",
"font": "Impact",
"text_color": "#FFFFFF",
"outline_color": "#000000",
"position": "bottom-left"
}
}
```
- Generate multiple variants:
- With/without text overlay
- Different zoom levels
- Different color treatments (saturated, high contrast)
- Output to `out/thumbnails/` folder
- Resolution: 1280x720 (YouTube standard)
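
The `${...}` placeholders in the overlay config match the syntax of Python's `string.Template`, so substitution could be sketched as below. The function name `render_overlay_text` is hypothetical, and where the variable values come from (for example a `variables` block in `project.json`) is an assumption.

```python
from string import Template


def render_overlay_text(thumbnail_config: dict, variables: dict[str, str]) -> dict[str, str]:
    """Substitute ${name} placeholders in the thumbnail text fields."""
    rendered = {}
    for key in ("title_text", "subtitle_text"):
        raw = thumbnail_config.get(key, "")
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        rendered[key] = Template(raw).safe_substitute(variables)
    return rendered
```

Using `safe_substitute` rather than `substitute` means a missing variable shows up literally in the rendered text, which may be easier to spot in a thumbnail preview than a failed run.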
---
## 5. Intro/Outro Templates
**Configuration in project.json:**
```json
{
"intro": {
"template": "templates/intro_v2.mp4",
"duration": 3.5,
"transition": "fade",
"variables": {
"episode_number": "12",
"title": "Getting Started with Gnommo"
}
},
"outro": {
"template": "templates/outro_subscribe.mp4",
"duration": 8.0,
"transition": "fade"
}
}
```
**Requirements:**
- Define intro/outro templates in `shared_assets/templates/`
- Auto-prepend intro before first slide
- Auto-append outro after last slide
- Support variable substitution in templates (episode number, title)
- Configurable transition types (fade, cut, wipe)
- End screen safe zone support (last 20 seconds)
---
## 6. Multi-Platform Format Presets
**Command:** `gnommo -p <project> render --format <preset>`
**Presets:**
| Preset | Aspect | Resolution | Notes |
|--------|--------|------------|-------|
| `youtube` | 16:9 | 1920x1080 | Default, standard horizontal |
| `youtube-4k` | 16:9 | 3840x2160 | 4K export |
| `shorts` | 9:16 | 1080x1920 | Vertical, auto-reframe slides |
| `podcast` | - | - | Audio only; MP3/M4A export for podcast feeds |
| `square` | 1:1 | 1080x1080 | Instagram/LinkedIn |
**Requirements:**
- Auto-adjust cutout positions per format
- Smart slide reframing for vertical (zoom to content area)
- Separate output folders per format
- Batch export to multiple formats: `--format youtube,shorts,podcast`
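
The preset table and the comma-separated `--format` value could be modeled as below. This is a sketch: `FormatPreset`, `PRESETS`, and `parse_formats` are hypothetical names, and representing the audio-only preset with `resolution=None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatPreset:
    name: str
    resolution: Optional[tuple[int, int]]  # None marks an audio-only preset


PRESETS: dict[str, FormatPreset] = {
    "youtube": FormatPreset("youtube", (1920, 1080)),
    "youtube-4k": FormatPreset("youtube-4k", (3840, 2160)),
    "shorts": FormatPreset("shorts", (1080, 1920)),
    "podcast": FormatPreset("podcast", None),
    "square": FormatPreset("square", (1080, 1080)),
}


def parse_formats(value: str) -> list[FormatPreset]:
    """Parse a value such as 'youtube,shorts,podcast' into presets."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in PRESETS]
    if unknown:
        raise ValueError(f"Unknown format preset(s): {', '.join(unknown)}")
    return [PRESETS[name] for name in names]
```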
---
## 7. Teleprompter Script Generation
**Command:** `gnommo -p <project> teleprompter`
Extract clean narration text for teleprompter display.
**Requirements:**
- Strip all markers from manuscript
- Keep only spoken text
- Output formats:
- `--format txt` - Plain text
- `--format html` - Scrollable HTML page with large font
- `--format json` - For teleprompter apps
- Optional: Include slide thumbnails as visual cues
- Configurable font size and scroll speed hints
**Example HTML output:**
```html
<div class="teleprompter">
<p class="cue">[SLIDE: Introduction]</p>
<p>Hello and welcome to this tutorial on video editing with Gnommo.</p>
<p class="cue">[SLIDE: What is Gnommo?]</p>
<p>Gnommo is a code-first video editing pipeline...</p>
</div>
```
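
Stripping markers for the plain-text output could be sketched as follows. `strip_markers` is a hypothetical name, and the pattern removes anything in square brackets, so it assumes the spoken text itself never contains bracketed words.

```python
import re

# Assumption: every bracketed span is a marker ([S3], [cite:...], [video:demo], ...).
MARKER_PATTERN = re.compile(r"\[[^\]]+\]")


def strip_markers(manuscript_text: str) -> str:
    """Remove all markers and tidy the whitespace they leave behind."""
    text = MARKER_PATTERN.sub("", manuscript_text)
    text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    text = re.sub(r" +([,.;:!?])", r"\1", text)  # no space before punctuation
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Applied to the first citation example from section 1.1, this yields "According to this study," followed by "the effect is significant." on the next line.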
---
## 8. Recording Checklist Generator
**Command:** `gnommo -p <project> checklist`
Generate a pre-recording checklist based on project configuration.
**Output includes:**
- [ ] Camera settings (resolution, fps from project.json)
- [ ] Lighting setup (if green screen detected in videos.json)
- [ ] Audio check (microphone levels)
- [ ] Props/demos needed (parsed from `[video:...]` markers)
- [ ] Slide count and estimated duration
- [ ] Teleprompter ready
- [ ] Recording space clear
**Customizable via `checklist_template.md` in project folder.**
---
## 9. Audio Normalization
**Command:** `gnommo -p <project> normalize` (standalone), or applied automatically during render
**Requirements:**
- Target: -14 LUFS (the level YouTube is commonly reported to normalize playback to)
- Apply loudness normalization to narration track
- Preserve dynamic range (avoid over-compression)
- Normalize intro/outro audio to match
- Option: `--target-lufs -14`
**Implementation:**
- Use FFmpeg `loudnorm` filter
- Two-pass normalization for accurate results
- Report before/after levels
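
A sketch of the two-pass flow, building the FFmpeg command lines without running them. The function names are hypothetical; the true-peak and loudness-range values (`TP=-1.5`, `LRA=11`) are illustrative assumptions rather than project settings. The `measured` dictionary is assumed to hold the JSON that `loudnorm` prints at the end of the first pass.

```python
def loudnorm_measure_cmd(input_path: str, target_lufs: float = -14.0) -> list[str]:
    """First pass: analyze loudness only and print the measurements as JSON."""
    audio_filter = f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", input_path, "-af", audio_filter, "-f", "null", "-"]


def loudnorm_apply_cmd(
    input_path: str,
    output_path: str,
    measured: dict[str, str],
    target_lufs: float = -14.0,
) -> list[str]:
    """Second pass: feed the first-pass measurements back for linear normalization."""
    audio_filter = (
        f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11:"
        f"measured_I={measured['input_i']}:"
        f"measured_TP={measured['input_tp']}:"
        f"measured_LRA={measured['input_lra']}:"
        f"measured_thresh={measured['input_thresh']}:"
        f"offset={measured['target_offset']}:linear=true"
    )
    return ["ffmpeg", "-hide_banner", "-i", input_path, "-af", audio_filter, output_path]
```

The first-pass `input_i` value and the `output_i` value reported by a later measurement would be natural candidates for the before/after report.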
---
## 10. Project Templates
**Command:** `gnommo init <project-name> --template <template>`
**Built-in templates:**
| Template | Description |
|----------|-------------|
| `tutorial` | Talking head + slides, square slide layout |
| `explainer` | Full-screen slides, minimal presenter |
| `review` | Product review format, multiple camera angles |
| `talking-head` | Full-screen presenter, no slides |
| `screencast` | Screen recording with small presenter PIP |
**Requirements:**
- Templates stored in `~/.gnommo/templates/` or `shared_assets/templates/`
- Each template includes:
- `project.json` with preset cutouts and settings
- `manuscript.txt` skeleton with example markers
- Sample `videos.json` structure
- User can create custom templates: `gnommo template save <name>`
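One possible lookup for the two template locations is sketched below. The function name is hypothetical, and the precedence (user directory before shared assets) is an assumption this spec does not settle.

```python
from pathlib import Path


def find_template(name, search_dirs):
    """Return the first directory called `name` under search_dirs, or None."""
    for base in search_dirs:
        candidate = Path(base).expanduser() / name
        if candidate.is_dir():
            return candidate
    return None


# Assumed precedence: a user's custom template shadows a shared one.
DEFAULT_SEARCH_DIRS = ["~/.gnommo/templates", "shared_assets/templates"]
```

Calling `find_template("tutorial", DEFAULT_SEARCH_DIRS)` would then return the user's copy if one exists and fall back to the shared template otherwise.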
---
## 11. Batch Processing
**Command:** `gnommo batch render project1 project2 project3`
**Requirements:**
- Process multiple projects in sequence by default (`--parallel` overrides this)
- Continue on failure (don't stop batch for one failed project)
- Summary report at end:
```
BATCH COMPLETE
━━━━━━━━━━━━━━
✓ project1 - rendered in 5:23
✓ project2 - rendered in 4:17
✗ project3 - failed (missing slide S12)
```
- Options:
- `--parallel 2` - Run N renders in parallel
- `--skip-existing` - Skip if `out/final.mp4` exists
- `--format youtube,shorts` - Render all formats for each project
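The continue-on-failure behavior and the summary report can be sketched as follows. `run_batch`, `format_summary`, and the `render` callable are hypothetical names, and `render` stands in for whatever function performs a single project render.

```python
import time


def run_batch(projects, render):
    """Render each project in order; record failures instead of aborting."""
    results = []
    for name in projects:
        started = time.monotonic()
        try:
            render(name)
        except Exception as exc:  # one failed project must not stop the batch
            results.append((name, False, str(exc)))
        else:
            elapsed = int(time.monotonic() - started)
            detail = f"rendered in {elapsed // 60}:{elapsed % 60:02d}"
            results.append((name, True, detail))
    return results


def format_summary(results):
    """Build the end-of-batch report from (name, ok, detail) tuples."""
    lines = ["BATCH COMPLETE", "━" * 14]
    for name, ok, detail in results:
        mark = "✓" if ok else "✗"
        text = detail if ok else f"failed ({detail})"
        lines.append(f"{mark} {name} - {text}")
    return "\n".join(lines)
```

Collecting results as plain tuples keeps the reporting step separate from execution, so a later `--parallel` implementation could reuse `format_summary` unchanged.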
---
## 12. Progress Dashboard
**Command:** `gnommo status` or `gnommo -p <project> status`
Display pipeline status for all projects or specific project.
**Output:**
```
PROJECT STATUS
━━━━━━━━━━━━━━
Project Import Preprocess Transcribe Render Output
─────────────────────────────────────────────────────────────
video1 ✓ ✓ ✓ ✓ final.mp4 (12:34)
video2 ✓ ✓ ✓ ✗ -
video3 ✓ ✗ - - -
video4 ✗ - - - -
```
**Requirements:**
- Scan all project directories
- Check for existence of intermediate files
- Show file timestamps and durations
- Highlight what needs to be done next
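A rough sketch of the per-project scan is shown below. The stage-to-file mapping is hypothetical: only `transcript.csv` and `out/final.mp4` are named in this spec, and their locations relative to the project root are assumed.

```python
from pathlib import Path

# Hypothetical marker file per stage, in pipeline order.
STAGE_MARKERS = {
    "Import": "raw/input.mp4",             # assumed path
    "Preprocess": "work/preprocessed.mp4",  # assumed path
    "Transcribe": "transcript.csv",
    "Render": "out/final.mp4",
}


def project_status(project_dir):
    """Map each stage to '✓', '✗', or '-' (not reachable yet)."""
    root = Path(project_dir)
    status = {}
    blocked = False
    for stage, marker in STAGE_MARKERS.items():
        if blocked:
            status[stage] = "-"
        elif (root / marker).exists():
            status[stage] = "✓"
        else:
            status[stage] = "✗"
            blocked = True  # later stages depend on this one
    return status


def next_step(status):
    """Return the first incomplete stage, or None when everything is done."""
    for stage, mark in status.items():
        if mark == "✗":
            return stage
    return None
```

Marking every stage after the first gap as `-` reproduces the table above, and `next_step` gives the "what needs to be done next" hint.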
---
## 13. Recording Session Mode (Future)
**Command:** `gnommo -p <project> session`
Live recording assistant mode.
**Features:**
- Display current slide on secondary monitor
- Show teleprompter text overlay
- Keyboard shortcuts to advance slides
- Real-time recording using the project's configured settings (e.g., resolution and fps from project.json)
- Auto-stop at end of manuscript
- Voice command support: "next slide", "pause"
**Note:** This is a stretch goal requiring significant UI work.
---
## Implementation Priority
### Phase 1 - Core YouTube Workflow (High Impact)
1. **Video Description Generator** (citations + Pexels attribution)
2. **YouTube Chapter Markers**
3. **Subtitle/Caption Export**
4. **Audio Normalization**
### Phase 2 - Content Creation Efficiency
5. **Thumbnail Generation**
6. **Intro/Outro Templates**
7. **Teleprompter Script Generation**
8. **Recording Checklist Generator**
### Phase 3 - Scale & Automation
9. **Project Templates**
10. **Multi-Platform Format Presets**
11. **Batch Processing**
12. **Progress Dashboard**
### Phase 4 - Advanced
13. **Recording Session Mode**
---
## Notes
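- Project-scoped commands should follow the existing CLI pattern: `gnommo -p <project> <command>`; commands that span projects (`init`, `batch`, and `status` without `-p`) omit the project flag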
- Output files go to `out/` subdirectory by default
- All features should support `--dry-run` where applicable
- Verbose mode (`-v`) should show detailed progress