Adding changes version 1

2026-02-06 17:56:05 +01:00
parent 93fa820275
commit fdd275ac0e
30 changed files with 7068 additions and 888 deletions
@@ -0,0 +1,317 @@
 # Partial Rendering Specification
 ## Overview
 Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
 - Faster iteration during development
 - Re-rendering specific sections after fixes
 - Parallel rendering of segments that can be concatenated later
 ## Scope (v1)
 **In scope:**
 - Camera state tracking (cumulative state must be computed from t=0)
 - Time offset adjustment for all events
 - Slide range filtering
 - Input video seeking
 **Out of scope (v1):**
 - Audio events crossing range boundaries
 - Triggered video duration edge cases
 - Carry-over handling: events are assumed to begin at their marker timestamp and never "carry over" into a later range
 ## Current Architecture Analysis
 ### 1. Camera State Management
 **Current behavior** (`transformer.py:250-332`):
 - Camera state is **cumulative** across the transcript
 - `_extract_camera_events()` walks through ALL markers sequentially
 - Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
 - Example: `[Zoom2]` then `[TiltLeft]` = both zoom AND tilt active
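
The cumulative behavior can be illustrated with a minimal sketch. The `CameraState` fields mirror the camera data model described in this commit; the `PRESETS` table and the `apply_preset` helper are hypothetical stand-ins for `CAMERA_PRESETS` and `_apply_preset`, whose real shapes are not shown here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


# Hypothetical preset table: each marker lists only the fields it changes.
PRESETS = {
    "Zoom2": {"zoom": 1.25},
    "TiltLeft": {"rotation": -15.0},
    "Reset": {"zoom": 1.0, "rotation": 0.0, "pan_x": 0.0, "pan_y": 0.0},
}


def apply_preset(state: CameraState, marker_id: str) -> CameraState:
    """Return a new state with only the preset's fields overridden."""
    return replace(state, **PRESETS[marker_id])


state = CameraState()
for marker in ("Zoom2", "TiltLeft"):
    state = apply_preset(state, marker)

# Both the zoom and the tilt are active after the two markers.
print(state)
```

Because each call returns a new immutable state, a snapshot taken at any marker stays valid while the walk continues.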
 **Problem for partial rendering**:
 If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.
 **Solution**:
 Separate "state computation" from "event generation":
 1. Always walk through ALL transcript markers to compute cumulative state
 2. Track the "initial state" at the start of the render range
 3. Only emit CameraEvents for markers WITHIN the render range
 4. First event in partial render must transition FROM the computed initial state
 ### 2. Timestamp Adjustment
 **Current behavior**:
 All timing uses absolute timestamps from `transcript.csv`:
 - `SlideEvent.start_time/end_time`
 - `VideoEvent.start_time/end_time`
 - `AudioEvent.start_time`
 - `CameraEvent.time`
 - FFmpeg expressions: `enable=between(t, start, end)`
 - Camera animation: `if(between(t, 1.000, 1.200), ...)`
 **Problem for partial rendering**:
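 If slide 10 starts at t=10.0s and we render from there, the output's `t` starts at 0, so expressions written against absolute transcript times (e.g. `between(t, 10.0, 12.0)`) would fire 10 seconds late.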
 **Solution**:
 Apply a `time_offset` to all events after extraction:
 ```
 new_time = original_time - time_offset
 ```
 Where `time_offset` = start time of first slide/event in range.
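
A small worked example of the shift, assuming the range starts at slide 10 (t=10.0s) as in the problem statement. The helper names `shift_time` and `enable_expr` and the slide times are illustrative only, not part of the codebase.

```python
def shift_time(original_time: float, time_offset: float) -> float:
    """Map an absolute transcript time onto the partial render's timeline."""
    return original_time - time_offset


def enable_expr(start: float, end: float, time_offset: float) -> str:
    """Build an FFmpeg enable expression with shifted bounds."""
    s = shift_time(start, time_offset)
    e = shift_time(end, time_offset)
    return f"between(t,{s:.3f},{e:.3f})"


# Assumed example: a later slide occupies 12.5s-15.0s on the full timeline.
time_offset = 10.0
print(shift_time(10.0, time_offset))         # 0.0
print(enable_expr(12.5, 15.0, time_offset))  # between(t,2.500,5.000)
```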
 ### 3. Input Video Seeking
 **Current behavior**:
 - Always-visible videos (talking head) start from the beginning
 - FFmpeg processes entire input duration
 **Problem for partial rendering**:
 Need to seek into source videos to the correct position.
 **Solution**:
 Add `-ss <seek_time>` before input files for always-visible videos:
 ```
 ffmpeg -ss 10.0 -i talking_head.mov ...
 ```
 ---
 ## Proposed API
 ### Command Line Interface
 ```bash
 # Render full video (current behavior)
 gnommo render example/project.json output.mp4
 # Render specific slide range
 gnommo render example/project.json output.mp4 --slides S1:S10
 gnommo render example/project.json output.mp4 --slides S10:S20
 gnommo render example/project.json output.mp4 --slides S5:  # S5 to end
 # Render specific time range (alternative)
 gnommo render example/project.json output.mp4 --time 0:60
 gnommo render example/project.json output.mp4 --time 60:120
 ```
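
One way the range arguments could be parsed is sketched below. `parse_slide_range` and `parse_time_range` are hypothetical helpers; the sketch assumes an open end (`S5:`) is allowed but an open start is not, and that times are plain seconds. The spec does not settle either point.

```python
from typing import Optional


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """Parse 'S1:S10' or 'S5:' into (start_slide, end_slide)."""
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """Parse '60:120' or '60:' into (start_time, end_time) in seconds."""
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return float(start), (float(end) if end else None)


print(parse_slide_range("S1:S10"))  # ('S1', 'S10')
print(parse_slide_range("S5:"))     # ('S5', None)
print(parse_time_range("0:60"))     # (0.0, 60.0)
```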
 ### Internal API
 New parameters for `build_render_plan()`:
 ```python
 def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
 ) -> RenderPlan:
 ```
 New field on `RenderPlan`:
 ```python
 @dataclass
 class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
 ```
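
A slide range eventually has to become a time range before events can be filtered. The sketch below assumes a mapping from slide ID to marker time is available and that the end slide is exclusive, which matches the `S1:S10` / `S10:S20` segmentation used in the examples. `resolve_time_range` and the timestamps are hypothetical.

```python
from typing import Optional


def resolve_time_range(
    slide_starts: dict[str, float],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, float]:
    """Convert (start_slide, end_slide) into (start_time, end_time).

    The end slide is treated as exclusive: the range stops where it begins.
    An open end maps to infinity, i.e. render to the end of the video.
    """
    start_slide, end_slide = slide_range
    start = slide_starts[start_slide]
    end = slide_starts[end_slide] if end_slide is not None else float("inf")
    if end <= start:
        raise ValueError("end slide must come after start slide")
    return start, end


# Made-up marker times for illustration.
slide_starts = {"S1": 0.0, "S5": 42.0, "S10": 95.5}
print(resolve_time_range(slide_starts, ("S5", "S10")))  # (42.0, 95.5)
print(resolve_time_range(slide_starts, ("S5", None)))   # (42.0, inf)
```

The resulting start time would then serve as both `time_offset` and `input_seek_time`.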
 ---
 ## Implementation Details
 ### Phase 1: Compute Full State, Filter Events
 Modify `_extract_camera_events()` to accept a time range:
 ```python
 def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, float]] = None,  # (start, end)
 ) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
        - List of CameraEvents within time_range
        - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, float('inf'))
    found_start = False
    for timed_word in transcript:
        if not timed_word.is_marker:
            continue
        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue
        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)
        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True
        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))
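        current_state = new_state
    # If no camera marker falls at or after start_time, the range starts
    # with whatever state has accumulated over the full walk
    if not found_start:
        initial_state = current_state
    return events, initial_state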
 ```
 ### Phase 2: Apply Time Offset
 After extracting events, apply offset to all timestamps:
 ```python
 def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""
    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0, event.start_time - offset)
    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset
    # Adjust total duration
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset
    return plan
 ```
 ### Phase 3: FFmpeg Seeking
 Modify `build_ffmpeg_command()` to add seeking:
 ```python
 def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]
    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            cmd.extend(["-ss", str(plan.input_seek_time)])  # Seek BEFORE -i
        cmd.extend(["-i", str(video_path)])
        ...
 ```
 ### Phase 4: Initial Camera State Handling
 If `initial_camera_state` is not default, inject a "virtual" camera event at t=0:
 ```python
 def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
 ) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
 ```
 ---
 ## FFmpeg Optimization
 **Only emit filters for events within range.**
 When rendering a partial range, the `RenderPlan` should only contain events within that range. This means:
 - Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
 - Fewer overlay filters in filter_complex
 - Fewer `between(t, start, end)` enable expressions to evaluate per frame
 Example: Full video has 50 slides, rendering S40:S50 only:
 - **Before**: 50 slide inputs, 50 overlay filters
 - **After**: 10 slide inputs, 10 overlay filters
 This is achieved naturally by filtering events in `build_render_plan()` before constructing the plan - the renderer already only processes events present in the plan.
 ---
 ## Edge Cases (v1 Simplified)
 ### 1. Camera state from before range
 If rendering S5:S10 but there's a camera event at the S4 marker:
 - Camera state from S4 must be captured as `initial_camera_state`
 - Rendered output starts with that state already applied at t=0
 ### 2. Events are filtered by marker position
 All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
 - Events beginning outside range are excluded
 - No "carry over" or boundary-crossing logic needed
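
The filtering rule above can be sketched as a single predicate on the start time. The `SlideEvent` shape here is a simplified assumption (only `start_time`/`end_time` are named in this spec; `slide_id` is added for the example), and `filter_by_start` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class SlideEvent:
    slide_id: str
    start_time: float
    end_time: float


def filter_by_start(events: list[SlideEvent], start: float, end: float) -> list[SlideEvent]:
    """Keep events whose start marker falls in the half-open range [start, end)."""
    return [e for e in events if start <= e.start_time < end]


events = [
    SlideEvent("S4", 30.0, 40.0),
    SlideEvent("S5", 40.0, 50.0),
    SlideEvent("S9", 80.0, 90.0),
    SlideEvent("S10", 90.0, 100.0),
]

# Range S5:S10 expressed as times: S4 starts too early, S10 starts at the end.
kept = filter_by_start(events, 40.0, 90.0)
print([e.slide_id for e in kept])  # ['S5', 'S9']
```

Using a half-open range means adjacent segments never render the same event twice.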
 ---
 ## Testing Strategy
 ### Unit Tests
 1. Camera state computation maintains state across full transcript
 2. Time offset correctly shifts all event types
 3. Initial camera state correctly captured at boundary
 ### Integration Tests
 1. Render slides 1-5, then 5-10, concatenate, compare to full render
 2. Camera state continuity across segment boundaries
 3. Audio alignment after seeking
 ### Manual Verification
 1. Visual inspection of camera state at segment boundaries
 2. Audio sync verification
 ---
 ## Future Enhancements
 ### Parallel Rendering Pipeline
 ```bash
 # Render in parallel, then concatenate
 gnommo render proj.json seg1.mp4 --slides S1:S10 &
 gnommo render proj.json seg2.mp4 --slides S10:S20 &
 gnommo render proj.json seg3.mp4 --slides S20: &
 wait
 ffmpeg -f concat -i segments.txt -c copy final.mp4
 ```
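
The `segments.txt` file consumed by the concat demuxer is a plain list of `file '<path>'` lines. A minimal sketch for generating it follows; `concat_list_text` is a hypothetical helper, and paths containing single quotes are rejected rather than escaped.

```python
from pathlib import Path


def concat_list_text(segments: list[Path]) -> str:
    """Return concat-demuxer list text: one `file '<path>'` line per segment."""
    lines = []
    for seg in segments:
        path = seg.as_posix()
        if "'" in path:
            # Quote escaping is left out of this sketch on purpose.
            raise ValueError(f"unsupported quote in path: {path}")
        lines.append(f"file '{path}'")
    return "\n".join(lines) + "\n"


segments = [Path("seg1.mp4"), Path("seg2.mp4"), Path("seg3.mp4")]
print(concat_list_text(segments), end="")
```

Writing the returned text to `segments.txt` gives the input for the `ffmpeg -f concat` call. If the list contains absolute paths, the concat demuxer will likely also need `-safe 0` before `-i`.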
 ### Smart Re-rendering
 Track which slides changed and only re-render affected segments.
 ### Preview Mode
 Quick low-quality render of specific section for review.
@@ -0,0 +1,265 @@
 # Virtual Camera Effects
 Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
 These effects are triggered by markers in the manuscript, just like slides.
 ## Zoom Effects
 | Marker | Description |
 |--------|-------------|
 | `[Zoom1]` | Zoom to 110% - subtle emphasis |
 | `[Zoom2]` | Zoom to 125% - moderate emphasis |
 | `[Zoom3]` | Zoom to 150% - strong emphasis |
 | `[Zoom0]` | Return to 100% (default) |
 | `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |
 **Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.
 ## Tilt/Rotation Effects
 | Marker | Description |
 |--------|-------------|
 | `[TiltLeft]` | Rotate -15 degrees |
 | `[TiltRight]` | Rotate +15 degrees |
 | `[NoTilt]` | Return to 0 degrees |
 | `[TiltShake]` | Quick left-right shake (confusion/emphasis) |
 **Use case:** Tilt when saying something "off" or wrong, return to flat for correction.
 ## Pan/Position Effects
 | Marker | Description |
 |--------|-------------|
 | `[PanLeft]` | Shift frame left (subject moves right) |
 | `[PanRight]` | Shift frame right (subject moves left) |
 | `[PanUp]` | Shift frame up |
 | `[PanDown]` | Shift frame down |
 | `[PanCenter]` | Return to center |
 **Use case:** Pan to make room for a slide appearing on one side.
 ## Shake/Movement Effects
 | Marker | Description |
 |--------|-------------|
 | `[Shake]` | Brief screen shake (impact, surprise) |
 | `[ShakeHard]` | Intense shake (explosion, error) |
 | `[Wobble]` | Gentle continuous wobble |
 | `[NoWobble]` | Stop wobble |
 **Use case:** Shake on "WRONG!" or when something crashes/fails.
 ## Speed/Rhythm Effects
 | Marker | Description |
 |--------|-------------|
 | `[Beat]` | Single visual pulse (scale bump) |
 | `[BeatStart]` | Start pulsing to rhythm |
 | `[BeatStop]` | Stop pulsing |
 **Use case:** Rhythmic emphasis during lists or key points.
 ## Transition Effects
 | Marker | Description |
 |--------|-------------|
 | `[Flash]` | Quick white flash |
 | `[Blackout]` | Brief black frame |
 | `[Glitch]` | Digital glitch effect |
 **Use case:** Transition between topics or for "record scratch" moments.
 ## Picture-in-Picture Variations
 | Marker | Description |
 |--------|-------------|
 | `[PipGrow]` | Enlarge talking head cutout |
 | `[PipShrink]` | Shrink talking head cutout |
 | `[PipHide]` | Temporarily hide talking head |
 | `[PipShow]` | Restore talking head |
 | `[PipMove:corner]` | Move pip to different corner |
 **Use case:** Shrink self when showing important diagram, grow when making personal point.
 ## Combination Presets
 | Marker | Description |
 |--------|-------------|
 | `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
 | `[Surprise]` | Quick zoom + shake |
 | `[Sarcasm]` | Slow zoom + tilt |
 | `[Reset]` | Return all effects to default |
 ---
 ## Architecture: The Camera Abstraction
 ### The Core Insight
 All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
 The **camera** views the scene. When the camera zooms, tilts, or pans - everything
 moves together, just like a real camera filming a physical set.
 ```
 ┌─────────────────────────────────────────────────────────┐
 │                        SCENE                           │
 │  ┌─────────────────────────────────────────────────┐   │
 │  │              Background Layer                   │   │
 │  │  ┌─────────────┐                                │   │
 │  │  │ Talking Head│      ┌──────────────────┐      │   │
 │  │  │   (cutout)  │      │      Slide       │      │   │
 │  │  └─────────────┘      │    (from .png)   │      │   │
 │  │                       └──────────────────┘      │   │
 │  └─────────────────────────────────────────────────┘   │
 └─────────────────────────────────────────────────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   CAMERA    │
                    │  zoom: 1.25 │
                    │  tilt: -15° │
                    │  pan: 0, 0  │
                    └─────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │  Final Output   │
                  │   (1920x1080)   │
                  └─────────────────┘
 ```
 ### Why This Matters
 **Keynote slides are designed for a specific frame.** If you create a slide with
 an arrow pointing at where the talking head cutout will be, that spatial
 relationship must be preserved when the camera zooms or tilts.
 If we zoomed only the background and not the slides, the arrow would point to
 the wrong place. The camera abstraction ensures everything transforms together.
 ### Camera Properties
 ```python
 @dataclass
 class CameraState:
    zoom: float = 1.0        # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0    # degrees, positive = clockwise
    pan_x: float = 0.0       # -1.0 to 1.0, fraction of frame width
    pan_y: float = 0.0       # -1.0 to 1.0, fraction of frame height
 @dataclass
 class CameraKeyframe:
    time: float              # timestamp in seconds
    state: CameraState
    easing: str = "linear"   # linear, ease-in, ease-out, ease-in-out
 ```
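
Interpolating between two keyframes reduces to blending each `CameraState` field. The sketch below shows plain linear interpolation; `lerp_state` is a hypothetical helper and the frozen dataclass is a local copy for the example, not the definition in models.py.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


def lerp_state(a: CameraState, b: CameraState, u: float) -> CameraState:
    """Blend every field from a (u=0) to b (u=1); u is clamped to [0, 1]."""
    u = min(1.0, max(0.0, u))
    blended = {
        f.name: getattr(a, f.name) + (getattr(b, f.name) - getattr(a, f.name)) * u
        for f in fields(CameraState)
    }
    return CameraState(**blended)


start = CameraState()
target = CameraState(zoom=1.5, rotation=-15.0)
print(lerp_state(start, target, 0.5))
```

Halfway through the transition the sketch yields `zoom=1.25` and `rotation=-7.5`. An easing curve would be applied to `u` before blending.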
 ### Rendering Pipeline (Updated)
 ```
 Current Pipeline:
  Parse → Validate → Transform → Render
                                   │
                                   ▼
                          build_filter_complex()
                                   │
                          [bg] → overlays → [vout]
 New Pipeline:
  Parse → Validate → Transform → Render
                         │
                    Extract camera
                    keyframes from
                    markers
                         │
                         ▼
                  build_filter_complex()
                         │
              [bg] → overlays → [scene]
                                   │
                          apply_camera_transform()
                                   │
                              [scene] → zoom/rotate/pan → [vout]
 ```
 ### FFmpeg Implementation
 The camera transform is a **final filter stage** applied to the composed scene:
 ```
 # Compose scene (existing code)
 [0:v]scale=1920:1080[bg];
 [bg][slide1]overlay=...[s1];
 [s1][talkinghead]overlay=...[scene];
 # Camera transform (new)
 [scene]scale=iw*{zoom}:ih*{zoom},
       rotate={rotation}*PI/180:fillcolor=black,
       crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
 ```
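
The static transform above can be generated from a camera state with simple string formatting. This sketch covers only zoom and rotation, as in the filter chain shown; `camera_filter` is a hypothetical helper, and it rejects zoom below 1.0 because `crop` cannot produce an output larger than its input.

```python
def camera_filter(zoom: float, rotation_deg: float,
                  width: int = 1920, height: int = 1080) -> str:
    """Build the scale/rotate/crop chain that maps [scene] to [vout]."""
    if zoom < 1.0:
        raise ValueError("zoom below 1.0 would leave the crop larger than the scene")
    return (
        f"[scene]scale=iw*{zoom}:ih*{zoom},"
        f"rotate={rotation_deg}*PI/180:fillcolor=black,"
        f"crop={width}:{height}:(iw-{width})/2:(ih-{height})/2[vout]"
    )


print(camera_filter(1.25, -15.0))
```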
 For smooth animated zoom (using expressions):
 ```
 [scene]zoompan=z='if(between(it,5,8), 1+0.25*(it-5)/3, 1)':
              x='iw/2-(iw/zoom/2)':
              y='ih/2-(ih/zoom/2)':
              d=1:s=1920x1080:fps=30[vout]
 ```
 ### Camera Events in Timeline
 New model for camera changes:
 ```python
 @dataclass
 class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0      # 0 = instant snap
    easing: str = "ease-out"
 ```
 Markers map to camera events:
 - `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
 - `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
 - `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`
 ### Considerations
 1. **Overscan**: When zoomed in, we're cropping. The scene must be rendered
   larger than output (e.g., 2x) to have room for zoom without quality loss.
 2. **Rotation center**: Rotate around frame center, not corner.
 3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
   are both active. `[Reset]` clears all.
 4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
   transform naturally with the camera. No special handling needed.
 5. **Slides stay synced**: Keynote exports are positioned for the base frame.
   Camera zoom/tilt transforms them identically to everything else.
 ---
 ## Implementation Plan
 ### Phase 1: Camera Data Model ✓
 - [x] Add `CameraState` and `CameraEvent` to models.py
 - [x] Add camera effect markers to transformer.py
 - [x] Generate camera keyframes from markers
 ### Phase 2: Render Pipeline ✓
 - [x] Modify renderer to compose to `[scene]` instead of `[vout]`
 - [x] Add camera transform stage after composition
 - [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now
 ### Phase 3: Smooth Animation (partial)
 - [x] Support animated transitions between keyframes (linear interpolation)
 - [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
 - [ ] Test with rapid zoom sequences
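
For the open easing item, one possible approach is to emit the curve directly as an FFmpeg expression. The sketch below assumes a quadratic ease-out, which is only one of several common definitions, and a filter whose expressions expose `t`. Both function names are hypothetical.

```python
def ease_out(u: float) -> float:
    """Quadratic ease-out on u in [0, 1]: fast start, slow finish."""
    return 1.0 - (1.0 - u) ** 2


def ease_out_expr(t0: float, duration: float, v0: float, v1: float) -> str:
    """FFmpeg expression that moves from v0 to v1 starting at t0."""
    if duration <= 0:
        # Instant snap: v0 before t0, v1 from t0 onwards.
        return f"if(lt(t,{t0}),{v0},{v1})"
    u = f"(t-{t0})/{duration}"
    eased = f"(1-pow(1-{u},2))"
    return (
        f"if(lt(t,{t0}),{v0},"
        f"if(lt(t,{t0 + duration}),({v0})+(({v1})-({v0}))*{eased},{v1}))"
    )


# Zoom from 1.0 to 1.25 over 0.2s, starting at t=1.0s.
print(ease_out_expr(1.0, 0.2, 1.0, 1.25))
print(ease_out(0.5))  # 0.75
```

The expression contains commas, so it has to be quoted or escaped when placed inside a filtergraph string.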
 ### Phase 4: Effect Presets ✓
 - [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
 - [x] Presets defined in `CAMERA_PRESETS` dict in models.py
 - [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement
@@ -0,0 +1,10 @@
 [
  {
    "reference": "Gnommo Documentation - https://github.com/example/gnommo",
    "context": ""
  },
  {
    "reference": "FFmpeg Documentation - https://ffmpeg.org/documentation.html",
    "context": ""
  }
 ]
@@ -1,5 +1,19 @@
-Welcome to GnommoEditor, a code-first video editing system. [S1]
+[S1]
 This is the first slide. It appears immediately. [cite:Gnommo Documentation - https://github.com/example/gnommo]
-In this example, we demonstrate how slides appear at specific timestamps based on markers in the transcript. [S2]
+[S2]
 However, this is the second slide. It should appear 1 second prior to when I say "however"
-And that's the end of our demo.
+[S3]
 [video:Zoomin_MontageZoom]
 This is me talking alongside a video. The video is constrained within the red square. Notice how the video stops immediately when we make the transition to the next slide. [cite:FFmpeg Documentation - https://ffmpeg.org/documentation.html]
 [S4]
 I will continue to talk without pause, but in the finished recording - there will be a pause before the narration continues. Now a video will play that pauses the narration
 [S5]
 [video:gnommologo]
 Notice how my voice continues after the video finished.
 [S6]
@@ -0,0 +1,26 @@
 {
  "S1": {
    "image": "example.001.png",
    "type": "fullscreen"
  },
  "S2": {
    "image": "example.002.png",
    "type": "fullscreen"
  },
  "S3": {
    "image": "example.003.png",
    "type": "fullscreen"
  },
  "S4": {
    "image": "example.004.png",
    "type": "fullscreen"
  },
  "S5": {
    "image": "example.005.png",
    "type": "fullscreen"
  },
  "S6": {
    "image": "example.006.png",
    "type": "fullscreen"
  }
 }
@@ -0,0 +1,2 @@
 file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/talking_head_batch0.mov'
 file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/segments/segment_0002.mov'
@@ -0,0 +1,497 @@
 [
  {
    "word": "This",
    "start": 10.72,
    "end": 11.4
  },
  {
    "word": "is",
    "start": 11.4,
    "end": 11.6
  },
  {
    "word": "the",
    "start": 11.6,
    "end": 11.78
  },
  {
    "word": "first",
    "start": 11.78,
    "end": 11.98
  },
  {
    "word": "slide.",
    "start": 11.98,
    "end": 12.44
  },
  {
    "word": "It",
    "start": 13.02,
    "end": 13.3
  },
  {
    "word": "appears",
    "start": 13.3,
    "end": 13.66
  },
  {
    "word": "immediately.",
    "start": 13.66,
    "end": 14.3
  },
  {
    "word": "However,",
    "start": 15.34,
    "end": 16.02
  },
  {
    "word": "this",
    "start": 16.34,
    "end": 16.46
  },
  {
    "word": "is",
    "start": 16.46,
    "end": 16.58
  },
  {
    "word": "the",
    "start": 16.58,
    "end": 16.76
  },
  {
    "word": "second",
    "start": 16.76,
    "end": 17.04
  },
  {
    "word": "slide.",
    "start": 17.04,
    "end": 17.4
  },
  {
    "word": "It",
    "start": 17.74,
    "end": 17.96
  },
  {
    "word": "should",
    "start": 17.96,
    "end": 18.2
  },
  {
    "word": "appear",
    "start": 18.2,
    "end": 18.54
  },
  {
    "word": "one",
    "start": 18.54,
    "end": 18.98
  },
  {
    "word": "second",
    "start": 18.98,
    "end": 19.46
  },
  {
    "word": "prior",
    "start": 19.46,
    "end": 19.88
  },
  {
    "word": "to",
    "start": 19.88,
    "end": 20.1
  },
  {
    "word": "the",
    "start": 20.1,
    "end": 20.22
  },
  {
    "word": "word",
    "start": 20.22,
    "end": 20.52
  },
  {
    "word": "to",
    "start": 20.52,
    "end": 21.14
  },
  {
    "word": "say",
    "start": 21.14,
    "end": 21.42
  },
  {
    "word": "whoever",
    "start": 21.42,
    "end": 21.8
  },
  {
    "word": "the",
    "start": 21.8,
    "end": 22.16
  },
  {
    "word": "first",
    "start": 22.16,
    "end": 22.4
  },
  {
    "word": "time.",
    "start": 22.4,
    "end": 22.68
  },
  {
    "word": "This",
    "start": 24.28,
    "end": 24.96
  },
  {
    "word": "is",
    "start": 24.96,
    "end": 25.12
  },
  {
    "word": "me",
    "start": 25.12,
    "end": 25.36
  },
  {
    "word": "taking,",
    "start": 25.36,
    "end": 25.74
  },
  {
    "word": "talking",
    "start": 26.12,
    "end": 27.12
  },
  {
    "word": "alongside",
    "start": 27.12,
    "end": 27.64
  },
  {
    "word": "a",
    "start": 27.64,
    "end": 27.88
  },
  {
    "word": "video.",
    "start": 27.88,
    "end": 28.16
  },
  {
    "word": "The",
    "start": 28.16,
    "end": 28.92
  },
  {
    "word": "video",
    "start": 28.92,
    "end": 29.18
  },
  {
    "word": "is",
    "start": 29.18,
    "end": 29.36
  },
  {
    "word": "constrained",
    "start": 29.36,
    "end": 29.76
  },
  {
    "word": "within",
    "start": 29.76,
    "end": 30.14
  },
  {
    "word": "the",
    "start": 30.14,
    "end": 30.32
  },
  {
    "word": "red",
    "start": 30.32,
    "end": 30.48
  },
  {
    "word": "square.",
    "start": 30.48,
    "end": 30.9
  },
  {
    "word": "Notice",
    "start": 31.26,
    "end": 31.44
  },
  {
    "word": "how",
    "start": 31.44,
    "end": 31.74
  },
  {
    "word": "the",
    "start": 31.74,
    "end": 31.92
  },
  {
    "word": "video",
    "start": 31.92,
    "end": 32.14
  },
  {
    "word": "stops",
    "start": 32.14,
    "end": 32.44
  },
  {
    "word": "immediately",
    "start": 32.44,
    "end": 32.94
  },
  {
    "word": "when",
    "start": 32.94,
    "end": 33.36
  },
  {
    "word": "we",
    "start": 33.36,
    "end": 33.54
  },
  {
    "word": "make",
    "start": 33.54,
    "end": 33.74
  },
  {
    "word": "the",
    "start": 33.74,
    "end": 33.94
  },
  {
    "word": "transition",
    "start": 33.94,
    "end": 34.38
  },
  {
    "word": "to",
    "start": 34.38,
    "end": 34.68
  },
  {
    "word": "the",
    "start": 34.68,
    "end": 34.8
  },
  {
    "word": "next",
    "start": 34.8,
    "end": 35.02
  },
  {
    "word": "slide.",
    "start": 35.02,
    "end": 35.48
  },
  {
    "word": "I",
    "start": 37.18,
    "end": 37.72
  },
  {
    "word": "will",
    "start": 37.72,
    "end": 37.78
  },
  {
    "word": "continue",
    "start": 37.78,
    "end": 38.08
  },
  {
    "word": "to",
    "start": 38.08,
    "end": 38.32
  },
  {
    "word": "talk",
    "start": 38.32,
    "end": 38.56
  },
  {
    "word": "without",
    "start": 38.56,
    "end": 38.88
  },
  {
    "word": "pause,",
    "start": 38.88,
    "end": 39.24
  },
  {
    "word": "but",
    "start": 39.46,
    "end": 39.56
  },
  {
    "word": "in",
    "start": 39.56,
    "end": 39.68
  },
  {
    "word": "the",
    "start": 39.68,
    "end": 39.74
  },
  {
    "word": "finished",
    "start": 39.74,
    "end": 39.98
  },
  {
    "word": "recording",
    "start": 39.98,
    "end": 40.46
  },
  {
    "word": "there",
    "start": 40.46,
    "end": 41.18
  },
  {
    "word": "will",
    "start": 41.18,
    "end": 41.36
  },
  {
    "word": "be",
    "start": 41.36,
    "end": 41.54
  },
  {
    "word": "a",
    "start": 41.54,
    "end": 41.64
  },
  {
    "word": "pause",
    "start": 41.64,
    "end": 41.92
  },
  {
    "word": "before",
    "start": 41.92,
    "end": 42.28
  },
  {
    "word": "the",
    "start": 42.28,
    "end": 42.5
  },
  {
    "word": "narration",
    "start": 42.5,
    "end": 43.0
  },
  {
    "word": "continues.",
    "start": 43.0,
    "end": 43.64
  },
  {
    "word": "Now",
    "start": 44.38,
    "end": 44.52
  },
  {
    "word": "a",
    "start": 44.52,
    "end": 44.68
  },
  {
    "word": "video",
    "start": 44.68,
    "end": 44.9
  },
  {
    "word": "will",
    "start": 44.9,
    "end": 45.08
  },
  {
    "word": "play",
    "start": 45.08,
    "end": 45.36
  },
  {
    "word": "that",
    "start": 45.36,
    "end": 45.76
  },
  {
    "word": "pauses",
    "start": 45.76,
    "end": 46.52
  },
  {
    "word": "the",
    "start": 46.52,
    "end": 46.76
  },
  {
    "word": "narration.",
    "start": 46.76,
    "end": 47.2
  },
  {
    "word": "Notice",
    "start": 48.64,
    "end": 49.18
  },
  {
    "word": "how",
    "start": 49.18,
    "end": 49.42
  },
  {
    "word": "my",
    "start": 49.42,
    "end": 49.58
  },
  {
    "word": "voice",
    "start": 49.58,
    "end": 49.8
  },
  {
    "word": "continues",
    "start": 49.8,
    "end": 50.36
  },
  {
    "word": "after",
    "start": 50.36,
    "end": 50.84
  },
  {
    "word": "the",
    "start": 50.84,
    "end": 51.02
  },
  {
    "word": "video",
    "start": 51.02,
    "end": 51.24
  },
  {
    "word": "finished.",
    "start": 51.24,
    "end": 51.76
  }
 ]
@@ -0,0 +1,39 @@
 {
   "talking_head": {
    "source_file": "talking_head.mov",
    "output_file": "talking_head_processed.mov",
    "cutout": "talkinghead",
    "always_visible": true,
    "filter": [
      {
        "type": "chroma_key",
        "color": [131, 177, 83],
        "similarity": 0.04,
        "blend": 0.025,
        "spill": 0.05
      },
      {
        "type": "mask",
        "left": 0.05,
        "right": 0.10
      }
    ]
  },
   "gnommologo": {
    "source_file": "Logo.mov",
    "is_shared": true,
    "cutout": "fullscreen",
    "pause_narration": 0,
    "take": 10,
    "skip": 0
  },
   "Zoomin_MontageZoom": {
    "description": "Montage zoom",
    "source_file": "MontageZoom.mp4",
    "output_file": "MontageZoom.mp4",
    "pause_narration":3,
    "cutout": "square",
    "is_shared": true,
    "filter": []
  }
 }
@@ -1,11 +1,35 @@
 {
  "id": "VideoExample",
  "name": "Example",
  "description": "In this video, I demonstrate the Gnommo video editing pipeline - a code-first approach to creating presenter-mode videos from Keynote presentations.",
  "footer": "Subscribe for more tutorials!\nTwitter: @example",
  "resolution": [1920, 1080],
  "fps": 30,
-  "talkinghead": {
-    "x": 50,
-    "y": 600,
-    "targetheight": 400
-  },
-  "defaultSlideType": "square",
-  "background_video": ""
+  "gnommo_scratch": null,
+  "defaultSlideType": "fullscreen",
+  "keynote_file": "media/example.key",
+  "transcript": "media/videos/talking_head.transcript.json",
+  "background": "shared_assets/solarpunk.png",
+  "videos": "media/videos/videos.json",
+  "slides": "media/slides/Example/slides.json",
  "audio": "media/audio/audio.json",
  "main_video": "talking_head",
   "cutouts": {
    "talkinghead": {
      "x": "-10%",
      "y": "40%",
      "height": "60%"
    },
    "square": {
      "x": "45%",
      "y": "3%",
      "width": "53%",
      "height": "94%"
    },
    "fullscreen": {
      "x": "0%",
      "y": "0%",
      "height": "100%"
    }
  }
 }
@@ -1,10 +0,0 @@
 {
  "S1": {
    "image": "S1.png",
    "type": "square"
  },
  "S2": {
    "image": "S2.png",
    "type": "square"
  }
 }
@@ -1,8 +0,0 @@
 t,word
 0.00,Hello
 0.30,world
 0.60,[S1]
 1.50,Second
 1.80,slide
 2.00,[S2]
 2.50,End
@@ -1,6 +0,0 @@
 {
  "talking_head": {
    "file": "media/talking_head.mp4",
    "preprocess": []
  }
 }
@@ -1,154 +1,21 @@
 #!/bin/bash
 #
 # GnommoEditor - Code-first video editing pipeline
 # This is a thin wrapper that activates the venv and runs the Python CLI.
 #
-# Usage:
-#   gnommo.sh -p <project>              Render project
+# Usage: gnommo -p <project> [action] [options]
+# Run with -h for full help.
 #   gnommo.sh -p <project> import       Generate slides.json from image files
 #   gnommo.sh -p <project> validate     Validate only
 #   gnommo.sh -p <project> preprocess   Apply video preprocessing filters
 #   gnommo.sh -p <project> transcribe   Transcribe video
 #   gnommo.sh -p <project> align        Align markers to transcript
 #   gnommo.sh -p <project> all          Full pipeline: transcribe → align → render
 #
 set -e
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 VENV_PYTHON="$SCRIPT_DIR/venv/bin/python"
 # Check for venv
 if [[ ! -f "$VENV_PYTHON" ]]; then
    echo "Error: Virtual environment not found at $SCRIPT_DIR/venv"
-    echo "Create it with: python -m venv venv && ./venv/bin/pip install openai-whisper"
+    echo "Create it with: python -m venv venv && ./venv/bin/pip install -e . openai-whisper"
    exit 1
 fi
-# Parse arguments
-PROJECT=""
+# Pass all arguments directly to the Python CLI
+exec "$VENV_PYTHON" -m gnommo "$@"
 COMMAND="render"
 VERBOSE=""
 FORCE=""
 usage() {
    echo "Usage: gnommo.sh -p <project> [command] [options]"
    echo ""
    echo "Commands:"
    echo "  render      Render video (default)"
    echo "  import      Generate slides.json from image files"
    echo "  validate    Validate project only"
    echo "  preprocess  Apply video preprocessing filters (chroma key, etc.)"
    echo "  transcribe  Transcribe video audio"
    echo "  align       Align manuscript to transcript"
    echo "  all         Full pipeline: transcribe → align → render"
    echo ""
    echo "Options:"
    echo "  -p <dir>    Project directory (required)"
    echo "  -v          Verbose output"
    echo "  -f          Force overwrite existing files"
    echo "  -h          Show this help"
    echo ""
    echo "Examples:"
    echo "  gnommo.sh -p video1              # Render video1 project"
    echo "  gnommo.sh -p video1 import       # Generate slides.json"
    echo "  gnommo.sh -p video1 import -f    # Force overwrite slides.json"
    echo "  gnommo.sh -p video1 validate     # Validate only"
    echo "  gnommo.sh -p video1 all          # Full pipeline"
    exit 0
 }
 while [[ $# -gt 0 ]]; do
    case $1 in
        -p|--project)
            PROJECT="$2"
            shift 2
            ;;
        -v|--verbose)
            VERBOSE="-v"
            shift
            ;;
        -f|--force)
            FORCE="-f"
            shift
            ;;
        -h|--help)
            usage
            ;;
        import|validate|render|preprocess|transcribe|align|all)
            COMMAND="$1"
            shift
            ;;
        *)
            echo "Unknown option: $1"
            usage
            ;;
    esac
 done
 # Validate project argument
 if [[ -z "$PROJECT" ]]; then
    echo "Error: Project directory required (-p <project>)"
    echo ""
    usage
 fi
 if [[ ! -d "$PROJECT" ]]; then
    echo "Error: Project directory not found: $PROJECT"
    exit 1
 fi
 if [[ ! -f "$PROJECT/project.json" ]]; then
    echo "Error: project.json not found in $PROJECT"
    exit 1
 fi
 # Run commands using new CLI interface
 run_gnommo() {
    "$VENV_PYTHON" -m gnommo -p "$PROJECT" -a "$1" $VERBOSE
 }
 run_gnommo_import() {
    "$VENV_PYTHON" -m gnommo -p "$PROJECT" -a validate -i $FORCE $VERBOSE
 }
 case $COMMAND in
    import)
        echo "=== Importing assets for $PROJECT ==="
        run_gnommo_import
        ;;
    validate)
        echo "=== Validating $PROJECT ==="
        run_gnommo validate
        ;;
    transcribe)
        echo "=== Transcribing $PROJECT ==="
        run_gnommo transcribe
        ;;
    align)
        echo "=== Aligning $PROJECT ==="
        run_gnommo align
        ;;
    render)
        echo "=== Rendering $PROJECT ==="
        run_gnommo render
        ;;
    preprocess)
        echo "=== Preprocessing $PROJECT ==="
        run_gnommo preprocess
        ;;
    all)
        echo "=== Full Pipeline: $PROJECT ==="
        run_gnommo all
        ;;
    *)
        echo "Unknown command: $COMMAND"
        usage
        ;;
 esac
@@ -1,199 +0,0 @@
 """Alignment stage: match manuscript markers to transcript timestamps."""
 import csv
 import re
 from dataclasses import dataclass
 from pathlib import Path
 from .errors import GnommoError
 from .transcriber import TranscribedWord
 class AlignmentError(GnommoError):
    """Error during alignment."""
    pass
@dataclass
 class MarkerAlignment:
    """A marker with its aligned timestamp."""
    marker_id: str
    timestamp: float
    matched_phrase: str
    confidence: float  # 0-1, how confident the match is
 def extract_marker_contexts(manuscript_text: str) -> list[tuple[str, str]]:
    """
    Extract markers and the text immediately following them.
    Returns:
        List of (marker_id, following_text) tuples
    """
    # Split by markers, keeping the markers
    parts = re.split(r"\[([A-Za-z0-9_]+)\]", manuscript_text)
    # parts will be: [text_before, marker1, text_after1, marker2, text_after2, ...]
    contexts = []
    for i in range(1, len(parts), 2):
        marker_id = parts[i]
        if i + 1 < len(parts):
            following_text = parts[i + 1].strip()
            # Get first sentence or first N words
            following_text = _get_first_phrase(following_text)
            contexts.append((marker_id, following_text))
    return contexts
 def _get_first_phrase(text: str, max_words: int = 10) -> str:
    """Extract first phrase (up to first sentence end or max_words)."""
    # Clean up the text
    text = text.replace("\n", " ").strip()
    # Find first sentence boundary
    match = re.search(r"[.!?]", text)
    if match and match.start() < 200:
        text = text[: match.start()]
    # Limit to max_words
    words = text.split()[:max_words]
    return " ".join(words)
 def normalize_text(text: str) -> str:
    """Normalize text for matching (lowercase, remove punctuation)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
 def find_phrase_in_transcript(
    phrase: str,
    transcript: list[TranscribedWord],
    start_from: int = 0,
 ) -> tuple[int, float]:
    """
    Find a phrase in the transcript and return the word index and timestamp.
    Uses sliding window matching with normalization.
    Returns:
        Tuple of (word_index, timestamp) or (-1, 0.0) if not found
    """
    phrase_normalized = normalize_text(phrase)
    phrase_words = phrase_normalized.split()
    if not phrase_words:
        return -1, 0.0
    # Try to find increasingly shorter prefixes
    for length in range(len(phrase_words), 2, -1):
        target = " ".join(phrase_words[:length])
        # Sliding window through transcript
        for i in range(start_from, len(transcript) - length + 1):
            window_words = [normalize_text(transcript[j].word) for j in range(i, i + length)]
            window_text = " ".join(window_words)
            if target in window_text or window_text in target:
                return i, transcript[i].start
    # Fallback: try to find just the first few words
    if len(phrase_words) >= 2:
        target = " ".join(phrase_words[:3])
        for i in range(start_from, len(transcript) - 2):
            window_words = [normalize_text(transcript[j].word) for j in range(i, min(i + 5, len(transcript)))]
            window_text = " ".join(window_words)
            if phrase_words[0] in window_text and phrase_words[1] in window_text:
                return i, transcript[i].start
    return -1, 0.0
 def align_markers(
    manuscript_text: str,
    transcript: list[TranscribedWord],
    offset_seconds: float = -1.0,
 ) -> list[MarkerAlignment]:
    """
    Align manuscript markers to transcript timestamps.
    Args:
        manuscript_text: Full manuscript text with [S1], [S2] etc.
        transcript: Word-level transcript with timestamps
        offset_seconds: Offset to apply to found timestamps (default -1.0)
    Returns:
        List of MarkerAlignment with timestamps
    """
    contexts = extract_marker_contexts(manuscript_text)
    alignments: list[MarkerAlignment] = []
    last_index = 0
    for marker_id, following_text in contexts:
        idx, timestamp = find_phrase_in_transcript(
            following_text, transcript, start_from=last_index
        )
        if idx >= 0:
            # Apply offset (e.g., -1 second before the word)
            adjusted_time = max(0.0, timestamp + offset_seconds)
            alignments.append(MarkerAlignment(
                marker_id=marker_id,
                timestamp=adjusted_time,
                matched_phrase=following_text[:50],
                confidence=1.0,
            ))
            last_index = idx
        else:
            # Could not find match - report but continue
            alignments.append(MarkerAlignment(
                marker_id=marker_id,
                timestamp=-1.0,  # Indicates not found
                matched_phrase=following_text[:50],
                confidence=0.0,
            ))
    return alignments
 def save_aligned_transcript(
    alignments: list[MarkerAlignment],
    transcript: list[TranscribedWord],
    output_path: Path,
 ) -> None:
    """
    Save aligned transcript as CSV compatible with gnommo's transcript.csv format.
    Format:
        t,word
        0.00,Hello
        1.50,[S1]
        1.51,This
        ...
    """
    # Build list of (timestamp, word) including markers
    entries: list[tuple[float, str]] = []
    # Add all words from transcript
    for word in transcript:
        entries.append((word.start, word.word))
    # Add markers at their aligned positions
    for alignment in alignments:
        if alignment.timestamp >= 0:
            entries.append((alignment.timestamp, f"[{alignment.marker_id}]"))
    # Sort by timestamp
    entries.sort(key=lambda x: x[0])
    # Write CSV
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t", "word"])
        for timestamp, word in entries:
            writer.writerow([f"{timestamp:.2f}", word])
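The removed alignment module locates each marker by sliding a window of normalized words across the transcript, retrying with shorter prefixes of the phrase when the full phrase does not match. The sketch below is a simplified illustration of that idea, not the removed code itself: `Word` is a hypothetical stand-in for `TranscribedWord`, and the comparison is an exact token match rather than the substring test used in the original.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for TranscribedWord (only the fields used here)."""

    word: str
    start: float


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def find_phrase(
    phrase: str, transcript: list[Word], start_from: int = 0
) -> tuple[int, float]:
    """Return (word index, start time) of the longest matching prefix, or (-1, 0.0)."""
    target_words = normalize(phrase).split()
    # Try the full phrase first, then progressively shorter prefixes.
    for length in range(len(target_words), 0, -1):
        target = target_words[:length]
        # Slide a window of `length` words across the transcript.
        for i in range(start_from, len(transcript) - length + 1):
            window = [normalize(w.word) for w in transcript[i : i + length]]
            if window == target:
                return i, transcript[i].start
    return -1, 0.0


transcript = [
    Word("Hello", 0.0),
    Word("world.", 0.3),
    Word("Second", 1.5),
    Word("slide", 1.8),
]
idx, ts = find_phrase("Second slide, please", transcript)
print(idx, ts)
```

Passing the previous match index as `start_from` keeps later markers from matching earlier text, which is how `align_markers` preserves marker order.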
@@ -0,0 +1,359 @@
 """Description generator: Create YouTube description with chapters, citations, and attributions."""
 import re
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Optional
 from .models import (
    Attribution,
    Citation,
    ProjectConfig,
    SlideDefinition,
    VideoSource,
 )
 from .transcriber import TranscribedWord
@dataclass
 class ChapterMarker:
    """A chapter marker with timestamp and title."""
    slide_id: str
    timestamp: float
    title: str
 def _format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS or H:MM:SS for YouTube chapters."""
    if seconds < 0:
        return "0:00"
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        return f"{minutes}:{secs:02d}"
 def _extract_chapter_title(
    manuscript_text: str, slide_id: str, slides: dict[str, SlideDefinition]
 ) -> str:
    """
    Extract a chapter title for a slide.
    Tries to find meaningful title from:
    1. First sentence/line after the slide marker
    2. Falls back to slide ID if nothing useful found
    """
    # Find the marker and text after it
    pattern = rf"\[{re.escape(slide_id)}\]\s*(.+?)(?=\[S\d+\]|\[video:|\[narration:|\Z)"
    match = re.search(pattern, manuscript_text, re.DOTALL)
    if match:
        text = match.group(1).strip()
        # Remove any other markers from the text
        text = re.sub(r"\[[^\]]+\]", "", text).strip()
        if text:
            # Take first line or first sentence
            first_line = text.split("\n")[0].strip()
            # Truncate if too long
            if len(first_line) > 50:
                # Try to break at word boundary
                truncated = first_line[:47]
                last_space = truncated.rfind(" ")
                if last_space > 30:
                    truncated = truncated[:last_space]
                first_line = truncated + "..."
            if first_line:
                return first_line
    # Fallback to slide number
    slide_num = slide_id[1:] if slide_id.startswith("S") else slide_id
    return f"Section {slide_num}"
 def _align_citation_to_transcription(
    citation: Citation,
    transcription: list[TranscribedWord],
    manuscript_text: str,
 ) -> float:
    """
    Align a citation to the transcription to find its timestamp.
    Uses the context text following the citation to find the approximate
    position in the audio.
    Returns timestamp in seconds, or -1 if not found.
    """
    if not transcription or not citation.context:
        return -1.0
    # Get more context from the manuscript for better matching
    # Find the citation in the manuscript and get surrounding text
    pattern = rf"\[cite:{re.escape(citation.reference)}\]\s*(.{{0,200}})"
    match = re.search(pattern, manuscript_text, re.DOTALL)
    if not match:
        return -1.0
    context_text = match.group(1).strip()
    # Clean up: remove markers, normalize whitespace
    context_text = re.sub(r"\[[^\]]+\]", "", context_text)
    context_text = " ".join(context_text.split())
    if not context_text:
        return -1.0
    # Normalize for matching
    context_words = context_text.lower().split()[:10]  # Use up to 10 words
    if not context_words:
        return -1.0
    # Build normalized transcription
    trans_words = [(w.word.lower(), w.start) for w in transcription]
    # Simple sliding window match
    best_match_score = 0
    best_match_time = -1.0
    for i in range(len(trans_words) - len(context_words) + 1):
        matches = 0
        for j, ctx_word in enumerate(context_words):
            trans_word = trans_words[i + j][0]
            # Allow partial matches for longer words
            if ctx_word == trans_word:
                matches += 1
            elif len(ctx_word) >= 4 and (
                ctx_word in trans_word or trans_word in ctx_word
            ):
                matches += 0.5
        score = matches / len(context_words)
        if score > best_match_score and score >= 0.5:
            best_match_score = score
            best_match_time = trans_words[i][1]
    return best_match_time
 def generate_chapters(
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    marker_timings: list,  # List of MarkerTiming from transformer
    min_chapter_duration: float = 30.0,
 ) -> list[ChapterMarker]:
    """
    Generate chapter markers from slide timings.
    Args:
        manuscript_text: The manuscript content
        slides: Slide definitions
        marker_timings: Aligned marker timings from the transformer
        min_chapter_duration: Minimum seconds between chapters (merges short ones)
    Returns:
        List of ChapterMarker objects
    """
    chapters = []
    # Build timing lookup
    timing_lookup = {t.marker_id: t.timestamp for t in marker_timings if t.timestamp >= 0}
    # Process slides in order
    slide_ids = sorted(
        [s for s in slides.keys() if s.startswith("S")],
        key=lambda x: int(x[1:]) if x[1:].isdigit() else 0,
    )
    for slide_id in slide_ids:
        if slide_id not in timing_lookup:
            continue
        timestamp = timing_lookup[slide_id]
        title = _extract_chapter_title(manuscript_text, slide_id, slides)
        # Check if we should merge with previous chapter (too short)
        if chapters and (timestamp - chapters[-1].timestamp) < min_chapter_duration:
            continue  # Skip this chapter, previous one covers it
        chapters.append(
            ChapterMarker(
                slide_id=slide_id,
                timestamp=timestamp,
                title=title,
            )
        )
    # Ensure first chapter starts at 0:00
    if chapters and chapters[0].timestamp > 0:
        chapters[0] = ChapterMarker(
            slide_id=chapters[0].slide_id,
            timestamp=0.0,
            title=chapters[0].title,
        )
    return chapters
 def collect_attributions(
    videos: dict[str, VideoSource],
    video_events: Optional[list] = None,
 ) -> list[tuple[str, Attribution]]:
    """
    Collect all video attributions.
    Returns list of (video_id, Attribution) tuples for videos that have attribution.
    Only includes videos that are actually used in the project (via video_events)
    or videos from shared assets that have attribution.
    """
    attributions = []
    # Get set of used video IDs from events
    used_video_ids = set()
    if video_events:
        for event in video_events:
            used_video_ids.add(event.video_id)
    for video_id, video_source in videos.items():
        if video_source.attribution:
            # Include if used in video or if it's a shared asset
            if video_id in used_video_ids or video_source.is_shared:
                attributions.append((video_id, video_source.attribution))
    return attributions
 def generate_description(
    config: ProjectConfig,
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    videos: dict[str, VideoSource],
    marker_timings: list,
    transcription: Optional[list[TranscribedWord]] = None,
    video_events: Optional[list] = None,
    citations: Optional[list[Citation]] = None,
    include_chapters: bool = True,
    include_citations: bool = True,
    include_attributions: bool = True,
 ) -> str:
    """
    Generate complete YouTube description.
    Combines:
    - Video description from project.json
    - Chapter markers (optional)
    - Citations from manuscript (optional)
    - Stock footage attributions (optional)
    - Footer from project.json
    Returns formatted description text.
    """
    sections = []
    # 1. Video description
    if config.description:
        sections.append(config.description.strip())
    # 2. Chapters
    if include_chapters:
        chapters = generate_chapters(manuscript_text, slides, marker_timings)
        if chapters:
            chapter_lines = ["CHAPTERS", ""]
            for ch in chapters:
                chapter_lines.append(f"{_format_timestamp(ch.timestamp)} {ch.title}")
            sections.append("\n".join(chapter_lines))
    # 3. Citations/References
    if include_citations:
        citations = citations or []
        if citations and transcription:
            # Align citations to get timestamps
            for citation in citations:
                citation.timestamp = _align_citation_to_transcription(
                    citation, transcription, manuscript_text
                )
        if citations:
            ref_lines = ["REFERENCES", ""]
            for citation in citations:
                if citation.timestamp >= 0:
                    ref_lines.append(
                        f"{_format_timestamp(citation.timestamp)} - {citation.reference}"
                    )
                else:
                    ref_lines.append(f"- {citation.reference}")
            sections.append("\n".join(ref_lines))
    # 4. Stock footage attributions
    if include_attributions:
        attributions = collect_attributions(videos, video_events)
        if attributions:
            attr_lines = ["STOCK FOOTAGE", ""]
            for video_id, attr in attributions:
                # Format: "Description by Creator via Source: URL"
                line = f"{video_id.replace('_', ' ').title()} by {attr.creator} via {attr.source.title()}"
                if attr.url:
                    line += f": {attr.url}"
                attr_lines.append(line)
            sections.append("\n".join(attr_lines))
    # 5. Footer
    if config.footer:
        sections.append(config.footer.strip())
    # Join sections with double newlines
    return "\n\n".join(sections)
 def write_description_file(
    output_path: Path,
    config: ProjectConfig,
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    videos: dict[str, VideoSource],
    marker_timings: list,
    transcription: Optional[list[TranscribedWord]] = None,
    video_events: Optional[list] = None,
    citations: Optional[list[Citation]] = None,
 ) -> str:
    """
    Generate and write YouTube description to file.
    Args:
        output_path: Path to write description (e.g., out/description_youtube.txt)
        config: Project configuration
        manuscript_text: Manuscript content
        slides: Slide definitions
        videos: Video definitions
        marker_timings: Aligned marker timings
        transcription: Word-level transcription (optional, for citation timestamps)
        video_events: Video events from render plan (optional, for attribution filtering)
        citations: Pre-extracted citations (optional, loaded from citations.json)
    Returns:
        The generated description text
    """
    description = generate_description(
        config=config,
        manuscript_text=manuscript_text,
        slides=slides,
        videos=videos,
        marker_timings=marker_timings,
        transcription=transcription,
        video_events=video_events,
        citations=citations,
    )
    # Ensure output directory exists
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Write description
    output_path.write_text(description, encoding="utf-8")
    return description
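To make the chapter logic in this new module easier to follow, here is a minimal standalone sketch of the two steps it combines: timestamp formatting, and dropping chapters that start too soon after the previous one. The helper names and the sample timings are illustrative assumptions. Unlike `generate_chapters`, which orders slides by their numeric ID, this sketch orders entries by timestamp.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once the video passes one hour."""
    total = max(0, int(seconds))  # negative values clamp to 0:00
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def merge_short_chapters(
    timings: list[tuple[str, float]], min_gap: float = 30.0
) -> list[tuple[str, float]]:
    """Drop chapters starting less than `min_gap` seconds after the last kept one."""
    kept: list[tuple[str, float]] = []
    for title, ts in sorted(timings, key=lambda pair: pair[1]):
        if kept and ts - kept[-1][1] < min_gap:
            continue  # the previous chapter absorbs this one
        kept.append((title, ts))
    # The module snaps the first chapter to 0:00; the sketch does the same.
    if kept and kept[0][1] > 0:
        kept[0] = (kept[0][0], 0.0)
    return kept


# Hypothetical sample data: (title, start time in seconds).
timings = [("Intro", 4.0), ("Setup", 20.0), ("Demo", 75.5), ("Wrap-up", 3700.0)]
lines = [
    f"{format_timestamp(ts)} {title}" for title, ts in merge_short_chapters(timings)
]
print("\n".join(lines))
```

Here "Setup" is dropped because it starts 16 seconds after "Intro", and "Intro" is moved from 0:04 to 0:00.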
@@ -7,12 +7,14 @@ from typing import Optional
 class GnommoError(Exception):
    """Base exception for all GnommoEditor errors."""
    pass
@dataclass
 class ValidationIssue:
    """A single validation issue with location context."""
    message: str
    file: Optional[Path] = None
    line: Optional[int] = None
@@ -30,7 +32,9 @@ class ValidationIssue:
 class ParseError(GnommoError):
    """Error during parsing of input files."""
-    def __init__(self, message: str, file: Optional[Path] = None, line: Optional[int] = None):
+    def __init__(
+        self, message: str, file: Optional[Path] = None, line: Optional[int] = None
+    ):
        self.issue = ValidationIssue(message, file, line)
        super().__init__(str(self.issue))
@@ -48,7 +52,9 @@ class ValidationError(GnommoError):
 class RenderError(GnommoError):
    """Error during rendering stage."""
-    def __init__(self, message: str, command: Optional[str] = None, stderr: Optional[str] = None):
+    def __init__(
+        self, message: str, command: Optional[str] = None, stderr: Optional[str] = None
+    ):
        self.command = command
        self.stderr = stderr
        full_message = message
@@ -62,7 +68,13 @@ class RenderError(GnommoError):
 class PreprocessError(GnommoError):
    """Error during preprocessing stage."""
-    def __init__(self, message: str, filter_type: Optional[str] = None, command: Optional[str] = None, stderr: Optional[str] = None):
+    def __init__(
+        self,
+        message: str,
+        filter_type: Optional[str] = None,
+        command: Optional[str] = None,
+        stderr: Optional[str] = None,
+    ):
        self.filter_type = filter_type
        self.command = command
        self.stderr = stderr
@@ -0,0 +1,74 @@
 ObjC.import('stdlib');
 ObjC.import('Foundation');
 function toAbsolutePath(p) {
  // Expand ~ and make absolute relative to current working directory
  var s = $(String(p)).stringByExpandingTildeInPath;
  if (!s.isAbsolutePath) {
    var cwd = $.NSFileManager.defaultManager.currentDirectoryPath;
    s = cwd.stringByAppendingPathComponent(s);
  }
  return s.stringByStandardizingPath.js;
 }
 function fileExists(p) {
  return $.NSFileManager.defaultManager.fileExistsAtPath($(p));
 }
 function getNotes(slide) {
  try { return slide.presenterNotes(); } catch (e) {}
  try { return slide.speakerNotes(); } catch (e) {}
  return "";
 }
 function run(argv) {
  if (!argv || argv.length < 1) throw new Error("Usage: script.js <file.key> [slides_output_dir]");
  var abs = toAbsolutePath(argv[0]);
  var slidesDir = argv.length >= 2 ? toAbsolutePath(argv[1]) : null;
  if (!fileExists(abs)) {
    throw new Error("File not found: " + abs);
  }
  var Keynote = Application('Keynote');
  Keynote.activate();
  // Keynote is happiest when given a Path() made from an absolute POSIX path
  var doc = Keynote.open(Path(abs));
  // Export slides as PNG if output directory is provided
  if (slidesDir) {
    // Create directory if it doesn't exist
    var fm = $.NSFileManager.defaultManager;
    if (!fm.fileExistsAtPath($(slidesDir))) {
      fm.createDirectoryAtPathWithIntermediateDirectoriesAttributesError(
        $(slidesDir), true, $(), $()
      );
    }
    // Export using AppleScript (more reliable than JXA for Keynote export)
    var app = Application.currentApplication();
    app.includeStandardAdditions = true;
    // Build osascript command with proper escaping
    // Using multiple -e flags to avoid quoting issues
    var cmd = '/usr/bin/osascript' +
      ' -e \'tell application "Keynote"\'' +
      ' -e \'export front document to POSIX file "' + slidesDir + '" as slide images with properties {image format:PNG}\'' +
      ' -e \'end tell\'';
    app.doShellScript(cmd);
  }
  var slides = doc.slides();
  var out = [];
  for (var i = 0; i < slides.length; i++) {
    out.push({
      slide_index: i + 1,
      notes: String(getNotes(slides[i]) || "")
    });
  }
  doc.close({ saving: 'no' });
  return JSON.stringify(out, null, 2);
 }
@@ -0,0 +1,94 @@
 #!/usr/bin/env python3
 """
 Extract presenter notes from a Keynote .key file.
 Usage:
   python extract_keynote_notes.py
 Notes:
 - Paths are currently hardcoded in main(): video1/video1.key is read, and
   video1/manuscript.json and video1/manuscript.txt are written.
 - Extraction is delegated to the JXA script gnommo/extract_keynote_notes.js,
   which is run through osascript and drives the Keynote application.
 """
 import argparse
 import json
 import os
 import re
 import shutil
 import subprocess
 import tempfile
 import zipfile
 from pathlib import Path
 def write_manuscript(data: Path, out_path: Path):
    data = json.loads(
        data.read_text(encoding="utf-8")
    )  # list of {"slide_index": int, "notes": str}
    lines = []
    for item in data:
        idx = item.get("slide_index")
        # Log the 1-based slide index that is written to the manuscript
        print(f"Writing notes for slide {idx} to file")
        notes = (item.get("notes") or "").rstrip()
        lines.append(f"[S{idx}]")
        lines.append(notes)
        lines.append("")  # blank line between slides
    out_path.write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")
    print(f"Wrote {out_path}")
 def main():
    keynote_file = Path("video1/video1.key").expanduser().resolve()
    if not keynote_file.exists():
        raise FileNotFoundError(f"Keynote file not found: {keynote_file}")
    script_file = Path("gnommo/extract_keynote_notes.js").expanduser().resolve()
    if not script_file.exists():
        raise FileNotFoundError(f"Extractor script not found: {script_file}")
    presenter_notes_json_file = Path("video1/manuscript.json").expanduser().resolve()
    # Run JXA extractor
    proc = subprocess.run(
        [
            "osascript",
            "-l",
            "JavaScript",
            str(script_file),
            str(keynote_file),
        ],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            "Failed to extract presenter notes:\n"
            f"STDERR:\n{proc.stderr}\n"
            f"STDOUT:\n{proc.stdout}"
        )
    # Write JSON output
    presenter_notes_json_file.write_text(proc.stdout, encoding="utf-8")
    if not presenter_notes_json_file.exists():
        raise FileNotFoundError(
            f"Failed to extract presenter notes to {presenter_notes_json_file}"
        )
    # Convert JSON → manuscript.txt
    write_manuscript(
        presenter_notes_json_file, out_path=keynote_file.parent / "manuscript.txt"
    )
 if __name__ == "__main__":
    main()
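The conversion step in `write_manuscript` turns the extractor's JSON (a list of `{"slide_index", "notes"}` objects) into a manuscript in which each slide's notes follow an `[S<n>]` marker. The sketch below shows the same transformation on an in-memory string, which makes the resulting format easier to see. The function name `notes_to_manuscript` and the sample notes are assumptions for illustration.

```python
import json


def notes_to_manuscript(notes_json: str) -> str:
    """Convert extractor JSON into manuscript text with [S<n>] slide markers."""
    items = json.loads(notes_json)  # list of {"slide_index": int, "notes": str}
    lines: list[str] = []
    for item in items:
        lines.append(f"[S{item['slide_index']}]")
        lines.append((item.get("notes") or "").rstrip())
        lines.append("")  # blank line between slides
    # Trim trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical sample input: slide 2 has no presenter notes.
sample = json.dumps(
    [
        {"slide_index": 1, "notes": "Hello world"},
        {"slide_index": 2, "notes": ""},
    ]
)
manuscript = notes_to_manuscript(sample)
print(manuscript, end="")
```

A slide without notes still gets its marker, so the marker sequence stays complete for later alignment.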
@@ -6,31 +6,64 @@ from typing import Optional
@dataclass
-class TalkingHeadConfig:
-    """Configuration for talking head video positioning."""
-    x: int
-    y: int
-    target_height: int  # in pixels, or -1 for percentage-based
-    target_height_percent: float = 0.0  # percentage (0.0-1.0) if target_height is -1
-    file: Optional[str] = None  # Path to video or metadata JSON file
+class CutoutDefinition:
+    """Definition of a named zone for placing video content.
+
+    All positioning values support both pixels (int) and percentages (str like "50%").
+    Percentage values are stored as floats (0.0-1.0) with pixel value set to -1.
+
+    Videos placed in cutouts are cropped to fit the cutout dimensions.
+    """
    x: int  # in pixels, or -1 for percentage-based
    y: int  # in pixels, or -1 for percentage-based
    height: int  # in pixels, or -1 for percentage-based
    width: int = (
        -1
    )  # in pixels, or -1 for percentage-based (defaults to height for square)
    x_percent: float = 0.0  # percentage (0.0-1.0) if x is -1
    y_percent: float = 0.0  # percentage (0.0-1.0) if y is -1
    height_percent: float = 0.0  # percentage (0.0-1.0) if height is -1
    width_percent: float = 0.0  # percentage (0.0-1.0) if width is -1
 # Backwards compatibility alias
 TalkingHeadConfig = CutoutDefinition
@dataclass
 class ProjectConfig:
    """Global project configuration from project.json."""
    resolution: tuple[int, int]
    fps: int
    talking_head: TalkingHeadConfig
    default_slide_type: str
    cutouts: dict[str, CutoutDefinition] = field(
        default_factory=dict
    )  # Named zones for video placement
    background: str = ""  # Background image or video path (in shared_assets/)
    background_video: str = ""  # Deprecated: use background instead
    slides_path: str = "slides.json"  # path to slides.json relative to project
    videos_path: str = "videos.json"  # path to videos.json relative to project
    audio_path: str = "audio.json"  # path to audio.json relative to project
    audio_source: Optional[str] = None  # defaults to talking head
    main_video: Optional[str] = None  # ID of main video (e.g., talking head)
    gnommo_scratch: Optional[
        str
    ] = None  # directory for intermediate files (e.g., external SSD)
    # Outro sequence - plays after narration ends (not marker-triggered)
    outro: list[str] = field(
        default_factory=list
    )  # List of video IDs to play in sequence after narration
    # YouTube description fields
    description: str = ""  # Video description text for YouTube
    footer: str = ""  # Footer text (social links, subscribe CTA, etc.)
@dataclass
 class SlideDefinition:
    """Definition of a single slide from slides.json."""
    image: str
    type: str  # "fullscreen" | "square"
@@ -38,25 +71,170 @@ class SlideDefinition:
@dataclass
 class ChromaKeyConfig:
    """Configuration for chroma key (green screen) filter."""
    color: tuple[int, int, int] = (0, 255, 0)  # RGB color to key out
-    similarity: float = 0.15  # Color similarity threshold (0.0-1.0)
-    blend: float = 0.1  # Edge blend/feathering (0.0-1.0)
-    spill: float = 0.0  # Spill suppression amount (0.0-1.0)
+    similarity: float = (
+        0.4  # Color similarity threshold (0.0-1.0), higher = more aggressive
+    )
+    blend: float = 0.08  # Edge blend/feathering (0.0-1.0), lower = tighter edges
+    spill: float = 0.1  # Spill suppression amount (0.0-1.0)
    edge_erode: int = 0  # Pixels to erode from alpha edge (0-5), removes green fringe
    # Color protection - restore opacity for colors that shouldn't be keyed
    protect_color: Optional[tuple[int, int, int]] = None  # RGB color to protect from keying
    protect_tolerance: float = (
        0.15  # How much variation from protect_color to allow (0-1)
    )
@dataclass
 class GnommoKeyConfig:
    """Configuration for gnommokey filter - Keylight-style color-difference keyer.
    Uses YCbCr color-difference keying (like Keylight/Ultimatte) instead of
    simple Euclidean distance. This handles lighting variation much better
    than basic chromakey.
    """
    # Screen color (the green/blue screen color to key out)
    screen_color: tuple[int, int, int] = (0, 177, 64)  # RGB of the screen
    # Key extraction strength (default 100, higher = more aggressive)
    # Values 80-150 are typical. Maps to Keylight's Screen Gain.
    screen_gain: float = 100.0
    # Balance between chrominance and luminance in key calculation (0-100)
    # 0 = pure color-difference, 100 = luminance weighted
    # Maps to Keylight's Screen Balance.
    screen_balance: float = 50.0
    # Alpha/matte adjustments
    clip_black: float = 0.0  # Crush blacks (0-100). Higher = more transparent areas
    clip_white: float = 100.0  # Crush whites (0-100). Lower = more opaque areas
    # Despill: color to shift green spill toward (RGB)
    # Typical values: skin tone [217, 200, 180] or neutral [200, 200, 200]
    despill_bias: Optional[tuple[int, int, int]] = None
    # How aggressively to apply despill (0-1)
    despill_strength: float = 0.5
    # Alpha bias: influences edge treatment (RGB)
    # Can help with edge color contamination
    alpha_bias: Optional[tuple[int, int, int]] = None
    # Edge refinement
    edge_erode: int = 0  # Pixels to erode from alpha edge (0-5)
    edge_soften: float = 0.0  # Blur the alpha edge (0-5 pixels)
@dataclass
 class ColorGradeConfig:
    """Configuration for color grading filter.
    Applies color balance, contrast curves, and saturation adjustments
    while preserving the alpha channel.
    """
    # Color balance (range: -1.0 to 1.0, 0 = no change)
    # Midtones
    rm: float = 0.0  # Red midtones adjustment
    gm: float = 0.0  # Green midtones adjustment
    bm: float = 0.0  # Blue midtones adjustment
    # Highlights
    rh: float = 0.0  # Red highlights adjustment
    gh: float = 0.0  # Green highlights adjustment
    bh: float = 0.0  # Blue highlights adjustment
    # Shadows
    rs: float = 0.0  # Red shadows adjustment
    gs: float = 0.0  # Green shadows adjustment
    bs: float = 0.0  # Blue shadows adjustment
    # Curves preset (none, lighter, darker, increase_contrast, medium_contrast, etc.)
    curves_preset: str = "none"
    # EQ adjustments
    contrast: float = 1.0  # Contrast multiplier (0.0-2.0, 1.0 = no change)
    brightness: float = 0.0  # Brightness adjustment (-1.0 to 1.0, 0 = no change)
    saturation: float = 1.0  # Saturation multiplier (0.0-3.0, 1.0 = no change)
    # Custom curves for lift/gamma/gain control
    # Format: "0/0 0.5/0.56 1/1" means (input/output) control points
    curves_r: str = ""  # Red channel curve
    curves_g: str = ""  # Green channel curve
    curves_b: str = ""  # Blue channel curve
    curves_master: str = ""  # Master (luminance) curve
@dataclass
 class AudioNormalizeConfig:
    """Configuration for audio normalization filter.
    Applies noise reduction, compression, and loudness normalization
    to improve audio quality and consistency.
    """
    # Noise reduction (afftdn filter)
    denoise: bool = True  # Enable noise reduction
    noise_floor: float = -25.0  # Noise floor in dB (default -25, lower = more aggressive)
    # Compression (acompressor filter)
    compress: bool = True  # Enable dynamic range compression
    threshold: float = -20.0  # Compression threshold in dB
    ratio: float = 4.0  # Compression ratio (4:1 default)
    attack: float = 5.0  # Attack time in ms
    release: float = 50.0  # Release time in ms
    makeup: float = 2.0  # Makeup gain in dB
    # Loudness normalization (loudnorm filter - EBU R128)
    normalize: bool = True  # Enable loudness normalization
    target_lufs: float = -16.0  # Target integrated loudness (YouTube recommends -14 to -16)
    target_lra: float = 11.0  # Target loudness range
    target_tp: float = -1.5  # Target true peak in dB
@dataclass
 class FilterConfig:
    """Base configuration for a preprocessing filter."""
    type: str
    # Type-specific config stored in subclasses or as dict
@dataclass
 class Attribution:
    """Attribution information for stock footage (e.g., Pexels)."""
    source: str  # Source platform (e.g., "pexels", "pixabay", "unsplash")
    creator: str  # Creator/photographer name
    url: Optional[str] = None  # URL to the original content
@dataclass
 class VideoSource:
    """Video source definition from videos.json."""
-    file: str
-    preprocess: list[dict] = field(default_factory=list)  # List of filter config dicts
-    output_file: Optional[str] = None  # Path to preprocessed output (if any)
+
+    source_file: str  # Source video filename (relative to videos.json location or shared_assets/)
+    filter: list[dict] = field(default_factory=list)  # List of filter config dicts
    output_file: Optional[
        str
    ] = None  # Path to preprocessed output (relative to videos.json)
    take: Optional[
        float
    ] = None  # Max duration to play (seconds). Default: until next slide or end of clip
    skip: float = 0.0  # Skip this many seconds at start of video (seek point)
    zoom: float = (
        1.0  # Scale factor for video (1.0 = fit to cutout height, >1 = enlarge)
    )
    cutout: Optional[
        str
    ] = None  # Name of cutout to place video in (from project.json cutouts)
    always_visible: bool = False  # If True, video is always shown (like talking head)
    is_shared: bool = False  # If True, source_file is relative to shared_assets/
    pause_narration: float = (
        0.0  # Seconds to pause narration during this video (0 = no pause)
    )
    attribution: Optional[Attribution] = None  # Attribution for stock footage
    use_audio_channels: str = "both"  # Audio channel selection: "both", "left", or "right"
@dataclass
@@ -67,50 +245,202 @@ class VideoMetadata:
    This allows defining preprocessing steps separately from videos.json,
    enabling per-video preprocessing configuration.
    """
    source_file: str  # Original source video file
    preprocess: list[dict] = field(default_factory=list)  # Preprocessing filters
-    output: Optional[dict] = None  # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
-
-
+    output: Optional[
+        dict
+    ] = None  # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
@dataclass
 class TimedWord:
    """A word or marker with its timestamp from transcript.csv."""
    time: float
    word: str
    @property
    def is_marker(self) -> bool:
        """Check if this is a bracketed marker such as [S1], [video:demo], or [Zoom2]."""
        return self.word.startswith("[") and self.word.endswith("]")
    @property
    def marker_id(self) -> Optional[str]:
        """Extract marker ID (e.g., 'S1' from '[S1]')."""
        if self.is_marker:
            return self.word[1:-1]
        return None
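A minimal standalone sketch of how the two marker properties behave, assuming a trimmed copy of `TimedWord`; the sample rows are made up for illustration and do not come from a real transcript.csv:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedWord:
    """Trimmed copy of the TimedWord model, for illustration only."""

    time: float
    word: str

    @property
    def is_marker(self) -> bool:
        # A marker is any token wrapped in square brackets.
        return self.word.startswith("[") and self.word.endswith("]")

    @property
    def marker_id(self) -> Optional[str]:
        # Strip the brackets; plain words have no marker ID.
        return self.word[1:-1] if self.is_marker else None


# Hypothetical transcript rows (assumption, not real project data).
rows = [
    TimedWord(0.0, "[S1]"),
    TimedWord(0.42, "Hello"),
    TimedWord(1.10, "[video:demo]"),
]
marker_ids = [w.marker_id for w in rows if w.is_marker]
print(marker_ids)
```

Note that the check is purely syntactic: slide, video, audio, and camera markers all pass `is_marker`, so callers still have to classify the ID themselves.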
@dataclass
 class SlideEvent:
    """A resolved slide event with timing information."""
    slide_id: str
    start_time: float
    end_time: float
    slide_def: SlideDefinition
@dataclass
 class AudioDefinition:
    """Definition of an audio clip from audio.json."""
    file: str  # Audio filename (relative to audio.json location)
    volume: float = 1.0  # Volume multiplier (0.0-1.0)
    loop: bool = False  # If True, loop for entire duration from trigger point
    ignore_pauses: bool = False  # If True, audio continues playing during narration pauses
@dataclass
 class Citation:
    """A citation extracted from manuscript.txt [cite:...] markers."""
    reference: str  # The literal reference text after cite:
    marker_id: str  # The full marker (e.g., "cite:Smith et al...")
    timestamp: float = -1.0  # Aligned timestamp (-1 if not aligned)
    context: str = ""  # Text following the citation for alignment
@dataclass
 class AudioEvent:
    """A resolved audio event with timing information."""
    audio_id: str
    start_time: float  # When to start playing (marker time - offset)
    audio_def: AudioDefinition
@dataclass
 class VideoEvent:
    """A resolved video event with timing information."""
    video_id: str
    start_time: float
    end_time: float
    video_source: "VideoSource"
    cutout: "CutoutDefinition"
@dataclass
 class CameraState:
    """State of the virtual camera at a point in time.
    The camera transforms the entire composed scene (background, slides, cutouts).
    This ensures all elements stay spatially synchronized when zooming/tilting.
    """
    zoom: float = 1.0  # 1.0 = 100%, 1.25 = 125%, etc.
    rotation: float = 0.0  # degrees, positive = clockwise
    pan_x: float = 0.0  # -1.0 to 1.0, percentage of frame width
    pan_y: float = 0.0  # -1.0 to 1.0, percentage of frame height
    focal_x: float = 0.5  # 0.0 to 1.0, zoom focal point X (0.5 = center)
    focal_y: float = 0.5  # 0.0 to 1.0, zoom focal point Y (0.5 = center)
    def __post_init__(self):
        # Clamp values to reasonable ranges
        self.zoom = max(0.5, min(3.0, self.zoom))
        self.rotation = max(-45.0, min(45.0, self.rotation))
        self.pan_x = max(-1.0, min(1.0, self.pan_x))
        self.pan_y = max(-1.0, min(1.0, self.pan_y))
        self.focal_x = max(0.0, min(1.0, self.focal_x))
        self.focal_y = max(0.0, min(1.0, self.focal_y))
    def is_default(self) -> bool:
        """Check if this is the default camera state (no transform)."""
        return (
            self.zoom == 1.0
            and self.rotation == 0.0
            and self.pan_x == 0.0
            and self.pan_y == 0.0
            and self.focal_x == 0.5
            and self.focal_y == 0.5
        )
@dataclass
 class CameraEvent:
    """A camera state change at a specific time.
    Camera events can be instant (duration=0) or animated (duration>0).
    When animated, the camera smoothly transitions from its current state
    to the target state over the specified duration using the easing function.
    """
    time: float  # timestamp in seconds
    target_state: CameraState
    duration: float = 0.2  # transition duration (0 = instant snap)
    easing: str = "ease-out"  # linear, ease-in, ease-out, ease-in-out
 # Camera effect presets - map marker names to camera states
 # Effect strengths are intentionally subtle for professional look
 CAMERA_PRESETS: dict[str, CameraState] = {
    # Zoom levels (halved for subtlety)
    "Zoom0": CameraState(zoom=1.0),
    "Zoom1": CameraState(zoom=1.05),
    "Zoom2": CameraState(zoom=1.125),
    "Zoom3": CameraState(zoom=1.25),
    # Tilt/rotation (halved)
    "TiltLeft": CameraState(rotation=-7.5),
    "TiltRight": CameraState(rotation=7.5),
    "NoTilt": CameraState(),  # Full reset to default state
    # Pan (halved)
    "PanLeft": CameraState(pan_x=-0.1),
    "PanRight": CameraState(pan_x=0.1),
    "PanUp": CameraState(pan_y=-0.075),
    "PanDown": CameraState(pan_y=0.075),
    "PanCenter": CameraState(pan_x=0.0, pan_y=0.0),
    # Reset all
    "Reset": CameraState(),
 }
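The presets above only make sense if markers are merged cumulatively, as the partial rendering spec describes. The following is a rough sketch of that idea, not the code in `transformer.py`: `apply_marker` and `initial_state_at` are hypothetical helpers, the reduced `CameraState` omits clamping and focal points, and the marker list is invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    """Reduced stand-in for the CameraState model (no clamping, no focal point)."""

    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


PRESETS = {
    "Zoom2": CameraState(zoom=1.125),
    "TiltLeft": CameraState(rotation=-7.5),
    "PanRight": CameraState(pan_x=0.1),
    "Reset": CameraState(),
}


def apply_marker(state: CameraState, marker: str) -> CameraState:
    """Assumed merge rule: a marker overwrites only its own property group."""
    preset = PRESETS[marker]
    if marker.startswith("Zoom"):
        return replace(state, zoom=preset.zoom)
    if marker in ("TiltLeft", "TiltRight"):
        return replace(state, rotation=preset.rotation)
    if marker in ("PanLeft", "PanRight"):
        return replace(state, pan_x=preset.pan_x)
    if marker in ("PanUp", "PanDown"):
        return replace(state, pan_y=preset.pan_y)
    if marker == "PanCenter":
        return replace(state, pan_x=0.0, pan_y=0.0)
    return preset  # NoTilt / Reset: full reset to the preset


def initial_state_at(markers: list[tuple[float, str]], start_time: float) -> CameraState:
    """Walk every marker before start_time (sorted by time) to get the state at render start."""
    state = CameraState()
    for time, name in markers:
        if time >= start_time:
            break
        state = apply_marker(state, name)
    return state


# Invented (time, marker) pairs; a partial render starting at t=10 must inherit zoom and tilt.
markers = [(2.0, "Zoom2"), (5.0, "TiltLeft"), (12.0, "PanRight")]
initial = initial_state_at(markers, 10.0)
print(initial)
```

Under these assumptions the value returned by `initial_state_at` is what a field like `RenderPlan.initial_camera_state` would carry, while only markers at or after the start time would become `CameraEvent` entries.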
@dataclass
 class NarrationPause:
    """A pause in the narration timeline for an interstitial video."""
    output_time: float  # When the pause starts in the OUTPUT timeline
    narration_time: float  # Where we are in the NARRATION source when pause starts
    duration: float  # How long the pause lasts
    video_id: str  # The video that plays during the pause
@dataclass
 class OutroEvent:
    """A video that plays as part of the outro sequence (after narration ends)."""
    video_id: str
    start_time: float  # When this outro video starts (in output timeline)
    end_time: float  # When this outro video ends
    video_source: "VideoSource"
    cutout: Optional["CutoutDefinition"] = None  # None = fullscreen
@dataclass
 class RenderPlan:
    """Complete plan for rendering the final video."""
    project_path: Path
    config: ProjectConfig
    talking_head: VideoSource
    slide_events: list[SlideEvent]
    total_duration: float
    slides: dict[str, SlideDefinition]
    videos: dict[str, VideoSource] = field(default_factory=dict)
    video_events: list[VideoEvent] = field(
        default_factory=list
    )  # Triggered video overlays
    narration_videos: list[tuple[str, VideoSource, CutoutDefinition]] = field(
        default_factory=list
    )  # (video_id, source, cutout)
    slides_dir: Path = None  # directory containing slide images
-    talking_head_path: Path = None  # Resolved path to actual video file
+    videos_dir: Path = None  # directory containing videos.json and video files
    audio_events: list[AudioEvent] = field(default_factory=list)
    audio: dict[str, AudioDefinition] = field(default_factory=dict)
    audio_dir: Path = None  # directory containing audio.json and audio files
    camera_events: list[CameraEvent] = field(
        default_factory=list
    )  # Virtual camera keyframes
    # Partial rendering support
    time_offset: float = (
        0.0  # Offset subtracted from all timestamps (for partial render)
    )
    initial_camera_state: "CameraState" = (
        None  # Camera state at render start (for partial render)
    )
    input_seek_time: float = 0.0  # Seek position for input videos (for partial render)
    # Shared assets support
    shared_assets_dir: Path = None  # Directory containing shared assets (pexels, etc.)
    # Narration pause support
    narration_pauses: list[NarrationPause] = field(
        default_factory=list
    )  # Gaps in narration for interstitial videos
    # Outro sequence (plays after narration ends)
    outro_events: list["OutroEvent"] = field(
        default_factory=list
    )  # Videos that play after narration ends
    narration_end_time: float = 0.0  # When narration ends (before outro starts)
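A small sketch of how the `time_offset` field might be applied when only a range of the timeline is rendered. `TimedEvent` and `shift_events` are hypothetical names, and clamping `end_time` to the range end is an assumption; the v1 scope only states that events start at their marker timestamp and never carry over.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TimedEvent:
    """Hypothetical stand-in for SlideEvent/VideoEvent (timing fields only)."""

    event_id: str
    start_time: float
    end_time: float


def shift_events(
    events: list[TimedEvent], range_start: float, range_end: float
) -> list[TimedEvent]:
    """Keep events that start inside [range_start, range_end) and rebase them to t=0."""
    shifted = []
    for ev in events:
        if range_start <= ev.start_time < range_end:
            shifted.append(
                replace(
                    ev,
                    start_time=ev.start_time - range_start,
                    # Assumption: an event is cut off at the end of the render range.
                    end_time=min(ev.end_time, range_end) - range_start,
                )
            )
    return shifted


# Invented timeline: render slides S2 and S3 only.
events = [
    TimedEvent("S1", 0.0, 10.0),
    TimedEvent("S2", 10.0, 25.0),
    TimedEvent("S3", 25.0, 40.0),
]
partial = shift_events(events, 10.0, 40.0)
print(partial)
```

In this sketch `range_start` plays the role of both `time_offset` and `input_seek_time`: events are rebased by it, and input videos would be sought to it.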
 # Slide layout configurations (hardcoded for POC)
@@ -1,6 +1,5 @@
 """Extract stage: parse all input files."""
 import csv
 import json
 import re
 from pathlib import Path
@@ -8,21 +7,28 @@ from typing import Any, Optional
 from .errors import ParseError
 from .models import (
    Attribution,
    AudioDefinition,
    Citation,
    CutoutDefinition,
    ProjectConfig,
    SlideDefinition,
    TalkingHeadConfig,
    TimedWord,
    VideoMetadata,
    VideoSource,
 )
-def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int, str]]]:
+def parse_manuscript(
    project_path: Path,
 ) -> tuple[str, list[str], list[tuple[int, str]], list[Citation]]:
    """
    Parse manuscript.txt and extract text content and slide markers.
    Strips [cite:...] markers from the returned text so they never pollute
    alignment contexts. Citations are extracted and returned separately.
    Returns:
-        Tuple of (full text, list of marker IDs found, list of malformed markers as (line_num, text))
+        Tuple of (full text, list of marker IDs found, list of malformed markers, list of citations)
    """
    manuscript_path = project_path / "manuscript.txt"
@@ -31,8 +37,15 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
    text = manuscript_path.read_text(encoding="utf-8")
-    # Extract all valid slide markers like [S1], [S2], etc.
-    markers = re.findall(r"\[([A-Za-z0-9_]+)\]", text)
+    # Extract citations before stripping them
+    citations = parse_citations(text)
+    # Strip [cite:...] markers from text so they don't pollute alignment
+    text = re.sub(r"\[cite:[^\]]+\]", "", text)
+    # Extract all valid markers like [S1], [video:demo], [Zoom2], etc.
+    # Include . in pattern to catch markers with file extensions (so validator can warn about them)
+    markers = re.findall(r"\[([A-Za-z0-9_:.]+)\]", text)
    # Find malformed markers (missing brackets, extra spaces, etc.)
    malformed: list[tuple[int, str]] = []
@@ -56,48 +69,75 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
        for match in spaced:
            malformed.append((line_num, match))
-    return text, markers, malformed
+    return text, markers, malformed, citations
-def parse_transcript(project_path: Path) -> list[TimedWord]:
-    """
-    Parse transcript.csv into a list of timed words.
-    Expected format:
-        t,word
-        0.00,This
-        0.42,is
-        ...
-    """
-    transcript_path = project_path / "transcript.csv"
-    if not transcript_path.exists():
-        raise ParseError("transcript.csv not found", transcript_path)
-    timed_words = []
-    with open(transcript_path, "r", encoding="utf-8") as f:
-        reader = csv.DictReader(f)
-        if reader.fieldnames is None or "t" not in reader.fieldnames or "word" not in reader.fieldnames:
-            raise ParseError(
-                "transcript.csv must have columns: t, word",
-                transcript_path
-            )
-        for line_num, row in enumerate(reader, start=2):  # start=2 because line 1 is header
-            try:
-                time = float(row["t"])
-                word = row["word"].strip()
-                timed_words.append(TimedWord(time=time, word=word))
-            except (ValueError, KeyError) as e:
-                raise ParseError(
-                    f"Invalid row: {e}",
-                    transcript_path,
-                    line_num
-                )
-    return timed_words
+def parse_citations(manuscript_text: str) -> list[Citation]:
+    """
+    Extract all [cite:...] markers from manuscript text.
+    The text after 'cite:' is the literal reference that should appear
+    in the video description.
+
+    Returns:
+        List of Citation objects with reference text and context for alignment.
+    """
+    citations = []
+    # Match [cite:...] markers - content can include any characters except ]
+    # Use a more permissive pattern that handles multi-word citations
+    pattern = r"\[cite:([^\]]+)\]"
+    for match in re.finditer(pattern, manuscript_text):
+        reference = match.group(1).strip()
+        marker_id = f"cite:{reference}"
+        # Extract context: text following the citation (for alignment)
+        # Take up to 150 chars after the marker; trimmed below at the next marker or blank line
+        end_pos = match.end()
+        context_text = manuscript_text[end_pos : end_pos + 150]
+        # Clean up context: take text until next marker or double newline
+        context_match = re.match(r"([^\[]*?)(?:\[|\n\n|$)", context_text)
+        context = context_match.group(1).strip() if context_match else ""
+
+        # Truncate context to ~50 chars for display
+        if len(context) > 50:
+            context = context[:47] + "..."
+        citations.append(
+            Citation(
+                reference=reference,
+                marker_id=marker_id,
+                context=context,
+            )
+        )
+    return citations
+
 def save_citations(citations: list[Citation], path: Path) -> None:
    """Save citations to a JSON file."""
    data = [
        {"reference": c.reference, "context": c.context}
        for c in citations
    ]
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
 def load_citations(path: Path) -> list[Citation]:
    """Load citations from a JSON file."""
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return [
        Citation(
            reference=item["reference"],
            marker_id=f"cite:{item['reference']}",
            context=item.get("context", ""),
        )
        for item in data
    ]
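To illustrate why `parse_manuscript` extracts citations before stripping them, here is a short sketch that reuses the two regular expressions from the diff on an invented manuscript line. The references are placeholders, not real sources.

```python
import re

CITE_PATTERN = r"\[cite:([^\]]+)\]"
MARKER_PATTERN = r"\[([A-Za-z0-9_:.]+)\]"

# Invented sample text with one single-token and one multi-word citation.
sample = "Sleep helps memory [cite:Doe2020] and mood [cite:Roe 2021, Example Study]. [S2] Next."

# Step 1: collect the literal reference text of every citation.
references = [m.group(1).strip() for m in re.finditer(CITE_PATTERN, sample)]

# Without stripping, a single-token citation is mistaken for an ordinary marker.
markers_raw = re.findall(MARKER_PATTERN, sample)

# Step 2: strip citations, then extract markers from the cleaned text.
stripped = re.sub(CITE_PATTERN, "", sample)
markers_clean = re.findall(MARKER_PATTERN, stripped)

print(references)
print(markers_raw)
print(markers_clean)
```

The multi-word citation never matches the marker pattern because of its spaces and comma, but the single-token one does, which is the case the stripping step guards against.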
 def parse_project_config(project_path: Path) -> ProjectConfig:
@@ -112,16 +152,27 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
    except json.JSONDecodeError as e:
        raise ParseError(f"Invalid JSON: {e}", config_path)
-    # Parse talking head config
-    th_data = data.get("talkinghead", {})
-    th_height, th_height_pct = _parse_dimension(th_data.get("targetheight", 200))
-    talking_head = TalkingHeadConfig(
-        x=th_data.get("x", 100),
-        y=th_data.get("y", 100),
-        target_height=th_height,
-        target_height_percent=th_height_pct,
-        file=th_data.get("file"),
-    )
+    # Parse cutouts (named zones for video placement)
+    cutouts: dict[str, CutoutDefinition] = {}
+    cutouts_data = data.get("cutouts", {})
+    for cutout_name, cutout_data in cutouts_data.items():
+        x, x_pct = _parse_dimension(cutout_data.get("x", 0))
+        y, y_pct = _parse_dimension(cutout_data.get("y", 0))
+        height, height_pct = _parse_dimension(cutout_data.get("height", 200))
+        # Width defaults to same as height (square) if not specified
+        width, width_pct = _parse_dimension(
+            cutout_data.get("width", cutout_data.get("height", 200))
+        )
+        cutouts[cutout_name] = CutoutDefinition(
+            x=x,
+            y=y,
+            height=height,
+            width=width,
+            x_percent=x_pct,
+            y_percent=y_pct,
+            height_percent=height_pct,
+            width_percent=width_pct,
+        )
    # Parse resolution
    resolution = data.get("resolution", [1920, 1080])
@@ -131,12 +182,19 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
    return ProjectConfig(
        resolution=tuple(resolution),
        fps=data.get("fps", 30),
        talking_head=talking_head,
        default_slide_type=data.get("defaultSlideType", "square"),
        cutouts=cutouts,
        background=data.get("background", ""),
        background_video=data.get("background_video", ""),  # Deprecated
        slides_path=data.get("slides", "slides.json"),
        videos_path=data.get("videos", "videos.json"),
        audio_path=data.get("audio", "audio.json"),
        audio_source=data.get("audio_source"),
        main_video=data.get("main_video"),
        gnommo_scratch=data.get("gnommo_scratch"),
        outro=data.get("outro", []),
        description=data.get("description", ""),
        footer=data.get("footer", ""),
    )
@@ -157,7 +215,9 @@ def _parse_dimension(value: Any) -> tuple[int, float]:
    return 200, 0.0  # default
-def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str, SlideDefinition]:
+def parse_slides(
    project_path: Path, config: ProjectConfig = None
 ) -> dict[str, SlideDefinition]:
    """Parse slides.json into slide definitions."""
    if config and config.slides_path:
        slides_path = project_path / config.slides_path
@@ -176,8 +236,7 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
    for slide_id, slide_data in data.items():
        if "image" not in slide_data:
            raise ParseError(
-                f"Slide '{slide_id}' missing required field 'image'",
-                slides_path
+                f"Slide '{slide_id}' missing required field 'image'", slides_path
            )
        slides[slide_id] = SlideDefinition(
            image=slide_data["image"],
@@ -187,12 +246,67 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
    return slides
-def parse_videos(project_path: Path) -> dict[str, VideoSource]:
-    """Parse videos.json into video source definitions."""
-    videos_path = project_path / "videos.json"
+def parse_audio(
+    project_path: Path, config: Optional[ProjectConfig] = None
+) -> tuple[dict[str, AudioDefinition], Path]:
    """
    Parse audio.json into audio definitions.
    Returns:
        Tuple of (audio dict, audio_dir) where audio_dir is the directory
        containing audio.json (for resolving relative file paths).
    """
    if config and config.audio_path:
        audio_path = project_path / config.audio_path
    else:
        audio_path = project_path / "audio.json"
    # Audio is optional - return empty dict if not found
    if not audio_path.exists():
        return {}, project_path
    audio_dir = audio_path.parent
    try:
        data = json.loads(audio_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as e:
        raise ParseError(f"Invalid JSON: {e}", audio_path)
    audio = {}
    for audio_id, audio_data in data.items():
        if "file" not in audio_data:
            raise ParseError(
                f"Audio '{audio_id}' missing required field 'file'", audio_path
            )
        audio[audio_id] = AudioDefinition(
            file=audio_data["file"],
            volume=float(audio_data.get("volume", 1.0)),
            loop=bool(audio_data.get("loop", False)),
            ignore_pauses=bool(audio_data.get("ignore_pauses", False)),
        )
    return audio, audio_dir
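As a rough illustration of the audio.json shape that `parse_audio` appears to expect (a mapping of audio IDs to entries with a required `file` key), here is a self-contained sketch. The IDs and filenames are invented, and `AudioDefinition` is a local copy of the model for the example only.

```python
import json
from dataclasses import dataclass


@dataclass
class AudioDefinition:
    """Local copy of the AudioDefinition model, for illustration only."""

    file: str
    volume: float = 1.0
    loop: bool = False
    ignore_pauses: bool = False


# Invented audio.json content: a one-shot effect and a looping music bed.
raw = '{"woosh": {"file": "woosh.wav", "volume": 0.5}, "bed": {"file": "bed.mp3", "loop": true}}'
data = json.loads(raw)

# Same defaulting rules as parse_audio: optional keys fall back to the model defaults.
audio = {
    audio_id: AudioDefinition(
        file=entry["file"],
        volume=float(entry.get("volume", 1.0)),
        loop=bool(entry.get("loop", False)),
        ignore_pauses=bool(entry.get("ignore_pauses", False)),
    )
    for audio_id, entry in data.items()
}
print(audio["woosh"], audio["bed"])
```

One caveat of the `bool(...)` coercion: a JSON string such as `"false"` is truthy in Python, so `loop` and `ignore_pauses` need to be real JSON booleans.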
 def parse_videos(
    project_path: Path, config: Optional[ProjectConfig] = None
 ) -> tuple[dict[str, VideoSource], Path]:
    """
    Parse videos.json into video source definitions.
    Returns:
        Tuple of (videos dict, videos_dir) where videos_dir is the directory
        containing videos.json (for resolving relative file paths).
    """
    if config and config.videos_path:
        videos_path = project_path / config.videos_path
    else:
        videos_path = project_path / "videos.json"
    if not videos_path.exists():
-        raise ParseError("videos.json not found", videos_path)
+        raise ParseError(f"videos.json not found: {videos_path}", videos_path)
    videos_dir = videos_path.parent
    try:
        data = json.loads(videos_path.read_text(encoding="utf-8"))
@@ -201,18 +315,37 @@ def parse_videos(project_path: Path) -> dict[str, VideoSource]:
    videos = {}
    for video_id, video_data in data.items():
-        if "file" not in video_data:
+        if "source_file" not in video_data:
            raise ParseError(
-                f"Video '{video_id}' missing required field 'file'",
-                videos_path
+                f"Video '{video_id}' missing required field 'source_file'", videos_path
            )
        # Parse attribution if present
        attribution = None
        if "attribution" in video_data:
            attr_data = video_data["attribution"]
            attribution = Attribution(
                source=attr_data.get("source", "unknown"),
                creator=attr_data.get("creator", "Unknown"),
                url=attr_data.get("url"),
            )
        videos[video_id] = VideoSource(
-            file=video_data["file"],
+            source_file=video_data["source_file"],
-            preprocess=video_data.get("preprocess", []),
+            filter=video_data.get("filter", []),
            output_file=video_data.get("output_file"),
            take=video_data.get("take"),
            skip=video_data.get("skip", 0.0),
            zoom=video_data.get("zoom", 1.0),
            cutout=video_data.get("cutout"),
            always_visible=video_data.get("always_visible", False),
            is_shared=video_data.get("is_shared", False),
            pause_narration=float(video_data.get("pause_narration", 0)),
            attribution=attribution,
            use_audio_channels=video_data.get("use_audio_channels", "both"),
        )
-    return videos
+    return videos, videos_dir
 def get_video_duration(video_path: Path) -> float:
@@ -221,10 +354,13 @@ def get_video_duration(video_path: Path) -> float:
    cmd = [
        "ffprobe",
-        "-v", "error",
-        "-show_entries", "format=duration",
-        "-of", "default=noprint_wrappers=1:nokey=1",
-        str(video_path)
+        "-v",
+        "error",
+        "-show_entries",
+        "format=duration",
+        "-of",
+        "default=noprint_wrappers=1:nokey=1",
+        str(video_path),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
@@ -261,7 +397,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
        raise ParseError(f"Invalid JSON: {e}", metadata_path)
    if "source_file" not in data:
-        raise ParseError("Video metadata missing required field 'source_file'", metadata_path)
+        raise ParseError(
            "Video metadata missing required field 'source_file'", metadata_path
        )
    return VideoMetadata(
        source_file=data["source_file"],
@@ -270,7 +408,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
    )
-def resolve_video_file(project_path: Path, file_ref: str) -> tuple[Path, Optional[VideoMetadata]]:
+def resolve_video_file(
    project_path: Path, file_ref: str
 ) -> tuple[Path, Optional[VideoMetadata]]:
    """
    Resolve a video file reference, which can be either:
    1. A direct path to a video file
@@ -11,6 +11,7 @@ from .errors import GnommoError
@dataclass
 class TranscribedWord:
    """A word with its timestamp from transcription."""
    word: str
    start: float
    end: float
@@ -18,6 +19,7 @@ class TranscribedWord:
 class TranscriptionError(GnommoError):
    """Error during transcription."""
    pass
@@ -57,21 +59,20 @@ def transcribe_video(video_path: Path, model: str = "base") -> list[TranscribedW
    for segment in result.get("segments", []):
        for word_info in segment.get("words", []):
-            words.append(TranscribedWord(
+            words.append(
-                word=word_info["word"].strip(),
+                TranscribedWord(
-                start=word_info["start"],
+                    word=word_info["word"].strip(),
-                end=word_info["end"],
+                    start=word_info["start"],
-            ))
+                    end=word_info["end"],
                )
            )
    return words
 def save_transcript(words: list[TranscribedWord], output_path: Path) -> None:
    """Save transcribed words to a JSON file."""
-    data = [
-        {"word": w.word, "start": w.start, "end": w.end}
-        for w in words
-    ]
+    data = [{"word": w.word, "start": w.start, "end": w.end} for w in words]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
@@ -86,6 +87,5 @@ def load_transcript(transcript_path: Path) -> list[TranscribedWord]:
        data = json.load(f)
    return [
-        TranscribedWord(word=w["word"], start=w["start"], end=w["end"])
-        for w in data
+        TranscribedWord(word=w["word"], start=w["start"], end=w["end"]) for w in data
    ]
@@ -3,7 +3,13 @@
 from pathlib import Path
 from .errors import ValidationError, ValidationIssue
-from .models import ProjectConfig, SlideDefinition, VideoSource, SLIDE_LAYOUTS
+from .models import (
    ProjectConfig,
    SlideDefinition,
    VideoSource,
    SLIDE_LAYOUTS,
    CAMERA_PRESETS,
 )
 def validate_project(
@@ -12,6 +18,7 @@ def validate_project(
    config: ProjectConfig,
    slides: dict[str, SlideDefinition],
    videos: dict[str, VideoSource],
    videos_dir: Path,
    malformed_markers: list[tuple[int, str]] = None,
 ) -> None:
    """
@@ -30,19 +37,59 @@ def validate_project(
    # Check for malformed markers first (these are likely typos)
    if malformed_markers:
        for line_num, marker_text in malformed_markers:
-            issues.append(ValidationIssue(
-                f"Malformed marker: {marker_text}",
-                project_path / "manuscript.txt",
-                line_num
-            ))
+            issues.append(
+                ValidationIssue(
+                    f"Malformed marker: {marker_text}",
+                    project_path / "manuscript.txt",
+                    line_num,
+                )
+            )
-    # Check all manuscript markers have corresponding slides
+    # Check all manuscript markers have corresponding slides or videos
    for marker in manuscript_markers:
        # Skip camera effect markers (Zoom0, TiltLeft, Reset, etc.)
        if marker in CAMERA_PRESETS:
            continue
        # Skip audio markers (start with 'A' followed by audio id, e.g., Awoosh)
        if marker.startswith("A") and len(marker) > 1 and marker[1:].isalnum():
            continue
        # Validate video trigger markers (video:xxx) - slide-like videos
        if marker.startswith("video:"):
            video_id = marker[6:]  # Remove 'video:' prefix
            if video_id not in videos:
                # Check if it's a file extension mismatch
                hint = ""
                if "." in video_id:
                    base_name = video_id.rsplit(".", 1)[0]
                    if base_name in videos:
                        hint = f" (Did you mean [video:{base_name}]? Don't include file extensions in markers)"
                issues.append(
                    ValidationIssue(
                        f"Video marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json{hint}",
                        project_path / "manuscript.txt",
                    )
                )
            continue
        # Validate narration trigger markers (narration:xxx) - continuous videos
        if marker.startswith("narration:"):
            video_id = marker[10:]  # Remove 'narration:' prefix
            if video_id not in videos:
                issues.append(
                    ValidationIssue(
                        f"Narration marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json",
                        project_path / "manuscript.txt",
                    )
                )
            continue
        if marker not in slides:
-            issues.append(ValidationIssue(
-                f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
-                project_path / "manuscript.txt"
-            ))
+            issues.append(
+                ValidationIssue(
+                    f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
+                    project_path / "manuscript.txt",
+                )
+            )
    # Check all slide images exist
    # Slides are in the same directory as the slides.json file
@@ -52,37 +99,68 @@ def validate_project(
    for slide_id, slide_def in slides.items():
        image_path = slides_dir / slide_def.image
        if not image_path.exists():
-            issues.append(ValidationIssue(
-                f"Slide image not found: {slide_def.image}",
-                slides_json_path
-            ))
+            issues.append(
+                ValidationIssue(
+                    f"Slide image not found: {slide_def.image}", slides_json_path
+                )
            )
        # Check slide type is valid
        if slide_def.type not in SLIDE_LAYOUTS:
-            issues.append(ValidationIssue(
-                f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
-                f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
-                project_path / "slides.json"
-            ))
+            issues.append(
+                ValidationIssue(
+                    f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
+                    f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
+                    project_path / "slides.json",
                )
            )
    # Check all video files exist (paths relative to videos_dir or shared_assets)
    videos_json_path = project_path / config.videos_path
    # Find shared_assets directory
    shared_assets_dir = None
    if (project_path / "shared_assets").exists():
        shared_assets_dir = project_path / "shared_assets"
    elif (project_path.parent / "shared_assets").exists():
        shared_assets_dir = project_path.parent / "shared_assets"
    # Check all video files exist
    for video_id, video_source in videos.items():
-        video_path = project_path / video_source.file
-        if not video_path.exists():
-            issues.append(ValidationIssue(
-                f"Video file not found: {video_source.file}",
-                project_path / "videos.json"
-            ))
+        # Determine base directory based on is_shared flag
+        if video_source.is_shared:
+            if shared_assets_dir:
+                base_dir = shared_assets_dir
+            else:
+                issues.append(
                    ValidationIssue(
                        f"Video '{video_id}' has is_shared=true but shared_assets directory not found",
                        videos_json_path,
                    )
                )
                continue
        else:
            base_dir = videos_dir
-        # Check preprocessed output exists if preprocessing is defined
-        if video_source.preprocess and video_source.output_file:
-            output_path = project_path / video_source.output_file
+        video_path = base_dir / video_source.source_file
+        if not video_path.exists():
+            issues.append(
                ValidationIssue(
                    f"Video file not found: {video_source.source_file}",
                    videos_json_path,
                )
            )
        # Check preprocessed output exists if filters are defined
        if video_source.filter and video_source.output_file:
            output_path = base_dir / video_source.output_file
            if not output_path.exists():
-                issues.append(ValidationIssue(
-                    f"Preprocessed output not found: {video_source.output_file}. "
-                    f"Run with -a preprocess first.",
-                    project_path / "videos.json"
-                ))
+                issues.append(
+                    ValidationIssue(
+                        f"Preprocessed output not found: {video_source.output_file}. "
+                        f"Run with -a preprocess first.",
+                        videos_json_path,
                    )
                )
    # Check background exists (image or video)
    # Try 'background' first, fall back to deprecated 'background_video'
@@ -94,38 +172,45 @@ def validate_project(
            # Try parent directory (shared_assets at repo root)
            bg_path = project_path.parent / bg_file
        if not bg_path.exists():
-            issues.append(ValidationIssue(
-                f"Background not found: {bg_file}",
-                project_path / "project.json"
-            ))
+            issues.append(
+                ValidationIssue(
+                    f"Background not found: {bg_file}", project_path / "project.json"
+                )
            )
    # Check we have at least one video source
    if not videos:
-        issues.append(ValidationIssue(
-            "No video sources defined in videos.json",
-            project_path / "videos.json"
-        ))
+        issues.append(
+            ValidationIssue(
+                "No video sources defined in videos.json", project_path / "videos.json"
+            )
        )
    # Check resolution is reasonable
    width, height = config.resolution
    if width < 100 or height < 100:
-        issues.append(ValidationIssue(
-            f"Resolution too small: {width}x{height}",
-            project_path / "project.json"
-        ))
+        issues.append(
+            ValidationIssue(
+                f"Resolution too small: {width}x{height}", project_path / "project.json"
+            )
        )
    if width > 7680 or height > 4320:
-        issues.append(ValidationIssue(
-            f"Resolution too large: {width}x{height} (max 8K)",
-            project_path / "project.json"
-        ))
+        issues.append(
+            ValidationIssue(
+                f"Resolution too large: {width}x{height} (max 8K)",
+                project_path / "project.json",
            )
        )
    # Check FPS is reasonable
    if config.fps < 1 or config.fps > 120:
-        issues.append(ValidationIssue(
-            f"Invalid FPS: {config.fps} (must be 1-120)",
-            project_path / "project.json"
-        ))
+        issues.append(
+            ValidationIssue(
+                f"Invalid FPS: {config.fps} (must be 1-120)",
+                project_path / "project.json",
            )
        )
    # If any issues, raise ValidationError
    if issues:
@@ -0,0 +1,6 @@
 import gnommo
 if __name__ == "__main__":
    print("This is the main module.")
    gnommo.main()
@@ -0,0 +1,2 @@
 openai-whisper
@@ -0,0 +1,476 @@
 # Gnommo Feature Development Roadmap
 ## Overview
 Features to standardize the Keynote-to-YouTube workflow, so that once the presentation is complete, only a standardized recording session stands between you and a finished video.
 ---
 ## 1. Video Description Generator
 **Command:** `gnommo -p <project> description`
 Generate a complete YouTube description with citations, attributions, and chapters.
 ---
 ### 1.1 Manuscript Citations (`[cite:...]`)
 Citations embedded in the manuscript represent sources, references, or links mentioned during narration. The text after `cite:` is the **literal reference** that should appear in the description.
 **Format in manuscript.txt:**
 ```
 [cite:Reference text exactly as it should appear]
 ```
 **Examples:**
 ```
 [S3]
 According to this study [cite:Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper],
 the effect is significant.
 [S7]
 I'm using [cite:Keynote by Apple - https://apple.com/keynote] for all my presentations.
 [S12]
 This technique was pioneered by [cite:Dr. Jane Doe, MIT Media Lab].
 ```
 **Output in description:**
 ```
 SOURCES & REFERENCES
 ━━━━━━━━━━━━━━━━━━━━
 1:23 - Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper
 4:56 - Keynote by Apple - https://apple.com/keynote
 8:30 - Dr. Jane Doe, MIT Media Lab
 ```
 **Requirements:**
 - Parse `[cite:...]` markers from manuscript.txt
 - Extract the literal text after `cite:` as the reference
 - Align citations to timestamps (same fuzzy matching as other markers)
 - List citations in order of appearance
 - Citations are ignored by the renderer but ARE timestamped for the description
 **Note:** `[cite:...]` markers should not affect video rendering or narration alignment - they are metadata-only markers for description generation.
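The parsing and formatting requirements above could be sketched roughly as follows. This is only an illustration: `extract_citations` and `format_sources` are hypothetical helper names, not existing Gnommo functions, the pattern assumes a reference never contains a closing bracket, and the timestamps are assumed to come from the existing marker alignment step.

```python
import re

# Captures the literal reference text after "cite:".
# Assumption: the reference itself never contains "]".
CITE_PATTERN = re.compile(r"\[cite:([^\]]+)\]")


def extract_citations(manuscript: str) -> list[str]:
    """Return citation references in order of appearance."""
    return [m.group(1).strip() for m in CITE_PATTERN.finditer(manuscript)]


def format_sources(timed_refs: list[tuple[float, str]]) -> list[str]:
    """Format (seconds, reference) pairs as 'M:SS - reference' lines."""
    lines = []
    for seconds, ref in sorted(timed_refs, key=lambda item: item[0]):
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"{minutes}:{secs:02d} - {ref}")
    return lines
```

Keeping extraction separate from formatting means the renderer can skip citations entirely while the description generator attaches timestamps later.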
 ---
 ### 1.2 Pexels/Stock Footage Attribution
 Attribution for Pexels content is **not legally required** but is appreciated and professional.
 **Official Pexels attribution format:**
 ```
 by [Contributor Name] via Pexels
 ```
 **Implementation:**
 - Extend `videos.json` to include attribution metadata:
  ```json
  {
    "beach_waves": {
      "source_file": "pexels/beach.mp4",
      "is_shared": true,
      "attribution": {
        "source": "pexels",
        "creator": "John Doe",
        "url": "https://pexels.com/video/12345"
      }
    }
  }
  ```
 - Auto-detect Pexels videos from `shared_assets/pexels/` folder
 - Support Pexels metadata JSON files (if downloaded with video)
 - Generate attribution section for video description:
  ```
  STOCK FOOTAGE
  ━━━━━━━━━━━━━
  Beach waves by John Doe via Pexels: https://pexels.com/video/12345
  City timelapse by Jane Smith via Pexels: https://pexels.com/video/67890
  ```
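As a rough sketch of the attribution step, assuming `videos.json` has already been loaded into a plain dict with the `attribution` shape shown above. The function name `format_attributions` is hypothetical, and deriving the display title from the video id is an assumption, since the spec does not say where titles such as "Beach waves" come from.

```python
def format_attributions(videos: dict) -> list[str]:
    """Build 'Title by Creator via Pexels: URL' lines from videos.json entries."""
    lines = []
    for video_id, entry in videos.items():
        attribution = entry.get("attribution")
        # Entries without Pexels attribution metadata are skipped.
        if not attribution or attribution.get("source") != "pexels":
            continue
        # Assumption: the display title is derived from the video id.
        title = video_id.replace("_", " ").capitalize()
        lines.append(
            f"{title} by {attribution['creator']} via Pexels: {attribution['url']}"
        )
    return lines
```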
 **Pexels License Notes** (from pexels.com/license):
 - Free for personal and commercial use
 - Attribution not required but appreciated
 - Cannot sell unaltered copies
 - Cannot redistribute on other stock platforms
 ### 1.3 Complete Description Output
 **Output file:** `out/description_youtube.txt`
 Combine all elements into a ready-to-paste YouTube description.
 **Structure:**
 ```
 [Video description from project.json "description" field]
 CHAPTERS
 ━━━━━━━━
 0:00 Introduction
 1:23 Topic One
 3:45 Topic Two
 ...
 REFERENCES
 ━━━━━━━━━━
 1:23 - Smith et al. (2024) "AI Study" - https://example.com
 4:56 - Keynote by Apple - https://apple.com/keynote
 ...
 STOCK FOOTAGE
 ━━━━━━━━━━━━━
 Beach waves by John Doe via Pexels: https://pexels.com/video/12345
 ...
 [Optional footer from project.json "footer" field - social links, subscribe CTA, etc.]
 ```
 **project.json additions:**
 ```json
 {
  "description": "In this video, I walk through the complete Gnommo workflow for creating YouTube videos from Keynote presentations.",
  "footer": "Subscribe for more tutorials: https://youtube.com/@channel\nTwitter: https://twitter.com/handle"
 }
 ```
 **Requirements:**
 - Pull video description from `project.json` "description" field
 - Generate chapters from slide markers (see Section 2)
 - Collect all `[cite:...]` references with timestamps
 - Collect all Pexels/stock attributions from `videos.json`
 - Append optional footer from `project.json` "footer" field
 - Output to `out/description_youtube.txt`
 - Sections with no content are omitted (e.g., no STOCK FOOTAGE section if none used)
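One possible way to assemble the file while omitting empty sections is sketched below. `build_description` is a hypothetical helper; it assumes each section has already been rendered to a list of lines and that headings are underlined with one `━` per heading character, as in the structure above.

```python
def build_description(
    description: str,
    sections: list[tuple[str, list[str]]],
    footer: str = "",
) -> str:
    """Join the intro text, non-empty sections, and footer with blank lines."""
    parts = []
    if description.strip():
        parts.append(description.strip())
    for heading, lines in sections:
        if not lines:
            continue  # sections with no content are omitted
        parts.append("\n".join([heading, "━" * len(heading), *lines]))
    if footer.strip():
        parts.append(footer.strip())
    return "\n\n".join(parts) + "\n"
```

The caller would write the returned string to `out/description_youtube.txt`.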
 ---
 ## 2. YouTube Chapter Markers
 **Command:** `gnommo -p <project> chapters`
 Auto-generate chapter timestamps from slide markers.
 **Requirements:**
 - Extract chapter titles from:
  - Keynote slide titles (via presenter notes import)
  - First sentence after each `[SN]` marker
  - Optional `[chapter:Title]` markers for explicit chapter names
 - Calculate timestamps from aligned marker timings
 - Output copy-paste ready format:
  ```
  CHAPTERS
  ━━━━━━━━
  0:00 Introduction
  1:23 What is Gnommo?
  3:45 Setting Up Your Project
  7:12 Recording Tips
  10:30 Rendering Your Video
  12:45 Outro
  ```
 - Option to merge small chapters (minimum duration threshold)
 - Support for nested chapters (main topics + subtopics)
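The timestamp formatting and the minimum-duration option might look roughly like this. Both function names are hypothetical, and the merge rule shown here (drop a chapter that starts too soon after the previous kept one) is just one of several reasonable interpretations of "merge small chapters".

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once the video passes one hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def merge_short_chapters(
    chapters: list[tuple[float, str]], min_duration: float
) -> list[tuple[float, str]]:
    """Keep a chapter only if the previous kept chapter ran for min_duration."""
    kept: list[tuple[float, str]] = []
    for start, title in sorted(chapters, key=lambda item: item[0]):
        if kept and start - kept[-1][0] < min_duration:
            continue  # absorbed into the previous chapter
        kept.append((start, title))
    return kept


def chapter_lines(chapters: list[tuple[float, str]]) -> list[str]:
    """Render chapters as copy-paste ready 'M:SS Title' lines."""
    return [f"{format_timestamp(start)} {title}" for start, title in chapters]
```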
 ---
 ## 3. Subtitle/Caption Export
 **Command:** `gnommo -p <project> subtitles`
 Generate subtitle files from Whisper transcription.
 **Requirements:**
 - Export formats: SRT, VTT, TXT
 - Use existing word-level timestamps from transcription
 - Smart line breaking (max characters per line, break at punctuation)
 - Speaker diarization support (future: multiple speakers)
 - Options:
  - `--format srt|vtt|txt`
  - `--max-chars 42` (characters per line)
  - `--max-duration 5` (seconds per subtitle block)
 **Example output (SRT):**
 ```
 1
 00:00:01,500 --> 00:00:04,200
 Hello and welcome to this tutorial
 on video editing with Gnommo.
 2
 00:00:04,500 --> 00:00:07,800
 Today we're going to cover
 the complete workflow.
 ```
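A minimal sketch of how word-level timestamps could be grouped into blocks and written as SRT. It assumes words arrive as `(start, end, word)` tuples in seconds, which may not match the actual transcription data structure, and it treats `max_chars` as a per-block limit rather than implementing per-line breaking at punctuation.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_words(words, max_chars=42, max_duration=5.0):
    """Start a new block when adding a word would exceed either limit."""
    blocks, current = [], []
    for start, end, word in words:
        if current:
            length = len(" ".join(w for _, _, w in current)) + 1 + len(word)
            if length > max_chars or end - current[0][0] > max_duration:
                blocks.append(current)
                current = []
        current.append((start, end, word))
    if current:
        blocks.append(current)
    return [(b[0][0], b[-1][1], " ".join(w for _, _, w in b)) for b in blocks]


def to_srt(blocks) -> str:
    """Render (start, end, text) blocks as numbered SRT entries."""
    entries = [
        f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(blocks, 1)
    ]
    return "\n".join(entries)
```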
 ---
 ## 4. Thumbnail Generation
 **Command:** `gnommo -p <project> thumbnail`
 Auto-generate thumbnail candidates from slides.
 **Requirements:**
 - Designate thumbnail slides with `[thumbnail]` marker
 - If no marker, use slide 1 or title slide
 - Apply text overlays from config:
  ```json
  {
    "thumbnail": {
      "title_text": "Episode ${episode_number}",
      "subtitle_text": "${title}",
      "font": "Impact",
      "text_color": "#FFFFFF",
      "outline_color": "#000000",
      "position": "bottom-left"
    }
  }
  ```
 - Generate multiple variants:
  - With/without text overlay
  - Different zoom levels
  - Different color treatments (saturated, high contrast)
 - Output to `out/thumbnails/` folder
 - Resolution: 1280x720 (YouTube standard)
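The `${episode_number}` and `${title}` placeholders in the config above happen to match the syntax of Python's `string.Template`, so the text substitution could be sketched as below. `render_overlay_text` is a hypothetical helper, and where the variable values come from (for example the `variables` block of the intro config) is an assumption.

```python
from string import Template


def render_overlay_text(thumbnail_config: dict, variables: dict) -> dict:
    """Substitute ${name} placeholders in every string value of the config."""
    rendered = {}
    for key, value in thumbnail_config.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched
            # instead of raising KeyError.
            rendered[key] = Template(value).safe_substitute(variables)
        else:
            rendered[key] = value
    return rendered
```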
 ---
 ## 5. Intro/Outro Templates
 **Configuration in project.json:**
 ```json
 {
  "intro": {
    "template": "templates/intro_v2.mp4",
    "duration": 3.5,
    "transition": "fade",
    "variables": {
      "episode_number": "12",
      "title": "Getting Started with Gnommo"
    }
  },
  "outro": {
    "template": "templates/outro_subscribe.mp4",
    "duration": 8.0,
    "transition": "fade"
  }
 }
 ```
 **Requirements:**
 - Define intro/outro templates in `shared_assets/templates/`
 - Auto-prepend intro before first slide
 - Auto-append outro after last slide
 - Support variable substitution in templates (episode number, title)
 - Configurable transition types (fade, cut, wipe)
 - End screen safe zone support (last 20 seconds)
 ---
 ## 6. Multi-Platform Format Presets
 **Command:** `gnommo -p <project> render --format <preset>`
 **Presets:**
 | Preset | Aspect | Resolution | Notes |
 |--------|--------|------------|-------|
 | `youtube` | 16:9 | 1920x1080 | Default, standard horizontal |
 | `youtube-4k` | 16:9 | 3840x2160 | 4K export |
 | `shorts` | 9:16 | 1080x1920 | Vertical, auto-reframe slides |
 | `podcast` | - | Audio only | MP3/M4A export for podcast feeds |
 | `square` | 1:1 | 1080x1080 | Instagram/LinkedIn |
 **Requirements:**
 - Auto-adjust cutout positions per format
 - Smart slide reframing for vertical (zoom to content area)
 - Separate output folders per format
 - Batch export to multiple formats: `--format youtube,shorts,podcast`
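The preset table and the comma-separated `--format` value could be modelled along these lines. `FormatPreset`, `PRESETS`, and `parse_formats` are hypothetical names; only the preset names and resolutions are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatPreset:
    name: str
    resolution: Optional[tuple[int, int]]  # None means audio only


PRESETS = {
    "youtube": FormatPreset("youtube", (1920, 1080)),
    "youtube-4k": FormatPreset("youtube-4k", (3840, 2160)),
    "shorts": FormatPreset("shorts", (1080, 1920)),
    "podcast": FormatPreset("podcast", None),
    "square": FormatPreset("square", (1080, 1080)),
}


def parse_formats(value: str) -> list[FormatPreset]:
    """Parse a value such as 'youtube,shorts,podcast' into presets."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in PRESETS]
    if unknown:
        raise ValueError(f"Unknown format preset(s): {', '.join(unknown)}")
    return [PRESETS[name] for name in names]
```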
 ---
 ## 7. Teleprompter Script Generation
 **Command:** `gnommo -p <project> teleprompter`
 Extract clean narration text for teleprompter display.
 **Requirements:**
 - Strip all markers from manuscript
 - Keep only spoken text
 - Output formats:
  - `--format txt` - Plain text
  - `--format html` - Scrollable HTML page with large font
  - `--format json` - For teleprompter apps
 - Optional: Include slide thumbnails as visual cues
 - Configurable font size and scroll speed hints
 **Example HTML output:**
 ```html
 <div class="teleprompter">
  <p class="cue">[SLIDE: Introduction]</p>
  <p>Hello and welcome to this tutorial on video editing with Gnommo.</p>
  <p class="cue">[SLIDE: What is Gnommo?]</p>
  <p>Gnommo is a code-first video editing pipeline...</p>
 </div>
 ```
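Stripping markers for the plain-text output might be as simple as the sketch below. It assumes every bracketed token in the manuscript is a marker and never spoken text, which may not hold for all manuscripts; `strip_markers` is a hypothetical helper name.

```python
import re

# Any bracketed token, plus the spaces or tabs directly before it.
# Assumption: brackets are only ever used for markers.
MARKER_PATTERN = re.compile(r"[ \t]*\[[^\]]*\]")


def strip_markers(manuscript: str) -> str:
    """Return only the spoken text, one non-empty line per source line."""
    lines = []
    for line in manuscript.splitlines():
        cleaned = MARKER_PATTERN.sub("", line).strip()
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines)
```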
 ---
 ## 8. Recording Checklist Generator
 **Command:** `gnommo -p <project> checklist`
 Generate a pre-recording checklist based on project configuration.
 **Output includes:**
 - [ ] Camera settings (resolution, fps from project.json)
 - [ ] Lighting setup (if green screen detected in videos.json)
 - [ ] Audio check (microphone levels)
 - [ ] Props/demos needed (parsed from `[video:...]` markers)
 - [ ] Slide count and estimated duration
 - [ ] Teleprompter ready
 - [ ] Recording space clear
 **Customizable via `checklist_template.md` in project folder.**
 ---
 ## 9. Audio Normalization
 **Command:** `gnommo -p <project> normalize` (also runs automatically during render)
 **Requirements:**
 - Target: -14 LUFS (YouTube standard)
 - Apply loudness normalization to narration track
 - Preserve dynamic range (avoid over-compression)
 - Normalize intro/outro audio to match
 - Option: `--target-lufs -14`
 **Implementation:**
 - Use FFmpeg `loudnorm` filter
 - Two-pass normalization for accurate results
 - Report before/after levels
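A sketch of the two-pass flow with FFmpeg's `loudnorm` filter: the first pass prints measurements as JSON on stderr, and the second pass feeds them back through the `measured_*` options. The helper names are hypothetical, and the true-peak and loudness-range defaults (-1.5 dBTP, 11 LU) are assumptions rather than values from this roadmap. The functions only build argument lists; running them is left to the caller.

```python
import json


def loudnorm_measure_cmd(input_path, target_lufs=-14.0, true_peak=-1.5, lra=11.0):
    """First pass: analyse only, discard the output, print stats as JSON."""
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", input_path, "-af", audio_filter, "-f", "null", "-"]


def parse_loudnorm_json(stderr_text: str) -> dict:
    """Extract the flat JSON object that loudnorm prints at the end of stderr."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def loudnorm_apply_cmd(input_path, output_path, measured,
                       target_lufs=-14.0, true_peak=-1.5, lra=11.0):
    """Second pass: normalize using the first-pass measurements."""
    audio_filter = (
        f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}"
        f":measured_I={measured['input_i']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true:print_format=summary"
    )
    return ["ffmpeg", "-y", "-i", input_path, "-af", audio_filter, output_path]
```

The `input_i` value from the first pass and the summary from the second pass give the before/after levels to report.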
 ---
 ## 10. Project Templates
 **Command:** `gnommo init <project-name> --template <template>`
 **Built-in templates:**
 | Template | Description |
 |----------|-------------|
 | `tutorial` | Talking head + slides, square slide layout |
 | `explainer` | Full-screen slides, minimal presenter |
 | `review` | Product review format, multiple camera angles |
 | `talking-head` | Full-screen presenter, no slides |
 | `screencast` | Screen recording with small presenter PIP |
 **Requirements:**
 - Templates stored in `~/.gnommo/templates/` or `shared_assets/templates/`
 - Each template includes:
  - `project.json` with preset cutouts and settings
  - `manuscript.txt` skeleton with example markers
  - Sample `videos.json` structure
 - User can create custom templates: `gnommo template save <name>`
 ---
 ## 11. Batch Processing
 **Command:** `gnommo batch render project1 project2 project3`
 **Requirements:**
 - Process multiple projects in sequence
 - Continue on failure (don't stop batch for one failed project)
 - Summary report at end:
  ```
  BATCH COMPLETE
  ━━━━━━━━━━━━━━
  ✓ project1 - rendered in 5:23
  ✓ project2 - rendered in 4:17
  ✗ project3 - failed (missing slide S12)
  ```
 - Options:
  - `--parallel 2` - Run N renders in parallel
  - `--skip-existing` - Skip if `out/final.mp4` exists
  - `--format youtube,shorts` - Render all formats for each project
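The continue-on-failure behaviour and the summary report could be sketched as follows. `run_batch` and `format_summary` are hypothetical names, `render` stands in for whatever function renders a single project, and the sketch omits render times and the `--parallel` option.

```python
from typing import Callable


def run_batch(projects: list[str], render: Callable[[str], None]):
    """Render each project in sequence; a failure never stops the batch."""
    results = []
    for name in projects:
        try:
            render(name)
            results.append((name, True, ""))
        except Exception as exc:  # keep going, record the reason
            results.append((name, False, str(exc)))
    return results


def format_summary(results) -> str:
    """Build the end-of-batch report."""
    lines = ["BATCH COMPLETE", "━" * len("BATCH COMPLETE")]
    for name, ok, reason in results:
        if ok:
            lines.append(f"✓ {name} - rendered")
        else:
            lines.append(f"✗ {name} - failed ({reason})")
    return "\n".join(lines)
```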
 ---
 ## 12. Progress Dashboard
 **Command:** `gnommo status` or `gnommo -p <project> status`
 Display pipeline status for all projects or specific project.
 **Output:**
 ```
 PROJECT STATUS
 ━━━━━━━━━━━━━━
 Project     Import  Preprocess  Transcribe  Render   Output
 ─────────────────────────────────────────────────────────────
 video1      ✓       ✓           ✓           ✓        final.mp4 (12:34)
 video2      ✓       ✓           ✓           ✗        -
 video3      ✓       ✗           -           -        -
 video4      ✗       -           -           -        -
 ```
 **Requirements:**
 - Scan all project directories
 - Check for existence of intermediate files
 - Show file timestamps and durations
 - Highlight what needs to be done next
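A minimal sketch of the existence checks behind the dashboard. The mapping from stage to marker file is an assumption made for illustration: `transcript.csv` and `out/final.mp4` appear elsewhere in the project documentation, but the real pipeline may track stages differently, and the import and preprocess stages are left out because their outputs are not specified here.

```python
from pathlib import Path
from typing import Optional

# Hypothetical stage -> marker file mapping; real file names may differ.
DEFAULT_STAGES = {
    "Transcribe": "transcript.csv",
    "Render": "out/final.mp4",
}


def project_status(project_dir: Path, stages: Optional[dict] = None) -> dict:
    """Return {stage: True/False} based on whether each marker file exists."""
    stages = DEFAULT_STAGES if stages is None else stages
    return {stage: (project_dir / rel).exists() for stage, rel in stages.items()}


def next_step(status: dict) -> Optional[str]:
    """Return the first stage that has not completed, or None if all are done."""
    for stage, done in status.items():
        if not done:
            return stage
    return None
```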
 ---
 ## 13. Recording Session Mode (Future)
 **Command:** `gnommo -p <project> session`
 Live recording assistant mode.
 **Features:**
 - Display current slide on secondary monitor
 - Show teleprompter text overlay
 - Keyboard shortcuts to advance slides
 - Real-time recording with proper settings
 - Auto-stop at end of manuscript
 - Voice command support: "next slide", "pause"
 **Note:** This is a stretch goal requiring significant UI work.
 ---
 ## Implementation Priority
 ### Phase 1 - Core YouTube Workflow (High Impact)
 1. **Video Description Generator** (citations + Pexels attribution)
 2. **YouTube Chapter Markers**
 3. **Subtitle/Caption Export**
 4. **Audio Normalization**
 ### Phase 2 - Content Creation Efficiency
 5. **Thumbnail Generation**
 6. **Intro/Outro Templates**
 7. **Teleprompter Script Generation**
 8. **Recording Checklist Generator**
 ### Phase 3 - Scale & Automation
 9. **Project Templates**
 10. **Multi-Platform Format Presets**
 11. **Batch Processing**
 12. **Progress Dashboard**
 ### Phase 4 - Advanced
 13. **Recording Session Mode**
 ---
 ## Notes
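 - Project-scoped commands should follow the existing CLI pattern: `gnommo -p <project> <command>`; commands that span projects (such as `init` and `batch`) take project names as arguments instead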
 - Output files go to `out/` subdirectory by default
 - All features should support `--dry-run` where applicable
 - Verbose mode (`-v`) should show detailed progress
@@ -0,0 +1,2 @@
 file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/talking_head_batch0.mov'
 file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/segments/segment_0002.mov'