gnommo/docs/virtual-camera-effects.md

# Virtual Camera Effects

Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
These effects are triggered by markers in the manuscript, just like slides.
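
As a rough illustration of how such markers might be pulled out of manuscript text, the sketch below uses a regular expression. The name `extract_markers` and the returned tuple shape are assumptions made for this example, not the project's actual parser.

```python
import re
from typing import List, Optional, Tuple

# Matches [Name] or [Name:arg], e.g. [Zoom2] or [PipMove:corner].
MARKER_RE = re.compile(r"\[([A-Za-z][A-Za-z0-9]*)(?::([^\]]+))?\]")

def extract_markers(text: str) -> List[Tuple[str, Optional[str], int]]:
    """Return (name, argument, character offset) for each marker in text."""
    return [(m.group(1), m.group(2), m.start()) for m in MARKER_RE.finditer(text)]

markers = extract_markers("This is [Zoom1] totally [Zoom2] fine [PipMove:corner].")
for name, arg, offset in markers:
    print(name, arg, offset)
```

A real pipeline would still need to convert the character offset into a timestamp, which depends on how the manuscript is aligned with the audio.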

## Zoom Effects

| Marker | Description |
|--------|-------------|
| `[Zoom1]` | Zoom to 110% - subtle emphasis |
| `[Zoom2]` | Zoom to 125% - moderate emphasis |
| `[Zoom3]` | Zoom to 150% - strong emphasis |
| `[Zoom0]` | Return to 100% (default) |
| `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |

**Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.

## Tilt/Rotation Effects

| Marker | Description |
|--------|-------------|
| `[TiltLeft]` | Rotate -15 degrees (counterclockwise) |
| `[TiltRight]` | Rotate +15 degrees (clockwise) |
| `[NoTilt]` | Return to 0 degrees |
| `[TiltShake]` | Quick left-right shake (confusion/emphasis) |

**Use case:** Tilt when saying something "off" or wrong, return to flat for correction.

## Pan/Position Effects

| Marker | Description |
|--------|-------------|
| `[PanLeft]` | Shift frame left (subject moves right) |
| `[PanRight]` | Shift frame right (subject moves left) |
| `[PanUp]` | Shift frame up |
| `[PanDown]` | Shift frame down |
| `[PanCenter]` | Return to center |

**Use case:** Pan to make room for a slide appearing on one side.

## Shake/Movement Effects

| Marker | Description |
|--------|-------------|
| `[Shake]` | Brief screen shake (impact, surprise) |
| `[ShakeHard]` | Intense shake (explosion, error) |
| `[Wobble]` | Gentle continuous wobble |
| `[NoWobble]` | Stop wobble |

**Use case:** Shake on "WRONG!" or when something crashes/fails.

## Speed/Rhythm Effects

| Marker | Description |
|--------|-------------|
| `[Beat]` | Single visual pulse (scale bump) |
| `[BeatStart]` | Start pulsing to rhythm |
| `[BeatStop]` | Stop pulsing |

**Use case:** Rhythmic emphasis during lists or key points.

## Transition Effects

| Marker | Description |
|--------|-------------|
| `[Flash]` | Quick white flash |
| `[Blackout]` | Brief black frame |
| `[Glitch]` | Digital glitch effect |

**Use case:** Transition between topics or for "record scratch" moments.

## Picture-in-Picture Variations

| Marker | Description |
|--------|-------------|
| `[PipGrow]` | Enlarge talking head cutout |
| `[PipShrink]` | Shrink talking head cutout |
| `[PipHide]` | Temporarily hide talking head |
| `[PipShow]` | Restore talking head |
| `[PipMove:corner]` | Move pip to different corner |

**Use case:** Shrink the talking head while showing an important diagram; grow it when making a personal point.

## Combination Presets

| Marker | Description |
|--------|-------------|
| `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
| `[Surprise]` | Quick zoom + shake |
| `[Sarcasm]` | Slow zoom + tilt |
| `[Reset]` | Return all effects to default |

---

## Architecture: The Camera Abstraction

### The Core Insight

All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
The **camera** views the scene. When the camera zooms, tilts, or pans - everything
moves together, just like a real camera filming a physical set.

```
┌─────────────────────────────────────────────────────────┐
│                        SCENE                           │
│  ┌─────────────────────────────────────────────────┐   │
│  │              Background Layer                   │   │
│  │  ┌─────────────┐                                │   │
│  │  │ Talking Head│      ┌──────────────────┐      │   │
│  │  │   (cutout)  │      │      Slide       │      │   │
│  │  └─────────────┘      │    (from .png)   │      │   │
│  │                       └──────────────────┘      │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   CAMERA    │
                    │  zoom: 1.25 │
                    │  tilt: -15° │
                    │  pan: 0, 0  │
                    └─────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │  Final Output   │
                  │   (1920x1080)   │
                  └─────────────────┘
```

### Why This Matters

**Keynote slides are designed for a specific frame.** If you create a slide with
an arrow pointing at where the talking head cutout will be, that spatial
relationship must be preserved when the camera zooms or tilts.

If we zoomed only the background and not the slides, the arrow would point to
the wrong place. The camera abstraction ensures everything transforms together.

### Camera Properties

```python
from dataclasses import dataclass

@dataclass
class CameraState:
    zoom: float = 1.0        # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0    # degrees, positive = clockwise
    pan_x: float = 0.0       # -1.0 to 1.0, fraction of frame width
    pan_y: float = 0.0       # -1.0 to 1.0, fraction of frame height

@dataclass
class CameraKeyframe:
    time: float              # timestamp in seconds
    state: CameraState
    easing: str = "linear"   # linear, ease-in, ease-out, ease-in-out
```
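
To show how two keyframes could be blended, here is a minimal sketch of interpolation with easing. The quadratic curves are one common choice; the document does not say which curves the `easing` names refer to, so `EASINGS` and `interpolate` are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0

# Assumed quadratic easing curves: map progress u in [0, 1] to eased progress in [0, 1].
EASINGS = {
    "linear": lambda u: u,
    "ease-in": lambda u: u * u,
    "ease-out": lambda u: 1 - (1 - u) ** 2,
    "ease-in-out": lambda u: 2 * u * u if u < 0.5 else 1 - 2 * (1 - u) ** 2,
}

def _lerp(p: float, q: float, e: float) -> float:
    """Linear blend from p to q by eased progress e."""
    return p + (q - p) * e

def interpolate(a: CameraState, b: CameraState, u: float,
                easing: str = "linear") -> CameraState:
    """Blend state a toward state b; u is clamped to [0, 1] before easing."""
    e = EASINGS[easing](min(max(u, 0.0), 1.0))
    return CameraState(
        zoom=_lerp(a.zoom, b.zoom, e),
        rotation=_lerp(a.rotation, b.rotation, e),
        pan_x=_lerp(a.pan_x, b.pan_x, e),
        pan_y=_lerp(a.pan_y, b.pan_y, e),
    )

# Halfway between the default state and a zoomed, tilted state.
mid = interpolate(CameraState(), CameraState(zoom=1.5, rotation=-15), 0.5)
```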

### Rendering Pipeline (Updated)

```
Current Pipeline:
  Parse → Validate → Transform → Render
                                   │
                                   ▼
                          build_filter_complex()
                                   │
                          [bg] → overlays → [vout]

New Pipeline:
  Parse → Validate → Transform → Render
                         │
                    Extract camera
                    keyframes from
                    markers
                         │
                         ▼
                  build_filter_complex()
                         │
              [bg] → overlays → [scene]
                                   │
                          apply_camera_transform()
                                   │
                              [scene] → zoom/rotate/pan → [vout]
```

### FFmpeg Implementation

The camera transform is a **final filter stage** applied to the composed scene:

```
# Compose scene (existing code)
[0:v]scale=1920:1080[bg];
[bg][slide1]overlay=...[s1];
[s1][talkinghead]overlay=...[scene];

# Camera transform (new)
[scene]scale=iw*{zoom}:ih*{zoom},
       rotate={rotation}*PI/180:fillcolor=black,
       crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
```
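
The static chain above can be generated from a camera state. The sketch below is one possible way to do it; `camera_filter` is a hypothetical helper, and two choices in it are assumptions not stated in this document: the scaled size is rounded to even numbers, and a pan of ±1.0 moves the crop window to the edge of the spare margin created by the zoom.

```python
import math

def camera_filter(zoom: float, rotation_deg: float,
                  pan_x: float = 0.0, pan_y: float = 0.0,
                  out_w: int = 1920, out_h: int = 1080) -> str:
    """Build a static scale -> rotate -> crop chain for one camera state."""
    if zoom < 1.0:
        raise ValueError("zoom below 1.0 leaves nothing to crop from")
    # Round the scaled size to even numbers (assumed to suit subsampled pixel formats).
    w = 2 * round(out_w * zoom / 2)
    h = 2 * round(out_h * zoom / 2)
    # Assumption: pan of -1.0 / +1.0 puts the crop window at the left / right
    # (or top / bottom) edge of the margin; 0.0 keeps it centered.
    x = round((w - out_w) / 2 * (1 + pan_x))
    y = round((h - out_h) / 2 * (1 + pan_y))
    angle = math.radians(rotation_deg)  # the rotate filter takes radians
    return (f"[scene]scale={w}:{h},"
            f"rotate={angle:.6f}:fillcolor=black,"
            f"crop={out_w}:{out_h}:{x}:{y}[vout]")

chain = camera_filter(zoom=1.25, rotation_deg=-15)
print(chain)
```

With this mapping there is no room to pan at zoom 1.0, since the margin is zero; a different pan convention would be needed if panning without zoom is required.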

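For a smooth animated zoom, `zoompan` can evaluate its zoom expression per frame. In `zoompan` expressions the input timestamp is `it` (there is no `t` variable). This example ramps the zoom from 1.0 to 1.25 between 5 s and 8 s and returns to 1.0 outside that window:

```
[scene]zoompan=z='if(between(it,5,8), 1+0.25*(it-5)/3, 1)':
              x='iw/2-(iw/zoom/2)':
              y='ih/2-(ih/zoom/2)':
              d=1:s=1920x1080:fps=30[vout]
```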
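
Expressions like this can be generated rather than written by hand. The sketch below builds a ramp that holds its end value after the transition and optionally applies a quadratic ease. `ramp_expr` is a hypothetical helper; it relies on the FFmpeg expression functions `clip` and `pow`, and defaults to `it`, the input timestamp variable in `zoompan` (filters such as `rotate` use `t` instead, hence the `tvar` parameter).

```python
def ramp_expr(t0: float, t1: float, v0: float, v1: float,
              easing: str = "linear", tvar: str = "it") -> str:
    """Return an FFmpeg expression that moves from v0 to v1 over [t0, t1]
    and holds v0 before t0 and v1 after t1. Assumes t0 >= 0."""
    if t1 <= t0:
        raise ValueError("t1 must be greater than t0")
    # Progress through the transition, clamped to 0..1.
    u = f"clip(({tvar}-{t0:g})/{t1 - t0:g},0,1)"
    eased = {
        "linear": u,
        "ease-in": f"pow({u},2)",
        "ease-out": f"(1-pow(1-{u},2))",
    }[easing]
    # Parentheses keep a negative delta (zooming out) unambiguous.
    return f"{v0:g}+({v1 - v0:g})*{eased}"

z = ramp_expr(5, 8, 1.0, 1.25)
print(f"zoompan=z='{z}':d=1:s=1920x1080:fps=30")
```

Several transitions could be chained by nesting such expressions per time window, although long expressions become hard to debug.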

### Camera Events in Timeline

New model for camera changes:

```python
@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0      # 0 = instant snap
    easing: str = "ease-out"
```

Markers map to camera events:
- `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
- `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
- `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`
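
One way to combine this mapping with accumulated state is to let each marker override only the fields it names and carry the rest forward. The sketch below assumes this approach. `PRESETS_SKETCH` and `events_for` are hypothetical names, and the durations for markers other than `[Zoom2]`, `[TiltLeft]`, and `[Reset]` are guesses.

```python
from dataclasses import dataclass, replace

@dataclass
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0

@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0
    easing: str = "ease-out"

# Hypothetical preset table: (fields to override, transition duration in seconds).
PRESETS_SKETCH = {
    "Zoom0": ({"zoom": 1.0}, 0.2),
    "Zoom1": ({"zoom": 1.10}, 0.2),
    "Zoom2": ({"zoom": 1.25}, 0.2),
    "Zoom3": ({"zoom": 1.50}, 0.2),
    "TiltLeft": ({"rotation": -15.0}, 0.3),
    "TiltRight": ({"rotation": 15.0}, 0.3),
    "NoTilt": ({"rotation": 0.0}, 0.3),
}

def events_for(markers):
    """Turn (time, marker name) pairs into events, carrying unchanged fields forward."""
    state, events = CameraState(), []
    for t, name in markers:
        if name == "Reset":
            state, duration = CameraState(), 0.2
        else:
            overrides, duration = PRESETS_SKETCH[name]
            state = replace(state, **overrides)  # new state, other fields kept
        events.append(CameraEvent(time=t, target_state=state, duration=duration))
    return events

events = events_for([(1.0, "Zoom2"), (2.5, "TiltLeft"), (4.0, "Reset")])
```

Here the `[TiltLeft]` event targets zoom 1.25 and rotation -15 together, rather than dropping the zoom back to its default.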

### Considerations

1. **Overscan**: When zoomed in, the output is a crop of the scene. To zoom without
   upsampling blur, the scene should be rendered larger than the output (e.g., 2x).

2. **Rotation center**: Rotate around frame center, not corner.

3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
   are both active. `[Reset]` clears all.

4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
   transform naturally with the camera. No special handling needed.

5. **Slides stay synced**: Keynote exports are positioned for the base frame.
   Camera zoom/tilt transforms them identically to everything else.
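
Points 1 and 2 interact: rotating about the center and then cropping a same-size frame exposes the fill color in the corners unless the scene is also zoomed. The sketch below estimates the smallest zoom that hides those corners, assuming the scene is rendered at output size, scaled by `zoom`, rotated about its center, and cropped from the center with no pan. `min_zoom_to_cover` is a hypothetical helper, not part of the codebase.

```python
import math

def min_zoom_to_cover(rotation_deg: float,
                      out_w: int = 1920, out_h: int = 1080) -> float:
    """Smallest zoom at which a centered out_w x out_h crop of the rotated
    scene contains no empty (fill-colored) corners."""
    a = math.radians(rotation_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    # The output rectangle, seen in scene coordinates, has a bounding box of
    # (W*c + H*s) x (W*s + H*c); the scaled scene must contain that box.
    aspect = max(out_w / out_h, out_h / out_w)
    return c + aspect * s

z = min_zoom_to_cover(15)
print(f"{z:.3f}")
```

For a 15 degree tilt at 1920x1080 this comes out to roughly 1.43, so a tilt at zoom 1.0 or 1.25 would show black corners under these assumptions.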

---

## Implementation Plan

### Phase 1: Camera Data Model ✓
- [x] Add `CameraState` and `CameraEvent` to models.py
- [x] Add camera effect markers to transformer.py
- [x] Generate camera keyframes from markers

### Phase 2: Render Pipeline ✓
- [x] Modify renderer to compose to `[scene]` instead of `[vout]`
- [x] Add camera transform stage after composition
- [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now

### Phase 3: Smooth Animation (partial)
- [x] Support animated transitions between keyframes (linear interpolation)
- [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
- [ ] Test with rapid zoom sequences

### Phase 4: Effect Presets ✓
- [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
- [x] Presets defined in `CAMERA_PRESETS` dict in models.py
- [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement