Adding changes version 1

2026-02-06 17:56:05 +01:00
parent 93fa820275
commit fdd275ac0e
30 changed files with 7068 additions and 888 deletions
@@ -0,0 +1,317 @@
# Partial Rendering Specification
## Overview
Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
- Faster iteration during development
- Re-rendering specific sections after fixes
- Parallel rendering of segments that can be concatenated later
## Scope (v1)
**In scope:**
- Camera state tracking (cumulative state must be computed from t=0)
- Time offset adjustment for all events
- Slide range filtering
- Input video seeking
**Out of scope (v1):**
- Audio events crossing range boundaries
- Triggered video duration edge cases
- Events are assumed to begin at their marker timestamp and never "carry over"
## Current Architecture Analysis
### 1. Camera State Management
**Current behavior** (`transformer.py:250-332`):
- Camera state is **cumulative** across the transcript
- `_extract_camera_events()` walks through ALL markers sequentially
- Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
- Example: `[Zoom2]` then `[TiltLeft]` = both zoom AND tilt active
**Problem for partial rendering**:
If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.
**Solution**:
Separate "state computation" from "event generation":
1. Always walk through ALL transcript markers to compute cumulative state
2. Track the "initial state" at the start of the render range
3. Only emit CameraEvents for markers WITHIN the render range
4. First event in partial render must transition FROM the computed initial state
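The fold below is a minimal sketch of that separation. It assumes a hypothetical `PRESET_OVERRIDES` table in which each marker lists only the `CameraState` fields it changes (with `Reset` restoring every default); it is not the project's `_apply_preset()`, only an illustration of why the state at a range start depends on every earlier marker.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


# Hypothetical preset table: each marker overrides only the fields it owns.
# None means "reset every field to its default".
PRESET_OVERRIDES: dict[str, Optional[dict[str, float]]] = {
    "Zoom0": {"zoom": 1.0},
    "Zoom2": {"zoom": 1.25},
    "TiltLeft": {"rotation": -15.0},
    "NoTilt": {"rotation": 0.0},
    "Reset": None,
}


def apply_marker(state: CameraState, marker_id: str) -> CameraState:
    """Return the new cumulative state after one camera marker."""
    overrides = PRESET_OVERRIDES[marker_id]
    if overrides is None:
        return CameraState()
    # replace() keeps every field that the marker does not name.
    return replace(state, **overrides)


def state_before(markers: list[tuple[float, str]], t: float) -> CameraState:
    """Cumulative state just before time t (markers sorted by time).

    A marker exactly at t belongs to the range that starts at t, so it is excluded.
    """
    state = CameraState()
    for time, marker_id in markers:
        if time >= t:
            break
        state = apply_marker(state, marker_id)
    return state
```

For `[Zoom2]` at 1.0 s and `[TiltLeft]` at 2.0 s, `state_before(markers, 3.0)` returns zoom 1.25 and rotation -15, which is the initial state a render starting at 3.0 s must begin with.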
### 2. Time Signature Adjustment
**Current behavior**:
All timing uses absolute timestamps from `transcript.csv`:
- `SlideEvent.start_time/end_time`
- `VideoEvent.start_time/end_time`
- `AudioEvent.start_time`
- `CameraEvent.time`
- FFmpeg expressions: `enable=between(t, start, end)`
- Camera animation: `if(between(t, 1.000, 1.200), ...)`
**Problem for partial rendering**:
If slide 10 starts at t=10.0s and we render from there, FFmpeg expects t=0 at the start of output.
**Solution**:
Apply a `time_offset` to all events after extraction:
```
new_time = original_time - time_offset
```
Where `time_offset` = start time of first slide/event in range.
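To make the arithmetic concrete: a slide shown from t=12.0 s to t=15.5 s in a render that starts at t=10.0 s must be enabled from 2.0 s to 5.5 s of the output. A minimal sketch of a hypothetical helper that applies the offset while formatting the `enable=between(t, start, end)` expression:

```python
def shifted_between(start: float, end: float, time_offset: float) -> str:
    """Return an FFmpeg `between(t,a,b)` expression in output time.

    Applies new_time = original_time - time_offset to both bounds.
    """
    new_start = start - time_offset
    new_end = end - time_offset
    if new_start < 0:
        # v1 excludes events that start before the render range.
        raise ValueError("event starts before the render range")
    return f"between(t,{new_start:.3f},{new_end:.3f})"
```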
### 3. Input Video Seeking
**Current behavior**:
- Always-visible videos (talking head) start from the beginning
- FFmpeg processes entire input duration
**Problem for partial rendering**:
Need to seek into source videos to the correct position.
**Solution**:
Add `-ss <seek_time>` before input files for always-visible videos:
```
ffmpeg -ss 10.0 -i talking_head.mov ...
```
---
## Proposed API
### Command Line Interface
```bash
# Render full video (current behavior)
gnommo render example/project.json output.mp4
# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5: # S5 to end
# Render specific time range (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
```
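A minimal sketch of how the `--slides` and `--time` values might be parsed. The function names are hypothetical; an empty end (`S5:`) maps to `None`, matching the `Optional` end in the internal API.

```python
from typing import Optional


def _split_range(spec: str) -> tuple[str, str]:
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start.strip(), end.strip()


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """'S1:S10' -> ('S1', 'S10'); 'S5:' -> ('S5', None)."""
    start, end = _split_range(spec)
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """'60:120' -> (60.0, 120.0); '60:' -> (60.0, None)."""
    start_s, end_s = _split_range(spec)
    start = float(start_s)
    end = float(end_s) if end_s else None
    if start < 0 or (end is not None and end <= start):
        raise ValueError(f"invalid time range {spec!r}")
    return start, end
```

With `argparse`, both functions can be passed as `type=` callables (a `ValueError` becomes a normal usage error), and `add_mutually_exclusive_group()` keeps `--slides` and `--time` from being combined.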
### Internal API
New parameters for `build_render_plan()`:
```python
def build_render_plan(
    # ... existing parameters ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR (mutually exclusive with slide_range)
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:
    ...
```
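Before it can filter events or compute `time_offset`, `build_render_plan()` has to turn a slide range into times. A sketch, assuming a hypothetical `slide_starts` mapping from slide marker IDs to their transcript timestamps. The end slide is treated as exclusive, so `S1:S10` and `S10:S20` tile without overlap, consistent with the `start <= t < end` check in Phase 1 and with `S40:S50` covering 10 slides in the optimization example.

```python
from typing import Optional


def slide_range_to_time_range(
    slide_starts: dict[str, float],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, Optional[float]]:
    """Map (start_slide, end_slide) to (start_time, end_time).

    end_slide is exclusive: the range stops where end_slide begins.
    end_slide=None means "until the end of the video".
    """
    start_id, end_id = slide_range
    if start_id not in slide_starts:
        raise KeyError(f"unknown slide {start_id!r}")
    start = slide_starts[start_id]
    if end_id is None:
        return start, None
    if end_id not in slide_starts:
        raise KeyError(f"unknown slide {end_id!r}")
    end = slide_starts[end_id]
    if end <= start:
        raise ValueError(f"{end_id} does not start after {start_id}")
    return start, end
```

The returned start time is then the `time_offset` for the plan.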
New field on `RenderPlan`:
```python
@dataclass
class RenderPlan:
    # ... existing fields ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
```
---
## Implementation Details
### Phase 1: Compute Full State, Filter Events
Modify `_extract_camera_events()` to accept a time range:
```python
def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start, end); end=None = to the end
) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
    - List of CameraEvents whose marker falls within [start, end)
    - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, None)
    if end_time is None:
        end_time = float("inf")
    found_start = False

    for timed_word in transcript:
        if not timed_word.is_marker:
            continue
        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue

        # Always update current_state (full walk from t=0)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)

        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True

        # Only emit events within [start, end)
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))

        current_state = new_state

    # No camera marker at or after start_time: the state accumulated from all
    # earlier markers is still the state at the start of the range.
    if not found_start:
        initial_state = current_state

    return events, initial_state
```
### Phase 2: Apply Time Offset
After extracting events, apply offset to all timestamps:
```python
def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""
    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset
    # Adjust audio events (clamped so nothing starts before the output does)
    for event in plan.audio_events:
        event.start_time = max(0.0, event.start_time - offset)
    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset
    # Adjust total duration (assumes total_duration already ends at the
    # range end, so subtracting the start yields the segment length)
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset
    return plan
```
### Phase 3: FFmpeg Seeking
Modify `build_ffmpeg_command()` to add seeking:
```python
def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]
    # Add seek for always-visible videos
    # (videos_dir comes from the existing renderer setup, elided here)
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            # As an input option (before -i), -ss applies only to the next input
            cmd.extend(["-ss", str(plan.input_seek_time)])
        cmd.extend(["-i", str(video_path)])
    ...
```
### Phase 4: Initial Camera State Handling
If `initial_camera_state` is not default, inject a "virtual" camera event at t=0:
```python
def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    # ... existing parameters ...
) -> str:
    # If initial state differs from default, prepend a virtual event.
    # Dataclass equality compares every field, so no extra helper is needed.
    if initial_state != CameraState():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant: no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
```
---
## FFmpeg Optimization
**Only emit filters for events within range.**
When rendering a partial range, the `RenderPlan` should only contain events within that range. This means:
- Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
- Fewer overlay filters in filter_complex
- Fewer `between(t, start, end)` enable expressions to evaluate per frame
Example: Full video has 50 slides, rendering S40:S50 only:
- **Before**: 50 slide inputs, 50 overlay filters
- **After**: 10 slide inputs, 10 overlay filters
This falls out naturally from filtering events in `build_render_plan()` before the plan is constructed: the renderer already processes only the events present in the plan.
---
## Edge Cases (v1 Simplified)
### 1. Camera state from before range
If rendering S5:S10 but there's a camera event at the S4 marker:
- Camera state from S4 must be captured as `initial_camera_state`
- Rendered output starts with that state already applied at t=0
### 2. Events filter by marker position
All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
- Events beginning outside range are excluded
- No "carry over" or boundary-crossing logic needed
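Filtering by start marker and shifting into output time can be done in one pass without mutating the full-length event lists. A sketch using a stand-in `SlideEvent` with only the timing fields (the real models carry more) and `dataclasses.replace`:

```python
import math
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SlideEvent:  # stand-in: only the fields this sketch needs
    slide_id: str
    start_time: float
    end_time: float


def select_and_shift(
    events: list[SlideEvent], start: float, end: Optional[float] = None
) -> list[SlideEvent]:
    """Keep events whose START lies in [start, end) and move them to output time."""
    upper = math.inf if end is None else end
    return [
        replace(e, start_time=e.start_time - start, end_time=e.end_time - start)
        for e in events
        if start <= e.start_time < upper
    ]
```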
---
## Testing Strategy
### Unit Tests
1. Camera state computation maintains state across full transcript
2. Time offset correctly shifts all event types
3. Initial camera state correctly captured at boundary
### Integration Tests
1. Render slides 1-5, then 5-10, concatenate, compare to full render
2. Camera state continuity across segment boundaries
3. Audio alignment after seeking
### Manual Verification
1. Visual inspection of camera state at segment boundaries
2. Audio sync verification
---
## Future Enhancements
### Parallel Rendering Pipeline
```bash
# Render in parallel, then concatenate
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait
ffmpeg -f concat -i segments.txt -c copy final.mp4
```
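The concat demuxer reads `segments.txt` as one `file '<path>'` directive per line, in playback order. A small sketch of a helper that writes it; the helper names are hypothetical, and the quoting follows FFmpeg's rule that a single quote inside a quoted string is written as `'\''`.

```python
from pathlib import Path


def concat_list(segments: list[str]) -> str:
    """Build a concat-demuxer list: one `file '...'` line per segment, in order."""
    lines = []
    for seg in segments:
        # Close the quote, emit an escaped quote, reopen: ' -> '\''
        escaped = seg.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(segments: list[str], path: Path) -> None:
    path.write_text(concat_list(segments), encoding="utf-8")
```

If the list contains absolute paths, the demuxer treats them as unsafe by default, so `-safe 0` has to be given before `-i`. `-c copy` also expects every segment to share the same codec parameters, which should hold when all segments are rendered with the same settings.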
### Smart Re-rendering
Track which slides changed and only re-render affected segments.
### Preview Mode
Quick low-quality render of specific section for review.
@@ -0,0 +1,265 @@
# Virtual Camera Effects
Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
These effects are triggered by markers in the manuscript, just like slides.
## Zoom Effects
| Marker | Description |
|--------|-------------|
| `[Zoom1]` | Zoom to 110% - subtle emphasis |
| `[Zoom2]` | Zoom to 125% - moderate emphasis |
| `[Zoom3]` | Zoom to 150% - strong emphasis |
| `[Zoom0]` | Return to 100% (default) |
| `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |
**Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.
## Tilt/Rotation Effects
| Marker | Description |
|--------|-------------|
| `[TiltLeft]` | Rotate -15 degrees |
| `[TiltRight]` | Rotate +15 degrees |
| `[NoTilt]` | Return to 0 degrees |
| `[TiltShake]` | Quick left-right shake (confusion/emphasis) |
**Use case:** Tilt when saying something "off" or wrong, return to flat for correction.
## Pan/Position Effects
| Marker | Description |
|--------|-------------|
| `[PanLeft]` | Shift frame left (subject moves right) |
| `[PanRight]` | Shift frame right (subject moves left) |
| `[PanUp]` | Shift frame up |
| `[PanDown]` | Shift frame down |
| `[PanCenter]` | Return to center |
**Use case:** Pan to make room for a slide appearing on one side.
## Shake/Movement Effects
| Marker | Description |
|--------|-------------|
| `[Shake]` | Brief screen shake (impact, surprise) |
| `[ShakeHard]` | Intense shake (explosion, error) |
| `[Wobble]` | Gentle continuous wobble |
| `[NoWobble]` | Stop wobble |
**Use case:** Shake on "WRONG!" or when something crashes/fails.
## Speed/Rhythm Effects
| Marker | Description |
|--------|-------------|
| `[Beat]` | Single visual pulse (scale bump) |
| `[BeatStart]` | Start pulsing to rhythm |
| `[BeatStop]` | Stop pulsing |
**Use case:** Rhythmic emphasis during lists or key points.
## Transition Effects
| Marker | Description |
|--------|-------------|
| `[Flash]` | Quick white flash |
| `[Blackout]` | Brief black frame |
| `[Glitch]` | Digital glitch effect |
**Use case:** Transition between topics or for "record scratch" moments.
## Picture-in-Picture Variations
| Marker | Description |
|--------|-------------|
| `[PipGrow]` | Enlarge talking head cutout |
| `[PipShrink]` | Shrink talking head cutout |
| `[PipHide]` | Temporarily hide talking head |
| `[PipShow]` | Restore talking head |
| `[PipMove:corner]` | Move pip to different corner |
**Use case:** Shrink self when showing important diagram, grow when making personal point.
## Combination Presets
| Marker | Description |
|--------|-------------|
| `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
| `[Surprise]` | Quick zoom + shake |
| `[Sarcasm]` | Slow zoom + tilt |
| `[Reset]` | Return all effects to default |
---
## Architecture: The Camera Abstraction
### The Core Insight
All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
The **camera** views the scene. When the camera zooms, tilts, or pans, everything
moves together, just like a real camera filming a physical set.
```
┌─────────────────────────────────────────────────────────┐
│                          SCENE                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Background Layer                                  │  │
│  │  ┌─────────────┐                                  │  │
│  │  │ Talking Head│        ┌──────────────────┐      │  │
│  │  │  (cutout)   │        │      Slide       │      │  │
│  │  └─────────────┘        │   (from .png)    │      │  │
│  │                         └──────────────────┘      │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
                      ┌─────────────┐
                      │   CAMERA    │
                      │ zoom: 1.25  │
                      │ tilt: -15°  │
                      │ pan: 0, 0   │
                      └─────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Final Output   │
                    │  (1920x1080)    │
                    └─────────────────┘
```
### Why This Matters
**Keynote slides are designed for a specific frame.** If you create a slide with
an arrow pointing at where the talking head cutout will be, that spatial
relationship must be preserved when the camera zooms or tilts.
If we zoomed only the background and not the slides, the arrow would point to
the wrong place. The camera abstraction ensures everything transforms together.
### Camera Properties
```python
from dataclasses import dataclass


@dataclass
class CameraState:
    zoom: float = 1.0  # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0  # degrees, positive = clockwise
    pan_x: float = 0.0  # -1.0 to 1.0, fraction of frame
    pan_y: float = 0.0  # -1.0 to 1.0, fraction of frame


@dataclass
class CameraKeyframe:
    time: float  # timestamp in seconds
    state: CameraState
    easing: str = "linear"  # linear, ease-in, ease-out, ease-in-out
```
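Phase 3 of the implementation plan still lists easing as open. One possible set of curves is sketched below, using standard quadratic and smoothstep shapes (an assumption: the project has not fixed what `ease-in`/`ease-out` mean), plus per-field interpolation between two `CameraState`s. The same formulas could be written as FFmpeg expressions by substituting `p = (t-start)/duration`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


def ease(p: float, easing: str) -> float:
    """Map linear progress p in [0, 1] to eased progress (assumed curve shapes)."""
    p = min(max(p, 0.0), 1.0)
    if easing == "linear":
        return p
    if easing == "ease-in":  # slow start
        return p * p
    if easing == "ease-out":  # slow finish
        return 1.0 - (1.0 - p) ** 2
    if easing == "ease-in-out":  # smoothstep
        return p * p * (3.0 - 2.0 * p)
    raise ValueError(f"unknown easing {easing!r}")


def interpolate(a: CameraState, b: CameraState, p: float, easing: str = "linear") -> CameraState:
    """Blend every CameraState field from a to b at eased progress p."""
    k = ease(p, easing)
    return CameraState(**{
        f.name: getattr(a, f.name) + (getattr(b, f.name) - getattr(a, f.name)) * k
        for f in fields(CameraState)
    })
```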
### Rendering Pipeline (Updated)
```
Current Pipeline:
  Parse → Validate → Transform → Render
                                   │
                                   ▼
                                   build_filter_complex()
                                   [bg] → overlays → [vout]

New Pipeline:
  Parse → Validate → Transform → Render
                         │         │
                         ▼         ▼
                  Extract camera   build_filter_complex()
                  keyframes from   [bg] → overlays → [scene]
                  markers                     │
                                              ▼
                                   apply_camera_transform()
                                   [scene] → zoom/rotate/pan → [vout]
```
### FFmpeg Implementation
The camera transform is a **final filter stage** applied to the composed scene:
```
# Compose scene (existing code)
[0:v]scale=1920:1080[bg];
[bg][slide1]overlay=...[s1];
[s1][talkinghead]overlay=...[scene];
# Camera transform (new)
[scene]scale=iw*{zoom}:ih*{zoom},
rotate={rotation}*PI/180:fillcolor=black,
crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
```
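The static chain above can be generated from a camera state. A sketch with a hypothetical `camera_filter()` helper; pan is left out, as in the filter above, and zoom must be at least 1.0 because `crop` cannot cut a 1920x1080 window out of a smaller frame.

```python
def camera_filter(
    zoom: float,
    rotation_deg: float,
    out_w: int = 1920,
    out_h: int = 1080,
    src: str = "scene",
    dst: str = "vout",
) -> str:
    """Build the scale -> rotate -> crop chain for a fixed camera state."""
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0 so the crop fits inside the scaled frame")
    return (
        f"[{src}]scale=iw*{zoom}:ih*{zoom},"
        # rotate takes radians; positive angles rotate clockwise.
        f"rotate={rotation_deg}*PI/180:fillcolor=black,"
        # Centered crop back to the output size.
        f"crop={out_w}:{out_h}:(iw-{out_w})/2:(ih-{out_h})/2[{dst}]"
    )
```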
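For smooth animated zoom, use expressions. This example ramps from 100% to 125% between t=5 s and t=8 s and then holds 125%; note that `zoompan` exposes the input timestamp as `it` rather than `t`:
```
[scene]zoompan=z='if(lt(it,5), 1, if(lt(it,8), 1+0.25*(it-5)/3, 1.25))':
x='iw/2-(iw/zoom/2)':
y='ih/2-(ih/zoom/2)':
d=1:s=1920x1080:fps=30[vout]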
```
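With several camera events, the expression becomes a chain of nested `if()` calls. A sketch of a hypothetical builder that emits a piecewise-linear zoom expression from `(time, target_zoom, duration)` tuples, starting from an initial zoom (where the partial-rendering spec's `initial_camera_state` would plug in). It assumes the events are sorted and do not overlap; a `duration` of 0 is an instant snap.

```python
def zoom_expr(
    events: list[tuple[float, float, float]],
    initial_zoom: float = 1.0,
    var: str = "it",
) -> str:
    """Piecewise-linear zoom expression for FFmpeg's expression evaluator.

    events: (start_time, target_zoom, duration), sorted and non-overlapping.
    Before the first event the zoom is initial_zoom; each event ramps linearly
    from the previous target to its own target, then holds.
    """
    expr = f"{initial_zoom:.3f}"
    prev = initial_zoom
    for start, target, duration in events:
        if duration > 0:
            ramp = f"{prev:.3f}+({target - prev:.3f})*({var}-{start:.3f})/{duration:.3f}"
            after = f"if(lt({var},{start + duration:.3f}),{ramp},{target:.3f})"
        else:
            after = f"{target:.3f}"
        # Times before this event keep the previously built expression.
        expr = f"if(lt({var},{start:.3f}),{expr},{after})"
        prev = target
    return expr
```

The result contains commas, so it has to be single-quoted inside the filtergraph (`z='...'`). For filters whose expressions use `t`, pass `var="t"`.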
### Camera Events in Timeline
New model for camera changes:
```python
@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0  # 0 = instant snap
    easing: str = "ease-out"
```
Markers map to camera events (fields a marker does not set are carried over from the current state; see *State accumulation* below):
- `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
- `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
- `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`
### Considerations
1. **Overscan**: When zoomed in, we're cropping. The scene must be rendered
larger than output (e.g., 2x) to have room for zoom without quality loss.
2. **Rotation center**: Rotate around frame center, not corner.
3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
are both active. `[Reset]` clears all.
4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
transform naturally with the camera. No special handling needed.
5. **Slides stay synced**: Keynote exports are positioned for the base frame.
Camera zoom/tilt transforms them identically to everything else.
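Rotation also determines how much overscan is needed: rotating a 1920x1080 frame and cropping back to 1920x1080 exposes the `fillcolor` in the corners unless the scene is zoomed enough. Requiring the output rectangle to fit inside the rotated, scaled frame gives, for a W×H frame rotated by θ and scaled by s: s·W ≥ W·|cos θ| + H·|sin θ| and s·H ≥ W·|sin θ| + H·|cos θ|. A sketch of a hypothetical helper computing the smallest such s:

```python
import math


def min_zoom_for_rotation(rotation_deg: float, width: int = 1920, height: int = 1080) -> float:
    """Smallest zoom that keeps a rotated frame covering the whole output (no fill corners)."""
    theta = math.radians(rotation_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Extent of the output rectangle measured along the rotated frame's axes.
    need_w = (width * c + height * s) / width
    need_h = (width * s + height * c) / height
    return max(need_w, need_h)
```

By this calculation, the ±15° tilt presets need a zoom of about 1.43, so with the current scale/rotate/crop chain a tilt at zoom 1.0, or even combined with `[Zoom2]` at 1.25, would show black corners.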
---
## Implementation Plan
### Phase 1: Camera Data Model ✓
- [x] Add `CameraState` and `CameraEvent` to models.py
- [x] Add camera effect markers to transformer.py
- [x] Generate camera keyframes from markers
### Phase 2: Render Pipeline ✓
- [x] Modify renderer to compose to `[scene]` instead of `[vout]`
- [x] Add camera transform stage after composition
- [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now
### Phase 3: Smooth Animation (partial)
- [x] Support animated transitions between keyframes (linear interpolation)
- [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
- [ ] Test with rapid zoom sequences
### Phase 4: Effect Presets ✓
- [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
- [x] Presets defined in `CAMERA_PRESETS` dict in models.py
- [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement
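For the future `[Zoom:1.35]` syntax, which has the same `Name:arg` shape as `[PipMove:corner]`, markers could be split with a regular expression. A sketch; the pattern and the `parse_marker()`/`zoom_from_marker()` helpers are hypothetical:

```python
import re
from typing import Optional

# [Name] or [Name:arg]; names start with a letter and may contain digits (Zoom2).
MARKER_RE = re.compile(r"^\[(?P<name>[A-Za-z][A-Za-z0-9]*)(?::(?P<arg>[^\]]+))?\]$")


def parse_marker(text: str) -> tuple[str, Optional[str]]:
    """Split a marker into its name and optional argument."""
    match = MARKER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a marker: {text!r}")
    return match.group("name"), match.group("arg")


def zoom_from_marker(text: str) -> Optional[float]:
    """Return the zoom factor of a parameterized [Zoom:x] marker, else None."""
    name, arg = parse_marker(text)
    if name != "Zoom" or arg is None:
        return None
    value = float(arg)
    if value <= 0:
        raise ValueError("zoom must be positive")
    return value
```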