Adding changes version 1
# Partial Rendering Specification

## Overview

Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
- Faster iteration during development
- Re-rendering specific sections after fixes
- Parallel rendering of segments that can be concatenated later

## Scope (v1)

**In scope:**
- Camera state tracking (cumulative state must be computed from t=0)
- Time offset adjustment for all events
- Slide range filtering
- Input video seeking

**Out of scope (v1):**
- Audio events crossing range boundaries
- Triggered video duration edge cases
- Events that "carry over" a range boundary (v1 assumes every event begins at its marker timestamp)

## Current Architecture Analysis

### 1. Camera State Management

**Current behavior** (`transformer.py:250-332`):
- Camera state is **cumulative** across the transcript
- `_extract_camera_events()` walks through ALL markers sequentially
- Each marker type (Zoom/Tilt/Pan) modifies only its own property and preserves the others
- Example: `[Zoom2]` then `[TiltLeft]` = both zoom AND tilt active

**Problem for partial rendering**:
If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.

**Solution**:
Separate "state computation" from "event generation":
1. Always walk through ALL transcript markers to compute cumulative state
2. Track the "initial state" at the start of the render range
3. Only emit CameraEvents for markers WITHIN the render range
4. First event in partial render must transition FROM the computed initial state

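Steps 1 and 2 can be sketched as follows. This sketch makes two assumptions. First, it uses a simplified, immutable `CameraState`. Second, it uses a hypothetical `PRESETS` table that stores only the properties each marker changes; the real `CAMERA_PRESETS` may be shaped differently. `state_before()` replays every marker earlier than the range start. Its result is the state a partial render has to begin from. `apply_marker()` and `state_before()` are illustrative names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0

    def is_default(self) -> bool:
        return self == CameraState()


# Hypothetical preset table: each marker lists only the properties it changes.
# An empty dict means "reset everything to default".
PRESETS: dict[str, dict[str, float]] = {
    "Zoom0": {"zoom": 1.0},
    "Zoom2": {"zoom": 1.25},
    "TiltLeft": {"rotation": -15.0},
    "NoTilt": {"rotation": 0.0},
    "Reset": {},
}


def apply_marker(state: CameraState, marker_id: str) -> CameraState:
    """Return a new state with only the marker's properties changed."""
    changes = PRESETS[marker_id]
    if not changes:  # Reset
        return CameraState()
    return replace(state, **changes)


def state_before(markers: list[tuple[float, str]], start_time: float) -> CameraState:
    """Replay every camera marker before start_time (markers sorted by time)."""
    state = CameraState()
    for time, marker_id in markers:
        if time >= start_time:
            break
        if marker_id in PRESETS:
            state = apply_marker(state, marker_id)
    return state


markers = [(2.0, "Zoom2"), (5.0, "TiltLeft"), (12.0, "Reset")]
print(state_before(markers, 10.0))
# CameraState(zoom=1.25, rotation=-15.0, pan_x=0.0, pan_y=0.0)
```

Because `apply_marker()` always returns a new object, keeping a reference to the pre-range state is safe while the walk continues.
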
### 2. Timestamp Adjustment

**Current behavior**:
All timing uses absolute timestamps from `transcript.csv`:
- `SlideEvent.start_time/end_time`
- `VideoEvent.start_time/end_time`
- `AudioEvent.start_time`
- `CameraEvent.time`
- FFmpeg expressions: `enable=between(t, start, end)`
- Camera animation: `if(between(t, 1.000, 1.200), ...)`

**Problem for partial rendering**:
If slide 10 starts at t=10.0s and we render from there, FFmpeg's `t` is 0 at the start of the output, so every absolute timestamp would fire 10 seconds too late.

**Solution**:
Apply a `time_offset` to all events after extraction:
```
new_time = original_time - time_offset
```
where `time_offset` is the start time of the first slide (or event) in the range.

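Take a render that starts at 10.0 s. A slide shown from 12.5 s to 15.0 s in the full video must then be enabled from 2.5 s to 5.0 s in the output. The sketch below regenerates an `enable` window with the offset applied. `shifted_enable()` is a hypothetical helper. Its three-decimal formatting mirrors the camera expression above.

```python
def shifted_enable(start: float, end: float, time_offset: float) -> str:
    """Build an FFmpeg enable expression for an event window shifted by time_offset."""
    return f"between(t,{start - time_offset:.3f},{end - time_offset:.3f})"


print(shifted_enable(12.5, 15.0, 10.0))  # between(t,2.500,5.000)
```
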
### 3. Input Video Seeking

**Current behavior**:
- Always-visible videos (talking head) start from the beginning
- FFmpeg processes the entire input duration

**Problem for partial rendering**:
Source videos must be seeked to the position that matches the start of the render range.

**Solution**:
Add `-ss <seek_time>` before the input file of each always-visible video:
```
ffmpeg -ss 10.0 -i talking_head.mov ...
```

Placed before `-i`, `-ss` is an input option. FFmpeg seeks that input, and unless `-copyts` is used, its timestamps start from zero in the output. This lines up with the shifted event times.

---

## Proposed API

### Command Line Interface

```bash
# Render full video (current behavior)
gnommo render example/project.json output.mp4

# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5:  # S5 to end

# Render specific time range in seconds (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
```

Ranges are half-open: `S1:S10` stops where `S10` begins. Adjacent ranges such as `S1:S10` and `S10:S20` can therefore share a boundary slide without rendering it twice.

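The sketch below shows one way the CLI could turn both flags into the tuples `build_render_plan()` accepts. `parse_slide_range()` and `parse_time_range()` are hypothetical helpers, not an existing gnommo API. An empty end (as in `S5:`) becomes `None`, meaning "to the end".

```python
from typing import Optional


def _split_range(value: str) -> tuple[str, str]:
    """Split 'START:END' or 'START:' into its two halves."""
    start, sep, end = value.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {value!r}")
    return start, end


def parse_slide_range(value: str) -> tuple[str, Optional[str]]:
    """'S10:S20' -> ('S10', 'S20'); 'S5:' -> ('S5', None)."""
    start, end = _split_range(value)
    return start, end or None


def parse_time_range(value: str) -> tuple[float, Optional[float]]:
    """'60:120' -> (60.0, 120.0); '60:' -> (60.0, None). Values are seconds."""
    start, end = _split_range(value)
    start_s = float(start)
    end_s = float(end) if end else None
    if end_s is not None and end_s <= start_s:
        raise ValueError(f"end must be after start: {value!r}")
    return start_s, end_s
```
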
### Internal API

New parameters for `build_render_plan()`:
```python
from typing import Optional


def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:
```

New field on `RenderPlan`:
```python
from dataclasses import dataclass, field


@dataclass
class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
```

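`build_render_plan()` could turn a `slide_range` into the `time_range` used everywhere else, and from there into `time_offset`, by looking up each slide's marker time. This sketch assumes a hypothetical `slide_times` mapping from slide id to marker timestamp, for example one collected while walking `transcript.csv`. The marker times in the example are made up. The end slide is exclusive, and `None` means the end of the video.

```python
from typing import Optional


def resolve_slide_range(
    slide_times: dict[str, float],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, Optional[float]]:
    """Map (start_slide, end_slide) to (start_time, end_time) in seconds."""
    start_slide, end_slide = slide_range
    if start_slide not in slide_times:
        raise KeyError(f"unknown start slide {start_slide!r}")
    start_time = slide_times[start_slide]
    if end_slide is None:
        return start_time, None  # render to the end of the video
    if end_slide not in slide_times:
        raise KeyError(f"unknown end slide {end_slide!r}")
    end_time = slide_times[end_slide]
    if end_time <= start_time:
        raise ValueError(f"{end_slide} does not come after {start_slide}")
    return start_time, end_time


# Made-up marker times, for illustration only
slide_times = {"S1": 0.0, "S5": 21.4, "S10": 48.0, "S20": 97.5}
time_offset, range_end = resolve_slide_range(slide_times, ("S10", "S20"))  # 48.0, 97.5
```
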
---

## Implementation Details

### Phase 1: Compute Full State, Filter Events

Modify `_extract_camera_events()` to accept a time range:

```python
from typing import Optional


def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start, end); end None = to the end
) -> tuple[list[CameraEvent], CameraState]:
    """
    Walk every camera marker (transcript is assumed to be sorted by time).

    Returns:
    - List of CameraEvents within time_range
    - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, None)
    if end_time is None:
        end_time = float('inf')  # open-ended range such as S5:

    found_start = False

    for timed_word in transcript:
        if not timed_word.is_marker:
            continue

        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue

        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)  # must return a new object

        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True

        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))

        current_state = new_state

    # No camera marker at or after start_time: every marker precedes the range,
    # so the fully accumulated state is the state at the range start.
    if not found_start:
        initial_state = current_state

    return events, initial_state
```

### Phase 2: Apply Time Offset

After extracting events, apply the offset to all timestamps:

```python
from typing import Optional


def _apply_time_offset(
    plan: RenderPlan,
    offset: float,
    range_end: Optional[float] = None,  # absolute end of the range; None = to the end
) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times), in place."""

    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0.0, event.start_time - offset)

    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset

    # Adjust total duration: a bounded range (e.g. S1:S10) stops at range_end,
    # not at the end of the full video
    if range_end is not None:
        plan.total_duration = min(plan.total_duration, range_end)
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset

    return plan
```

### Phase 3: FFmpeg Seeking

Modify `build_ffmpeg_command()` to add seeking:

```python
from pathlib import Path


def build_ffmpeg_command(
    plan: RenderPlan,
    output_path: Path,
    videos_dir: Path,  # directory holding the source videos (assumed parameter)
) -> list[str]:
    cmd = ["ffmpeg", "-y"]

    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            # -ss BEFORE -i makes it an input option: it seeks this input only
            cmd.extend(["-ss", f"{plan.input_seek_time:.3f}"])
        cmd.extend(["-i", str(video_path)])
    ...
```

### Phase 4: Initial Camera State Handling

If `initial_camera_state` is not default, inject a "virtual" camera event at t=0:

```python
def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
```

---

## FFmpeg Optimization

**Only emit filters for events within range.**

When rendering a partial range, the `RenderPlan` should contain only events within that range. This means:
- Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
- Fewer overlay filters in filter_complex
- Fewer `between(t, start, end)` enable expressions to evaluate per frame

Example: the full video has 50 slides and only S40:S50 is rendered:
- **Before**: 50 slide inputs, 50 overlay filters
- **After**: 10 slide inputs, 10 overlay filters

This happens naturally by filtering events in `build_render_plan()` before constructing the plan. The renderer already processes only the events present in the plan.

---

## Edge Cases (v1 Simplified)

### 1. Camera state from before range
If rendering S5:S10 but there's a camera event at the S4 marker:
- Camera state from S4 must be captured as `initial_camera_state`
- Rendered output starts with that state already applied at t=0

### 2. Events are filtered by marker position
All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
- Events beginning outside the range are excluded
- No "carry over" or boundary-crossing logic is needed

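A minimal sketch of the start-marker rule follows. It assumes events expose a `start_time` in seconds, as `SlideEvent` does. `Event` and `filter_by_start()` are hypothetical stand-ins, and the example times are made up. The interval is half-open, matching the `start_time <= t < end_time` test in `_extract_camera_events()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    name: str
    start_time: float
    end_time: float


def filter_by_start(events: list[Event], start: float, end: Optional[float]) -> list[Event]:
    """Keep events whose start marker lies in [start, end); end=None means open-ended."""
    return [
        e for e in events
        if e.start_time >= start and (end is None or e.start_time < end)
    ]


events = [Event("S4", 18.0, 21.4), Event("S5", 21.4, 30.0), Event("S10", 48.0, 55.0)]
kept = filter_by_start(events, 21.4, 48.0)  # only S5: S4 starts too early, S10 sits on the exclusive end
```

In v1, an event that starts inside the range keeps its original `end_time`, even if that lies past the range end.
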
---

## Testing Strategy

### Unit Tests
1. Camera state computation maintains state across the full transcript
2. Time offset correctly shifts all event types
3. Initial camera state is correctly captured at the boundary

### Integration Tests
1. Render S1:S5, then S5:S10, concatenate, and compare against the same span of a full render
2. Camera state continuity across segment boundaries
3. Audio alignment after seeking

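For integration test 2, one option is to compare camera state rather than pixels. Evaluate the full plan's camera timeline at an absolute time T and the partial plan's timeline at T - `time_offset`, and require equal results. This is a minimal sketch under three assumptions: only zoom is checked, transitions are linear, and events are sorted by time. `Keyframe`, `zoom_at()`, and the event values are hypothetical. The partial timeline starts with the virtual t=0 event from Phase 4.

```python
from dataclasses import dataclass


@dataclass
class Keyframe:
    time: float      # seconds, in the plan's own timeline
    zoom: float      # target zoom
    duration: float  # 0.0 = instant snap


def zoom_at(t: float, events: list[Keyframe], initial: float = 1.0) -> float:
    """Zoom at time t, linearly interpolating each transition (events sorted by time)."""
    zoom = initial
    for ev in events:
        if t < ev.time:
            break
        if ev.duration > 0 and t < ev.time + ev.duration:
            progress = (t - ev.time) / ev.duration
            return zoom + (ev.zoom - zoom) * progress
        zoom = ev.zoom
    return zoom


# Full render: Zoom2 at 4.0 s, Zoom0 at 12.0 s (made-up times).
full = [Keyframe(4.0, 1.25, 0.2), Keyframe(12.0, 1.0, 0.2)]
# Partial render from 10.0 s: virtual t=0 event carries zoom 1.25; Zoom0 shifts to 2.0 s.
offset = 10.0
partial = [Keyframe(0.0, 1.25, 0.0), Keyframe(12.0 - offset, 1.0, 0.2)]

for t in (10.0, 11.5, 12.1, 13.0):
    assert abs(zoom_at(t, full) - zoom_at(t - offset, partial)) < 1e-9
```

Dropping the virtual event from `partial` makes the check fail at 10.0 s, which is exactly the discontinuity the test is meant to catch.
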
### Manual Verification
1. Visual inspection of camera state at segment boundaries
2. Audio sync verification

---

## Future Enhancements

### Parallel Rendering Pipeline
```bash
# Render in parallel, then concatenate
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait

# The concat demuxer reads a list file with one "file '<path>'" line per segment
printf "file '%s'\n" seg1.mp4 seg2.mp4 seg3.mp4 > segments.txt
ffmpeg -f concat -i segments.txt -c copy final.mp4
```

Stream copy (`-c copy`) only works when every segment shares the same codec parameters. That holds when all segments come from the same render settings.

### Smart Re-rendering
Track which slides changed and re-render only the affected segments.

### Preview Mode
Quick low-quality render of a specific section for review.

# Virtual Camera Effects

Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
These effects are triggered by markers in the manuscript, just like slides.

## Zoom Effects

| Marker | Description |
|--------|-------------|
| `[Zoom1]` | Zoom to 110% - subtle emphasis |
| `[Zoom2]` | Zoom to 125% - moderate emphasis |
| `[Zoom3]` | Zoom to 150% - strong emphasis |
| `[Zoom0]` | Return to 100% (default) |
| `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |

**Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.

## Tilt/Rotation Effects

| Marker | Description |
|--------|-------------|
| `[TiltLeft]` | Rotate -15 degrees |
| `[TiltRight]` | Rotate +15 degrees |
| `[NoTilt]` | Return to 0 degrees |
| `[TiltShake]` | Quick left-right shake (confusion/emphasis) |

**Use case:** Tilt when saying something "off" or wrong, return to flat for correction.

## Pan/Position Effects

| Marker | Description |
|--------|-------------|
| `[PanLeft]` | Shift frame left (subject moves right) |
| `[PanRight]` | Shift frame right (subject moves left) |
| `[PanUp]` | Shift frame up |
| `[PanDown]` | Shift frame down |
| `[PanCenter]` | Return to center |

**Use case:** Pan to make room for a slide appearing on one side.

## Shake/Movement Effects

| Marker | Description |
|--------|-------------|
| `[Shake]` | Brief screen shake (impact, surprise) |
| `[ShakeHard]` | Intense shake (explosion, error) |
| `[Wobble]` | Gentle continuous wobble |
| `[NoWobble]` | Stop wobble |

**Use case:** Shake on "WRONG!" or when something crashes/fails.

## Speed/Rhythm Effects

| Marker | Description |
|--------|-------------|
| `[Beat]` | Single visual pulse (scale bump) |
| `[BeatStart]` | Start pulsing to rhythm |
| `[BeatStop]` | Stop pulsing |

**Use case:** Rhythmic emphasis during lists or key points.

## Transition Effects

| Marker | Description |
|--------|-------------|
| `[Flash]` | Quick white flash |
| `[Blackout]` | Brief black frame |
| `[Glitch]` | Digital glitch effect |

**Use case:** Transition between topics or for "record scratch" moments.

## Picture-in-Picture Variations

| Marker | Description |
|--------|-------------|
| `[PipGrow]` | Enlarge talking head cutout |
| `[PipShrink]` | Shrink talking head cutout |
| `[PipHide]` | Temporarily hide talking head |
| `[PipShow]` | Restore talking head |
| `[PipMove:corner]` | Move pip to different corner |

**Use case:** Shrink self when showing important diagram, grow when making personal point.

## Combination Presets

| Marker | Description |
|--------|-------------|
| `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
| `[Surprise]` | Quick zoom + shake |
| `[Sarcasm]` | Slow zoom + tilt |
| `[Reset]` | Return all effects to default |

---

## Architecture: The Camera Abstraction

### The Core Insight

All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
The **camera** views the scene. When the camera zooms, tilts, or pans, everything
moves together, just like a real camera filming a physical set.

```
┌─────────────────────────────────────────────────────────┐
│                          SCENE                          │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Background Layer                                │    │
│  │  ┌─────────────┐                                │    │
│  │  │ Talking Head│    ┌──────────────────┐        │    │
│  │  │  (cutout)   │    │      Slide       │        │    │
│  │  └─────────────┘    │   (from .png)    │        │    │
│  │                     └──────────────────┘        │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
                      ┌─────────────┐
                      │   CAMERA    │
                      │ zoom: 1.25  │
                      │ tilt: -15°  │
                      │ pan: 0, 0   │
                      └─────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Final Output   │
                    │   (1920x1080)   │
                    └─────────────────┘
```

### Why This Matters

**Keynote slides are designed for a specific frame.** If you create a slide with
an arrow pointing at where the talking head cutout will be, that spatial
relationship must be preserved when the camera zooms or tilts.

If we zoomed only the background and not the slides, the arrow would point to
the wrong place. The camera abstraction ensures everything transforms together.

### Camera Properties

```python
from dataclasses import dataclass


@dataclass
class CameraState:
    zoom: float = 1.0      # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0  # degrees, positive = clockwise
    pan_x: float = 0.0     # -1.0 to 1.0, fraction of frame width
    pan_y: float = 0.0     # -1.0 to 1.0, fraction of frame height


@dataclass
class CameraKeyframe:
    time: float  # timestamp in seconds
    state: CameraState
    easing: str = "linear"  # linear, ease-in, ease-out, ease-in-out
```

### Rendering Pipeline (Updated)

```
Current Pipeline:
  Parse → Validate → Transform → Render
                                   │
                                   ▼
                        build_filter_complex()
                                   │
                       [bg] → overlays → [vout]

New Pipeline:
  Parse → Validate → Transform → Render
                         │
                  Extract camera
                  keyframes from
                      markers
                         │
                         ▼
              build_filter_complex()
                         │
             [bg] → overlays → [scene]
                         │
             apply_camera_transform()
                         │
        [scene] → zoom/rotate/pan → [vout]
```

### FFmpeg Implementation

The camera transform is a **final filter stage** applied to the composed scene:

```
# Compose scene (existing code)
[0:v]scale=1920:1080[bg];
[bg][slide1]overlay=...[s1];
[s1][talkinghead]overlay=...[scene];

# Camera transform (new)
[scene]scale=iw*{zoom}:ih*{zoom},
rotate={rotation}*PI/180:fillcolor=black,
crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
```

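The sketch below fills in the `{zoom}` and `{rotation}` placeholders for one fixed camera state, assuming a 1920x1080 output. `static_camera_filter()` is a hypothetical helper. Pan is left out because the stage above does not show how pan is applied. FFmpeg's `rotate` takes radians, so the degrees are converted inside the expression.

```python
def static_camera_filter(zoom: float, rotation_deg: float,
                         out_w: int = 1920, out_h: int = 1080) -> str:
    """Build the [scene] -> [vout] camera stage for one fixed camera state."""
    return (
        f"[scene]scale=iw*{zoom}:ih*{zoom},"
        f"rotate={rotation_deg}*PI/180:fillcolor=black,"
        f"crop={out_w}:{out_h}:(iw-{out_w})/2:(ih-{out_h})/2[vout]"
    )


print(static_camera_filter(1.25, -15.0))
```

The result is one link of a larger `-filter_complex` graph, and the `[scene]` label must be produced by the composition stage.
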
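For smooth animated zoom (using expressions), the filter below zooms from 1.0 to 1.25 between 5 s and 8 s and then holds 1.25. `zoompan` does not provide `t`, so time is derived from the output frame counter `on` at the fixed 30 fps:
```
[scene]zoompan=z='if(lt(on/30,5), 1, if(lt(on/30,8), 1+0.25*(on/30-5)/3, 1.25))':
x='iw/2-(iw/zoom/2)':
y='ih/2-(ih/zoom/2)':
d=1:s=1920x1080:fps=30[vout]
```
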
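The hand-written expression above generalizes to any sequence of zoom changes by nesting `if()` calls. The sketch below makes two assumptions: ramps are linear, matching the linear interpolation listed in Implementation Phase 3, and only zoom changes are handled. `ZoomStep` and `zoom_expression()` are hypothetical names. `t` defaults to the `on/30` time source used above.

```python
from dataclasses import dataclass


@dataclass
class ZoomStep:
    time: float      # transition start in seconds
    target: float    # zoom level after the transition
    duration: float  # seconds; 0.0 = instant snap


def zoom_expression(steps: list[ZoomStep], initial: float = 1.0, t: str = "on/30") -> str:
    """Nested if() expression that holds each zoom level and ramps linearly between them."""
    levels = [initial] + [s.target for s in steps]
    expr = f"{levels[-1]}"  # level after the last step
    # Build from the last step backwards so each level wraps everything after it.
    for i in range(len(steps) - 1, -1, -1):
        step, before = steps[i], levels[i]
        after_start = expr
        if step.duration > 0:
            end = step.time + step.duration
            ramp = f"{before}+({step.target}-{before})*({t}-{step.time})/{step.duration}"
            after_start = f"if(lt({t},{end}),{ramp},{expr})"
        expr = f"if(lt({t},{step.time}),{before},{after_start})"
    return expr


print(zoom_expression([ZoomStep(5.0, 1.25, 3.0)]))
# if(lt(on/30,5.0),1.0,if(lt(on/30,8.0),1.0+(1.25-1.0)*(on/30-5.0)/3.0,1.25))
```

The result contains commas, so it has to stay inside single quotes (`z='<expression>'`) when placed in a filtergraph.
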
### Camera Events in Timeline

New model for camera changes:

```python
from dataclasses import dataclass


@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0  # 0 = instant snap
    easing: str = "ease-out"
```

Markers map to camera events. Each marker replaces only its own property of the accumulated state (see consideration 3 below). The examples assume the camera starts from the default state:
- `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
- `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
- `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`

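The `easing` field still needs an FFmpeg-side formula; Implementation Phase 3 lists this as open. The approach below has two steps. First, compute clamped progress `p` from the event's start time and duration using FFmpeg's `clip()`. Then shape `p`. The sketch assumes quadratic curves, a common choice that this document does not prescribe. `eased_progress()` is a hypothetical helper. Its output would replace the linear `(t-start)/duration` term inside a ramp.

```python
def eased_progress(start: float, duration: float, easing: str = "ease-out", t: str = "t") -> str:
    """FFmpeg expression for eased progress in [0, 1] across one camera transition."""
    if duration <= 0:
        raise ValueError("instant (duration 0) events need no easing")
    p = f"clip(({t}-{start})/{duration},0,1)"  # linear progress, clamped to [0, 1]
    if easing == "linear":
        return p
    if easing == "ease-in":
        return f"pow({p},2)"  # slow start, fast finish
    if easing == "ease-out":
        return f"(1-pow(1-{p},2))"  # fast start, slow finish
    if easing == "ease-in-out":
        return f"if(lt({p},0.5),2*pow({p},2),1-2*pow(1-{p},2))"
    raise ValueError(f"unknown easing {easing!r}")


# Zoom ramp from 1.0 to 1.25 starting at 1.0 s, lasting 0.2 s, eased out
zoom = f"1.0+0.25*{eased_progress(1.0, 0.2)}"
```
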
### Considerations

1. **Overscan**: When zoomed in, we're cropping. The scene must be rendered
   larger than the output (e.g., 2x) to have room for zoom without quality loss.

2. **Rotation center**: Rotate around the frame center, not a corner.

3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
   are both active. `[Reset]` clears all.

4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
   transform naturally with the camera. No special handling needed.

5. **Slides stay synced**: Keynote exports are positioned for the base frame.
   Camera zoom/tilt transforms them identically to everything else.

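Rotation interacts with overscan. With `fillcolor=black`, a tilted frame that is not zoomed in enough shows black corners.

A centered w×h crop rotated by θ has a bounding box of (w|cos θ| + h|sin θ|) × (w|sin θ| + h|cos θ|). That box must fit inside the zoomed scene, so the smallest zoom that hides the corners is max(|cos θ| + (h/w)|sin θ|, (w/h)|sin θ| + |cos θ|). For 1920x1080 and 15° this comes out at about 1.43. By that bound, `[TiltLeft]` at zoom 1.0 would leave black corners.

A helper to check this is sketched below; `min_zoom_for_rotation()` is a hypothetical name.

```python
import math


def min_zoom_for_rotation(width: int, height: int, degrees: float) -> float:
    """Smallest zoom at which a rotated, centered width x height crop has no empty corners."""
    theta = math.radians(degrees)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return max(c + (height / width) * s, (width / height) * s + c)


print(round(min_zoom_for_rotation(1920, 1080, 15), 3))  # 1.426
```
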
---

## Implementation Plan

### Phase 1: Camera Data Model ✓
- [x] Add `CameraState` and `CameraEvent` to models.py
- [x] Add camera effect markers to transformer.py
- [x] Generate camera keyframes from markers

### Phase 2: Render Pipeline ✓
- [x] Modify renderer to compose to `[scene]` instead of `[vout]`
- [x] Add camera transform stage after composition
- [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now

### Phase 3: Smooth Animation (partial)
- [x] Support animated transitions between keyframes (linear interpolation)
- [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
- [ ] Test with rapid zoom sequences

### Phase 4: Effect Presets ✓
- [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
- [x] Presets defined in `CAMERA_PRESETS` dict in models.py
- [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement

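Parameterized markers such as `[Zoom:1.35]`, and `[PipMove:corner]` from the effect list, could be split into a name and an optional argument as sketched below. The regular expression and `parse_marker()` are hypothetical. How the transformer would then map `Zoom` plus `1.35` onto a `CameraState` is left open.

```python
import re
from typing import Optional

# A marker is a bracketed name, optionally followed by ":" and an argument.
MARKER_RE = re.compile(r"^\[(?P<name>[A-Za-z][A-Za-z0-9]*)(?::(?P<arg>[^\]]+))?\]$")


def parse_marker(text: str) -> tuple[str, Optional[str]]:
    """'[Zoom:1.35]' -> ('Zoom', '1.35'); '[Zoom2]' -> ('Zoom2', None)."""
    match = MARKER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a marker: {text!r}")
    return match.group("name"), match.group("arg")


name, arg = parse_marker("[Zoom:1.35]")
zoom = float(arg) if name == "Zoom" and arg is not None else None  # 1.35
```

Keeping plain preset markers like `[Zoom2]` argument-free means existing `CAMERA_PRESETS` lookups need not change.
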