Partial Rendering Specification
Overview
Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
- Faster iteration during development
- Re-rendering specific sections after fixes
- Parallel rendering of segments that can be concatenated later
Scope (v1)
In scope:
- Camera state tracking (cumulative state must be computed from t=0)
- Time offset adjustment for all events
- Slide range filtering
- Input video seeking
Out of scope (v1):
- Audio events crossing range boundaries
- Triggered video duration edge cases
- Events are assumed to begin at their marker timestamp and never "carry over"
Current Architecture Analysis
1. Camera State Management
Current behavior (transformer.py:250-332):
- Camera state is cumulative across the transcript
- _extract_camera_events() walks through ALL markers sequentially
- Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
- Example: [Zoom2] then [TiltLeft] = both zoom AND tilt active
Problem for partial rendering: If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.
Solution: Separate "state computation" from "event generation":
- Always walk through ALL transcript markers to compute cumulative state
- Track the "initial state" at the start of the render range
- Only emit CameraEvents for markers WITHIN the render range
- First event in partial render must transition FROM the computed initial state
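The "walk everything, emit only what is in range" idea can be illustrated with a small self-contained sketch. The CameraState fields, the PRESETS table, and the state_at() helper below are hypothetical stand-ins chosen for illustration; the real CameraState and CAMERA_PRESETS in transformer.py may look different.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    # Hypothetical fields; the real CameraState may differ.
    zoom: float = 1.0
    tilt: float = 0.0
    pan: float = 0.0


# Hypothetical presets: each marker overrides only the properties it names.
PRESETS = {
    "Zoom2": {"zoom": 2.0},
    "TiltLeft": {"tilt": -5.0},
    "PanRight": {"pan": 0.2},
}


def state_at(markers, start_time):
    """Fold every camera marker before start_time into one CameraState."""
    state = CameraState()
    for time, marker_id in markers:
        if time >= start_time:
            break  # markers are assumed to be sorted by time
        state = replace(state, **PRESETS[marker_id])
    return state


markers = [(2.0, "Zoom2"), (6.0, "TiltLeft"), (12.0, "PanRight")]
initial = state_at(markers, start_time=10.0)
print(initial)  # zoom and tilt are both active, pan is still at its default
```

A render that starts at t=10.0 would begin from this state, and only the PanRight marker at t=12.0 would become a CameraEvent.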
2. Time Signature Adjustment
Current behavior:
All timing uses absolute timestamps from transcript.csv:
- SlideEvent.start_time / end_time
- VideoEvent.start_time / end_time
- AudioEvent.start_time
- CameraEvent.time
- FFmpeg expressions: enable=between(t, start, end)
- Camera animation: if(between(t, 1.000, 1.200), ...)
Problem for partial rendering: If slide 10 starts at t=10.0s and we render from there, FFmpeg expects t=0 at the start of output.
Solution:
Apply a time_offset to all events after extraction:
new_time = original_time - time_offset
Where time_offset = start time of first slide/event in range.
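As a worked example of the shift, the sketch below rewrites an enable expression for a range that starts at t=10.0s. The helper names shift_time() and enable_expr() are hypothetical, and any quoting or escaping needed inside filter_complex is deliberately left out.

```python
def shift_time(original_time: float, time_offset: float) -> float:
    """new_time = original_time - time_offset"""
    return original_time - time_offset


def enable_expr(start: float, end: float, time_offset: float) -> str:
    """Build a between() expression in output time (t=0 at the range start)."""
    new_start = shift_time(start, time_offset)
    new_end = shift_time(end, time_offset)
    return f"between(t, {new_start:.3f}, {new_end:.3f})"


# A slide shown from 10.0s to 14.5s in the full video, rendered from t=10.0s:
print(enable_expr(10.0, 14.5, time_offset=10.0))  # between(t, 0.000, 4.500)
```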
3. Input Video Seeking
Current behavior:
- Always-visible videos (talking head) start from the beginning
- FFmpeg processes entire input duration
Problem for partial rendering: Need to seek into source videos to the correct position.
Solution:
Add -ss <seek_time> before input files for always-visible videos:
ffmpeg -ss 10.0 -i talking_head.mov ...
Proposed API
Command Line Interface
# Render full video (current behavior)
gnommo render example/project.json output.mp4
# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5: # S5 to end
# Render specific time range (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
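The spec does not say how these flag values are parsed. One possible sketch, assuming the START:END syntax shown above with an optional END, is given below; parse_slide_range() and parse_time_range() are hypothetical helper names, not existing gnommo functions.

```python
from typing import Optional


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """Parse 'S1:S10' or 'S5:' into (start_slide, end_slide or None)."""
    start, sep, end = spec.strip().partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """Parse '60:120' or '60:' into (start_seconds, end_seconds or None)."""
    start, sep, end = spec.strip().partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    start_s = float(start)
    end_s = float(end) if end else None
    if end_s is not None and end_s <= start_s:
        raise ValueError(f"end must be greater than start, got {spec!r}")
    return start_s, end_s


print(parse_slide_range("S5:"))
print(parse_time_range("60:120"))
```

The returned tuples have the same shape as the slide_range and time_range parameters proposed for build_render_plan().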
Internal API
New parameters for build_render_plan():
def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:
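Since the camera walk and the time offset both work in seconds, a slide_range presumably has to be resolved to a time range first. The sketch below shows one way to do that. It assumes the end slide is exclusive (the range stops where the end slide starts), and the Slide class, its slide_id field, and resolve_slide_range() are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slide:
    # Hypothetical stand-in for SlideEvent; slide_id is an assumed field.
    slide_id: str
    start_time: float
    end_time: float


def resolve_slide_range(
    slides: list[Slide],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, float]:
    """Map (start_slide, end_slide) to (start_time, end_time) in seconds."""
    start_id, end_id = slide_range
    starts = {s.slide_id: s.start_time for s in slides}
    if start_id not in starts:
        raise KeyError(f"unknown start slide {start_id!r}")
    start = starts[start_id]
    if end_id is None:
        end = max(s.end_time for s in slides)  # open range: run to the end
    elif end_id in starts:
        end = starts[end_id]  # assumption: the end slide itself is excluded
    else:
        raise KeyError(f"unknown end slide {end_id!r}")
    if end <= start:
        raise ValueError("end slide must come after start slide")
    return start, end


slides = [Slide("S1", 0.0, 4.0), Slide("S2", 4.0, 9.0), Slide("S3", 9.0, 15.0)]
print(resolve_slide_range(slides, ("S2", "S3")))
print(resolve_slide_range(slides, ("S2", None)))
```

The first element of the result is also the natural candidate for time_offset.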
New field on RenderPlan:
@dataclass
class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
Implementation Details
Phase 1: Compute Full State, Filter Events
Modify _extract_camera_events() to accept a time range:
def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, float]] = None,  # (start, end)
) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
        - List of CameraEvents within time_range
        - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, float('inf'))
    found_start = False

    for timed_word in transcript:
        if not timed_word.is_marker:
            continue
        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue

        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)

        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True

        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))

        current_state = new_state

    # If no camera marker occurs at or after start_time, the range
    # inherits the final cumulative state instead of the default.
    if not found_start:
        initial_state = current_state

    return events, initial_state
Phase 2: Apply Time Offset
After extracting events, apply offset to all timestamps:
def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""
    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0, event.start_time - offset)

    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset

    # Adjust total duration
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset
    return plan
Phase 3: FFmpeg Seeking
Modify build_ffmpeg_command() to add seeking:
def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]

    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            cmd.extend(["-ss", str(plan.input_seek_time)])  # Seek BEFORE -i
        cmd.extend(["-i", str(video_path)])
    ...
Phase 4: Initial Camera State Handling
If initial_camera_state is not default, inject a "virtual" camera event at t=0:
def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
FFmpeg Optimization
Only emit filters for events within range.
When rendering a partial range, the RenderPlan should only contain events within that range. This means:
- Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
- Fewer overlay filters in filter_complex
- Fewer between(t, start, end) enable expressions to evaluate per frame
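Example: the full video has 50 slides and only S40:S50 is rendered (10 slides, S40 through S49, assuming the end of the range is exclusive like the half-open time filter in Phase 1):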
- Before: 50 slide inputs, 50 overlay filters
- After: 10 slide inputs, 10 overlay filters
This is achieved naturally by filtering events in build_render_plan() before constructing the plan - the renderer already only processes events present in the plan.
Edge Cases (v1 Simplified)
1. Camera state from before range
If rendering S5:S10 but there's a camera event at the S4 marker:
- Camera state from S4 must be captured as initial_camera_state
- Rendered output starts with that state already applied at t=0
2. Events filter by marker position
All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
- Events beginning outside range are excluded
- No "carry over" or boundary-crossing logic needed
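The start-marker rule can be sketched as a single filter. The Event class and filter_by_start() below are hypothetical; the half-open comparison mirrors the one used for camera events in Phase 1, and applying it unchanged to slides, videos, and audio is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical stand-in for SlideEvent / VideoEvent / AudioEvent.
    name: str
    start_time: float


def filter_by_start(events, start, end=float("inf")):
    """Keep events whose start marker lies in the half-open range [start, end)."""
    return [e for e in events if start <= e.start_time < end]


events = [
    Event("S4", 8.0),    # starts before the range: dropped, no carry-over
    Event("S5", 10.0),   # starts exactly at the range start: kept
    Event("S9", 19.5),   # starts inside the range: kept
    Event("S10", 20.0),  # starts exactly at the range end: dropped
]
kept = filter_by_start(events, 10.0, 20.0)
print([e.name for e in kept])  # ['S5', 'S9']
```

With this rule, an event that starts at a boundary belongs to exactly one of two adjacent segments.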
Testing Strategy
Unit Tests
- Camera state computation maintains state across full transcript
- Time offset correctly shifts all event types
- Initial camera state correctly captured at boundary
Integration Tests
- Render slides 1-5, then 5-10, concatenate, compare to full render
- Camera state continuity across segment boundaries
- Audio alignment after seeking
Manual Verification
- Visual inspection of camera state at segment boundaries
- Audio sync verification
Future Enhancements
Parallel Rendering Pipeline
# Render in parallel, then concatenate
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait
ffmpeg -f concat -i segments.txt -c copy final.mp4
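The concat command above reads segments.txt, which the spec does not show. FFmpeg's concat demuxer expects one file '<path>' line per segment, in playback order. A minimal sketch for generating that list follows; concat_list_text() and write_concat_list() are hypothetical helper names.

```python
from pathlib import Path


def concat_list_text(segments) -> str:
    """Return concat-demuxer list content, one "file '<path>'" line per segment."""
    lines = []
    for segment in segments:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(segment).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(segments, list_path="segments.txt") -> Path:
    """Write the list file consumed by `ffmpeg -f concat -i <list_path>`."""
    path = Path(list_path)
    path.write_text(concat_list_text(segments), encoding="utf-8")
    return path


print(concat_list_text(["seg1.mp4", "seg2.mp4", "seg3.mp4"]), end="")
```

Stream copy (-c copy) only concatenates cleanly when every segment was encoded with the same codec parameters, so the parallel renders should share identical output settings.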
Smart Re-rendering
Track which slides changed and only re-render affected segments.
Preview Mode
Quick low-quality render of specific section for review.