Partial Rendering Specification

Overview

Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:

  • Faster iteration during development
  • Re-rendering specific sections after fixes
  • Parallel rendering of segments that can be concatenated later

Scope (v1)

In scope:

  • Camera state tracking (cumulative state must be computed from t=0)
  • Time offset adjustment for all events
  • Slide range filtering
  • Input video seeking

Out of scope (v1):

  • Audio events crossing range boundaries
  • Triggered video duration edge cases
  • Carry-over handling: every event is assumed to begin at its marker timestamp, so an event that starts before the range never "carries over" into it

Current Architecture Analysis

1. Camera State Management

Current behavior (transformer.py:250-332):

  • Camera state is cumulative across the transcript
  • _extract_camera_events() walks through ALL markers sequentially
  • Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
  • Example: [Zoom2] then [TiltLeft] = both zoom AND tilt active
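The cumulative behavior described above can be sketched with a small immutable state object. This is a minimal illustration, not the code in transformer.py: the field names, the default values, the preset table, and the apply_preset helper are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    # Field names and defaults are illustrative assumptions.
    zoom: float = 1.0
    tilt: float = 0.0
    pan: float = 0.0

    def is_default(self) -> bool:
        return self == CameraState()


# Hypothetical preset table: marker id -> (property it modifies, new value).
CAMERA_PRESETS = {
    "Zoom2": ("zoom", 2.0),
    "TiltLeft": ("tilt", -5.0),
    "PanRight": ("pan", 10.0),
}


def apply_preset(state: CameraState, marker_id: str) -> CameraState:
    """Return a new state with only the preset's own property changed."""
    prop, value = CAMERA_PRESETS[marker_id]
    return replace(state, **{prop: value})


# [Zoom2] followed by [TiltLeft]: the zoom survives the tilt marker.
state = CameraState()
for marker in ["Zoom2", "TiltLeft"]:
    state = apply_preset(state, marker)
```

Returning a new object for every marker keeps earlier snapshots valid, which matters once a snapshot has to be retained as the initial state of a partial render.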

Problem for partial rendering: If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.

Solution: Separate "state computation" from "event generation":

  1. Always walk through ALL transcript markers to compute cumulative state
  2. Track the "initial state" at the start of the render range
  3. Only emit CameraEvents for markers WITHIN the render range
  4. First event in partial render must transition FROM the computed initial state

2. Time Signature Adjustment

Current behavior: All timing uses absolute timestamps from transcript.csv:

  • SlideEvent.start_time/end_time
  • VideoEvent.start_time/end_time
  • AudioEvent.start_time
  • CameraEvent.time
  • FFmpeg expressions: enable=between(t, start, end)
  • Camera animation: if(between(t, 1.000, 1.200), ...)

Problem for partial rendering: If slide 10 starts at t=10.0s and we render from there, FFmpeg expects t=0 at the start of output.

Solution: Apply a time_offset to all events after extraction:

new_time = original_time - time_offset

Here time_offset is the start time of the first slide or event in the range.
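As a worked example of the shift, the sketch below rewrites an enable expression for an overlay that was visible from 12.5s to 15.0s in the full video, rendered with a range that starts at 10.0s. The helper name shift_enable_expr and the three-decimal formatting are assumptions for illustration.

```python
def shift_enable_expr(start: float, end: float, time_offset: float) -> str:
    """Build an FFmpeg enable expression in output-relative time."""
    new_start = start - time_offset
    new_end = end - time_offset
    return f"between(t,{new_start:.3f},{new_end:.3f})"


# Absolute 12.5s..15.0s becomes 2.5s..5.0s in the partial render.
expr = shift_enable_expr(12.5, 15.0, time_offset=10.0)
```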

3. Input Video Seeking

Current behavior:

  • Always-visible videos (talking head) start from the beginning
  • FFmpeg processes entire input duration

Problem for partial rendering: Need to seek into source videos to the correct position.

Solution: Add -ss <seek_time> before input files for always-visible videos:

ffmpeg -ss 10.0 -i talking_head.mov ...

Proposed API

Command Line Interface

# Render full video (current behavior)
gnommo render example/project.json output.mp4

# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5:  # S5 to end

# Render specific time range (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
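The range arguments above could be parsed along the following lines. This is a sketch only: the function names are hypothetical, and treating an empty start in --time as 0 is an assumption, since the examples above only show an open end.

```python
from typing import Optional


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """Parse 'S1:S10' or 'S5:' into (start_slide, end_slide or None)."""
    start, sep, end = spec.partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """Parse '60:120' or '60:' into (start_seconds, end_seconds or None)."""
    start, sep, end = spec.partition(":")
    if not sep:
        raise ValueError(f"expected START:END, got {spec!r}")
    start_s = float(start) if start else 0.0
    end_s = float(end) if end else None
    if end_s is not None and end_s <= start_s:
        raise ValueError("end time must be greater than start time")
    return start_s, end_s
```

Both helpers return tuples shaped like the slide_range and time_range parameters, so an open end maps directly to None.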

Internal API

New parameters for build_render_plan():

def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:

New field on RenderPlan:

@dataclass
class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
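One way a slide_range could be turned into a time range and a time_offset is sketched below. It assumes a mapping from slide id to start time is available (slide_starts is a hypothetical name) and that the end slide is exclusive, so the range stops where the end slide begins.

```python
from typing import Optional


def resolve_slide_range(
    slide_starts: dict[str, float],
    slide_range: tuple[str, Optional[str]],
) -> tuple[float, float]:
    """Map (start_slide, end_slide) to a half-open (start_time, end_time)."""
    start_id, end_id = slide_range
    start_time = slide_starts[start_id]
    # An open end means "render to the end of the video".
    end_time = slide_starts[end_id] if end_id is not None else float("inf")
    if end_time <= start_time:
        raise ValueError("end slide must start after the start slide")
    return start_time, end_time


slide_starts = {"S9": 8.0, "S10": 10.0, "S20": 24.5}
start, end = resolve_slide_range(slide_starts, ("S10", "S20"))
time_offset = start  # subtracted from every timestamp in the plan
```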

Implementation Details

Phase 1: Compute Full State, Filter Events

Modify _extract_camera_events() to accept a time range:

def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start, end)
) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
        - List of CameraEvents within time_range
        - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, None)
    if end_time is None:
        end_time = float('inf')  # Open-ended range (e.g. --slides S5:)

    found_start = False

    for timed_word in transcript:
        if not timed_word.is_marker:
            continue

        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue

        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)

        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True

        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))

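        current_state = new_state

    # No camera marker at or after start_time: the initial state is whatever
    # the earlier markers left behind, not the default
    if not found_start:
        initial_state = current_state

    return events, initial_state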
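The boundary behavior is easiest to check on a reduced model. The sketch below replaces CameraState with a single zoom value and markers with (time, zoom) pairs; split_at_range is a hypothetical stand-in for the full walk, and the default zoom of 1.0 is an assumption.

```python
def split_at_range(markers, start, end=float("inf")):
    """markers: (time, zoom) pairs sorted by time.

    Returns (markers inside [start, end), zoom in effect at start).
    """
    current = 1.0  # assumed default zoom
    initial = None
    in_range = []
    for at, zoom in markers:
        if initial is None and at >= start:
            initial = current  # state BEFORE the first in-range marker
        if start <= at < end:
            in_range.append((at, zoom))
        current = zoom
    if initial is None:
        # Every marker precedes the range: the last state carries over.
        initial = current
    return in_range, initial


markers = [(2.0, 2.0), (12.0, 1.5)]
events, initial_zoom = split_at_range(markers, start=10.0)
```

With a range starting at 10.0s, the marker at 2.0s is not emitted but still determines the initial zoom. With a range starting at 20.0s no marker is emitted and the initial zoom is the last value seen.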

Phase 2: Apply Time Offset

After extracting events, apply offset to all timestamps:

def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""

    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0, event.start_time - offset)

    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset

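    # Adjust total duration (assumes total_duration was already clamped to the
    # end of the render range; otherwise it still includes everything after it)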
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset

    return plan

Phase 3: FFmpeg Seeking

Modify build_ffmpeg_command() to add seeking:

def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]

    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            cmd.extend(["-ss", str(plan.input_seek_time)])  # Seek BEFORE -i
        cmd.extend(["-i", str(video_path)])
        ...
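A self-contained version of the input handling might look like the sketch below. The helper name build_input_args is hypothetical. It relies on FFmpeg applying an option to the next input that follows it, so -ss has to be repeated before each -i it should affect.

```python
from pathlib import Path


def build_input_args(video_paths: list[Path], seek_time: float) -> list[str]:
    """Return FFmpeg input arguments, seeking each input when requested."""
    args: list[str] = []
    for path in video_paths:
        if seek_time > 0:
            # As an input option, -ss applies only to the next -i.
            args += ["-ss", f"{seek_time:.3f}"]
        args += ["-i", str(path)]
    return args


cmd = ["ffmpeg", "-y"] + build_input_args([Path("talking_head.mov")], 10.0)
```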

Phase 4: Initial Camera State Handling

If initial_camera_state is not the default state, inject a "virtual" camera event at t=0:

def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
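A zero-duration event needs care if the transition is expressed as an interpolation over its duration, because the duration would end up as a divisor. The sketch below shows one possible treatment for a single scalar property. It is not the existing build_camera_transform: property_expr is a hypothetical helper, it interpolates linearly rather than with the "ease-out" easing, and it leaves out the escaping needed inside a filtergraph.

```python
def property_expr(initial: float, events: list[tuple[float, float, float]]) -> str:
    """Build an FFmpeg expression in t for one camera property.

    events: (time, target, duration) tuples sorted by time.
    """
    expr = f"{initial:.3f}"
    prev = initial
    for at, target, duration in events:
        if duration <= 0:
            # Instant change: a plain step, no division by the duration.
            expr = f"if(lt(t,{at:.3f}),{expr},{target:.3f})"
        else:
            end = at + duration
            ramp = (
                f"{prev:.3f}+({target:.3f}-{prev:.3f})"
                f"*(t-{at:.3f})/{duration:.3f}"
            )
            expr = (
                f"if(lt(t,{at:.3f}),{expr},"
                f"if(lt(t,{end:.3f}),{ramp},{target:.3f}))"
            )
        prev = target
    return expr


# Virtual event at t=0 that instantly applies an initial zoom of 2.0.
step = property_expr(1.0, [(0.0, 2.0, 0.0)])
```

An alternative with the same effect is to pass the initial state as the base value and skip the virtual event entirely, as in property_expr(2.0, []).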

FFmpeg Optimization

Only emit filters for events within range.

When rendering a partial range, the RenderPlan should only contain events within that range. This means:

  • Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
  • Fewer overlay filters in filter_complex
  • Fewer between(t, start, end) enable expressions to evaluate per frame

Example: Full video has 50 slides, rendering S40:S50 only (S40 through S49, assuming the end slide is exclusive):

  • Before: 50 slide inputs, 50 overlay filters
  • After: 10 slide inputs, 10 overlay filters

This is achieved naturally by filtering events in build_render_plan() before constructing the plan - the renderer already only processes events present in the plan.


Edge Cases (v1 Simplified)

1. Camera state from before range

If rendering S5:S10 but there's a camera event at the S4 marker:

  • Camera state from S4 must be captured as initial_camera_state
  • Rendered output starts with that state already applied at t=0

2. Events are filtered by marker position

All events (slides, videos, audio) are filtered by whether their START marker falls within the range.

  • Events beginning outside range are excluded
  • No "carry over" or boundary-crossing logic needed
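The filtering rule amounts to a half-open check on each event's start time. A minimal sketch, using a stand-in Event type rather than the real SlideEvent, VideoEvent and AudioEvent classes:

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Stand-in for the real event classes; only start_time matters here.
    name: str
    start_time: float


def filter_by_start(events: list[Event], start: float, end: float) -> list[Event]:
    """Keep events whose start marker falls inside [start, end)."""
    return [e for e in events if start <= e.start_time < end]


events = [Event("S4", 8.0), Event("S5", 10.0), Event("S9", 19.0), Event("S10", 20.0)]
kept = filter_by_start(events, start=10.0, end=20.0)
```

The event that starts exactly at the end boundary is excluded, so it belongs to the next segment and adjacent segments do not render it twice.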

Testing Strategy

Unit Tests

  1. Camera state computation maintains state across full transcript
  2. Time offset correctly shifts all event types
  3. Initial camera state correctly captured at boundary

Integration Tests

  1. Render slides 1-5, then 5-10, concatenate, compare to full render
  2. Camera state continuity across segment boundaries
  3. Audio alignment after seeking

Manual Verification

  1. Visual inspection of camera state at segment boundaries
  2. Audio sync verification

Future Enhancements

Parallel Rendering Pipeline

# Render in parallel, then stitch
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait
ffmpeg -f concat -i segments.txt -c copy final.mp4
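The segments.txt file used above is not defined elsewhere in this spec. The concat demuxer reads one file directive per line, so the list could be generated as sketched below. The helper name concat_list_text is an assumption, and the quoting does not handle paths that contain single quotes.

```python
from pathlib import Path


def concat_list_text(segments: list[Path]) -> str:
    """Return the contents of a concat demuxer list, one segment per line."""
    lines = [f"file '{seg.as_posix()}'" for seg in segments]
    return "\n".join(lines) + "\n"


text = concat_list_text([Path("seg1.mp4"), Path("seg2.mp4"), Path("seg3.mp4")])
# To write it out: Path("segments.txt").write_text(text, encoding="utf-8")
```

Stream copy with -c copy only works when all segments share the same codecs and encoding parameters, which should hold if every segment comes from the same project and render settings.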

Smart Re-rendering

Track which slides changed and only re-render affected segments.

Preview Mode

Quick low-quality render of specific section for review.