Adding changes version 1
@@ -0,0 +1,317 @@
|
|||||||
|
# Partial Rendering Specification
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Enable rendering of specific sections of a video (e.g., slides 1-10, then 10-20) instead of the full video. This is useful for:
|
||||||
|
- Faster iteration during development
|
||||||
|
- Re-rendering specific sections after fixes
|
||||||
|
- Parallel rendering of segments that can be concatenated later
|
||||||
|
|
||||||
|
## Scope (v1)
|
||||||
|
|
||||||
|
**In scope:**
|
||||||
|
- Camera state tracking (cumulative state must be computed from t=0)
|
||||||
|
- Time offset adjustment for all events
|
||||||
|
- Slide range filtering
|
||||||
|
- Input video seeking
|
||||||
|
|
||||||
|
**Out of scope (v1):**
|
||||||
|
- Audio events crossing range boundaries
|
||||||
|
- Triggered video duration edge cases
|
||||||
|
- Carry-over handling: events are assumed to begin at their marker timestamp and never "carry over" into a later range
|
||||||
|
|
||||||
|
## Current Architecture Analysis
|
||||||
|
|
||||||
|
### 1. Camera State Management
|
||||||
|
|
||||||
|
**Current behavior** (`transformer.py:250-332`):
|
||||||
|
- Camera state is **cumulative** across the transcript
|
||||||
|
- `_extract_camera_events()` walks through ALL markers sequentially
|
||||||
|
- Each marker type (Zoom/Tilt/Pan) only modifies its property while preserving others
|
||||||
|
- Example: `[Zoom2]` then `[TiltLeft]` = both zoom AND tilt active
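
The cumulative behavior described above can be sketched as follows. This is a minimal illustration, not the code in `transformer.py`: the preset table contents and the `apply_preset` / `state_after` helpers are hypothetical, and the real `_apply_preset()` may take different arguments.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


# Hypothetical preset table: each marker lists only the fields it changes.
CAMERA_PRESETS = {
    "Zoom2": {"zoom": 1.25},
    "TiltLeft": {"rotation": -15.0},
    "NoTilt": {"rotation": 0.0},
    "Reset": {"zoom": 1.0, "rotation": 0.0, "pan_x": 0.0, "pan_y": 0.0},
}


def apply_preset(state: CameraState, marker_id: str) -> CameraState:
    """Return a new state with only the preset's fields overridden."""
    return replace(state, **CAMERA_PRESETS[marker_id])


def state_after(markers: list[str]) -> CameraState:
    """Walk the markers in order and return the accumulated camera state."""
    state = CameraState()
    for marker_id in markers:
        state = apply_preset(state, marker_id)
    return state
```

With this sketch, `state_after(["Zoom2", "TiltLeft"])` keeps the zoom from the first marker and adds the tilt from the second, which is why a partial render cannot simply start from the default state.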
|
||||||
|
|
||||||
|
**Problem for partial rendering**:
|
||||||
|
If we start rendering at slide 10, we need the camera state AS IT WOULD BE after processing slides 1-9.
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
Separate "state computation" from "event generation":
|
||||||
|
1. Always walk through ALL transcript markers to compute cumulative state
|
||||||
|
2. Track the "initial state" at the start of the render range
|
||||||
|
3. Only emit CameraEvents for markers WITHIN the render range
|
||||||
|
4. First event in partial render must transition FROM the computed initial state
|
||||||
|
|
||||||
|
### 2. Timestamp Adjustment
|
||||||
|
|
||||||
|
**Current behavior**:
|
||||||
|
All timing uses absolute timestamps from `transcript.csv`:
|
||||||
|
- `SlideEvent.start_time/end_time`
|
||||||
|
- `VideoEvent.start_time/end_time`
|
||||||
|
- `AudioEvent.start_time`
|
||||||
|
- `CameraEvent.time`
|
||||||
|
- FFmpeg expressions: `enable=between(t, start, end)`
|
||||||
|
- Camera animation: `if(between(t, 1.000, 1.200), ...)`
|
||||||
|
|
||||||
|
**Problem for partial rendering**:
|
||||||
|
If slide 10 starts at t=10.0s and we render from there, FFmpeg expects t=0 at the start of output.
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
Apply a `time_offset` to all events after extraction:
|
||||||
|
```
new_time = original_time - time_offset
```
|
||||||
|
Where `time_offset` = start time of first slide/event in range.
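
As a worked example of the formula, the sketch below shifts an event's absolute times and formats the resulting FFmpeg `between(...)` expression. The helper name `enable_expr` and the three-decimal formatting are assumptions made for illustration; the renderer's actual formatting may differ.

```python
def enable_expr(start: float, end: float, time_offset: float) -> str:
    """Build an enable expression relative to the start of the partial render."""
    new_start = start - time_offset
    new_end = end - time_offset
    return f"between(t,{new_start:.3f},{new_end:.3f})"


# Slide 10 is shown from 10.0s to 14.5s in the full video.
# When rendering from slide 10, time_offset is 10.0.
expr = enable_expr(10.0, 14.5, time_offset=10.0)
```

Here `expr` is `between(t,0.000,4.500)`: the slide becomes visible at the first frame of the partial output instead of ten seconds in.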
|
||||||
|
|
||||||
|
### 3. Input Video Seeking
|
||||||
|
|
||||||
|
**Current behavior**:
|
||||||
|
- Always-visible videos (talking head) start from the beginning
|
||||||
|
- FFmpeg processes entire input duration
|
||||||
|
|
||||||
|
**Problem for partial rendering**:
|
||||||
|
Source videos must be seeked to the position that corresponds to the start of the render range.
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
Add `-ss <seek_time>` before input files for always-visible videos:
|
||||||
|
```
ffmpeg -ss 10.0 -i talking_head.mov ...
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Proposed API
|
||||||
|
|
||||||
|
### Command Line Interface
|
||||||
|
|
||||||
|
```bash
# Render full video (current behavior)
gnommo render example/project.json output.mp4

# Render specific slide range
gnommo render example/project.json output.mp4 --slides S1:S10
gnommo render example/project.json output.mp4 --slides S10:S20
gnommo render example/project.json output.mp4 --slides S5:   # S5 to end

# Render specific time range (alternative)
gnommo render example/project.json output.mp4 --time 0:60
gnommo render example/project.json output.mp4 --time 60:120
```
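
One possible way to turn the `--slides` and `--time` values into the tuples used by the internal API is sketched below. The function names `parse_slide_range` and `parse_time_range` are hypothetical, and the rule that an empty end means "to the end" is taken from the `S5:` example above.

```python
from typing import Optional


def parse_slide_range(spec: str) -> tuple[str, Optional[str]]:
    """Parse 'S1:S10' or 'S5:' into (start_slide, end_slide or None)."""
    start, sep, end = spec.strip().partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    return start, (end or None)


def parse_time_range(spec: str) -> tuple[float, Optional[float]]:
    """Parse '60:120' or '60:' into (start_time, end_time or None) in seconds."""
    start, sep, end = spec.strip().partition(":")
    if not sep or not start:
        raise ValueError(f"expected START:END or START:, got {spec!r}")
    start_s = float(start)
    end_s = float(end) if end else None
    if end_s is not None and end_s <= start_s:
        raise ValueError(f"end must be greater than start in {spec!r}")
    return start_s, end_s
```

Rejecting a missing colon early keeps a typo such as `--slides S10` from silently rendering the whole video.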
|
||||||
|
|
||||||
|
### Internal API
|
||||||
|
|
||||||
|
New parameters for `build_render_plan()`:
|
||||||
|
```python
def build_render_plan(
    ...
    slide_range: Optional[tuple[str, Optional[str]]] = None,  # (start_slide, end_slide)
    # OR
    time_range: Optional[tuple[float, Optional[float]]] = None,  # (start_time, end_time)
) -> RenderPlan:
```
|
||||||
|
|
||||||
|
New field on `RenderPlan`:
|
||||||
|
```python
@dataclass
class RenderPlan:
    ...
    time_offset: float = 0.0  # Offset to subtract from all timestamps
    initial_camera_state: CameraState = field(default_factory=CameraState)  # State at render start
    input_seek_time: float = 0.0  # Seek position for input videos
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Details
|
||||||
|
|
||||||
|
### Phase 1: Compute Full State, Filter Events
|
||||||
|
|
||||||
|
Modify `_extract_camera_events()` to accept a time range:
|
||||||
|
|
||||||
|
```python
def _extract_camera_events(
    transcript: list[TimedWord],
    time_range: Optional[tuple[float, float]] = None,  # (start, end)
) -> tuple[list[CameraEvent], CameraState]:
    """
    Returns:
        - List of CameraEvents within time_range
        - Initial CameraState at start of time_range
    """
    events: list[CameraEvent] = []
    current_state = CameraState()
    initial_state = CameraState()
    start_time, end_time = time_range or (0.0, float('inf'))

    found_start = False

    for timed_word in transcript:
        if not timed_word.is_marker:
            continue

        marker_id = timed_word.marker_id
        if not marker_id or marker_id not in CAMERA_PRESETS:
            continue

        # Always update current_state (full walk)
        preset = CAMERA_PRESETS[marker_id]
        new_state = _apply_preset(current_state, marker_id, preset)

        # Capture state just before we enter the render range
        if not found_start and timed_word.time >= start_time:
            initial_state = current_state  # State BEFORE this marker
            found_start = True

        # Only emit events within range
        if start_time <= timed_word.time < end_time:
            events.append(CameraEvent(
                time=timed_word.time,
                target_state=new_state,
                duration=0.2,
                easing="ease-out",
            ))

        current_state = new_state

    # If no camera marker occurs at or after start_time, the state
    # accumulated so far is still the state at the start of the range.
    if not found_start:
        initial_state = current_state

    return events, initial_state
```
|
||||||
|
|
||||||
|
### Phase 2: Apply Time Offset
|
||||||
|
|
||||||
|
After extracting events, apply offset to all timestamps:
|
||||||
|
|
||||||
|
```python
def _apply_time_offset(plan: RenderPlan, offset: float) -> RenderPlan:
    """Shift all timestamps by offset (subtract offset from all times)."""

    # Adjust slide events
    for event in plan.slide_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust video events
    for event in plan.video_events:
        event.start_time -= offset
        event.end_time -= offset

    # Adjust audio events
    for event in plan.audio_events:
        event.start_time = max(0, event.start_time - offset)

    # Adjust camera events
    for event in plan.camera_events:
        event.time -= offset

    # Adjust total duration
    plan.total_duration -= offset
    plan.time_offset = offset
    plan.input_seek_time = offset

    return plan
```
|
||||||
|
|
||||||
|
### Phase 3: FFmpeg Seeking
|
||||||
|
|
||||||
|
Modify `build_ffmpeg_command()` to add seeking:
|
||||||
|
|
||||||
|
```python
def build_ffmpeg_command(plan: RenderPlan, output_path: Path) -> list[str]:
    cmd = ["ffmpeg", "-y"]

    # Add seek for always-visible videos
    for video_id, video_source, cutout in plan.narration_videos:
        video_path = _resolve_video_path(videos_dir, video_source)
        if plan.input_seek_time > 0:
            cmd.extend(["-ss", str(plan.input_seek_time)])  # Seek BEFORE -i
        cmd.extend(["-i", str(video_path)])
    ...
```
|
||||||
|
|
||||||
|
### Phase 4: Initial Camera State Handling
|
||||||
|
|
||||||
|
If `initial_camera_state` is not default, inject a "virtual" camera event at t=0:
|
||||||
|
|
||||||
|
```python
def build_camera_transform(
    camera_events: list[CameraEvent],
    initial_state: CameraState,  # NEW PARAMETER
    ...
) -> str:
    # If initial state differs from default, prepend a virtual event
    if not initial_state.is_default():
        initial_event = CameraEvent(
            time=0.0,
            target_state=initial_state,
            duration=0.0,  # Instant - no transition
            easing="linear",
        )
        camera_events = [initial_event] + camera_events
    ...
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## FFmpeg Optimization
|
||||||
|
|
||||||
|
**Only emit filters for events within range.**
|
||||||
|
|
||||||
|
When rendering a partial range, the `RenderPlan` should only contain events within that range. This means:
|
||||||
|
- Fewer inputs added to the FFmpeg command (only slides/videos/audio actually used)
|
||||||
|
- Fewer overlay filters in filter_complex
|
||||||
|
- Fewer `between(t, start, end)` enable expressions to evaluate per frame
|
||||||
|
|
||||||
|
Example: Full video has 50 slides, rendering S40:S50 only:
|
||||||
|
- **Before**: 50 slide inputs, 50 overlay filters
|
||||||
|
- **After**: 10 slide inputs, 10 overlay filters
|
||||||
|
|
||||||
|
This is achieved naturally by filtering events in `build_render_plan()` before constructing the plan - the renderer already only processes events present in the plan.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Edge Cases (v1 Simplified)
|
||||||
|
|
||||||
|
### 1. Camera state from before range
|
||||||
|
If rendering S5:S10 but there's a camera event at the S4 marker:
|
||||||
|
- Camera state from S4 must be captured as `initial_camera_state`
|
||||||
|
- Rendered output starts with that state already applied at t=0
|
||||||
|
|
||||||
|
### 2. Events filter by marker position
|
||||||
|
All events (slides, videos, audio) are filtered by whether their START marker falls within the range.
|
||||||
|
- Events beginning outside range are excluded
|
||||||
|
- No "carry over" or boundary-crossing logic needed
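
The start-marker rule above can be sketched as a single filter. The `Event` class and `filter_by_start` helper are hypothetical stand-ins for the real event models; the half-open interval matches the `start_time <= time < end_time` check used for camera events.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    start_time: float
    end_time: float


def filter_by_start(events: list[Event], start: float, end: float = math.inf) -> list[Event]:
    """Keep events whose START falls in [start, end); ignore where they end."""
    return [e for e in events if start <= e.start_time < end]


events = [
    Event("S9", 8.0, 10.0),    # starts before the range: excluded
    Event("S10", 10.0, 14.0),  # starts exactly at the range start: included
    Event("S19", 19.9, 25.0),  # starts inside, ends after: still included
    Event("S20", 20.0, 30.0),  # starts at the range end: excluded
]
kept = [e.name for e in filter_by_start(events, 10.0, 20.0)]
```

Because the end of the range is exclusive, adjacent segments such as `[0, 10)` and `[10, 20)` never both claim the same event.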
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Strategy
|
||||||
|
|
||||||
|
### Unit Tests
|
||||||
|
1. Camera state computation maintains state across full transcript
|
||||||
|
2. Time offset correctly shifts all event types
|
||||||
|
3. Initial camera state correctly captured at boundary
|
||||||
|
|
||||||
|
### Integration Tests
|
||||||
|
1. Render slides 1-5, then 5-10, concatenate, compare to full render
|
||||||
|
2. Camera state continuity across segment boundaries
|
||||||
|
3. Audio alignment after seeking
|
||||||
|
|
||||||
|
### Manual Verification
|
||||||
|
1. Visual inspection of camera state at segment boundaries
|
||||||
|
2. Audio sync verification
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Future Enhancements
|
||||||
|
|
||||||
|
### Parallel Rendering Pipeline
|
||||||
|
```bash
# Render in parallel, then concatenate
gnommo render proj.json seg1.mp4 --slides S1:S10 &
gnommo render proj.json seg2.mp4 --slides S10:S20 &
gnommo render proj.json seg3.mp4 --slides S20: &
wait
ffmpeg -f concat -i segments.txt -c copy final.mp4
```
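
The `segments.txt` file read by the concat step could be generated along these lines. This is a sketch: `build_concat_list` is a hypothetical helper, and it only emits the `file '...'` directive of FFmpeg's concat demuxer. If the list contains absolute paths, the concat demuxer is expected to need `-safe 0` on the `ffmpeg` command line.

```python
from pathlib import Path


def build_concat_list(segments: list[Path]) -> str:
    """Return the text of a concat-demuxer list, one 'file' line per segment."""
    lines = []
    for segment in segments:
        # Inside single quotes, a literal quote is written as '\'' in FFmpeg's
        # quoting rules (close quote, escaped quote, reopen quote).
        escaped = str(segment).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


listing = build_concat_list([Path("seg1.mp4"), Path("seg2.mp4"), Path("seg3.mp4")])
# Path("segments.txt").write_text(listing, encoding="utf-8")
```

Segments must be listed in playback order, since the concat demuxer joins them in the order they appear in the file.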
|
||||||
|
|
||||||
|
### Smart Re-rendering
|
||||||
|
Track which slides changed and only re-render affected segments.
|
||||||
|
|
||||||
|
### Preview Mode
|
||||||
|
Quick low-quality render of specific section for review.
|
||||||
@@ -0,0 +1,265 @@
|
|||||||
|
# Virtual Camera Effects
|
||||||
|
|
||||||
|
Ideas for "stuff happening" to keep viewers engaged in edutainment videos.
|
||||||
|
These effects are triggered by markers in the manuscript, just like slides.
|
||||||
|
|
||||||
|
## Zoom Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[Zoom1]` | Zoom to 110% - subtle emphasis |
| `[Zoom2]` | Zoom to 125% - moderate emphasis |
| `[Zoom3]` | Zoom to 150% - strong emphasis |
| `[Zoom0]` | Return to 100% (default) |
| `[ZoomPunch]` | Quick zoom in + out (single beat emphasis) |
|
||||||
|
|
||||||
|
**Use case:** Rapid `[Zoom1][Zoom2][Zoom3]` for comedic/dramatic triple emphasis.
|
||||||
|
|
||||||
|
## Tilt/Rotation Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[TiltLeft]` | Rotate -15 degrees |
| `[TiltRight]` | Rotate +15 degrees |
| `[NoTilt]` | Return to 0 degrees |
| `[TiltShake]` | Quick left-right shake (confusion/emphasis) |
|
||||||
|
|
||||||
|
**Use case:** Tilt when saying something "off" or wrong, return to flat for correction.
|
||||||
|
|
||||||
|
## Pan/Position Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[PanLeft]` | Shift frame left (subject moves right) |
| `[PanRight]` | Shift frame right (subject moves left) |
| `[PanUp]` | Shift frame up |
| `[PanDown]` | Shift frame down |
| `[PanCenter]` | Return to center |
|
||||||
|
|
||||||
|
**Use case:** Pan to make room for a slide appearing on one side.
|
||||||
|
|
||||||
|
## Shake/Movement Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[Shake]` | Brief screen shake (impact, surprise) |
| `[ShakeHard]` | Intense shake (explosion, error) |
| `[Wobble]` | Gentle continuous wobble |
| `[NoWobble]` | Stop wobble |
|
||||||
|
|
||||||
|
**Use case:** Shake on "WRONG!" or when something crashes/fails.
|
||||||
|
|
||||||
|
## Speed/Rhythm Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[Beat]` | Single visual pulse (scale bump) |
| `[BeatStart]` | Start pulsing to rhythm |
| `[BeatStop]` | Stop pulsing |
|
||||||
|
|
||||||
|
**Use case:** Rhythmic emphasis during lists or key points.
|
||||||
|
|
||||||
|
## Transition Effects
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[Flash]` | Quick white flash |
| `[Blackout]` | Brief black frame |
| `[Glitch]` | Digital glitch effect |
|
||||||
|
|
||||||
|
**Use case:** Transition between topics or for "record scratch" moments.
|
||||||
|
|
||||||
|
## Picture-in-Picture Variations
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[PipGrow]` | Enlarge talking head cutout |
| `[PipShrink]` | Shrink talking head cutout |
| `[PipHide]` | Temporarily hide talking head |
| `[PipShow]` | Restore talking head |
| `[PipMove:corner]` | Move pip to different corner |
|
||||||
|
|
||||||
|
**Use case:** Shrink self when showing important diagram, grow when making personal point.
|
||||||
|
|
||||||
|
## Combination Presets
|
||||||
|
|
||||||
|
| Marker | Description |
|--------|-------------|
| `[Emphasis]` | Zoom2 + slight tilt (general emphasis) |
| `[Surprise]` | Quick zoom + shake |
| `[Sarcasm]` | Slow zoom + tilt |
| `[Reset]` | Return all effects to default |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture: The Camera Abstraction
|
||||||
|
|
||||||
|
### The Core Insight
|
||||||
|
|
||||||
|
All visual elements (slides, cutouts, talking head, background) exist in a **scene**.
|
||||||
|
The **camera** views the scene. When the camera zooms, tilts, or pans - everything
|
||||||
|
moves together, just like a real camera filming a physical set.
|
||||||
|
|
||||||
|
```
┌─────────────────────────────────────────────────────────┐
│                          SCENE                          │
│  ┌─────────────────────────────────────────────────┐    │
│  │                Background Layer                 │    │
│  │  ┌─────────────┐                                │    │
│  │  │ Talking Head│    ┌──────────────────┐        │    │
│  │  │  (cutout)   │    │      Slide       │        │    │
│  │  └─────────────┘    │   (from .png)    │        │    │
│  │                     └──────────────────┘        │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
                      ┌─────────────┐
                      │   CAMERA    │
                      │ zoom: 1.25  │
                      │ tilt: -15°  │
                      │ pan: 0, 0   │
                      └─────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Final Output   │
                    │   (1920x1080)   │
                    └─────────────────┘
```
|
||||||
|
|
||||||
|
### Why This Matters
|
||||||
|
|
||||||
|
**Keynote slides are designed for a specific frame.** If you create a slide with
|
||||||
|
an arrow pointing at where the talking head cutout will be, that spatial
|
||||||
|
relationship must be preserved when the camera zooms or tilts.
|
||||||
|
|
||||||
|
If we zoomed only the background and not the slides, the arrow would point to
|
||||||
|
the wrong place. The camera abstraction ensures everything transforms together.
|
||||||
|
|
||||||
|
### Camera Properties
|
||||||
|
|
||||||
|
```python
@dataclass
class CameraState:
    zoom: float = 1.0       # 1.0 = 100%, 1.25 = 125%
    rotation: float = 0.0   # degrees, positive = clockwise
    pan_x: float = 0.0      # -1.0 to 1.0, fraction of frame
    pan_y: float = 0.0      # -1.0 to 1.0, fraction of frame

@dataclass
class CameraKeyframe:
    time: float             # timestamp in seconds
    state: CameraState
    easing: str = "linear"  # linear, ease-in, ease-out, ease-in-out
```
|
||||||
|
|
||||||
|
### Rendering Pipeline (Updated)
|
||||||
|
|
||||||
|
```
Current Pipeline:
  Parse → Validate → Transform → Render
                                   │
                                   ▼
                        build_filter_complex()
                                   │
                        [bg] → overlays → [vout]

New Pipeline:
  Parse → Validate → Transform → Render
                         │
                  Extract camera
                  keyframes from
                  markers
                         │
                         ▼
              build_filter_complex()
                         │
              [bg] → overlays → [scene]
                         │
              apply_camera_transform()
                         │
              [scene] → zoom/rotate/pan → [vout]
```
|
||||||
|
|
||||||
|
### FFmpeg Implementation
|
||||||
|
|
||||||
|
The camera transform is a **final filter stage** applied to the composed scene:
|
||||||
|
|
||||||
|
```
# Compose scene (existing code)
[0:v]scale=1920:1080[bg];
[bg][slide1]overlay=...[s1];
[s1][talkinghead]overlay=...[scene];

# Camera transform (new)
[scene]scale=iw*{zoom}:ih*{zoom},
       rotate={rotation}*PI/180:fillcolor=black,
       crop=1920:1080:(iw-1920)/2:(ih-1080)/2[vout]
```
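
Filling in the `{zoom}` and `{rotation}` placeholders for a static camera state might look like the sketch below. The function name `camera_filter` is hypothetical, pan is left out because the template above does not cover it, and the guard against zoom below 1.0 is an assumption that follows from cropping a scaled frame back to the output size.

```python
def camera_filter(zoom: float, rotation_deg: float,
                  width: int = 1920, height: int = 1080) -> str:
    """Format the camera stage of the filter graph for one fixed camera state."""
    if zoom < 1.0:
        # The crop would be larger than the scaled frame and FFmpeg would reject it.
        raise ValueError("zoom must be >= 1.0 for a centered crop")
    return (
        f"[scene]scale=iw*{zoom}:ih*{zoom},"
        f"rotate={rotation_deg}*PI/180:fillcolor=black,"
        f"crop={width}:{height}:(iw-{width})/2:(ih-{height})/2[vout]"
    )


stage = camera_filter(1.25, -15.0)
```

The crop offsets `(iw-1920)/2` and `(ih-1080)/2` keep the view centered, so the zoom grows outward from the middle of the frame rather than from the top-left corner.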
|
||||||
|
|
||||||
|
For smooth animated zoom (using expressions):
|
||||||
|
```
[scene]zoompan=z='if(between(it,5,8), 1+0.25*(it-5)/3, 1)':
               x='iw/2-(iw/zoom/2)':
               y='ih/2-(ih/zoom/2)':
               d=1:s=1920x1080:fps=30[vout]
```
|
||||||
|
|
||||||
|
### Camera Events in Timeline
|
||||||
|
|
||||||
|
New model for camera changes:
|
||||||
|
|
||||||
|
```python
@dataclass
class CameraEvent:
    time: float
    target_state: CameraState
    duration: float = 0.0  # 0 = instant snap
    easing: str = "ease-out"
```
|
||||||
|
|
||||||
|
Markers map to camera events:
|
||||||
|
- `[Zoom2]` → `CameraEvent(time=t, target_state=CameraState(zoom=1.25), duration=0.2)`
|
||||||
|
- `[TiltLeft]` → `CameraEvent(time=t, target_state=CameraState(rotation=-15), duration=0.3)`
|
||||||
|
- `[Reset]` → `CameraEvent(time=t, target_state=CameraState(), duration=0.2)`
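
A single event's transition can be expressed as an FFmpeg expression string, as in the sketch below. It assumes linear interpolation and a time variable named `t`; the helper name `zoom_expr` is hypothetical, and the variable name depends on the filter that evaluates the expression.

```python
def zoom_expr(from_zoom: float, to_zoom: float, start: float, duration: float) -> str:
    """Expression that holds from_zoom, ramps linearly, then holds to_zoom."""
    if duration <= 0:
        # Instant snap: switch value at the event time.
        return f"if(lt(t,{start:.3f}),{from_zoom:.3f},{to_zoom:.3f})"
    end = start + duration
    delta = to_zoom - from_zoom
    ramp = f"{from_zoom:.3f}+({delta:.3f})*(t-{start:.3f})/{duration:.3f}"
    return (
        f"if(lt(t,{start:.3f}),{from_zoom:.3f},"
        f"if(lt(t,{end:.3f}),{ramp},{to_zoom:.3f}))"
    )


# [Zoom2] at t=1.0s with the 0.2s duration from the mapping above.
expr = zoom_expr(1.0, 1.25, start=1.0, duration=0.2)
```

When such a string is embedded in a filter graph, its commas usually need quoting or escaping so that FFmpeg does not read them as filter separators.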
|
||||||
|
|
||||||
|
### Considerations
|
||||||
|
|
||||||
|
1. **Overscan**: When zoomed in, we're cropping. The scene must be rendered
|
||||||
|
larger than output (e.g., 2x) to have room for zoom without quality loss.
|
||||||
|
|
||||||
|
2. **Rotation center**: Rotate around frame center, not corner.
|
||||||
|
|
||||||
|
3. **State accumulation**: `[Zoom2]` then `[TiltLeft]` means zoom AND tilt
|
||||||
|
are both active. `[Reset]` clears all.
|
||||||
|
|
||||||
|
4. **Interaction with cutouts**: Cutout positions are in scene-space, so they
|
||||||
|
transform naturally with the camera. No special handling needed.
|
||||||
|
|
||||||
|
5. **Slides stay synced**: Keynote exports are positioned for the base frame.
|
||||||
|
Camera zoom/tilt transforms them identically to everything else.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Plan
|
||||||
|
|
||||||
|
### Phase 1: Camera Data Model ✓
|
||||||
|
- [x] Add `CameraState` and `CameraEvent` to models.py
|
||||||
|
- [x] Add camera effect markers to transformer.py
|
||||||
|
- [x] Generate camera keyframes from markers
|
||||||
|
|
||||||
|
### Phase 2: Render Pipeline ✓
|
||||||
|
- [x] Modify renderer to compose to `[scene]` instead of `[vout]`
|
||||||
|
- [x] Add camera transform stage after composition
|
||||||
|
- [ ] Handle overscan (render larger, crop to output) - deferred, upsampling OK for now
|
||||||
|
|
||||||
|
### Phase 3: Smooth Animation (partial)
|
||||||
|
- [x] Support animated transitions between keyframes (linear interpolation)
|
||||||
|
- [ ] Implement easing functions as FFmpeg expressions (ease-in, ease-out)
|
||||||
|
- [ ] Test with rapid zoom sequences
|
||||||
|
|
||||||
|
### Phase 4: Effect Presets ✓
|
||||||
|
- [x] Define presets (Zoom0/1/2/3, TiltLeft/Right/NoTilt, Pan*, Reset)
|
||||||
|
- [x] Presets defined in `CAMERA_PRESETS` dict in models.py
|
||||||
|
- [ ] Support custom parameterized markers `[Zoom:1.35]` - future enhancement
|
||||||
@@ -0,0 +1,10 @@
|
|||||||
|
[
  {
    "reference": "Gnommo Documentation - https://github.com/example/gnommo",
    "context": ""
  },
  {
    "reference": "FFmpeg Documentation - https://ffmpeg.org/documentation.html",
    "context": ""
  }
]
|
||||||
+17
-3
@@ -1,5 +1,19 @@
|
|||||||
Welcome to GnommoEditor, a code-first video editing system. [S1]
|
[S1]
|
||||||
|
This is the first slide. It appears immediately. [cite:Gnommo Documentation - https://github.com/example/gnommo]
|
||||||
|
|
||||||
In this example, we demonstrate how slides appear at specific timestamps based on markers in the transcript. [S2]
|
[S2]
|
||||||
|
However, this is the second slide. It should appear 1 second prior to when I say "however"
|
||||||
|
|
||||||
And that's the end of our demo.
|
[S3]
|
||||||
|
[video:Zoomin_MontageZoom]
|
||||||
|
This is me talking alongside a video. The video is constrained within the red square. Notice how the video stops immediately when we make the transition to the next slide. [cite:FFmpeg Documentation - https://ffmpeg.org/documentation.html]
|
||||||
|
|
||||||
|
[S4]
|
||||||
|
I will continue to talk without pause, but in the finished recording - there will be a pause before the narration continues. Now a video will play that pauses the narration
|
||||||
|
|
||||||
|
[S5]
|
||||||
|
[video:gnommologo]
|
||||||
|
|
||||||
|
Notice how my voice continues after the video finished.
|
||||||
|
|
||||||
|
[S6]
|
||||||
|
|||||||
@@ -0,0 +1,26 @@
|
|||||||
|
{
  "S1": {
    "image": "example.001.png",
    "type": "fullscreen"
  },
  "S2": {
    "image": "example.002.png",
    "type": "fullscreen"
  },
  "S3": {
    "image": "example.003.png",
    "type": "fullscreen"
  },
  "S4": {
    "image": "example.004.png",
    "type": "fullscreen"
  },
  "S5": {
    "image": "example.005.png",
    "type": "fullscreen"
  },
  "S6": {
    "image": "example.006.png",
    "type": "fullscreen"
  }
}
|
||||||
@@ -0,0 +1,2 @@
|
|||||||
|
file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/talking_head_batch0.mov'
file '/Users/jenstandstad/Projects/gnommo/example/media/videos/intermediate/segments/segment_0002.mov'
|
||||||
@@ -0,0 +1,497 @@
|
|||||||
|
[
|
||||||
|
{
|
||||||
|
"word": "This",
|
||||||
|
"start": 10.72,
|
||||||
|
"end": 11.4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "is",
|
||||||
|
"start": 11.4,
|
||||||
|
"end": 11.6
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "the",
|
||||||
|
"start": 11.6,
|
||||||
|
"end": 11.78
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "first",
|
||||||
|
"start": 11.78,
|
||||||
|
"end": 11.98
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "slide.",
|
||||||
|
"start": 11.98,
|
||||||
|
"end": 12.44
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "It",
|
||||||
|
"start": 13.02,
|
||||||
|
"end": 13.3
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "appears",
|
||||||
|
"start": 13.3,
|
||||||
|
"end": 13.66
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "immediately.",
|
||||||
|
"start": 13.66,
|
||||||
|
"end": 14.3
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "However,",
|
||||||
|
"start": 15.34,
|
||||||
|
"end": 16.02
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "this",
|
||||||
|
"start": 16.34,
|
||||||
|
"end": 16.46
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "is",
|
||||||
|
"start": 16.46,
|
||||||
|
"end": 16.58
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "the",
|
||||||
|
"start": 16.58,
|
||||||
|
"end": 16.76
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "second",
|
||||||
|
"start": 16.76,
|
||||||
|
"end": 17.04
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "slide.",
|
||||||
|
"start": 17.04,
|
||||||
|
"end": 17.4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "It",
|
||||||
|
"start": 17.74,
|
||||||
|
"end": 17.96
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "should",
|
||||||
|
"start": 17.96,
|
||||||
|
"end": 18.2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "appear",
|
||||||
|
"start": 18.2,
|
||||||
|
"end": 18.54
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "one",
|
||||||
|
"start": 18.54,
|
||||||
|
"end": 18.98
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "second",
|
||||||
|
"start": 18.98,
|
||||||
|
"end": 19.46
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "prior",
|
||||||
|
"start": 19.46,
|
||||||
|
"end": 19.88
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "to",
|
||||||
|
"start": 19.88,
|
||||||
|
"end": 20.1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "the",
|
||||||
|
"start": 20.1,
|
||||||
|
"end": 20.22
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "word",
|
||||||
|
"start": 20.22,
|
||||||
|
"end": 20.52
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "to",
|
||||||
|
"start": 20.52,
|
||||||
|
"end": 21.14
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "say",
|
||||||
|
"start": 21.14,
|
||||||
|
"end": 21.42
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "whoever",
|
||||||
|
"start": 21.42,
|
||||||
|
"end": 21.8
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "the",
|
||||||
|
"start": 21.8,
|
||||||
|
"end": 22.16
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "first",
|
||||||
|
"start": 22.16,
|
||||||
|
"end": 22.4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "time.",
|
||||||
|
"start": 22.4,
|
||||||
|
"end": 22.68
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "This",
|
||||||
|
"start": 24.28,
|
||||||
|
"end": 24.96
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "is",
|
||||||
|
"start": 24.96,
|
||||||
|
"end": 25.12
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "me",
|
||||||
|
"start": 25.12,
|
||||||
|
"end": 25.36
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "taking,",
|
||||||
|
"start": 25.36,
|
||||||
|
"end": 25.74
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "talking",
|
||||||
|
"start": 26.12,
|
||||||
|
"end": 27.12
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "alongside",
|
||||||
|
"start": 27.12,
|
||||||
|
"end": 27.64
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "a",
|
||||||
|
"start": 27.64,
|
||||||
|
"end": 27.88
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "video.",
|
||||||
|
"start": 27.88,
|
||||||
|
"end": 28.16
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "The",
|
||||||
|
"start": 28.16,
|
||||||
|
"end": 28.92
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "video",
|
||||||
|
"start": 28.92,
|
||||||
|
"end": 29.18
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "is",
|
||||||
|
"start": 29.18,
|
||||||
|
"end": 29.36
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "constrained",
|
||||||
|
"start": 29.36,
|
||||||
|
"end": 29.76
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "within",
|
||||||
|
"start": 29.76,
|
||||||
|
"end": 30.14
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"word": "the",
|
||||||
|
"start": 30.14,
|
||||||
|
"end": 30.32
|
||||||
|
},
|
||||||
|
  {
    "word": "red",
    "start": 30.32,
    "end": 30.48
  },
  {
    "word": "square.",
    "start": 30.48,
    "end": 30.9
  },
  {
    "word": "Notice",
    "start": 31.26,
    "end": 31.44
  },
  {
    "word": "how",
    "start": 31.44,
    "end": 31.74
  },
  {
    "word": "the",
    "start": 31.74,
    "end": 31.92
  },
  {
    "word": "video",
    "start": 31.92,
    "end": 32.14
  },
  {
    "word": "stops",
    "start": 32.14,
    "end": 32.44
  },
  {
    "word": "immediately",
    "start": 32.44,
    "end": 32.94
  },
  {
    "word": "when",
    "start": 32.94,
    "end": 33.36
  },
  {
    "word": "we",
    "start": 33.36,
    "end": 33.54
  },
  {
    "word": "make",
    "start": 33.54,
    "end": 33.74
  },
  {
    "word": "the",
    "start": 33.74,
    "end": 33.94
  },
  {
    "word": "transition",
    "start": 33.94,
    "end": 34.38
  },
  {
    "word": "to",
    "start": 34.38,
    "end": 34.68
  },
  {
    "word": "the",
    "start": 34.68,
    "end": 34.8
  },
  {
    "word": "next",
    "start": 34.8,
    "end": 35.02
  },
  {
    "word": "slide.",
    "start": 35.02,
    "end": 35.48
  },
  {
    "word": "I",
    "start": 37.18,
    "end": 37.72
  },
  {
    "word": "will",
    "start": 37.72,
    "end": 37.78
  },
  {
    "word": "continue",
    "start": 37.78,
    "end": 38.08
  },
  {
    "word": "to",
    "start": 38.08,
    "end": 38.32
  },
  {
    "word": "talk",
    "start": 38.32,
    "end": 38.56
  },
  {
    "word": "without",
    "start": 38.56,
    "end": 38.88
  },
  {
    "word": "pause,",
    "start": 38.88,
    "end": 39.24
  },
  {
    "word": "but",
    "start": 39.46,
    "end": 39.56
  },
  {
    "word": "in",
    "start": 39.56,
    "end": 39.68
  },
  {
    "word": "the",
    "start": 39.68,
    "end": 39.74
  },
  {
    "word": "finished",
    "start": 39.74,
    "end": 39.98
  },
  {
    "word": "recording",
    "start": 39.98,
    "end": 40.46
  },
  {
    "word": "there",
    "start": 40.46,
    "end": 41.18
  },
  {
    "word": "will",
    "start": 41.18,
    "end": 41.36
  },
  {
    "word": "be",
    "start": 41.36,
    "end": 41.54
  },
  {
    "word": "a",
    "start": 41.54,
    "end": 41.64
  },
  {
    "word": "pause",
    "start": 41.64,
    "end": 41.92
  },
  {
    "word": "before",
    "start": 41.92,
    "end": 42.28
  },
  {
    "word": "the",
    "start": 42.28,
    "end": 42.5
  },
  {
    "word": "narration",
    "start": 42.5,
    "end": 43.0
  },
  {
    "word": "continues.",
    "start": 43.0,
    "end": 43.64
  },
  {
    "word": "Now",
    "start": 44.38,
    "end": 44.52
  },
  {
    "word": "a",
    "start": 44.52,
    "end": 44.68
  },
  {
    "word": "video",
    "start": 44.68,
    "end": 44.9
  },
  {
    "word": "will",
    "start": 44.9,
    "end": 45.08
  },
  {
    "word": "play",
    "start": 45.08,
    "end": 45.36
  },
  {
    "word": "that",
    "start": 45.36,
    "end": 45.76
  },
  {
    "word": "pauses",
    "start": 45.76,
    "end": 46.52
  },
  {
    "word": "the",
    "start": 46.52,
    "end": 46.76
  },
  {
    "word": "narration.",
    "start": 46.76,
    "end": 47.2
  },
  {
    "word": "Notice",
    "start": 48.64,
    "end": 49.18
  },
  {
    "word": "how",
    "start": 49.18,
    "end": 49.42
  },
  {
    "word": "my",
    "start": 49.42,
    "end": 49.58
  },
  {
    "word": "voice",
    "start": 49.58,
    "end": 49.8
  },
  {
    "word": "continues",
    "start": 49.8,
    "end": 50.36
  },
  {
    "word": "after",
    "start": 50.36,
    "end": 50.84
  },
  {
    "word": "the",
    "start": 50.84,
    "end": 51.02
  },
  {
    "word": "video",
    "start": 51.02,
    "end": 51.24
  },
  {
    "word": "finished.",
    "start": 51.24,
    "end": 51.76
  }
]
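The word-level entries in this transcript carry `start` and `end` times, so silences in the narration can be read off as the gap between one word's `end` and the next word's `start`. The sketch below is only an illustration under that assumption: `find_pauses` and its `min_gap` threshold are hypothetical names, not part of the commit, and the sample reuses five consecutive entries from the file.

```python
import json

# Five consecutive entries copied from the transcript above.
SAMPLE = """
[
  {"word": "next", "start": 34.8, "end": 35.02},
  {"word": "slide.", "start": 35.02, "end": 35.48},
  {"word": "I", "start": 37.18, "end": 37.72},
  {"word": "will", "start": 37.72, "end": 37.78},
  {"word": "continue", "start": 37.78, "end": 38.08}
]
"""


def find_pauses(words, min_gap=1.0):
    """Return (previous_word, next_word, gap_seconds) for every silence >= min_gap.

    Hypothetical helper: the threshold and the return shape are assumptions.
    """
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = round(nxt["start"] - prev["end"], 2)
        if gap >= min_gap:
            pauses.append((prev["word"], nxt["word"], gap))
    return pauses


words = json.loads(SAMPLE)
pauses = find_pauses(words)
print(pauses)
```

With the sample above, the only gap of at least one second falls between "slide." and "I".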
@@ -0,0 +1,39 @@
{
  "talking_head": {
    "source_file": "talking_head.mov",
    "output_file": "talking_head_processed.mov",
    "cutout": "talkinghead",
    "always_visible": true,
    "filter": [
      {
        "type": "chroma_key",
        "color": [131, 177, 83],
        "similarity": 0.04,
        "blend": 0.025,
        "spill": 0.05
      },
      {
        "type": "mask",
        "left": 0.05,
        "right": 0.10
      }
    ]
  },
  "gnommologo": {
    "source_file": "Logo.mov",
    "is_shared": true,
    "cutout": "fullscreen",
    "pause_narration": 0 ,
    "take": 10,
    "skip": 0
  },
  "Zoomin_MontageZoom": {
    "description": "Montage zoom",
    "source_file": "MontageZoom.mp4",
    "output_file": "MontageZoom.mp4",
    "pause_narration":3,
    "cutout": "square",
    "is_shared": true,
    "filter": []
  }
}
+31
-7
@@ -1,11 +1,35 @@
 {
+  "id": "VideoExample",
+  "name": "Example",
+  "description": "In this video, I demonstrate the Gnommo video editing pipeline - a code-first approach to creating presenter-mode videos from Keynote presentations.",
+  "footer": "Subscribe for more tutorials!\nTwitter: @example",
   "resolution": [1920, 1080],
   "fps": 30,
-  "talkinghead": {
-    "x": 50,
-    "y": 600,
-    "targetheight": 400
-  },
-  "defaultSlideType": "square",
-  "background_video": ""
+  "gnommo_scratch": null,
+  "defaultSlideType": "fullscreen",
+  "keynote_file": "media/example.key",
+  "transcript": "media/videos/talking_head.transcript.json",
+  "background": "shared_assets/solarpunk.png",
+  "videos": "media/videos/videos.json",
+  "slides": "media/slides/Example/slides.json",
+  "audio": "media/audio/audio.json",
+  "main_video": "talking_head",
+  "cutouts": {
+    "talkinghead": {
+      "x": "-10%",
+      "y": "40%",
+      "height": "60%"
+    },
+    "square": {
+      "x": "45%",
+      "y": "3%",
+      "width": "53%",
+      "height": "94%"
+    },
+    "fullscreen": {
+      "x": "0%",
+      "y": "0%",
+      "height": "100%"
+    }
+  }
}
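The new `cutouts` block replaces the old pixel-based `talkinghead` settings with percentage strings. How Gnommo resolves these values is not shown in this diff. The sketch below assumes that `x` and `width` are percentages of the frame width and `y` and `height` percentages of the frame height; `resolve_cutout` is a hypothetical helper, not a function from the project.

```python
def resolve_cutout(cutout, frame_size):
    """Convert percentage strings such as "-10%" into pixel values.

    Assumption (not confirmed by the diff): x/width scale with the frame
    width, y/height with the frame height. Keys that are absent stay absent.
    """
    frame_w, frame_h = frame_size
    basis = {"x": frame_w, "width": frame_w, "y": frame_h, "height": frame_h}
    resolved = {}
    for key, value in cutout.items():
        percent = float(value.rstrip("%"))
        resolved[key] = round(basis[key] * percent / 100)
    return resolved


# Values copied from the "cutouts" block above, resolved against 1920x1080.
talkinghead = resolve_cutout({"x": "-10%", "y": "40%", "height": "60%"}, (1920, 1080))
square = resolve_cutout(
    {"x": "45%", "y": "3%", "width": "53%", "height": "94%"}, (1920, 1080)
)
print(talkinghead)
print(square)
```

Under these assumptions a negative `x` simply places the cutout partly off the left edge of the frame. A missing `width` is left unresolved here; deriving it from the source aspect ratio would be one option.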
@@ -1,10 +0,0 @@
{
  "S1": {
    "image": "S1.png",
    "type": "square"
  },
  "S2": {
    "image": "S2.png",
    "type": "square"
  }
}
@@ -1,8 +0,0 @@
t,word
0.00,Hello
0.30,world
0.60,[S1]
1.50,Second
1.80,slide
2.00,[S2]
2.50,End
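The removed fixture above uses the `t,word` layout in which slide markers such as `[S1]` appear as ordinary rows between spoken words. A minimal sketch of pulling the marker timings out of that layout is shown below. `read_markers` is a hypothetical helper, and the marker pattern reuses the `[A-Za-z0-9_]+` character class from the alignment module in this commit.

```python
import csv
import io
import re

# Contents of the fixture shown above.
SAMPLE_CSV = """t,word
0.00,Hello
0.30,world
0.60,[S1]
1.50,Second
1.80,slide
2.00,[S2]
2.50,End
"""

# A row is a marker when the whole "word" cell is a bracketed identifier.
MARKER = re.compile(r"^\[([A-Za-z0-9_]+)\]$")


def read_markers(csv_text):
    """Return (marker_id, timestamp) pairs in file order (hypothetical helper)."""
    markers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = MARKER.match(row["word"])
        if match:
            markers.append((match.group(1), float(row["t"])))
    return markers


markers = read_markers(SAMPLE_CSV)
print(markers)
```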
@@ -1,6 +0,0 @@
{
  "talking_head": {
    "file": "media/talking_head.mp4",
    "preprocess": []
  }
}
@@ -1,154 +1,21 @@
 #!/bin/bash
 #
 # GnommoEditor - Code-first video editing pipeline
+# This is a thin wrapper that activates the venv and runs the Python CLI.
 #
-# Usage:
-# gnommo.sh -p <project> Render project
-# gnommo.sh -p <project> import Generate slides.json from image files
-# gnommo.sh -p <project> validate Validate only
-# gnommo.sh -p <project> preprocess Apply video preprocessing filters
-# gnommo.sh -p <project> transcribe Transcribe video
-# gnommo.sh -p <project> align Align markers to transcript
-# gnommo.sh -p <project> all Full pipeline: transcribe → align → render
+# Usage: gnommo -p <project> [action] [options]
+# Run with -h for full help.
 #
-
-set -e

 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 VENV_PYTHON="$SCRIPT_DIR/venv/bin/python"

 # Check for venv
 if [[ ! -f "$VENV_PYTHON" ]]; then
     echo "Error: Virtual environment not found at $SCRIPT_DIR/venv"
-    echo "Create it with: python -m venv venv && ./venv/bin/pip install openai-whisper"
+    echo "Create it with: python -m venv venv && ./venv/bin/pip install -e . openai-whisper"
     exit 1
 fi

-# Parse arguments
-PROJECT=""
-COMMAND="render"
-VERBOSE=""
-FORCE=""
-
-usage() {
-    echo "Usage: gnommo.sh -p <project> [command] [options]"
-    echo ""
-    echo "Commands:"
-    echo " render Render video (default)"
-    echo " import Generate slides.json from image files"
-    echo " validate Validate project only"
-    echo " preprocess Apply video preprocessing filters (chroma key, etc.)"
-    echo " transcribe Transcribe video audio"
-    echo " align Align manuscript to transcript"
-    echo " all Full pipeline: transcribe → align → render"
-    echo ""
-    echo "Options:"
-    echo " -p <dir> Project directory (required)"
-    echo " -v Verbose output"
-    echo " -f Force overwrite existing files"
-    echo " -h Show this help"
-    echo ""
-    echo "Examples:"
-    echo " gnommo.sh -p video1 # Render video1 project"
-    echo " gnommo.sh -p video1 import # Generate slides.json"
-    echo " gnommo.sh -p video1 import -f # Force overwrite slides.json"
-    echo " gnommo.sh -p video1 validate # Validate only"
-    echo " gnommo.sh -p video1 all # Full pipeline"
-    exit 0
-}
-
-while [[ $# -gt 0 ]]; do
-    case $1 in
-        -p|--project)
-            PROJECT="$2"
-            shift 2
-            ;;
-        -v|--verbose)
-            VERBOSE="-v"
-            shift
-            ;;
-        -f|--force)
-            FORCE="-f"
-            shift
-            ;;
-        -h|--help)
-            usage
-            ;;
-        import|validate|render|preprocess|transcribe|align|all)
-            COMMAND="$1"
-            shift
-            ;;
-        *)
-            echo "Unknown option: $1"
-            usage
-            ;;
-    esac
-done
-
-# Validate project argument
-if [[ -z "$PROJECT" ]]; then
-    echo "Error: Project directory required (-p <project>)"
-    echo ""
-    usage
-fi
-
-if [[ ! -d "$PROJECT" ]]; then
-    echo "Error: Project directory not found: $PROJECT"
-    exit 1
-fi
-
-if [[ ! -f "$PROJECT/project.json" ]]; then
-    echo "Error: project.json not found in $PROJECT"
-    exit 1
-fi
-
-# Run commands using new CLI interface
-run_gnommo() {
-    "$VENV_PYTHON" -m gnommo -p "$PROJECT" -a "$1" $VERBOSE
-}
-
-run_gnommo_import() {
-    "$VENV_PYTHON" -m gnommo -p "$PROJECT" -a validate -i $FORCE $VERBOSE
-}
-
-case $COMMAND in
-    import)
-        echo "=== Importing assets for $PROJECT ==="
-        run_gnommo_import
-        ;;
-
-    validate)
-        echo "=== Validating $PROJECT ==="
-        run_gnommo validate
-        ;;
-
-    transcribe)
-        echo "=== Transcribing $PROJECT ==="
-        run_gnommo transcribe
-        ;;
-
-    align)
-        echo "=== Aligning $PROJECT ==="
-        run_gnommo align
-        ;;
-
-    render)
-        echo "=== Rendering $PROJECT ==="
-        run_gnommo render
-        ;;
-
-    preprocess)
-        echo "=== Preprocessing $PROJECT ==="
-        run_gnommo preprocess
-        ;;
-
-    all)
-        echo "=== Full Pipeline: $PROJECT ==="
-        run_gnommo all
-        ;;
-
-    *)
-        echo "Unknown command: $COMMAND"
-        usage
-        ;;
-esac
+# Pass all arguments directly to the Python CLI
+exec "$VENV_PYTHON" -m gnommo "$@"
@@ -1,199 +0,0 @@
"""Alignment stage: match manuscript markers to transcript timestamps."""

import csv
import re
from dataclasses import dataclass
from pathlib import Path

from .errors import GnommoError
from .transcriber import TranscribedWord


class AlignmentError(GnommoError):
    """Error during alignment."""
    pass


@dataclass
class MarkerAlignment:
    """A marker with its aligned timestamp."""
    marker_id: str
    timestamp: float
    matched_phrase: str
    confidence: float  # 0-1, how confident the match is


def extract_marker_contexts(manuscript_text: str) -> list[tuple[str, str]]:
    """
    Extract markers and the text immediately following them.

    Returns:
        List of (marker_id, following_text) tuples
    """
    # Split by markers, keeping the markers
    parts = re.split(r"\[([A-Za-z0-9_]+)\]", manuscript_text)

    # parts will be: [text_before, marker1, text_after1, marker2, text_after2, ...]
    contexts = []

    for i in range(1, len(parts), 2):
        marker_id = parts[i]
        if i + 1 < len(parts):
            following_text = parts[i + 1].strip()
            # Get first sentence or first N words
            following_text = _get_first_phrase(following_text)
            contexts.append((marker_id, following_text))

    return contexts


def _get_first_phrase(text: str, max_words: int = 10) -> str:
    """Extract first phrase (up to first sentence end or max_words)."""
    # Clean up the text
    text = text.replace("\n", " ").strip()

    # Find first sentence boundary
    match = re.search(r"[.!?]", text)
    if match and match.start() < 200:
        text = text[: match.start()]

    # Limit to max_words
    words = text.split()[:max_words]
    return " ".join(words)


def normalize_text(text: str) -> str:
    """Normalize text for matching (lowercase, remove punctuation)."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def find_phrase_in_transcript(
    phrase: str,
    transcript: list[TranscribedWord],
    start_from: int = 0,
) -> tuple[int, float]:
    """
    Find a phrase in the transcript and return the word index and timestamp.

    Uses sliding window matching with normalization.

    Returns:
        Tuple of (word_index, timestamp) or (-1, 0.0) if not found
    """
    phrase_normalized = normalize_text(phrase)
    phrase_words = phrase_normalized.split()

    if not phrase_words:
        return -1, 0.0

    # Try to find increasingly shorter prefixes
    for length in range(len(phrase_words), 2, -1):
        target = " ".join(phrase_words[:length])

        # Sliding window through transcript
        for i in range(start_from, len(transcript) - length + 1):
            window_words = [normalize_text(transcript[j].word) for j in range(i, i + length)]
            window_text = " ".join(window_words)

            if target in window_text or window_text in target:
                return i, transcript[i].start

    # Fallback: try to find just the first few words
    if len(phrase_words) >= 2:
        target = " ".join(phrase_words[:3])
        for i in range(start_from, len(transcript) - 2):
            window_words = [normalize_text(transcript[j].word) for j in range(i, min(i + 5, len(transcript)))]
            window_text = " ".join(window_words)
            if phrase_words[0] in window_text and phrase_words[1] in window_text:
                return i, transcript[i].start

    return -1, 0.0


def align_markers(
    manuscript_text: str,
    transcript: list[TranscribedWord],
    offset_seconds: float = -1.0,
) -> list[MarkerAlignment]:
    """
    Align manuscript markers to transcript timestamps.

    Args:
        manuscript_text: Full manuscript text with [S1], [S2] etc.
        transcript: Word-level transcript with timestamps
        offset_seconds: Offset to apply to found timestamps (default -1.0)

    Returns:
        List of MarkerAlignment with timestamps
    """
    contexts = extract_marker_contexts(manuscript_text)
    alignments: list[MarkerAlignment] = []

    last_index = 0

    for marker_id, following_text in contexts:
        idx, timestamp = find_phrase_in_transcript(
            following_text, transcript, start_from=last_index
        )

        if idx >= 0:
            # Apply offset (e.g., -1 second before the word)
            adjusted_time = max(0.0, timestamp + offset_seconds)
            alignments.append(MarkerAlignment(
                marker_id=marker_id,
                timestamp=adjusted_time,
                matched_phrase=following_text[:50],
                confidence=1.0,
            ))
            last_index = idx
        else:
            # Could not find match - report but continue
            alignments.append(MarkerAlignment(
                marker_id=marker_id,
                timestamp=-1.0,  # Indicates not found
                matched_phrase=following_text[:50],
                confidence=0.0,
            ))

    return alignments


def save_aligned_transcript(
    alignments: list[MarkerAlignment],
    transcript: list[TranscribedWord],
    output_path: Path,
) -> None:
    """
    Save aligned transcript as CSV compatible with gnommo's transcript.csv format.

    Format:
        t,word
        0.00,Hello
        1.50,[S1]
        1.51,This
        ...
    """
    # Build list of (timestamp, word) including markers
    entries: list[tuple[float, str]] = []

    # Add all words from transcript
    for word in transcript:
        entries.append((word.start, word.word))

    # Add markers at their aligned positions
    for alignment in alignments:
        if alignment.timestamp >= 0:
            entries.append((alignment.timestamp, f"[{alignment.marker_id}]"))

    # Sort by timestamp
    entries.sort(key=lambda x: x[0])

    # Write CSV
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t", "word"])
        for timestamp, word in entries:
            writer.writerow([f"{timestamp:.2f}", word])
||||||
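The marker extraction in the removed module relies on `re.split` with a capturing group, which returns the text pieces and the captured marker ids interleaved. The standalone sketch below illustrates that behaviour on a made-up manuscript. `marker_contexts` is a simplified stand-in for illustration only: it drops the 200-character limit on the sentence search and is not the module's actual function.

```python
import re


def marker_contexts(manuscript_text, max_words=10):
    """Pair each [marker] with the first phrase that follows it.

    Simplified illustration of the split-by-marker approach; the helper name
    and the sample text below are assumptions.
    """
    # With one capturing group, re.split yields
    # [text_before, marker1, text_after1, marker2, text_after2, ...]
    parts = re.split(r"\[([A-Za-z0-9_]+)\]", manuscript_text)
    contexts = []
    for i in range(1, len(parts), 2):
        following = parts[i + 1].replace("\n", " ").strip()
        sentence_end = re.search(r"[.!?]", following)
        if sentence_end:
            # Keep only the text before the first sentence terminator.
            following = following[: sentence_end.start()]
        contexts.append((parts[i], " ".join(following.split()[:max_words])))
    return contexts


contexts = marker_contexts("Intro text. [S1] Hello world. More here. [S2] Second slide!")
print(contexts)
```

Because the split always returns a trailing text element after the last marker, even if it is an empty string, every marker index `i` has a matching `parts[i + 1]`.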
+894
-152
File diff suppressed because it is too large
@@ -0,0 +1,359 @@
"""Description generator: Create YouTube description with chapters, citations, and attributions."""

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

from .models import (
    Attribution,
    Citation,
    ProjectConfig,
    SlideDefinition,
    VideoSource,
)
from .transcriber import TranscribedWord


@dataclass
class ChapterMarker:
    """A chapter marker with timestamp and title."""

    slide_id: str
    timestamp: float
    title: str


def _format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS or H:MM:SS for YouTube chapters."""
    if seconds < 0:
        return "0:00"

    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)

    if hours > 0:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        return f"{minutes}:{secs:02d}"


def _extract_chapter_title(
    manuscript_text: str, slide_id: str, slides: dict[str, SlideDefinition]
) -> str:
    """
    Extract a chapter title for a slide.

    Tries to find meaningful title from:
    1. First sentence/line after the slide marker
    2. Falls back to slide ID if nothing useful found
    """
    # Find the marker and text after it
    pattern = rf"\[{re.escape(slide_id)}\]\s*(.+?)(?=\[S\d+\]|\[video:|\[narration:|\Z)"
    match = re.search(pattern, manuscript_text, re.DOTALL)

    if match:
        text = match.group(1).strip()
        # Remove any other markers from the text
        text = re.sub(r"\[[^\]]+\]", "", text).strip()

        if text:
            # Take first line or first sentence
            first_line = text.split("\n")[0].strip()
            # Truncate if too long
            if len(first_line) > 50:
                # Try to break at word boundary
                truncated = first_line[:47]
                last_space = truncated.rfind(" ")
                if last_space > 30:
                    truncated = truncated[:last_space]
                first_line = truncated + "..."

            if first_line:
                return first_line

    # Fallback to slide number
    slide_num = slide_id[1:] if slide_id.startswith("S") else slide_id
    return f"Section {slide_num}"


def _align_citation_to_transcription(
    citation: Citation,
    transcription: list[TranscribedWord],
    manuscript_text: str,
) -> float:
    """
    Align a citation to the transcription to find its timestamp.

    Uses the context text following the citation to find the approximate
    position in the audio.

    Returns timestamp in seconds, or -1 if not found.
    """
    if not transcription or not citation.context:
        return -1.0

    # Get more context from the manuscript for better matching
    # Find the citation in the manuscript and get surrounding text
    pattern = rf"\[cite:{re.escape(citation.reference)}\]\s*(.{{0,200}})"
    match = re.search(pattern, manuscript_text, re.DOTALL)

    if not match:
        return -1.0

    context_text = match.group(1).strip()
    # Clean up: remove markers, normalize whitespace
    context_text = re.sub(r"\[[^\]]+\]", "", context_text)
    context_text = " ".join(context_text.split())

    if not context_text:
        return -1.0

    # Normalize for matching
    context_words = context_text.lower().split()[:10]  # Use up to 10 words
    if not context_words:
        return -1.0

    # Build normalized transcription
    trans_words = [(w.word.lower(), w.start) for w in transcription]

    # Simple sliding window match
    best_match_score = 0
    best_match_time = -1.0

    for i in range(len(trans_words) - len(context_words) + 1):
        matches = 0
        for j, ctx_word in enumerate(context_words):
            trans_word = trans_words[i + j][0]
            # Allow partial matches for longer words
            if ctx_word == trans_word:
                matches += 1
            elif len(ctx_word) >= 4 and (
                ctx_word in trans_word or trans_word in ctx_word
            ):
                matches += 0.5

        score = matches / len(context_words)
        if score > best_match_score and score >= 0.5:
            best_match_score = score
            best_match_time = trans_words[i][1]

    return best_match_time


def generate_chapters(
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    marker_timings: list,  # List of MarkerTiming from transformer
    min_chapter_duration: float = 30.0,
) -> list[ChapterMarker]:
    """
    Generate chapter markers from slide timings.

    Args:
        manuscript_text: The manuscript content
        slides: Slide definitions
        marker_timings: Aligned marker timings from the transformer
        min_chapter_duration: Minimum seconds between chapters (merges short ones)

    Returns:
        List of ChapterMarker objects
    """
    chapters = []

    # Build timing lookup
    timing_lookup = {t.marker_id: t.timestamp for t in marker_timings if t.timestamp >= 0}

    # Process slides in order
    slide_ids = sorted(
        [s for s in slides.keys() if s.startswith("S")],
        key=lambda x: int(x[1:]) if x[1:].isdigit() else 0,
    )

    for slide_id in slide_ids:
        if slide_id not in timing_lookup:
            continue

        timestamp = timing_lookup[slide_id]
        title = _extract_chapter_title(manuscript_text, slide_id, slides)

        # Check if we should merge with previous chapter (too short)
        if chapters and (timestamp - chapters[-1].timestamp) < min_chapter_duration:
            continue  # Skip this chapter, previous one covers it

        chapters.append(
            ChapterMarker(
                slide_id=slide_id,
                timestamp=timestamp,
                title=title,
            )
        )

    # Ensure first chapter starts at 0:00
    if chapters and chapters[0].timestamp > 0:
        chapters[0] = ChapterMarker(
            slide_id=chapters[0].slide_id,
            timestamp=0.0,
            title=chapters[0].title,
        )

    return chapters


def collect_attributions(
    videos: dict[str, VideoSource],
    video_events: list = None,
) -> list[tuple[str, Attribution]]:
    """
    Collect all video attributions.

    Returns list of (video_id, Attribution) tuples for videos that have attribution.
    Only includes videos that are actually used in the project (via video_events)
    or videos from shared assets that have attribution.
    """
    attributions = []

    # Get set of used video IDs from events
    used_video_ids = set()
    if video_events:
        for event in video_events:
            used_video_ids.add(event.video_id)

    for video_id, video_source in videos.items():
        if video_source.attribution:
            # Include if used in video or if it's a shared asset
            if video_id in used_video_ids or video_source.is_shared:
                attributions.append((video_id, video_source.attribution))

    return attributions


def generate_description(
    config: ProjectConfig,
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    videos: dict[str, VideoSource],
    marker_timings: list,
    transcription: list[TranscribedWord] = None,
    video_events: list = None,
    citations: list[Citation] = None,
    include_chapters: bool = True,
    include_citations: bool = True,
    include_attributions: bool = True,
) -> str:
    """
    Generate complete YouTube description.

    Combines:
    - Video description from project.json
    - Chapter markers (optional)
    - Citations from manuscript (optional)
    - Stock footage attributions (optional)
    - Footer from project.json

    Returns formatted description text.
    """
    sections = []

    # 1. Video description
    if config.description:
        sections.append(config.description.strip())

    # 2. Chapters
    if include_chapters:
        chapters = generate_chapters(manuscript_text, slides, marker_timings)
        if chapters:
            chapter_lines = ["CHAPTERS", ""]
            for ch in chapters:
                chapter_lines.append(f"{_format_timestamp(ch.timestamp)} {ch.title}")
            sections.append("\n".join(chapter_lines))

    # 3. Citations/References
    if include_citations:
        citations = citations or []
        if citations and transcription:
            # Align citations to get timestamps
            for citation in citations:
                citation.timestamp = _align_citation_to_transcription(
                    citation, transcription, manuscript_text
                )

        if citations:
            ref_lines = ["REFERENCES", ""]
            for citation in citations:
                if citation.timestamp >= 0:
                    ref_lines.append(
                        f"{_format_timestamp(citation.timestamp)} - {citation.reference}"
                    )
                else:
                    ref_lines.append(f"- {citation.reference}")
            sections.append("\n".join(ref_lines))

    # 4. Stock footage attributions
    if include_attributions:
        attributions = collect_attributions(videos, video_events)
        if attributions:
            attr_lines = ["STOCK FOOTAGE", ""]
            for video_id, attr in attributions:
                # Format: "Description by Creator via Source: URL"
                line = f"{video_id.replace('_', ' ').title()} by {attr.creator} via {attr.source.title()}"
                if attr.url:
                    line += f": {attr.url}"
                attr_lines.append(line)
            sections.append("\n".join(attr_lines))

    # 5. Footer
    if config.footer:
        sections.append(config.footer.strip())

    # Join sections with double newlines
    return "\n\n".join(sections)


def write_description_file(
    output_path: Path,
    config: ProjectConfig,
    manuscript_text: str,
    slides: dict[str, SlideDefinition],
    videos: dict[str, VideoSource],
    marker_timings: list,
    transcription: list[TranscribedWord] = None,
    video_events: list = None,
    citations: list[Citation] = None,
) -> str:
    """
    Generate and write YouTube description to file.

    Args:
        output_path: Path to write description (e.g., out/description_youtube.txt)
        config: Project configuration
        manuscript_text: Manuscript content
        slides: Slide definitions
        videos: Video definitions
        marker_timings: Aligned marker timings
        transcription: Word-level transcription (optional, for citation timestamps)
        video_events: Video events from render plan (optional, for attribution filtering)
        citations: Pre-extracted citations (optional, loaded from citations.json)

    Returns:
        The generated description text
    """
    description = generate_description(
        config=config,
        manuscript_text=manuscript_text,
        slides=slides,
        videos=videos,
        marker_timings=marker_timings,
        transcription=transcription,
        video_events=video_events,
        citations=citations,
    )

    # Ensure output directory exists
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # Write description
    output_path.write_text(description, encoding="utf-8")

    return description
||||||
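Review note: the REFERENCES branch above can be exercised on its own. The following is a minimal sketch, not the project's implementation. `format_timestamp` is a hypothetical stand-in for `_format_timestamp`, whose body does not appear in this diff, and its `M:SS` output format is an assumption. The `Citation` class here is a trimmed copy with only the two fields the branch reads.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    reference: str
    timestamp: float = -1.0  # -1 means "not aligned", as in models.py


def format_timestamp(seconds: float) -> str:
    """Hypothetical stand-in for _format_timestamp (assumed M:SS format)."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def build_references(citations: list[Citation]) -> str:
    """Mirror the REFERENCES section: timestamped entries, else a plain bullet."""
    lines = ["REFERENCES", ""]
    for citation in citations:
        if citation.timestamp >= 0:
            lines.append(
                f"{format_timestamp(citation.timestamp)} - {citation.reference}"
            )
        else:
            lines.append(f"- {citation.reference}")
    return "\n".join(lines)
```

Unaligned citations (timestamp below zero) still appear in the list, just without a clickable time prefix.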
+15
-3
@@ -7,12 +7,14 @@ from typing import Optional
|
|||||||
|
|
||||||
class GnommoError(Exception):
|
class GnommoError(Exception):
|
||||||
"""Base exception for all GnommoEditor errors."""
|
"""Base exception for all GnommoEditor errors."""
|
||||||
|
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class ValidationIssue:
|
class ValidationIssue:
|
||||||
"""A single validation issue with location context."""
|
"""A single validation issue with location context."""
|
||||||
|
|
||||||
message: str
|
message: str
|
||||||
file: Optional[Path] = None
|
file: Optional[Path] = None
|
||||||
line: Optional[int] = None
|
line: Optional[int] = None
|
||||||
@@ -30,7 +32,9 @@ class ValidationIssue:
|
|||||||
class ParseError(GnommoError):
|
class ParseError(GnommoError):
|
||||||
"""Error during parsing of input files."""
|
"""Error during parsing of input files."""
|
||||||
|
|
||||||
def __init__(self, message: str, file: Optional[Path] = None, line: Optional[int] = None):
|
def __init__(
|
||||||
|
self, message: str, file: Optional[Path] = None, line: Optional[int] = None
|
||||||
|
):
|
||||||
self.issue = ValidationIssue(message, file, line)
|
self.issue = ValidationIssue(message, file, line)
|
||||||
super().__init__(str(self.issue))
|
super().__init__(str(self.issue))
|
||||||
|
|
||||||
@@ -48,7 +52,9 @@ class ValidationError(GnommoError):
|
|||||||
class RenderError(GnommoError):
|
class RenderError(GnommoError):
|
||||||
"""Error during rendering stage."""
|
"""Error during rendering stage."""
|
||||||
|
|
||||||
def __init__(self, message: str, command: Optional[str] = None, stderr: Optional[str] = None):
|
def __init__(
|
||||||
|
self, message: str, command: Optional[str] = None, stderr: Optional[str] = None
|
||||||
|
):
|
||||||
self.command = command
|
self.command = command
|
||||||
self.stderr = stderr
|
self.stderr = stderr
|
||||||
full_message = message
|
full_message = message
|
||||||
@@ -62,7 +68,13 @@ class RenderError(GnommoError):
|
|||||||
class PreprocessError(GnommoError):
|
class PreprocessError(GnommoError):
|
||||||
"""Error during preprocessing stage."""
|
"""Error during preprocessing stage."""
|
||||||
|
|
||||||
def __init__(self, message: str, filter_type: Optional[str] = None, command: Optional[str] = None, stderr: Optional[str] = None):
|
def __init__(
|
||||||
|
self,
|
||||||
|
message: str,
|
||||||
|
filter_type: Optional[str] = None,
|
||||||
|
command: Optional[str] = None,
|
||||||
|
stderr: Optional[str] = None,
|
||||||
|
):
|
||||||
self.filter_type = filter_type
|
self.filter_type = filter_type
|
||||||
self.command = command
|
self.command = command
|
||||||
self.stderr = stderr
|
self.stderr = stderr
|
||||||
|
|||||||
@@ -0,0 +1,74 @@
|
|||||||
|
ObjC.import('stdlib');
|
||||||
|
ObjC.import('Foundation');
|
||||||
|
|
||||||
|
function toAbsolutePath(p) {
|
||||||
|
// Expand ~ and make absolute relative to current working directory
|
||||||
|
var s = $(String(p)).stringByExpandingTildeInPath;
|
||||||
|
if (!s.isAbsolutePath) {
|
||||||
|
var cwd = $.NSFileManager.defaultManager.currentDirectoryPath;
|
||||||
|
s = cwd.stringByAppendingPathComponent(s);
|
||||||
|
}
|
||||||
|
return s.stringByStandardizingPath.js;
|
||||||
|
}
|
||||||
|
|
||||||
|
function fileExists(p) {
|
||||||
|
return $.NSFileManager.defaultManager.fileExistsAtPath($(p));
|
||||||
|
}
|
||||||
|
|
||||||
|
function getNotes(slide) {
|
||||||
|
try { return slide.presenterNotes(); } catch (e) {}
|
||||||
|
try { return slide.speakerNotes(); } catch (e) {}
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
function run(argv) {
|
||||||
|
if (!argv || argv.length < 1) throw new Error("Usage: script.js <file.key> [slides_output_dir]");
|
||||||
|
var abs = toAbsolutePath(argv[0]);
|
||||||
|
var slidesDir = argv.length >= 2 ? toAbsolutePath(argv[1]) : null;
|
||||||
|
|
||||||
|
if (!fileExists(abs)) {
|
||||||
|
throw new Error("File not found: " + abs);
|
||||||
|
}
|
||||||
|
|
||||||
|
var Keynote = Application('Keynote');
|
||||||
|
Keynote.activate();
|
||||||
|
|
||||||
|
// Keynote is happiest when given a Path() made from an absolute POSIX path
|
||||||
|
var doc = Keynote.open(Path(abs));
|
||||||
|
|
||||||
|
// Export slides as PNG if output directory is provided
|
||||||
|
if (slidesDir) {
|
||||||
|
// Create directory if it doesn't exist
|
||||||
|
var fm = $.NSFileManager.defaultManager;
|
||||||
|
if (!fm.fileExistsAtPath($(slidesDir))) {
|
||||||
|
fm.createDirectoryAtPathWithIntermediateDirectoriesAttributesError(
|
||||||
|
$(slidesDir), true, $(), $()
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Export using AppleScript (more reliable than JXA for Keynote export)
|
||||||
|
var app = Application.currentApplication();
|
||||||
|
app.includeStandardAdditions = true;
|
||||||
|
|
||||||
|
// Build osascript command (slidesDir is interpolated as-is, so a path containing quotes will break the command)
|
||||||
|
// Using multiple -e flags to avoid quoting issues
|
||||||
|
var cmd = '/usr/bin/osascript' +
|
||||||
|
' -e \'tell application "Keynote"\'' +
|
||||||
|
' -e \'export front document to POSIX file "' + slidesDir + '" as slide images with properties {image format:PNG}\'' +
|
||||||
|
' -e \'end tell\'';
|
||||||
|
|
||||||
|
app.doShellScript(cmd);
|
||||||
|
}
|
||||||
|
|
||||||
|
var slides = doc.slides();
|
||||||
|
var out = [];
|
||||||
|
for (var i = 0; i < slides.length; i++) {
|
||||||
|
out.push({
|
||||||
|
slide_index: i + 1,
|
||||||
|
notes: String(getNotes(slides[i]) || "")
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
doc.close({ saving: 'no' });
|
||||||
|
return JSON.stringify(out, null, 2);
|
||||||
|
}
|
||||||
@@ -0,0 +1,94 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Extract presenter notes from a Keynote .key file.
|
||||||
|
|
||||||
|
Usage:
python extract_keynote_notes.py

Notes:
- Input and output paths are currently hardcoded in main(); no command-line
arguments are parsed.
- Notes are read by running extract_keynote_notes.js (JXA) through osascript,
which opens the deck in Keynote; the .key package is not unzipped or parsed
directly.
|
||||||
|
"""
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import argparse
import re
|
||||||
|
import shutil
|
||||||
|
import tempfile
|
||||||
|
import zipfile
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
def write_manuscript(data: Path, out_path: Path):
|
||||||
|
data = json.loads(
|
||||||
|
data.read_text(encoding="utf-8")
|
||||||
|
) # list of {"slide_index": int, "notes": str}
|
||||||
|
|
||||||
|
lines = []
|
||||||
|
i = 0
|
||||||
|
for item in data:
|
||||||
|
print(f"Writing notes for slide {item.get('slide_index')} to file")
|
||||||
|
idx = item.get("slide_index")
|
||||||
|
notes = (item.get("notes") or "").rstrip()
|
||||||
|
|
||||||
|
lines.append(f"[S{idx}]")
|
||||||
|
lines.append(notes)
|
||||||
|
lines.append("") # blank line between slides
|
||||||
|
i += 1
|
||||||
|
|
||||||
|
out_path.write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")
|
||||||
|
print(f"Wrote {out_path}")
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
keynote_file = Path("video1/video1.key").expanduser().resolve()
|
||||||
|
if not keynote_file.exists():
|
||||||
|
raise FileNotFoundError(f"Keynote file not found: {keynote_file}")
|
||||||
|
|
||||||
|
script_file = Path("gnommo/extract_keynote_notes.js").expanduser().resolve()
|
||||||
|
if not script_file.exists():
|
||||||
|
raise FileNotFoundError(f"Extractor script not found: {script_file}")
|
||||||
|
|
||||||
|
presenter_notes_json_file = Path("video1/manuscript.json").expanduser().resolve()
|
||||||
|
|
||||||
|
# Run JXA extractor
|
||||||
|
proc = subprocess.run(
|
||||||
|
[
|
||||||
|
"osascript",
|
||||||
|
"-l",
|
||||||
|
"JavaScript",
|
||||||
|
str(script_file),
|
||||||
|
str(keynote_file),
|
||||||
|
],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
if proc.returncode != 0:
|
||||||
|
raise RuntimeError(
|
||||||
|
"Failed to extract presenter notes:\n"
|
||||||
|
f"STDERR:\n{proc.stderr}\n"
|
||||||
|
f"STDOUT:\n{proc.stdout}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write JSON output
|
||||||
|
presenter_notes_json_file.write_text(proc.stdout, encoding="utf-8")
|
||||||
|
|
||||||
|
if not presenter_notes_json_file.exists():
|
||||||
|
raise FileNotFoundError(
|
||||||
|
f"Failed to extract presenter notes to {presenter_notes_json_file}"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Convert JSON → manuscript.txt
|
||||||
|
write_manuscript(
|
||||||
|
presenter_notes_json_file, out_path=keynote_file.parent / "manuscript.txt"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
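Review note: the manuscript format produced by `write_manuscript` is easier to check without file I/O. The sketch below reproduces the same line-building logic as a pure function. The name `notes_to_manuscript` is hypothetical and is not part of this commit.

```python
def notes_to_manuscript(items: list[dict]) -> str:
    """Build manuscript text from a list of {"slide_index": int, "notes": str}."""
    lines = []
    for item in items:
        idx = item.get("slide_index")
        notes = (item.get("notes") or "").rstrip()
        lines.append(f"[S{idx}]")   # slide marker, e.g. [S1]
        lines.append(notes)         # presenter notes (may be empty)
        lines.append("")            # blank line between slides
    # Trim trailing blank lines, then end the file with a single newline
    return "\n".join(lines).rstrip() + "\n"
```

A slide with missing or `None` notes still gets its `[S<n>]` marker, so marker numbering stays aligned with the slide order.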
+366
-36
@@ -6,31 +6,64 @@ from typing import Optional
|
|||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class TalkingHeadConfig:
|
class CutoutDefinition:
|
||||||
"""Configuration for talking head video positioning."""
|
"""Definition of a named zone for placing video content.
|
||||||
x: int
|
|
||||||
y: int
|
All positioning values support both pixels (int) and percentages (str like "50%").
|
||||||
target_height: int # in pixels, or -1 for percentage-based
|
Percentage values are stored as floats (0.0-1.0) with pixel value set to -1.
|
||||||
target_height_percent: float = 0.0 # percentage (0.0-1.0) if target_height is -1
|
|
||||||
file: Optional[str] = None # Path to video or metadata JSON file
|
Videos placed in cutouts are cropped to fit the cutout dimensions.
|
||||||
|
"""
|
||||||
|
|
||||||
|
x: int # in pixels, or -1 for percentage-based
|
||||||
|
y: int # in pixels, or -1 for percentage-based
|
||||||
|
height: int # in pixels, or -1 for percentage-based
|
||||||
|
width: int = (
|
||||||
|
-1
|
||||||
|
) # in pixels, or -1 for percentage-based (defaults to height for square)
|
||||||
|
x_percent: float = 0.0 # percentage (0.0-1.0) if x is -1
|
||||||
|
y_percent: float = 0.0 # percentage (0.0-1.0) if y is -1
|
||||||
|
height_percent: float = 0.0 # percentage (0.0-1.0) if height is -1
|
||||||
|
width_percent: float = 0.0 # percentage (0.0-1.0) if width is -1
|
||||||
|
|
||||||
|
|
||||||
|
# Backwards compatibility alias
|
||||||
|
TalkingHeadConfig = CutoutDefinition
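
Review note: the `-1` sentinel convention in `CutoutDefinition` implies a resolution step somewhere downstream. A minimal sketch of that step follows, assuming a hypothetical helper `resolve_dimension`, since the actual resolver is not shown in this diff. It does not model the "width defaults to height" rule mentioned in the field comment.

```python
def resolve_dimension(pixels: int, percent: float, frame_size: int) -> int:
    """Return a pixel value for a field that is either pixels or a percentage.

    pixels == -1 signals that `percent` (0.0-1.0) should be applied to
    `frame_size`; any other value is taken as an absolute pixel count.
    """
    if pixels == -1:
        return round(percent * frame_size)
    return pixels
```

For a 1920x1080 frame, a cutout stored with `height=-1, height_percent=0.5` would resolve to 540 pixels under this reading.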
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class ProjectConfig:
|
class ProjectConfig:
|
||||||
"""Global project configuration from project.json."""
|
"""Global project configuration from project.json."""
|
||||||
|
|
||||||
resolution: tuple[int, int]
|
resolution: tuple[int, int]
|
||||||
fps: int
|
fps: int
|
||||||
talking_head: TalkingHeadConfig
|
|
||||||
default_slide_type: str
|
default_slide_type: str
|
||||||
|
cutouts: dict[str, CutoutDefinition] = field(
|
||||||
|
default_factory=dict
|
||||||
|
) # Named zones for video placement
|
||||||
background: str = "" # Background image or video path (in shared_assets/)
|
background: str = "" # Background image or video path (in shared_assets/)
|
||||||
background_video: str = "" # Deprecated: use background instead
|
background_video: str = "" # Deprecated: use background instead
|
||||||
slides_path: str = "slides.json" # path to slides.json relative to project
|
slides_path: str = "slides.json" # path to slides.json relative to project
|
||||||
|
videos_path: str = "videos.json" # path to videos.json relative to project
|
||||||
|
audio_path: str = "audio.json" # path to audio.json relative to project
|
||||||
audio_source: Optional[str] = None # defaults to talking head
|
audio_source: Optional[str] = None # defaults to talking head
|
||||||
|
main_video: Optional[str] = None # ID of main video (e.g., talking head)
|
||||||
|
gnommo_scratch: Optional[
|
||||||
|
str
|
||||||
|
] = None # directory for intermediate files (e.g., external SSD)
|
||||||
|
# Outro sequence - plays after narration ends (not marker-triggered)
|
||||||
|
outro: list[str] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # List of video IDs to play in sequence after narration
|
||||||
|
# YouTube description fields
|
||||||
|
description: str = "" # Video description text for YouTube
|
||||||
|
footer: str = "" # Footer text (social links, subscribe CTA, etc.)
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class SlideDefinition:
|
class SlideDefinition:
|
||||||
"""Definition of a single slide from slides.json."""
|
"""Definition of a single slide from slides.json."""
|
||||||
|
|
||||||
image: str
|
image: str
|
||||||
type: str # "fullscreen" | "square"
|
type: str # "fullscreen" | "square"
|
||||||
|
|
||||||
@@ -38,25 +71,170 @@ class SlideDefinition:
|
|||||||
@dataclass
|
@dataclass
|
||||||
class ChromaKeyConfig:
|
class ChromaKeyConfig:
|
||||||
"""Configuration for chroma key (green screen) filter."""
|
"""Configuration for chroma key (green screen) filter."""
|
||||||
|
|
||||||
color: tuple[int, int, int] = (0, 255, 0) # RGB color to key out
|
color: tuple[int, int, int] = (0, 255, 0) # RGB color to key out
|
||||||
similarity: float = 0.15 # Color similarity threshold (0.0-1.0)
|
similarity: float = (
|
||||||
blend: float = 0.1 # Edge blend/feathering (0.0-1.0)
|
0.4 # Color similarity threshold (0.0-1.0), higher = more aggressive
|
||||||
spill: float = 0.0 # Spill suppression amount (0.0-1.0)
|
)
|
||||||
|
blend: float = 0.08 # Edge blend/feathering (0.0-1.0), lower = tighter edges
|
||||||
|
spill: float = 0.1 # Spill suppression amount (0.0-1.0)
|
||||||
|
edge_erode: int = 0 # Pixels to erode from alpha edge (0-5), removes green fringe
|
||||||
|
# Color protection - restore opacity for colors that shouldn't be keyed
|
||||||
|
protect_color: Optional[tuple[int, int, int]] = None # RGB color to protect from keying
|
||||||
|
protect_tolerance: float = (
|
||||||
|
0.15 # How much variation from protect_color to allow (0-1)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GnommoKeyConfig:
|
||||||
|
"""Configuration for gnommokey filter - Keylight-style color-difference keyer.
|
||||||
|
|
||||||
|
Uses YCbCr color-difference keying (like Keylight/Ultimatte) instead of
|
||||||
|
simple Euclidean distance. This handles lighting variation much better
|
||||||
|
than basic chromakey.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Screen color (the green/blue screen color to key out)
|
||||||
|
screen_color: tuple[int, int, int] = (0, 177, 64) # RGB of the screen
|
||||||
|
|
||||||
|
# Key extraction strength (default 100, higher = more aggressive)
|
||||||
|
# Values 80-150 are typical. Maps to Keylight's Screen Gain.
|
||||||
|
screen_gain: float = 100.0
|
||||||
|
|
||||||
|
# Balance between chrominance and luminance in key calculation (0-100)
|
||||||
|
# 0 = pure color-difference, 100 = luminance weighted
|
||||||
|
# Maps to Keylight's Screen Balance.
|
||||||
|
screen_balance: float = 50.0
|
||||||
|
|
||||||
|
# Alpha/matte adjustments
|
||||||
|
clip_black: float = 0.0 # Crush blacks (0-100). Higher = more transparent areas
|
||||||
|
clip_white: float = 100.0 # Crush whites (0-100). Lower = more opaque areas
|
||||||
|
|
||||||
|
# Despill: color to shift green spill toward (RGB)
|
||||||
|
# Typical values: skin tone [217, 200, 180] or neutral [200, 200, 200]
|
||||||
|
despill_bias: Optional[tuple[int, int, int]] = None
|
||||||
|
|
||||||
|
# How aggressively to apply despill (0-1)
|
||||||
|
despill_strength: float = 0.5
|
||||||
|
|
||||||
|
# Alpha bias: influences edge treatment (RGB)
|
||||||
|
# Can help with edge color contamination
|
||||||
|
alpha_bias: Optional[tuple[int, int, int]] = None
|
||||||
|
|
||||||
|
# Edge refinement
|
||||||
|
edge_erode: int = 0 # Pixels to erode from alpha edge (0-5)
|
||||||
|
edge_soften: float = 0.0 # Blur the alpha edge (0-5 pixels)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ColorGradeConfig:
|
||||||
|
"""Configuration for color grading filter.
|
||||||
|
|
||||||
|
Applies color balance, contrast curves, and saturation adjustments
|
||||||
|
while preserving the alpha channel.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Color balance (range: -1.0 to 1.0, 0 = no change)
|
||||||
|
# Midtones
|
||||||
|
rm: float = 0.0 # Red midtones adjustment
|
||||||
|
gm: float = 0.0 # Green midtones adjustment
|
||||||
|
bm: float = 0.0 # Blue midtones adjustment
|
||||||
|
# Highlights
|
||||||
|
rh: float = 0.0 # Red highlights adjustment
|
||||||
|
gh: float = 0.0 # Green highlights adjustment
|
||||||
|
bh: float = 0.0 # Blue highlights adjustment
|
||||||
|
# Shadows
|
||||||
|
rs: float = 0.0 # Red shadows adjustment
|
||||||
|
gs: float = 0.0 # Green shadows adjustment
|
||||||
|
bs: float = 0.0 # Blue shadows adjustment
|
||||||
|
|
||||||
|
# Curves preset (none, lighter, darker, increase_contrast, medium_contrast, etc.)
|
||||||
|
curves_preset: str = "none"
|
||||||
|
|
||||||
|
# EQ adjustments
|
||||||
|
contrast: float = 1.0 # Contrast multiplier (0.0-2.0, 1.0 = no change)
|
||||||
|
brightness: float = 0.0 # Brightness adjustment (-1.0 to 1.0, 0 = no change)
|
||||||
|
saturation: float = 1.0 # Saturation multiplier (0.0-3.0, 1.0 = no change)
|
||||||
|
|
||||||
|
# Custom curves for lift/gamma/gain control
|
||||||
|
# Format: "0/0 0.5/0.56 1/1" means (input/output) control points
|
||||||
|
curves_r: str = "" # Red channel curve
|
||||||
|
curves_g: str = "" # Green channel curve
|
||||||
|
curves_b: str = "" # Blue channel curve
|
||||||
|
curves_master: str = "" # Master (luminance) curve
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AudioNormalizeConfig:
|
||||||
|
"""Configuration for audio normalization filter.
|
||||||
|
|
||||||
|
Applies noise reduction, compression, and loudness normalization
|
||||||
|
to improve audio quality and consistency.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Noise reduction (afftdn filter)
|
||||||
|
denoise: bool = True # Enable noise reduction
|
||||||
|
noise_floor: float = -25.0 # Noise floor in dB (default -25, lower = more aggressive)
|
||||||
|
|
||||||
|
# Compression (acompressor filter)
|
||||||
|
compress: bool = True # Enable dynamic range compression
|
||||||
|
threshold: float = -20.0 # Compression threshold in dB
|
||||||
|
ratio: float = 4.0 # Compression ratio (4:1 default)
|
||||||
|
attack: float = 5.0 # Attack time in ms
|
||||||
|
release: float = 50.0 # Release time in ms
|
||||||
|
makeup: float = 2.0 # Makeup gain in dB
|
||||||
|
|
||||||
|
# Loudness normalization (loudnorm filter - EBU R128)
|
||||||
|
normalize: bool = True # Enable loudness normalization
|
||||||
|
target_lufs: float = -16.0 # Target integrated loudness (YouTube recommends -14 to -16)
|
||||||
|
target_lra: float = 11.0 # Target loudness range
|
||||||
|
target_tp: float = -1.5 # Target true peak in dB
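
Review note: the field comments name three ffmpeg audio filters (`afftdn`, `acompressor`, `loudnorm`). The sketch below shows one way these settings could map to a filter chain string. It is an assumption about the mapping, not the project's builder, which is not part of this diff. `build_filter_chain` and `db_to_linear` are hypothetical names, the config class is repeated only to keep the example self-contained, and the dB-to-linear conversion for `threshold` and `makeup` assumes the filter expects linear gain values.

```python
from dataclasses import dataclass


@dataclass
class AudioNormalizeConfig:
    denoise: bool = True
    noise_floor: float = -25.0
    compress: bool = True
    threshold: float = -20.0
    ratio: float = 4.0
    attack: float = 5.0
    release: float = 50.0
    makeup: float = 2.0
    normalize: bool = True
    target_lufs: float = -16.0
    target_lra: float = 11.0
    target_tp: float = -1.5


def db_to_linear(db: float) -> float:
    """Convert a decibel value to a linear amplitude ratio."""
    return 10 ** (db / 20)


def build_filter_chain(cfg: AudioNormalizeConfig) -> str:
    """Join the enabled stages into a comma-separated filter chain."""
    parts = []
    if cfg.denoise:
        parts.append(f"afftdn=nf={cfg.noise_floor:g}")
    if cfg.compress:
        parts.append(
            "acompressor="
            f"threshold={db_to_linear(cfg.threshold):.4f}:"
            f"ratio={cfg.ratio:g}:attack={cfg.attack:g}:"
            f"release={cfg.release:g}:makeup={db_to_linear(cfg.makeup):.4f}"
        )
    if cfg.normalize:
        parts.append(
            f"loudnorm=I={cfg.target_lufs:g}:LRA={cfg.target_lra:g}:TP={cfg.target_tp:g}"
        )
    return ",".join(parts)
```

Keeping each stage behind its own boolean means a disabled stage simply drops out of the chain, and the stage order (denoise, compress, normalize) follows the order of the fields.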
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class FilterConfig:
|
class FilterConfig:
|
||||||
"""Base configuration for a preprocessing filter."""
|
"""Base configuration for a preprocessing filter."""
|
||||||
|
|
||||||
type: str
|
type: str
|
||||||
# Type-specific config stored in subclasses or as dict
|
# Type-specific config stored in subclasses or as dict
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Attribution:
|
||||||
|
"""Attribution information for stock footage (e.g., Pexels)."""
|
||||||
|
|
||||||
|
source: str # Source platform (e.g., "pexels", "pixabay", "unsplash")
|
||||||
|
creator: str # Creator/photographer name
|
||||||
|
url: Optional[str] = None # URL to the original content
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class VideoSource:
|
class VideoSource:
|
||||||
"""Video source definition from videos.json."""
|
"""Video source definition from videos.json."""
|
||||||
file: str
|
|
||||||
preprocess: list[dict] = field(default_factory=list) # List of filter config dicts
|
source_file: str # Source video filename (relative to videos.json location or shared_assets/)
|
||||||
output_file: Optional[str] = None # Path to preprocessed output (if any)
|
filter: list[dict] = field(default_factory=list) # List of filter config dicts
|
||||||
|
output_file: Optional[
|
||||||
|
str
|
||||||
|
] = None # Path to preprocessed output (relative to videos.json)
|
||||||
|
take: Optional[
|
||||||
|
float
|
||||||
|
] = None # Max duration to play (seconds). Default: until next slide or end of clip
|
||||||
|
skip: float = 0.0 # Skip this many seconds at start of video (seek point)
|
||||||
|
zoom: float = (
|
||||||
|
1.0 # Scale factor for video (1.0 = fit to cutout height, >1 = enlarge)
|
||||||
|
)
|
||||||
|
cutout: Optional[
|
||||||
|
str
|
||||||
|
] = None # Name of cutout to place video in (from project.json cutouts)
|
||||||
|
always_visible: bool = False # If True, video is always shown (like talking head)
|
||||||
|
is_shared: bool = False # If True, source_file is relative to shared_assets/
|
||||||
|
pause_narration: float = (
|
||||||
|
0.0 # Seconds to pause narration during this video (0 = no pause)
|
||||||
|
)
|
||||||
|
attribution: Optional[Attribution] = None # Attribution for stock footage
|
||||||
|
use_audio_channels: str = "both" # Audio channel selection: "both", "left", or "right"
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
@@ -67,50 +245,202 @@ class VideoMetadata:
|
|||||||
This allows defining preprocessing steps separately from videos.json,
|
This allows defining preprocessing steps separately from videos.json,
|
||||||
enabling per-video preprocessing configuration.
|
enabling per-video preprocessing configuration.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
source_file: str # Original source video file
|
source_file: str # Original source video file
|
||||||
preprocess: list[dict] = field(default_factory=list) # Preprocessing filters
|
preprocess: list[dict] = field(default_factory=list) # Preprocessing filters
|
||||||
output: Optional[dict] = None # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
|
output: Optional[
|
||||||
|
dict
|
||||||
|
] = None # Output config {"file": "...", "colorspace": "...", "alpha": "..."}
|
||||||
@dataclass
|
|
||||||
class TimedWord:
|
|
||||||
"""A word or marker with its timestamp from transcript.csv."""
|
|
||||||
time: float
|
|
||||||
word: str
|
|
||||||
|
|
||||||
@property
|
|
||||||
def is_marker(self) -> bool:
|
|
||||||
"""Check if this is a slide marker like [S1]."""
|
|
||||||
return self.word.startswith("[") and self.word.endswith("]")
|
|
||||||
|
|
||||||
@property
|
|
||||||
def marker_id(self) -> Optional[str]:
|
|
||||||
"""Extract marker ID (e.g., 'S1' from '[S1]')."""
|
|
||||||
if self.is_marker:
|
|
||||||
return self.word[1:-1]
|
|
||||||
return None
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class SlideEvent:
|
class SlideEvent:
|
||||||
"""A resolved slide event with timing information."""
|
"""A resolved slide event with timing information."""
|
||||||
|
|
||||||
slide_id: str
|
slide_id: str
|
||||||
start_time: float
|
start_time: float
|
||||||
end_time: float
|
end_time: float
|
||||||
slide_def: SlideDefinition
|
slide_def: SlideDefinition
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AudioDefinition:
|
||||||
|
"""Definition of an audio clip from audio.json."""
|
||||||
|
|
||||||
|
file: str # Audio filename (relative to audio.json location)
|
||||||
|
volume: float = 1.0 # Volume multiplier (0.0-1.0)
|
||||||
|
loop: bool = False # If True, loop for entire duration from trigger point
|
||||||
|
ignore_pauses: bool = False # If True, audio continues playing during narration pauses
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Citation:
|
||||||
|
"""A citation extracted from manuscript.txt [cite:...] markers."""
|
||||||
|
|
||||||
|
reference: str # The literal reference text after cite:
|
||||||
|
marker_id: str # The full marker (e.g., "cite:Smith et al...")
|
||||||
|
timestamp: float = -1.0 # Aligned timestamp (-1 if not aligned)
|
||||||
|
context: str = "" # Text following the citation for alignment
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AudioEvent:
|
||||||
|
"""A resolved audio event with timing information."""
|
||||||
|
|
||||||
|
audio_id: str
|
||||||
|
start_time: float # When to start playing (marker time - offset)
|
||||||
|
audio_def: AudioDefinition
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class VideoEvent:
|
||||||
|
"""A resolved video event with timing information."""
|
||||||
|
|
||||||
|
video_id: str
|
||||||
|
start_time: float
|
||||||
|
end_time: float
|
||||||
|
video_source: "VideoSource"
|
||||||
|
cutout: "CutoutDefinition"
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CameraState:
|
||||||
|
"""State of the virtual camera at a point in time.
|
||||||
|
|
||||||
|
The camera transforms the entire composed scene (background, slides, cutouts).
|
||||||
|
This ensures all elements stay spatially synchronized when zooming/tilting.
|
||||||
|
"""
|
||||||
|
|
||||||
|
zoom: float = 1.0 # 1.0 = 100%, 1.25 = 125%, etc.
|
||||||
|
rotation: float = 0.0 # degrees, positive = clockwise
|
||||||
|
pan_x: float = 0.0 # -1.0 to 1.0, fraction of frame width
pan_y: float = 0.0 # -1.0 to 1.0, fraction of frame height
|
||||||
|
focal_x: float = 0.5 # 0.0 to 1.0, zoom focal point X (0.5 = center)
|
||||||
|
focal_y: float = 0.5 # 0.0 to 1.0, zoom focal point Y (0.5 = center)
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
# Clamp values to reasonable ranges
|
||||||
|
self.zoom = max(0.5, min(3.0, self.zoom))
|
||||||
|
self.rotation = max(-45.0, min(45.0, self.rotation))
|
||||||
|
self.pan_x = max(-1.0, min(1.0, self.pan_x))
|
||||||
|
self.pan_y = max(-1.0, min(1.0, self.pan_y))
|
||||||
|
self.focal_x = max(0.0, min(1.0, self.focal_x))
|
||||||
|
self.focal_y = max(0.0, min(1.0, self.focal_y))
|
||||||
|
|
||||||
|
def is_default(self) -> bool:
|
||||||
|
"""Check if this is the default camera state (no transform)."""
|
||||||
|
return (
|
||||||
|
self.zoom == 1.0
|
||||||
|
and self.rotation == 0.0
|
||||||
|
and self.pan_x == 0.0
|
||||||
|
and self.pan_y == 0.0
|
||||||
|
and self.focal_x == 0.5
|
||||||
|
and self.focal_y == 0.5
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CameraEvent:
|
||||||
|
"""A camera state change at a specific time.
|
||||||
|
|
||||||
|
Camera events can be instant (duration=0) or animated (duration>0).
|
||||||
|
When animated, the camera smoothly transitions from its current state
|
||||||
|
to the target state over the specified duration using the easing function.
|
||||||
|
"""
|
||||||
|
|
||||||
|
time: float # timestamp in seconds
|
||||||
|
target_state: CameraState
|
||||||
|
duration: float = 0.2 # transition duration (0 = instant snap)
|
||||||
|
easing: str = "ease-out" # linear, ease-in, ease-out, ease-in-out
|
||||||
|
|
||||||
|
|
||||||
|
# Camera effect presets - map marker names to camera states
|
||||||
|
# Effect strengths are intentionally subtle for professional look
|
||||||
|
CAMERA_PRESETS: dict[str, CameraState] = {
|
||||||
|
# Zoom levels (halved for subtlety)
|
||||||
|
"Zoom0": CameraState(zoom=1.0),
|
||||||
|
"Zoom1": CameraState(zoom=1.05),
|
||||||
|
"Zoom2": CameraState(zoom=1.125),
|
||||||
|
"Zoom3": CameraState(zoom=1.25),
|
||||||
|
# Tilt/rotation (halved)
|
||||||
|
"TiltLeft": CameraState(rotation=-7.5),
|
||||||
|
"TiltRight": CameraState(rotation=7.5),
|
||||||
|
"NoTilt": CameraState(), # Full reset to default state
|
||||||
|
# Pan (halved)
|
||||||
|
"PanLeft": CameraState(pan_x=-0.1),
|
||||||
|
"PanRight": CameraState(pan_x=0.1),
|
||||||
|
"PanUp": CameraState(pan_y=-0.075),
|
||||||
|
"PanDown": CameraState(pan_y=0.075),
|
||||||
|
"PanCenter": CameraState(pan_x=0.0, pan_y=0.0),
|
||||||
|
# Reset all
|
||||||
|
"Reset": CameraState(),
|
||||||
|
}
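
Review note: the partial rendering spec in this commit says camera state is cumulative, with each marker type changing only its own property (for example `[Zoom2]` followed by `[TiltLeft]` leaves both active). The presets above are full `CameraState` objects, so the merge has to pick out the relevant field. The sketch below shows one possible merge. Dispatching on the marker name prefix is an assumption, `Cam` is a trimmed stand-in for `CameraState`, and `apply_marker` is a hypothetical name; the real logic lives in `_extract_camera_events()`, which is not shown here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cam:
    zoom: float = 1.0
    rotation: float = 0.0
    pan_x: float = 0.0
    pan_y: float = 0.0


PRESETS = {
    "Zoom2": Cam(zoom=1.125),
    "TiltLeft": Cam(rotation=-7.5),
    "NoTilt": Cam(),
    "PanLeft": Cam(pan_x=-0.1),
    "Reset": Cam(),
}


def apply_marker(state: Cam, marker: str) -> Cam:
    """Merge one preset into the running state, touching only its own property."""
    preset = PRESETS[marker]
    if marker.startswith("Zoom"):
        return replace(state, zoom=preset.zoom)
    if "Tilt" in marker:
        return replace(state, rotation=preset.rotation)
    if marker.startswith("Pan"):
        return replace(state, pan_x=preset.pan_x, pan_y=preset.pan_y)
    return preset  # "Reset" replaces the whole state


def state_after(markers: list[str]) -> Cam:
    """Fold all markers from t=0 to get the state at the end of the list."""
    state = Cam()
    for marker in markers:
        state = apply_marker(state, marker)
    return state
```

For a partial render starting at slide 10, `state_after` would be called with every camera marker that precedes slide 10 to obtain the initial camera state.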
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class NarrationPause:
|
||||||
|
"""A pause in the narration timeline for an interstitial video."""
|
||||||
|
|
||||||
|
output_time: float # When the pause starts in the OUTPUT timeline
|
||||||
|
narration_time: float # Where we are in the NARRATION source when pause starts
|
||||||
|
duration: float # How long the pause lasts
|
||||||
|
video_id: str # The video that plays during the pause
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class OutroEvent:
|
||||||
|
"""A video that plays as part of the outro sequence (after narration ends)."""
|
||||||
|
|
||||||
|
video_id: str
|
||||||
|
start_time: float # When this outro video starts (in output timeline)
|
||||||
|
end_time: float # When this outro video ends
|
||||||
|
video_source: "VideoSource"
|
||||||
|
cutout: Optional["CutoutDefinition"] = None # None = fullscreen
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
class RenderPlan:
|
class RenderPlan:
|
||||||
"""Complete plan for rendering the final video."""
|
"""Complete plan for rendering the final video."""
|
||||||
|
|
||||||
project_path: Path
|
project_path: Path
|
||||||
config: ProjectConfig
|
config: ProjectConfig
|
||||||
talking_head: VideoSource
|
|
||||||
slide_events: list[SlideEvent]
|
slide_events: list[SlideEvent]
|
||||||
total_duration: float
|
total_duration: float
|
||||||
slides: dict[str, SlideDefinition]
|
slides: dict[str, SlideDefinition]
|
||||||
|
videos: dict[str, VideoSource] = field(default_factory=dict)
|
||||||
|
video_events: list[VideoEvent] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # Triggered video overlays
|
||||||
|
narration_videos: list[tuple[str, VideoSource, CutoutDefinition]] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # (video_id, source, cutout)
|
||||||
slides_dir: Path = None # directory containing slide images
|
slides_dir: Path = None # directory containing slide images
|
||||||
talking_head_path: Path = None # Resolved path to actual video file
|
videos_dir: Path = None # directory containing videos.json and video files
|
||||||
|
audio_events: list[AudioEvent] = field(default_factory=list)
|
||||||
|
audio: dict[str, AudioDefinition] = field(default_factory=dict)
|
||||||
|
audio_dir: Path = None # directory containing audio.json and audio files
|
||||||
|
camera_events: list[CameraEvent] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # Virtual camera keyframes
|
||||||
|
# Partial rendering support
|
||||||
|
time_offset: float = (
|
||||||
|
0.0 # Offset subtracted from all timestamps (for partial render)
|
||||||
|
)
|
||||||
|
initial_camera_state: "CameraState" = (
|
||||||
|
None # Camera state at render start (for partial render)
|
||||||
|
)
|
||||||
|
input_seek_time: float = 0.0 # Seek position for input videos (for partial render)
|
||||||
|
# Shared assets support
|
||||||
|
shared_assets_dir: Path = None # Directory containing shared assets (pexels, etc.)
|
||||||
|
# Narration pause support
|
||||||
|
narration_pauses: list[NarrationPause] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # Gaps in narration for interstitial videos
|
||||||
|
# Outro sequence (plays after narration ends)
|
||||||
|
outro_events: list["OutroEvent"] = field(
|
||||||
|
default_factory=list
|
||||||
|
) # Videos that play after narration ends
|
||||||
|
narration_end_time: float = 0.0 # When narration ends (before outro starts)
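The partial-rendering fields above (`time_offset`, `input_seek_time`) suggest that events inside the render range are kept and rebased so the range starts at zero. The following is a minimal sketch of that idea only, assuming a simplified `Event` type and a hypothetical helper `shift_into_range`; the real `SlideEvent` and `CameraEvent` classes and the actual filtering code are not shown in this diff.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    """Stand-in for SlideEvent/CameraEvent; the real classes are not shown here."""
    time: float
    label: str


def shift_into_range(events: list[Event], start: float, end: float) -> list[Event]:
    """Keep events with start <= time < end and rebase them so the range begins at 0."""
    return [replace(e, time=e.time - start) for e in events if start <= e.time < end]


events = [Event(5.0, "S1"), Event(42.0, "S10"), Event(61.5, "S11"), Event(90.0, "S20")]
partial = shift_into_range(events, start=42.0, end=90.0)
print(partial)
```

Here `start` plays the role of `time_offset`. Whether `input_seek_time` always equals `time_offset` in gnommo is an assumption this sketch does not settle.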
|
||||||
|
|
||||||
|
|
||||||
# Slide layout configurations (hardcoded for POC)
|
# Slide layout configurations (hardcoded for POC)
|
||||||
|
|||||||
+207
-67
@@ -1,6 +1,5 @@
|
|||||||
"""Extract stage: parse all input files."""
|
"""Extract stage: parse all input files."""
|
||||||
|
|
||||||
import csv
|
|
||||||
import json
|
import json
|
||||||
import re
|
import re
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
@@ -8,21 +7,28 @@ from typing import Any, Optional
|
|||||||
|
|
||||||
from .errors import ParseError
|
from .errors import ParseError
|
||||||
from .models import (
|
from .models import (
|
||||||
|
Attribution,
|
||||||
|
AudioDefinition,
|
||||||
|
Citation,
|
||||||
|
CutoutDefinition,
|
||||||
ProjectConfig,
|
ProjectConfig,
|
||||||
SlideDefinition,
|
SlideDefinition,
|
||||||
TalkingHeadConfig,
|
|
||||||
TimedWord,
|
|
||||||
VideoMetadata,
|
VideoMetadata,
|
||||||
VideoSource,
|
VideoSource,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int, str]]]:
|
def parse_manuscript(
|
||||||
|
project_path: Path,
|
||||||
|
) -> tuple[str, list[str], list[tuple[int, str]], list[Citation]]:
|
||||||
"""
|
"""
|
||||||
Parse manuscript.txt and extract text content and slide markers.
|
Parse manuscript.txt and extract text content and slide markers.
|
||||||
|
|
||||||
|
Strips [cite:...] markers from the returned text so they never pollute
|
||||||
|
alignment contexts. Citations are extracted and returned separately.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Tuple of (full text, list of marker IDs found, list of malformed markers as (line_num, text))
|
Tuple of (full text, list of marker IDs found, list of malformed markers, list of citations)
|
||||||
"""
|
"""
|
||||||
manuscript_path = project_path / "manuscript.txt"
|
manuscript_path = project_path / "manuscript.txt"
|
||||||
|
|
||||||
@@ -31,8 +37,15 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
|
|||||||
|
|
||||||
text = manuscript_path.read_text(encoding="utf-8")
|
text = manuscript_path.read_text(encoding="utf-8")
|
||||||
|
|
||||||
# Extract all valid slide markers like [S1], [S2], etc.
|
# Extract citations before stripping them
|
||||||
markers = re.findall(r"\[([A-Za-z0-9_]+)\]", text)
|
citations = parse_citations(text)
|
||||||
|
|
||||||
|
# Strip [cite:...] markers from text so they don't pollute alignment
|
||||||
|
text = re.sub(r"\[cite:[^\]]+\]", "", text)
|
||||||
|
|
||||||
|
# Extract all valid markers like [S1], [video:demo], [Zoom2], etc.
|
||||||
|
# Include . in pattern to catch markers with file extensions (so validator can warn about them)
|
||||||
|
markers = re.findall(r"\[([A-Za-z0-9_:.]+)\]", text)
|
||||||
|
|
||||||
# Find malformed markers (missing brackets, extra spaces, etc.)
|
# Find malformed markers (missing brackets, extra spaces, etc.)
|
||||||
malformed: list[tuple[int, str]] = []
|
malformed: list[tuple[int, str]] = []
|
||||||
@@ -56,48 +69,75 @@ def parse_manuscript(project_path: Path) -> tuple[str, list[str], list[tuple[int
|
|||||||
for match in spaced:
|
for match in spaced:
|
||||||
malformed.append((line_num, match))
|
malformed.append((line_num, match))
|
||||||
|
|
||||||
return text, markers, malformed
|
return text, markers, malformed, citations
|
||||||
|
|
||||||
|
|
||||||
def parse_transcript(project_path: Path) -> list[TimedWord]:
|
def parse_citations(manuscript_text: str) -> list[Citation]:
|
||||||
"""
|
"""
|
||||||
Parse transcript.csv into a list of timed words.
|
Extract all [cite:...] markers from manuscript text.
|
||||||
|
|
||||||
Expected format:
|
The text after 'cite:' is the literal reference that should appear
|
||||||
t,word
|
in the video description.
|
||||||
0.00,This
|
|
||||||
0.42,is
|
Returns:
|
||||||
...
|
List of Citation objects with reference text and context for alignment.
|
||||||
"""
|
"""
|
||||||
transcript_path = project_path / "transcript.csv"
|
citations = []
|
||||||
|
|
||||||
if not transcript_path.exists():
|
# Match [cite:...] markers - content can include any characters except ]
|
||||||
raise ParseError("transcript.csv not found", transcript_path)
|
# Use a more permissive pattern that handles multi-word citations
|
||||||
|
pattern = r"\[cite:([^\]]+)\]"
|
||||||
|
|
||||||
timed_words = []
|
for match in re.finditer(pattern, manuscript_text):
|
||||||
|
reference = match.group(1).strip()
|
||||||
|
marker_id = f"cite:{reference}"
|
||||||
|
|
||||||
with open(transcript_path, "r", encoding="utf-8") as f:
|
# Extract context: text following the citation (for alignment)
|
||||||
reader = csv.DictReader(f)
|
# Take up to 150 chars after the marker; the regex below trims at the next marker or blank line
|
||||||
|
end_pos = match.end()
|
||||||
|
context_text = manuscript_text[end_pos : end_pos + 150]
|
||||||
|
|
||||||
if reader.fieldnames is None or "t" not in reader.fieldnames or "word" not in reader.fieldnames:
|
# Clean up context: take text until next marker or double newline
|
||||||
raise ParseError(
|
context_match = re.match(r"([^\[]*?)(?:\[|\n\n|$)", context_text)
|
||||||
"transcript.csv must have columns: t, word",
|
context = context_match.group(1).strip() if context_match else ""
|
||||||
transcript_path
|
|
||||||
|
# Truncate context to ~50 chars for display
|
||||||
|
if len(context) > 50:
|
||||||
|
context = context[:47] + "..."
|
||||||
|
|
||||||
|
citations.append(
|
||||||
|
Citation(
|
||||||
|
reference=reference,
|
||||||
|
marker_id=marker_id,
|
||||||
|
context=context,
|
||||||
)
|
)
|
||||||
|
)
|
||||||
|
|
||||||
for line_num, row in enumerate(reader, start=2): # start=2 because line 1 is header
|
return citations
|
||||||
try:
|
|
||||||
time = float(row["t"])
|
|
||||||
word = row["word"].strip()
|
|
||||||
timed_words.append(TimedWord(time=time, word=word))
|
|
||||||
except (ValueError, KeyError) as e:
|
|
||||||
raise ParseError(
|
|
||||||
f"Invalid row: {e}",
|
|
||||||
transcript_path,
|
|
||||||
line_num
|
|
||||||
)
|
|
||||||
|
|
||||||
return timed_words
|
|
||||||
|
def save_citations(citations: list[Citation], path: Path) -> None:
|
||||||
|
"""Save citations to a JSON file."""
|
||||||
|
data = [
|
||||||
|
{"reference": c.reference, "context": c.context}
|
||||||
|
for c in citations
|
||||||
|
]
|
||||||
|
path.write_text(json.dumps(data, indent=2), encoding="utf-8")
|
||||||
|
|
||||||
|
|
||||||
|
def load_citations(path: Path) -> list[Citation]:
|
||||||
|
"""Load citations from a JSON file."""
|
||||||
|
if not path.exists():
|
||||||
|
return []
|
||||||
|
data = json.loads(path.read_text(encoding="utf-8"))
|
||||||
|
return [
|
||||||
|
Citation(
|
||||||
|
reference=item["reference"],
|
||||||
|
marker_id=f"cite:{item['reference']}",
|
||||||
|
context=item.get("context", ""),
|
||||||
|
)
|
||||||
|
for item in data
|
||||||
|
]
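One detail of `save_citations` and `load_citations` is that `marker_id` is never written to disk; it is rebuilt from `reference` on load. The round trip can be sketched as follows, assuming a stand-in `Citation` dataclass with only the three fields used in this diff (the real model may carry more).

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Citation:
    """Stand-in with the three fields used in this diff; the real model may have more."""
    reference: str
    marker_id: str
    context: str = ""


def save_citations(citations, path):
    # Only reference and context are persisted, as in the diff.
    data = [{"reference": c.reference, "context": c.context} for c in citations]
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_citations(path):
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    # marker_id is reconstructed from the reference text.
    return [
        Citation(item["reference"], f"cite:{item['reference']}", item.get("context", ""))
        for item in data
    ]


original = [Citation("Dr. Jane Doe, MIT Media Lab", "cite:Dr. Jane Doe, MIT Media Lab", "for all")]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "citations.json"
    save_citations(original, path)
    restored = load_citations(path)
    missing = load_citations(Path(tmp) / "absent.json")
print(restored == original, missing)
```

The round trip is lossless only as long as `marker_id` always equals `"cite:" + reference`, which holds for citations produced by `parse_citations`.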
|
||||||
|
|
||||||
|
|
||||||
def parse_project_config(project_path: Path) -> ProjectConfig:
|
def parse_project_config(project_path: Path) -> ProjectConfig:
|
||||||
@@ -112,16 +152,27 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
|
|||||||
except json.JSONDecodeError as e:
|
except json.JSONDecodeError as e:
|
||||||
raise ParseError(f"Invalid JSON: {e}", config_path)
|
raise ParseError(f"Invalid JSON: {e}", config_path)
|
||||||
|
|
||||||
# Parse talking head config
|
# Parse cutouts (named zones for video placement)
|
||||||
th_data = data.get("talkinghead", {})
|
cutouts: dict[str, CutoutDefinition] = {}
|
||||||
th_height, th_height_pct = _parse_dimension(th_data.get("targetheight", 200))
|
cutouts_data = data.get("cutouts", {})
|
||||||
talking_head = TalkingHeadConfig(
|
for cutout_name, cutout_data in cutouts_data.items():
|
||||||
x=th_data.get("x", 100),
|
x, x_pct = _parse_dimension(cutout_data.get("x", 0))
|
||||||
y=th_data.get("y", 100),
|
y, y_pct = _parse_dimension(cutout_data.get("y", 0))
|
||||||
target_height=th_height,
|
height, height_pct = _parse_dimension(cutout_data.get("height", 200))
|
||||||
target_height_percent=th_height_pct,
|
# Width defaults to same as height (square) if not specified
|
||||||
file=th_data.get("file"),
|
width, width_pct = _parse_dimension(
|
||||||
)
|
cutout_data.get("width", cutout_data.get("height", 200))
|
||||||
|
)
|
||||||
|
cutouts[cutout_name] = CutoutDefinition(
|
||||||
|
x=x,
|
||||||
|
y=y,
|
||||||
|
height=height,
|
||||||
|
width=width,
|
||||||
|
x_percent=x_pct,
|
||||||
|
y_percent=y_pct,
|
||||||
|
height_percent=height_pct,
|
||||||
|
width_percent=width_pct,
|
||||||
|
)
|
||||||
|
|
||||||
# Parse resolution
|
# Parse resolution
|
||||||
resolution = data.get("resolution", [1920, 1080])
|
resolution = data.get("resolution", [1920, 1080])
|
||||||
@@ -131,12 +182,19 @@ def parse_project_config(project_path: Path) -> ProjectConfig:
|
|||||||
return ProjectConfig(
|
return ProjectConfig(
|
||||||
resolution=tuple(resolution),
|
resolution=tuple(resolution),
|
||||||
fps=data.get("fps", 30),
|
fps=data.get("fps", 30),
|
||||||
talking_head=talking_head,
|
|
||||||
default_slide_type=data.get("defaultSlideType", "square"),
|
default_slide_type=data.get("defaultSlideType", "square"),
|
||||||
|
cutouts=cutouts,
|
||||||
background=data.get("background", ""),
|
background=data.get("background", ""),
|
||||||
background_video=data.get("background_video", ""), # Deprecated
|
background_video=data.get("background_video", ""), # Deprecated
|
||||||
slides_path=data.get("slides", "slides.json"),
|
slides_path=data.get("slides", "slides.json"),
|
||||||
|
videos_path=data.get("videos", "videos.json"),
|
||||||
|
audio_path=data.get("audio", "audio.json"),
|
||||||
audio_source=data.get("audio_source"),
|
audio_source=data.get("audio_source"),
|
||||||
|
main_video=data.get("main_video"),
|
||||||
|
gnommo_scratch=data.get("gnommo_scratch"),
|
||||||
|
outro=data.get("outro", []),
|
||||||
|
description=data.get("description", ""),
|
||||||
|
footer=data.get("footer", ""),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -157,7 +215,9 @@ def _parse_dimension(value: Any) -> tuple[int, float]:
|
|||||||
return 200, 0.0 # default
|
return 200, 0.0 # default
|
||||||
|
|
||||||
|
|
||||||
def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str, SlideDefinition]:
|
def parse_slides(
|
||||||
|
project_path: Path, config: ProjectConfig = None
|
||||||
|
) -> dict[str, SlideDefinition]:
|
||||||
"""Parse slides.json into slide definitions."""
|
"""Parse slides.json into slide definitions."""
|
||||||
if config and config.slides_path:
|
if config and config.slides_path:
|
||||||
slides_path = project_path / config.slides_path
|
slides_path = project_path / config.slides_path
|
||||||
@@ -176,8 +236,7 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
|
|||||||
for slide_id, slide_data in data.items():
|
for slide_id, slide_data in data.items():
|
||||||
if "image" not in slide_data:
|
if "image" not in slide_data:
|
||||||
raise ParseError(
|
raise ParseError(
|
||||||
f"Slide '{slide_id}' missing required field 'image'",
|
f"Slide '{slide_id}' missing required field 'image'", slides_path
|
||||||
slides_path
|
|
||||||
)
|
)
|
||||||
slides[slide_id] = SlideDefinition(
|
slides[slide_id] = SlideDefinition(
|
||||||
image=slide_data["image"],
|
image=slide_data["image"],
|
||||||
@@ -187,12 +246,67 @@ def parse_slides(project_path: Path, config: ProjectConfig = None) -> dict[str,
|
|||||||
return slides
|
return slides
|
||||||
|
|
||||||
|
|
||||||
def parse_videos(project_path: Path) -> dict[str, VideoSource]:
|
def parse_audio(
|
||||||
"""Parse videos.json into video source definitions."""
|
project_path: Path, config: Optional[ProjectConfig] = None
|
||||||
videos_path = project_path / "videos.json"
|
) -> tuple[dict[str, AudioDefinition], Path]:
|
||||||
|
"""
|
||||||
|
Parse audio.json into audio definitions.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (audio dict, audio_dir) where audio_dir is the directory
|
||||||
|
containing audio.json (for resolving relative file paths).
|
||||||
|
"""
|
||||||
|
if config and config.audio_path:
|
||||||
|
audio_path = project_path / config.audio_path
|
||||||
|
else:
|
||||||
|
audio_path = project_path / "audio.json"
|
||||||
|
|
||||||
|
# Audio is optional - return empty dict if not found
|
||||||
|
if not audio_path.exists():
|
||||||
|
return {}, project_path
|
||||||
|
|
||||||
|
audio_dir = audio_path.parent
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(audio_path.read_text(encoding="utf-8"))
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
raise ParseError(f"Invalid JSON: {e}", audio_path)
|
||||||
|
|
||||||
|
audio = {}
|
||||||
|
for audio_id, audio_data in data.items():
|
||||||
|
if "file" not in audio_data:
|
||||||
|
raise ParseError(
|
||||||
|
f"Audio '{audio_id}' missing required field 'file'", audio_path
|
||||||
|
)
|
||||||
|
audio[audio_id] = AudioDefinition(
|
||||||
|
file=audio_data["file"],
|
||||||
|
volume=float(audio_data.get("volume", 1.0)),
|
||||||
|
loop=bool(audio_data.get("loop", False)),
|
||||||
|
ignore_pauses=bool(audio_data.get("ignore_pauses", False)),
|
||||||
|
)
|
||||||
|
|
||||||
|
return audio, audio_dir
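To make the defaults in `parse_audio` concrete, here is a small sketch that applies the same field handling to an in-memory JSON string. The `AudioDefinition` stand-in, the helper name `audio_from_json`, and the sample ids and file names are all hypothetical; it raises `ValueError` where the real code raises `ParseError`.

```python
import json
from dataclasses import dataclass


@dataclass
class AudioDefinition:
    """Stand-in mirroring the four fields set in parse_audio."""
    file: str
    volume: float = 1.0
    loop: bool = False
    ignore_pauses: bool = False


def audio_from_json(raw: str) -> dict[str, AudioDefinition]:
    """Apply the same defaults as parse_audio to an in-memory JSON string."""
    audio = {}
    for audio_id, entry in json.loads(raw).items():
        if "file" not in entry:
            raise ValueError(f"Audio '{audio_id}' missing required field 'file'")
        audio[audio_id] = AudioDefinition(
            file=entry["file"],
            volume=float(entry.get("volume", 1.0)),
            loop=bool(entry.get("loop", False)),
            ignore_pauses=bool(entry.get("ignore_pauses", False)),
        )
    return audio


# Hypothetical audio.json content; the ids and file names are made up.
raw = '{"woosh": {"file": "woosh.wav", "volume": 0.5}, "bed": {"file": "bed.mp3", "loop": true}}'
parsed = audio_from_json(raw)
print(parsed["woosh"])
print(parsed["bed"])
```

Because the flags go through `bool()`, a JSON string such as `"false"` would be treated as true; the flags should be written as JSON booleans.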
|
||||||
|
|
||||||
|
|
||||||
|
def parse_videos(
|
||||||
|
project_path: Path, config: Optional[ProjectConfig] = None
|
||||||
|
) -> tuple[dict[str, VideoSource], Path]:
|
||||||
|
"""
|
||||||
|
Parse videos.json into video source definitions.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (videos dict, videos_dir) where videos_dir is the directory
|
||||||
|
containing videos.json (for resolving relative file paths).
|
||||||
|
"""
|
||||||
|
if config and config.videos_path:
|
||||||
|
videos_path = project_path / config.videos_path
|
||||||
|
else:
|
||||||
|
videos_path = project_path / "videos.json"
|
||||||
|
|
||||||
if not videos_path.exists():
|
if not videos_path.exists():
|
||||||
raise ParseError("videos.json not found", videos_path)
|
raise ParseError(f"videos.json not found: {videos_path}", videos_path)
|
||||||
|
|
||||||
|
videos_dir = videos_path.parent
|
||||||
|
|
||||||
try:
|
try:
|
||||||
data = json.loads(videos_path.read_text(encoding="utf-8"))
|
data = json.loads(videos_path.read_text(encoding="utf-8"))
|
||||||
@@ -201,18 +315,37 @@ def parse_videos(project_path: Path) -> dict[str, VideoSource]:
|
|||||||
|
|
||||||
videos = {}
|
videos = {}
|
||||||
for video_id, video_data in data.items():
|
for video_id, video_data in data.items():
|
||||||
if "file" not in video_data:
|
if "source_file" not in video_data:
|
||||||
raise ParseError(
|
raise ParseError(
|
||||||
f"Video '{video_id}' missing required field 'file'",
|
f"Video '{video_id}' missing required field 'source_file'", videos_path
|
||||||
videos_path
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Parse attribution if present
|
||||||
|
attribution = None
|
||||||
|
if "attribution" in video_data:
|
||||||
|
attr_data = video_data["attribution"]
|
||||||
|
attribution = Attribution(
|
||||||
|
source=attr_data.get("source", "unknown"),
|
||||||
|
creator=attr_data.get("creator", "Unknown"),
|
||||||
|
url=attr_data.get("url"),
|
||||||
|
)
|
||||||
|
|
||||||
videos[video_id] = VideoSource(
|
videos[video_id] = VideoSource(
|
||||||
file=video_data["file"],
|
source_file=video_data["source_file"],
|
||||||
preprocess=video_data.get("preprocess", []),
|
filter=video_data.get("filter", []),
|
||||||
output_file=video_data.get("output_file"),
|
output_file=video_data.get("output_file"),
|
||||||
|
take=video_data.get("take"),
|
||||||
|
skip=video_data.get("skip", 0.0),
|
||||||
|
zoom=video_data.get("zoom", 1.0),
|
||||||
|
cutout=video_data.get("cutout"),
|
||||||
|
always_visible=video_data.get("always_visible", False),
|
||||||
|
is_shared=video_data.get("is_shared", False),
|
||||||
|
pause_narration=float(video_data.get("pause_narration", 0)),
|
||||||
|
attribution=attribution,
|
||||||
|
use_audio_channels=video_data.get("use_audio_channels", "both"),
|
||||||
)
|
)
|
||||||
|
|
||||||
return videos
|
return videos, videos_dir
|
||||||
|
|
||||||
|
|
||||||
def get_video_duration(video_path: Path) -> float:
|
def get_video_duration(video_path: Path) -> float:
|
||||||
@@ -221,10 +354,13 @@ def get_video_duration(video_path: Path) -> float:
|
|||||||
|
|
||||||
cmd = [
|
cmd = [
|
||||||
"ffprobe",
|
"ffprobe",
|
||||||
"-v", "error",
|
"-v",
|
||||||
"-show_entries", "format=duration",
|
"error",
|
||||||
"-of", "default=noprint_wrappers=1:nokey=1",
|
"-show_entries",
|
||||||
str(video_path)
|
"format=duration",
|
||||||
|
"-of",
|
||||||
|
"default=noprint_wrappers=1:nokey=1",
|
||||||
|
str(video_path),
|
||||||
]
|
]
|
||||||
|
|
||||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
@@ -261,7 +397,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
|
|||||||
raise ParseError(f"Invalid JSON: {e}", metadata_path)
|
raise ParseError(f"Invalid JSON: {e}", metadata_path)
|
||||||
|
|
||||||
if "source_file" not in data:
|
if "source_file" not in data:
|
||||||
raise ParseError("Video metadata missing required field 'source_file'", metadata_path)
|
raise ParseError(
|
||||||
|
"Video metadata missing required field 'source_file'", metadata_path
|
||||||
|
)
|
||||||
|
|
||||||
return VideoMetadata(
|
return VideoMetadata(
|
||||||
source_file=data["source_file"],
|
source_file=data["source_file"],
|
||||||
@@ -270,7 +408,9 @@ def parse_video_metadata(metadata_path: Path) -> VideoMetadata:
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def resolve_video_file(project_path: Path, file_ref: str) -> tuple[Path, Optional[VideoMetadata]]:
|
def resolve_video_file(
|
||||||
|
project_path: Path, file_ref: str
|
||||||
|
) -> tuple[Path, Optional[VideoMetadata]]:
|
||||||
"""
|
"""
|
||||||
Resolve a video file reference, which can be either:
|
Resolve a video file reference, which can be either:
|
||||||
1. A direct path to a video file
|
1. A direct path to a video file
|
||||||
|
|||||||
+1445
-64
File diff suppressed because it is too large
+840
-71
File diff suppressed because it is too large
+11
-11
@@ -11,6 +11,7 @@ from .errors import GnommoError
|
|||||||
@dataclass
|
@dataclass
|
||||||
class TranscribedWord:
|
class TranscribedWord:
|
||||||
"""A word with its timestamp from transcription."""
|
"""A word with its timestamp from transcription."""
|
||||||
|
|
||||||
word: str
|
word: str
|
||||||
start: float
|
start: float
|
||||||
end: float
|
end: float
|
||||||
@@ -18,6 +19,7 @@ class TranscribedWord:
|
|||||||
|
|
||||||
class TranscriptionError(GnommoError):
|
class TranscriptionError(GnommoError):
|
||||||
"""Error during transcription."""
|
"""Error during transcription."""
|
||||||
|
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
@@ -57,21 +59,20 @@ def transcribe_video(video_path: Path, model: str = "base") -> list[TranscribedW
|
|||||||
|
|
||||||
for segment in result.get("segments", []):
|
for segment in result.get("segments", []):
|
||||||
for word_info in segment.get("words", []):
|
for word_info in segment.get("words", []):
|
||||||
words.append(TranscribedWord(
|
words.append(
|
||||||
word=word_info["word"].strip(),
|
TranscribedWord(
|
||||||
start=word_info["start"],
|
word=word_info["word"].strip(),
|
||||||
end=word_info["end"],
|
start=word_info["start"],
|
||||||
))
|
end=word_info["end"],
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
return words
|
return words
|
||||||
|
|
||||||
|
|
||||||
def save_transcript(words: list[TranscribedWord], output_path: Path) -> None:
|
def save_transcript(words: list[TranscribedWord], output_path: Path) -> None:
|
||||||
"""Save transcribed words to a JSON file."""
|
"""Save transcribed words to a JSON file."""
|
||||||
data = [
|
data = [{"word": w.word, "start": w.start, "end": w.end} for w in words]
|
||||||
{"word": w.word, "start": w.start, "end": w.end}
|
|
||||||
for w in words
|
|
||||||
]
|
|
||||||
|
|
||||||
with open(output_path, "w", encoding="utf-8") as f:
|
with open(output_path, "w", encoding="utf-8") as f:
|
||||||
json.dump(data, f, indent=2)
|
json.dump(data, f, indent=2)
|
||||||
@@ -86,6 +87,5 @@ def load_transcript(transcript_path: Path) -> list[TranscribedWord]:
|
|||||||
data = json.load(f)
|
data = json.load(f)
|
||||||
|
|
||||||
return [
|
return [
|
||||||
TranscribedWord(word=w["word"], start=w["start"], end=w["end"])
|
TranscribedWord(word=w["word"], start=w["start"], end=w["end"]) for w in data
|
||||||
for w in data
|
|
||||||
]
|
]
|
||||||
|
|||||||
+929
-57
File diff suppressed because it is too large
+140
-55
@@ -3,7 +3,13 @@
|
|||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from .errors import ValidationError, ValidationIssue
|
from .errors import ValidationError, ValidationIssue
|
||||||
from .models import ProjectConfig, SlideDefinition, VideoSource, SLIDE_LAYOUTS
|
from .models import (
|
||||||
|
ProjectConfig,
|
||||||
|
SlideDefinition,
|
||||||
|
VideoSource,
|
||||||
|
SLIDE_LAYOUTS,
|
||||||
|
CAMERA_PRESETS,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def validate_project(
|
def validate_project(
|
||||||
@@ -12,6 +18,7 @@ def validate_project(
|
|||||||
config: ProjectConfig,
|
config: ProjectConfig,
|
||||||
slides: dict[str, SlideDefinition],
|
slides: dict[str, SlideDefinition],
|
||||||
videos: dict[str, VideoSource],
|
videos: dict[str, VideoSource],
|
||||||
|
videos_dir: Path,
|
||||||
malformed_markers: list[tuple[int, str]] = None,
|
malformed_markers: list[tuple[int, str]] = None,
|
||||||
) -> None:
|
) -> None:
|
||||||
"""
|
"""
|
||||||
@@ -30,19 +37,59 @@ def validate_project(
|
|||||||
# Check for malformed markers first (these are likely typos)
|
# Check for malformed markers first (these are likely typos)
|
||||||
if malformed_markers:
|
if malformed_markers:
|
||||||
for line_num, marker_text in malformed_markers:
|
for line_num, marker_text in malformed_markers:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Malformed marker: {marker_text}",
|
ValidationIssue(
|
||||||
project_path / "manuscript.txt",
|
f"Malformed marker: {marker_text}",
|
||||||
line_num
|
project_path / "manuscript.txt",
|
||||||
))
|
line_num,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check all manuscript markers have corresponding slides
|
# Check all manuscript markers have corresponding slides or videos
|
||||||
for marker in manuscript_markers:
|
for marker in manuscript_markers:
|
||||||
|
# Skip camera effect markers (Zoom0, TiltLeft, Reset, etc.)
|
||||||
|
if marker in CAMERA_PRESETS:
|
||||||
|
continue
|
||||||
|
# Skip audio markers (start with 'A' followed by audio id, e.g., Awoosh)
|
||||||
|
if marker.startswith("A") and len(marker) > 1 and marker[1:].isalnum():
|
||||||
|
continue
|
||||||
|
# Validate video trigger markers (video:xxx) - slide-like videos
|
||||||
|
if marker.startswith("video:"):
|
||||||
|
video_id = marker[6:] # Remove 'video:' prefix
|
||||||
|
if video_id not in videos:
|
||||||
|
# Check if it's a file extension mismatch
|
||||||
|
hint = ""
|
||||||
|
if "." in video_id:
|
||||||
|
base_name = video_id.rsplit(".", 1)[0]
|
||||||
|
if base_name in videos:
|
||||||
|
hint = f" (Did you mean [video:{base_name}]? Don't include file extensions in markers)"
|
||||||
|
issues.append(
|
||||||
|
ValidationIssue(
|
||||||
|
f"Video marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json{hint}",
|
||||||
|
project_path / "manuscript.txt",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Validate narration trigger markers (narration:xxx) - continuous videos
|
||||||
|
if marker.startswith("narration:"):
|
||||||
|
video_id = marker[10:] # Remove 'narration:' prefix
|
||||||
|
if video_id not in videos:
|
||||||
|
issues.append(
|
||||||
|
ValidationIssue(
|
||||||
|
f"Narration marker [{marker}] referenced in manuscript but '{video_id}' not defined in videos.json",
|
||||||
|
project_path / "manuscript.txt",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
|
||||||
if marker not in slides:
|
if marker not in slides:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
|
ValidationIssue(
|
||||||
project_path / "manuscript.txt"
|
f"Slide marker [{marker}] referenced in manuscript but not defined in slides.json",
|
||||||
))
|
project_path / "manuscript.txt",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check all slide images exist
|
# Check all slide images exist
|
||||||
# Slides are in the same directory as the slides.json file
|
# Slides are in the same directory as the slides.json file
|
||||||
@@ -52,37 +99,68 @@ def validate_project(
|
|||||||
for slide_id, slide_def in slides.items():
|
for slide_id, slide_def in slides.items():
|
||||||
image_path = slides_dir / slide_def.image
|
image_path = slides_dir / slide_def.image
|
||||||
if not image_path.exists():
|
if not image_path.exists():
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Slide image not found: {slide_def.image}",
|
ValidationIssue(
|
||||||
slides_json_path
|
f"Slide image not found: {slide_def.image}", slides_json_path
|
||||||
))
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check slide type is valid
|
# Check slide type is valid
|
||||||
if slide_def.type not in SLIDE_LAYOUTS:
|
if slide_def.type not in SLIDE_LAYOUTS:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
|
ValidationIssue(
|
||||||
f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
|
f"Unknown slide type '{slide_def.type}' for slide {slide_id}. "
|
||||||
project_path / "slides.json"
|
f"Valid types: {list(SLIDE_LAYOUTS.keys())}",
|
||||||
))
|
project_path / "slides.json",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check all video files exist (paths relative to videos_dir or shared_assets)
|
||||||
|
videos_json_path = project_path / config.videos_path
|
||||||
|
|
||||||
|
# Find shared_assets directory
|
||||||
|
shared_assets_dir = None
|
||||||
|
if (project_path / "shared_assets").exists():
|
||||||
|
shared_assets_dir = project_path / "shared_assets"
|
||||||
|
elif (project_path.parent / "shared_assets").exists():
|
||||||
|
shared_assets_dir = project_path.parent / "shared_assets"
|
||||||
|
|
||||||
# Check all video files exist
|
|
||||||
for video_id, video_source in videos.items():
|
for video_id, video_source in videos.items():
|
||||||
video_path = project_path / video_source.file
|
# Determine base directory based on is_shared flag
|
||||||
if not video_path.exists():
|
if video_source.is_shared:
|
||||||
issues.append(ValidationIssue(
|
if shared_assets_dir:
|
||||||
f"Video file not found: {video_source.file}",
|
base_dir = shared_assets_dir
|
||||||
project_path / "videos.json"
|
else:
|
||||||
))
|
issues.append(
|
||||||
|
ValidationIssue(
|
||||||
|
f"Video '{video_id}' has is_shared=true but shared_assets directory not found",
|
||||||
|
videos_json_path,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
else:
|
||||||
|
base_dir = videos_dir
|
||||||
|
|
||||||
# Check preprocessed output exists if preprocessing is defined
|
video_path = base_dir / video_source.source_file
|
||||||
if video_source.preprocess and video_source.output_file:
|
if not video_path.exists():
|
||||||
output_path = project_path / video_source.output_file
|
issues.append(
|
||||||
|
ValidationIssue(
|
||||||
|
f"Video file not found: {video_source.source_file}",
|
||||||
|
videos_json_path,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check preprocessed output exists if filters are defined
|
||||||
|
if video_source.filter and video_source.output_file:
|
||||||
|
output_path = base_dir / video_source.output_file
|
||||||
if not output_path.exists():
|
if not output_path.exists():
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Preprocessed output not found: {video_source.output_file}. "
|
ValidationIssue(
|
||||||
f"Run with -a preprocess first.",
|
f"Preprocessed output not found: {video_source.output_file}. "
|
||||||
project_path / "videos.json"
|
"Run with -a preprocess first.",
|
||||||
))
|
videos_json_path,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check background exists (image or video)
|
# Check background exists (image or video)
|
||||||
# Try 'background' first, fall back to deprecated 'background_video'
|
# Try 'background' first, fall back to deprecated 'background_video'
|
||||||
@@ -94,38 +172,45 @@ def validate_project(
|
|||||||
# Try parent directory (shared_assets at repo root)
|
# Try parent directory (shared_assets at repo root)
|
||||||
bg_path = project_path.parent / bg_file
|
bg_path = project_path.parent / bg_file
|
||||||
if not bg_path.exists():
|
if not bg_path.exists():
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Background not found: {bg_file}",
|
ValidationIssue(
|
||||||
project_path / "project.json"
|
f"Background not found: {bg_file}", project_path / "project.json"
|
||||||
))
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check we have at least one video source
|
# Check we have at least one video source
|
||||||
if not videos:
|
if not videos:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
"No video sources defined in videos.json",
|
ValidationIssue(
|
||||||
project_path / "videos.json"
|
"No video sources defined in videos.json", project_path / "videos.json"
|
||||||
))
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check resolution is reasonable
|
# Check resolution is reasonable
|
||||||
width, height = config.resolution
|
width, height = config.resolution
|
||||||
if width < 100 or height < 100:
|
if width < 100 or height < 100:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Resolution too small: {width}x{height}",
|
ValidationIssue(
|
||||||
project_path / "project.json"
|
f"Resolution too small: {width}x{height}", project_path / "project.json"
|
||||||
))
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if width > 7680 or height > 4320:
|
if width > 7680 or height > 4320:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Resolution too large: {width}x{height} (max 8K)",
|
ValidationIssue(
|
||||||
project_path / "project.json"
|
f"Resolution too large: {width}x{height} (max 8K)",
|
||||||
))
|
project_path / "project.json",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# Check FPS is reasonable
|
# Check FPS is reasonable
|
||||||
if config.fps < 1 or config.fps > 120:
|
if config.fps < 1 or config.fps > 120:
|
||||||
issues.append(ValidationIssue(
|
issues.append(
|
||||||
f"Invalid FPS: {config.fps} (must be 1-120)",
|
ValidationIssue(
|
||||||
project_path / "project.json"
|
f"Invalid FPS: {config.fps} (must be 1-120)",
|
||||||
))
|
project_path / "project.json",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
# If any issues, raise ValidationError
|
# If any issues, raise ValidationError
|
||||||
if issues:
|
if issues:
|
||||||
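The marker loop in `validate_project` checks markers in a fixed order: camera presets, audio markers, `video:` triggers, `narration:` triggers, and finally slides. The sketch below restates that order as a standalone function, assuming a hypothetical subset of `CAMERA_PRESETS` and hypothetical helper names `classify_marker` and `extension_hint`; it is not the validator itself.

```python
# Hypothetical subset; the real CAMERA_PRESETS lives in gnommo's models module.
CAMERA_PRESETS = {"Zoom0", "Zoom2", "TiltLeft", "Reset"}


def classify_marker(marker: str) -> str:
    """Return which branch of the validator loop a marker would take."""
    if marker in CAMERA_PRESETS:
        return "camera"
    if marker.startswith("A") and len(marker) > 1 and marker[1:].isalnum():
        return "audio"
    if marker.startswith("video:"):
        return "video"
    if marker.startswith("narration:"):
        return "narration"
    return "slide"


def extension_hint(video_id: str, videos: set[str]) -> str:
    """Suggest the extension-free id when [video:name.ext] matches a known video."""
    if "." in video_id:
        base_name = video_id.rsplit(".", 1)[0]
        if base_name in videos:
            return f"Did you mean [video:{base_name}]?"
    return ""


for marker in ["Zoom2", "Awoosh", "video:demo", "narration:intro", "S1", "Appendix1"]:
    print(marker, "->", classify_marker(marker))
print(extension_hint("demo.mp4", {"demo"}))
```

One consequence of the audio rule as written in the diff: any alphanumeric marker starting with `A`, such as a slide named `Appendix1`, is skipped as an audio marker and never checked against slides.json.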
|
|||||||
@@ -0,0 +1,6 @@
|
|||||||
|
import gnommo
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
print("This is the main module.")
|
||||||
|
|
||||||
|
gnommo.main()
|
||||||
@@ -0,0 +1,2 @@
|
|||||||
|
openai-whisper
|
||||||
|
|
||||||
@@ -0,0 +1,476 @@
|
|||||||
|
# Gnommo Feature Development Roadmap

## Overview

Features to standardize the Keynote-to-YouTube workflow, so that once the presentation is complete, only a standardized recording session stands between you and a finished video.

---

## 1. Video Description Generator

**Command:** `gnommo -p <project> description`

Generate a complete YouTube description with citations, attributions, and chapters.

---

### 1.1 Manuscript Citations (`[cite:...]`)

Citations embedded in the manuscript represent sources, references, or links mentioned during narration. The text after `cite:` is the **literal reference** that should appear in the description.

**Format in manuscript.txt:**
```
[cite:Reference text exactly as it should appear]
```

**Examples:**
```
[S3]
According to this study [cite:Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper],
the effect is significant.

[S7]
I'm using [cite:Keynote by Apple - https://apple.com/keynote] for all my presentations.

[S12]
This technique was pioneered by [cite:Dr. Jane Doe, MIT Media Lab].
```
|
||||||
|
|
||||||
|
**Output in description:**
|
||||||
|
```
|
||||||
|
SOURCES & REFERENCES
|
||||||
|
━━━━━━━━━━━━━━━━━━━━
|
||||||
|
1:23 - Smith et al. (2024) "Effects of AI on Productivity" - https://example.com/paper
|
||||||
|
4:56 - Keynote by Apple - https://apple.com/keynote
|
||||||
|
8:30 - Dr. Jane Doe, MIT Media Lab
|
||||||
|
```
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Parse `[cite:...]` markers from manuscript.txt
|
||||||
|
- Extract the literal text after `cite:` as the reference
|
||||||
|
- Align citations to timestamps (same fuzzy matching as other markers)
|
||||||
|
- List citations in order of appearance
|
||||||
|
- The renderer ignores citations; they are timestamped only so the description can list them
|
||||||
|
|
||||||
|
**Note:** `[cite:...]` markers should not affect video rendering or narration alignment - they are metadata-only markers for description generation.
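
The parsing requirement above can be sketched as follows. This is only an illustration under two assumptions: a reference never contains a closing bracket `]`, and the character offset is enough to hand the citation to the same alignment step used for other markers. The names `CITE_RE` and `extract_citations` are hypothetical and not part of Gnommo.

```python
import re

# Hypothetical pattern: everything between "[cite:" and the next "]".
CITE_RE = re.compile(r"\[cite:([^\]]+)\]")


def extract_citations(manuscript: str) -> list[tuple[int, str]]:
    """Return (character offset, literal reference text) in order of appearance."""
    return [(m.start(), m.group(1).strip()) for m in CITE_RE.finditer(manuscript)]


if __name__ == "__main__":
    sample = (
        "[S7]\n"
        "I'm using [cite:Keynote by Apple - https://apple.com/keynote] "
        "for all my presentations.\n"
    )
    for offset, reference in extract_citations(sample):
        print(offset, reference)
```

Because `re.finditer` scans left to right, the result is already in order of appearance, so no separate sort is needed.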
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 1.2 Pexels/Stock Footage Attribution
|
||||||
|
|
||||||
|
Attribution for Pexels content is **not legally required** but is appreciated and professional.
|
||||||
|
|
||||||
|
**Official Pexels attribution format:**
|
||||||
|
```
|
||||||
|
by [Contributor Name] via Pexels
|
||||||
|
```
|
||||||
|
|
||||||
|
**Implementation:**
|
||||||
|
- Extend `videos.json` to include attribution metadata:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"beach_waves": {
|
||||||
|
"source_file": "pexels/beach.mp4",
|
||||||
|
"is_shared": true,
|
||||||
|
"attribution": {
|
||||||
|
"source": "pexels",
|
||||||
|
"creator": "John Doe",
|
||||||
|
"url": "https://pexels.com/video/12345"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
- Auto-detect Pexels videos from `shared_assets/pexels/` folder
|
||||||
|
- Support Pexels metadata JSON files (if downloaded with video)
|
||||||
|
- Generate attribution section for video description:
|
||||||
|
```
|
||||||
|
STOCK FOOTAGE
|
||||||
|
━━━━━━━━━━━━━
|
||||||
|
Beach waves by John Doe via Pexels: https://pexels.com/video/12345
|
||||||
|
City timelapse by Jane Smith via Pexels: https://pexels.com/video/67890
|
||||||
|
```
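
A rough sketch of how the STOCK FOOTAGE lines could be produced from the extended `videos.json` shown above. It assumes the display label is derived from the entry key (`beach_waves` becomes `Beach waves`), which the spec does not state; `attribution_lines` is a hypothetical helper name.

```python
def attribution_lines(videos: dict) -> list[str]:
    """Build one attribution line per video entry that carries attribution metadata."""
    lines = []
    for key, entry in videos.items():
        attribution = entry.get("attribution")
        if not attribution:
            continue  # entries without metadata produce no line
        # Assumption: the human-readable label comes from the entry key.
        label = key.replace("_", " ").capitalize()
        source = attribution["source"].capitalize()
        lines.append(
            f"{label} by {attribution['creator']} via {source}: {attribution['url']}"
        )
    return lines


if __name__ == "__main__":
    videos = {
        "beach_waves": {
            "source_file": "pexels/beach.mp4",
            "is_shared": True,
            "attribution": {
                "source": "pexels",
                "creator": "John Doe",
                "url": "https://pexels.com/video/12345",
            },
        }
    }
    print("\n".join(attribution_lines(videos)))
```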
|
||||||
|
|
||||||
|
**Pexels License Notes** (from pexels.com/license):
|
||||||
|
- Free for personal and commercial use
|
||||||
|
- Attribution not required but appreciated
|
||||||
|
- Cannot sell unaltered copies
|
||||||
|
- Cannot redistribute on other stock platforms
|
||||||
|
|
||||||
|
### 1.3 Complete Description Output
|
||||||
|
|
||||||
|
**Output file:** `out/description_youtube.txt`
|
||||||
|
|
||||||
|
Combine all elements into a ready-to-paste YouTube description.
|
||||||
|
|
||||||
|
**Structure:**
|
||||||
|
```
|
||||||
|
[Video description from project.json "description" field]
|
||||||
|
|
||||||
|
CHAPTERS
|
||||||
|
━━━━━━━━
|
||||||
|
0:00 Introduction
|
||||||
|
1:23 Topic One
|
||||||
|
3:45 Topic Two
|
||||||
|
...
|
||||||
|
|
||||||
|
REFERENCES
|
||||||
|
━━━━━━━━━━
|
||||||
|
1:23 - Smith et al. (2024) "AI Study" - https://example.com
|
||||||
|
4:56 - Keynote by Apple - https://apple.com/keynote
|
||||||
|
...
|
||||||
|
|
||||||
|
STOCK FOOTAGE
|
||||||
|
━━━━━━━━━━━━━
|
||||||
|
Beach waves by John Doe via Pexels: https://pexels.com/video/12345
|
||||||
|
...
|
||||||
|
|
||||||
|
[Optional footer from project.json "footer" field - social links, subscribe CTA, etc.]
|
||||||
|
```
|
||||||
|
|
||||||
|
**project.json additions:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"description": "In this video, I walk through the complete Gnommo workflow for creating YouTube videos from Keynote presentations.",
|
||||||
|
"footer": "Subscribe for more tutorials: https://youtube.com/@channel\nTwitter: https://twitter.com/handle"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Pull video description from `project.json` "description" field
|
||||||
|
- Generate chapters from slide markers (see Section 2)
|
||||||
|
- Collect all `[cite:...]` references with timestamps
|
||||||
|
- Collect all Pexels/stock attributions from `videos.json`
|
||||||
|
- Append optional footer from `project.json` "footer" field
|
||||||
|
- Output to `out/description_youtube.txt`
|
||||||
|
- Sections with no content are omitted (e.g., no STOCK FOOTAGE section if none used)
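
One possible shape for the assembly step, shown as a sketch rather than the intended implementation. It assumes each section arrives as a list of pre-formatted lines and that the rule under a heading is as long as the heading, as in the structure above; `build_description` is a hypothetical name.

```python
def build_description(
    description: str,
    chapters: list[str],
    references: list[str],
    stock_footage: list[str],
    footer: str = "",
) -> str:
    """Join the non-empty parts with blank lines; empty sections are omitted."""

    def section(title: str, lines: list[str]) -> str:
        if not lines:
            return ""
        rule = "━" * len(title)
        return "\n".join([title, rule, *lines])

    parts = [
        description.strip(),
        section("CHAPTERS", chapters),
        section("REFERENCES", references),
        section("STOCK FOOTAGE", stock_footage),
        footer.strip(),
    ]
    # Filtering on truthiness drops both empty sections and a missing footer.
    return "\n\n".join(part for part in parts if part) + "\n"
```

Building every section through the same helper keeps the "omit when empty" rule in one place instead of repeating the check per section.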
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. YouTube Chapter Markers
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> chapters`
|
||||||
|
|
||||||
|
Auto-generate chapter timestamps from slide markers.
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Extract chapter titles from:
|
||||||
|
- Keynote slide titles (via presenter notes import)
|
||||||
|
- First sentence after each `[SN]` marker
|
||||||
|
- Optional `[chapter:Title]` markers for explicit chapter names
|
||||||
|
- Calculate timestamps from aligned marker timings
|
||||||
|
- Output copy-paste ready format:
|
||||||
|
```
|
||||||
|
CHAPTERS
|
||||||
|
━━━━━━━━
|
||||||
|
0:00 Introduction
|
||||||
|
1:23 What is Gnommo?
|
||||||
|
3:45 Setting Up Your Project
|
||||||
|
7:12 Recording Tips
|
||||||
|
10:30 Rendering Your Video
|
||||||
|
12:45 Outro
|
||||||
|
```
|
||||||
|
- Option to merge small chapters (minimum duration threshold)
|
||||||
|
- Support for nested chapters (main topics + subtopics)
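
The timestamp formatting and the minimum-duration merge could look roughly like this. It is a sketch under the assumption that chapters arrive as `(start_seconds, title)` pairs sorted by start time and that a too-short chapter is folded into the one before it; both function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once the video passes an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def merge_short_chapters(
    chapters: list[tuple[float, str]],
    min_duration: float,
    total_duration: float,
) -> list[tuple[float, str]]:
    """Fold any chapter shorter than min_duration into the previous chapter."""
    kept: list[tuple[float, str]] = []
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else total_duration
        if kept and end - start < min_duration:
            continue  # the previous chapter absorbs this span
        kept.append((start, title))
    return kept


if __name__ == "__main__":
    chapters = [(0, "Introduction"), (83, "What is Gnommo?"), (88, "Aside"), (225, "Setup")]
    for start, title in merge_short_chapters(chapters, 10, 300):
        print(format_timestamp(start), title)
```

The first chapter is never dropped, so the list always keeps its `0:00` entry.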
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Subtitle/Caption Export
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> subtitles`
|
||||||
|
|
||||||
|
Generate subtitle files from Whisper transcription.
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Export formats: SRT, VTT, TXT
|
||||||
|
- Use existing word-level timestamps from transcription
|
||||||
|
- Smart line breaking (max characters per line, break at punctuation)
|
||||||
|
- Speaker diarization support (future: multiple speakers)
|
||||||
|
- Options:
|
||||||
|
- `--format srt|vtt|txt`
|
||||||
|
- `--max-chars 42` (characters per line)
|
||||||
|
- `--max-duration 5` (seconds per subtitle block)
|
||||||
|
|
||||||
|
**Example output (SRT):**
|
||||||
|
```
|
||||||
|
1
|
||||||
|
00:00:01,500 --> 00:00:04,200
|
||||||
|
Hello and welcome to this tutorial
|
||||||
|
on video editing with Gnommo.
|
||||||
|
|
||||||
|
2
|
||||||
|
00:00:04,500 --> 00:00:07,800
|
||||||
|
Today we're going to cover
|
||||||
|
the complete workflow.
|
||||||
|
```
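
For reference, a small sketch of how blocks like the ones above could be serialized. It assumes line breaking has already happened and each block is a `(start_seconds, end_seconds, text)` tuple; `srt_timestamp` and `to_srt` are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(blocks: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) blocks as numbered cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(blocks, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}")
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(to_srt([
        (1.5, 4.2, "Hello and welcome to this tutorial\non video editing with Gnommo."),
        (4.5, 7.8, "Today we're going to cover\nthe complete workflow."),
    ]))
```

Working in integer milliseconds avoids carrying floating-point error into the formatted fields.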
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Thumbnail Generation
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> thumbnail`
|
||||||
|
|
||||||
|
Auto-generate thumbnail candidates from slides.
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Designate thumbnail slides with `[thumbnail]` marker
|
||||||
|
- If no marker, use slide 1 or title slide
|
||||||
|
- Apply text overlays from config:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"thumbnail": {
|
||||||
|
"title_text": "Episode ${episode_number}",
|
||||||
|
"subtitle_text": "${title}",
|
||||||
|
"font": "Impact",
|
||||||
|
"text_color": "#FFFFFF",
|
||||||
|
"outline_color": "#000000",
|
||||||
|
"position": "bottom-left"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
- Generate multiple variants:
|
||||||
|
- With/without text overlay
|
||||||
|
- Different zoom levels
|
||||||
|
- Different color treatments (saturated, high contrast)
|
||||||
|
- Output to `out/thumbnails/` folder
|
||||||
|
- Resolution: 1280x720 (YouTube standard)
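
The `${...}` placeholders in the overlay config match the syntax of Python's standard `string.Template`, so substitution could be sketched as below. Where the variable values come from is an assumption here (for example a `variables` mapping in `project.json`); `render_overlay_text` is a hypothetical name.

```python
from string import Template


def render_overlay_text(thumbnail_cfg: dict, variables: dict) -> dict:
    """Substitute ${name} placeholders in the text fields of a thumbnail config."""
    rendered = dict(thumbnail_cfg)  # shallow copy; the input config is not modified
    for key in ("title_text", "subtitle_text"):
        if key in rendered:
            # safe_substitute leaves unknown placeholders untouched instead of raising.
            rendered[key] = Template(rendered[key]).safe_substitute(variables)
    return rendered


if __name__ == "__main__":
    cfg = {"title_text": "Episode ${episode_number}", "subtitle_text": "${title}"}
    print(render_overlay_text(cfg, {"episode_number": "12", "title": "Getting Started with Gnommo"}))
```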
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Intro/Outro Templates
|
||||||
|
|
||||||
|
**Configuration in project.json:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"intro": {
|
||||||
|
"template": "templates/intro_v2.mp4",
|
||||||
|
"duration": 3.5,
|
||||||
|
"transition": "fade",
|
||||||
|
"variables": {
|
||||||
|
"episode_number": "12",
|
||||||
|
"title": "Getting Started with Gnommo"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outro": {
|
||||||
|
"template": "templates/outro_subscribe.mp4",
|
||||||
|
"duration": 8.0,
|
||||||
|
"transition": "fade"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Define intro/outro templates in `shared_assets/templates/`
|
||||||
|
- Auto-prepend intro before first slide
|
||||||
|
- Auto-append outro after last slide
|
||||||
|
- Support variable substitution in templates (episode number, title)
|
||||||
|
- Configurable transition types (fade, cut, wipe)
|
||||||
|
- End screen safe zone support (last 20 seconds)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Multi-Platform Format Presets
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> render --format <preset>`
|
||||||
|
|
||||||
|
**Presets:**
|
||||||
|
| Preset | Aspect | Resolution | Notes |
|
||||||
|
|--------|--------|------------|-------|
|
||||||
|
| `youtube` | 16:9 | 1920x1080 | Default, standard horizontal |
|
||||||
|
| `youtube-4k` | 16:9 | 3840x2160 | 4K export |
|
||||||
|
| `shorts` | 9:16 | 1080x1920 | Vertical, auto-reframe slides |
|
||||||
|
| `podcast` | - | Audio only | MP3/M4A export for podcast feeds |
|
||||||
|
| `square` | 1:1 | 1080x1080 | Instagram/LinkedIn |
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Auto-adjust cutout positions per format
|
||||||
|
- Smart slide reframing for vertical (zoom to content area)
|
||||||
|
- Separate output folders per format
|
||||||
|
- Batch export to multiple formats: `--format youtube,shorts,podcast`
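
A sketch of how the comma-separated `--format` value might be validated against the preset table. The `PRESETS` mapping simply restates the table above; `parse_formats` is a hypothetical helper, and rejecting unknown names with `ValueError` is an assumption about the desired behavior.

```python
# Resolution per preset, restating the table above; None marks audio-only output.
PRESETS = {
    "youtube": (1920, 1080),
    "youtube-4k": (3840, 2160),
    "shorts": (1080, 1920),
    "podcast": None,
    "square": (1080, 1080),
}


def parse_formats(value: str) -> list[str]:
    """Split a comma-separated preset list, rejecting unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in PRESETS]
    if unknown:
        raise ValueError(f"Unknown format preset(s): {', '.join(unknown)}")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(names))


if __name__ == "__main__":
    print(parse_formats("youtube,shorts,podcast"))
```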
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Teleprompter Script Generation
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> teleprompter`
|
||||||
|
|
||||||
|
Extract clean narration text for teleprompter display.
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Strip all markers from manuscript
|
||||||
|
- Keep only spoken text
|
||||||
|
- Output formats:
|
||||||
|
- `--format txt` - Plain text
|
||||||
|
- `--format html` - Scrollable HTML page with large font
|
||||||
|
- `--format json` - For teleprompter apps
|
||||||
|
- Optional: Include slide thumbnails as visual cues
|
||||||
|
- Configurable font size and scroll speed hints
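
Stripping markers for the plain-text format might be sketched like this. It assumes every marker is enclosed in square brackets (as with `[S3]`, `[cite:...]`, and `[chapter:Title]`) and that the narration itself never uses square brackets; `spoken_text` is a hypothetical name.

```python
import re

# Assumption: anything in square brackets is a marker, never spoken text.
MARKER_RE = re.compile(r"\[[^\]]*\]")


def spoken_text(manuscript: str) -> str:
    """Return the manuscript with all bracketed markers removed."""
    lines = []
    for line in manuscript.splitlines():
        text = MARKER_RE.sub("", line)
        text = re.sub(r"\s+([,.;:!?])", r"\1", text)  # no space left before punctuation
        text = re.sub(r"\s{2,}", " ", text).strip()   # collapse gaps left by removed markers
        if text:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(spoken_text("[S3]\nAccording to this study [cite:Smith et al. (2024)],\nthe effect is significant.\n"))
```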
|
||||||
|
|
||||||
|
**Example HTML output:**
|
||||||
|
```html
|
||||||
|
<div class="teleprompter">
|
||||||
|
<p class="cue">[SLIDE: Introduction]</p>
|
||||||
|
<p>Hello and welcome to this tutorial on video editing with Gnommo.</p>
|
||||||
|
<p class="cue">[SLIDE: What is Gnommo?]</p>
|
||||||
|
<p>Gnommo is a code-first video editing pipeline...</p>
|
||||||
|
</div>
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. Recording Checklist Generator
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> checklist`
|
||||||
|
|
||||||
|
Generate a pre-recording checklist based on project configuration.
|
||||||
|
|
||||||
|
**Output includes:**
|
||||||
|
- [ ] Camera settings (resolution, fps from project.json)
|
||||||
|
- [ ] Lighting setup (if green screen detected in videos.json)
|
||||||
|
- [ ] Audio check (microphone levels)
|
||||||
|
- [ ] Props/demos needed (parsed from `[video:...]` markers)
|
||||||
|
- [ ] Slide count and estimated duration
|
||||||
|
- [ ] Teleprompter ready
|
||||||
|
- [ ] Recording space clear
|
||||||
|
|
||||||
|
**Customizable via `checklist_template.md` in project folder.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. Audio Normalization
|
||||||
|
|
||||||
|
**Runs automatically during render, or as a standalone command:**
|
||||||
|
`gnommo -p <project> normalize`
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Target: -14 LUFS (YouTube standard)
|
||||||
|
- Apply loudness normalization to narration track
|
||||||
|
- Preserve dynamic range (avoid over-compression)
|
||||||
|
- Normalize intro/outro audio to match
|
||||||
|
- Option: `--target-lufs -14`
|
||||||
|
|
||||||
|
**Implementation:**
|
||||||
|
- Use FFmpeg `loudnorm` filter
|
||||||
|
- Two-pass normalization for accurate results
|
||||||
|
- Report before/after levels
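
The two-pass approach could be sketched as two FFmpeg invocations, built here as argument lists without running anything. The first pass prints measured values as JSON; the second feeds them back into `loudnorm`. The true-peak limit (`TP=-1.5`) and loudness range (`LRA=11`) are assumed values not given in this roadmap, and both function names are hypothetical.

```python
def loudnorm_pass1_args(src: str, target_lufs: float = -14.0) -> list[str]:
    """Analysis pass: measure loudness, print JSON stats, write no output file."""
    # TP and LRA are assumed values; only the integrated target comes from the spec.
    filt = f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", filt, "-f", "null", "-"]


def loudnorm_pass2_args(
    src: str, dst: str, measured: dict, target_lufs: float = -14.0
) -> list[str]:
    """Correction pass: reuse the values measured in pass one."""
    filt = (
        f"loudnorm=I={target_lufs}:TP=-1.5:LRA=11"
        f":measured_I={measured['input_i']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true"
    )
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", filt, dst]


if __name__ == "__main__":
    print(" ".join(loudnorm_pass1_args("narration.wav")))
```

The `measured` dictionary is expected to hold the keys that the first pass reports in its JSON block (`input_i`, `input_tp`, `input_lra`, `input_thresh`, `target_offset`); those same `input_*` values can also serve as the "before" levels in the report.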
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 10. Project Templates
|
||||||
|
|
||||||
|
**Command:** `gnommo init <project-name> --template <template>`
|
||||||
|
|
||||||
|
**Built-in templates:**
|
||||||
|
| Template | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| `tutorial` | Talking head + slides, square slide layout |
|
||||||
|
| `explainer` | Full-screen slides, minimal presenter |
|
||||||
|
| `review` | Product review format, multiple camera angles |
|
||||||
|
| `talking-head` | Full-screen presenter, no slides |
|
||||||
|
| `screencast` | Screen recording with small presenter PIP |
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Templates stored in `~/.gnommo/templates/` or `shared_assets/templates/`
|
||||||
|
- Each template includes:
|
||||||
|
- `project.json` with preset cutouts and settings
|
||||||
|
- `manuscript.txt` skeleton with example markers
|
||||||
|
- Sample `videos.json` structure
|
||||||
|
- User can create custom templates: `gnommo template save <name>`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 11. Batch Processing
|
||||||
|
|
||||||
|
**Command:** `gnommo batch render project1 project2 project3`
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Process multiple projects in sequence
|
||||||
|
- Continue on failure (don't stop batch for one failed project)
|
||||||
|
- Summary report at end:
|
||||||
|
```
|
||||||
|
BATCH COMPLETE
|
||||||
|
━━━━━━━━━━━━━━
|
||||||
|
✓ project1 - rendered in 5:23
|
||||||
|
✓ project2 - rendered in 4:17
|
||||||
|
✗ project3 - failed (missing slide S12)
|
||||||
|
```
|
||||||
|
- Options:
|
||||||
|
- `--parallel 2` - Run N renders in parallel
|
||||||
|
- `--skip-existing` - Skip if `out/final.mp4` exists
|
||||||
|
- `--format youtube,shorts` - Render all formats for each project
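
The continue-on-failure behavior and the summary report might be sketched as follows. The `render` callable stands in for whatever function renders a single project and is purely hypothetical, as are `run_batch` and `summary`; timing and the `--parallel` option are left out.

```python
from typing import Callable, Optional


def run_batch(
    projects: list[str], render: Callable[[str], None]
) -> list[tuple[str, Optional[str]]]:
    """Render each project in sequence; record an error message instead of stopping."""
    results: list[tuple[str, Optional[str]]] = []
    for name in projects:
        try:
            render(name)
            results.append((name, None))
        except Exception as exc:  # one failed project must not stop the batch
            results.append((name, str(exc)))
    return results


def summary(results: list[tuple[str, Optional[str]]]) -> str:
    """Format results in the BATCH COMPLETE layout shown above (without timings)."""
    title = "BATCH COMPLETE"
    lines = [title, "━" * len(title)]
    for name, error in results:
        if error is None:
            lines.append(f"✓ {name} - rendered")
        else:
            lines.append(f"✗ {name} - failed ({error})")
    return "\n".join(lines)
```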
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 12. Progress Dashboard
|
||||||
|
|
||||||
|
**Command:** `gnommo status` or `gnommo -p <project> status`
|
||||||
|
|
||||||
|
Display pipeline status for all projects or specific project.
|
||||||
|
|
||||||
|
**Output:**
|
||||||
|
```
|
||||||
|
PROJECT STATUS
|
||||||
|
━━━━━━━━━━━━━━
|
||||||
|
Project Import Preprocess Transcribe Render Output
|
||||||
|
─────────────────────────────────────────────────────────────
|
||||||
|
video1 ✓ ✓ ✓ ✓ final.mp4 (12:34)
|
||||||
|
video2 ✓ ✓ ✓ ✗ -
|
||||||
|
video3 ✓ ✗ - - -
|
||||||
|
video4 ✗ - - - -
|
||||||
|
```
|
||||||
|
|
||||||
|
**Requirements:**
|
||||||
|
- Scan all project directories
|
||||||
|
- Check for existence of intermediate files
|
||||||
|
- Show file timestamps and durations
|
||||||
|
- Highlight what needs to be done next
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 13. Recording Session Mode (Future)
|
||||||
|
|
||||||
|
**Command:** `gnommo -p <project> session`
|
||||||
|
|
||||||
|
Live recording assistant mode.
|
||||||
|
|
||||||
|
**Features:**
|
||||||
|
- Display current slide on secondary monitor
|
||||||
|
- Show teleprompter text overlay
|
||||||
|
- Keyboard shortcuts to advance slides
|
||||||
|
- Real-time recording with proper settings
|
||||||
|
- Auto-stop at end of manuscript
|
||||||
|
- Voice command support: "next slide", "pause"
|
||||||
|
|
||||||
|
**Note:** This is a stretch goal requiring significant UI work.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Priority
|
||||||
|
|
||||||
|
### Phase 1 - Core YouTube Workflow (High Impact)
|
||||||
|
1. **Video Description Generator** (citations + Pexels attribution)
|
||||||
|
2. **YouTube Chapter Markers**
|
||||||
|
3. **Subtitle/Caption Export**
|
||||||
|
4. **Audio Normalization**
|
||||||
|
|
||||||
|
### Phase 2 - Content Creation Efficiency
|
||||||
|
5. **Thumbnail Generation**
|
||||||
|
6. **Intro/Outro Templates**
|
||||||
|
7. **Teleprompter Script Generation**
|
||||||
|
8. **Recording Checklist Generator**
|
||||||
|
|
||||||
|
### Phase 3 - Scale & Automation
|
||||||
|
9. **Project Templates**
|
||||||
|
10. **Multi-Platform Format Presets**
|
||||||
|
11. **Batch Processing**
|
||||||
|
12. **Progress Dashboard**
|
||||||
|
|
||||||
|
### Phase 4 - Advanced
|
||||||
|
13. **Recording Session Mode**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
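- Project-scoped commands should follow the existing CLI pattern: `gnommo -p <project> <command>`; commands that are not tied to one existing project (`init`, `batch`, `status`) omit `-p`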
|
||||||
|
- Output files go to `out/` subdirectory by default
|
||||||
|
- All features should support `--dry-run` where applicable
|
||||||
|
- Verbose mode (`-v`) should show detailed progress
|
||||||