Adding -insecure flag to hermes dashboard

2026-04-25 11:40:33 +02:00
parent 3628c89481
commit f6d701b125
485 changed files with 167500 additions and 5 deletions
@@ -0,0 +1,3 @@
---
description: Skills for working with media content — YouTube transcripts, GIF search, music generation, and audio visualization.
---
@@ -0,0 +1,86 @@
---
name: gif-search
description: Search and download GIFs from Tenor using curl. No dependencies beyond curl and jq. Useful for finding reaction GIFs, creating visual content, and sending GIFs in chat.
version: 1.1.0
author: Hermes Agent
license: MIT
prerequisites:
  env_vars: [TENOR_API_KEY]
  commands: [curl, jq]
metadata:
  hermes:
    tags: [GIF, Media, Search, Tenor, API]
---
# GIF Search (Tenor API)
Search and download GIFs directly via the Tenor API using curl, with jq for parsing the JSON responses. No other tools are needed.
## Setup
Set your Tenor API key in your environment (add to `~/.hermes/.env`):
```bash
TENOR_API_KEY=your_key_here
```
Get a free API key at https://developers.google.com/tenor/guides/quickstart — the Google Cloud Console Tenor API key is free and has generous rate limits.
## Prerequisites
- `curl` and `jq` (`curl` ships with macOS and most Linux distributions; `jq` may need to be installed through your package manager)
- `TENOR_API_KEY` environment variable
## Search for GIFs
```bash
# Search and get GIF URLs
curl -s "https://tenor.googleapis.com/v2/search?q=thumbs+up&limit=5&key=${TENOR_API_KEY}" | jq -r '.results[].media_formats.gif.url'
# Get smaller/preview versions
curl -s "https://tenor.googleapis.com/v2/search?q=nice+work&limit=3&key=${TENOR_API_KEY}" | jq -r '.results[].media_formats.tinygif.url'
```
## Download a GIF
```bash
# Search and download the top result
URL=$(curl -s "https://tenor.googleapis.com/v2/search?q=celebration&limit=1&key=${TENOR_API_KEY}" | jq -r '.results[0].media_formats.gif.url')
curl -sL "$URL" -o celebration.gif
```
## Get Full Metadata
```bash
curl -s "https://tenor.googleapis.com/v2/search?q=cat&limit=3&key=${TENOR_API_KEY}" | jq '.results[] | {title: .title, url: .media_formats.gif.url, preview: .media_formats.tinygif.url, dimensions: .media_formats.gif.dims}'
```
## API Parameters
| Parameter | Description |
|-----------|-------------|
| `q` | Search query (URL-encode spaces as `+`) |
| `limit` | Max results (1-50, default 20) |
| `key` | API key (from `$TENOR_API_KEY` env var) |
| `media_filter` | Filter formats: `gif`, `tinygif`, `mp4`, `tinymp4`, `webm` |
| `contentfilter` | Safety: `off`, `low`, `medium`, `high` |
| `locale` | Language: `en_US`, `es`, `fr`, etc. |
## Available Media Formats
Each result has multiple formats under `.media_formats`:
| Format | Use case |
|--------|----------|
| `gif` | Full quality GIF |
| `tinygif` | Small preview GIF |
| `mp4` | Video version (smaller file size) |
| `tinymp4` | Small preview video |
| `webm` | WebM video |
| `nanogif` | Tiny thumbnail |
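As a rough sketch of how the formats above might be consumed from Python instead of jq, the helper below returns the first available format from a single result. The name `pick_media_url` and the sample payload are illustrative assumptions; only the `.media_formats.<format>.url` layout is taken from the examples in this skill.

```python
from typing import Optional, Sequence


def pick_media_url(result: dict, preferred: Sequence[str] = ("tinygif", "gif")) -> Optional[str]:
    """Return the URL of the first preferred format present in one search result."""
    formats = result.get("media_formats", {})
    for name in preferred:
        url = formats.get(name, {}).get("url")
        if url:
            return url
    return None


# Hypothetical, hand-written result that mirrors the documented layout.
sample = {
    "media_formats": {
        "gif": {"url": "https://example.com/full.gif"},
        "nanogif": {"url": "https://example.com/nano.gif"},
    }
}
print(pick_media_url(sample))
```

Listing `tinygif` first keeps chat messages light, while falling back to `gif` avoids returning nothing when a smaller format is missing.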
## Notes
- URL-encode the query: spaces as `+`, special chars as `%XX`
- For sending in chat, `tinygif` URLs are lighter weight
- GIF URLs can be used directly in markdown: `![alt](url)`
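
The encoding rule in the first note can also be delegated to a library. The following is a minimal sketch that assumes Python is available; `build_search_url` is a hypothetical helper name, and the endpoint and parameter names are the ones used in the curl examples above.

```python
from urllib.parse import urlencode

TENOR_SEARCH_ENDPOINT = "https://tenor.googleapis.com/v2/search"


def build_search_url(query: str, api_key: str, limit: int = 5) -> str:
    """Build a search URL; urlencode turns spaces into '+' and percent-escapes the rest."""
    params = {"q": query, "limit": limit, "key": api_key}
    return f"{TENOR_SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("nice work & more", "your_key_here", limit=3))
```

The resulting string can be passed to `curl -s` exactly like the hand-written URLs above.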
@@ -0,0 +1,170 @@
---
name: heartmula
description: Set up and run HeartMuLa, the open-source music generation model family (Suno-like). Generates full songs from lyrics + tags with multilingual support.
version: 1.0.0
metadata:
  hermes:
    tags: [music, audio, generation, ai, heartmula, heartcodec, lyrics, songs]
    related_skills: [audiocraft]
---
# HeartMuLa - Open-Source Music Generation
## Overview
HeartMuLa is a family of open-source music foundation models (Apache-2.0) that generate music conditioned on lyrics and tags, comparable to Suno but open source. It includes:
- **HeartMuLa** - Music language model (3B/7B) for generation from lyrics + tags
- **HeartCodec** - 12.5Hz music codec for high-fidelity audio reconstruction
- **HeartTranscriptor** - Whisper-based lyrics transcription
- **HeartCLAP** - Audio-text alignment model
## When to Use
- User wants to generate music/songs from text descriptions
- User wants an open-source Suno alternative
- User wants local/offline music generation
- User asks about HeartMuLa, heartlib, or AI music generation
## Hardware Requirements
- **Minimum**: 8GB VRAM with `--lazy_load true` (loads/unloads models sequentially)
- **Recommended**: 16GB+ VRAM for comfortable single-GPU usage
- **Multi-GPU**: Use `--mula_device cuda:0 --codec_device cuda:1` to split across GPUs
- 3B model with lazy_load peaks at ~6.2GB VRAM
## Installation Steps
### 1. Clone Repository
```bash
cd ~/ # or desired directory
git clone https://github.com/HeartMuLa/heartlib.git
cd heartlib
```
### 2. Create Virtual Environment (Python 3.10 required)
```bash
uv venv --python 3.10 .venv
. .venv/bin/activate
uv pip install -e .
```
### 3. Fix Dependency Compatibility Issues
**IMPORTANT**: As of Feb 2026, the pinned dependencies have conflicts with newer packages. Apply these fixes:
```bash
# Upgrade datasets (old version incompatible with current pyarrow)
uv pip install --upgrade datasets
# Upgrade transformers (needed for huggingface-hub 1.x compatibility)
uv pip install --upgrade transformers
```
### 4. Patch Source Code (Required for transformers 5.x)
**Patch 1 - RoPE cache fix** in `src/heartlib/heartmula/modeling_heartmula.py`:
In the `setup_caches` method of the `HeartMuLa` class, add RoPE reinitialization after the `reset_caches` try/except block and before the `with device:` block:
```python
# Re-initialize RoPE caches that were skipped during meta-device loading
from torchtune.models.llama3_1._position_embeddings import Llama3ScaledRoPE
for module in self.modules():
    if isinstance(module, Llama3ScaledRoPE) and not module.is_cache_built:
        module.rope_init()
        module.to(device)
```
**Why**: `from_pretrained` first creates the model on the meta device; `Llama3ScaledRoPE.rope_init()` skips cache building on meta tensors, and the cache is never rebuilt after the weights are loaded onto the real device.
**Patch 2 - HeartCodec loading fix** in `src/heartlib/pipelines/music_generation.py`:
Add `ignore_mismatched_sizes=True` to ALL `HeartCodec.from_pretrained()` calls (there are 2: the eager load in `__init__` and the lazy load in the `codec` property).
**Why**: The VQ codebook `initted` buffers have shape `[1]` in the checkpoint but `[]` in the model. The data is the same; only the shape differs (a one-element 1-D tensor vs a 0-d scalar tensor), so the mismatch is safe to ignore.
### 5. Download Model Checkpoints
```bash
cd heartlib # project root
hf download --local-dir './ckpt' 'HeartMuLa/HeartMuLaGen'
hf download --local-dir './ckpt/HeartMuLa-oss-3B' 'HeartMuLa/HeartMuLa-oss-3B-happy-new-year'
hf download --local-dir './ckpt/HeartCodec-oss' 'HeartMuLa/HeartCodec-oss-20260123'
```
All 3 can be downloaded in parallel. Total size is several GB.
## GPU / CUDA
HeartMuLa uses CUDA by default (`--mula_device cuda --codec_device cuda`). No extra setup needed if the user has an NVIDIA GPU with PyTorch CUDA support installed.
- The installed `torch==2.4.1` includes CUDA 12.1 support out of the box
- `torchtune` may report version `0.4.0+cpu` — this is just package metadata, it still uses CUDA via PyTorch
- To verify GPU is being used, look for "CUDA memory" lines in the output (e.g. "CUDA memory before unloading: 6.20 GB")
- **No GPU?** You can run on CPU with `--mula_device cpu --codec_device cpu`, but expect generation to be **extremely slow** (potentially 30-60+ minutes for a single song vs ~4 minutes on GPU). CPU mode also requires significant RAM (~12GB+ free). If the user has no NVIDIA GPU, recommend using a cloud GPU service (Google Colab free tier with T4, Lambda Labs, etc.) or the online demo at https://heartmula.github.io/ instead.
## Usage
### Basic Generation
```bash
cd heartlib
. .venv/bin/activate
python ./examples/run_music_generation.py \
  --model_path=./ckpt \
  --version="3B" \
  --lyrics="./assets/lyrics.txt" \
  --tags="./assets/tags.txt" \
  --save_path="./assets/output.mp3" \
  --lazy_load true
```
### Input Formatting
**Tags** (comma-separated, no spaces):
```
piano,happy,wedding,synthesizer,romantic
```
or
```
rock,energetic,guitar,drums,male-vocal
```
**Lyrics** (use bracketed structural tags):
```
[Intro]
[Verse]
Your lyrics here...
[Chorus]
Chorus lyrics...
[Bridge]
Bridge lyrics...
[Outro]
```
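
The two input formats above can be produced programmatically. This is a small sketch, not part of heartlib: `format_tags` and `format_lyrics` are hypothetical helpers that only apply the rules stated here (comma-separated tags without spaces, bracketed section markers).

```python
from typing import Iterable, List, Tuple


def format_tags(tags: Iterable[str]) -> str:
    """Join tags with commas and no surrounding spaces, dropping empty entries."""
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    return ",".join(cleaned)


def format_lyrics(sections: List[Tuple[str, str]]) -> str:
    """Render (section name, text) pairs using bracketed structural tags."""
    lines = []
    for name, text in sections:
        lines.append(f"[{name}]")
        if text:  # sections such as Intro or Outro may have no lyrics
            lines.append(text)
    return "\n".join(lines)


print(format_tags(["piano", " happy", "wedding "]))
print(format_lyrics([("Intro", ""), ("Verse", "Your lyrics here..."), ("Outro", "")]))
```

The returned strings can be written to the files passed via `--tags` and `--lyrics`.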
### Key Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--max_audio_length_ms` | 240000 | Max length in ms (240s = 4 min) |
| `--topk` | 50 | Top-k sampling |
| `--temperature` | 1.0 | Sampling temperature |
| `--cfg_scale` | 1.5 | Classifier-free guidance scale |
| `--lazy_load` | false | Load/unload models on demand (saves VRAM) |
| `--mula_dtype` | bfloat16 | Dtype for HeartMuLa (bf16 recommended) |
| `--codec_dtype` | float32 | Dtype for HeartCodec (fp32 recommended for quality) |
### Performance
- RTF (Real-Time Factor) ≈ 1.0 — a 4-minute song takes ~4 minutes to generate
- Output: MP3, 48kHz stereo, 128kbps
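
For planning purposes, the real-time factor translates into a simple estimate: generation time is roughly RTF multiplied by the audio length. The sketch below assumes the RTF of about 1.0 quoted above; the actual value depends on hardware and settings, and `estimate_generation_seconds` is an illustrative name.

```python
def estimate_generation_seconds(audio_length_ms: int, rtf: float = 1.0) -> float:
    """Estimate wall-clock generation time in seconds from audio length and RTF."""
    return rtf * audio_length_ms / 1000.0


# Default --max_audio_length_ms of 240000 at RTF 1.0
print(estimate_generation_seconds(240_000))
```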
## Pitfalls
1. **Do NOT use bf16 for HeartCodec** — degrades audio quality. Use fp32 (default).
2. **Tags may be ignored** — known issue (#90). Lyrics tend to dominate; experiment with tag ordering.
3. **Triton not available on macOS** — Linux/CUDA only for GPU acceleration.
4. **RTX 5080 incompatibility** reported in upstream issues.
5. The dependency pin conflicts require the manual upgrades and patches described above.
## Links
- Repo: https://github.com/HeartMuLa/heartlib
- Models: https://huggingface.co/HeartMuLa
- Paper: https://arxiv.org/abs/2601.10547
- License: Apache-2.0
@@ -0,0 +1,82 @@
---
name: songsee
description: Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.
version: 1.0.0
author: community
license: MIT
metadata:
  hermes:
    tags: [Audio, Visualization, Spectrogram, Music, Analysis]
    homepage: https://github.com/steipete/songsee
prerequisites:
  commands: [songsee]
---
# songsee
Generate spectrograms and multi-panel audio feature visualizations from audio files.
## Prerequisites
Requires [Go](https://go.dev/doc/install):
```bash
go install github.com/steipete/songsee/cmd/songsee@latest
```
Optional: `ffmpeg` for formats beyond WAV/MP3.
## Quick Start
```bash
# Basic spectrogram
songsee track.mp3
# Save to specific file
songsee track.mp3 -o spectrogram.png
# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
# From stdin
cat track.mp3 | songsee - --format png -o out.png
```
## Visualization Types
Use `--viz` with comma-separated values:
| Type | Description |
|------|-------------|
| `spectrogram` | Standard frequency spectrogram |
| `mel` | Mel-scaled spectrogram |
| `chroma` | Pitch class distribution |
| `hpss` | Harmonic/percussive separation |
| `selfsim` | Self-similarity matrix |
| `loudness` | Loudness over time |
| `tempogram` | Tempo estimation |
| `mfcc` | Mel-frequency cepstral coefficients |
| `flux` | Spectral flux (onset detection) |
Multiple `--viz` types render as a grid in a single image.
## Common Flags
| Flag | Description |
|------|-------------|
| `--viz` | Visualization types (comma-separated) |
| `--style` | Color palette: `classic`, `magma`, `inferno`, `viridis`, `gray` |
| `--width` / `--height` | Output image dimensions |
| `--window` / `--hop` | FFT window and hop size |
| `--min-freq` / `--max-freq` | Frequency range filter |
| `--start` / `--duration` | Time slice of the audio |
| `--format` | Output format: `jpg` or `png` |
| `-o` | Output file path |
## Notes
- WAV and MP3 are decoded natively; other formats require `ffmpeg`
- Output images can be inspected with `vision_analyze` for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines
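
When songsee is driven from a script, it can help to assemble the argument list in one place. The sketch below is a hypothetical wrapper that uses only flags documented above (`-o`, `--viz`, `--start`, `--duration`); it builds the command without running it.

```python
from typing import List, Optional, Sequence


def build_songsee_command(
    audio_path: str,
    output_path: str,
    viz: Sequence[str] = (),
    start: Optional[float] = None,
    duration: Optional[float] = None,
) -> List[str]:
    """Assemble a songsee argument list from the documented flags."""
    cmd = ["songsee", audio_path, "-o", output_path]
    if viz:
        cmd += ["--viz", ",".join(viz)]
    if start is not None:
        cmd += ["--start", str(start)]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    return cmd


print(build_songsee_command("track.mp3", "slice.png", viz=["mel", "chroma"], start=12.5, duration=8))
```

The list can be handed to `subprocess.run(cmd, check=True)` once `songsee` is on the `PATH`.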
@@ -0,0 +1,134 @@
---
name: spotify
description: Control Spotify — play music, search the catalog, manage playlists and library, inspect devices and playback state. Loads when the user asks to play/pause/queue music, search tracks/albums/artists, manage playlists, or check what's playing. Assumes the Hermes Spotify toolset is enabled and `hermes auth spotify` has been run.
version: 1.0.0
author: Hermes Agent
license: MIT
prerequisites:
  tools: [spotify_playback, spotify_devices, spotify_queue, spotify_search, spotify_playlists, spotify_albums, spotify_library]
metadata:
  hermes:
    tags: [spotify, music, playback, playlists, media]
    related_skills: [gif-search]
---
# Spotify
Control the user's Spotify account via the Hermes Spotify toolset (7 tools). Setup guide: https://hermes-agent.nousresearch.com/docs/user-guide/features/spotify
## When to use this skill
The user says something like "play X", "pause", "skip", "queue up X", "what's playing", "search for X", "add to my X playlist", "make a playlist", "save this to my library", etc.
## The 7 tools
- `spotify_playback` — play, pause, next, previous, seek, set_repeat, set_shuffle, set_volume, get_state, get_currently_playing, recently_played
- `spotify_devices` — list, transfer
- `spotify_queue` — get, add
- `spotify_search` — search the catalog
- `spotify_playlists` — list, get, create, add_items, remove_items, update_details
- `spotify_albums` — get, tracks
- `spotify_library` — list/save/remove with `kind: "tracks"|"albums"`
Playback-mutating actions require Spotify Premium; search/library/playlist ops work on Free.
## Canonical patterns (minimize tool calls)
### "Play <artist/track/album>"
One search, then play by URI. Do NOT loop through search results describing them unless the user asked for options.
```
spotify_search({"query": "miles davis kind of blue", "types": ["album"], "limit": 1})
→ got album URI spotify:album:1weenld61qoidwYuZ1GESA
spotify_playback({"action": "play", "context_uri": "spotify:album:1weenld61qoidwYuZ1GESA"})
```
For "play some <artist>" (no specific song), prefer `types: ["artist"]` and play the artist context URI — Spotify handles smart shuffle. If the user says "the song" or "that track", search `types: ["track"]` and pass `uris: [track_uri]` to play.
### "What's playing?" / "What am I listening to?"
Single call — don't chain get_state after get_currently_playing.
```
spotify_playback({"action": "get_currently_playing"})
```
If it returns 204/empty (`is_playing: false`), tell the user nothing is playing. Don't retry.
### "Pause" / "Skip" / "Volume 50"
Direct action, no preflight inspection needed.
```
spotify_playback({"action": "pause"})
spotify_playback({"action": "next"})
spotify_playback({"action": "set_volume", "volume_percent": 50})
```
### "Add to my <playlist name> playlist"
1. `spotify_playlists list` to find the playlist ID by name
2. Get the track URI (from currently playing, or search)
3. `spotify_playlists add_items` with the playlist_id and URIs
```
spotify_playlists({"action": "list"})
→ found "Late Night Jazz" = 37i9dQZF1DX4wta20PHgwo
spotify_playback({"action": "get_currently_playing"})
→ current track uri = spotify:track:0DiWol3AO6WpXZgp0goxAV
spotify_playlists({"action": "add_items",
                   "playlist_id": "37i9dQZF1DX4wta20PHgwo",
                   "uris": ["spotify:track:0DiWol3AO6WpXZgp0goxAV"]})
```
### "Create a playlist called X and add the last 3 songs I played"
```
spotify_playback({"action": "recently_played", "limit": 3})
spotify_playlists({"action": "create", "name": "Focus 2026"})
→ got playlist_id back in response
spotify_playlists({"action": "add_items", "playlist_id": <id>, "uris": [<3 uris>]})
```
### "Save / unsave / is this saved?"
Use `spotify_library` with the right `kind`.
```
spotify_library({"kind": "tracks", "action": "save", "uris": ["spotify:track:..."]})
spotify_library({"kind": "albums", "action": "list", "limit": 50})
```
### "Transfer playback to my <device>"
```
spotify_devices({"action": "list"})
→ pick the device_id by matching name/type
spotify_devices({"action": "transfer", "device_id": "<id>", "play": true})
```
## Critical failure modes
**`403 Forbidden — No active device found`** on any playback action means Spotify isn't running anywhere. Tell the user: "Open Spotify on your phone/desktop/web player first, start any track for a second, then retry." Don't retry the tool call blindly — it will fail the same way. You can call `spotify_devices list` to confirm; an empty list means no active device.
**`403 Forbidden — Premium required`** means the user is on Free and tried to mutate playback. Don't retry; tell them this action needs Premium. Reads still work (search, playlists, library, get_state).
**`204 No Content` on `get_currently_playing`** is NOT an error — it means nothing is playing. The tool returns `is_playing: false`. Just report that to the user.
**`429 Too Many Requests`** = rate limit. Wait and retry once. If it keeps happening, you're looping — stop.
**`401 Unauthorized` after a retry** — refresh token revoked. Tell the user to run `hermes auth spotify` again.
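
The rules above amount to a small decision table. The sketch below is one possible encoding, not the toolset's behavior: `next_step` and its return labels are hypothetical, and treating a first `401` and any unlisted status as shown is an assumption.

```python
def next_step(status: int, retries_so_far: int = 0) -> str:
    """Map a response status to 'report_idle', 'ask_user', 'retry', or 'stop'."""
    if status == 204:
        return "report_idle"  # nothing is playing; not an error
    if status == 403:
        return "ask_user"     # no active device or Premium required
    if status == 429:
        return "retry" if retries_so_far == 0 else "stop"
    if status == 401:
        # Assumption: the first 401 is retried; a repeat means re-authentication is needed.
        return "ask_user" if retries_so_far > 0 else "retry"
    return "stop"             # assumption: unknown failures are not retried


print(next_step(429), next_step(429, retries_so_far=1))
```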
## URI and ID formats
Spotify uses three interchangeable ID formats. The tools accept all three and normalize:
- URI: `spotify:track:0DiWol3AO6WpXZgp0goxAV` (preferred)
- URL: `https://open.spotify.com/track/0DiWol3AO6WpXZgp0goxAV`
- Bare ID: `0DiWol3AO6WpXZgp0goxAV`
When in doubt, use full URIs. Search results return URIs in the `uri` field — pass those directly.
Entity types: `track`, `album`, `artist`, `playlist`, `show`, `episode`. Use the right type for the action — `spotify_playback.play` with a `context_uri` expects album/playlist/artist; `uris` expects an array of track URIs.
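
To illustrate how the three formats relate, here is a minimal normalization sketch. It is not the toolset's implementation: `to_spotify_uri` is a hypothetical helper, and the 22-character ID length is inferred from the example IDs above.

```python
import re
from typing import Optional

_URI_RE = re.compile(r"^spotify:([a-z]+):([A-Za-z0-9]{22})$")
_URL_RE = re.compile(r"^https://open\.spotify\.com/([a-z]+)/([A-Za-z0-9]{22})")
_ID_RE = re.compile(r"^[A-Za-z0-9]{22}$")


def to_spotify_uri(value: str, default_type: Optional[str] = None) -> str:
    """Convert a URI, an open.spotify.com URL, or a bare ID into a spotify: URI."""
    value = value.strip()
    for pattern in (_URI_RE, _URL_RE):
        match = pattern.match(value)
        if match:
            return f"spotify:{match.group(1)}:{match.group(2)}"
    # A bare ID carries no entity type, so the caller has to supply one.
    if default_type and _ID_RE.match(value):
        return f"spotify:{default_type}:{value}"
    raise ValueError(f"Cannot normalize {value!r} to a Spotify URI")


print(to_spotify_uri("https://open.spotify.com/track/0DiWol3AO6WpXZgp0goxAV"))
```

A bare ID is ambiguous on its own, which is one reason full URIs are the safer choice.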
## What NOT to do
- **Don't call `get_state` before every action.** Spotify accepts play/pause/skip without preflight. Only inspect state when the user asked "what's playing" or you need to reason about device/track.
- **Don't describe search results unless asked.** If the user said "play X", search, grab the top URI, play it. They'll hear it's wrong if it's wrong.
- **Don't retry on `403 Premium required` or `403 No active device`.** Those are permanent until user action.
- **Don't use `spotify_search` to find a playlist by name** — that searches the public Spotify catalog. User playlists come from `spotify_playlists list`.
- **Don't mix `kind: "tracks"` with album URIs** in `spotify_library` (or vice versa). The tool normalizes IDs but the API endpoint differs.
@@ -0,0 +1,72 @@
---
name: youtube-content
description: >
  Fetch YouTube video transcripts and transform them into structured content
  (chapters, summaries, threads, blog posts). Use when the user shares a YouTube
  URL or video link, asks to summarize a video, requests a transcript, or wants
  to extract and reformat content from any YouTube video.
---
# YouTube Content Tool
Extract transcripts from YouTube videos and convert them into useful formats.
## Setup
```bash
pip install youtube-transcript-api
```
## Helper Script
`SKILL_DIR` is the directory containing this SKILL.md file. The script accepts any standard YouTube URL format, short links (youtu.be), shorts, embeds, live links, or a raw 11-character video ID.
```bash
# JSON output with metadata
python3 SKILL_DIR/scripts/fetch_transcript.py "https://youtube.com/watch?v=VIDEO_ID"
# Plain text (good for piping into further processing)
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --text-only
# With timestamps
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --timestamps
# Specific language with fallback chain
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --language tr,en
```
## Output Formats
After fetching the transcript, format it based on what the user asks for:
- **Chapters**: Group by topic shifts, output timestamped chapter list
- **Summary**: Concise 5-10 sentence overview of the entire video
- **Chapter summaries**: Chapters with a short paragraph summary for each
- **Thread**: Twitter/X thread format — numbered posts, each under 280 chars
- **Blog post**: Full article with title, sections, and key takeaways
- **Quotes**: Notable quotes with timestamps
### Example — Chapters Output
```
00:00 Introduction — host opens with the problem statement
03:45 Background — prior work and why existing solutions fall short
12:20 Core method — walkthrough of the proposed approach
24:10 Results — benchmark comparisons and key takeaways
31:55 Q&A — audience questions on scalability and next steps
```
## Workflow
1. **Fetch** the transcript using the helper script with `--text-only --timestamps`.
2. **Validate**: confirm the output is non-empty and in the expected language. If empty, retry without `--language` to get any available transcript. If still empty, tell the user the video likely has transcripts disabled.
3. **Chunk if needed**: if the transcript exceeds ~50K characters, split into overlapping chunks (~40K with 2K overlap) and summarize each chunk before merging.
4. **Transform** into the requested output format. If the user did not specify a format, default to a summary.
5. **Verify**: re-read the transformed output to check for coherence, correct timestamps, and completeness before presenting.
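
Step 3 can be sketched as a simple sliding window. The helper below is illustrative rather than part of the skill's scripts: `chunk_text` is a hypothetical name, and the default sizes mirror the approximate figures in the workflow (~50K threshold, ~40K chunks, 2K overlap).

```python
from typing import List


def chunk_text(
    text: str,
    chunk_size: int = 40_000,
    overlap: int = 2_000,
    threshold: int = 50_000,
) -> List[str]:
    """Split text into overlapping chunks once it exceeds the threshold."""
    if len(text) <= threshold:
        return [text]
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


transcript = "".join(str(i % 10) for i in range(100_000))
print([len(chunk) for chunk in chunk_text(transcript)])
```

Each chunk repeats the final 2K characters of the previous one, so a sentence cut at a boundary still appears whole in one of the two chunks.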
## Error Handling
- **Transcript disabled**: tell the user; suggest they check if subtitles are available on the video page.
- **Private/unavailable video**: relay the error and ask the user to verify the URL.
- **No matching language**: retry without `--language` to fetch any available transcript, then note the actual language to the user.
- **Dependency missing**: run `pip install youtube-transcript-api` and retry.
@@ -0,0 +1,56 @@
# Output Format Examples
## Chapters
```
00:00 Introduction
02:15 Background and motivation
05:30 Main approach
12:45 Results and evaluation
18:20 Limitations and future work
21:00 Q&A
```
## Summary
A 5-10 sentence overview covering the video's main points, key arguments, and conclusions. Written in third person, present tense.
## Chapter Summaries
```
## 00:00 Introduction (2 min)
The speaker introduces the topic of X and explains why it matters for Y.
## 02:15 Background (3 min)
A review of prior work in the field, covering approaches A, B, and C.
```
## Thread (Twitter/X)
```
1/ Just watched an incredible talk on [topic]. Here are the key takeaways: 🧵
2/ First insight: [point]. This matters because [reason].
3/ The surprising part: [unexpected finding]. Most people assume [common belief], but the data shows otherwise.
4/ Practical takeaway: [actionable advice].
5/ Full video: [URL]
```
## Blog Post
Full article with:
- Title
- Introduction paragraph
- H2 sections for each major topic
- Key quotes (with timestamps)
- Conclusion / takeaways
## Quotes
```
"The most important thing is not the model size, but the data quality." — 05:32
"We found that scaling past 70B parameters gave diminishing returns." — 12:18
```
@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""
Fetch a YouTube video transcript and output it as structured JSON.

Usage:
    python fetch_transcript.py <url_or_video_id> [--language en,tr] [--timestamps] [--text-only]

Output (JSON):
    {
        "video_id": "...",
        "segment_count": <int>,
        "duration": "<H:MM:SS or M:SS>",
        "full_text": "complete transcript as plain text",
        "timestamped_text": "0:00 first line\n0:05 second line\n..."
    }

"timestamped_text" is only included when --timestamps is passed.

Install dependency: pip install youtube-transcript-api
"""
import argparse
import json
import re
import sys
from typing import List, Optional


def extract_video_id(url_or_id: str) -> str:
    """Extract the 11-character video ID from various YouTube URL formats."""
    url_or_id = url_or_id.strip()
    patterns = [
        r'(?:v=|youtu\.be/|shorts/|embed/|live/)([a-zA-Z0-9_-]{11})',
        r'^([a-zA-Z0-9_-]{11})$',
    ]
    for pattern in patterns:
        match = re.search(pattern, url_or_id)
        if match:
            return match.group(1)
    # No pattern matched: pass the input through and let the API report the error.
    return url_or_id


def format_timestamp(seconds: float) -> str:
    """Convert seconds to H:MM:SS or M:SS format."""
    total = int(seconds)
    h, remainder = divmod(total, 3600)
    m, s = divmod(remainder, 60)
    if h > 0:
        return f"{h}:{m:02d}:{s:02d}"
    return f"{m}:{s:02d}"


def fetch_transcript(video_id: str, languages: Optional[List[str]] = None):
    """Fetch transcript segments from YouTube.

    Returns a list of dicts with 'text', 'start', and 'duration' keys.
    Compatible with youtube-transcript-api v1.x.
    """
    try:
        from youtube_transcript_api import YouTubeTranscriptApi
    except ImportError:
        print("Error: youtube-transcript-api not installed. Run: pip install youtube-transcript-api",
              file=sys.stderr)
        sys.exit(1)
    api = YouTubeTranscriptApi()
    if languages:
        result = api.fetch(video_id, languages=languages)
    else:
        result = api.fetch(video_id)
    # v1.x returns FetchedTranscriptSnippet objects; normalize to dicts
    return [
        {"text": seg.text, "start": seg.start, "duration": seg.duration}
        for seg in result
    ]


def main():
    parser = argparse.ArgumentParser(description="Fetch YouTube transcript as JSON")
    parser.add_argument("url", help="YouTube URL or video ID")
    parser.add_argument("--language", "-l", default=None,
                        help="Comma-separated language codes (e.g. en,tr). Default: auto")
    parser.add_argument("--timestamps", "-t", action="store_true",
                        help="Include timestamped text in output")
    parser.add_argument("--text-only", action="store_true",
                        help="Output plain text instead of JSON")
    args = parser.parse_args()

    video_id = extract_video_id(args.url)
    languages = [lang.strip() for lang in args.language.split(",")] if args.language else None

    try:
        segments = fetch_transcript(video_id, languages)
    except Exception as e:
        # Map common library errors to short JSON messages on stdout.
        error_msg = str(e)
        if "disabled" in error_msg.lower():
            print(json.dumps({"error": "Transcripts are disabled for this video."}))
        elif "no transcript" in error_msg.lower():
            print(json.dumps({"error": "No transcript found. Try specifying a language with --language."}))
        else:
            print(json.dumps({"error": error_msg}))
        sys.exit(1)

    full_text = " ".join(seg["text"] for seg in segments)
    timestamped = "\n".join(
        f"{format_timestamp(seg['start'])} {seg['text']}" for seg in segments
    )

    if args.text_only:
        print(timestamped if args.timestamps else full_text)
        return

    result = {
        "video_id": video_id,
        "segment_count": len(segments),
        "duration": format_timestamp(segments[-1]["start"] + segments[-1]["duration"]) if segments else "0:00",
        "full_text": full_text,
    }
    if args.timestamps:
        result["timestamped_text"] = timestamped
    print(json.dumps(result, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()