yt2bb — YouTube to Bilibili Video Repurposing
Overview
Six-step pipeline: download → transcribe → translate → merge → burn subtitles → generate publish info. Produces a video with hardcoded bilingual (EN/ZH) subtitles and a publish_info.md with Bilibili upload metadata.
When to Use
- User provides a YouTube URL (single video or playlist) and wants a Bilibili-ready version
- User needs bilingual EN-ZH subtitles burned into video
- User wants to repurpose English video content for Chinese audience
Quick Reference
| Step | Tool | Command | Output |
|---|---|---|---|
| 0. Update | git | Auto-check for skill updates | — |
| 1. Download | yt-dlp | yt-dlp --cookies-from-browser chrome -f ... -o ... | {slug}.mp4 |
| 2. Transcribe | whisper* | srt_utils.py check-whisper then transcribe | {slug}_{lang}.srt |
| 2.5 Validate | srt_utils.py | srt_utils.py validate / fix | {slug}_{lang}.srt (fixed) |
| 3. Translate | AI | SRT-aware batch translation | {slug}_zh.srt |
| 4. Merge | srt_utils.py | srt_utils.py merge ... | {slug}_bilingual.srt |
| 4.5 Style | srt_utils.py | srt_utils.py to_ass --preset netflix\|clean\|glow | {slug}_bilingual.ass |
| 5. Burn | ffmpeg | ffmpeg -c:v libx264 -vf ass=... | {slug}_bilingual.mp4 |
| 6. Publish | AI | Analyze content, generate metadata | publish_info.md |
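The Output column above implies a fixed naming scheme per slug. A minimal Python sketch of that scheme follows. It assumes every artifact lives in the ${slug}/ working directory that Step 1 creates; the location of publish_info.md is an assumption, and artifact_paths is a hypothetical helper, not part of srt_utils.py.

```python
from __future__ import annotations

from pathlib import Path


def artifact_paths(slug: str, src_lang: str = "en") -> dict[str, Path]:
    """Map each pipeline stage to the file it is expected to produce."""
    work = Path(slug)  # assumption: one working directory per video
    return {
        "video": work / f"{slug}.mp4",                    # Step 1
        "source_srt": work / f"{slug}_{src_lang}.srt",    # Steps 2 and 2.5
        "zh_srt": work / f"{slug}_zh.srt",                # Step 3
        "bilingual_srt": work / f"{slug}_bilingual.srt",  # Step 4
        "bilingual_ass": work / f"{slug}_bilingual.ass",  # Step 4.5
        "burned_video": work / f"{slug}_bilingual.mp4",   # Step 5
        "publish_info": work / "publish_info.md",         # Step 6 (location assumed)
    }
```

Checking which of these paths already exist is one way to resume a partially completed run.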
Update check
Throttle to one check per 24 hours per installation; never mutate the skill directory without explicit user consent. SKILL_DIR resolved here is reused by later pipeline steps for script paths.
- If <this-skill-dir>/.last_update exists and is less than 24 hours old, skip this step entirely.
- Otherwise, fetch the latest tag from upstream:
  git -C <this-skill-dir> ls-remote --tags origin 'v*' 2>/dev/null \
    | awk '{print $2}' | sed 's|refs/tags/||' \
    | sort -V | tail -1
- Compare with this skill's metadata.version from the frontmatter. If the upstream tag is strictly newer (semver), tell the user in one line and ask: "A newer version of this skill is available: vX.Y.Z → vA.B.C. Want me to git pull?" If they say yes, run git -C <this-skill-dir> pull --ff-only. Refresh .last_update either way so the prompt doesn't repeat for 24 hours.
- If upstream is the same or older, refresh .last_update silently and continue.
- On any failure (offline, not a git checkout such as a ClawHub-installed copy, read-only path, no permission), swallow the error silently and continue with the user's task. Do not mention the failure.
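The throttle and the "strictly newer" comparison above can be sketched in Python. This is an illustration only: the function names are hypothetical, the stamp's age is assumed to be read from the file's modification time, and tags are assumed to be plain vMAJOR.MINOR.PATCH.

```python
from __future__ import annotations

import re
import time
from pathlib import Path

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # one check per 24 hours


def check_is_due(stamp: Path, now: float | None = None) -> bool:
    """Return True when the stamp file is missing or at least 24 hours old."""
    now = time.time() if now is None else now
    try:
        age = now - stamp.stat().st_mtime
    except OSError:
        return True  # no stamp yet (or unreadable): a check is due
    return age >= CHECK_INTERVAL_SECONDS


def parse_version(tag: str) -> tuple[int, int, int] | None:
    """Parse 'v1.2.3' or '1.2.3' into (1, 2, 3); None if it is not plain semver."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_strictly_newer(upstream_tag: str, local_version: str) -> bool:
    """True only when both versions parse and upstream is strictly greater."""
    upstream = parse_version(upstream_tag)
    local = parse_version(local_version)
    if upstream is None or local is None:
        return False  # unparseable input is treated like any failure: stay quiet
    return upstream > local
```

Comparing integer tuples rather than strings keeps v1.10.0 ahead of v1.9.3, matching what sort -V does for the tag list.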
Resolve SKILL_DIR for use by later pipeline steps:
# Find skill directory (works across Claude Code, OpenClaw, Hermes, Pi)
SKILL_DIR="$(find ~/.claude/skills ~/.openclaw/skills ~/.hermes/skills ~/.pi/agent/skills ~/.agents/skills ~/myagents/myskills -maxdepth 2 -name 'yt2bb' -type d 2>/dev/null | head -1)"
Pipeline Details
Step 1: Download
Single video:
slug="video-name" # or: slug=$(python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title")
mkdir -p "${slug}"
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "${slug}/${slug}.mp4" "https://www.youtube.com/watch?v=VIDEO_ID"
Playlist / series:
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "%(playlist_index)03d-%(title)s/%(playlist_index)03d-%(title)s.mp4" \
"https://www.youtube.com/playlist?list=PLAYLIST_ID"
After downloading, rename each folder to a clean slug and run Steps 2–6 for each video sequentially.
- -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]": ensure mp4 output, avoid webm
- %(playlist_index)03d: zero-padded index to preserve playlist order
- If --cookies-from-browser fails, export cookies first (see Troubleshooting)
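For the playlist case, renaming each downloaded folder to a clean slug could look like the sketch below. It is not the slugify implementation in srt_utils.py, whose exact rules are not shown here; clean_slug, its ASCII-only policy, the 60-character cap, and the "video" fallback are all assumptions made for illustration.

```python
import re
import unicodedata


def clean_slug(title: str, max_length: int = 60) -> str:
    """Turn a video title into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters so their base letters survive the ASCII filter.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    # Fall back to a fixed name when nothing survives (e.g. an all-CJK title).
    return slug[:max_length].rstrip("-") or "video"


print(clean_slug("001-How to Train Your Model (2024)!"))
# 001-how-to-train-your-model-2024
```

Keeping the zero-padded playlist index at the front of the slug preserves playlist order when folders are listed alphabetically.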
Step 2: Transcribe
First run the environment check to detect your platform and get a tailored whisper command:
python3 "$SKILL_DIR/srt_utils.py" check-whisper
This auto-detects OS, GPU (CUDA/Metal/CPU), memory, and installed backends, then recommends the best backend + model for your hardware. If memory detection is unavailable, it falls back conservatively instead of assuming a low-memory machine. Use the command it prints.
Manual fallback (openai-whisper, works everywhere):
src_lang="en" # Change to ja/ko/es/etc. based on source video
whisper_model="medium" # check-whisper recommends the best model for your hardware
whisper "${slug}/${slug}.mp4" \
--model "$whisper_model" \
--language "$src_lang" \
--word_timestamps True \
--condition_on_previous_text False \
--output_format srt \
--max_line_width 40 --max_line_count 1 \
--output_dir "${slug}"
mv "${slug}/${slug}.srt" "${slug}/${slug}_${src_lang}.srt"
Supported backends:
| Backend | Best for | Install |
|---|---|---|
| mlx-whisper | macOS Apple Silicon (fastest) | pip install mlx-whisper |
| whisper-ctranslate2 | Windows/Linux CUDA, or CPU (~4x faster) | pip install whisper-ctranslate2 |
| openai-whisper | Universal fallback | pip install openai-whisper |
Model selection (auto-recommended by check-whisper):
- tiny: fast draft, low accuracy, CPU-friendly (~1 GB)
- medium: default, good balance (~5 GB)
- large-v3: best accuracy, recommended for JA/KO/ZH source (~10 GB)
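As a rough illustration of how those memory figures could drive a choice, the sketch below picks the largest model that fits. It is not the logic check-whisper actually uses; pick_model is hypothetical, and falling back to medium when memory is unknown is an assumption.

```python
from __future__ import annotations

# Approximate memory needs in GB, taken from the model list above (smallest first).
MODEL_MEMORY_GB = {"tiny": 1, "medium": 5, "large-v3": 10}


def pick_model(available_gb: float | None) -> str:
    """Return the largest model whose approximate footprint fits in memory."""
    if available_gb is None:
        return "medium"  # assumption: unknown memory falls back to the default
    fitting = [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]
    # The dict is ordered smallest to largest, so the last fitting entry is the biggest.
    return fitting[-1] if fitting else "tiny"
```

In practice, prefer the recommendation printed by check-whisper, which also accounts for the GPU and the installed backends.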
Notes:
- --language: explicitly set to avoid misdetection; supports en, ja, ko, es, etc.
- --word_timestamps True: more precise subtitle timing
- --condition_on_previous_text False: prevent hallucination loops
- If output is garbled or repeated, add anti-hallucination flags (see Troubleshooting)
Step 2.5: Validate & Fix (optional)
python3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"
# If issues found:
python3 "$SKILL_DIR/srt_utils.py" fix "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"
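To make concrete what a validation pass over an SRT file might look for, here is a small Python sketch. It is not the validate command from srt_utils.py, whose actual checks are not listed here; find_srt_issues and the four checks it runs (cue shape, sequential indices, well-formed timing lines, non-overlapping cues) are assumptions chosen for illustration.

```python
from __future__ import annotations

import re

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_srt_issues(text: str) -> list[str]:
    """Return a human-readable list of structural problems found in SRT text."""
    issues: list[str] = []
    previous_end = 0
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            issues.append(f"cue {expected}: expected index, timing, and text lines")
            continue
        if lines[0].strip() != str(expected):
            issues.append(f"cue {expected}: index is {lines[0].strip()!r}")
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            issues.append(f"cue {expected}: malformed timing line")
            continue
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        if end <= start:
            issues.append(f"cue {expected}: end is not after start")
        if start < previous_end:
            issues.append(f"cue {expected}: overlaps the previous cue")
        previous_end = max(previous_end, end)
    return issues
```

An empty result means the file passed these particular checks; it says nothing about transcription quality.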
Step 3: Translate
Read {slug}_{src_lang}.srt and translate it to Chinese. The rules below are modeled on the Netflix Simplified Chinese Timed Text Style Guide; follow them to produce broadcast-grade subtitles.
Critical rules:
- Keep SRT format intact: preserve index numbers and timestamps (--> lines) exactly as-is
- 1:1 entry mapping: every source entry must produce exactly one translated entry (same count)
- Optimize for bottom subtitles: keep each Chinese entry to 1 line whenever possible so the final bilingual subtitle stays compact near the bottom of the frame
- Max 16 full-width characters per line (Netflix SC spec). Prefer 12–16; if a cue is very short (< 1 s) compress further so reading speed stays ≤ 9 characters/second
- Shorten with judgment, not mechanically: remove filler words, repeated subjects, weak interjections, and redundant politeness before dropping key meaning
- Match subtitle duration: the line must feel readable within the time on screen; if the cue is very short, compress more aggressively
- No trailing punctuation on Chinese cues: drop ending 。, !, ?; keep mid-sentence ,, 、, ; only when they add clarity
- Use full-width Chinese punctuation inside cues (,。!?、;:); use 「」 for inner quotes, not "" or ''
- Half-width digits and Latin: numbers, units, product names, and code identifiers stay half-width (GPT-4, 30fps, 2026); only punctuation is full-width
- Line-break discipline: never break after function words (的, 了, 吗, 呢, 吧, 啊); never split an English phrasal unit across a line break; keep modifiers with their heads
- Keep terminology consistent: technical terms, names, product names, and recurring phrases should be translated the same way across batches. Maintain an inline glossary if needed
- Adapt, don't transliterate: preserve register, tone, and intent over literal word matching; idioms become natural Chinese equivalents
- Translate in batches of 10 entries —