YouTube — Transcript Extraction & Content Reformatting

Name: youtube-transcript
Rating: 5 (1515 reviews)
Author: browser-act

YouTube video URL → timestamped transcript → summary / chapters / thread / blog / quotes

Language

All process output to user (progress updates, process notifications) follows the user's language.

Objective

Extract the full transcript from a YouTube video's built-in transcript panel, then transform it into the output format the user requests.

Prerequisites

Target YouTube video page is already open in the browser: https://www.youtube.com/watch?v={VIDEO_ID}

Pre-execution Checks

1. Tool Readiness

If browser-act has been confirmed available in the current session → skip this step.

Invoke browser-act via Skill tool to load usage. If installation or configuration issues arise, follow its guidance to resolve then retry.

Capability Components

This Skill's operational boundary = what the user can manually do in their browser. It only reads data already displayed to the user on the page, never bypassing authentication or access controls. JS code is encapsulated in Python files under the scripts/ directory, invoked via eval "$(python scripts/xxx.py)". Use the bash tool for execution.

DOM: Check transcript availability and list languages

eval "$(python scripts/get-languages.py)"

No parameters. Reads ytInitialPlayerResponse from the current page.

Output example:

{
  "available_languages": [
    {"code": "en", "name": "English", "kind": "manual", "is_auto": false},
    {"code": "en", "name": "English (auto-generated)", "kind": "asr", "is_auto": true}
  ],
  "count": 2
}

Returns {"error": true, "message": "..."} when transcripts are disabled or page is not a YouTube video.

DOM: Open transcript panel

eval "$(python scripts/open-transcript-panel.py)"

No parameters. Clicks the "Show transcript" button below the video (handles multiple UI language variants automatically for robustness).

Must call wait stable after this to allow the panel to fully load.

Output example:

{"success": true, "label": "内容转文字"}

DOM: Extract all transcript segments

eval "$(python scripts/extract-transcript-segments.py)"

No parameters. Scrolls the open transcript panel to trigger lazy loading for long videos, then extracts all segments.

Output example:

{
  "segment_count": 24,
  "segments": [
    {"ts": "0:18", "text": "We're no strangers to love"},
    {"ts": "0:27", "text": "You know the rules and so do I"}
  ],
  "full_text": "We're no strangers to love You know the rules...",
  "timestamped_text": "0:18 We're no strangers to love\n0:27 You know the rules..."
}

Composite: Full transcript fetch workflow

navigate https://www.youtube.com/watch?v={VIDEO_ID} → wait stable
eval "$(python scripts/get-languages.py)" — confirm transcripts are available; note the language list
eval "$(python scripts/open-transcript-panel.py)" — open the panel
wait stable — wait for panel content to load
eval "$(python scripts/extract-transcript-segments.py)" — extract all segments

Use timestamped_text from the output as input for the Transform step below.

Transform: Content Reformatting

After fetching the transcript, transform it based on what the user requests. If the user did not specify a format, default to the Full Document — output all five sections in order.

Summary: Concise 5–10 sentence overview of the entire video
Chapters: Group by topic shifts, output timestamped chapter list
Thread: Twitter/X thread format — numbered posts, each under 280 characters
Blog post: Full article with title, H2 sections per major topic, key quotes, and takeaways
Quotes: Notable quotes with their timestamps

Default Full Document output order (when no specific format is requested):

Summary
Chapters
Thread
Blog Post
Quotes

Workflow

Fetch transcript using the Composite component above.
Validate: confirm segment_count >= 1. If empty, tell the user the video has transcripts disabled.
Chunk if needed: if full_text exceeds ~50,000 characters, split timestamped_text into overlapping chunks (~40K characters with 2K overlap) and summarize each chunk before merging.
Transform into the requested format(s) using the timestamped_text field. If no format specified, produce all five sections.
Verify: re-read the output for coherence, correct timestamps (if chapters), and completeness before presenting.

Example — Chapters Output

0:00 Introduction — host opens with the problem statement
3:45 Background — prior work and why existing solutions fall short
12:20 Core method — walkthrough of the proposed approach
24:10 Results — benchmark comparisons and key takeaways
31:55 Q&A — audience questions on scalability and next steps

Example — Thread Output

1/ Just watched an incredible video on [topic]. Key takeaways 🧵

2/ First insight: [point]. This matters because [reason].

3/ The surprising part: [finding]. Most assume [belief], but this shows otherwise.

4/ Practical takeaway: [action].

5/ Full video: [URL]

Error Handling

Transcripts disabled: get-languages.py returns error; tell user and suggest checking if captions are available on the video page
Private/unavailable video: page will not load correctly; relay the error and ask user to verify the URL
Transcript button not found: usually means the user is not on a video page, or the page hasn't finished loading; navigate to the URL and retry
No segments after panel opens: retry open-transcript-panel.py + wait stable + extract-transcript-segments.py once

Known Limitations

Language selection: the transcript panel shows the language YouTube defaults to for the user's region. Switching to a specific language requires changing the caption language in the player's CC settings first; automatic language switching is not implemented.
Auto-generated transcripts (kind: asr) may have lower accuracy than manual captions.
Videos that require login to view will not have a transcript panel accessible.

Execution Efficiency

Batch orchestration: Write a bash script to loop through video URLs serially within a single session — navigate to each video, run the 3-step composite workflow, save result, then move to the next. Do not parallelize within one browser. To increase throughput for large batches, open multiple stealth browser sessions and distribute URLs across them.
Test before batch execution: After writing a batch script, first test with 1–2 videos to confirm the full workflow runs correctly; only then run the full batch.
Reduce redundant pre-operations: Pre-execution checks (tool readiness) only need to run once per session; skip them for subsequent videos in the same batch.
Error resumption: Save each video's result immediately after extraction; on failure, resume from the failed video rather than starting over.

Success Criteria

segment_count >= 1 AND full_text length > 0

Experience Notes

Path: {working-directory}/browser-act-skill-forge-memories/youtube-content-youtube-transcript.memory.md

Before execution: If the file exists, read it first — it records unexpected situations encountered during past executions (e.g., a strategy has become ineffective); adjust strategy order accordingly.

After execution: If an unexpected situation is encountered (strategy became ineffective, page redesigned, anti-scraping upgraded, better path discovered), append a line: {YYYY-MM-DD}: {what happened} → {conclusion}

Normal execution does not write to the file.

youtube-transcript

How to add

Drop this on your repo README

Related skills

doc-coauthoring

algorithmic-art

seo-aeo-blog-writer

wordpress-centric-high-seo-optimized-blogwriting-skill

Get new Escrita e Conteúdo skills every Monday

YouTube — Transcript Extraction & Content Reformatting

Language

Objective

Prerequisites

Pre-execution Checks

1. Tool Readiness

Capability Components

DOM: Check transcript availability and list languages

DOM: Open transcript panel

DOM: Extract all transcript segments

Composite: Full transcript fetch workflow

Transform: Content Reformatting

Workflow

Example — Chapters Output

Example — Thread Output

Error Handling

Known Limitations

Execution Efficiency

Success Criteria

Experience Notes

Comments · No comments