YouTube Transcript Analysis API Skill
📖 Brief
This skill provides an end-to-end YouTube video transcript extraction and deep content analysis service. By extracting video transcripts and then systematically analyzing them, users can understand competitors' core value propositions, target audience profiles, pain point strategies, and content gaps — all without manually watching hours of video.
This skill works in two phases:
- Phase 1 — Transcript Extraction: Uses BrowserAct API to extract raw transcript data (supports single video and batch modes).
- Phase 2 — Deep Analysis: The Agent performs structured 8-dimension analysis on the extracted transcripts.
✨ Features
- No hallucinations in extraction: Pre-set workflows extract the data, so the transcript itself is not affected by generative AI hallucinations.
- No CAPTCHA issues: No need to handle reCAPTCHA or other verification challenges.
- No IP restrictions or geo-blocking: No need to handle regional IP restrictions or geofencing.
- Faster execution: Tasks execute faster compared to purely AI-driven browser automation solutions.
- Extremely high cost-efficiency: Significantly reduces data acquisition costs compared to AI solutions that consume massive amounts of tokens.
🔑 API Key Guide
Before running, check the BROWSERACT_API_KEY environment variable. If it is not set, take no other action: ask the user for the key and wait until they provide it.
Agent must inform the user:
"Since you haven't configured the BrowserAct API Key yet, please go to the BrowserAct Console to get your Key."
🛠️ Input Parameters
The Agent should determine the extraction mode based on the user's needs:
Mode A: Single Video Analysis
Use when the user provides a specific YouTube video URL.
- TargetURL
  - Type: string
  - Description: The URL of the YouTube video to extract and analyze.
  - Example: https://www.youtube.com/watch?v=st534T7-mdE
  - Required: Yes
Mode B: Batch Video Analysis
Use when the user wants to search and analyze multiple videos by keyword.
- KeyWords
  - Type: string
  - Description: The keyword to search for on YouTube.
  - Example: AI Automation, SaaS Marketing
  - Required: Yes
- Upload_date
  - Type: string
  - Description: Filter for the upload date of the videos.
  - Example: This week
  - Default: This week
- Datelimit
  - Type: number
  - Description: The number of videos to extract and analyze.
  - Example: 3
  - Default: 3
Optional Analysis Parameters
These parameters are inferred from the user's intent; they are not passed to the script as arguments:
- Analysis Language
  - Type: string
  - Description: The language the analysis report should be written in. Defaults to the same language as the user's request.
  - Example: Chinese, English
- Analysis Focus
  - Type: string
  - Description: The user may specify an analysis focus. The Agent must dynamically adjust the depth of specific dimensions based on this focus. For example:
    - Competitor Analysis -> Deep dive into Dim 7 (Business Model) and Dim 8 (Gaps).
    - Viral Deconstruction -> Deep dive into Dim 1 (Hook), Dim 4 (Emotional Arc), and Dim 5 (Viral Drivers).
    - Audience Research -> Deep dive into Dim 3 (Persona & Intent) and Dim 4 (Pain Points).
  - Default: All 8 dimensions balanced.
  - Example: Competitor Analysis, Viral Deconstruction, Audience Research
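The focus-to-dimension mapping described for Analysis Focus can be written down as a small lookup table. This is only a sketch: the names `FOCUS_DIMENSIONS` and `dimensions_to_deepen` are hypothetical, and an empty result is used here to mean "all 8 dimensions balanced".

```python
# Dimension numbers to deepen for each focus, as listed in this section.
FOCUS_DIMENSIONS = {
    "Competitor Analysis": [7, 8],
    "Viral Deconstruction": [1, 4, 5],
    "Audience Research": [3, 4],
}


def dimensions_to_deepen(focus=None):
    """Return the dimensions to analyze in extra depth for a given focus.

    An empty list means no special emphasis: all 8 dimensions stay balanced.
    Unknown focus values also fall back to the balanced default.
    """
    if not focus:
        return []
    return FOCUS_DIMENSIONS.get(focus.strip(), [])
```

For example, `dimensions_to_deepen("Viral Deconstruction")` yields `[1, 4, 5]`, while omitting the focus keeps the default balanced treatment.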
🚀 Invocation Method
The Agent should execute the unified extraction script based on the mode:
Mode A — Single Video:
python -u ./scripts/youtube_transcript_analysis_api.py single "TargetURL"
Mode B — Batch Videos:
python -u ./scripts/youtube_transcript_analysis_api.py batch "KeyWords" "Upload_date" Datelimit
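As an illustration, the two command lines above could be assembled programmatically like this. The helper `build_command` is hypothetical; the script path, mode names, argument order, and the defaults (`This week`, `3`) are taken from this document.

```python
SCRIPT = "./scripts/youtube_transcript_analysis_api.py"


def build_command(mode, *, target_url=None, keywords=None,
                  upload_date="This week", datelimit=3):
    """Build the argument list for the extraction script (hypothetical helper).

    Returns a list suitable for subprocess.run / subprocess.Popen, which
    avoids shell-quoting problems with URLs and multi-word keywords.
    """
    base = ["python", "-u", SCRIPT]
    if mode == "single":
        if not target_url:
            raise ValueError("single mode requires target_url")
        return base + ["single", target_url]
    if mode == "batch":
        if not keywords:
            raise ValueError("batch mode requires keywords")
        # Datelimit is the number of videos; pass it as a plain integer string.
        return base + ["batch", keywords, upload_date, str(int(datelimit))]
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing the arguments as a list keeps a value such as `This week` as a single argument without manual quoting.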
⏳ Running Status Monitoring
Since this task involves automated browser operations, it may take a long time (several minutes). The script will continuously output status logs with timestamps while running (e.g., [14:30:05] Task Status: running).
Agent guidelines:
- While waiting for the script to return results, please keep an eye on the terminal output.
- As long as the terminal continues to output new status logs, it means the task is running normally. Do not misjudge it as a deadlock or unresponsiveness.
- If the status remains unchanged for a long time or the script stops outputting without returning a result, only then consider triggering the retry mechanism.
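One way to apply these guidelines is to treat every new output line as a sign of life and flag a stall only after a quiet period. The sketch below is an assumption-laden illustration: `run_with_stall_watch` is a hypothetical helper, the 300-second threshold is an arbitrary placeholder for "a long time", and it covers only the "script stops outputting" case, not an unchanged status value.

```python
import queue
import subprocess
import threading


def run_with_stall_watch(cmd, stall_seconds=300.0):
    """Run cmd, echo its log lines, and report a stall after a quiet period.

    Returns (status, lines) where status is "finished" or "stalled".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    q = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            q.put(line.rstrip("\n"))
        q.put(None)

    threading.Thread(target=pump, daemon=True).start()

    lines = []
    while True:
        try:
            line = q.get(timeout=stall_seconds)
        except queue.Empty:
            # No new output within the window: stop the task so the
            # caller can decide whether to retry.
            proc.kill()
            proc.wait()
            return "stalled", lines
        if line is None:
            proc.wait()
            return "finished", lines
        print(line)  # new status log: the task is still running normally
        lines.append(line)
```

The `-u` flag in the documented command keeps Python's output unbuffered, so status lines reach a watcher like this as soon as they are printed.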
Post-Extraction Workflow
After the script completes and returns transcript data, the Agent must proceed with two additional steps:
Step 1: Present Video Metadata — Display the extracted metadata to the user. (Note: Do NOT output the full raw transcript text in your response, as it is too long. Use it internally for your analysis.)
Step 2: Perform Concise 8-Dimension Analysis — Analyze the transcript across the 8 dimensions. ⚠️ CRITICAL: The analysis MUST be extremely concise, bullet-point driven, and free of filler words. Directly state the facts, evidence, and actionable insights without verbose explanations. Use the same language as the user's request.
📊 Data Output
After successful execution, the output includes two parts:
Part 1: Video Metadata
The script returns the following fields for each video:
- video_title: The title of the YouTube video
- video_url: The direct link to the original video
- publisher: The name of the channel publishing the video
- channel_link: The URL of the publisher's YouTube channel
- video_likes_count: The number of likes the video has received
- transcript: The complete extracted transcript/subtitles of the video (used internally for analysis, do not display full text)
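To show how the metadata could be presented while keeping the transcript internal, here is a minimal sketch. It assumes each video has already been parsed into a Python dict keyed by the field names above; the actual output format of the script is not specified in this document, and `format_metadata` is a hypothetical helper.

```python
# Fields shown to the user; "transcript" is deliberately left out.
DISPLAY_FIELDS = [
    "video_title",
    "video_url",
    "publisher",
    "channel_link",
    "video_likes_count",
]


def format_metadata(video):
    """Render one video's metadata as text, omitting the full transcript.

    Missing fields are shown as "N/A" rather than raising an error.
    """
    return "\n".join(
        f"{field}: {video.get(field, 'N/A')}" for field in DISPLAY_FIELDS
    )
```

The transcript stays available in the record for the analysis step; it is just never included in what gets displayed.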
Part 2: 8-Dimension Analysis
After presenting the video metadata, the Agent must produce a structured analysis of the transcript content across the following 8 dimensions:
Dimension 1: Content Structure & Hook
Analyze the video's narrative architecture:
- Opening Hook: What is the core hook in the first 30 seconds? Quote it and explain the hook logic (e.g., curiosity gap, bold claim).
- Narrative Framework: Identify the overall structure (e.g., Problem-Agitate-Solve, Hero's Journey, Listicle).
- Pacing & Time Allocation: Proportion of intro vs. core content vs. pitch/CTA.
Dimension 2: Core Messaging
Extract the central message:
- Single Core Viewpoint: What is the ONE key thesis the video conveys?
- Supporting Arguments: How is the viewpoint supported? (Data, analogies, personal experience).
- Conclusion Clarity: Is the conclusion clear and memorable?
Dimension 3: Audience Persona & Intent
Identify the intended viewer and their mindset:
- Target Viewer Profile & Level: Who is this for? (Beginner, Expert) What prior knowledge is assumed?
- Viewer Intent: Why are they watching? (To learn a skill, be entertained, make a buying decision, or validate existing beliefs?)
Dimension 4: Pain Points & Emotional Arc
Map the emotional journey and problems addressed:
- Explicit & Implicit Pain Points: What specific problems are stated or implied? Quote exact words.
- Emotional Arc: How does the content shift the viewer's emotion? (e.g., from anxiety/confusion to clarity/relief/empowerment). This emotional shift drives retention and sharing.
Dimension 5: Viral & Engagement Drivers
Analyze the spreading mechanism:
- Shareability Factors: Why is this video shared? (Controversial takes, highly relatable scenarios, title/thumbnail alignment inferred from script).
- Memorable/Quotable Phrasing: Extract unique expressions, catchy concepts, or "aha" moments that stick in the mind.
Dimension 6: Evidence & Credibility
Evaluate trust-building elements:
- Authority Signals: Data cite