YouTube Transcript Analysis API Skill

📖 Brief

This skill provides an end-to-end YouTube video transcript extraction and deep content analysis service. By extracting video transcripts and then systematically analyzing them, users can understand competitors' core value propositions, target audience profiles, pain point strategies, and content gaps — all without manually watching hours of video.

This skill works in two phases:

Phase 1 — Transcript Extraction: Uses BrowserAct API to extract raw transcript data (supports single video and batch modes).
Phase 2 — Deep Analysis: The Agent performs structured 8-dimension analysis on the extracted transcripts.

✨ Features

Stable, accurate data extraction without hallucinations: Pre-set workflows avoid the hallucinations of generative AI.
No CAPTCHA issues: No need to handle reCAPTCHA or other verification challenges.
No IP restrictions or geo-blocking: No need to handle regional IP restrictions or geofencing.
Faster execution: Tasks execute faster compared to purely AI-driven browser automation solutions.
Extremely high cost-efficiency: Significantly reduces data acquisition costs compared to AI solutions that consume massive amounts of tokens.

🔑 API Key Guide

Before running, check the BROWSERACT_API_KEY environment variable. If it is not set, take no other action: ask the user for the key and wait until they provide it. The Agent must inform the user:

"Since you haven't configured the BrowserAct API Key yet, please go to the BrowserAct Console to get your Key."
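The check above could be sketched as follows. This is a minimal illustration, not part of the skill's script: the helper name check_api_key and its return shape are assumptions; only the environment variable name and the message text come from this document.

```python
import os

# Message text taken from the API Key Guide above.
API_KEY_PROMPT = (
    "Since you haven't configured the BrowserAct API Key yet, "
    "please go to the BrowserAct Console to get your Key."
)


def check_api_key(env=None):
    """Hypothetical helper: return (ok, message_for_user).

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("BROWSERACT_API_KEY", "").strip()
    if not key:
        # Key missing or blank: stop here and show the prompt to the user.
        return False, API_KEY_PROMPT
    return True, ""
```

If the first element is False, the Agent would show the message and wait for the user before doing anything else.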

🛠️ Input Parameters

The Agent should determine the extraction mode based on the user's needs:

Mode A: Single Video Analysis

Use when the user provides a specific YouTube video URL.

TargetURL
- Type: string
- Description: The URL of the YouTube video to extract and analyze.
- Example: https://www.youtube.com/watch?v=st534T7-mdE
- Required: Yes

Mode B: Batch Video Analysis

Use when the user wants to search and analyze multiple videos by keyword.

KeyWords
- Type: string
- Description: The keyword to search for on YouTube.
- Example: AI Automation, SaaS Marketing
- Required: Yes
Upload_date
- Type: string
- Description: Filter for the upload date of the videos.
- Example: This week
- Default: This week
Datelimit
- Type: number
- Description: The number of videos to extract and analyze.
- Example: 3
- Default: 3

Optional Analysis Parameters

These parameters are inferred from the user's intent; they are not script arguments:

Analysis Language
- Type: string
- Description: The language the analysis report should be written in. Defaults to the same language as the user's request.
- Example: Chinese, English
Analysis Focus
- Type: string
- Description: The user may specify an analysis focus. The Agent must dynamically adjust the depth of specific dimensions based on this focus. For example:
  - Competitor Analysis -> Deep dive into Dim 7 (Business Model) and Dim 8 (Gaps).
  - Viral Deconstruction -> Deep dive into Dim 1 (Hook), Dim 4 (Emotional Arc), and Dim 5 (Viral Drivers).
  - Audience Research -> Deep dive into Dim 3 (Persona & Intent) and Dim 4 (Pain Points).
- Default: All 8 dimensions balanced.
- Example: Competitor Analysis, Viral Deconstruction, Audience Research
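One way to picture the focus-to-dimension mapping is a small lookup table. The sketch below is illustrative only: the function name, the case-insensitive matching, and the fallback to all 8 dimensions for an unrecognized focus are assumptions; the dimension numbers are the ones listed above.

```python
# Dimension numbers per focus, as listed in the Analysis Focus description.
FOCUS_DIMENSIONS = {
    "competitor analysis": [7, 8],
    "viral deconstruction": [1, 4, 5],
    "audience research": [3, 4],
}
ALL_DIMENSIONS = list(range(1, 9))


def dimensions_to_emphasize(focus=None):
    """Hypothetical helper: return the dimensions that get a deeper treatment.

    With no focus (or an unrecognized one, by assumption) all 8 dimensions
    are treated as balanced.
    """
    if not focus:
        return list(ALL_DIMENSIONS)
    return list(FOCUS_DIMENSIONS.get(focus.strip().lower(), ALL_DIMENSIONS))
```

For example, dimensions_to_emphasize("Audience Research") returns [3, 4], while calling it with no argument returns all eight dimensions.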

🚀 Invocation Method

The Agent should execute the unified extraction script based on the mode:

Mode A — Single Video:

python -u ./scripts/youtube_transcript_analysis_api.py single "TargetURL"

Mode B — Batch Videos:

python -u ./scripts/youtube_transcript_analysis_api.py batch "KeyWords" "Upload_date" Datelimit
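The two invocations can also be assembled programmatically. The following is a sketch under assumptions: build_command is a hypothetical helper, and choosing Mode A whenever a URL is present is an illustrative rule; the script path, mode names, argument order, and defaults are the ones documented above.

```python
SCRIPT = "./scripts/youtube_transcript_analysis_api.py"


def build_command(target_url=None, keywords=None,
                  upload_date="This week", date_limit=3, python="python"):
    """Hypothetical helper: build the argument list for Mode A or Mode B."""
    if target_url:
        # Mode A: a specific video URL was provided.
        return [python, "-u", SCRIPT, "single", target_url]
    if keywords:
        # Mode B: keyword search with the documented defaults.
        return [python, "-u", SCRIPT, "batch",
                keywords, upload_date, str(int(date_limit))]
    raise ValueError("Provide either TargetURL (Mode A) or KeyWords (Mode B)")
```

Returning a list rather than a single string means the result can be passed to subprocess.run without shell quoting, so keywords containing spaces such as "AI Automation" stay a single argument.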

⏳ Running Status Monitoring

Since this task involves automated browser operations, it may take a long time (several minutes). The script will continuously output status logs with timestamps while running (e.g., [14:30:05] Task Status: running). Agent guidelines:

While waiting for the script to return results, please keep an eye on the terminal output.
As long as the terminal continues to output new status logs, it means the task is running normally. Do not misjudge it as a deadlock or unresponsiveness.
Consider triggering the retry mechanism only if the status remains unchanged for a long time or the script stops producing output without returning a result.
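The guidelines above can be made concrete with a small log parser and a stall check. This is a sketch only: the regular expression is based on the single example log line shown above, and the 300-second threshold is an assumed value, since this document only says "a long time".

```python
import re

# Matches lines shaped like: [14:30:05] Task Status: running
STATUS_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*Task Status:\s*(.+?)\s*$")


def parse_status_line(line):
    """Return (timestamp, status) for a status log line, else None."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def should_retry(seconds_since_last_output, stall_seconds=300):
    """Hypothetical stall check: True only after a long silence.

    stall_seconds is an assumed threshold, not a documented value.
    """
    return seconds_since_last_output >= stall_seconds
```

An Agent reading the script's output line by line would reset its silence timer each time a new line arrives and call should_retry only when no new line has appeared for a while. Detecting a status that repeats unchanged would need additional bookkeeping that is not shown here.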

Post-Extraction Workflow

After the script completes and returns transcript data, the Agent must proceed with two additional steps:

Step 1: Present Video Metadata — Display the extracted metadata to the user. (Note: Do NOT output the full raw transcript text in your response, as it is too long. Use it internally for your analysis.)

Step 2: Perform Concise 8-Dimension Analysis — Analyze the transcript across the 8 dimensions. ⚠️ CRITICAL: The analysis MUST be extremely concise, bullet-point driven, and free of filler words. Directly state the facts, evidence, and actionable insights without verbose explanations. Use the same language as the user's request.

📊 Data Output

After successful execution, the output includes two parts:

Part 1: Video Metadata

The script returns the following fields for each video:

video_title: The title of the YouTube video
video_url: The direct link to the original video
publisher: The name of the channel publishing the video
channel_link: The URL of the publisher's YouTube channel
video_likes_count: The number of likes the video has received
transcript: The complete extracted transcript/subtitles of the video (used internally for analysis, do not display full text)
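Presenting the metadata while keeping the transcript internal might look like the sketch below. It assumes each video arrives as a dictionary keyed by the field names above; the actual return shape of the script is not specified here, and format_metadata is a hypothetical helper.

```python
# Fields shown to the user; "transcript" is deliberately left out.
METADATA_FIELDS = [
    "video_title",
    "video_url",
    "publisher",
    "channel_link",
    "video_likes_count",
]


def format_metadata(video):
    """Render one video's metadata as plain text, omitting the transcript."""
    lines = []
    for field in METADATA_FIELDS:
        # Missing fields are shown as N/A rather than raising an error.
        lines.append(f"{field}: {video.get(field, 'N/A')}")
    return "\n".join(lines)
```

The transcript stays in the dictionary for the analysis step, but it never reaches the text displayed to the user.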

Part 2: 8-Dimension Analysis

After presenting the video metadata, the Agent must produce a structured analysis of the transcript content across the following 8 dimensions:

Dimension 1: Content Structure & Hook

Analyze the video's narrative architecture:

Opening Hook: What is the core hook in the first 30 seconds? Quote it and explain the hook logic (e.g., curiosity gap, bold claim).
Narrative Framework: Identify the overall structure (e.g., Problem-Agitate-Solve, Hero's Journey, Listicle).
Pacing & Time Allocation: Proportion of intro vs. core content vs. pitch/CTA.

Dimension 2: Core Messaging

Extract the central message:

Single Core Viewpoint: What is the ONE key thesis the video conveys?
Supporting Arguments: How is the viewpoint supported? (Data, analogies, personal experience).
Conclusion Clarity: Is the conclusion clear and memorable?

Dimension 3: Audience Persona & Intent

Identify the intended viewer and their mindset:

Target Viewer Profile & Level: Who is this for (e.g., beginner, expert)? What prior knowledge is assumed?
Viewer Intent: Why are they watching? (To learn a skill, be entertained, make a buying decision, or validate existing beliefs?)

Dimension 4: Pain Points & Emotional Arc

Map the emotional journey and problems addressed:

Explicit & Implicit Pain Points: What specific problems are stated or implied? Quote exact words.
Emotional Arc: How does the content shift the viewer's emotion? (e.g., from anxiety/confusion to clarity/relief/empowerment). This emotional shift drives retention and sharing.

Dimension 5: Viral & Engagement Drivers

Analyze the spreading mechanism:

Shareability Factors: Why is this video shared? (Controversial takes, highly relatable scenarios, title/thumbnail alignment inferred from script).
Memorable/Quotable Phrasing: Extract unique expressions, catchy concepts, or "aha" moments that stick in the mind.

Dimension 6: Evidence & Credibility

Evaluate trust-building elements:

Authority Signals: Data cite
