SSkilltecabyclaudinhocode
Enviar skill
← Voltar para o catálogo

social-media-scraper

Marketing

Skill that pulls and analyzes all data from social media posts. Fetches content from Instagram (reel, post, story), TikTok, Twitter/X and YouTube links. Collects all metadata including text, description, comments, like/share counts, date info, hashtags. Automatically transcribes with faster-whisper and applies visual analysis with Gemini Vision on posts containing video or audio. MANDATORY TRIGGER

1estrelas
Ver no GitHub ↗Autor: elbis330Licença: MIT

Social Media Scraper

A skill that pulls all data from social media posts, analyzes them, and performs transcription + visual analysis for video/audio content.

First-Time Setup

When this skill is invoked for the first time or when the user says "install", "configure", "setup", run the setup mode first.

Setup Detection

Configuration file: ~/.social-media-scraper.env. If the file does not exist or the user requests reinstallation, ask the interactive questions below.

test -f ~/.social-media-scraper.env || echo "First-time setup required"

Setup Questions

Ask the questions one by one, letting the user answer each. Based on the answers, create the ~/.social-media-scraper.env file and install only the tools for the selected platforms.

Question 1 — Platforms

"Which platforms do you want to use? Instagram, TikTok, Twitter/X, YouTube — pick all of them or only the ones you want. (default: all)"

Normalize the answer as a comma-separated lowercase list: instagram,tiktok,twitter,youtube.

Question 2 — Gemini visual analysis

"Should Gemini visual video analysis be enabled? This reads on-screen text, products, and scenes — it provides visual context that Whisper cannot translate. Enabling it requires a free Gemini API key (https://aistudio.google.com/apikey). Enable it? (default: yes)"

If yes: ask "Paste your Gemini API key:". Write the key into ~/.social-media-scraper.env but do not print to terminal, do not commit, do not log. Protect the file with chmod 600.

If no: mark GEMINI_ENABLED=false, do not install google-genai.

Question 3 — Transcription language

"What should the default transcription language be? Auto-detect (recommended — 99 languages), Turkish, English, or another ISO 639-1 language code (e.g. fr, de, es). (default: auto)"

Store the value as auto, tr, en or an ISO code.

Configuration File Format

~/.social-media-scraper.env:

# Active platforms (comma-separated)
PLATFORMS=instagram,tiktok,twitter,youtube

# Gemini video analysis
GEMINI_ENABLED=true
GEMINI_API_KEY=AIza...

# Transcription
TRANSCRIPTION_LANG=auto
WHISPER_MODEL=medium

Installation Steps (based on answers)

Install only the tools for the selected platforms:

# Always required
pip install faster-whisper --break-system-packages
# ffmpeg: macOS → brew install ffmpeg | Linux → apt install ffmpeg

# Only if selected
# instagram:
pip install instaloader --break-system-packages
# tiktok or youtube:
pip install yt-dlp --break-system-packages
# twitter:
npm install -g @steipete/bird
# if gemini enabled:
pip install google-genai --break-system-packages

After Setup

When setup is complete, give the user a short summary (which platforms are active, whether Gemini is enabled, what language) and ask for a sample link to test.

Reading the Current Configuration

When the skill is running, read the ~/.social-media-scraper.env file and behave according to the selections. E.g. if GEMINI_ENABLED=false, skip the visual analysis step. If TRANSCRIPTION_LANG=tr, pass language="tr" to Whisper.

set -a
source ~/.social-media-scraper.env 2>/dev/null
set +a

If a link from a platform not in the PLATFORMS list arrives, ask the user "This platform is not installed, would you like to add it?".

General Flow

  1. The user shares a social media link
  2. The platform is auto-detected (from the URL)
  3. All data is fetched with the appropriate tool for the platform
  4. If video/audio content exists: download → extract audio with ffmpeg → transcribe with faster-whisper → analyze visuals/screen with Gemini → delete temporary files
  5. Transcription + visual analysis are merged into a full understanding
  6. Present results to the user in a clean and readable way

Platform Detection

Detect the platform by looking at the URL:

  • instagram.com or instagr.am → Instagram
  • tiktok.com → TikTok
  • x.com or twitter.com → Twitter/X
  • youtube.com or youtu.be → YouTube

Per-Platform Tools

Twitter/X

Priority order:

  1. bird CLI (npm package: @steipete/bird) — most comprehensive, tweet + reply thread + media info
  2. Jina Reader (curl -s "https://r.jina.ai/TWEET_URL") — fallback method
  3. Reading via browser — last resort

bird CLI usage:

bird --urls "TWEET_URL"

If bird is not installed: npm install -g @steipete/bird bird may need Chrome cookies to work, it auto-detects them.

Instagram

Priority order:

  1. instaloader (pip package) — reel, post, story download and metadata
  2. instagrapi (pip package) — more comprehensive API, may require login
  3. yt-dlp — fallback video download

instaloader usage:

pip install instaloader --break-system-packages
instaloader -- -SHORTCODE

The shortcode is extracted from the URL: instagram.com/reel/SHORTCODE/ or instagram.com/p/SHORTCODE/

Fetching metadata (Python):

import instaloader
L = instaloader.Instaloader()
post = instaloader.Post.from_shortcode(L.context, "SHORTCODE")
print(f"Caption: {post.caption}")
print(f"Likes: {post.likes}")
print(f"Comment count: {post.comments}")
print(f"Date: {post.date}")
print(f"Hashtags: {post.caption_hashtags}")

TikTok

Priority order:

  1. Download video + metadata with yt-dlp
  2. Fetch page content with Jina Reader

yt-dlp usage:

yt-dlp --write-info-json --write-comments -o "downloads/%(id)s.%(ext)s" "TIKTOK_URL"

If yt-dlp is not installed: pip install yt-dlp --break-system-packages Cookies may be required for TikTok: yt-dlp --cookies-from-browser chrome "URL"

YouTube

Video + metadata + comments with yt-dlp:

yt-dlp --write-info-json --write-comments --skip-download -o "downloads/%(id)s.%(ext)s" "YOUTUBE_URL"

If video transcription is needed (when no captions are available):

yt-dlp -f "bestaudio" -o "downloads/audio.%(ext)s" "YOUTUBE_URL"

Transcription (For Video/Audio Content)

Automatically transcribe every post containing video or audio. The user does not need to ask separately.

Steps:

  1. Download the video (with the appropriate tool per platform)
  2. Extract audio with ffmpeg:
ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
  1. Transcribe with faster-whisper:
from faster_whisper import WhisperModel
model = WhisperModel("medium", compute_type="int8")
segments, info = model.transcribe("audio.wav")
print(f"Detected language: {info.language} ({info.language_probability:.0%})")
for segment in segments:
    print(f"[{segment.start:.1f}s → {segment.end:.1f}s] {segment.text}")
  1. Delete temporary video and audio files (to save space)

If faster-whisper is not installed: pip install faster-whisper --break-system-packages If ffmpeg is not installed: brew install ffmpeg (macOS) or apt install ffmpeg (Linux)

Video Analysis (Gemini Vision)

For posts containing video, transcribing the audio alone is not enough. On-screen text, displayed products, interfaces, logos, gestures, scene transitions — all of these are part of the meaning. Whisper only translates speech; use Gemini to analyze what appears on screen.

Flow

  1. Download the video (yt-dlp / instaloader / etc.)
  2. Transcribe audio with Whisper (section above)
  3. Upload the video with Gemini File API, analyze it
  4. Merge the two sources and present to the user
  5. Delete temporary files

API Key

The Gemini API key is read from the GEMINI_API_KEY environment variable. You can get an API key for free from Google AI Studio.

Setup:

export GEMINI_API_KEY="your_api_key_here"

To add it permanently to your shell config:

echo 'export GEMINI_API_KEY="your_api_key_here"' >> ~/.zshrc   # macOS / zsh
echo 'export GEMINI_API_KEY="your_api_key_here"' >> ~/.bashrc  # Linux / bash

If you use a .env file:

export GEMINI_API_KEY=$(grep "^GEMINI_API_KEY=" .env | cut -d= -f2- | tr -d '"' | tr -d "'" | tr -d ' ')

Como adicionar

/plugin marketplace add elbis330/social-media-scraper-skill

O comando exato pode variar conforme o repositório. Confira o README no GitHub.

Comentários · Nenhum comentário

Entre para comentar. Entrar

  • Ainda não há comentários. Seja o primeiro.