X (Twitter) — Tweet Search & Scraper

Name: x-tweet-search
Author: browser-act

Search query / handle / URL → structured tweet list (text, author, metrics, media, entities)

Language

All output shown to the user (progress updates, process notifications) is written in the user's language.

Objective

Collect tweets matching a search query, from specific user profiles, or from direct URLs, extracting complete structured data for each tweet.

Prerequisites

Target page is already open in the browser: https://x.com/search?q=... or https://x.com/{handle} or a direct tweet/list URL
User must be logged in to X (user avatar or username visible in left sidebar)

Pre-execution Checks

1. Tool Readiness

If browser-act has been confirmed available in the current session → skip this step.

Invoke browser-act via the Skill tool to load its usage instructions. If installation or configuration issues arise, follow its guidance to resolve them, then retry.

2. Login Verification

If login status for X has been confirmed in the current session → skip this step.

Otherwise: open https://x.com and observe the left sidebar:

User avatar or "@username" visible at the bottom → logged in, continue
"Sign in" or "Log in" button visible → not logged in, inform user that X login is required and assist with the login flow

User refuses or cannot log in → terminate execution.

Capability Components

This Skill's operational boundary = what the user can manually do in their browser. It reads tweet data already rendered in the X DOM, never bypassing authentication. JS is encapsulated in scripts/ files, invoked via eval "$(python scripts/xxx.py)". Use the bash tool for execution.

DOM: Tweet list extraction (React Fiber)

Extracts all currently visible tweets from the page via React internal state (React Fiber). Works on search pages, profile pages, list pages, and any X page rendering tweet articles.

Wait for tweets to appear before extracting:

wait --selector "article[data-testid='tweet']" --state attached --timeout 15000

Extract: eval "$(python scripts/extract-tweets.py)"

Returns a JSON array. Each element:

[
  {
    "id": "2059255862548738182",          // tweet ID
    "url": "https://x.com/NASA/status/2059255862548738182",  // direct link
    "text": "Full tweet text including hashtags and URLs",   // full_text field
    "created_at": "2026-05-26T12:50:31.000Z",               // ISO 8601
    "lang": "en",                         // ISO 639-1 language code, null if unknown
    "author_id": "11348282",              // author user ID
    "author_name": "NASA",                // display name
    "author_screen_name": "NASA",         // @handle (without @)
    "author_profile_image": "https://pbs.twimg.com/profile_images/.../photo.jpg",
    "author_followers": 92080161,         // follower count
    "author_following": 305,              // following count
    "author_verified": false,             // legacy blue checkmark
    "author_blue_verified": true,         // X Blue / Gold / Gray checkmark
    "author_location": "Washington, D.C.", // profile location, null if not set
    "author_description": "Explore the universe...",  // bio, null if empty
    "like_count": 82579,
    "retweet_count": 11952,
    "reply_count": 4230,
    "quote_count": 1850,
    "bookmark_count": 12400,
    "view_count": 25923006,               // null if not available
    "is_retweet": false,
    "is_quote": false,
    "is_reply": false,
    "in_reply_to_tweet_id": null,         // parent tweet ID if is_reply=true
    "in_reply_to_user": null,             // @handle of replied-to user
    "conversation_id": "2059255862548738182",
    "hashtags": ["AI", "Space"],          // without #
    "urls": ["https://example.com/article"],  // expanded URLs from entities
    "mentions": ["SpaceX", "ESA"],        // @handles without @
    "media": [
      {
        "type": "video",                  // "photo", "video", "animated_gif"
        "url": "https://pbs.twimg.com/amplify_video_thumb/.../img/thumb.jpg",
        "alt_text": null,
        "video_variants": [
          {"bitrate": 2176000, "url": "https://video.twimg.com/.../1280x720/video.mp4"},
          {"bitrate": 832000,  "url": "https://video.twimg.com/.../640x360/video.mp4"}
        ]
      }
    ],
    "source_name": "Twitter for iPhone",  // client used to post
    "source_url": "http://twitter.com/download/iphone"
  }
]
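As an illustration of consuming this structure, the sketch below picks the highest-bitrate entry from `video_variants`. It assumes the `//` annotations above are documentation only and that the script emits plain JSON; `best_video_url` is a hypothetical helper, not part of the skill's scripts.

```python
from typing import Optional


def best_video_url(tweet: dict) -> Optional[str]:
    """Return the URL of the highest-bitrate video variant, or None if there is none."""
    best = None
    for item in tweet.get("media") or []:
        for variant in item.get("video_variants") or []:
            # Treat a missing or null bitrate as 0 so the comparison never fails.
            if best is None or (variant.get("bitrate") or 0) > (best.get("bitrate") or 0):
                best = variant
    return best["url"] if best else None


# Sample record trimmed to the fields used here (URLs elided as in the schema above).
sample = {
    "media": [
        {
            "type": "video",
            "video_variants": [
                {"bitrate": 832000, "url": "https://video.twimg.com/.../640x360/video.mp4"},
                {"bitrate": 2176000, "url": "https://video.twimg.com/.../1280x720/video.mp4"},
            ],
        }
    ]
}
print(best_video_url(sample))
```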

URL Construction Guide

Input type → URL mapping

searchTerms (keyword / advanced query):

Sort Latest: https://x.com/search?q={url_encoded_query}&src=typed_query&f=live
Sort Top: https://x.com/search?q={url_encoded_query}&src=typed_query
Sort Latest+Top: run both URLs in sequence, deduplicate by tweet ID
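The three sort modes can be sketched as a small URL builder. This is a minimal sketch: `build_search_urls` and its `sort` argument are illustrative names, and `urllib.parse.quote` with `safe=""` stands in for the URL encoding step.

```python
from urllib.parse import quote

BASE = "https://x.com/search?q="


def build_search_urls(query: str, sort: str = "Latest") -> list:
    """Return the search URL(s) to visit for the requested sort mode."""
    encoded = quote(query, safe="")  # encode spaces, colons, quotes, etc.
    top = f"{BASE}{encoded}&src=typed_query"
    latest = f"{top}&f=live"
    if sort == "Latest":
        return [latest]
    if sort == "Top":
        return [top]
    if sort == "Latest+Top":
        # Visit both in sequence; the caller deduplicates results by tweet ID.
        return [latest, top]
    raise ValueError(f"unknown sort mode: {sort}")


print(build_search_urls("from:NASA lang:en"))
```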

twitterHandles (scrape a user's profile tweets):

Option A (profile page): https://x.com/{handle} — shows all tweets/retweets
Option B (search): use from:{handle} as the search query — more filter-compatible

startUrls (direct URLs): navigate to the URL as-is. Supported types:

Tweet URL: https://x.com/{user}/status/{id} — single tweet conversation
Profile URL: https://x.com/{handle} — profile timeline
Search URL: https://x.com/search?q=... — use directly
List URL: https://x.com/i/lists/{list_id} — list timeline
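If start URLs need to be validated before navigation, the four supported shapes can be recognized roughly as follows. This is a sketch under assumptions: `classify_start_url` is a hypothetical helper, only the `x.com` host is accepted, list IDs are assumed to be numeric, and handles are assumed to be at most 15 letters, digits, or underscores. Reserved paths such as `/home` would be misreported as profiles.

```python
import re
from urllib.parse import parse_qs, urlparse


def classify_start_url(url: str) -> str:
    """Classify a URL as 'tweet', 'list', 'search', 'profile', or 'unsupported'."""
    parsed = urlparse(url)
    if parsed.netloc not in ("x.com", "www.x.com"):
        return "unsupported"
    path = parsed.path.rstrip("/")
    if re.fullmatch(r"/[^/]+/status/\d+", path):
        return "tweet"
    if re.fullmatch(r"/i/lists/\d+", path):
        return "list"
    if path == "/search":
        # A search URL is only usable when it carries a q= parameter.
        return "search" if parse_qs(parsed.query).get("q") else "unsupported"
    if re.fullmatch(r"/[A-Za-z0-9_]{1,15}", path):
        return "profile"
    return "unsupported"


print(classify_start_url("https://x.com/NASA/status/2059255862548738182"))
```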

Filter parameters → query operators

Append these operators to the base query string (space-separated):

Parameter	Query operator	Example
`tweetLanguage`	`lang:{code}`	`lang:en`
`onlyVerifiedUsers`	`filter:verified`
`onlyTwitterBlue`	`filter:blue_verified`
`onlyImage`	`filter:images`
`onlyVideo`	`filter:videos`
`onlyQuote`	`filter:quote`
`author`	`from:{handle}`	`from:NASA`
`inReplyTo`	`to:{handle}`	`to:NASA`
`mentioning`	`@{handle}`	`@NASA`
`minimumRetweets`	`min_retweets:{n}`	`min_retweets:100`
`minimumFavorites`	`min_faves:{n}`	`min_faves:500`
`minimumReplies`	`min_replies:{n}`	`min_replies:10`
`start`	`since:{YYYY-MM-DD}`	`since:2024-01-01`
`end`	`until:{YYYY-MM-DD}`	`until:2024-06-01`
`geotaggedNear` + `withinRadius`	`near:"{location}" within:{radius}`	`near:"New York" within:15mi`
`geocode`	`geocode:{lat},{lon},{radius}`	`geocode:40.7,-74.0,10km`
(exclude retweets)	`-filter:retweets`	`from:NASA -filter:retweets`

Example: Scrape English tweets from NASA since 2024 with ≥100 likes, excluding retweets:

const query = "from:NASA lang:en since:2024-01-01 min_faves:100 -filter:retweets";
const url = "https://x.com/search?q=" + encodeURIComponent(query) + "&src=typed_query&f=live";
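The parameter-to-operator mapping in the table can also be applied programmatically. The sketch below is one possible shape, not the skill's own implementation: `build_query` is a hypothetical helper, and `excludeRetweets` is an invented parameter name, since the table lists `-filter:retweets` without one. Operators are emitted in the order the caller passes them.

```python
# Operators that take a value, keyed by the parameter names from the table.
VALUE_OPS = {
    "tweetLanguage": "lang:{}",
    "author": "from:{}",
    "inReplyTo": "to:{}",
    "mentioning": "@{}",
    "minimumRetweets": "min_retweets:{}",
    "minimumFavorites": "min_faves:{}",
    "minimumReplies": "min_replies:{}",
    "start": "since:{}",
    "end": "until:{}",
    "geocode": "geocode:{}",
}
# Boolean switches; "excludeRetweets" is a hypothetical name for -filter:retweets.
FLAG_OPS = {
    "onlyVerifiedUsers": "filter:verified",
    "onlyTwitterBlue": "filter:blue_verified",
    "onlyImage": "filter:images",
    "onlyVideo": "filter:videos",
    "onlyQuote": "filter:quote",
    "excludeRetweets": "-filter:retweets",
}


def build_query(base: str = "", **params) -> str:
    """Append space-separated search operators to an optional base query."""
    parts = [base] if base else []
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset filters contribute nothing
        if name in VALUE_OPS:
            parts.append(VALUE_OPS[name].format(value))
        elif name in FLAG_OPS:
            parts.append(FLAG_OPS[name])
        elif name == "geotaggedNear":
            radius = params.get("withinRadius")
            near = f'near:"{value}"'
            parts.append(f"{near} within:{radius}" if radius else near)
        elif name == "withinRadius":
            continue  # consumed together with geotaggedNear
        else:
            raise KeyError(f"unknown filter parameter: {name}")
    return " ".join(parts)


print(build_query(author="NASA", tweetLanguage="en", start="2024-01-01",
                  minimumFavorites=100, excludeRetweets=True))
```

The resulting string still needs URL encoding before it is placed in the `q=` parameter.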

Pagination

DOM Pagination (scroll to load more):

X dynamically appends new tweet articles to the DOM as the user scrolls. Tweets already rendered remain in the DOM (no virtualization for typical result sets < ~500 tweets).

Loop pattern:

1. Record current tweet count: extract → note len(results)
2. scroll down --amount 2000
3. wait stable
4. Extract again → compare IDs, add new ones to collection

Termination conditions:
- New extraction returns 0 new tweet IDs (end of results reached)
- collected >= max_items (if limit specified)
- Same tweet IDs returned 3 consecutive times (no more data loading)

Deduplication: track seen IDs in a Python set, filter before appending to output.
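The loop and its termination conditions might be sketched as below. The browser interaction is injected as two callables so the logic stays testable: `extract` is assumed to return the parsed JSON array from the extraction script, and `scroll` is assumed to perform both the scroll and the wait. Both names are placeholders, as is `stall_limit`. The sketch applies the more tolerant rule of stopping after 3 consecutive rounds with no new IDs; passing `stall_limit=1` stops on the first empty round instead.

```python
def collect_tweets(extract, scroll, max_items=None, stall_limit=3):
    """Scroll and extract until the limit is hit or no new tweet IDs appear."""
    seen = set()       # tweet IDs already collected
    collected = []
    stalls = 0         # consecutive rounds that produced nothing new
    while True:
        added = 0
        for tweet in extract():
            if tweet["id"] in seen:
                continue
            seen.add(tweet["id"])
            collected.append(tweet)
            added += 1
            if max_items is not None and len(collected) >= max_items:
                return collected
        stalls = 0 if added else stalls + 1
        if stalls >= stall_limit:
            return collected
        scroll()


# Stand-in for a real page: the second "scroll position" reveals one more tweet.
pages = [[{"id": "1"}, {"id": "2"}], [{"id": "1"}, {"id": "2"}, {"id": "3"}]]
state = {"pos": 0}


def fake_extract():
    return pages[min(state["pos"], len(pages) - 1)]


def fake_scroll():
    state["pos"] += 1


result = collect_tweets(fake_extract, fake_scroll)
print([t["id"] for t in result])
```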

Success Criteria

result count >= 1 and id field non-null rate = 100% and text field non-null rate = 100%
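A direct check of these criteria could look like the following sketch. `meets_success_criteria` is an illustrative name, and "non-null" is read here as "key present and not `None`".

```python
def meets_success_criteria(tweets: list) -> bool:
    """True when there is at least one tweet and every tweet has id and text."""
    if len(tweets) < 1:
        return False
    return all(
        t.get("id") is not None and t.get("text") is not None
        for t in tweets
    )


print(meets_success_criteria([{"id": "1", "text": "hello"}]))
```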

Known Limitations

React Fiber key: The __reactFiber prefix is followed by a session-specific hash (e.g., __reactFiber$ozawbbp0gp). The script matches keys with startsWith('__reactFiber'), which is stable across deployments; it fails only if X replaces React with a different framework.
DOM virtualization: For very large result sets (500+ tweets), X may virtualize older DOM nodes to reclaim memory. If extraction suddenly returns fewer tweets than expected after extended scrolling, the remaining tweets may have been removed from the DOM.
