Customer Research

Guide for gathering and synthesizing real customer intelligence — from online communities, review sites, video comments, and social platforms — using the Hyper MCP scraper toolkit.

The goal is always the same: surface what customers actually say (in their own words), not what you assume they say.

Out of scope — defer to other skills

Request	Send them to
Researching competitor brands (site, ads, search rank)	`competitor-intel`
Writing copy informed by the research	`copywriting`
Optimizing a page using VOC insights	`page-cro`
Keyword research and SERP analysis	`seo-research`

Requirements

Hyper MCP installed: https://app.hyperfx.ai/mcp
Apify scrapers toolkit enabled at https://app.hyperfx.ai/integrations — provides Reddit, Twitter, YouTube, TikTok, and Instagram scrapers.

Not all scrapers need to be active for every run — enable the ones relevant to your ICP (Reddit and one review site is the minimum). If a scraper tool is missing from the tool list, skip that source and continue with the others.

Tool surface

Tool	Purpose
`scrape_reddit`	Mine posts and comments from subreddits or by keyword
`search_tweets`	Search X/Twitter with advanced operators and engagement filters
`youtube_top_videos`	Find the top YouTube videos on a topic — use as input for comment mining
`youtube_comments_search`	Pull comments from specific YouTube video URLs
`youtube_transcript`	Fetch the full transcript of a YouTube video for language/topic extraction
`scrape_tiktok_videos`	Search TikTok by keyword or hashtag — find trending conversations and comments
`web_scrape_page`	Scrape review pages (G2, Capterra, Trustpilot, app stores)
`firecrawl_scrape_url`	Cleaner extraction for JS-heavy review pages
`search_google_results`	Find discussion threads, forum posts, and `site:` searches
`scrape_instagram_posts`	Pull recent posts from specific brand or community accounts

Critical rules

Always capture verbatim language. Don't paraphrase customer quotes — the exact words are what gets used in copy and messaging. Extract and preserve them.
Scrape before summarizing. Don't rely on your training data to describe what customers say about a product. Actually fetch the sources.
Label confidence on every insight. High = 3+ independent sources, unprompted. Medium = 2 sources or prompted only. Low = single source. Never present a Low-confidence finding as a conclusion.
Mind the bias of each source. Reddit skews technical and skeptical. Review sites skew toward power users and people with strong opinions. Support tickets skew toward problems. Factor this in before generalizing.
Don't invent persona details. If you don't have data for a persona field, leave it blank rather than filling it in with assumptions.
`youtube_transcript` is slow (~15–30s) because it spins up an isolated sandbox. Only use it for videos where the spoken content, not the comments, is what matters.
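One way to apply the confidence rule consistently is to compute it from the evidence behind each theme. The sketch below uses a hypothetical `Mention` record for each supporting quote and reads the rule as follows. High needs 3+ distinct sources where the theme came up unprompted. Medium covers 2+ distinct sources that fall short of that, including themes that only surfaced when prompted. Anything from a single source is Low.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One piece of evidence for a theme (hypothetical structure)."""
    source: str      # e.g. "reddit", "g2", "youtube"
    prompted: bool   # True if the customer was answering a direct question


def confidence(mentions: list[Mention]) -> str:
    """Label a theme High / Medium / Low from its supporting mentions."""
    all_sources = {m.source for m in mentions}
    unprompted_sources = {m.source for m in mentions if not m.prompted}
    if len(unprompted_sources) >= 3:
        return "High"
    if len(all_sources) >= 2:
        # 2+ sources, but fewer than 3 of them raised the theme unprompted.
        return "Medium"
    return "Low"


def can_state_as_conclusion(mentions: list[Mention]) -> bool:
    """Low-confidence findings must never be presented as conclusions."""
    return confidence(mentions) != "Low"
```

For example, unprompted mentions from Reddit, G2, and YouTube label as High. The same three sources with prompted mentions only label as Medium.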

Two modes

Many projects combine both modes, so establish up front whether you are analyzing existing assets, researching online, or both.

Mode 1 — Analyze existing assets

The user provides raw material: interview transcripts, survey responses, NPS verbatims, support tickets, win/loss notes. No tool calls needed — the job is extraction and synthesis.

Read references/synthesis-templates.md for the extraction framework, persona template, and VOC quote bank format. Then produce the requested deliverable.
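In Mode 1, a first pass often just separates the customer's words from the interviewer's. This sketch assumes a plain-text transcript with one `Speaker: text` turn per line and a known set of interviewer names. Real transcript exports vary, so adjust the pattern to your format. The function name is hypothetical.

```python
import re

# Matches "Speaker Name: what they said" at the start of a line.
TURN = re.compile(r"^\s*([^:]{1,40}):\s*(.+)$")


def customer_quotes(transcript: str, interviewers: set[str]) -> list[tuple[str, str]]:
    """Return (speaker, verbatim text) for every turn not spoken by an interviewer."""
    quotes = []
    for line in transcript.splitlines():
        match = TURN.match(line)
        if not match:
            continue  # skip lines that are not in "Speaker: text" form
        speaker, text = match.group(1).strip(), match.group(2).strip()
        if speaker not in interviewers:
            quotes.append((speaker, text))  # wording kept exactly as transcribed
    return quotes
```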

Mode 2 — Go find research online

The user needs intel from online communities, review sites, and social platforms. This is where MCP tools do the heavy lifting.

See references/source-playbooks.md for per-source tool call examples and signal extraction tips.

Mode 2 workflow

Bias toward action. If the user's message includes a product name (or URL) and a recognizable goal (research competitors, build a persona, understand churn, find VOC language), skip the questions, state your plan in one sentence, and start Step 1. Only ask when something essential is genuinely missing — product identity or target segment, for example. Don't ask all five questions before doing anything.

Step 1 — Pick sources based on ICP type

Before calling anything, decide which sources are worth hitting for this specific audience:

ICP	Required	Supplement if time allows
B2B SaaS, technical buyers	Reddit (role subs) + G2/Capterra	YouTube tutorials, X/Twitter
SMB / founders	Reddit (r/entrepreneur, r/smallbusiness) + G2/Capterra	YouTube, X/Twitter
Developer / DevOps	Reddit (r/devops, r/programming) + G2/Capterra	YouTube, Hacker News
B2C / consumer	Reddit hobby subs + app store reviews (1–3 star)	YouTube comments, TikTok
Enterprise	G2 Enterprise filter + X/Twitter	LinkedIn, YouTube

Minimum viable run: Reddit + one review site. Add supplementary sources only when the minimum doesn't produce enough signal, or when the ICP table above calls for them.
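Three pieces of guidance can be combined into a small planner: the ICP table, the minimum viable run, and the Requirements rule to skip any source whose tool is missing. The ICP keys and the source-to-tool mapping below are assumptions made for illustration. Routing Hacker News through a `site:` Google search is one option. LinkedIn has no matching tool in the tool surface, so the planner always skips it.

```python
# Required and supplementary sources per ICP, transcribed from the table above.
ICP_SOURCES = {
    "b2b_saas": (["reddit", "g2_capterra"], ["youtube", "twitter"]),
    "smb_founders": (["reddit", "g2_capterra"], ["youtube", "twitter"]),
    "developer": (["reddit", "g2_capterra"], ["youtube", "hacker_news"]),
    "b2c": (["reddit", "app_store_reviews"], ["youtube", "tiktok"]),
    "enterprise": (["g2_capterra", "twitter"], ["linkedin", "youtube"]),
}

# Tool each source depends on (assumed mapping; None = no tool available).
SOURCE_TOOL = {
    "reddit": "scrape_reddit",
    "g2_capterra": "web_scrape_page",
    "app_store_reviews": "web_scrape_page",
    "youtube": "youtube_comments_search",
    "twitter": "search_tweets",
    "tiktok": "scrape_tiktok_videos",
    "hacker_news": "search_google_results",
    "linkedin": None,
}

REVIEW_SOURCES = {"g2_capterra", "app_store_reviews"}


def plan_sources(icp: str, available_tools: set[str]) -> dict:
    required, supplements = ICP_SOURCES[icp]

    def usable(source: str) -> bool:
        return SOURCE_TOOL[source] in available_tools

    # Skip any source whose tool is missing and continue with the rest.
    plan = {
        "required": [s for s in required if usable(s)],
        "supplements": [s for s in supplements if usable(s)],
        "skipped": [s for s in required + supplements if not usable(s)],
    }
    # Minimum viable run: Reddit plus at least one review site.
    plan["minimum_viable"] = usable("reddit") and any(usable(s) for s in REVIEW_SOURCES)
    return plan
```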

For platform-by-platform tool call examples, read references/source-playbooks.md.

Step 2 — Run targeted scrapes

Pull from at least 2 sources. Single-source findings are low confidence by definition.

Reddit — the highest-signal source for most ICPs:

scrape_reddit(
    searches=["[product category] frustrations", "[competitor name] problems"],
    sort="top",
    time="year",
    max_items=50,
    skip_comments=False,
    search_posts=True,
    search_comments=True
)

For specific subreddits, pair with start_urls:

scrape_reddit(
    start_urls=["https://www.reddit.com/r/marketing/"],
    searches=["CRM"],
    sort="top",
    time="year",
    max_items=30
)
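When you run several product and competitor combinations, you can fill the bracketed placeholders programmatically. This sketch only builds a dictionary of keyword arguments, using the parameter names from the `scrape_reddit` examples above. It does not call the tool, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Matches bracketed placeholders such as "[product category]".
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder [{key}]")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


def reddit_args(templates: list[str], values: dict[str, str],
                subreddit: Optional[str] = None) -> dict:
    """Build keyword arguments shaped like the scrape_reddit examples above."""
    args = {
        "searches": [fill(t, values) for t in templates],
        "sort": "top",
        "time": "year",
        "max_items": 50,
        "skip_comments": False,
        "search_posts": True,
        "search_comments": True,
    }
    if subreddit:
        # The subreddit-scoped example uses start_urls and a smaller cap.
        args["start_urls"] = [f"https://www.reddit.com/r/{subreddit}/"]
        args["max_items"] = 30
    return args
```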

YouTube comments — rich qualitative data:

# Step 1: find the relevant videos
youtube_top_videos(query="[product category] honest review", max_results=5, sort_by="views")

# Step 2: mine comments from the top results
youtube_comments_search(
    start_urls=["https://www.youtube.com/watch?v=VIDEO_ID_1", "https://www.youtube.com/watch?v=VIDEO_ID_2"],
    max_comments=100,
    comments_sort_by="0"   # "0" = top comments, "1" = newest
)
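`youtube_comments_search` takes full video URLs in `start_urls`, but collected links come in several forms (`youtu.be`, mobile, Shorts). A hypothetical normalizer like the one below converts them to the `watch?v=` form used in the example and removes duplicates. This section doesn't document the output shape of `youtube_top_videos`, so the sketch takes plain URL strings.

```python
from urllib.parse import parse_qs, urlparse


def watch_url(url: str) -> str:
    """Normalize common YouTube link forms to https://www.youtube.com/watch?v=ID."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.").removeprefix("m.")
    if host == "youtu.be":
        # Short links: https://youtu.be/ID
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" and parsed.path == "/watch":
        # Standard and mobile links: ...youtube.com/watch?v=ID&t=...
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com" and parsed.path.startswith("/shorts/"):
        # Shorts: https://www.youtube.com/shorts/ID
        video_id = parsed.path.split("/")[2]
    else:
        raise ValueError(f"not a recognized YouTube video URL: {url}")
    if not video_id:
        raise ValueError(f"no video ID found in {url}")
    return f"https://www.youtube.com/watch?v={video_id}"


def comment_urls(urls: list[str]) -> list[str]:
    """Normalize and de-duplicate URLs, preserving order, for start_urls."""
    return list(dict.fromkeys(watch_url(u) for u in urls))
```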

X/Twitter — complaints, frustrations, and niche conversations:

search_tweets(
    search_terms='"[product name]" (frustrating OR broken OR switched OR canceled)',
    max_items=50,
    min_faves=5
)
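You can generate the X/Twitter complaint query per product. This sketch wraps the OR terms in parentheses so the quoted product name constrains every alternative. Without grouping, depending on how the search engine orders AND and OR, a query can match tweets that contain a complaint word but never mention the product. The helper name is hypothetical.

```python
def complaint_query(product: str, terms: list[str]) -> str:
    """Build a query like '"Acme" (frustrating OR broken)'."""
    if not terms:
        raise ValueError("need at least one complaint term")
    # Quote the product name for an exact match and group the OR terms so the
    # product name applies to every alternative.
    return f'"{product}" ({" OR ".join(terms)})'
```

Pass the result as `search_terms` to `search_tweets`, keeping `max_items` and `min_faves` as in the example.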

Review sites (G2, Capterra, Trustpilot):

# G2 reviews for a specific product
web_scrape_page(
    url="https://www.g2.com/products/[product-slug]/reviews",
    ai_query="Extract the top complaints and pain points from customer reviews. Include verbatim quotes.",
    use_proxy=True
)
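If a review scrape returns individual reviews with star ratings, filtering before synthesis keeps the focus on complaints. The ICP table recommends 1–3 star reviews for consumer app stores. The `Review` fields are hypothetical, because the output shape of `web_scrape_page` depends on the `ai_query`. The 8-word minimum is an arbitrary cutoff for dropping reviews that are too short to quote.

```python
from dataclasses import dataclass


@dataclass
class Review:
    rating: int   # 1-5 stars
    text: str
    url: str


def critical_reviews(reviews: list[Review], max_stars: int = 3,
                     min_words: int = 8) -> list[Review]:
    """Keep low-star reviews with enough text to yield a usable verbatim quote."""
    return [
        r for r in reviews
        if 1 <= r.rating <= max_stars and len(r.text.split()) >= min_words
    ]
```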

TikTok — consumer conversations and trending frustrations:

scrape_tiktok_videos(
    search_queries=["[product category] problems", "[competitor name] review"],
    results_per_page=30
)

Google discovery — find threads and communities you haven't thought of:

search_google_results(
    query='site:reddit.com "[product category]" "I switched" OR "I quit" OR "stopped using"',
    num_results=20
)
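Google discovery results are most useful when they point to communities you can then scrape directly. Assuming the result URLs are available as strings, this sketch counts which subreddits the threads come from. It then turns the most frequent ones into `start_urls` for `scrape_reddit`. The helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def subreddits(urls: list[str]) -> Counter:
    """Count the subreddits that discovered Reddit thread URLs belong to."""
    counts: Counter = Counter()
    for url in urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host != "reddit.com" and not host.endswith(".reddit.com"):
            continue  # ignore non-Reddit results
        parts = [p for p in parsed.path.split("/") if p]
        # Thread URLs look like /r/<subreddit>/comments/<id>/<slug>/
        if len(parts) >= 2 and parts[0].lower() == "r":
            counts[parts[1].lower()] += 1
    return counts


def start_urls_for(counts: Counter, top_n: int = 3) -> list[str]:
    """Turn the most frequent subreddits into scrape_reddit start_urls."""
    return [f"https://www.reddit.com/r/{name}/" for name, _ in counts.most_common(top_n)]
```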

Step 3 — Extract signal from raw data

For each source, extract into this structure:

Field	What to capture
Verbatim quote	Exact words — do not paraphrase
Source	Platform, URL, date
Sentiment	Positive / negative / neutral / frustrated
Theme	Pain / trigger / outcome / alternative / language
Profile signals	Role, company size, industry hints from context
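The extraction structure above maps naturally onto a typed record. In this sketch, the field names and enum values are assumptions that mirror the table. Profile signals are an optional dictionary, so anything the source doesn't reveal stays absent rather than guessed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    FRUSTRATED = "frustrated"


class Theme(Enum):
    PAIN = "pain"
    TRIGGER = "trigger"
    OUTCOME = "outcome"
    ALTERNATIVE = "alternative"
    LANGUAGE = "language"


@dataclass(frozen=True)
class Signal:
    quote: str            # exact words, never paraphrased
    platform: str         # e.g. "reddit"
    url: str
    date: str             # as shown on the source
    sentiment: Sentiment
    theme: Theme
    # Role, company size, industry: only what the context actually shows.
    profile_signals: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if not self.quote.strip():
            raise ValueError("a signal needs a verbatim quote")
```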

Step 4 — Synthesize across sources

After pulling from 3+ sources, synthesize into the research report format in references/synthesis-templates.md. The report includes:

Top themes ranked by frequency × intensity
VOC quote bank organize
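The first item, ranking themes by frequency × intensity, can be sketched as shown below. This section doesn't define intensity, so the sketch assumes a hypothetical 1–3 rating per quote (1 = passing mention, 3 = deal-breaker) and uses the mean intensity per theme.

```python
from collections import defaultdict


def rank_themes(tagged: list[tuple[str, int]]) -> list[tuple[str, int, float, float]]:
    """Rank themes by frequency x mean intensity.

    `tagged` holds (theme label, intensity) pairs, one per extracted quote.
    Returns (theme, frequency, mean intensity, score), highest score first.
    """
    scores = defaultdict(list)
    for theme, intensity in tagged:
        if not 1 <= intensity <= 3:
            raise ValueError(f"intensity must be 1-3, got {intensity}")
        scores[theme].append(intensity)
    ranked = []
    for theme, values in scores.items():
        frequency = len(values)
        mean = sum(values) / frequency
        ranked.append((theme, frequency, mean, frequency * mean))
    # Highest score first; ties broken by frequency, then theme name.
    ranked.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return ranked
```

Because the score is frequency times mean intensity, it equals the summed intensity. Reporting both factors shows whether a theme ranks high because it is common or because it is severe.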
