Customer Research
Guide for gathering and synthesizing real customer intelligence — from online communities, review sites, video comments, and social platforms — using the Hyper MCP scraper toolkit.
The goal is always the same: surface what customers actually say (in their own words), not what you assume they say.
Out of scope — defer to other skills
| Request | Send them to |
|---|---|
| Researching competitor brands (site, ads, search rank) | competitor-intel |
| Writing copy informed by the research | copywriting |
| Optimizing a page using VOC insights | page-cro |
| Keyword research and SERP analysis | seo-research |
Requirements
- Hyper MCP installed: https://app.hyperfx.ai/mcp
- Apify scrapers toolkit enabled at https://app.hyperfx.ai/integrations — provides Reddit, Twitter, YouTube, TikTok, and Instagram scrapers.
Not every scraper needs to be active for every run. Enable the ones relevant to your ICP (Reddit plus one review site is the minimum). If a scraper tool is missing from the tool list, skip that source and continue with the others.
Tool surface
| Tool | Purpose |
|---|---|
| scrape_reddit | Mine posts and comments from subreddits or by keyword |
| search_tweets | Search X/Twitter with advanced operators and engagement filters |
| youtube_top_videos | Find the top YouTube videos on a topic; use as input for comment mining |
| youtube_comments_search | Pull comments from specific YouTube video URLs |
| youtube_transcript | Fetch the full transcript of a YouTube video for language/topic extraction |
| scrape_tiktok_videos | Search TikTok by keyword or hashtag to find trending conversations and comments |
| web_scrape_page | Scrape review pages (G2, Capterra, Trustpilot, app stores) |
| firecrawl_scrape_url | Cleaner extraction for JS-heavy review pages |
| search_google_results | Find discussion threads, forum posts, and site: searches |
| scrape_instagram_posts | Pull recent posts from specific brand or community accounts |
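The skip-if-missing rule from Requirements can be sketched as a small filter over the tool list. This is a minimal sketch, not part of Hyper MCP: `SOURCE_TOOLS` and `plan_sources` are hypothetical names, and the source-to-tool mapping is simply read off the table above.

```python
# Hypothetical mapping from research source to the tool that serves it,
# read off the tool table above. Not an official Hyper MCP structure.
SOURCE_TOOLS = {
    "reddit": "scrape_reddit",
    "twitter": "search_tweets",
    "youtube_comments": "youtube_comments_search",
    "tiktok": "scrape_tiktok_videos",
    "review_sites": "web_scrape_page",
    "instagram": "scrape_instagram_posts",
}


def plan_sources(wanted, available_tools):
    """Split the wanted sources into runnable ones and ones to skip."""
    available = set(available_tools)
    runnable, skipped = [], []
    for source in wanted:
        tool = SOURCE_TOOLS.get(source)  # None for an unknown source
        if tool in available:
            runnable.append(source)
        else:
            skipped.append(source)
    return runnable, skipped


runnable, skipped = plan_sources(
    ["reddit", "review_sites", "tiktok"],
    available_tools=["scrape_reddit", "web_scrape_page"],
)
print(runnable)  # ['reddit', 'review_sites']
print(skipped)   # ['tiktok']
```

Reporting the skipped list back to the user keeps it visible which sources did not contribute to the findings.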
Critical rules
- Always capture verbatim language. Don't paraphrase customer quotes — the exact words are what gets used in copy and messaging. Extract and preserve them.
- Scrape before summarizing. Don't rely on your training data to describe what customers say about a product. Actually fetch the sources.
- Label confidence on every insight. High = 3+ independent sources, unprompted. Medium = 2 sources or prompted only. Low = single source. Never present a Low-confidence finding as a conclusion.
- Mind the bias of each source. Reddit skews technical and skeptical. Review sites skew toward power users and people with strong opinions. Support tickets skew toward problems. Factor this in before generalizing.
- Don't invent persona details. If you don't have data for a persona field, leave it blank rather than filling it in with assumptions.
youtube_transcript is slow (~15–30s) because it spins up an isolated sandbox. Only use it for videos where the spoken content, not the comments, is what matters.
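The confidence rule above can be written down as a small function. This is a sketch under one stated assumption: the rule does not say how to treat a single prompted source, so the code treats any single-source insight as Low. The name `label_confidence` is hypothetical.

```python
def label_confidence(independent_sources: int, unprompted: bool) -> str:
    """Return "High", "Medium", or "Low" for one insight."""
    if independent_sources < 1:
        raise ValueError("an insight needs at least one source")
    if independent_sources == 1:
        return "Low"  # single source, regardless of prompting (assumption)
    if independent_sources >= 3 and unprompted:
        return "High"
    return "Medium"  # 2 sources, or 3+ sources that were prompted only


print(label_confidence(4, unprompted=True))   # High
print(label_confidence(4, unprompted=False))  # Medium
print(label_confidence(1, unprompted=True))   # Low
```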
Two modes
Most research combines both modes. Establish which applies before starting.
Mode 1 — Analyze existing assets
The user provides raw material: interview transcripts, survey responses, NPS verbatims, support tickets, win/loss notes. No tool calls needed — the job is extraction and synthesis.
Read references/synthesis-templates.md for the extraction framework, persona template, and VOC quote bank format. Then produce the requested deliverable.
Mode 2 — Go find research online
The user needs intel from online communities, review sites, and social platforms. This is where MCP tools do the heavy lifting.
See references/source-playbooks.md for per-source tool call examples and signal extraction tips.
Mode 2 workflow
Bias toward action. If the user's message includes a product name (or URL) and a recognizable goal (research competitors, build a persona, understand churn, find VOC language), skip the questions, state your plan in one sentence, and start Step 1. Only ask when something essential is genuinely missing — product identity or target segment, for example. Don't ask all five questions before doing anything.
Step 1 — Pick sources based on ICP type
Before calling anything, decide which sources are worth hitting for this specific audience:
| ICP | Required | Supplement if time allows |
|---|---|---|
| B2B SaaS, technical buyers | Reddit (role subs) + G2/Capterra | YouTube tutorials, X/Twitter |
| SMB / founders | Reddit (r/entrepreneur, r/smallbusiness) + G2/Capterra | YouTube, X/Twitter |
| Developer / DevOps | Reddit (r/devops, r/programming) + G2/Capterra | YouTube, Hacker News |
| B2C / consumer | Reddit hobby subs + app store reviews (1–3 star) | YouTube comments, TikTok |
| Enterprise | G2 Enterprise filter + X/Twitter | LinkedIn, YouTube |
Minimum viable run: Reddit + one review site. Add supplementary sources only when the minimum doesn't produce enough signal, or when the ICP table above calls for them.
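One way to keep the source choice explicit is to encode the ICP table as a lookup. The sketch below is illustrative only: the dictionary keys and the `pick_sources` helper are hypothetical names, and the `enough_signal` flag stands in for a judgment the researcher makes after the first pass.

```python
# Hypothetical lookup transcribed from the ICP table; the keys are made-up
# short names, not identifiers used by any tool.
ICP_SOURCES = {
    "b2b_saas": {
        "required": ["Reddit (role subs)", "G2/Capterra"],
        "supplement": ["YouTube tutorials", "X/Twitter"],
    },
    "smb_founders": {
        "required": ["Reddit (r/entrepreneur, r/smallbusiness)", "G2/Capterra"],
        "supplement": ["YouTube", "X/Twitter"],
    },
    "developer_devops": {
        "required": ["Reddit (r/devops, r/programming)", "G2/Capterra"],
        "supplement": ["YouTube", "Hacker News"],
    },
    "b2c_consumer": {
        "required": ["Reddit hobby subs", "App store reviews (1-3 star)"],
        "supplement": ["YouTube comments", "TikTok"],
    },
    "enterprise": {
        "required": ["G2 Enterprise filter", "X/Twitter"],
        "supplement": ["LinkedIn", "YouTube"],
    },
}


def pick_sources(icp: str, enough_signal: bool = True) -> list[str]:
    """Return required sources, plus supplements when signal is thin."""
    entry = ICP_SOURCES[icp]
    sources = list(entry["required"])
    if not enough_signal:
        sources += entry["supplement"]
    return sources


print(pick_sources("b2c_consumer"))
# ['Reddit hobby subs', 'App store reviews (1-3 star)']
```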
For platform-by-platform tool call examples, read references/source-playbooks.md.
Step 2 — Run targeted scrapes
Pull from at least 2 sources. Single-source findings are low confidence by definition.
Reddit — the highest-signal source for most ICPs:
scrape_reddit(
    searches=["[product category] frustrations", "[competitor name] problems"],
    sort="top",
    time="year",
    max_items=50,
    skip_comments=False,
    search_posts=True,
    search_comments=True
)
For specific subreddits, pair with start_urls:
scrape_reddit(
    start_urls=["https://www.reddit.com/r/marketing/"],
    searches=["CRM"],
    sort="top",
    time="year",
    max_items=30
)
YouTube comments — rich qualitative data:
# Step 1: find the relevant videos
youtube_top_videos(query="[product category] honest review", max_results=5, sort_by="views")

# Step 2: mine comments from the top results
youtube_comments_search(
    start_urls=["https://www.youtube.com/watch?v=VIDEO_ID_1", "https://www.youtube.com/watch?v=VIDEO_ID_2"],
    max_comments=100,
    comments_sort_by="0"  # "0" = top comments, "1" = newest
)
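Chaining the two YouTube steps means turning the first call's results into the `start_urls` list for the second. The result shape of `youtube_top_videos` is not documented here, so the sketch below assumes a list of dicts with `url` and `views` keys; both key names and the `pick_comment_targets` helper are hypothetical.

```python
def pick_comment_targets(videos, limit=2):
    """Return URLs of the most-viewed videos, de-duplicated, best first.

    Assumes each item is a dict with "url" and "views" keys (an assumption
    about the tool output, not a documented schema).
    """
    ranked = sorted(videos, key=lambda v: v.get("views", 0), reverse=True)
    urls = []
    for video in ranked:
        url = video.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls[:limit]


# Placeholder results; the IDs and view counts are not real data.
results = [
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID_1", "views": 1200},
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID_2", "views": 5400},
]
print(pick_comment_targets(results))
```

The returned list can be passed as `start_urls` to `youtube_comments_search`.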
X/Twitter — complaints, frustrations, and niche conversations:
search_tweets(
    search_terms='"[product name]" frustrating OR broken OR switched OR canceled',
    max_items=50,
    min_faves=5
)
Review sites (G2, Capterra, Trustpilot):
# G2 reviews for a specific product
web_scrape_page(
    url="https://www.g2.com/products/[product-slug]/reviews",
    ai_query="Extract the top complaints and pain points from customer reviews. Include verbatim quotes.",
    use_proxy=True
)
TikTok — consumer conversations and trending frustrations:
scrape_tiktok_videos(
    search_queries=["[product category] problems", "[competitor name] review"],
    results_per_page=30
)
Google discovery — find threads and communities you haven't thought of:
search_google_results(
    query='site:reddit.com "[product category]" "I switched" OR "I quit" OR "stopped using"',
    num_results=20
)
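The bracketed placeholders in the calls above ([product category], [competitor name], [product name]) must be filled in before a query is sent. A minimal sketch of that substitution follows; `fill_query` is a hypothetical helper, and "Acme" is a made-up product name used only for illustration.

```python
import re


def fill_query(template: str, **values: str) -> str:
    """Replace each "[name]" placeholder and fail if any is left unfilled.

    Keyword names use underscores in place of spaces, so
    product_category fills "[product category]".
    """
    query = template
    for name, value in values.items():
        placeholder = "[" + name.replace("_", " ") + "]"
        query = query.replace(placeholder, value)
    leftover = re.search(r"\[[^\]]*\]", query)
    if leftover:
        raise ValueError(f"unfilled placeholder {leftover.group(0)} in query")
    return query


print(fill_query("[product category] frustrations", product_category="CRM"))
# CRM frustrations
print(fill_query('"[product name]" frustrating OR broken', product_name="Acme"))
# "Acme" frustrating OR broken
```

Failing on a leftover placeholder avoids sending a literal "[competitor name]" to a scraper and getting back noise.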
Step 3 — Extract signal from raw data
For each source, extract into this structure:
| Field | What to capture |
|---|---|
| Verbatim quote | Exact words — do not paraphrase |
| Source | Platform, URL, date |
| Sentiment | Positive / negative / neutral / frustrated |
| Theme | Pain / trigger / outcome / alternative / language |
| Profile signals | Role, company size, industry hints from context |
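The extraction structure above maps naturally onto a small record type. The sketch below is one possible shape, not a format defined by the synthesis templates: `VocRecord` and its field names are hypothetical, and the allowed sentiment and theme values are taken from the table.

```python
from dataclasses import dataclass, field

SENTIMENTS = {"positive", "negative", "neutral", "frustrated"}
THEMES = {"pain", "trigger", "outcome", "alternative", "language"}


@dataclass
class VocRecord:
    quote: str       # exact words, never paraphrased
    platform: str
    url: str
    date: str
    sentiment: str
    theme: str
    # Role, company size, industry hints; left empty when unknown.
    profile_signals: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.quote.strip():
            raise ValueError("verbatim quote is required")
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment}")
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme: {self.theme}")


# Placeholder values only; not real customer data.
record = VocRecord(
    quote="[verbatim quote]",
    platform="Reddit",
    url="https://www.reddit.com/r/marketing/",
    date="[date]",
    sentiment="frustrated",
    theme="pain",
)
print(record.profile_signals)  # {}
```

Leaving `profile_signals` empty by default matches the rule against inventing persona details.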
Step 4 — Synthesize across sources
After pulling from 3+ sources, synthesize into the research report format in references/synthesis-templates.md. The report includes:
- Top themes ranked by frequency × intensity
- VOC quote bank organize