Creator Discovery & Analysis
Multi-platform creator discovery tool for brand collaboration. Finds high-potential creators through multi-layer keyword search (exact product + competitor + broader domain), filters by recent engagement data (not just follower count), and evaluates subjective brand fit (content style, AI/tech experience, competitor history, visual quality, audience match).
ARGUMENTS: User's initial request (platform, content direction, etc.)
Phase 1: Requirements Gathering
Use AskUserQuestion to collect:
- Target Platform(s): Xiaohongshu / Douyin / Bilibili / YouTube / TikTok / X (Twitter) / Instagram / multiple
- Product/Brand: What product or brand is this collaboration for? e.g., ChatGPT, Claude, Gemini
- Content Direction: The specific topic AND its broader category, e.g., "ProductX (broader: AI agent, AI tools)"
- Collaboration Goal: What kind of content should the creator produce? e.g., product review, tutorial, creative showcase
- Tone Preference: What style fits the brand? e.g., professional but not dry, light popular science, hardcore geek, creative and flashy
- Follower Range: default 5,000-500,000. This is a soft reference, not a hard cutoff: strong recent data can override a low follower count.
- Recency Window and Engagement Threshold: the default window is 1 month. Also ask for the minimum acceptable engagement (likes/saves/views) on recent posts.
- Known Competitors: List competitor products so we can check if creators have collaborated with them. e.g., Coze, Dify, FastGPT
- Number of Results: default 5-10 creators
If ARGUMENTS already contain these details, skip redundant questions.
Phase 2: Content Search
Tool Selection
| Platform | Primary Tool | Fallback |
|---|---|---|
| Xiaohongshu | opencli xiaohongshu search | Playwright browser |
| Bilibili | opencli bilibili search | Playwright browser |
| Douyin | Playwright browser | - |
| YouTube | opencli youtube search | Playwright browser |
| TikTok | opencli tiktok search | Playwright browser |
| X (Twitter) | opencli twitter search | Playwright browser |
| Instagram | opencli instagram search | Playwright browser |
Search Strategy: Multi-Layer Keyword Expansion
Don't search only the exact product name. Use a 3-layer keyword strategy to find both vertical and adjacent creators:
Layer 1: Exact product/brand keywords (find creators already covering your product)
- Direct product name and variations
- Example for ProductX: "ProductX", "productx教程" (tutorial), "productx测评" (review)
Layer 2: Competitor & category keywords (find creators in the same space)
- Competitor product names from Phase 1
- Category terms that your product belongs to
- Example: "Coze教程" (Coze tutorial), "AI agent平台" (AI agent platform), "AI工作流搭建" (building AI workflows), "Dify教程" (Dify tutorial)
Layer 3: Broader domain keywords (find quality AI creators who could pivot to your product)
- Broader topic keywords in the same domain
- Adjacent content areas where your audience overlaps
- Example: "AI工具推荐" (AI tool recommendations), "AI效率提升" (AI productivity), "AI科技测评" (AI tech reviews), "AI产品体验" (AI product hands-on)
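A minimal sketch of the keyword plan, assuming the three term lists come from Phase 1. SearchKeyword and build_keyword_plan are hypothetical names. The cap of 3 per layer mirrors the "2-3 keywords per layer" guidance in the execution steps. Duplicate terms are dropped case-insensitively, and the first (highest-relevance) layer wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchKeyword:
    text: str
    layer: int  # 1 = exact product, 2 = competitor/category, 3 = broader domain


def build_keyword_plan(product_terms, competitor_terms, domain_terms, per_layer=3):
    """Tag keywords with their layer, keeping at most `per_layer` per layer.

    Case-insensitive duplicates are dropped; the first (highest-relevance)
    occurrence wins, so a Layer 1 term is never re-searched as Layer 3.
    """
    plan = []
    seen = set()
    for layer, terms in ((1, product_terms), (2, competitor_terms), (3, domain_terms)):
        taken = 0
        for term in terms:
            key = term.strip().lower()
            if not key or key in seen:
                continue
            seen.add(key)
            plan.append(SearchKeyword(term.strip(), layer))
            taken += 1
            if taken == per_layer:
                break
    return plan


plan = build_keyword_plan(
    ["ProductX", "productx教程", "productx测评"],
    ["Coze教程", "Dify教程", "AI agent平台"],
    ["AI工具推荐", "AI效率提升"],
)
for kw in plan:
    print(kw.layer, kw.text)
```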
Execution:
- Generate 2-3 keywords per layer (6-9 total) and search each one, using sort=popularity_descending when available.
- Extract: title, author name, author ID/URL, likes count, saves count (收藏), post date.
- Tag each result with its source layer: Layer 1 hits are the most relevant, and Layer 3 hits are expansion candidates.
- IMPORTANT: The author field in search results often carries a date suffix (e.g., "博主名02-05" or "博主名2天前", where 博主名 is the creator name and the suffix is a month-day date or "2 days ago"). Parse this date and use it to pre-filter: keep only posts within the recency window (default: 1 month). A high-likes post from 6 months ago is NOT evidence of current quality.
- Keep recent posts (within 1 month by default) whose 赞藏 count (likes + saves) meets the user's threshold.
- Deduplicate by author -- prefer candidates with MULTIPLE recent high-engagement posts over one-hit wonders
- When deduplicating, preserve the highest-relevance layer tag (if a creator appears in both Layer 1 and Layer 3, tag as Layer 1)
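The following sketches the pre-filter and deduplication steps above. It assumes search results have already been parsed into records holding the raw author string, title, likes, saves, and layer tag. Hit, split_author, and shortlist are hypothetical names. The suffix parser only recognizes the two formats shown in the examples ("MM-DD" and "N天前"), and "1 month" is approximated as 30 days. Posts whose date cannot be parsed are dropped rather than guessed.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed suffix formats, based only on the two examples above:
# "MM-DD" (e.g. "博主名02-05") or "N天前" ("N days ago", e.g. "博主名2天前").
_SUFFIX = re.compile(r"^(?P<name>.*?)(?:(?P<month>\d{1,2})-(?P<day>\d{1,2})|(?P<ago>\d+)天前)$")


def split_author(raw, today):
    """Return (creator name, post date); the date is None if the suffix is unrecognized."""
    m = _SUFFIX.match(raw.strip())
    if not m:
        return raw.strip(), None
    name = m.group("name").strip()
    if m.group("ago"):
        return name, today - timedelta(days=int(m.group("ago")))
    month, day = int(m.group("month")), int(m.group("day"))
    # "MM-DD" has no year: use this year unless that date lies in the future.
    for year in (today.year, today.year - 1):
        try:
            posted = date(year, month, day)
        except ValueError:  # e.g. "13-40", or 02-29 in a non-leap year
            continue
        if posted <= today:
            return name, posted
    return name, None


@dataclass
class Hit:
    author_raw: str
    title: str
    likes: int
    saves: int
    layer: int  # 1 = exact product, 2 = competitor/category, 3 = broader domain


def shortlist(hits, today, window_days=30, min_engagement=0):
    """Recency-filter, threshold, and deduplicate search hits by author.

    Returns {name: {"layer": best layer, "posts": distinct qualifying posts}},
    ordered so creators with more qualifying posts (then lower layer) come first.
    """
    creators, seen_posts = {}, set()
    for h in hits:
        name, posted = split_author(h.author_raw, today)
        if posted is None or (today - posted).days > window_days:
            continue  # unverifiable date, or outside the recency window
        if h.likes + h.saves < min_engagement:
            continue  # 赞藏 below the user's threshold
        c = creators.setdefault(name, {"layer": h.layer, "posts": 0})
        c["layer"] = min(c["layer"], h.layer)  # keep the highest-relevance tag
        if (name, h.title) not in seen_posts:  # same post found by several keywords
            seen_posts.add((name, h.title))
            c["posts"] += 1
    return dict(sorted(creators.items(), key=lambda kv: (-kv[1]["posts"], kv[1]["layer"])))
```

Keying posts by title is a stand-in. If your search output includes a post ID, key on that instead.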
Platform-Specific Search
Xiaohongshu via opencli:
opencli xiaohongshu search "<keyword>" --limit 20 -f json
If blocked by a login wall, use Playwright:
Navigate: https://www.xiaohongshu.com/search_result?keyword=<encoded>&type=1&sort=popularity_descending
Close the login modal if present, then extract results via browser_evaluate
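The keyword=<encoded> placeholder in the URL above means the search term must be percent-encoded. A small sketch using only the standard library follows. xhs_search_url is a hypothetical helper. Encoding spaces as %20 rather than "+" is a conservative choice, not something the source specifies.

```python
from urllib.parse import quote, urlencode

XHS_SEARCH = "https://www.xiaohongshu.com/search_result"


def xhs_search_url(keyword):
    """Percent-encode the keyword (UTF-8, spaces as %20) into the search URL above."""
    params = {"keyword": keyword, "type": 1, "sort": "popularity_descending"}
    return f"{XHS_SEARCH}?{urlencode(params, quote_via=quote)}"


print(xhs_search_url("AI工具推荐"))
```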
Bilibili via opencli:
opencli bilibili search --keyword "<keyword>" --limit 20 -f json
TikTok via opencli:
opencli tiktok search "<keyword>" --limit 20 -f json
# Get profile stats
opencli tiktok profile <username> -f json
# Get recent videos
opencli tiktok user <username> --limit 20 -f json
X (Twitter) via opencli:
opencli twitter search "<keyword>" --limit 20 -f json
# Get profile stats (followers, bio)
opencli twitter profile <username> -f json
Instagram via opencli:
opencli instagram search "<keyword>" --limit 20 -f json
# Get profile stats
opencli instagram profile <username> -f json
# Get recent posts
opencli instagram user <username> --limit 20 -f json
YouTube via opencli:
opencli youtube search --query "<keyword>" --limit 20 -f json
# Get video metadata (views, likes)
opencli youtube video "<url>" -f json
For a YouTube channel's subscriber count, use Playwright to visit https://www.youtube.com/@<handle>.
Refer to guides/platform-selectors.md for DOM selectors and JS extraction code.
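If you script the searches, a small wrapper can build the commands listed above and signal when to fall back to Playwright. This is a sketch under assumptions. It assumes opencli exits non-zero, or prints non-JSON output, when a search fails (for example, behind a login wall); verify this against your opencli version. build_search_argv and run_search are hypothetical names. The per-platform flags are copied from the commands above.

```python
import json
import subprocess

# Search syntax copied from the commands above; None means the keyword is positional.
# Douyin has no opencli entry in the tool table (Playwright only).
_KEYWORD_FLAG = {"xiaohongshu": None, "tiktok": None, "twitter": None, "instagram": None,
                 "bilibili": "--keyword", "youtube": "--query"}


def build_search_argv(platform, keyword, limit=20, binary="opencli"):
    if platform not in _KEYWORD_FLAG:
        raise ValueError(f"no opencli search for {platform!r}; use the Playwright fallback")
    flag = _KEYWORD_FLAG[platform]
    kw_args = [flag, keyword] if flag else [keyword]
    return [binary, platform, "search", *kw_args, "--limit", str(limit), "-f", "json"]


def run_search(platform, keyword, limit=20, binary="opencli", timeout=120):
    """Return parsed JSON results, or None to signal the Playwright fallback."""
    try:
        argv = build_search_argv(platform, keyword, limit, binary)
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (ValueError, OSError, subprocess.TimeoutExpired):
        return None  # unsupported platform, binary missing, or hung call
    if proc.returncode != 0:
        return None  # assumption: failures (e.g. login wall) exit non-zero
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None
```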
Phase 3: Creator Filtering (Two-Tier)
Filtering is split into Tier A (Objective Data) and Tier B (Subjective Fit). Tier A is applied first to narrow the candidate pool, then Tier B is evaluated during deep analysis.
Tier A: Objective Data Filtering
Priority Order (most important first)
| Priority | Metric | Default Threshold | Description |
|---|---|---|---|
| P0 | Recent post quality (last 1 month) | At least 3/5 posts "pass" (see the evaluation method below) | The most critical metric. Evaluate every recent post independently on two dimensions, then count the passing posts. There is no fixed absolute threshold; judge each post against the creator's own scale. |
| P1 | Historical hit rate | >= 1 post with 1000+ 赞藏 ever | Proves viral potential exists. Use 赞藏 (likes + saves), not likes alone. |
| P1 | Comment quality | Comments show genuine interest, not spam/bots | Real engagement vs inflated numbers |
| P2 | Recent 赞藏 trend | Compare recent posts with older posts: is 赞藏 growing, stable, or declining? | Declining creators have old viral posts but weak recent numbers. Avoid them. |
| P2 | Total 赞藏/followers ratio | > 3x | Overall engagement health. A high ratio driven by old posts is misleading, so cross-reference it with recent data. |
| P3 | Follower count | 5,000 - 500,000 (soft reference) | Soft filter only. A 3k-follower creator with amazing recent data SHOULD still be recommended. A 50k-follower creator with dead recent data should be REJECTED. |
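The P1-P3 rows above reduce to a few numeric checks once per-post 赞藏 counts and profile totals are collected. In this sketch, engagement_trend and tier_a_secondary are hypothetical names. The ±20% band that separates "growing" and "declining" from "stable" is an assumed default; the table does not define it. The P1 check only sees the posts you actually collected, not the creator's full history.

```python
from statistics import mean


def engagement_trend(recent, older, tolerance=0.2):
    """Classify the 赞藏 trend by comparing mean per-post 赞藏 of recent vs older posts.

    `tolerance` (a ±20% band) is a hypothetical default, not part of the rules above.
    """
    if not recent or not older:
        return "unknown"
    r, o = mean(recent), mean(older)
    if o == 0:
        return "growing" if r > 0 else "stable"
    change = (r - o) / o
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "declining"
    return "stable"


def tier_a_secondary(recent, older, total_engagement, followers):
    """P1-P3 checks from the table; inputs are per-post 赞藏 lists and profile totals."""
    return {
        "p1_hit": max(recent + older, default=0) >= 1000,  # among collected posts only
        "p2_trend": engagement_trend(recent, older),
        "p2_ratio_ok": followers > 0 and total_engagement / followers > 3,
        "p3_in_reference_range": 5_000 <= followers <= 500_000,  # soft signal, never a cutoff
    }
```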
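P0 Per-Post Evaluation (Two Dimensions)
For each recent post, judge whether it is healthy on each of the two dimensions below.
Dimension 1: 赞藏 rate (赞藏 / followers)
There is no fixed percentage. Instead, check whether the 赞藏 count is proportionate to the follower count:
- 赞藏 well above a meaningful share of the follower count → the algorithm amplified the post to non-followers → healthy
- 赞藏 extremely low, completely out of line with the follower count → the algorithm did not favor the post → weak
- Compare with the creator's other posts: is this post clearly above or below their own average?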
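Because Dimension 1 has no fixed cutoff, a script can only compute the two numbers the judgment rests on. This sketch gives each post's 赞藏/followers rate and its 赞藏 relative to the creator's own mean (1.0 means exactly average). engagement_rate_profile is a hypothetical name, and the function applies no pass/fail rule.

```python
from statistics import mean


def engagement_rate_profile(post_engagement, followers):
    """Per post: (赞藏 / followers, 赞藏 relative to this creator's own mean).

    No pass/fail cutoff is applied, matching the "no fixed percentage" rule above;
    the two numbers only support the judgment call.
    """
    if followers <= 0 or not post_engagement:
        raise ValueError("need a positive follower count and at least one post")
    avg = mean(post_engagement)
    return [(e / followers, e / avg if avg else 0.0) for e in post_engagement]


for rate, relative in engagement_rate_profile([2000, 500, 1500], 10_000):
    print(f"rate={rate:.1%}  vs own mean={relative:.2f}x")
```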
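Dimension 2: 赞藏-to-comment ratio (赞藏 : comments)
Check this for every post, not just the creator's overall historical data:
- 20:1 to 80:1 → healthy, genuine engagement
- 80:1 to 150:1 → on the high side; judge by content type (tutorials naturally collect many saves and few comments)
- 150:1 to 200:1 → clearly high; flag it and manually check comment quality
- >200:1 → strong suspicion of inflated engagement; the post does not pass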
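The bands above can be bucketed mechanically. This is a sketch: comment_ratio_band is a hypothetical name. The handling of exact boundaries is an assumption: 80:1 counts as healthy, and 150:1 and 200:1 count as "clearly high", which stays consistent with the "< 150:1" pass rule. A post with zero comments is treated like the worst band.

```python
def comment_ratio_band(engagement, comments):
    """Bucket a post's 赞藏 : comments ratio into the bands listed above."""
    if comments == 0:
        return "suspicious"  # assumption: no comments at all is treated like > 200:1
    ratio = engagement / comments
    if ratio > 200:
        return "suspicious"
    if ratio >= 150:
        return "clearly_high"  # flag and manually check comment quality
    if ratio > 80:
        return "high"  # judge by content type (tutorials skew toward saves)
    if ratio >= 20:
        return "healthy"
    return "below_20"  # not covered by the bands above; review manually
```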
Per-post pass rule (combining both dimensions):
- Reasonable 赞藏 rate + ratio < 150:1 → ✅ Pass
- Reasonable 赞藏 rate + ratio 150:1 to 200:1 → ⚠️ Marginal; use judgment
- Very low 赞藏 rate OR ratio > 200:1 → ❌ Fail
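The combined rule fits in a few lines. In this sketch, post_verdict is a hypothetical name. Dimension 1 stays a human judgment and is passed in as a boolean. Treating a post with zero comments as failing is an assumption.

```python
def post_verdict(rate_reasonable, engagement, comments):
    """Combine both dimensions into "pass" (✅), "marginal" (⚠️), or "fail" (❌).

    `rate_reasonable` is your Dimension 1 judgment (there is no fixed cutoff);
    Dimension 2 is computed from 赞藏 and comment counts.
    """
    if not rate_reasonable:
        return "fail"
    if comments == 0:
        return "fail"  # assumption: treat zero comments like a ratio above 200:1
    ratio = engagement / comments
    if ratio > 200:
        return "fail"
    if ratio >= 150:
        return "marginal"
    return "pass"
```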
P0 overall rating (by number of passing posts):
- 5/5 pass → Excellent
- 4/5 pass → Good
- 3/5 pass → Barely usable; flag it in the report
- ≤2/5 pass → P0 fails; reject
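A sketch of the overall P0 rating, with p0_grade as a hypothetical name. It rests on three assumptions that the rules above leave open. Only the first 5 verdicts are used. A missing post counts as not passing. "Marginal" posts are not counted as passes, so upgrade them manually if you decide they qualify.

```python
def p0_grade(verdicts):
    """Map per-post verdicts ("pass" / "marginal" / "fail") for 5 recent posts to a P0 rating."""
    passes = sum(v == "pass" for v in verdicts[:5])  # missing posts simply do not add passes
    if passes == 5:
        return "excellent"
    if passes == 4:
        return "good"
    if passes == 3:
        return "usable_flag_in_report"
    return "reject"
```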
Step 1: Profile-Level Quick Screen
For each candidate author, visit their profile page to extract:
- Follower count
- Total likes/favorites
- Bio/description
Use Playwright browser_run_code to batch-visit multiple profiles efficiently.
See guides/platform-selectors.md for extraction code.
Soft-reject candidates far outside follower range (e.g., < 1,000 or > 1,000,000), but keep borderline cases if other signals are strong. Do NOT yet evaluate "recent post quality" from profile-level data.
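The Step 1 follower screen can be sketched as a three-way decision. profile_screen, the "review" label, and the strong_other_signals flag are hypothetical. The flag stands in for your own judgment of the other signals. The thresholds come from the text: the 5,000-500,000 reference range, and "far outside" meaning below 1,000 or above 1,000,000.

```python
def profile_screen(followers, strong_other_signals=False):
    """Step 1 soft screen on follower count alone.

    Returns "keep" inside the 5,000-500,000 reference range and "soft_reject"
    far outside it (below 1,000 or above 1,000,000). For borderline counts in
    between, it returns "keep" when other signals are strong, else "review".
    """
    if 5_000 <= followers <= 500_000:
        return "keep"
    if followers < 1_000 or followers > 1_000_000:
        return "soft_reject"
    return "keep" if strong_other_signals else "review"
```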
Step 2: Recent Post Data Collection (CRITICAL — Do NOT Skip)
WARNING: Profile pages do NOT sort notes chronologically. The notes shown on a profile page are algorithmically ordered and mix old viral posts