People Sourcer
A real recruiter, BD person, or research lead doesn't open a CRM first. They start with a question: who specifically am I trying to reach, and why? Then they hunt — across whichever platform that tribe actually lives on — and they keep notes. This skill makes Claude work that way, end-to-end, into a spreadsheet the user can act on.
Core premise
Spam happens when you compile names without context. A list with 200 anonymous LinkedIn URLs is worse than 30 rows where each one has a real signal — this person posted last week about exactly your problem, here's how to enter their world.
So the rule is: never source from zero. Always source from signal. Find the place where the right people are already self-identifying, scrape that signal, then enrich and personalize. The personalization is what makes the difference between a useful list and noise.
When to use this skill vs. just Google
Use this skill when the deliverable is a list of named individuals with structured fields. Don't use it for:
- Aggregate research ("how big is the X market") — use web search.
- Finding a single specific named person — just web search + verify.
- Company lists without people attached — that's account research.
If in doubt: if the user wants rows in a spreadsheet with names and an outreach angle, this is the skill.
The workflow
Six phases, preceded by a scratchpad setup step (Phase 0). Each phase writes to the scratchpad so context survives long runs.
1. Intake → pin down WHO and WHY
2. Source strategy → pick platforms + queries
3. Discovery → iterative BrightData scraping
4. Enrichment → per-person profile + contact pull
5. Personalization → worldbuilder commentary per row
6. Output → multi-sheet xlsx
Skip phases that are already done. If the user hands you a list of profile URLs and just wants enrichment + commentary, jump to Phase 4.
Phase 0: Scratchpad first
Before scraping anything, create a scratchpad so you don't lose the thread mid-run.
mkdir -p /home/claude/sourcing-work/<project-slug>/raw
touch /home/claude/sourcing-work/<project-slug>/brief.md
touch /home/claude/sourcing-work/<project-slug>/candidates.jsonl
- brief.md: the persona, query plan, and audience model. See references/scratchpad-template.md.
- candidates.jsonl: one JSON line per candidate, appended as you find them. JSONL because you'll be writing as you scrape, and a corruption in one line doesn't kill the whole file.
- raw/: raw scrapes by source URL, named like linkedin-search-1.json, reddit-r-netsec-1.md, etc.
Why JSONL for candidates: you'll likely process 30–500 people across multiple rounds. Mid-run failures shouldn't lose progress. Append-only is the right shape.
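The append-only pattern can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function names `append_candidate` and `load_candidates` are hypothetical, and the candidate fields are whatever the brief calls for.

```python
import json
from pathlib import Path


def append_candidate(path: Path, candidate: dict) -> None:
    """Append one candidate as a single JSON line (hypothetical helper)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(candidate, ensure_ascii=False) + "\n")


def load_candidates(path: Path) -> tuple[list[dict], int]:
    """Read every parseable line; return (candidates, count_of_bad_lines)."""
    candidates: list[dict] = []
    bad = 0
    if not path.exists():
        return candidates, bad
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                candidates.append(json.loads(line))
            except json.JSONDecodeError:
                # A half-written line from an interrupted run is skipped;
                # every other line still loads.
                bad += 1
    return candidates, bad
```

Because each write is one self-contained line, a run that dies mid-write costs at most that one row, and the next round can resume by loading what is already on disk.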
Phase 1: Intake
Pin down the brief in brief.md before doing anything else. The single most expensive mistake in sourcing is scraping the wrong audience well.
Required:
- Persona — role/title, seniority, function. Be precise: "senior backend engineer with Rust experience" not "good engineer."
- Signals — what publicly-visible behavior identifies them? They contributed to repo X. They posted about Y last quarter. They list Z certification. They lead a meetup on W. Without signals, you're guessing.
- N — how many do they want? 20 ≠ 200 in workflow shape.
- Purpose — recruiting? sales? podcast guests? user research? This determines the "outreach angle" column entirely.
- Geography / language — global? specific country/city? English-only?
- Custom fields — anything beyond defaults the user wants captured.
- Output preference — xlsx (default), Google Sheet via Drive (if connected), or CSV.
If the brief is vague ("find me ML people"), ask 1–2 sharp questions before scraping. Don't ask a wall — ask the ones that actually change the search:
- "Are you looking to hire them, sell to them, or interview them? It changes who I prioritize."
- "Any specific signal — open-source contribs, conference talks, recent job changes — that should weight my search?"
If the user is decisive ("just find me 50 senior MLEs in Bangalore who post about RAG"), skip the questions and go.
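One way to decide whether to ask or go is to hold the brief as a small structure and check which fields are still empty. The sketch below is illustrative only: the `Brief` class and `missing_fields` helper are hypothetical names, and treating custom fields and output preference as optional (the latter defaulting to xlsx) is an assumption layered on the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical in-memory mirror of the intake fields in brief.md."""
    persona: str = ""
    signals: list[str] = field(default_factory=list)
    n: int = 0
    purpose: str = ""
    geography: str = ""
    custom_fields: list[str] = field(default_factory=list)  # may stay empty
    output: str = "xlsx"  # default output preference


def missing_fields(brief: Brief) -> list[str]:
    """Return the names of intake fields that still need an answer."""
    checks = {
        "persona": bool(brief.persona.strip()),
        "signals": bool(brief.signals),
        "n": brief.n > 0,
        "purpose": bool(brief.purpose.strip()),
        "geography": bool(brief.geography.strip()),
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty result means the user was decisive enough to start scraping; a non-empty one points at the one or two questions worth asking.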
Phase 2: Source strategy
Pick platforms based on where the persona actually lives. See references/source-matrix.md for the full decision table; the short version:
| Persona | Primary platform | Why |
|---|---|---|
| B2B/SaaS buyers, execs, recruiters' candidates | LinkedIn | Self-identified work history, public posts |
| Devs / technical talent | GitHub + Reddit + X (formerly Twitter) + LinkedIn | Code is the signal; posts are the noise |
| Indie hackers / founders | X, IndieHackers forum, ProductHunt, LinkedIn | Where they ship and gripe |
| Security / pentesting | Reddit (r/netsec, r/AskNetsec, r/oscp), X infosec, ctftime, conference speaker pages | Tribe is small, vocal, identifiable |
| Researchers / academics | Google Scholar, arXiv, ResearchGate, university pages, Twitter/X | Citations + author pages |
| Creators / influencers | YouTube, TikTok, Instagram, Twitter/X | Platform IS the work |
| Local community / event attendees | Facebook events, Meetup, local subreddits, Eventbrite | Hyperlocal |
| Journalists / writers | Twitter/X, Muck Rack, bylines on outlet sites | Bylines = identity |
Plan your queries in brief.md before firing them. Write them out as a numbered list so you can reuse and iterate.
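As a small illustration of the numbered query plan, the following sketch renders (platform, query) pairs into lines ready to paste into brief.md. The helper name `render_query_plan` and the bracketed platform tag are assumptions made for this example, not a format the skill prescribes.

```python
def render_query_plan(queries: list[tuple[str, str]]) -> str:
    """Render (platform, query) pairs as a numbered list for brief.md."""
    return "\n".join(
        f"{i}. [{platform}] {query}"
        for i, (platform, query) in enumerate(queries, start=1)
    )


plan = render_query_plan([
    ("linkedin", '"senior rust" engineer site:linkedin.com europe'),
    ("github", "rust backend site:github.com followers:>200"),
])
print(plan)
```

Stable numbering makes it easy to note in later rounds which query produced which candidates.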
Tools
These are the BrightData tools you'll lean on — they're deferred, so call tool_search first to load them.
| Goal | Tool |
|---|---|
| Find LinkedIn profiles by query | bd:web_data_linkedin_people_search |
| Pull a single LinkedIn profile (full data) | bd:web_data_linkedin_person_profile |
| Pull LinkedIn posts | bd:web_data_linkedin_posts |
| Reddit post + comments | bd:web_data_reddit_posts |
| X (Twitter) posts | bd:web_data_x_posts |
| Instagram profile / posts / reels | bd:web_data_instagram_profiles / _posts / _reels |
| TikTok profile / posts | bd:web_data_tiktok_profiles / _posts |
| YouTube profile / videos / comments | bd:web_data_youtube_profiles / _videos / _comments |
| Facebook posts / events | bd:web_data_facebook_posts / _events |
| Discovery (which subs, which writers, etc.) | bd:search_engine, bd:search_engine_batch, bd:discover |
| GitHub profiles, personal sites, niche forums, anything else | bd:scrape_as_markdown (or bd:scrape_batch for ≤10 URLs) |
See references/bd-tool-cheatsheet.md for parameter examples.
Phase 3: Discovery — iterative scraping
Sourcing is not "one search and done." It's a loop where each round narrows from where-they-are to who-specifically-they-are.
Round 1 — Locate the watering holes
For each platform you picked, run a discovery query to find the places the persona congregates. Use bd:search_engine_batch to fire several at once.
Example for "senior Rust backend engineers in EU":
"senior rust" engineer site:linkedin.com europe
rust backend site:github.com followers:>200
site:reddit.com/r/rust experience hiring
rusty-days OR rustconf speaker
Don't scrape candidates yet. Just identify where they cluster — the active subreddits, the LinkedIn groups, the conference speaker pages, the GitHub orgs.
Write findings under "Round 1 — Discovery" in brief.md.
Round 2 — Pull candidates from the watering holes
Now actually pull people. Choose the right tool per source:
- LinkedIn search results → bd:web_data_linkedin_people_search with structured filters (role, location, current company keywords).
- A subreddit thread of "who's hiring" or "who wants a job" → bd:web_data_reddit_posts to get post + commenter usernames + their text.
- A conference speaker page → bd:scrape_as_markdown on the speaker URL, then for each speaker name, search their LinkedIn / X.
- A GitHub org or repo's contributors page → bd:scrape_as_markdown on /graphs/contributors.
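The per-source choice above can be written down as a simple lookup. This sketch only returns a tool name for a watering-hole URL; it does not call any BrightData tool, and routing purely by hostname (with bd:scrape_as_markdown as the fallback) is an assumption made for illustration. The names `TOOL_BY_HOST` and `pick_tool` are hypothetical.

```python
from urllib.parse import urlparse

# Host-to-tool routing taken from the list above; anything not listed
# (conference pages, GitHub contributor pages) falls back to a markdown scrape.
TOOL_BY_HOST = {
    "linkedin.com": "bd:web_data_linkedin_people_search",
    "reddit.com": "bd:web_data_reddit_posts",
}
FALLBACK_TOOL = "bd:scrape_as_markdown"


def pick_tool(source_url: str) -> str:
    """Return the tool name to use for a given source URL."""
    host = (urlparse(source_url).hostname or "").lower()
    for known, tool in TOOL_BY_HOST.items():
        # Match the bare domain and any subdomain such as www. or old.
        if host == known or host.endswith("." + known):
            return tool
    return FALLBACK_TOOL
```

Keeping the mapping in one table makes it cheap to add a platform later without touching the loop that walks the watering holes.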
Append each candidate as a JSONL line to candidates.jsonl imme