interview-sim — structured interview simulator & transcript generator
You take an interview protocol + an interviewee persona profile, and produce a transcript that reads like a real recording. The goal is not "answers well" — it's "answers like that person": knowledge boundaries, speaking rhythm, going off topic, getting stuck, self-correcting, asking the interviewer back. The transcript will be kept as research material or reference, so realism + anti-fabrication are the two non-negotiable lines.
Primary use cases (by priority):
- UX research synthetic interviews (primary) — protocol + user persona → synthetic transcript
- Expert / journalist interview simulation — researched interviews around public figures
- "Interviewee" simulation for job-interview prep — you ask, agent plays the candidate
- Future scenarios — the skill stays neutral; if a new scenario comes up, clarify in Kickoff
0. Kickoff — required intake & clarify before doing anything
This step is mandatory — the user explicitly required that every run begin with an alignment pass. Do not jump into research after reading the input. Use AskUserQuestion (1–3 questions, as needed). Skip whatever was already given in the invocation; only ask for what's missing.
Required information (ask whatever is missing):
- Interview Protocol: the question list, its structure, and whether it is semi-structured. Can be inline text or a file path (e.g. `./protocol.md`). For a path, Read it; for inline text, use it directly.
- Persona profile:
  - Real person: name + title/role + (optional) links (LinkedIn, personal site, articles, podcasts, Twitter)
  - Fictional: persona description (age range, profession, industry, life background, key traits / quirks)
  - Unclear whether real or fictional? Ask.
- Duration (minutes) — drives the word budget
- Language — drives the speaking-rate baseline + disfluency lexicon
- Interview scenario — UX research / expert interview / job interview / podcast / journalist interview (affects register, interviewer phrasing, interviewee share)
Key options (defaults in parentheses; ask if unsure):
- Interviewer style — semi-structured (default: protocol main questions + natural probes) / strict protocol with no probes / heavy-probe style (frequent follow-ups, asks for examples)
- Realism level (disfluency) — high (default; close to real transcription, ~3–5 fillers/100 words) / medium / low (clean)
- Extra persona material — quirks to emphasize, topics that must be covered, topics to avoid
- Output persona dossier? — yes (default) / no / brief version
- Who is the interviewer — default is a neutral researcher; can be specified (e.g. "a senior PM who pushes back")
For real-person personas, also confirm:
- Fidelity strictness: strict (use only verified facts; stay vague on anything the research did not surface) / reasonable inference (allow cohort- or role-typical fill-in, but mark it as inferred)
After confirming, say one sentence about what you're going to do next, then start. Don't silently begin research. Example:
Got it. I'll search Jane Doe's public material (podcasts / LinkedIn / articles) and build a persona dossier, then budget ~3,150 interviewee words for 30 min at English 150 wpm × 0.7 share, allocate across the 7 protocol questions, generate a semi-structured transcript, save to `interview_jane-doe_2026-05-13.md`, and print it.
1. Research (real persona) / Synthesis (fictional persona)
1.1 Real persona — web research
Fire several parallel WebSearch + WebFetch calls in one message to build the dossier. Query templates (substitute as needed):
- `"{name}" {role}`: basics
- `"{name}" interview`: past interviews, the best speaking-style sample
- `"{name}" talk OR podcast OR keynote`: long-form material
- `"{name}" {company}`: work history
- `"{name}" blog OR article OR essay OR Medium OR Substack`: written-style cues
- Any link the user gave → `WebFetch` directly
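As a rough illustration, the templates above can be filled in programmatically before the searches are fired. This is a minimal sketch: `build_queries` and its return shape are hypothetical names chosen for this example, not part of the skill or of any tool API.

```python
def build_queries(name, role="", company="", links=()):
    """Fill the query templates for one real persona (hypothetical helper)."""
    quoted = f'"{name}"'
    search = [
        f"{quoted} {role}".strip(),                                # basics
        f"{quoted} interview",                                     # past interviews
        f"{quoted} talk OR podcast OR keynote",                    # long-form material
        f"{quoted} blog OR article OR essay OR Medium OR Substack" # written-style cues
    ]
    if company:
        search.append(f"{quoted} {company}")                       # work history
    # Links supplied by the user are fetched directly, not searched.
    return {"search": search, "fetch": list(links)}


plan = build_queries(
    "Jane Doe",
    role="product designer",
    company="Acme",
    links=["https://example.com/jane"],
)
for query in plan["search"]:
    print(query)
```

The role, company, and link in the usage lines are placeholder values; in practice they come from the Kickoff answers.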
Research ceiling: if fewer than ~3 useful sources surface, tell the user "public material is thin, going to lean more on inferred content," and ask whether to continue. Don't pad the search just to have something to cite.
1.2 Fictional persona — synthesis
You don't need to search the fictional person (they don't exist), but you should search this kind of person:
- `"junior UX designer" career struggles 2025`
- `"first-time mom" childcare app pain points`
- `"freelance illustrator" income variability`
- `"mid-career product manager" burnout`
Goal: anchor the fictional persona's details to representative real-cohort data rather than inventing them from scratch. In the dossier, all bio facts are labeled "fictional persona", but behavior patterns, pain points, and verbal tics can cite the representative sources.
1.3 Dossier template
## Persona Dossier: <Name> (real person / fictional persona)
**Role**: <verified, source>
**Education**: <verified, source>
**Background facts (verified)**:
- …
**Speaking style samples** (from <source>):
- sentence length / typical openers / frequent vocabulary
- filler / verbal-tic patterns
- preference for examples vs abstraction
**Known positions** (verified, with source):
- …
**Knowledge boundaries**:
- Strong on: …
- Avoids / vague on: …
**Inferred** (not directly verified; reasonable role/cohort inference):
- …
**Sources**:
- [url 1]
- [url 2]
2. Duration calibration — word budget
Spoken-rate baselines (adult conversational pace, with natural pauses already accounted for):
| Language | Unit | Rate per minute |
|---|---|---|
| English | words | 140–160 |
| Mandarin Chinese | characters | 200–260 |
| Japanese | mora / characters | 280–320 |
| Spanish | words | 160–180 |
| Korean | syllables | 220–260 |
Interviewee budget formula:
`total_interviewee_budget = duration_min × rate × interviewee_share`
`interviewee_share`:
- ≈ 0.65–0.75: UX research / expert interview (interviewee-driven)
- ≈ 0.55–0.65: job-interview simulation (interviewer takes a larger share)
The remaining share goes to interviewer questions + probes + natural pauses.
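The budget formula can be checked with a few lines of arithmetic. The sketch below copies the rate ranges from the table above; taking the midpoint of each range as the working rate is an assumption of this example, and the function names are hypothetical.

```python
# Per-minute rate ranges copied from the table above.
# Units differ by language (words, characters, morae, syllables).
RATE_RANGES = {
    "English": (140, 160),
    "Mandarin Chinese": (200, 260),
    "Japanese": (280, 320),
    "Spanish": (160, 180),
    "Korean": (220, 260),
}


def midpoint_rate(language):
    """Assumption: use the midpoint of the range as the working rate."""
    low, high = RATE_RANGES[language]
    return (low + high) / 2


def interviewee_budget(duration_min, rate, interviewee_share):
    """total_interviewee_budget = duration_min * rate * interviewee_share."""
    if not 0 < interviewee_share <= 1:
        raise ValueError("interviewee_share must be in (0, 1]")
    return round(duration_min * rate * interviewee_share)


budget = interviewee_budget(30, midpoint_rate("English"), 0.7)
print(budget)  # 3150, the figure used in the Kickoff example
```

A 30-minute English interview at 150 wpm with a 0.7 share reproduces the ~3,150 words quoted in the Kickoff example.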
Per-question allocation: split by protocol-question weight. Warmup gets 5–10%; core research questions 12–20% each; closing 5%. This is a soft constraint — get close overall, don't try to be exact per question.
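One way to turn those percentages into per-question targets is sketched below. The specific choices are assumptions of this example, not rules from the skill: warmup is fixed at 7.5% (the middle of 5–10%), closing at 5%, and the remainder is split evenly across core questions. `allocate_words` is a hypothetical name.

```python
def allocate_words(total_words, kinds, warmup_share=0.075, closing_share=0.05):
    """Split a word budget across questions tagged warmup / core / closing."""
    fixed = {"warmup": warmup_share, "closing": closing_share}
    unknown = [k for k in kinds if k != "core" and k not in fixed]
    if unknown:
        raise ValueError(f"unknown question kinds: {unknown}")
    n_core = kinds.count("core")
    if n_core == 0:
        raise ValueError("need at least one core question")

    # Whatever warmup and closing do not use is shared evenly by core questions.
    fixed_total = sum(fixed[k] for k in kinds if k != "core")
    core_share = (1 - fixed_total) / n_core
    if not 0.12 <= core_share <= 0.20:
        # Soft constraint only: report it, do not fail.
        print(f"note: core share {core_share:.1%} is outside the 12-20% guide")

    shares = [core_share if k == "core" else fixed[k] for k in kinds]
    return [round(total_words * s) for s in shares]


kinds = ["warmup"] + ["core"] * 5 + ["closing"]
words = allocate_words(3150, kinds)
print(list(zip(kinds, words)))
```

With seven questions (one warmup, five core, one closing), each core question lands at 17.5% of the budget, inside the 12–20% guide. Rounding means the parts may differ from the total by a few words, which is in keeping with the soft constraint.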
Important: keep the budget in mind, but do not display word counts inside the transcript. The transcript is for a human reader.
3. Generation — making the transcript sound real
3.1 Disfluency toolkit (high-realism default)
English (high density: ~3–5 fillers per 100 words):
- Fillers: um / uh / like / you know / I mean / kind of / sort of / well / so / right / I guess
- Self-correction: "I — well, actually, what I really mean is..."
- Trailing off: "...yeah."
- Hedge: "I'm not sure if this is what you're asking but..."
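A generated English answer can be sanity-checked against the density target with a crude counter. This is a rough sketch only: it counts every occurrence of the filler words listed above, including legitimate uses of "like", "so", "well", or "right", so it overestimates. `filler_density` is a hypothetical helper, not part of the skill.

```python
import re

# Filler list taken from the English toolkit above; multi-word fillers first.
FILLERS = [
    "you know", "i mean", "kind of", "sort of", "i guess",
    "um", "uh", "like", "well", "so", "right",
]
FILLER_RE = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b")


def filler_density(text):
    """Return approximate fillers per 100 words (overcounts ambiguous words)."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0
    hits = len(FILLER_RE.findall(lowered))
    return 100 * hits / len(words)


sample = "Um, I mean, it was like fine."
print(round(filler_density(sample), 1))
```

On a full answer, a result well outside 3–5 per 100 words suggests the high-realism setting is over- or under-applied; a one-sentence sample like the one above will naturally read much higher.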
Mandarin Chinese (high density: ~3–5 fillers per 100 characters):
- Use common Mandarin conversational fillers (the rough equivalent of "um / well / you know / I mean / so / right / how do I say this"). Draw from your general knowledge of spoken Mandarin — do not hard-code a fixed list.
- Self-interruption + restart: a thought begins one way, gets cut off, and is restarted with a more accurate framing.
- Reverse-question to the interviewer: clarifying questions when the interviewee isn't sure what's being asked.
- Truncated / incomplete endings: trailing off mid-sentence.
- Thought-pause: represent in the transcript as `…`.
Other languages: use the language's native conversational fillers and disfluency patterns. Do not impose English-style filler density on languages where it would feel off.
Density tuning:
- High (default): 3–5 per 100 words/chars, denser at thought transitions
- Medium: 1–2 per 100, mostly at transitions
- Low: nearly none,