When this skill is activated, always start your first response with the 🧢 emoji.
Interview Design
Structured interview design is the discipline of building hiring processes that produce consistent, defensible, and predictive hiring decisions. The core insight is that unstructured conversations are notoriously unreliable predictors of job performance - structured processes with explicit rubrics dramatically improve both accuracy and fairness. This skill covers the full lifecycle: scoping the interview loop, writing rubrics, building coding challenges, calibrating interviewers, and running debriefs that lead to confident decisions.
When to use this skill
Trigger this skill when the user:
- Needs to design an interview loop or process for a role
- Wants to create scoring rubrics or evaluation criteria
- Asks how to build a coding challenge or take-home assignment
- Needs help writing behavioral interview questions
- Wants to design a system design interview round
- Is trying to assess culture fit in a structured, defensible way
- Needs to run calibration sessions with a panel
- Asks how to run an effective debrief meeting
Do NOT trigger this skill for:
- Preparing as a candidate to pass interviews (different audience, different goal)
- Compensation benchmarking or offer negotiation (use a compensation skill instead)
Key principles
- Structured beats unstructured - Consistent questions asked in the same order with pre-defined scoring criteria outperform free-form conversations every time. Interviewers who "go with their gut" introduce bias, not signal.
- Score independently before debrief - Every interviewer must submit a written score and evidence summary before the panel debrief. Verbal-only debrief allows the first strong opinion to anchor everyone else. Written scores first.
- Test for the actual job - Every interview exercise should map to a real task the candidate will perform in the role. If a backend engineer will never sort arrays on the job, don't test array sorting in isolation. Use job-relevant problems.
- Rubrics prevent drift - Without a rubric, two interviewers evaluating the same candidate will produce wildly different scores. A rubric aligns on what "strong" and "weak" look like before the first candidate walks in.
- Debrief is where decisions happen - The debrief meeting is not a vote-counting exercise. It is a structured discussion to surface new evidence, resolve disagreements, and reach a confident collective judgment. The hiring manager owns the final call.
Core concepts
Interview types map to different evaluation needs. Coding interviews assess problem-solving and technical mechanics. System design interviews assess architectural thinking at scale. Behavioral interviews (using STAR) assess past behavior as a proxy for future behavior. Values/culture interviews assess alignment with how the team operates. Take-homes assess real-world execution and follow-through. Most loops include 3-5 rounds covering different dimensions so no single round carries all the weight.
Rubric design is the practice of defining expected performance at multiple
levels (typically 1-4 or Strong No / No / Yes / Strong Yes) before interviews begin.
A good rubric specifies concrete behaviors, not adjectives. "Breaks problem into
subproblems, names variables clearly, asks clarifying questions before coding" is
a rubric. "Good technical skills" is not. See references/rubric-templates.md for
ready-to-use rubric templates.
Signal vs noise distinguishes real predictors of job performance from irrelevant factors. Signal: how a candidate structures ambiguity, responds to hints, explains trade-offs. Noise: how polished their communication style is, whether they went to a brand-name school, how quickly they reached the solution. Train interviewers to write down evidence (what the candidate said/did) rather than impressions ("seemed smart").
Calibration is the practice of running mock interviews with known candidates (or invented personas) so interviewers practice applying the rubric consistently before live interviews begin. A calibration session where two interviewers score the same response and then compare notes surfaces misalignment early.
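The comparison step in a calibration session can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `calibration_gaps`, the dict-of-scores shape, and the default threshold of 2 points are all assumptions made for the example.

```python
def calibration_gaps(scores_a, scores_b, threshold=2):
    """Return dimensions where two interviewers disagree by at least `threshold`.

    scores_a / scores_b: dict mapping dimension name -> rubric score (1-4).
    The threshold of 2 is an assumed default, not a recommendation from the text.
    """
    gaps = {}
    # Only compare dimensions that both interviewers actually scored.
    for dimension in scores_a.keys() & scores_b.keys():
        difference = abs(scores_a[dimension] - scores_b[dimension])
        if difference >= threshold:
            gaps[dimension] = difference
    return gaps


interviewer_a = {"Problem Decomposition": 4, "Communication": 3}
interviewer_b = {"Problem Decomposition": 2, "Communication": 3}
print(calibration_gaps(interviewer_a, interviewer_b))
```

Dimensions that show up in the result are the ones worth discussing: the two interviewers read the same response and applied the rubric differently, which usually means a rubric level needs sharper wording.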
Common tasks
Design a structured interview loop
Start by mapping the role's core competencies - typically 4-6 dimensions that predict success. Common dimensions for engineering roles:
| Dimension | Who covers it |
|---|---|
| Technical fundamentals | Coding round 1 |
| System design / architecture | System design round |
| Problem-solving approach | Coding round 2 |
| Collaboration / communication | Bar raiser or cross-functional |
| Values and culture | Hiring manager or peer |
| Past impact and trajectory | Behavioral / resume deep-dive |
Rules for a well-designed loop:
- Every dimension is covered by exactly one round (no redundancy)
- No interviewer covers more than one dimension (keeps each fresh)
- The loop can be completed in one business day on-site or two days virtual
- Assign a "bar raiser" - someone outside the immediate team with veto power
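The two coverage rules above lend themselves to a mechanical check. The following is a rough sketch under stated assumptions: the `(round_name, interviewer, dimension)` tuple shape, the function name `validate_loop`, and the interviewer names are hypothetical. It does not check the scheduling or bar-raiser rules.

```python
from collections import Counter, defaultdict


def validate_loop(rounds, dimensions):
    """Check a loop against the coverage rules.

    rounds: list of (round_name, interviewer, dimension) tuples (assumed shape).
    dimensions: list of competency names the loop must cover.
    Returns a list of human-readable problems; an empty list means both rules hold.
    """
    problems = []

    # Rule: every dimension is covered by exactly one round.
    coverage = Counter(dimension for _, _, dimension in rounds)
    for dimension in dimensions:
        if coverage[dimension] == 0:
            problems.append(f"not covered: {dimension}")
        elif coverage[dimension] > 1:
            problems.append(f"covered by {coverage[dimension]} rounds: {dimension}")

    # Rule: no interviewer covers more than one dimension.
    by_interviewer = defaultdict(set)
    for _, interviewer, dimension in rounds:
        by_interviewer[interviewer].add(dimension)
    for interviewer, covered in by_interviewer.items():
        if len(covered) > 1:
            problems.append(f"{interviewer} covers {len(covered)} dimensions")

    return problems


rounds = [
    ("Coding round 1", "alice", "Technical fundamentals"),
    ("System design round", "bob", "System design / architecture"),
    ("Coding round 2", "alice", "Problem-solving approach"),
]
dimensions = [
    "Technical fundamentals",
    "System design / architecture",
    "Problem-solving approach",
    "Values and culture",
]
for problem in validate_loop(rounds, dimensions):
    print(problem)
```

In this example the check reports that "Values and culture" has no round and that one interviewer is assigned two dimensions.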
Create scoring rubrics - template
Use a 4-level rubric for each dimension. The key is defining the middle levels precisely - candidates cluster there, and those are the hard decisions.
Dimension: [Name, e.g., "Problem Decomposition"]
Weight: [High / Medium / Low]
4 - Strong Yes
Candidate independently breaks problem into clean subproblems. Names
intermediate data structures without prompting. Explains trade-offs of
multiple approaches before choosing. Handles edge cases proactively.
3 - Yes
Candidate breaks problem into subproblems with minor prompting. Solves
the core problem correctly. Handles most edge cases when prompted.
Explains the primary trade-off.
2 - No
Candidate solves simple version but struggles to generalize. Requires
significant prompting to identify subproblems. Misses important edge
cases. Does not discuss trade-offs unless directly asked.
1 - Strong No
Candidate cannot decompose the problem independently. Solution is
incorrect or incomplete. Does not respond to hints. Cannot explain
what their own code does.
See references/rubric-templates.md for complete rubrics for coding,
system design, behavioral, and culture fit rounds.
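If a panel wants a single summary number per candidate, the per-dimension scores and weights from the template can be combined as a weighted average. The sketch below is one possible approach, not something the template prescribes: the numeric mapping of High / Medium / Low to 3 / 2 / 1 and the function name `weighted_score` are assumptions.

```python
# Assumed numeric mapping for the template's High / Medium / Low weights.
ASSUMED_WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}


def weighted_score(dimension_scores):
    """Combine rubric scores into a weighted average on the 1-4 scale.

    dimension_scores: list of (dimension, weight_label, score) tuples,
    where score is an integer rubric level from 1 to 4.
    """
    total_weight = 0
    weighted_sum = 0
    for dimension, weight_label, score in dimension_scores:
        if score not in (1, 2, 3, 4):
            raise ValueError(f"{dimension}: score must be 1-4, got {score}")
        weight = ASSUMED_WEIGHTS[weight_label]
        total_weight += weight
        weighted_sum += weight * score
    if total_weight == 0:
        raise ValueError("no dimension scores supplied")
    return weighted_sum / total_weight


candidate = [
    ("Problem Decomposition", "High", 3),
    ("Communication", "Low", 1),
]
print(weighted_score(candidate))  # (3*3 + 1*1) / (3 + 1) = 2.5
```

A number like this is an input to the debrief discussion, not a substitute for it: the written evidence behind each score still carries the decision.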
Build a take-home coding challenge
Take-homes reveal real-world execution that 45-minute whiteboard problems cannot. Design one that:
- Scopes to 2-3 hours max - Respect candidate time. If the task takes a senior engineer on your team 2 hours, expect candidates to need longer and cut the scope accordingly. State the expected time in the instructions.
- Uses a realistic problem - "Build a rate limiter for our API" beats "implement a binary search tree." Domain-adjacent problems reveal how candidates think about the actual work.
- Provides a starter repo - Give candidates a repo with the scaffolding, CI, and test runner already wired. Evaluating candidates on setup skills is noise.
- Defines evaluation criteria upfront - Include an EVALUATION.md in the repo that lists exactly what reviewers will look for: correctness, test coverage, code clarity, README quality.
- Has a follow-up interview - Schedule a 30-minute code walkthrough. This deters candidates from submitting work that isn't their own and surfaces how they think about their own decisions.
Evaluation checklist for reviewers:
- Does the solution solve the stated problem?
- Are edge cases handled?
- Is the code readable without explanation?
- Are there tests, and are they meaningful?
- Does the README explain design decisions?
- Are there obvious improvements the candidate noted themselves?
Design behavioral interview questions - STAR format
Behavioral questions follow the pattern: "Tell me about a time when..." The STAR framework (Situation, Task, Action, Result) gives candidates a structure and gives interviewers a rubric