Orchestration Log: When this skill is activated, append a log entry to outputs/orchestration_log.md:
### Skill Activation: Qualitative Engine
**Timestamp:** [current date/time]
**Actor:** AI Agent (qualitative-engine)
**Input:** [brief description of the analysis request]
**Output:** [brief description, e.g., "23 interviews summarized, 14 first-order codes identified"]
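One way to write the activation entry is sketched below. The function names `format_log_entry` and `append_log_entry` and the `YYYY-MM-DD HH:MM` timestamp format are illustrative assumptions; the skill itself only prescribes the fields and the target file.

```python
from datetime import datetime
from pathlib import Path


def format_log_entry(input_desc: str, output_desc: str, when: datetime) -> str:
    """Render one activation entry with the fields listed in the log template."""
    return (
        "### Skill Activation: Qualitative Engine\n"
        # Timestamp format is an assumption; the template only says "current date/time".
        f"**Timestamp:** {when.strftime('%Y-%m-%d %H:%M')}\n"
        "**Actor:** AI Agent (qualitative-engine)\n"
        f"**Input:** {input_desc}\n"
        f"**Output:** {output_desc}\n"
    )


def append_log_entry(log_path: str, input_desc: str, output_desc: str) -> None:
    """Append an entry without overwriting earlier ones; create the folder if missing."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + format_log_entry(input_desc, output_desc, datetime.now()))
```

Opening the file in append mode keeps earlier activations intact, which matches the instruction to append rather than rewrite the log.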
Qualitative Engine
Core Principle: Summary-First, Full-Text-on-Demand
CRITICAL: Qualitative data (interview transcripts) can easily overflow the context window. Follow this strict protocol:
- NEVER load all transcripts into context simultaneously
- ALWAYS generate structured summaries first (Phase 1)
- Work from summaries for coding and analysis (Phases 2-4)
- Load full transcripts ONLY when extracting specific verbatim quotes (Phase 5)
- Load no more than 3 full transcripts at a time when quote-hunting
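The batching limit in the last bullet can be enforced mechanically. The following is a minimal sketch; `transcript_batches` and `MAX_FULL_TRANSCRIPTS` are hypothetical names and are not part of scripts/process_interviews.py.

```python
from typing import Iterator, List, Sequence

MAX_FULL_TRANSCRIPTS = 3  # upper bound taken from the protocol above


def transcript_batches(
    paths: Sequence[str], size: int = MAX_FULL_TRANSCRIPTS
) -> Iterator[List[str]]:
    """Yield transcript paths in groups small enough to load together."""
    if not 1 <= size <= MAX_FULL_TRANSCRIPTS:
        raise ValueError(f"batch size must be between 1 and {MAX_FULL_TRANSCRIPTS}")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

Each yielded batch would be loaded, mined for quotes, and released before the next batch is read.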
Utility Script
The scripts/process_interviews.py script provides:
from scripts.process_interviews import (
    load_interviews,    # Read all .md files from interviews/
    build_index,        # Generate INDEX.md with metadata
    chunk_interview,    # Split long transcripts into chunks
    search_interviews,  # Keyword search across all interviews
    save_index,         # Save index file
    save_summary,       # Save individual summaries
)
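The internals of scripts/process_interviews.py are not shown here, so the sketch below is only a guess at the kind of splitting a helper like `chunk_interview` might perform. The stand-in `chunk_words`, its word-based strategy, and its default sizes are all assumptions made for illustration.

```python
from typing import List


def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> List[str]:
    """Split text into word-based chunks that overlap slightly at the edges.

    Hypothetical stand-in, not the real chunk_interview implementation.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the transcript
    return chunks
```

The overlap keeps a statement that straddles a chunk boundary visible in both neighbouring chunks, at the cost of a little repeated text.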
Phase 1: Structured Summarization
Purpose
Transform each full transcript into a compact structured summary (~300 words) that preserves analytical value while reducing context consumption by 80-90%.
Step 1: Build Interview Index
interviews = load_interviews("interviews/")
index = build_index(interviews)
save_index(index, "interviews/INDEX.md")
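What the index contains depends on the utility script, which is not reproduced in this document. As a rough illustration only, an index could list each transcript with its word count. `collect_metadata` and `render_index` below are hypothetical helpers, and the two-column layout is an assumption.

```python
from pathlib import Path
from typing import Dict, List


def collect_metadata(folder: str) -> List[Dict[str, object]]:
    """Gather file name and word count for every transcript in the folder."""
    rows: List[Dict[str, object]] = []
    for path in sorted(Path(folder).glob("*.md")):
        if path.name == "INDEX.md":
            continue  # do not index the index itself
        text = path.read_text(encoding="utf-8")
        rows.append({"file": path.name, "words": len(text.split())})
    return rows


def render_index(rows: List[Dict[str, object]]) -> str:
    """Render the metadata as a markdown table (column choice is an assumption)."""
    lines = ["| File | Words |", "|------|-------|"]
    lines += [f"| {row['file']} | {row['words']} |" for row in rows]
    return "\n".join(lines) + "\n"
```

Word counts in the index make it easy to spot transcripts long enough to need chunking before they are read.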
Step 2: Generate Summaries
For EACH interview, read the full transcript and produce a summary in this exact format:
# Summary: [Interviewee Name / Title]
**Date:** [date] | **Role:** [professional role] | **Organization:** [org] | **Duration:** [if available]
## Context
[1-2 sentences: Who is this person? Why were they interviewed? What is their relevance?]
## Key Statements (verbatim quotes)
1. "[Direct quote — max 2 sentences]" — on [topic]
2. "[Direct quote — max 2 sentences]" — on [topic]
3. "[Direct quote — max 2 sentences]" — on [topic]
[3-5 quotes that capture the most analytically valuable statements]
## Core Themes Discussed
- **[Theme A]:** [2-3 sentence summary of their position/experience]
- **[Theme B]:** [2-3 sentence summary]
- **[Theme C]:** [2-3 sentence summary]
## Unique Insights
[1-2 sentences: What does this interviewee say that NO other interviewee says?
What is their unique contribution to the data set?]
## Relevance to Research Questions
- **RQ1:** [How does this interview inform RQ1? One sentence.]
- **RQ2:** [How does this interview inform RQ2? One sentence.]
- **RQ3:** [How does this interview inform RQ3? One sentence.]
[Adapt RQs from framing.md]
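Because the format is meant to be exact, a quick structural check before saving can catch a malformed summary. This is a minimal sketch: `check_summary` is a hypothetical helper, and the 400-word ceiling is an assumed tolerance around the ~300-word target.

```python
import re
from typing import List

REQUIRED_SECTIONS = [
    "Context",
    "Key Statements (verbatim quotes)",
    "Core Themes Discussed",
    "Unique Insights",
    "Relevance to Research Questions",
]


def check_summary(summary: str, max_words: int = 400) -> List[str]:
    """Return a list of format problems; an empty list means the summary passes."""
    problems: List[str] = []
    if not summary.lstrip().startswith("# Summary:"):
        problems.append("missing '# Summary:' title line")
    headings = re.findall(r"^## (.+?)\s*$", summary, flags=re.MULTILINE)
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Numbered lines that open with a double quote count as verbatim quotes.
    quotes = re.findall(r'^\d+\.\s+"', summary, flags=re.MULTILINE)
    if not 3 <= len(quotes) <= 5:
        problems.append(f"expected 3-5 numbered quotes, found {len(quotes)}")
    if len(summary.split()) > max_words:
        problems.append("summary exceeds the word budget")
    return problems
```

The check only looks at structure; whether the quotes are truly verbatim still has to be verified against the transcript.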
Step 3: Save Summaries
save_summary(summary_text, "interviews/summaries/[filename]_summary.md")
Context Budget Rule
- Process interviews one at a time: read transcript → generate summary → save → move to next
- After all summaries are generated, the summaries/ directory becomes the primary data source
- Total summary corpus: ~23 interviews × 300 words = ~7,000 words (fits easily in context)
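The budget arithmetic can be checked with a few lines. The helper names below are illustrative; the 300-word summary length and the 23-interview count come from the text above.

```python
def summary_corpus_words(n_interviews: int, words_per_summary: int = 300) -> int:
    """Total size of the summary corpus in words."""
    return n_interviews * words_per_summary


def context_reduction(transcript_words: int, summary_words: int = 300) -> float:
    """Fraction of the original transcript length saved by summarizing."""
    if transcript_words <= 0:
        raise ValueError("transcript_words must be positive")
    return 1 - summary_words / transcript_words
```

For 23 interviews this gives 6,900 words, which the rule rounds to roughly 7,000. The stated 80-90% reduction corresponds to transcripts of about 1,500 to 3,000 words; longer transcripts save proportionally more.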
Phase 2: Initial Coding (First-Order Codes)
Purpose
Identify empirical codes grounded in the data — what interviewees actually say.
Input
Load ALL summary files (not full transcripts):
interviews/summaries/*.md
Coding Process
- Read all summaries in sequence
- Identify recurring patterns across interviews:
  - What topics come up repeatedly?
  - What language do interviewees use?
  - What problems, solutions, or experiences are described?
- Generate first-order codes, staying close to the data:
  - Use informant-centric language (in-vivo codes where possible)
  - Each code = a specific empirical observation, not an abstract concept
- Target: 20-40 first-order codes
Output Format: Codebook v1
# Codebook v1 — First-Order Codes
**Date:** [date]
**Interviews coded:** [N]
**Total codes:** [N]
| Code ID | Code Label | Description | Example Quote | Frequency |
|---------|-----------|-------------|---------------|-----------|
| C01 | [label] | [what this code captures] | "[short quote]" — [interviewee] | [N interviews] |
| C02 | [label] | [what this code captures] | "[short quote]" — [interviewee] | [N interviews] |
| ... | | | | |
Save to: outputs/codebook_v1.md
Code-to-Interview Matrix
| Code | Interview 1 | Interview 2 | Interview 3 | ... | Total |
|------|------------|------------|------------|-----|-------|
| C01 | ✓ | ✓ | | ... | N |
| C02 | | ✓ | ✓ | ... | N |
Save to: outputs/code_matrix.md
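The matrix and the codebook's Frequency column can both be derived from one mapping of codes to the interviews in which they occur. A minimal sketch, assuming that mapping is kept as a dictionary of sets; `render_code_matrix` is a hypothetical helper.

```python
from typing import Dict, List, Set


def render_code_matrix(assignments: Dict[str, Set[str]], interviews: List[str]) -> str:
    """Render a code-by-interview table with a check mark per occurrence."""
    header = "| Code | " + " | ".join(interviews) + " | Total |"
    divider = "|" + "------|" * (len(interviews) + 2)
    rows = [header, divider]
    for code in sorted(assignments):
        present = assignments[code]
        marks = ["✓" if name in present else "" for name in interviews]
        total = sum(1 for mark in marks if mark)  # number of interviews with this code
        rows.append(f"| {code} | " + " | ".join(marks) + f" | {total} |")
    return "\n".join(rows)
```

Computing the total from the same data that produces the check marks keeps the matrix and the codebook frequencies from drifting apart.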
Phase 3: Thematic Grouping (Second-Order Themes)
Purpose
Group first-order codes into higher-level analytical themes.
Process
- Review the codebook — look for clusters of related codes
- Group codes into themes — each theme aggregates 2-5 first-order codes
- Name themes analytically — researcher language, not informant language
- Target: 5-10 second-order themes
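The numeric targets in this process lend themselves to a quick sanity check. The sketch below is illustrative: `check_theme_grouping` is a hypothetical helper, and treating every unassigned code as a problem is an assumption, since the process does not say whether all codes must belong to a theme.

```python
from typing import Dict, List


def check_theme_grouping(themes: Dict[str, List[str]], codebook_ids: List[str]) -> List[str]:
    """Compare a theme grouping against the 5-10 theme and 2-5 code targets."""
    problems: List[str] = []
    if not 5 <= len(themes) <= 10:
        problems.append(f"{len(themes)} themes; target is 5-10")
    assigned: List[str] = []
    for name, codes in themes.items():
        if not 2 <= len(codes) <= 5:
            problems.append(f"theme '{name}' has {len(codes)} codes; expected 2-5")
        assigned.extend(codes)
    unknown = sorted(set(assigned) - set(codebook_ids))
    if unknown:
        problems.append("codes not in codebook: " + ", ".join(unknown))
    # Assumption: every first-order code should end up in some theme.
    unassigned = sorted(set(codebook_ids) - set(assigned))
    if unassigned:
        problems.append("codes not assigned to any theme: " + ", ".join(unassigned))
    return problems
```

The output is a list of warnings rather than a hard failure, because the targets are guidelines and a well-argued exception may be acceptable.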
For Gioia Method
If using the Gioia methodology, produce the three-level data structure:
# Gioia Data Structure
| First-Order Codes (Informant) | Second-Order Themes (Researcher) | Aggregate Dimensions |
|------------------------------|----------------------------------|---------------------|
| "We just got the tool and figured it out" | Ad-hoc AI adoption | **Unstructured Implementation** |
| "Nobody trained us on how to use it" | Missing capability building | |
| "We spent 2 hours on what used to take 2 days" | Efficiency gains from AI | **Process Transformation** |
| "The routine work basically disappeared" | Routine task elimination | |
| ... | | |
Save to: outputs/gioia_data_structure.md
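In the example table, an aggregate dimension is printed only on the first row of its group. That layout can be generated from a flat list of (quote, theme, dimension) triples; the following is a sketch with a hypothetical helper name, `render_gioia_rows`.

```python
from typing import List, Tuple


def render_gioia_rows(entries: List[Tuple[str, str, str]]) -> List[str]:
    """Build table rows, showing each aggregate dimension once per group."""
    rows: List[str] = []
    previous = None
    for quote, theme, dimension in entries:
        # Leave the dimension cell empty when it repeats the row above.
        cell = f"**{dimension}**" if dimension != previous else ""
        rows.append(f'| "{quote}" | {theme} | {cell} |')
        previous = dimension
    return rows
```

Entries need to be sorted by dimension beforehand; otherwise a dimension that reappears later in the list is printed again.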
For Thematic Analysis (Braun & Clarke)
# Theme Map
## Theme 1: [Name]
**Definition:** [What this theme captures]
**Codes included:** C01, C05, C12
**Prevalence:** [N] of [N] interviews
**Key insight:** [1 sentence]
## Theme 2: [Name]
...
Save to: outputs/theme_map.md
For Mayring Content Analysis
# Category System
## Main Category 1: [Name]
### Sub-Category 1.1: [Name]
**Definition:** [precise definition]
**Anchor example:** "[quote]" — [interviewee]
**Coding rule:** [when to apply this code]
### Sub-Category 1.2: [Name]
...
Save to: outputs/category_system.md
Phase 4: Cross-Case Analysis
Purpose
Compare interviews systematically to identify patterns and outliers.
Cross-Case Theme Matrix
# Cross-Case Analysis
| Interviewee | Role | Theme 1 | Theme 2 | Theme 3 | Theme 4 | Theme 5 |
|------------|------|---------|---------|---------|---------|---------|
| [Name 1] | [Role] | Strong | Moderate | Absent | Strong | Weak |
| [Name 2] | [Role] | Weak | Strong | Strong | Absent | Moderate |
| ... | | | | | | |
## Pattern Analysis
- **Universal themes** (present in >80% of interviews): [list]
- **Majority themes** (50-80%): [list]
- **Minority/emerging themes** (20-49%): [list]
- **Outlier insights** (<20%, but analytically important): [list]
## Role-Based Patterns
- **Analysts** tend to emphasize: [themes]
- **Senior management** tends to emphasize: [themes]
- **Technology roles** tend to emphasize: [themes]
## Contradictions and Tensions
- [Theme X] vs. [Theme Y]: [describe the tension and which interviewees represent each side]
Save to: outputs/cross_case_analysis.md
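The prevalence bands in the Pattern Analysis section can be assigned from simple counts. In this sketch, `prevalence_band` is a hypothetical helper, and the handling of exact boundaries (80% counts as majority, 50% as majority, 20% as minority/emerging) is an assumption where the template leaves it open.

```python
def prevalence_band(present_in: int, total: int) -> str:
    """Classify a theme by the share of interviews in which it appears."""
    if total <= 0 or not 0 <= present_in <= total:
        raise ValueError("need total > 0 and 0 <= present_in <= total")
    share = present_in / total
    if share > 0.8:
        return "universal"
    if share >= 0.5:
        return "majority"
    if share >= 0.2:
        return "minority/emerging"
    return "outlier"
```

With 23 interviews, a theme needs to appear in at least 19 to count as universal and in at least 12 to count as a majority theme. Whether an outlier is analytically important remains a judgment call that the counts cannot make.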
Phase 5: Evidence Retrieval (Quote Mining)
Purpose
Go back to FULL transcripts to extract verbatim quotes for specifi