AI-Native Product Development
"AI products aren't deterministic. They require continuous calibration, not just A/B tests."
This skill covers AI-Native Product Development — the overlay that modifies discovery, architecture, and delivery when AI is at the core. It addresses the unique challenges of building products where AI agents perform tasks autonomously.
Part of: Modern Product Operating Model — a collection of composable product skills.
Related skills: product-strategy, product-discovery, product-architecture, product-delivery, product-leadership
When to Use This Skill
Use this skill when:
- Building AI agents that act on behalf of users
- Adding LLM-powered features to existing products
- Designing human-AI interaction patterns
- Deciding how much autonomy to give AI
- Setting up eval strategies and calibration loops
- Managing the "agency-control tradeoff"
Not needed for: traditional software products, or ML models used only for backend optimization (no user-facing autonomy).
What Makes AI Products Different
Traditional Software vs. AI Products
| Dimension | Traditional Software | AI-Native Products |
|---|---|---|
| Behavior | Deterministic | Probabilistic |
| Testing | Unit tests, QA | Evals, calibration |
| Correctness | Binary (works or doesn't) | Spectrum (good enough?) |
| User role | Operator | Delegator + Reviewer |
| Failure mode | Error messages | Plausible but wrong outputs |
| Iteration | Ship → Measure → Iterate | Ship → Observe → Calibrate |
| Trust building | Feature completeness | Demonstrated reliability |
The Core Challenge
AI products must navigate a fundamental tension:
More autonomy = More value (fewer steps, faster outcomes)
More autonomy = More risk (errors affect real work)
This is the Agency-Control Tradeoff.
Framework: The CCCD Loop
Credit: Aishwarya Goel & Kiriti Gavini
AI products require a Continuous Calibration and Confidence Development (CCCD) loop:
┌─────────────────────────────────────────────────────────────────────┐
│                              CCCD LOOP                              │
│                                                                     │
│  CALIBRATE  →  CONFIDENCE  →  CONTINUOUS DISCOVERY  →  CALIBRATE    │
│  ↓             ↓              ↓                        ↓            │
│  Eval and      Build user     Observe AI               Update evals │
│  adjust AI     trust over     interactions             and models   │
│  behavior      time           at scale                              │
└─────────────────────────────────────────────────────────────────────┘
CCCD Components:
| Component | Purpose | Activities |
|---|---|---|
| Calibrate | Tune AI behavior to match user expectations | Run evals, adjust prompts/models, set guardrails |
| Confidence | Build appropriate user trust | Show AI reasoning, enable verification, demonstrate reliability |
| Continuous Discovery | Observe AI-user interactions at scale | Log interactions, identify failure patterns, surface edge cases |
| → Back to Calibrate | Update based on learnings | Improve evals, retrain, adjust prompts |
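As a rough illustration of the Calibrate step and the feedback from Continuous Discovery, the sketch below runs a small eval set and then adds a newly observed failure as a permanent case. `EvalCase`, `run_evals`, and `toy_model` are hypothetical names made up for this example; a real setup would call an actual model and use richer checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, list[str]]:
    """Return the pass rate and the prompts of the failing cases."""
    failures = [case.prompt for case in cases if not case.check(model(case.prompt))]
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return prompt.upper()


cases = [
    EvalCase("hello", lambda out: out == "HELLO"),
    EvalCase("2+2", lambda out: "4" in out),
]
pass_rate, failures = run_evals(toy_model, cases)

# Continuous Discovery feeds Calibrate: a failure observed in production
# becomes a permanent eval case.
cases.append(EvalCase("what is 3+3?", lambda out: "6" in out))
updated_pass_rate, updated_failures = run_evals(toy_model, cases)
```

The failing prompts, rather than the pass rate alone, are what drive the next round of prompt, model, or guardrail adjustments.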
The Agency-Control Progression
Five Levels of AI Agency
| Level | Description | AI Does | User Does | Example |
|---|---|---|---|---|
| 1. Assist | AI suggests, user executes | Generates options | Chooses and acts | Autocomplete, suggestions |
| 2. Recommend | AI ranks, user approves | Analyzes and recommends | Reviews and approves | "AI recommends these 3 actions" |
| 3. Execute with confirmation | AI acts after approval | Prepares action | Confirms before execution | "Send this email?" → Yes/No |
| 4. Execute with notification | AI acts, notifies after | Acts autonomously | Reviews outcomes | "I scheduled the meeting and sent invites" |
| 5. Fully autonomous | AI acts without notification | Handles end-to-end | Sets goals, reviews exceptions | AI handles routine tasks silently |
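One way to make the five levels concrete is to route every proposed action through a single function that depends on the level. The sketch below is illustrative only: `AgencyLevel` and `handle_action` are hypothetical names, and the returned strings stand in for whatever the product would actually do.

```python
from enum import IntEnum


class AgencyLevel(IntEnum):
    ASSIST = 1
    RECOMMEND = 2
    EXECUTE_WITH_CONFIRMATION = 3
    EXECUTE_WITH_NOTIFICATION = 4
    FULLY_AUTONOMOUS = 5


def handle_action(level: AgencyLevel, action: str, user_approved: bool = False) -> str:
    """Describe what happens to a proposed action at a given agency level."""
    if level <= AgencyLevel.RECOMMEND:
        # Levels 1-2: the AI only proposes; nothing runs unless the user picks it.
        return f"suggest: {action}"
    if level == AgencyLevel.EXECUTE_WITH_CONFIRMATION:
        # Level 3: the AI prepares the action and runs it only after approval.
        return f"execute: {action}" if user_approved else f"await confirmation: {action}"
    if level == AgencyLevel.EXECUTE_WITH_NOTIFICATION:
        # Level 4: the AI acts first and tells the user afterwards.
        return f"execute and notify: {action}"
    # Level 5: the AI acts silently; the user reviews exceptions only.
    return f"execute: {action}"
```

Keeping the level as an explicit value, rather than scattering confirmation logic across features, makes it easier to raise or lower autonomy per feature later.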
Progression Strategy
Start lower, earn higher:
Level 1 → Build trust → Level 2 → Demonstrate reliability → Level 3 → ...
Graduation Criteria:
| Transition | From → To | Requires |
|---|---|---|
| 1 → 2 | Assist → Recommend | User accepts > 70% of suggestions |
| 2 → 3 | Recommend → Execute+confirm | User approves > 80% of recommendations |
| 3 → 4 | Execute+confirm → Execute+notify | User confirms > 90% of actions without editing |
| 4 → 5 | Execute+notify → Autonomous | User overrides < 5% of actions; high-stakes scenarios excluded |
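The graduation criteria can be expressed as a small check. This is a minimal sketch: the thresholds come from the table, while `LevelStats`, its field names, and `can_graduate` are assumptions made for illustration. How the rates are measured (time window, sample size) is left open here.

```python
from dataclasses import dataclass


@dataclass
class LevelStats:
    """Observed rates at the current level, each in the range 0.0-1.0."""
    acceptance_rate: float = 0.0       # level 1: suggestions accepted
    approval_rate: float = 0.0         # level 2: recommendations approved
    confirm_no_edit_rate: float = 0.0  # level 3: confirmations without edits
    override_rate: float = 1.0         # level 4: outcomes the user overrode
    high_stakes: bool = True           # level 4: task includes high-stakes scenarios


def can_graduate(current_level: int, stats: LevelStats) -> bool:
    """Apply the graduation thresholds for moving up one level."""
    if current_level == 1:
        return stats.acceptance_rate > 0.70
    if current_level == 2:
        return stats.approval_rate > 0.80
    if current_level == 3:
        return stats.confirm_no_edit_rate > 0.90
    if current_level == 4:
        return stats.override_rate < 0.05 and not stats.high_stakes
    return False  # level 5 is the top; unknown levels never graduate
```

The defaults are deliberately pessimistic, so a feature with no recorded data never graduates by accident.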
Never fully autonomous for:
- Irreversible actions (delete, send, purchase)
- High-stakes decisions (financial, legal, health)
- Novel situations outside the training distribution
- Actions affecting third parties
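These exclusions can be enforced as a cap on whatever level a feature has earned. In the sketch below, `ActionProfile`, `effective_level`, and the choice to cap at level 4 are assumptions for illustration; a team might reasonably cap some categories lower, for example at level 3.

```python
from dataclasses import dataclass

# Assumption: restricted actions are capped one level below fully autonomous.
MAX_LEVEL_WHEN_RESTRICTED = 4


@dataclass
class ActionProfile:
    irreversible: bool = False           # e.g. delete, send, purchase
    high_stakes: bool = False            # e.g. financial, legal, health
    novel_situation: bool = False        # outside what the system was trained on
    affects_third_parties: bool = False


def effective_level(earned_level: int, profile: ActionProfile) -> int:
    """Return the level to use for this action, applying the autonomy cap."""
    restricted = (
        profile.irreversible
        or profile.high_stakes
        or profile.novel_situation
        or profile.affects_third_parties
    )
    return min(earned_level, MAX_LEVEL_WHEN_RESTRICTED) if restricted else earned_level
```

Applying the cap per action, not per feature, lets a level 5 feature still fall back to notification when a single request turns out to be irreversible.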
AI-Native Discovery
Standard discovery practices need adaptation for AI products.
Modified Discovery Focus
| Standard Discovery | AI-Native Adaptation |
|---|---|
| "What job are you trying to do?" | + "How much do you want to delegate?" |
| "What's your current workflow?" | + "Which steps are you comfortable AI handling?" |
| "What would success look like?" | + "What errors would be unacceptable?" |
| "Show me how you do this today" | + "Show me how you verify AI work today" |
AI-Specific Discovery Questions
Delegation appetite:
- "Which parts of this task feel tedious vs. require your judgment?"
- "If AI made an error here, what would the consequences be?"
- "How would you want to verify AI's work?"
Trust calibration:
- "What would AI need to demonstrate before you'd trust it to [action]?"
- "Have you used AI tools before? What built or broke your trust?"
- "Would you prefer AI to do more but occasionally err, or do less perfectly?"
Failure tolerance:
- "What kinds of errors are annoying vs. damaging?"
- "How quickly do you need to catch and fix AI mistakes?"
- "What's your 'undo' option if AI gets it wrong?"
Observing AI Interactions
In addition to interviews, AI discovery includes:
| Method | What to Look For |
|---|---|
| Session recordings | Where do users override AI? Where do they accept blindly? |
| Interaction logs | Patterns in edits, rejections, corrections |
| Feedback analysis | Explicit signals (thumbs down, ratings) |
| Support tickets | AI-related complaints and confusion |
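Interaction logs can be summarized into per-feature outcome rates to show where users override or accept AI output. The log format below (a `feature` and an `outcome` per record) is a hypothetical example, not a prescribed schema.

```python
from collections import Counter

# Hypothetical log format: one dict per AI suggestion, with the user's response.
logs = [
    {"feature": "email_draft", "outcome": "accepted"},
    {"feature": "email_draft", "outcome": "edited"},
    {"feature": "email_draft", "outcome": "rejected"},
    {"feature": "scheduling", "outcome": "accepted"},
    {"feature": "scheduling", "outcome": "accepted"},
]


def outcome_rates(records: list[dict], feature: str) -> dict[str, float]:
    """Share of each outcome for one feature; empty dict if it has no records."""
    counts = Counter(r["outcome"] for r in records if r["feature"] == feature)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


email_rates = outcome_rates(logs, "email_draft")
scheduling_rates = outcome_rates(logs, "scheduling")
```

A very high acceptance rate is not automatically good news: paired with session recordings, it may reveal users accepting output without reviewing it.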
AI-Native Architecture
Solution Brief Additions
For AI features, add the following section to the standard solution brief:
AI-SPECIFIC SECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AGENCY LEVEL
Target: [Level 1-5]
Graduation path: [How might this evolve?]
FAILURE MODES
• [Failure mode 1]: [Consequence] → [Mitigation]
• [Failure mode 2]: [Consequence] → [Mitigation]
EVAL STRATEGY
• [Eval type 1]: [What we measure, how often]
• [Eval type 2]: [What we measure, how often]
CALIBRATION PLAN
• Initial calibration: [Approach]
• Ongoing calibration: [Cadence, triggers]
CONFIDENCE BUILDING
• How AI explains itself: [Approach]
• How users verify: [Mechanisms]
• Trust-building milestones: [Progression]
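If briefs are stored as structured data, the AI-specific section could be modeled so that incomplete briefs are easy to flag. This is only a sketch: `AISolutionBrief`, `FailureMode`, and `missing_sections` are hypothetical names that mirror the template's headings.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    description: str
    consequence: str
    mitigation: str


@dataclass
class AISolutionBrief:
    target_agency_level: int
    graduation_path: str
    failure_modes: list[FailureMode] = field(default_factory=list)
    eval_strategy: list[str] = field(default_factory=list)
    initial_calibration: str = ""
    ongoing_calibration: str = ""
    confidence_building: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of template sections that are still empty or invalid."""
        missing = []
        if not 1 <= self.target_agency_level <= 5:
            missing.append("agency level")
        if not self.failure_modes:
            missing.append("failure modes")
        if not self.eval_strategy:
            missing.append("eval strategy")
        if not (self.initial_calibration and self.ongoing_calibration):
            missing.append("calibration plan")
        if not self.confidence_building:
            missing.append("confidence building")
        return missing


draft = AISolutionBrief(
    target_agency_level=3,
    graduation_path="Move to level 4 once confirmations rarely need edits",
)
```

Calling `draft.missing_sections()` lists what still needs to be filled in before the brief is reviewed.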
AI Bet Categories
In addition to standard bet categories:
| Category | Description | Example |
|---|---|---|
| Capability expansion | AI can handle new task types | "AI can now summarize documents" |
| Agency graduation | Move to higher autonomy level | "AI sends emails without confirmation" |
| Calibration improvement | Better accuracy/reliability | "Reduce hallucination rate from 5% to 2%" |
| Confidence building | Better user trust | "Show AI reasoning before action" |
| Guardrail strengthening | Prevent harmful outputs | "Add content policy enforcement" |
AI-Native Delivery
Eval Strategy (Replaces Traditional Testing)
Eval Types:
| Eval Type |