AI-Native Product Development

"AI products aren't deterministic. They require continuous calibration, not just A/B tests."

This skill covers AI-Native Product Development — the overlay that modifies discovery, architecture, and delivery when AI is at the core. It addresses the unique challenges of building products where AI agents perform tasks autonomously.

Part of: Modern Product Operating Model — a collection of composable product skills.

Related skills: product-strategy, product-discovery, product-architecture, product-delivery, product-leadership

When to Use This Skill

Use this skill when:

Building AI agents that act on behalf of users
Adding LLM-powered features to existing products
Designing human-AI interaction patterns
Deciding how much autonomy to give AI
Setting up eval strategies and calibration loops
Managing the "agency-control tradeoff"

Not needed for: traditional software products, or ML models used only for backend optimization (no user-facing autonomy).

What Makes AI Products Different

Traditional Software vs. AI Products

Dimension	Traditional Software	AI-Native Products
Behavior	Deterministic	Probabilistic
Testing	Unit tests, QA	Evals, calibration
Correctness	Binary (works or doesn't)	Spectrum (good enough?)
User role	Operator	Delegator + Reviewer
Failure mode	Error messages	Plausible but wrong outputs
Iteration	Ship → Measure → Iterate	Ship → Observe → Calibrate
Trust building	Feature completeness	Demonstrated reliability

The Core Challenge

AI products must navigate a fundamental tension:

More autonomy = More value (fewer steps, faster outcomes)
More autonomy = More risk (errors affect real work)

This is the Agency-Control Tradeoff.

Framework: The CCCD Loop

Credit: Aishwarya Goel & Kiriti Gavini

AI products require a Continuous Calibration and Confidence Development (CCCD) loop:

┌─────────────────────────────────────────────────────────────────┐
│                        CCCD LOOP                                │
│                                                                 │
│    CALIBRATE → CONFIDENCE → CONTINUOUS DISCOVERY → CALIBRATE   │
│         ↓           ↓              ↓                 ↓         │
│     Eval and    Build user    Observe AI       Update evals    │
│     adjust AI    trust over   interactions     and models      │
│     behavior     time         at scale                         │
└─────────────────────────────────────────────────────────────────┘

CCCD Components:

Component	Purpose	Activities
Calibrate	Tune AI behavior to match user expectations	Run evals, adjust prompts/models, set guardrails
Confidence	Build appropriate user trust	Show AI reasoning, enable verification, demonstrate reliability
Continuous Discovery	Observe AI-user interactions at scale	Log interactions, identify failure patterns, surface edge cases
→ Back to Calibrate	Update based on learnings	Improve evals, retrain, adjust prompts
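
One pass through the loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CCCD authors' implementation: `EvalCase`, `run_evals`, `fold_in_observations`, and `toy_model` are hypothetical names, and the substring check stands in for whatever grading a real eval would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # substring the output must contain (a deliberately crude check)


def run_evals(
    model: Callable[[str], str], cases: list[EvalCase]
) -> tuple[float, list[EvalCase]]:
    """Calibrate step: return the pass rate and the cases that failed."""
    failures = [c for c in cases if c.expected.lower() not in model(c.prompt).lower()]
    pass_rate = 1.0 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


def fold_in_observations(
    cases: list[EvalCase], observed: list[EvalCase]
) -> list[EvalCase]:
    """Back-to-Calibrate step: add observed failures to the eval set, skipping duplicates."""
    return cases + [c for c in observed if c not in cases]


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same answer."""
    return "Meeting scheduled for Tuesday."


eval_set = [EvalCase("Schedule a sync on Tuesday", "tuesday")]
pass_rate, failures = run_evals(toy_model, eval_set)

# A failure pattern surfaced by continuous discovery becomes a new eval case.
eval_set = fold_in_observations(eval_set, [EvalCase("Cancel the Tuesday sync", "cancel")])
new_pass_rate, new_failures = run_evals(toy_model, eval_set)
```

The point of the sketch is the feedback path: the eval set grows from observed failures, so a pass rate that looked fine before discovery drops once the new case is included, and that drop is what triggers the next calibration.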

The Agency-Control Progression

Five Levels of AI Agency

Level	Description	AI Does	User Does	Example
1. Assist	AI suggests, user executes	Generates options	Chooses and acts	Autocomplete, suggestions
2. Recommend	AI ranks, user approves	Analyzes and recommends	Reviews and approves	"AI recommends these 3 actions"
3. Execute with confirmation	AI acts after approval	Prepares action	Confirms before execution	"Send this email?" → Yes/No
4. Execute with notification	AI acts, notifies after	Acts autonomously	Reviews outcomes	"I scheduled the meeting and sent invites"
5. Fully autonomous	AI acts without notification	Handles end-to-end	Sets goals, reviews exceptions	AI handles routine tasks silently

Progression Strategy

Start lower, earn higher:

Level 1 → Build trust → Level 2 → Demonstrate reliability → Level 3 → ...

Graduation Criteria:

Transition	From → To	Requires
1 → 2	Assist → Recommend	User accepts > 70% of suggestions
2 → 3	Recommend → Execute with confirmation	User approves > 80% of recommendations
3 → 4	Execute with confirmation → Execute with notification	User confirms > 90% of actions without editing
4 → 5	Execute with notification → Fully autonomous	User overrides < 5% of actions; high-stakes scenarios excluded
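
The graduation criteria can be encoded as a simple check. The sketch below is illustrative only: `AgencyLevel` and `can_graduate` are hypothetical names, the thresholds are copied from the table, and how each rate is measured (time window, sample size) is left as an assumption for the team to decide.

```python
from enum import IntEnum


class AgencyLevel(IntEnum):
    ASSIST = 1
    RECOMMEND = 2
    EXECUTE_WITH_CONFIRMATION = 3
    EXECUTE_WITH_NOTIFICATION = 4
    FULLY_AUTONOMOUS = 5


def can_graduate(level: AgencyLevel, rate: float, high_stakes: bool = False) -> bool:
    """Return True if the observed rate meets the bar for moving one level up.

    `rate` is a fraction in [0, 1]; its meaning depends on the current level:
      1: share of suggestions accepted            (needs > 0.70)
      2: share of recommendations approved        (needs > 0.80)
      3: share of actions confirmed without edits (needs > 0.90)
      4: share of actions the user overrode       (needs < 0.05, never if high-stakes)
    """
    if level == AgencyLevel.ASSIST:
        return rate > 0.70
    if level == AgencyLevel.RECOMMEND:
        return rate > 0.80
    if level == AgencyLevel.EXECUTE_WITH_CONFIRMATION:
        return rate > 0.90
    if level == AgencyLevel.EXECUTE_WITH_NOTIFICATION:
        return rate < 0.05 and not high_stakes
    return False  # already fully autonomous: nothing to graduate to


ready = can_graduate(AgencyLevel.RECOMMEND, rate=0.85)
```

Note that the last transition flips direction: the first three bars are minimum acceptance rates, while the move to full autonomy requires the override rate to stay below a maximum.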

Never fully autonomous for:

Irreversible actions (delete, send, purchase)
High-stakes decisions (financial, legal, health)
Novel situations outside training distribution
Actions affecting third parties
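
These exclusions can be enforced as a cap on the agency level, independent of how well the AI has performed. The following is a rough sketch with hypothetical names (`ActionProfile`, `allowed_level`); capping at level 4 is an assumption about what "never fully autonomous" means in practice, and a team may well choose a lower cap for some categories.

```python
from dataclasses import dataclass

# Level numbers follow the five-level table: 4 = execute with notification,
# 5 = fully autonomous.
MAX_LEVEL_WHEN_RESTRICTED = 4  # assumption: restricted actions stop one level short


@dataclass(frozen=True)
class ActionProfile:
    """Flags mirroring the four exclusion categories listed above."""
    irreversible: bool = False
    high_stakes: bool = False
    novel_situation: bool = False
    affects_third_parties: bool = False


def allowed_level(requested: int, action: ActionProfile) -> int:
    """Cap the requested agency level if the action falls in any excluded category."""
    restricted = any(
        (
            action.irreversible,
            action.high_stakes,
            action.novel_situation,
            action.affects_third_parties,
        )
    )
    return min(requested, MAX_LEVEL_WHEN_RESTRICTED) if restricted else requested


purchase = ActionProfile(irreversible=True, high_stakes=True)
routine_tagging = ActionProfile()

purchase_level = allowed_level(5, purchase)
tagging_level = allowed_level(5, routine_tagging)
```

Keeping the cap separate from the graduation metrics means a strong track record cannot unlock full autonomy for an action that should never have it.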

AI-Native Discovery

Standard discovery practices need adaptation for AI products.

Modified Discovery Focus

Standard Discovery	AI-Native Adaptation
"What job are you trying to do?"	+ "How much do you want to delegate?"
"What's your current workflow?"	+ "Which steps are you comfortable AI handling?"
"What would success look like?"	+ "What errors would be unacceptable?"
"Show me how you do this today"	+ "Show me how you verify AI work today"

AI-Specific Discovery Questions

Delegation appetite:

"Which parts of this task feel tedious vs. require your judgment?"
"If AI made an error here, what would the consequences be?"
"How would you want to verify AI's work?"

Trust calibration:

"What would AI need to demonstrate before you'd trust it to [action]?"
"Have you used AI tools before? What built or broke your trust?"
"Would you prefer AI to do more but occasionally err, or do less perfectly?"

Failure tolerance:

"What kinds of errors are annoying vs. damaging?"
"How quickly do you need to catch and fix AI mistakes?"
"What's your 'undo' option if AI gets it wrong?"

Observing AI Interactions

In addition to interviews, AI discovery includes:

Method	What to Look For
Session recordings	Where do users override AI? Where do they accept blindly?
Interaction logs	Patterns in edits, rejections, corrections
Feedback analysis	Explicit signals (thumbs down, ratings)
Support tickets	AI-related complaints and confusion
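
Interaction logs become useful once they are reduced to rates per feature. Here is a minimal sketch, assuming a hypothetical log format of one record per AI suggestion with a `feature` name and an `outcome` of `accepted`, `edited`, or `rejected`; real logs will need their own schema.

```python
from collections import Counter

# Hypothetical log records: one dict per AI suggestion, with the user's response.
logs = [
    {"feature": "email_draft", "outcome": "accepted"},
    {"feature": "email_draft", "outcome": "edited"},
    {"feature": "email_draft", "outcome": "rejected"},
    {"feature": "scheduling", "outcome": "accepted"},
    {"feature": "scheduling", "outcome": "accepted"},
]


def outcome_rates(records: list[dict], feature: str) -> dict[str, float]:
    """Return the share of each outcome for one feature (empty dict if no records)."""
    outcomes = Counter(r["outcome"] for r in records if r["feature"] == feature)
    total = sum(outcomes.values())
    return {name: count / total for name, count in outcomes.items()} if total else {}


email_rates = outcome_rates(logs, "email_draft")
scheduling_rates = outcome_rates(logs, "scheduling")
```

A high edit or rejection rate points at a calibration problem. A very high acceptance rate deserves a second look too, since it can mean either that the AI is reliable or that users are accepting blindly; session recordings help tell the two apart.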

AI-Native Architecture

Solution Brief Additions

For AI features, add this section to the standard solution brief:

AI-SPECIFIC SECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AGENCY LEVEL
Target: [Level 1-5]
Graduation path: [How might this evolve?]

FAILURE MODES
• [Failure mode 1]: [Consequence] → [Mitigation]
• [Failure mode 2]: [Consequence] → [Mitigation]

EVAL STRATEGY
• [Eval type 1]: [What we measure, how often]
• [Eval type 2]: [What we measure, how often]

CALIBRATION PLAN
• Initial calibration: [Approach]
• Ongoing calibration: [Cadence, triggers]

CONFIDENCE BUILDING
• How AI explains itself: [Approach]
• How users verify: [Mechanisms]
• Trust-building milestones: [Progression]

AI Bet Categories

AI products add these categories to the standard bet categories:

Category	Description	Example
Capability expansion	AI can handle new task types	"AI can now summarize documents"
Agency graduation	Move to higher autonomy level	"AI sends emails without confirmation"
Calibration improvement	Better accuracy/reliability	"Reduce hallucination rate from 5% to 2%"
Confidence building	Better user trust	"Show AI reasoning before action"
Guardrail strengthening	Prevent harmful outputs	"Add content policy enforcement"

AI-Native Delivery

Eval Strategy (Replaces Traditional Testing)

Eval Types:

| Eval Type |
