Quality review
Use this skill after finishing a PRD, spec, or implementation plan — before implementation starts. It is NOT a bug hunt. It is a targeted check that the artifact, if shipped as written, will feel like a premium product — on the level of CleanMyMac, Raycast, Linear, Things, Stripe Dashboard — and not like a hobby project.
Invoke with: /superpowers-gstack:quality-review
When to invoke
Automatically after completing:
- A PRD, spec, or design document
- An implementation plan
- Output from `writing-specs`, `writing-plans`, `plan-design-review`, `plan-eng-review`, or any planning skill that produces an artifact ready to hand off to implementation
Run once before implementation. Re-run after substantial spec/plan revisions.
Relationship to pitfall-verification
quality-review complements pitfall-verification; the two do not overlap:
| Skill | Lens | Question |
|---|---|---|
| pitfall-verification | Correctness | "Will this work?" (bugs, security, contracts, edge cases) |
| quality-review | Perceived quality | "Will this feel good?" (silent failures, loading, empty/error states, polish) |
Recommended flow on a fresh artifact:
- `pitfall-verification` → fix bugs
- `quality-review` → fix feel
- Hand off to `writing-plans` / implementation
Both should be run. They catch different classes of issue.
Sequence
- Read the artifact in full. PRD/spec/plan — every section.
- Read the relevant existing code. Do not review the spec in isolation; cross-check its claims against the codebase. A spec that says "extend the existing PlanStore" needs to be verified against what `PlanStore` actually does today.
- Identify peer apps in the domain. For macOS productivity apps: CleanMyMac, Raycast, Linear, Things, Setapp. For web: Linear, Notion, Stripe Dashboard, Vercel. The governing question is: "would this feature, as specified, feel at home next to <peer>?"
- Walk the 15 categories below. For each: question → risk surface → verify against code → verdict.
- Research when uncertain. If you don't know current best practice for a category (e.g. "what's the right way to do AI structured output in 2026?"), use WebSearch / WebFetch and cite concrete sources (Anthropic docs, Apple HIG, etc.). Do not guess.
- Produce the verdict in the output format below.
The 15 categories
These are starting points, not exhaustive. Add domain-specific quality risks as they surface.
For each category: state the question, locate the risk surface in the artifact, verify by reading the code, and report N/A | HANDLED | NOT HANDLED — proposed fix.
1. Silent failures
Where in the flow can something fail without the user being told? AI calls returning nothing, parse errors that get "abandoned silently", retry loops with no UI signal, swallowed exceptions. Silent failure = user thinks the app is broken.
Risk surfaces: any try/catch with empty catch, any optional unwrap that defaults to "", any AI call without an error path.
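A minimal TypeScript sketch of the fix for this category, assuming a hypothetical `Reporter` callback that drives whatever banner or toast the app already has. `runSurfaced` and `Outcome` are illustrative names, not an existing API; the point is that both a thrown error and an empty AI response reach the user instead of defaulting to `""`.

```typescript
// Hypothetical sink for user-visible failures: a banner, toast, or inline message.
type Reporter = (userMessage: string, detail: unknown) => void;

// An explicit result instead of a silently defaulted value.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Replaces patterns like `try { text = await summarize() } catch {}`:
// every failure, including an empty response, reaches the reporter.
async function runSurfaced<T>(
  label: string,
  op: () => Promise<T>,
  report: Reporter,
  isEmpty: (value: T) => boolean = () => false,
): Promise<Outcome<T>> {
  try {
    const value = await op();
    if (isEmpty(value)) {
      const error = new Error(`${label} returned no content`);
      report(`${label} came back empty. Try again.`, error);
      return { ok: false, error };
    }
    return { ok: true, value };
  } catch (error) {
    // User-facing message in product language; the raw error is kept for logs.
    report(`${label} failed. Try again.`, error);
    return { ok: false, error };
  }
}
```

For an AI call, pass an `isEmpty` such as `(s) => s.trim() === ""` so a blank completion is treated as a failure rather than rendered as nothing.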
2. Loading states / progress
Is there any operation that takes >500ms without a spinner, skeleton, progress text, or cancel button? Specifically check: AI calls, file I/O, shell commands, scan/index operations, network requests.
Risk surfaces: spec sections that describe an operation without a "during" UI state.
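One way to meet the >500ms rule without flashing a spinner on fast operations, sketched in TypeScript. `withProgress` and `ProgressHooks` are hypothetical names; the 500ms default comes from the question above. Cancellation is left out to keep the sketch small; a real implementation would also wire a cancel button to the operation.

```typescript
// Callbacks a view would wire to its spinner, skeleton, or progress text.
interface ProgressHooks {
  onSlow: () => void;    // show the loading UI
  onSettled: () => void; // hide it again, on success or failure
}

// Runs `op`; the loading UI appears only if it is still running after `thresholdMs`.
async function withProgress<T>(
  op: () => Promise<T>,
  hooks: ProgressHooks,
  thresholdMs = 500,
): Promise<T> {
  let shown = false;
  const timer = setTimeout(() => {
    shown = true;
    hooks.onSlow();
  }, thresholdMs);
  try {
    return await op();
  } finally {
    // Always clear the timer, and only hide what was actually shown.
    clearTimeout(timer);
    if (shown) hooks.onSettled();
  }
}
```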
3. Empty states
What does the user see the first time a list/view is empty? A spec that does not define empty state = generic "no items" = bad first impression. Premium apps treat empty state as a teaching moment (Linear's empty inbox, Things' empty Today).
Risk surfaces: any new list/grid/table view in the spec.
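A sketch (TypeScript, hypothetical names) of treating "empty" as a first-class view state, so neither the spec nor the implementation can ship a list view without deciding what the empty case says. The exhaustive `switch` with a `never` check turns a forgotten state into a compile error; the sample copy ("No projects yet") is illustrative.

```typescript
// Every list view is one of these states; "empty" carries its own teaching copy.
type ListState<T> =
  | { kind: "loading" }
  | { kind: "empty"; title: string; hint: string; actionLabel: string }
  | { kind: "error"; message: string }
  | { kind: "loaded"; items: T[] };

function renderLabel<T>(state: ListState<T>): string {
  switch (state.kind) {
    case "loading":
      return "Loading…";
    case "empty":
      // A teaching moment: what the list is for and how to fill it.
      return `${state.title}: ${state.hint} [${state.actionLabel}]`;
    case "error":
      return `Error: ${state.message}`;
    case "loaded":
      return `${state.items.length} item(s)`;
    default: {
      // Compile error here if a new state is added but not rendered.
      const unreachable: never = state;
      return unreachable;
    }
  }
}

// Derive the state from data so an empty array never renders as a blank list.
function fromItems<T>(items: T[]): ListState<T> {
  return items.length === 0
    ? { kind: "empty", title: "No projects yet", hint: "Create one to start tracking work.", actionLabel: "New Project" }
    : { kind: "loaded", items };
}
```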
4. Error recovery
When something fails, how does the user get back on rails? Inline banner with retry? Toast? Blank screen? Each error path should have an explicit recovery action — not just "show error".
Risk surfaces: every failure mode listed in the spec; every code path that throws.
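A TypeScript sketch of pairing every failure with a recovery action. The failure kinds, messages, and handler names are hypothetical; typing the lookup as `Record<FailureKind, Banner>` means the compiler rejects a new failure kind that has no banner and action.

```typescript
// Hypothetical failure kinds a feature can surface.
type FailureKind = "offline" | "permissionDenied" | "invalidResponse";

// What the user sees: a message plus a way back on rails.
interface Banner {
  message: string;
  actionLabel: string;
  action: () => void;
}

function bannerFor(
  kind: FailureKind,
  handlers: { retry: () => void; openSettings: () => void },
): Banner {
  // Record<FailureKind, ...> forces a recovery for every kind.
  const table: Record<FailureKind, Banner> = {
    offline: { message: "You're offline.", actionLabel: "Retry", action: handlers.retry },
    permissionDenied: { message: "Access to the folder was denied.", actionLabel: "Open Settings", action: handlers.openSettings },
    invalidResponse: { message: "The response couldn't be read.", actionLabel: "Try Again", action: handlers.retry },
  };
  return table[kind];
}
```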
5. State drift / source-of-truth conflict
If the app's stored state and reality (filesystem, system settings, external API) can diverge, how is reconciliation handled? Classic example: user changes something outside the app, app still shows the old state.
Risk surfaces: any feature that mirrors external state (file watchers, system prefs, external APIs, login items, calendar events).
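A minimal reconciliation step, sketched in TypeScript under the assumption that both the stored state and the observed external state can be reduced to sets of IDs (login items, for example). `reconcile` is a hypothetical helper: it treats reality as the source of truth and returns the drift so the UI can say what changed. It would be called on app activation, on file-watcher events, or on a timer, not only at launch.

```typescript
// What changed outside the app since it last looked.
interface Drift {
  appearedExternally: string[]; // present in reality, unknown to the app
  removedExternally: string[];  // the app thinks it exists, reality disagrees
}

// Reality wins: the stored copy is replaced, and the differences are reported.
function reconcile(
  stored: ReadonlySet<string>,
  observed: ReadonlySet<string>,
): { next: Set<string>; drift: Drift } {
  const appearedExternally = Array.from(observed).filter((id) => !stored.has(id)).sort();
  const removedExternally = Array.from(stored).filter((id) => !observed.has(id)).sort();
  return { next: new Set(observed), drift: { appearedExternally, removedExternally } };
}
```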
6. Data-loss risk
Can an undo, revert, or overwrite operation destroy the user's manual edits without warning? Pay special attention to snapshot-based revert flows and to auto-fix that overwrites files.
Risk surfaces: any "revert", "restore", "auto-fix", "regenerate" action; any operation that writes to a path the user could have edited.
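A sketch of the guard a revert or auto-fix flow needs, in TypeScript. It assumes the app records the exact contents it last wrote to the path (storing a hash would work the same way); `planRevert` and `RevertPlan` are hypothetical. If the file on disk no longer matches, the user has edited it, so the revert must ask first and keep a recoverable copy.

```typescript
interface RevertPlan {
  needsConfirmation: boolean; // show a dialog naming the edits that will be lost
  backupCurrent: boolean;     // keep the user's version somewhere recoverable
  contentsToWrite: string;
}

// Assumption: the app tracks `lastWrittenByApp` for every path it manages.
function planRevert(snapshot: string, lastWrittenByApp: string, currentOnDisk: string): RevertPlan {
  // Any difference means someone other than the app changed the file.
  const editedByUser = currentOnDisk !== lastWrittenByApp;
  return {
    needsConfirmation: editedByUser,
    backupCurrent: editedByUser,
    contentsToWrite: snapshot,
  };
}
```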
7. Discoverability
Are there features that are technically implemented but nobody will find? Hidden tab, audit log without UI, keyboard shortcut without a menu item, settings panel buried three levels deep.
Risk surfaces: every feature in the spec — ask "how would a new user encounter this?"
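A lightweight check that can run over a command registry, sketched in TypeScript. The `Command` shape and `undiscoverable` helper are hypothetical; the rule they encode is the one above: a shortcut alone is not an entry point, so every command needs a menu item or a command-palette entry.

```typescript
// Hypothetical registry entry: each command lists where a user can find it.
interface Command {
  id: string;
  shortcut?: string;   // e.g. "⌘L"
  menuPath?: string[]; // e.g. ["View", "Show Audit Log"]
  inCommandPalette: boolean;
}

// Returns the IDs of commands that exist but have no visible entry point.
function undiscoverable(commands: Command[]): string[] {
  return commands
    .filter((c) => {
      const hasMenu = c.menuPath !== undefined && c.menuPath.length > 0;
      // A shortcut does not count: users learn shortcuts from menus and palettes.
      return !hasMenu && !c.inCommandPalette;
    })
    .map((c) => c.id);
}
```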
8. Multi-tenancy / scope isolation
If the app has workspaces, profiles, accounts, projects: is the new state correctly scoped, or is it global and leaking between tenants? This is one of the highest-impact pitfalls — global state in a workspaced app breaks the mental model on day one.
Risk surfaces: any `*.shared` singleton, any `~/Library/Application Support/<app>/*.json` path that isn't workspace-scoped, any `UserDefaults.standard` write.
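A TypeScript sketch of making scope structural rather than a convention: the hypothetical `ScopedStore` only hands out storage through `forWorkspace`, so there is no global write path to leak through. The same idea carries over to defaults keys or per-app JSON paths: derive every key or path from the workspace ID.

```typescript
// In-memory stand-in for UserDefaults, localStorage, or per-app JSON files.
class ScopedStore {
  private readonly data = new Map<string, string>();

  // The only way to read or write is to name the workspace first.
  forWorkspace(workspaceId: string) {
    // encodeURIComponent escapes ":", so one workspace's keys can't collide with another's.
    const prefix = `ws:${encodeURIComponent(workspaceId)}:`;
    return {
      get: (key: string): string | undefined => this.data.get(prefix + key),
      set: (key: string, value: string): void => {
        this.data.set(prefix + key, value);
      },
    };
  }
}
```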
9. Persistent operations mid-session
Is time-dependent state (snooze, cache expiry, scheduled actions, "remind me in 1h") re-evaluated during an active session, or only at app launch? "Only on launch" = app feels lazy. Premium apps re-check on a timer or via system notifications.
Risk surfaces: any feature involving time, expiry, or scheduling.
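A sketch, in TypeScript, of re-evaluating snoozes during the session rather than only at launch. `Snooze`, `evaluateSnoozes`, and `scheduleSnoozeChecks` are hypothetical. Timers can be delayed while the machine sleeps, so on macOS it is also worth re-running the evaluation on wake (for example from `NSWorkspace.didWakeNotification`).

```typescript
interface Snooze {
  id: string;
  until: number; // epoch milliseconds at which the item should reappear
}

// Which snoozes are due at `now`, and how long until the next one expires.
function evaluateSnoozes(
  items: Snooze[],
  now: number,
): { due: string[]; nextCheckInMs: number | null } {
  const due = items.filter((s) => s.until <= now).map((s) => s.id);
  const waits = items.filter((s) => s.until > now).map((s) => s.until - now);
  return { due, nextCheckInMs: waits.length > 0 ? Math.min(...waits) : null };
}

// Re-arms one timer for the next expiry so due items surface mid-session.
// A real implementation would keep the timer handle and cancel it when snoozes change.
function scheduleSnoozeChecks(items: Snooze[], onDue: (ids: string[]) => void): void {
  const now = Date.now();
  const { due, nextCheckInMs } = evaluateSnoozes(items, now);
  if (due.length > 0) onDue(due);
  if (nextCheckInMs === null) return;
  const remaining = items.filter((s) => s.until > now);
  // Delays above 2^31 - 1 ms misbehave in browsers and Node, so clamp and re-check.
  setTimeout(() => scheduleSnoozeChecks(remaining, onDue), Math.min(nextCheckInMs, 2147483647));
}
```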
10. Keyboard / native conventions
For macOS: ⌘W (close window), ⌘1-9 (tab switching), ⌘↩ (primary action), Space (preview), Esc (dismiss) — are platform conventions respected? For web: tab order, focus ring, Esc-to-dismiss, Enter-to-confirm? For iOS: swipe-to-go-back, edge gestures?
Risk surfaces: any new modal, sheet, list, or view introduced by the spec.
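For web modals, the Esc/Enter conventions can be centralized in one pure function, sketched here in TypeScript. `KeyInput` is a hypothetical subset of `KeyboardEvent`: the `key`, `metaKey`, `ctrlKey`, and `isComposing` fields exist on the real event, while `inMultilineField` is an assumption the caller computes. Keeping it pure lets the logic be tested outside a browser.

```typescript
// Minimal subset of KeyboardEvent fields, plus one caller-computed flag.
interface KeyInput {
  key: string;               // KeyboardEvent.key values, e.g. "Escape", "Enter"
  metaKey: boolean;          // ⌘ on macOS
  ctrlKey: boolean;
  isComposing: boolean;      // true while an IME is composing text
  inMultilineField: boolean; // hypothetical: focus is in a <textarea>
}

type ModalAction = "dismiss" | "confirm" | null;

function modalKeyAction(e: KeyInput): ModalAction {
  // Never hijack keys while an input method is composing text.
  if (e.isComposing) return null;
  if (e.key === "Escape") return "dismiss";
  if (e.key === "Enter") {
    const withModifier = e.metaKey || e.ctrlKey;
    // Plain Enter inserts a newline in a textarea; ⌘↩ / Ctrl+Enter still confirms.
    if (e.inMultilineField && !withModifier) return null;
    return "confirm";
  }
  return null;
}
```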
11. Animations / transitions
Does the spec rely on default framework animations (SwiftUI default, CSS default — both feel generic), or specify named easing (.spring, .snappy, cubic-bezier(...)) with rationale? Premium apps have signature motion. Generic ease-in-out reads as "AI-generated".
Risk surfaces: any sheet present/dismiss, any list reorder, any state change that is visible.
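A sketch of named motion tokens in TypeScript that emit CSS `transition` values. The token names, durations, and `cubic-bezier` curves are illustrative assumptions, not values taken from any peer app; the point is that every visible state change references a named, deliberate curve instead of the framework default.

```typescript
// One named set of motion tokens instead of framework defaults (values are illustrative).
const motion = {
  snappy: { durationMs: 180, easing: "cubic-bezier(0.2, 0.9, 0.3, 1)" },
  sheet: { durationMs: 280, easing: "cubic-bezier(0.32, 0.72, 0, 1)" },
  exit: { durationMs: 140, easing: "cubic-bezier(0.4, 0, 1, 1)" },
} as const;

type MotionToken = keyof typeof motion;

// Builds a CSS transition value such as "opacity 180ms cubic-bezier(...)".
function transition(properties: string[], token: MotionToken): string {
  const { durationMs, easing } = motion[token];
  return properties.map((p) => `${p} ${durationMs}ms ${easing}`).join(", ");
}
```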
12. AI-specific pitfalls
If the artifact involves LLM calls:
- Is structured output (JSON) implemented via prompt-engineered "respond with JSON" (~5–15% failure rate) or via native tool_use / structured output API (~0%)? In 2026, the latter is table stakes.
- Are fence-stripping (`` ```json ... ``` ``) and schema validation explicit?
- Is there a cap on output size (token limit, character limit)?
- Is there a fallback when the model returns malformed output?
- Is prompt caching used where the prompt has stable prefixes (>1024 tokens)?
If you are unsure of current best practice, use WebSearch to check Anthropic's docs.
Risk surfaces: any AI call in the spec.
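When native structured output or `tool_use` is not an option, the checks above can be made explicit. A TypeScript sketch, assuming a hypothetical `Summary` schema and an assumed 20,000-character cap: strip one wrapping `` ```json `` fence, parse, validate the shape by hand, and return a typed failure the caller can act on (retry once, then show an error with a recovery action) instead of abandoning it silently.

```typescript
// Hypothetical schema the model is asked to return.
interface Summary {
  title: string;
  tags: string[];
}

const MAX_CHARS = 20000; // assumed cap; pair it with a token limit on the request itself

// Removes a single wrapping ```json ... ``` (or bare ```) fence if present.
function stripFences(raw: string): string {
  const match = raw.trim().match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  const inner = match ? match[1] : undefined;
  return (inner ?? raw).trim();
}

// Hand-written schema check; a validation library would do the same job.
function isSummary(value: unknown): value is Summary {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.title === "string" && Array.isArray(v.tags) && v.tags.every((t) => typeof t === "string");
}

type Parsed = { ok: true; value: Summary } | { ok: false; reason: string };

function parseSummary(raw: string): Parsed {
  if (raw.length > MAX_CHARS) return { ok: false, reason: "output too large" };
  let json: unknown;
  try {
    json = JSON.parse(stripFences(raw));
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  return isSummary(json) ? { ok: true, value: json } : { ok: false, reason: "schema mismatch" };
}
```

The `reason` string is for logs and the retry decision; the user-facing message should still come from the app's error-recovery path.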
13. Privileged operations / sudo flows
If the app requires sudo, admin auth, or system permissions (Full Disk Access, Accessibility, etc.), is the flow framed as deliberate design (explanation sheet, consent, a clear "why we need this") or as a toast that pops up and disappears? "Just a toast" = feels cheap.
Risk surfaces: any TCC permission, any osascript with admin privileges, any Authorization Services call.
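A sketch of the permission request as an explicit flow, in TypeScript. The states and events are hypothetical; what matters is an explanation step before any system prompt, a "Not now" path that leaves the app usable, and a denied state that explains how to grant access later. Some macOS permissions (Full Disk Access, for instance) cannot be requested through a system prompt at all, so the `recheck` event models the user returning from System Settings.

```typescript
type PermissionState =
  | { step: "explain"; why: string } // sheet: what access is needed and why
  | { step: "awaitingSystem" }       // system prompt, auth dialog, or System Settings
  | { step: "granted" }
  | { step: "denied" }               // explain how to enable it later
  | { step: "skipped" };             // "Not now": feature stays off, app stays usable

type PermissionEvent = "continue" | "notNow" | "systemGranted" | "systemDenied" | "recheck";

function nextPermissionState(state: PermissionState, event: PermissionEvent): PermissionState {
  switch (state.step) {
    case "explain":
      if (event === "continue") return { step: "awaitingSystem" };
      if (event === "notNow") return { step: "skipped" };
      return state;
    case "awaitingSystem":
      if (event === "systemGranted") return { step: "granted" };
      if (event === "systemDenied") return { step: "denied" };
      return state;
    case "denied":
      // e.g. the user comes back from System Settings; check the permission again.
      if (event === "recheck") return { step: "awaitingSystem" };
      if (event === "notNow") return { step: "skipped" };
      return state;
    default:
      // "granted" and "skipped" are terminal in this sketch.
      return state;
  }
}
```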
14. Localization-readiness
Does the spec use LocalizedStringKey (SwiftUI) / t() / equivalent for user-facing strings, or are strings hardcoded? Even if the app ships English-only today, hardcoded strings = future tech debt and immediate inconsistency with the rest of the codebase if it already localizes.
Risk