Quality review

Use this skill after finishing a PRD, spec, or implementation plan — before implementation starts. It is NOT a bug hunt. It is a targeted check that the artifact, if shipped as written, will feel like a premium product — on the level of CleanMyMac, Raycast, Linear, Things, Stripe Dashboard — and not like a hobby project.

Invoke with: /superpowers-gstack:quality-review

When to invoke

Automatically after completing:

A PRD, spec, or design document
An implementation plan
Output from writing-specs, writing-plans, plan-design-review, plan-eng-review, or any planning skill that produces an artifact ready to hand off to implementation

Run once before implementation. Re-run after substantial spec/plan revisions.

Relationship to pitfall-verification

quality-review complements pitfall-verification; the two do not overlap:

Skill	Lens	Question
`pitfall-verification`	Correctness	"Will this work?" (bugs, security, contracts, edge cases)
`quality-review`	Perceived quality	"Will this feel good?" (silent failures, loading, empty/error states, polish)

Recommended flow on a fresh artifact:

1. pitfall-verification → fix bugs
2. quality-review → fix feel
3. Hand off to writing-plans / implementation

Both should be run. They catch different classes of issue.

Sequence

1. Read the artifact in full: PRD, spec, or plan, every section.
2. Read the relevant existing code. Do not review the spec in isolation; cross-check its claims against the codebase. A spec that says "extend the existing PlanStore" needs to be verified against what PlanStore actually does today.
3. Identify peer apps in the domain. For macOS productivity apps: CleanMyMac, Raycast, Linear, Things, Setapp. For web: Linear, Notion, Stripe Dashboard, Vercel. The governing question is: "would this feature, as specified, feel at home next to <peer>?"
4. Walk the 15 categories below. For each: question → risk surface → verify against code → verdict.
5. Research when uncertain. If you don't know current best practice for a category (e.g. "what's the right way to do AI structured output in 2026?"), use WebSearch / WebFetch and cite concrete sources (Anthropic docs, Apple HIG, etc.). Do not guess.
6. Produce the verdict in the output format below.

The 15 categories

These are starting points, not exhaustive. Add domain-specific quality risks as they surface.

For each category: state the question, locate the risk surface in the artifact, verify by reading the code, and report N/A | HANDLED | NOT HANDLED — proposed fix.

1. Silent failures

Where in the flow can something fail without the user being told? AI calls returning nothing, parse errors that get "abandoned silently", retry loops with no UI signal, swallowed exceptions. Silent failure = user thinks the app is broken.

Risk surfaces: any try/catch with empty catch, any optional unwrap that defaults to "", any AI call without an error path.
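
As an illustration of the risk surface above, here is a minimal Python sketch contrasting a swallowed parse error with one that reaches the caller. The names `ParseResult`, `parse_silently`, and `parse_visibly` are hypothetical and exist only for this example; a real app would route the error into whatever UI layer it has.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParseResult:
    value: Optional[Any] = None
    error: Optional[str] = None  # user-facing message; None means success


def parse_silently(raw: str) -> Any:
    """Anti-pattern: the failure is swallowed and the caller sees an empty result."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}  # the user never learns that anything went wrong


def parse_visibly(raw: str) -> ParseResult:
    """Same operation, but the failure travels to the caller so the UI can show it."""
    try:
        return ParseResult(value=json.loads(raw))
    except json.JSONDecodeError as exc:
        return ParseResult(error=f"Could not read the response ({exc.msg}).")
```

When reviewing, the first shape is what to flag as NOT HANDLED: the empty default is indistinguishable from a legitimate empty result.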

2. Loading states / progress

Is there any operation that takes >500ms without a spinner, skeleton, progress text, or cancel button? Specifically check: AI calls, file I/O, shell commands, scan/index operations, network requests.

Risk surfaces: spec sections that describe an operation without a "during" UI state.
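
One way to picture the "during" state is a wrapper that shows progress only once an operation crosses the threshold. The following is a rough asyncio sketch under that assumption; `run_with_progress`, `show_progress`, and `hide_progress` are hypothetical names, and the 0.5 s default simply mirrors the 500ms figure above.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_with_progress(
    operation: Awaitable[Any],
    show_progress: Callable[[], None],
    hide_progress: Callable[[], None],
    threshold_s: float = 0.5,
) -> Any:
    """Await `operation`; show a progress indicator if it outlasts `threshold_s`."""
    task = asyncio.ensure_future(operation)
    # Wait up to the threshold. Fast operations finish here and never show a spinner.
    done, _pending = await asyncio.wait({task}, timeout=threshold_s)
    shown = False
    if not done:
        show_progress()
        shown = True
    try:
        return await task
    finally:
        # Hide the indicator on success and on failure alike.
        if shown:
            hide_progress()
```

A spec that describes the operation but none of the three moments (before threshold, during, after) is the gap this category looks for.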

3. Empty states

What does the user see the first time a list/view is empty? A spec that does not define empty state = generic "no items" = bad first impression. Premium apps treat empty state as a teaching moment (Linear's empty inbox, Things' empty Today).

Risk surfaces: any new list/grid/table view in the spec.

4. Error recovery

When something fails, how does the user get back on rails? Inline banner with retry? Toast? Blank screen? Each error path should have an explicit recovery action — not just "show error".

Risk surfaces: every failure mode listed in the spec; every code path that throws.
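
The check above can be run mechanically once the failure modes are written down. A small sketch, assuming the reviewer has collected them into a mapping from failure mode to its specified recovery action (the function name and the mapping shape are illustrative, not part of the skill):

```python
from typing import Dict, List, Optional


def missing_recovery(failure_modes: Dict[str, Optional[str]]) -> List[str]:
    """Return the failure modes that have no explicit recovery action.

    `failure_modes` maps a failure description to the recovery action the
    spec defines for it, or None / blank when the spec only says "show error".
    """
    return sorted(
        name
        for name, action in failure_modes.items()
        if action is None or not action.strip()
    )
```

For example, `missing_recovery({"network timeout": "inline banner with Retry", "parse failure": None})` returns `["parse failure"]`, which is the entry to report as NOT HANDLED.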

5. State drift / source-of-truth conflict

If the app's stored state and reality (filesystem, system settings, external API) can diverge, how is reconciliation handled? Classic example: user changes something outside the app, app still shows the old state.

Risk surfaces: any feature that mirrors external state (file watchers, system prefs, external APIs, login items, calendar events).
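
Reconciliation usually starts with a diff between what the app has stored and what the external source reports right now. A minimal sketch, assuming both sides can be reduced to sets of identifiers (for example login item names); `Reconciliation` and `reconcile` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Reconciliation:
    appeared: List[str]  # present in reality, unknown to the app
    vanished: List[str]  # stored by the app, gone in reality

    @property
    def in_sync(self) -> bool:
        return not self.appeared and not self.vanished


def reconcile(stored: Set[str], actual: Set[str]) -> Reconciliation:
    """Compare the app's stored view with the freshly read external state."""
    return Reconciliation(
        appeared=sorted(actual - stored),
        vanished=sorted(stored - actual),
    )
```

The review question is then whether the spec says when this comparison runs and what the user sees when `in_sync` is false.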

6. Data-loss risk

Can an undo, revert, or overwrite operation destroy the user's manual edits without warning? Pay special attention to snapshot-based revert flows and auto-fix that overwrites files.

Risk surfaces: any "revert", "restore", "auto-fix", "regenerate" action; any operation that writes to a path the user could have edited.
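
One common guard is to remember a fingerprint of the file as the app last wrote it and refuse to overwrite when the file has changed since. The sketch below assumes that approach; `fingerprint`, `safe_overwrite`, and the `confirmed` flag are illustrative names, and a real flow would show a confirmation UI instead of returning `False`.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def safe_overwrite(
    path: Path,
    new_content: str,
    fingerprint_at_last_write: str,
    confirmed: bool = False,
) -> bool:
    """Write `new_content` unless the file was edited since the app last wrote it.

    Returns False (and writes nothing) when an outside edit is detected and the
    user has not confirmed the overwrite.
    """
    edited_outside = path.exists() and fingerprint(path) != fingerprint_at_last_write
    if edited_outside and not confirmed:
        return False
    path.write_text(new_content)
    return True
```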

7. Discoverability

Are there features that are technically implemented but nobody will find? Hidden tab, audit log without UI, keyboard shortcut without a menu item, settings panel buried three levels deep.

Risk surfaces: every feature in the spec — ask "how would a new user encounter this?"

8. Multi-tenancy / scope isolation

If the app has workspaces, profiles, accounts, projects: is the new state correctly scoped, or is it global and leaking between tenants? This is one of the highest-impact pitfalls — global state in a workspaced app breaks the mental model on day one.

Risk surfaces: any *.shared singleton, any ~/Library/Application Support/<app>/*.json path that isn't workspace-scoped, any UserDefaults.standard write.
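
To make the scoped-versus-global distinction concrete, here is a sketch of a path helper that cannot produce an unscoped location. The `workspaces/<id>/` directory layout and the `state_path` name are assumptions for illustration, not a convention taken from any particular app.

```python
from pathlib import Path


def state_path(app_support: Path, workspace_id: str, name: str) -> Path:
    """Build a per-workspace JSON path; refuse IDs that would escape the scope."""
    invalid = (
        not workspace_id
        or workspace_id in {".", ".."}
        or "/" in workspace_id
        or "\\" in workspace_id
    )
    if invalid:
        raise ValueError(f"invalid workspace id: {workspace_id!r}")
    return app_support / "workspaces" / workspace_id / f"{name}.json"
```

A spec that writes to a single `<name>.json` directly under the app support directory has no equivalent of the `workspace_id` argument, which is the signal to look for.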

9. Persistent operations mid-session

Is time-dependent state (snooze, cache expiry, scheduled actions, "remind me in 1h") re-evaluated during an active session, or only at app launch? "Only on launch" = app feels lazy. Premium apps re-check on a timer or via system notifications.

Risk surfaces: any feature involving time, expiry, or scheduling.
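
The difference between "only on launch" and "re-evaluated" comes down to where the expiry check is called from. A minimal sketch of the check itself, written as a pure function so it can be invoked both at launch and from a periodic tick; `due_items` and the mapping shape are hypothetical:

```python
from datetime import datetime
from typing import Dict, List


def due_items(snoozed_until: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the IDs whose snooze has expired as of `now`.

    Intended to be called repeatedly during a session (timer, wake-from-sleep
    notification), not only once at startup.
    """
    return sorted(item for item, until in snoozed_until.items() if until <= now)
```

If the code only calls the equivalent of this function from its startup path, the category verdict is NOT HANDLED even when the expiry logic itself is correct.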

10. Keyboard / native conventions

For macOS: ⌘W (close window), ⌘1-9 (tab switching), ⌘↩ (primary action), Space (preview), Esc (dismiss) — are platform conventions respected? For web: tab order, focus ring, Esc-to-dismiss, Enter-to-confirm? For iOS: swipe-to-go-back, edge gestures?

Risk surfaces: any new modal, sheet, list, or view introduced by the spec.

11. Animations / transitions

Does the spec rely on default framework animations (SwiftUI default, CSS default — both feel generic), or specify named easing (.spring, .snappy, cubic-bezier(...)) with rationale? Premium apps have signature motion. Generic ease-in-out reads as "AI-generated".

Risk surfaces: any sheet present/dismiss, any list reorder, any state change that is visible.

12. AI-specific pitfalls

If the artifact involves LLM calls:

Is structured output (JSON) implemented via prompt-engineered "respond with JSON" (~5–15% failure rate) or via native tool_use / structured output API (~0%)? In 2026, the latter is table stakes.
Are fence-stripping (```json ... ```) and schema validation explicit?
Is there a cap on output size (token limit, character limit)?
Is there a fallback when the model returns malformed output?
Is prompt caching used where the prompt has stable prefixes (>1024 tokens)?

If unsure of current best practice, use WebSearch against the Anthropic docs.

Risk surfaces: any AI call in the spec.
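
Several of the questions above (output cap, fence-stripping, schema validation, fallback) can be seen together in one defensive parser. This is a sketch only: it does not replace a native structured output API, and the `MAX_CHARS` limit and the `title` / `steps` schema are invented for the example.

```python
import json
import re
from typing import Optional

# Matches a response wrapped in a Markdown code fence, with or without "json".
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)
MAX_CHARS = 20_000  # hypothetical cap on accepted output size
REQUIRED_FIELDS = {"title": str, "steps": list}  # hypothetical schema


def parse_model_json(raw: str) -> Optional[dict]:
    """Return the validated object, or None so the caller can run its fallback."""
    if len(raw) > MAX_CHARS:
        return None
    match = FENCE_RE.match(raw)
    body = match.group(1) if match else raw
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Minimal schema check: every required field is present with the right type.
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data
```

Returning `None` rather than raising keeps the fallback decision (retry, show an error, use a default) with the caller, which ties this category back to silent failures and error recovery.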

13. Privileged operations / sudo flows

If the app requires sudo, admin auth, or system permissions (Full Disk Access, Accessibility, etc.): is the flow framed as deliberate design (explanation sheet + consent + clear "why we need this") or just a toast that pops up and disappears? "Just a toast" = feels cheap.

Risk surfaces: any TCC permission, any osascript with admin privileges, any Authorization Services call.

14. Localization-readiness

Does the spec use LocalizedStringKey (SwiftUI) / t() / equivalent for user-facing strings, or are strings hardcoded? Even if the app ships English-only today, hardcoded strings = future tech debt and immediate inconsistency with the rest of the codebase if it already localizes.

Risk
