Feature Flagging
A senior engineer's playbook for using feature flags well, not just frequently.
Feature flags are infrastructure. Treated as such, they enable kill switches, gradual rollouts, A/B experiments, permission gates, and operational toggles without redeploys. Treated casually, they become the largest accumulating technical debt in your codebase: thousands of dead flags, conflicting evaluation logic, brittle targeting, and a permission surface no one fully understands.
This skill is the discipline that prevents the second outcome. It assumes you have a feature flag platform (LaunchDarkly, Flagsmith, Split.io, VWO FME, GrowthBook, Statsig, PostHog, Optimizely; the platform does not matter for the principles). It assumes your engineering team can implement targeting rules and SDK integration. The hard part is the operational discipline, and that is what this skill covers.
When to use this skill: any time you are about to introduce a flag, modify a flag, audit existing flags, or design a flag-related governance policy.
What this skill is for
The skill spans the operational lifecycle of a flag from creation through retirement. Flag types and the discipline of not mixing them. Naming conventions that survive code review. Lifecycle expectations baked in at creation, not bolted on later. Targeting rules that compose without fragility. Rollout strategies that match the risk profile of the launch. Stale flag management on a quarterly cadence. Governance and permissions that balance access with audit. Performance considerations so flags do not become a latency tax. Testing patterns that cover both branches. Rollback discipline. Observability across rollouts.
The skill does not cover experiment design; for hypothesis writing, sample size, MDE, and the discipline of arriving at a defensible decision, see the experiment-design skill. It does not cover statistical analysis or variance reduction; those live in the experimentation-analytics skill. It does not cover platform-specific tooling; for MCP commands, auth, and platform-specific configuration, consult the chosen platform's documentation. This skill produces the operational shape; the platform implements it.
The five flag types
Mixing flag types is the root cause of most flag mess. A flag is one of five things; commit at creation and do not let it drift.
Release flag. Code is in production but disabled. The new feature ships dark, gets toggled on for a percentage, then ramps to 100. Lifetime: short, days to weeks. Lifecycle: clean removal after launch. Common name prefix: release_.
Experiment flag. Users are randomly assigned to variants; conversion is measured. The flag controls which variant a user sees. Lifetime: medium, one to six weeks. Lifecycle: variant chosen, code paths consolidated, flag removed. Common name prefix: exp_.
Operational flag. A kill switch or circuit breaker that lets ops disable a misbehaving feature without a redeploy. Lifetime: long-lived, often years. Lifecycle: usually never removed; remains as standby. Common name prefix: ops_.
Permission flag. Controls feature access by plan tier, customer cohort, or region. The free tier sees one set of features; the enterprise tier sees another. Lifetime: long-lived. Lifecycle: managed alongside billing and access infrastructure. Common name prefix: perm_.
Configuration flag. Lets some customers see different behavior based on contractual configuration. White-label tenants, regulated regions, custom rollout schedules per account. Lifetime: long-lived. Lifecycle: governed by sales and product agreements. Common name prefix: cfg_.
Each type has different lifecycle, governance, and removal expectations. Mixing them in one flag (the flag is both a kill switch AND a permission gate AND now we are using it for an experiment) is the most common source of flag mess. When the flag's purpose changes, create a new flag and migrate. Do not overload an existing one.
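One way to keep the five types from drifting is to record each type's expectations in a small lookup table that tooling can consult. The sketch below is illustrative only: `FlagType`, `TypePolicy`, `POLICIES`, and `policy_for` are hypothetical names, not part of any platform SDK, and the values simply restate the prefixes and lifetimes listed above.

```python
from dataclasses import dataclass
from enum import Enum


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "exp"
    OPERATIONAL = "ops"
    PERMISSION = "perm"
    CONFIGURATION = "cfg"


@dataclass(frozen=True)
class TypePolicy:
    prefix: str             # name prefix, including the trailing underscore
    long_lived: bool        # True when the flag is expected to live for years
    removal_expected: bool  # True when the lifecycle ends in code removal


# Hypothetical policy table mirroring the five type descriptions.
POLICIES = {
    FlagType.RELEASE: TypePolicy("release_", long_lived=False, removal_expected=True),
    FlagType.EXPERIMENT: TypePolicy("exp_", long_lived=False, removal_expected=True),
    FlagType.OPERATIONAL: TypePolicy("ops_", long_lived=True, removal_expected=False),
    FlagType.PERMISSION: TypePolicy("perm_", long_lived=True, removal_expected=False),
    FlagType.CONFIGURATION: TypePolicy("cfg_", long_lived=True, removal_expected=False),
}


def policy_for(flag_name: str) -> tuple[FlagType, TypePolicy]:
    """Infer the flag type from the name prefix; raise if no prefix matches."""
    for flag_type, policy in POLICIES.items():
        if flag_name.startswith(policy.prefix):
            return flag_type, policy
    raise ValueError(f"{flag_name!r} does not start with a known type prefix")
```

Because a name can match only one prefix, a flag that needs a second purpose cannot be described by this table without a rename, which is the point: a change of purpose forces a new flag.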
Flag naming conventions
A flag name encodes type, owner, and purpose. Without that encoding, a flag list at month nine is unreadable and the cleanup playbook from references/stale-flag-cleanup-playbook.md cannot tell what is safe to remove.
The convention this skill recommends is <type>_<owner>_<semantic_name>_<version_or_date>:
- release_checkout_redesign_2026q2
- exp_billing_pricing_v2
- ops_search_circuit_breaker
- perm_enterprise_audit_log
- cfg_tenant_acme_custom_dashboard
Pick snake_case or kebab-case once, organization-wide, and stick with it. Mixing both in the same platform produces typo-driven bugs that bite at 3 AM. Vague names die a slow death: new_feature, temp_toggle, test_flag, pricing_update_v3. Within months, no one knows which pricing_update shipped and which is dead.
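A naming convention holds up better when it is checked mechanically, for example in CI or a pre-commit hook. The following is a minimal sketch, assuming the organization chose snake_case and the five prefixes above. `NAME_PATTERN` and `is_valid_flag_name` are hypothetical names, and the pattern treats the trailing version or date as optional, since some of the example names omit it.

```python
import re

# Assumed rule: a known type prefix, an owner segment, and at least one more
# lowercase segment (semantic name, optionally followed by version or date).
NAME_PATTERN = re.compile(r"^(release|exp|ops|perm|cfg)_[a-z0-9]+(?:_[a-z0-9]+)+$")


def is_valid_flag_name(name: str) -> bool:
    """Return True when the name follows the assumed snake_case convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern the vague names listed above (new_feature, temp_toggle, test_flag, pricing_update_v3) are rejected because none of them starts with a type prefix, and kebab-case names are rejected outright.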
For deeper coverage including the table of typed prefixes, owner conventions, and the migration plan for existing badly-named flags, see references/flag-naming-conventions.md.
The flag lifecycle
Every flag has five life phases. Each phase has explicit entry and exit criteria. Skipping phases is how flag mess accumulates.
Birth. Flag created with explicit metadata: owner, type, target removal date (for release and experiment flags), rollout plan, monitoring approach. The metadata is not optional; without it the flag has no end-of-life.
Adolescence. The feature behind the flag is being built. Code paths exist for both the disabled (current production) branch and the enabled (new) branch. Both are tested. The flag remains off in production.
Launch. Production rollout begins. Percentage starts low (1 or 5 percent), monitored at each step, ramps if metrics hold. Ramp gates documented in references/flag-rollout-strategies.md.
Maturity. The flag is at 100 percent rollout. The new code path is the production path. Monitoring continues for at least 30 days to catch issues that did not show up during the ramp.
Death. The flag is removed from code (PR that deletes the gating logic) and removed from the platform. The audit trail records the removal.
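The launch phase's "ramp if metrics hold" gate can be pictured as a small step function. This is only a sketch: the step values in `RAMP_STEPS` and the choice to return to 0 percent when metrics fail are assumptions made for illustration, and `next_percentage` is a hypothetical helper. Real ramp gates belong in your rollout plan.

```python
# Hypothetical ramp schedule; actual step sizes depend on the launch's risk.
RAMP_STEPS = [1, 5, 25, 50, 100]


def next_percentage(current: int, metrics_hold: bool) -> int:
    """Return the next rollout percentage, or 0 to roll back when metrics fail."""
    if not metrics_hold:
        return 0  # assumed policy: a failed gate rolls the flag back entirely
    for step in RAMP_STEPS:
        if step > current:
            return step
    return 100  # already at full rollout
```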
The asymmetry: birth is fast (one PR creates the flag and gates the new code), while death requires intentional cleanup. Most flag mess is unfinished death. Birth and death have to be planned together; the death plan is part of the birth metadata.
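The birth metadata requirement can be enforced at creation time rather than by convention. The sketch below assumes a hypothetical `FlagMetadata` record and `birth_problems` check; the field names are illustrative and do not correspond to any particular platform's schema. It requires a target removal date only for release and experiment flags, as described in the birth phase.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Flag types whose lifecycle ends in removal, so a removal date is mandatory.
TYPES_REQUIRING_REMOVAL_DATE = {"release", "exp"}


@dataclass
class FlagMetadata:
    name: str
    flag_type: str          # one of: release, exp, ops, perm, cfg
    owner: str
    rollout_plan: str
    monitoring: str
    target_removal_date: Optional[date] = None


def birth_problems(meta: FlagMetadata) -> list[str]:
    """List the metadata gaps that should block flag creation."""
    problems = []
    for field_name in ("owner", "rollout_plan", "monitoring"):
        if not getattr(meta, field_name).strip():
            problems.append(f"missing {field_name}")
    if meta.flag_type in TYPES_REQUIRING_REMOVAL_DATE and meta.target_removal_date is None:
        problems.append("missing target_removal_date")
    return problems
```

An empty list means the flag has a death plan on record; anything else is a reason to reject the creating PR.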
For the per-phase checklist, see references/flag-lifecycle-checklist.md.
Targeting rules and segmentation
A targeting rule is the boolean expression that determines whether a user, account, or request gets the treatment branch. There are four useful target dimensions:
- User attributes. user.email_domain == "acme.com", user.signup_date > "2026-01-01", user.plan == "enterprise".
- Account attributes. account.tier == "enterprise", account.region == "EU", account.feature_x_enabled == true.
- Request attributes. request.country == "US", request.device_type == "mobile", request.api_version >= 3.
- Time-based. time > "2026-06-01T00:00:00Z", time < "2026-12-31T23:59:59Z".
Compose with AND, OR, and NOT. Keep the expressions simple. If your rule needs three nested clauses, your taxonomy is wrong. Either define a segment (a named group of users with shared attributes) and target the segment, or split into multiple flags.
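To make the composition advice concrete, here is a minimal sketch of rules as small predicates, with a named segment standing in for what would otherwise be a nested clause. Everything here is hypothetical and platform-independent: the `Rule` type, the helper functions, the flat dotted context keys, and the `eu_enterprise` segment are illustrative assumptions, not any vendor's rule engine.

```python
from typing import Any, Callable, Mapping

Context = Mapping[str, Any]        # flat attribute map, e.g. {"account.tier": "enterprise"}
Rule = Callable[[Context], bool]   # a rule is a predicate over the context


def attr_equals(key: str, expected: Any) -> Rule:
    return lambda ctx: ctx.get(key) == expected


def all_of(*rules: Rule) -> Rule:  # AND
    return lambda ctx: all(rule(ctx) for rule in rules)


def any_of(*rules: Rule) -> Rule:  # OR
    return lambda ctx: any(rule(ctx) for rule in rules)


def negate(rule: Rule) -> Rule:    # NOT
    return lambda ctx: not rule(ctx)


# A segment is a named rule, defined once and reused across flags.
SEGMENTS: dict[str, Rule] = {
    "eu_enterprise": all_of(
        attr_equals("account.tier", "enterprise"),
        attr_equals("account.region", "EU"),
    ),
}


def in_segment(name: str) -> Rule:
    return lambda ctx: SEGMENTS[name](ctx)


# The flag's own rule stays flat: one segment reference and one exclusion.
treatment_rule = all_of(
    in_segment("eu_enterprise"),
    negate(attr_equals("request.device_type", "mobile")),
)
```

Naming the segment moves the nesting out of the flag, so the flag's rule reads as two conditions instead of three levels of clauses.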
The most common pitfall: targeting on attributes that change frequently. If the rule is user.last_login_date > "2026-04-01", a user whose last login was in March sees the control, logs in on May 1, and is switched to the treatment partway through their journey. With a rolling window (for example, logged in within the last seven days), the same user can flip in and out of the treatment as the value ages. The user experience whiplashes. Volatile attributes belong in segments computed off snapshots, not in live targeting rules.
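A snapshot-based segment can be sketched as membership computed once and then frozen, so later changes to the volatile attribute do not move users between branches. The function name `build_snapshot_segment`, the user IDs, and the in-memory dictionary are assumptions for illustration; a real platform would store the segment itself.

```python
from datetime import date


def build_snapshot_segment(last_login_by_user: dict[str, date], cutoff: date) -> frozenset[str]:
    """Freeze the set of users whose last login is after the cutoff at snapshot time."""
    return frozenset(uid for uid, seen in last_login_by_user.items() if seen > cutoff)


# Snapshot taken once; values are hypothetical.
logins = {"u1": date(2026, 4, 15), "u2": date(2026, 3, 20)}
segment = build_snapshot_segment(logins, cutoff=date(2026, 4, 1))

# u2 logs in later: the live attribute changes, the frozen segment does not.
logins["u2"] = date(2026, 5, 1)
```

Targeting then checks `user_id in segment`, and a user's assignment changes only when the segment is deliberately recomputed.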
For pattern catalog and anti-patterns, see [references/targeting-rule-patterns.md](references/targeting-