Feature Flags

A feature flag is a switch in the code that decides whether a particular piece of functionality is active at runtime. The most important thing it does is separate two activities that have historically been the same event: deploying code to production and exposing functionality to users.

Once those two are separated, a lot of things become easier. Releases stop being binary high-stakes events. Incomplete work can ship safely. Bad releases can be recovered from without a redeploy. New code can be exposed to small audiences first. Many of the practices that high-performing teams rely on, such as trunk-based development, gradual rollouts, and controlled experiments, depend on flags being a routine part of the codebase.¹

What Flags Actually Do

A flag is just a conditional. The discipline is in what the conditional gates and how the team manages it over time.
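
As a minimal sketch of that conditional in Python: the flag name `new_checkout`, the in-memory `FLAGS` store, and both checkout functions are hypothetical, invented for illustration. A real system would typically read flag state from a flag service or config store rather than a module-level dict.

```python
# Hypothetical in-memory flag store; real systems usually fetch this at runtime.
FLAGS = {"new_checkout": False}


def is_enabled(name: str) -> bool:
    """Return the flag's state, treating unknown flags as off."""
    return FLAGS.get(name, False)


def legacy_checkout(cart: list[float]) -> float:
    # Existing behavior: plain sum of the item prices.
    return sum(cart)


def new_checkout(cart: list[float]) -> float:
    # Stand-in for the new code path: same total, rounded to cents.
    return round(sum(cart), 2)


def checkout(cart: list[float]) -> float:
    # The flag is the conditional; both branches are deployed together.
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Defaulting unknown flags to off is a design choice in this sketch: a typo in a flag name fails closed rather than exposing unfinished work.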

There are a few distinct kinds of flag, each with a different lifecycle:

Release flags. Hide incomplete work that is on trunk but not yet ready for users. These flags are short-lived: once the feature is shipped, the flag and the dormant branch of code are removed.
Operational flags. Allow on-call engineers to disable a misbehaving feature, throttle a problematic code path, or shed load under pressure. These flags are long-lived by design and are part of the system's operational toolkit.
Permission flags. Gate functionality by user, team, plan, or environment. Often integrated with the product's authorization system rather than implemented as ad-hoc flags.
Experiment flags. Drive A/B tests and other controlled experiments. Short-lived by intent; the flag exists for the duration of the experiment.²

Managing all of these the same way, with a single shared lifecycle, is one of the most common ways flag systems become hard to maintain.

What Flags Make Possible

Trunk-based development. Incomplete work can land on trunk behind a disabled flag rather than living on a long-lived branch. The benefits of frequent integration become available without the risk of shipping partial features.⁶
Deploy on one schedule, release on another. Code can be deployed daily while features are released by date, by audience, by experiment, or by readiness. The two schedules can be decoupled because the activation point is independent of the deployment.⁴
Safer rollouts. New functionality can be turned on for internal users first, then a small percentage of customers, then larger cohorts. Issues surface before they affect everyone.
Faster recovery. When a feature misbehaves in production, flipping a flag off is usually faster and less risky than rolling back a deployment. The bad code is still deployed; it is just inactive.
Real production testing. Some classes of bug only appear under real traffic. Flagged exposure to a small percentage of real users surfaces these without committing to a full rollout.⁵
Controlled experimentation. A/B testing requires the ability to expose different users to different code paths at the same time. Flags are the substrate that makes this practical.
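
The staged rollout described above (internal users first, then a growing percentage of customers) can be sketched with deterministic hashing. This is one common approach, not the only one; the `INTERNAL_USERS` allowlist, the function names, and the flag name used in the usage note are assumptions made for illustration.

```python
import hashlib

# Hypothetical allowlist of internal users who always see the feature.
INTERNAL_USERS = {"alice@example.com"}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Internal users first, then a stable percentage of everyone else."""
    if user_id in INTERNAL_USERS:
        return True
    return bucket(flag, user_id) < rollout_percent
```

Because a user's bucket never changes, a call such as `is_enabled("new_search", user_id, 10)` gives the same answer on every request, and raising the percentage only adds users to the exposed cohort. Including the flag name in the hash means different flags select different cohorts, so the same users are not always first in line.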

Common Anti-Patterns

Stale flags. Flags that should have been removed are left in the codebase indefinitely. Each one is a permanent piece of conditional complexity, and the codebase accumulates them faster than anyone removes them.
Mixing flag types in the same system. Treating a long-lived operational flag the same way as a short-lived release flag means neither lifecycle is managed well.
No ownership of removal. A flag is added when a feature ships and forgotten when the feature stabilizes. The fix is to make removal a tracked task, with an owner and a date.
Flags as configuration. A flag is meant to be flipped during the lifecycle of work. A configuration value is meant to be set and forgotten. Conflating them muddies both the code and the operational tooling.
Untested off-paths. The disabled branch of a flag is rarely exercised in test environments. Six months later, nobody is sure whether turning the flag off would actually work. Flags require periodic validation in both states.
Production-only flag systems. A flag that can only be flipped in production cannot be exercised in CI or staging. The team learns about the failure modes of the off state when a customer reports them.
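
Several of these anti-patterns (stale flags, mixed lifecycles, unowned removal) can be caught mechanically if each flag carries a kind, an owner, and a removal date. The following is a rough sketch under assumptions: the `FlagRecord` shape, the `audit` function, and the choice to treat only release and experiment flags as short-lived are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class FlagKind(Enum):
    RELEASE = "release"
    OPERATIONAL = "operational"
    PERMISSION = "permission"
    EXPERIMENT = "experiment"


# Assumption: these are the short-lived kinds, so they must carry a removal date.
SHORT_LIVED = {FlagKind.RELEASE, FlagKind.EXPERIMENT}


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: FlagKind
    owner: str
    remove_by: Optional[date] = None  # None means explicitly long-lived


def audit(flags: list[FlagRecord], today: date) -> list[str]:
    """Return one message per flag that lacks a removal date or is overdue."""
    problems = []
    for flag in flags:
        if flag.kind in SHORT_LIVED and flag.remove_by is None:
            problems.append(f"{flag.name}: no removal date (owner: {flag.owner})")
        elif flag.remove_by is not None and flag.remove_by < today:
            problems.append(
                f"{flag.name}: overdue since {flag.remove_by} (owner: {flag.owner})"
            )
    return problems
```

Run as a scheduled job or a CI step, a check like this turns flag removal into a visible, owned task instead of something that depends on memory.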

What This Looks Like in Practice

Track every flag's lifecycle. Each flag should have an owner, a purpose, and an expected lifespan. Long-running flags are explicitly long-running, not accidentally so.
Default to "flag and remove." Most release flags should be removed within weeks of full rollout. A team that does not routinely remove flags is a team that will eventually struggle to add new ones.
Test both states. CI should exercise the system with the flag both on and off, especially for release and operational flags.
Keep operational flags discoverable. On-call engineers should know which flags exist and what they do, ideally documented in the runbook for the system.
Use flags to enable trunk-based development, not to avoid the work of incremental design. A feature flag hiding a large incomplete change is not as good as the same feature broken into small changes that ship and integrate one at a time.³
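
One way to test both states is to run the same assertion with the flag forced on and then off. The sketch below uses the standard library's `unittest`; the `FLAGS` store, the `greet` feature, and the `flag_set` helper are hypothetical stand-ins for a real flag client and a real code path.

```python
import unittest
from contextlib import contextmanager

# Hypothetical flag store and feature, defined here so the sketch is self-contained.
FLAGS = {"new_greeting": False}


def greet(name: str) -> str:
    if FLAGS["new_greeting"]:
        return f"Welcome back, {name}!"
    return f"Hello, {name}"


@contextmanager
def flag_set(name: str, value: bool):
    """Temporarily force a flag to a value, restoring the old state afterwards."""
    previous = FLAGS[name]
    FLAGS[name] = value
    try:
        yield
    finally:
        FLAGS[name] = previous


class GreetInBothStates(unittest.TestCase):
    def test_greet_mentions_name_with_flag_on_and_off(self):
        # Exercise the enabled and the disabled branch in one test.
        for state in (True, False):
            with self.subTest(new_greeting=state), flag_set("new_greeting", state):
                self.assertIn("Ada", greet("Ada"))
```

Saved as a test module, this runs under `python -m unittest`. Restoring the previous value in `finally` keeps one test's flag state from leaking into the next, which matters once many tests override the same flag.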

Important distinction

Deploying software is not the same thing as releasing features. Flags are how the two get separated. A team that has not internalized this distinction takes on release risk every time it deploys.

1. Pete Hodgson, Feature Toggles (aka Feature Flags) (martinfowler.com, 2017). The most thorough public taxonomy of feature flags, including the distinction between release, operational, permission, and experiment toggles, and the operational practices that keep flag systems maintainable: https://martinfowler.com/articles/feature-toggles.html
2. Ron Kohavi, Diane Tang, and Ya Xu, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (Cambridge University Press, 2020). The definitive reference for experiment-flag practice, drawn from the authors' work at Microsoft, Google, and LinkedIn. Covers experiment design, sample sizing, common pitfalls (peeking, novelty effects, primacy effects), and the organizational and infrastructural conditions that make A/B testing trustworthy at scale.
3. Martin Fowler, BranchByAbstraction: https://martinfowler.com/bliki/BranchByAbstraction.html. The canonical technique for making a large change incrementally on trunk: introduce an abstraction over the existing implementation, build the new implementation behind it, switch callers over, then remove the old implementation and, if it is no longer needed, the abstraction. Often combined with feature flags, and the discipline that prevents flags from becoming long-lived covers for incomplete refactors.
4. Dave Farley, Modern Software Engineering: Doing What Works to Build Better Software Faster (Addison-Wesley, 2021), and the companion Continuous Delivery video channel at https://www.youtube.com/@ContinuousDelivery. Farley (co-author of the original Continuous Delivery book) treats the deploy/release split as the central CD discipline: flags are how a team earns the right to deploy continuously without committing to release continuously. His material is the clearest practitioner-oriented treatment of why decoupling the two schedules is load-bearing rather than incidental.
5. Charity Majors, Liz Fong-Jones, and George Miranda, Observability Engineering (O'Reilly, 2022), and Majors' essays at https://charity.wtf, particularly her writing on "testing in production." The argument is that some failure modes only appear under real traffic, real data, and real concurrency, and that flag-gated exposure to a small fraction of production is a more honest test environment than any pre-production stage. Frames operational and release flags as part of the observability and debugging toolkit, not just the release toolkit.
6. See Trunk-Based Development for the longer treatment of how flags enable safe integration of incomplete work.