Continuous Improvement
Healthy software organizations are learning systems. They produce software, observe what that software does in the world, learn from what they observe, and use the learning to change how the next version gets produced. The loop runs continuously: every incident, every support ticket, every retro, every metric is potentially a signal about how the system or the work should change.
The phrase "continuous improvement" comes from the Toyota Production System and the broader lean / kaizen tradition.[1][2] The argument was, and is, that improvement is most effective when it is small, frequent, and built into the work itself, rather than scheduled as a separate large initiative. The same argument applies cleanly to software.
The Sources of Signal
A working continuous-improvement loop draws on multiple sources, because each one reveals a different kind of problem:
- Support feedback. What users complain about, ask about, and get stuck on. The most direct evidence of where the product is failing them.
- Product analytics. What users actually do, as opposed to what they say they do. Drop-off points, feature adoption, completion rates, paths through the product.
- User research. Qualitative investigation of why behavior is what it is. The "why" behind the analytics numbers.
- Operational metrics. How the system is behaving in production. Latency, error rates, deploy frequency, lead time, mean time to restore.[4]
- Incident reviews. Structured investigation of what went wrong, why, and what the system or team should change. Blameless postmortems are the modern discipline.
- Engineering retrospectives. The team's own observations about how they are working. Bottlenecks, recurring frustrations, small fixes that would pay back daily.
- Cross-team feedback. What other parts of the organization are seeing that this team is not. Sales calls, customer-success conversations, regulatory feedback.
A team that uses only one of these sources is leaving most of the signal on the table. A team that uses all of them well has a complete enough picture to know where to invest.
Why Improvement Initiatives Usually Fail
Most large "continuous improvement" or "operational excellence" efforts in software organizations produce disappointing results. The patterns are predictable:
- Improvement as a separate project. A quarterly initiative, owned by a separate team, with a deck and a charter, disconnected from the work where the improvements would actually live.
- Top-down problem selection. Leadership picks the problems to solve, often based on what is visible from their altitude. The biggest problems are usually invisible from the top because the people closest to them are not asked.
- Heroics rewarded, prevention invisible. The engineer who fixes a production outage at 3am gets praised; the engineer whose code prevented the outage gets nothing, because the prevented outage is not a story.
- No mechanism for closure. Retros and postmortems generate action items that nobody owns or tracks. The same problem recurs every quarter. The same retro item appears in successive retros.
- Improvement without measurement. "We're doing this to improve things," but the team cannot tell, six months later, whether things actually improved. Without before-and-after measurement, the team is hoping rather than learning.
These are not failures of effort. They are failures of structure. Continuous improvement is not an initiative; it is a property of how the team operates day to day.
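The closure problem in the list above is one of the easier ones to make mechanical. The following is a minimal sketch, not a prescribed tool: the `ActionItem` schema, the `audit` function, and the 30-day staleness threshold are all hypothetical choices made for illustration. It flags open items that have no owner or that have been open too long.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    """One commitment from a retro or postmortem (hypothetical schema)."""
    title: str
    source: str                     # e.g. "retro" or "postmortem"
    opened: date
    owner: Optional[str] = None     # None means nobody has taken it
    closed: Optional[date] = None   # None means still open


def audit(items: list[ActionItem], today: date,
          max_age_days: int = 30) -> dict[str, list[str]]:
    """Group open items by the structural problem they show.

    The 30-day default is an assumed threshold, not a recommendation.
    """
    report: dict[str, list[str]] = {"unowned": [], "stale": []}
    for item in items:
        if item.closed is not None:
            continue  # closed items need no follow-up
        if item.owner is None:
            report["unowned"].append(item.title)
        if (today - item.opened).days > max_age_days:
            report["stale"].append(item.title)
    return report


# Illustrative data only.
items = [
    ActionItem("Add alert on queue depth", "postmortem",
               date(2024, 1, 10), owner="sam"),
    ActionItem("Document rollback steps", "retro", date(2024, 3, 1)),
    ActionItem("Fix flaky login test", "retro", date(2024, 2, 20),
               owner="ana", closed=date(2024, 2, 27)),
]
report = audit(items, today=date(2024, 3, 15))
print(report)
```

Reviewing a report like this at the start of each retro is one way to keep the same item from reappearing unnoticed; the data could equally come from an issue tracker the team already uses.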
The Three Ways
The Phoenix Project articulates a useful three-part framing for the operational practice of continuous improvement, drawn from lean manufacturing:[3]
- Flow. Improving the throughput of work from idea to production. Reducing batch sizes, eliminating handoffs, removing bottlenecks.
- Feedback. Amplifying signal from downstream (operations, support, customers) back to upstream (engineering, product). Shorter feedback loops produce faster correction.
- Continuous learning and experimentation. Treating improvement itself as a practice, with explicit experiments, deliberate skill building, and a culture that makes it safe to try things that might not work.
Each Way is necessary; none is sufficient on its own. A team that has Flow but not Feedback is moving fast in the wrong direction. A team that has Feedback but not Learning notices the same problems repeatedly without ever changing how they work.
What This Looks Like in Practice
A few habits make improvement continuous rather than performative:
- Run blameless postmortems on incidents. The point is to learn what about the system allowed the incident, not who to blame. The fix is structural, not individual.
- Track action items to completion. Every retro and postmortem produces commitments. If nobody owns them and nobody tracks them, they produce nothing.
- Measure what matters and watch it over time. DORA's four key metrics (deploy frequency, lead time for changes, mean time to restore, change failure rate) are a strong baseline. Pick what fits the team and track it.[4]
- Reward prevention as much as recovery. The team's incentive system should treat "shipped a fix that prevented a class of outage" as at least as valuable as "responded heroically to an outage."
- Make work visible. Improvements rarely happen if the team cannot see what is in flight, what is blocked, and where time is actually going. The first improvement many teams need is making their own work visible to themselves.[5]
- Improve in small steps. A team that runs a hundred small experiments a year learns more than one that runs one large reorganization. Small steps fail safely; large ones fail expensively.
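To make the measurement habit above concrete, here is a minimal sketch of computing the four key metrics from deploy records. The `Deploy` record shape and the `four_key_metrics` function are hypothetical, and the definitions are simplifying assumptions: lead time is taken as commit to production deploy, and time to restore as deploy to restoration for deploys marked as failed. A real team would pull these fields from its own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # caused a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deploys: list[Deploy], window_days: int) -> dict[str, float]:
    """Compute the four metrics over a window of `window_days` days."""
    if not deploys:
        raise ValueError("need at least one deploy")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Only failures that have already been resolved count toward restore time.
    restore_times = [hours(d.restored_at - d.deployed_at)
                     for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # NaN when there were no resolved failures in the window.
        "mean_time_to_restore_hours": (
            sum(restore_times) / len(restore_times)
            if restore_times else float("nan")
        ),
    }


# Illustrative data only: four deploys in a seven-day window, one failure.
t0 = datetime(2024, 3, 4, 9, 0)
deploys = [
    Deploy(t0, t0 + timedelta(hours=2)),
    Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deploy(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6),
           failed=True, restored_at=t0 + timedelta(days=2, hours=7)),
    Deploy(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
metrics = four_key_metrics(deploys, window_days=7)
print(metrics)
```

Running the same function over successive windows gives the before-and-after comparison that lets a team tell whether a change actually improved things. The median is used for lead time here so that one unusually slow change does not dominate the number; that is a design choice, not part of the metric's definition.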
Core idea
Modern software organizations are learning systems. The work is not just to produce software, but to produce a team that produces better software over time. A team that ships continuously but does not improve continuously is producing more of the same; a team that improves continuously is producing a different kind of work.
See also: Quality Is Designed In, Systems Thinking, Incident Response, Observability, Trunk-Based Development, Launch Is the Beginning.
[1] W. Edwards Deming, The New Economics for Industry, Government, Education (MIT Press, 1993). Articulates the System of Profound Knowledge and the broader argument that continuous improvement is a property of the system rather than a result of individual effort. Read alongside Out of the Crisis for the foundational case.
[2] Eliyahu M. Goldratt, The Goal: A Process of Ongoing Improvement (North River Press, 1984). The originating novel of the Theory of Constraints, which provides the analytical basis for thinking about flow, bottlenecks, and where improvement effort actually pays off.
[3] Gene Kim, Kevin Behr, and George Spafford, The Phoenix Project (IT Revolution Press, 2013). Adapts lean manufacturing concepts to software operations, including the "Three Ways" (Flow, Feedback, Continuous Learning and Experimentation) that became foundational to the DevOps movement. Gene Kim et al., The DevOps Handbook (IT Revolution Press, 2016), is the framework companion to the Phoenix Project's narrative.
[4] Nicole Forsgren, Jez Humble, and Gene Kim, Accelerate: The Science of Lean Software and DevOps (IT Revolution Press, 2018). The empirical research behind the now-standard "DORA four key metrics" of engineering performance: deploy frequency, lead time for changes, mean time to restore, and change failure rate. The book makes the case that these metrics are not just operational indicators but reliable predictors of organizational performance.
[5] Dominica DeGrandis, Making Work Visible: Exposing Time Theft to Optimize Workflow (IT Revolution Press, 2017). On the practical discipline of visualizing knowledge work, applying WIP limits, and surfacing the "five thieves of time" (too much WIP, unknown dependencies, unplanned work, conflicting priorities, neglected work) that block improvement.