Systems Thinking
Software products exist inside larger systems of people, processes, incentives, and constraints. The code is only the most visible piece. Around it sit teams, dependencies, customers, deadlines, regulators, vendors, and competitors. The behavior of the product is the behavior of the whole system, not just the behavior of the part the engineering team can directly change.
Systems thinking is the discipline of reasoning about that whole.[1] It pushes back against the temptation to fix the part you can see and hope the rest holds.
The Whole Is Not the Sum of Its Parts
Most engineering organizations are built to optimize parts. Teams are scored on their own throughput. Tools measure individual contribution. Roadmaps describe a list of features, not a hypothesis about the system the features will produce.
The result is a familiar pattern: every team is doing well by its own measures, and the overall product is suffering. This is not a coordination failure to be fixed with more meetings. It is the predictable behavior of a system whose incentives are aimed at the wrong level. As Russell Ackoff argued repeatedly, a system is never the sum of its parts; it is the product of their interactions, and improving the parts in isolation can reliably degrade the whole.[7]
A few of the laws of this discipline are worth naming:[2]
- Today's problems come from yesterday's solutions. The fix the team applied last quarter is often the source of the issue this quarter.
- The harder you push, the harder the system pushes back. Aggressive optimization of one variable tends to provoke compensating behavior elsewhere.
- Cause and effect are not closely related in time or space. The reason a customer is churning today may be a decision made eighteen months ago in a meeting most of the current team never attended.
Recognizing these patterns does not make them go away. It changes what counts as a useful intervention.
Flow Is Determined by the Bottleneck
Every system has a constraint: the part that limits the throughput of the whole. Improvements to the constraint improve the system. Improvements to anything else do not.[3]
In software organizations, the constraint is rarely the engineers writing code. It is more often one of:
- A single overloaded reviewer who becomes the gatekeeper for every change.
- A flaky test suite that turns every deploy into an hour-long ritual.
- An approval process that requires three meetings to move a ticket forward.
- A shared environment that only one team can occupy at a time.
- A product owner whose calendar is the only place clarification can happen.
Speeding up everything except the bottleneck makes the bottleneck worse: more work piles up in front of it, because the rest of the system now produces work faster than the bottleneck can absorb. This is one of the most common failures in well-intentioned engineering improvement efforts.[4]
The discipline is to find the constraint, protect it, and only invest in the rest of the system after the constraint has been addressed.
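The effect is easy to see in a toy model. The sketch below is illustrative only: the stage names and daily capacities are hypothetical, work moves one stage per day, and the first stage is assumed never to run out of backlog.

```python
def simulate(capacities, days):
    """Return (items finished, items waiting in front of each stage)."""
    names = list(capacities)                 # stage order follows dict order
    waiting = {name: 0 for name in names}
    finished = 0
    for _ in range(days):
        # Walk stages from last to first so an item advances one stage per day.
        for i in reversed(range(len(names))):
            name = names[i]
            # Assumption: the first stage always has backlog to pull from.
            available = capacities[name] if i == 0 else waiting[name]
            done = min(available, capacities[name])
            if i > 0:
                waiting[name] -= done
            if i + 1 < len(names):
                waiting[names[i + 1]] += done
            else:
                finished += done
    return finished, waiting


# Hypothetical capacities in items per day; review is the constraint.
baseline = simulate({"code": 6, "review": 4, "deploy": 8}, days=20)
faster_coding = simulate({"code": 12, "review": 4, "deploy": 8}, days=20)
more_reviewers = simulate({"code": 6, "review": 8, "deploy": 8}, days=20)

print(baseline)        # (72, {'code': 0, 'review': 44, 'deploy': 4})
print(faster_coding)   # (72, {'code': 0, 'review': 164, 'deploy': 4})
print(more_reviewers)  # (108, {'code': 0, 'review': 6, 'deploy': 6})
```

In this model, doubling coding capacity finishes exactly as many items as before and nearly quadruples the pile waiting for review. Adding review capacity is the only change that raises output.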
Most Problems Are in the System
Deming argued that the overwhelming majority of variation in any production system is attributable to the system itself, not to the individuals working inside it.[5][6] Blaming an engineer for a slow review cycle, a tester for a missed bug, or a support agent for a long resolution time is almost always misattributing a system problem to a person.
This has practical implications:
- Performance problems are usually structural. When the same problem recurs across people in the same role, the role is the problem, not the people.
- Process problems are usually incentive problems. "Why doesn't anyone document anything?" is usually answered by "what gets rewarded around here?"[8]
- Quality problems are usually upstream. Bugs caught in production are most often the result of a choice made upstream, in design, in scoping, or in priorities. Fixing the bug without fixing the upstream cause produces another one in the same shape.
The fix for systemic problems is to change the system. The fix for individual problems is to support the individual. Confusing the two is the source of most failed improvement efforts.
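One rough way to tell the two apart is to ask whether an observation falls inside the range the system normally produces. The sketch below uses the mean plus or minus three standard deviations as that range, which is a simplified stand-in for a proper control chart rather than Deming's own procedure. The review times are invented for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Range the system normally produces: mean +/- 3 sample standard deviations."""
    center = mean(baseline)
    spread = 3 * stdev(baseline)
    return center - spread, center + spread


def special_causes(baseline, observations):
    """Observations outside the limits; these may deserve individual attention."""
    low, high = control_limits(baseline)
    return [x for x in observations if x < low or x > high]


# Hypothetical review turnaround times in hours.
baseline = [20, 26, 23, 30, 22, 27, 25, 21, 28, 24]
this_week = [29, 33, 19, 52]

print(special_causes(baseline, this_week))  # [52]
```

Under these assumptions, the 29-hour and 33-hour reviews look slow but sit inside what the process routinely produces, so shortening them means changing the process. Only the 52-hour review stands out as something to investigate on its own.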
Common Anti-Patterns
A few patterns reliably indicate that local optimization is winning:
- Optimizing ticket count over customer value. Teams ship more tickets and customers are no happier.
- Optimizing utilization over flow. Everyone is "fully booked," nothing is finishing.
- Optimizing speed over maintainability. Velocity rises this quarter, defect rate rises next quarter.
- Optimizing launch date over operational readiness. The product ships on time and the operations team carries the cost for the next year.
- Optimizing team metrics over product outcomes. Each team's dashboard is green, the product loses ground in the market.
Each of these is a system telling the truth about what it is being asked to produce. None of them are individual failures of judgment. They are predictable responses to the incentives in place.
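The utilization pattern in particular has a simple arithmetic explanation. As a minimal sketch, assume a single queue with random arrivals and random service times (the textbook M/M/1 model, which is a simplification of any real team). The average time an item spends in the system is then the service time divided by one minus utilization.

```python
def time_in_system(service_days, utilization):
    """Average days an item spends waiting plus being worked on (M/M/1 model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_days / (1 - utilization)


# Hypothetical team whose items take 2 days of actual work each.
for utilization in (0.5, 0.8, 0.9, 0.95):
    days = time_in_system(2, utilization)
    print(f"{utilization:.0%} utilized: about {days:.0f} days per item")
```

In this model, moving from 80% to 95% utilization roughly quadruples the time each item takes to finish, even though nobody is working more slowly.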
What This Looks Like in Practice
A few habits keep systems thinking real rather than a slogan:
- Look for the bottleneck before optimizing anywhere else. "What is the constraint?" is the most useful question in nearly every improvement conversation.
- Measure end-to-end, not per-team. Lead time, defect rate in production, customer-reported issue volume, and time-to-resolution describe the system. Per-team velocity describes a part.
- Trace recurring problems to their upstream cause. A problem that keeps coming back is almost always a symptom of a system that produces it on schedule.
- Be wary of "best practice" imported in isolation. A practice that worked in another organization is a practice that worked in another system. Whether it improves yours depends on your constraints, not theirs.
- Watch for the laws of the discipline. Faster is sometimes slower. More effort sometimes produces less throughput. Strict enforcement sometimes increases the behavior it was meant to suppress. These are not paradoxes. They are how systems behave when their feedback loops are misunderstood.
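The difference between an end-to-end measure and a per-team measure can be sketched with a few timestamps. The records and field names below are hypothetical; a real tracker would export something similar.

```python
from datetime import date
from statistics import median

# Hypothetical work items: request date, the development team's start and
# finish dates, and the date the change reached customers.
items = [
    {"requested": date(2024, 3, 1), "dev_start": date(2024, 3, 11),
     "dev_done": date(2024, 3, 13), "released": date(2024, 3, 29)},
    {"requested": date(2024, 3, 4), "dev_start": date(2024, 3, 18),
     "dev_done": date(2024, 3, 21), "released": date(2024, 4, 8)},
    {"requested": date(2024, 3, 6), "dev_start": date(2024, 3, 20),
     "dev_done": date(2024, 3, 22), "released": date(2024, 4, 5)},
]


def lead_times(items):
    """End-to-end: days from customer request to release."""
    return [(item["released"] - item["requested"]).days for item in items]


def dev_cycle_times(items):
    """Per-team: days the development team spent on the item."""
    return [(item["dev_done"] - item["dev_start"]).days for item in items]


print("median lead time (days):", median(lead_times(items)))           # 30
print("median dev cycle time (days):", median(dev_cycle_times(items)))  # 2
```

In this made-up data the development team's own number is two days, while the customer waits thirty. The per-team figure is accurate and still says almost nothing about the system.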
Key principle
The performance of the system matters more than the performance of isolated parts. An organization that optimizes its parts in isolation will reliably produce a worse system than one that optimizes for flow across the whole.
See also: Quality Is Designed In, Testing Is Not Quality, Continuous Improvement, Collaborative Engineering, The Build Trap, Further Reading.
[1] Donella H. Meadows, Thinking in Systems: A Primer (Chelsea Green, 2008). The most accessible introduction to the language of systems thinking: stocks, flows, feedback loops, and leverage points.
[2] Peter M. Senge, The Fifth Discipline: The Art & Practice of the Learning Organization (Doubleday, 1990). The originating text for treating organizations as learning systems, and the source of the "laws of the fifth discipline" the three bullets above paraphrase.
[3] Eliyahu M. Goldratt, The Goal: A Process of Ongoing Improvement (North River Press, 1984). The novel that introduced the Theory of Constraints. Its central argument, that the throughput of a system is determined by its bottleneck and that non-bottleneck improvements do not improve the system, is foundational to most modern operations thinking.
[4] Gene Kim, Kevin Behr, and George Spafford, The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win (IT Revolution Press, 2013). Applies the Theory of Constraints, lean manufacturing, and systems thinking to a software organization in narrative form. The on-ramp to The DevOps Handbook for readers who prefer story to framework.
[5] W. Edwards Deming, Out of the Crisis (MIT Press, 1986). The originating text for the argument that quality is designed in by the system rather than inspected in by the workers, and the source of the often-cited claim that the overwhelming majority of problems in any production system are attributable to the system, not to the individuals working inside it.
[6] W. Edwards Deming, The New Economics for Industry, Government, Education (MIT Press, 1993). Deming's later articulation of his "System of Profound Knowledge," covering appreciation for a system, knowledge of variation, theory of knowledge, and psychology.
[7] Russell L. Ackoff, Re-Creating the Corporation: A Design of Organizations for the 21st Century (Oxford University Press, 1999), and the essays collected in Ackoff's Best: His Classic Writings on Management (Wiley, 1999). Ackoff's central argument, that the performance of a system depends on the interactions among its parts rather than on the parts taken separately, and that performance management of the parts in isolation reliably degrades the whole, is one of the cleanest articulations of why local optimization fails.
[8] Alfie Kohn, Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes (Houghton Mifflin, 1993). The canonical critique of extrinsic-reward systems, arguing that incentive structures designed to motivate behavior often undermine the intrinsic motivation they depend on. A useful pairing with Deming on why performance-evaluation systems frequently produce the behaviors they were meant to suppress.