Support and Triage
Supporting software is part of building software. The team that ships a product, then hands it to a separate organization to support, has split a single feedback loop in two and lost most of the signal the loop was supposed to produce. Even when the support team is genuinely separate, the engineering team has to treat the support stream as a primary input rather than as someone else's problem.
A working support function is one of the most underappreciated sources of product and engineering insight. What customers ask, where they get stuck, and which features they avoid together form some of the most direct evidence available about what the product actually is, as opposed to what the team thinks it is.
Support Systems vs. Engineering Systems
Support work and engineering work have different shapes, and trying to track both in a single system usually serves neither well.
- Support systems track symptoms, incidents, and the operational reality of running a product. They are organized around customers and tickets. They prioritize response time, customer impact, and clear communication.
- Engineering systems track work to be done, with priorities, estimates, and ownership. They are organized around code, components, and the team's roadmap.
The right pattern is two systems that are linked rather than merged. Many support tickets may map to a single engineering change; one defect can produce dozens of tickets before it is fixed. Treating a ticket and an engineering task as the same unit of work produces friction in both systems.
The connection between the two matters more than the boundary. Support feeding engineering with patterns ("this same question keeps coming up") and engineering feeding support with context ("we're shipping a fix on Thursday; here's what to tell affected customers") is the loop that makes both functions effective.
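A minimal sketch of that loop, assuming a hypothetical in-memory tracker: the `Ticket`, `Issue`, and `Tracker` names and their fields are illustrative, not any real ticketing product's API. Each system keeps its own record type, and an explicit many-to-one link lets support report how many distinct customers a defect reaches, and lets engineering hand back the list of customers to notify when a fix ships.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    # Support-side record, organized around a customer and a symptom.
    ticket_id: str
    customer: str
    summary: str


@dataclass
class Issue:
    # Engineering-side record, organized around code and its status.
    issue_id: str
    title: str
    status: str = "open"


class Tracker:
    """Two separate stores plus an explicit link from issues to tickets (hypothetical)."""

    def __init__(self) -> None:
        self.tickets: dict[str, Ticket] = {}
        self.issues: dict[str, Issue] = {}
        self.links: defaultdict[str, set[str]] = defaultdict(set)  # issue_id -> ticket_ids

    def link(self, ticket_id: str, issue_id: str) -> None:
        # Many tickets may trace to one issue; linking never merges the records.
        self.links[issue_id].add(ticket_id)

    def affected_customers(self, issue_id: str) -> set[str]:
        # Support -> engineering: distinct customers this defect reaches.
        return {self.tickets[t].customer for t in self.links.get(issue_id, set())}

    def ship_fix(self, issue_id: str, note: str) -> list[tuple[str, str]]:
        # Engineering -> support: mark the issue fixed and return one
        # (customer, message) pair per affected customer so support can follow up.
        self.issues[issue_id].status = "fixed"
        return [(c, note) for c in sorted(self.affected_customers(issue_id))]


if __name__ == "__main__":
    tracker = Tracker()
    tracker.issues["ENG-1"] = Issue("ENG-1", "Export times out on large files")
    for tid, customer in [("T-1", "acme"), ("T-2", "globex"), ("T-3", "acme")]:
        tracker.tickets[tid] = Ticket(tid, customer, "Export fails")
        tracker.link(tid, "ENG-1")
    print(len(tracker.links["ENG-1"]), "tickets,", len(tracker.affected_customers("ENG-1")), "customers")
    # prints: 3 tickets, 2 customers
    print(tracker.ship_fix("ENG-1", "Fix ships Thursday"))
```

The link is the only shared state in this sketch; each record type keeps only the fields its own system cares about, so neither side has to adopt the other's unit of work.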
Triage Is a Judgment Call
Triage is the discipline of deciding which problems get attention now, which get attention later, and which get acknowledged but not acted on. Done well, it produces fewer fires and faster resolution of the ones that matter. Done poorly, it lets the loudest customer or the most recent escalation drive the team's priorities.
A useful set of factors for triage:
- Severity. How bad is the impact when this occurs? A site outage is severe; a minor cosmetic issue is not.
- Frequency. How often does it happen? A rare problem with high severity may rank below a common problem with moderate severity.
- Customer impact. Who is affected, how many, and how visibly?
- Financial impact. What is the actual cost (lost revenue, refunds, contract penalties, support load) of the problem continuing?
- Security risk. Does the issue expose data, allow unauthorized access, or weaken the security posture of the system?
- Operational risk. Could the problem worsen, cascade, or block other work?
- Reproducibility. A problem the team can reproduce reliably is a problem the team can fix. An intermittent issue is harder and may require investigation before triage is even possible.
No single factor decides triage. The art is balancing them, especially when they conflict (a frequent low-severity problem against a rare high-severity one).
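To make the balancing concrete, here is a sketch of a weighted score over the factors above. The 0 to 5 rating scale, the values in the hypothetical `WEIGHTS` table, and the `Report`, `triage_score`, and `triage` names are assumptions for illustration; a real rubric would be set and revisited by the team. Reproducibility is kept as a flag rather than a weight, since it changes what to do next (investigate before fixing) more than how much the problem matters.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights; each factor is rated 0 (none) to 5 (worst).
WEIGHTS: dict[str, float] = {
    "severity": 3.0,
    "frequency": 2.0,
    "customer_impact": 2.0,
    "financial_impact": 1.5,
    "security_risk": 3.0,
    "operational_risk": 1.5,
}


@dataclass
class Report:
    name: str
    ratings: dict[str, int] = field(default_factory=dict)  # unrated factors count as 0
    reproducible: bool = True


def triage_score(report: Report) -> float:
    # Weighted sum, so no single factor decides the outcome on its own.
    return sum(weight * report.ratings.get(factor, 0) for factor, weight in WEIGHTS.items())


def triage(reports: list[Report]) -> list[tuple[str, float, str]]:
    # Highest score first; unreproducible reports are routed to investigation, not dropped.
    ranked = sorted(reports, key=triage_score, reverse=True)
    return [(r.name, triage_score(r), "fix" if r.reproducible else "investigate") for r in ranked]


if __name__ == "__main__":
    reports = [
        Report("rare outage", {"severity": 5, "frequency": 1, "customer_impact": 3}, reproducible=False),
        Report("common slowdown", {"severity": 3, "frequency": 5, "customer_impact": 4}),
        Report("cosmetic glitch", {"severity": 1, "frequency": 3, "customer_impact": 1}),
    ]
    for row in triage(reports):
        print(row)
    # ('common slowdown', 27.0, 'fix')
    # ('rare outage', 23.0, 'investigate')
    # ('cosmetic glitch', 11.0, 'fix')
```

With these illustrative weights, the common moderate slowdown (27.0) outranks the rare but severe outage (23.0), which is also flagged for investigation because it is not yet reproducible. The score is an input to the judgment call, not a substitute for it.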
Emotionally Loud vs. Systemically Important
One of the persistent failure modes in support is treating emotional intensity as a proxy for importance. The customer who emails the CEO is not necessarily reporting a problem that matters more than the hundred customers who quietly hit the same issue and gave up.
- A loud customer with an unusual workflow may be reporting an edge case affecting only them. Fix it if you can, but do not let it crowd out the issues affecting many quieter customers.
- A loud customer with a common problem is also a signal that other customers may be experiencing it. The volume is the alarm, not the priority.
- A silent drop in retention may be a more important signal than any number of escalations. Customers who left without complaining are reporting the most important problem in the most ambiguous way.
The discipline is to look at the pattern of support data, not just the volume of the loudest channels.
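A sketch of looking at the pattern rather than the volume, assuming tickets have already been tagged with an issue label and an escalation flag (the record shape and the `breadth_vs_loudness` name are hypothetical). It counts distinct affected customers per issue alongside escalations, so the two rankings can be compared side by side.

```python
from __future__ import annotations

from collections import defaultdict


def breadth_vs_loudness(
    tickets: list[tuple[str, str, bool]],
) -> tuple[dict[str, tuple[int, int]], list[str], list[str]]:
    """Each ticket is (issue_tag, customer, escalated). Returns per-tag
    (distinct_customers, escalations) plus the tags ranked both ways."""
    customers: defaultdict[str, set[str]] = defaultdict(set)
    escalations: defaultdict[str, int] = defaultdict(int)
    for tag, customer, escalated in tickets:
        customers[tag].add(customer)
        if escalated:
            escalations[tag] += 1
    summary = {tag: (len(custs), escalations[tag]) for tag, custs in customers.items()}
    # Breadth: how many different customers hit it. Loudness: how many escalations.
    by_breadth = sorted(summary, key=lambda t: summary[t][0], reverse=True)
    by_loudness = sorted(summary, key=lambda t: summary[t][1], reverse=True)
    return summary, by_breadth, by_loudness


if __name__ == "__main__":
    tickets = [("custom-sso", "bigco", True)] * 3
    tickets += [("slow-search", f"customer-{i}", False) for i in range(5)]
    summary, by_breadth, by_loudness = breadth_vs_loudness(tickets)
    print(summary)      # {'custom-sso': (1, 3), 'slow-search': (5, 0)}
    print(by_breadth)   # ['slow-search', 'custom-sso']
    print(by_loudness)  # ['custom-sso', 'slow-search']
```

In this made-up data, one customer's three escalations put `custom-sso` first by loudness, while five quiet customers put `slow-search` first by breadth. Customers who left without filing anything appear in neither ranking; that signal has to come from retention data outside the ticket system.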
Common Anti-Patterns
- Treating support as a downstream cost. Support is staffed defensively, kept understaffed, and excluded from product conversations. The team loses its richest source of customer feedback.
- No closed loop to engineering. Support hears the same problem fifty times; engineering hears about it once and forgets. The pattern is invisible because nobody is aggregating it.
- Whoever shouts loudest wins. Triage responds to escalation rather than to impact, and the team's priorities drift toward whoever is most insistent.
- Treating tickets as the work. A team that measures itself on ticket count optimizes for closing tickets, not for fixing the underlying problems. The same problem returns over and over, each time as a fresh ticket.
- Heroic individual support. One support person who knows everything carries the load. When they leave, so does the knowledge.
What This Looks Like in Practice
- Aggregate support signal periodically. Weekly or biweekly reviews of common themes, recurring tickets, and unsolved problems feed product and engineering with patterns rather than instances.
- Close the loop visibly. When engineering fixes something support reported, tell support, and ideally tell the customers. The fix is half the value; closing the loop is the other half.
- Make triage criteria explicit. A written triage rubric, even a simple one, protects the team from defaulting to "whoever asked last."
- Treat support knowledge as a product artifact. Runbooks, common-issue documentation, and frequently-asked-question collections are deliverables of the support function and part of the product.
- Bring engineers into support rotation. A few hours a month spent in support is one of the cheapest ways to improve an engineer's product judgment, because they see the product as users actually experience it.
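The first practice above, a periodic review of recurring themes, can be sketched as a small digest. This assumes each ticket carries an open date and a theme label assigned during handling; the `weekly_digest` name and the `min_weeks` threshold are hypothetical. Counting the distinct ISO weeks in which a theme appears, rather than raw volume alone, surfaces the "this same question keeps coming up" pattern.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import date


def weekly_digest(
    tickets: list[tuple[date, str]], min_weeks: int = 2
) -> list[tuple[str, int, int]]:
    """Return (theme, weeks_seen, total_tickets) for themes seen in at least min_weeks ISO weeks."""
    weeks_by_theme: defaultdict[str, set[tuple[int, int]]] = defaultdict(set)
    totals: Counter[str] = Counter()
    for opened, theme in tickets:
        iso_year, iso_week = opened.isocalendar()[:2]  # (ISO year, ISO week number)
        weeks_by_theme[theme].add((iso_year, iso_week))
        totals[theme] += 1
    recurring = [
        (theme, len(weeks), totals[theme])
        for theme, weeks in weeks_by_theme.items()
        if len(weeks) >= min_weeks
    ]
    # Most persistent themes first; total ticket volume breaks ties.
    return sorted(recurring, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    tickets = [
        (date(2024, 3, 4), "password reset"),
        (date(2024, 3, 5), "password reset"),
        (date(2024, 3, 12), "password reset"),
        (date(2024, 3, 19), "password reset"),
        (date(2024, 3, 6), "export timeout"),
        (date(2024, 3, 13), "export timeout"),
        (date(2024, 3, 7), "dark mode request"),
    ]
    for row in weekly_digest(tickets):
        print(row)
    # ('password reset', 3, 4)
    # ('export timeout', 2, 2)
```

The output feeds the weekly or biweekly review as patterns rather than instances: a theme raised once drops out, while one that keeps returning rises to the top.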
Key principle
Emotionally loud issues are not always systemically important. Triaging well requires distinguishing the two, treating support data as a primary product signal, and resisting the pull of whichever customer is currently loudest.
See also: Launch Is the Beginning, Continuous Improvement, Incident Response, Release Coordination, Useful Support Ticket, Product Ownership.