User Behavior Testing

Users rarely behave the way teams expect. They develop workarounds, skip steps, misunderstand labels, and optimize for their own goals rather than the team's intended workflow. This is not a flaw in the users. It is the predictable behavior of people who have a job to do and a tool in front of them; the tool is judged by whether it gets the job done, not by whether the user followed the design.

User behavior testing is the discipline of finding out what users actually do, as distinct from what the team assumes they do. It is the empirical foundation underneath product judgment. Teams that test routinely build accurate models of their users; teams that do not test build convincing but inaccurate ones.

What You Can Actually Learn

Different techniques surface different kinds of information:

User interviews. Open-ended conversations about what users are trying to accomplish, what works, what does not, and how they think about the product or the problem. Best for understanding the why behind behavior.¹
Usability studies. Watching a user attempt to do a specific task with the product. The richest source of information about whether the interface communicates what the team thinks it communicates. Five users is usually enough to find the major problems.
Observed workflows. Watching a user do their actual work, not a contrived task. Reveals how the product fits into the broader context of the user's day and where it creates or removes friction.
Session recordings. Playback of real user sessions, reviewed in aggregate. Useful for finding patterns across many users; less useful for understanding why any individual user did what they did.
Heatmaps. Aggregate views of where users click, scroll, and pause. Indicate where attention goes; less revealing about why.
Prototype testing. Showing a low-fidelity version of an idea to users before building it. The cheapest time to discover that an idea is wrong is before the code exists.
Diary studies and longitudinal research. Tracking what a small group of users does over weeks or months. Reveals adoption patterns, fatigue effects, and habits that single-session studies miss.

Most teams use several of these in combination. Qualitative methods (interviews, usability studies) tell the team what to think about. Quantitative methods (heatmaps, analytics, A/B tests) tell the team how widespread the patterns are.
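The "five users" rule of thumb for usability studies is usually justified with a simple detection model attributed to Nielsen and Landauer: if each user independently runs into a given problem with probability p, then n users surface a share 1 - (1 - p)^n of the problems. The sketch below works through that arithmetic. The default p = 0.31 is the often-quoted average and is an assumption here, as is the function name; real products can sit well above or below it.

```python
def share_of_problems_found(n_users: int, p_detect: float = 0.31) -> float:
    """Expected share of problems seen at least once after n_users sessions.

    Assumes every problem is equally likely to be hit by every user,
    independently, with probability p_detect per session.
    p_detect = 0.31 is an assumed, commonly quoted average, not a measurement.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** n_users


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(f"{n:>2} users: {share_of_problems_found(n):.0%}")
```

Under these assumptions five users surface roughly 84% of problems. With a lower per-user detection rate, say 0.10, the same five users surface only about 41%, which is why the rule is a starting point rather than a guarantee.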

What Users Are Actually Doing

A few patterns reliably show up across products, and finding them in your product is a useful sanity check:

Users do not read. They scan. Long blocks of text are skipped. Labels are interpreted from the first one or two words. Instructions are missed.
Users develop workarounds. When the intended path is too slow or too confusing, users find a faster path, often one the team did not design. The workaround is signal.
Users misinterpret labels. Words that seem clear to the team mean something else to the user. The fix is not to add more explanation; it is to use words the user already knows.
Users do not undo. Most users do not know about undo, do not trust it, or cannot find it. They compensate for that lack of confidence by being cautious in ways that slow them down.
Users repeat patterns. Once they figure out a way to do something, they do it that way forever. Even when a better path is added later, established users often do not discover it.
Users optimize for their own goals. Not for the team's funnel, not for the product's success metrics. The interface that fights the user's actual goal loses.

These are not flaws to be designed around with more training or more documentation. They are properties of how people use software, and good design accepts them.
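Workarounds can sometimes be spotted in event data as well as in direct observation. The sketch below is a minimal illustration, assuming each session has been exported as an ordered list of event names. The event names, the intended path, and the invoicing scenario are all hypothetical and not taken from any particular analytics tool.

```python
from collections import Counter

# Hypothetical designed path and goal event for one task.
INTENDED_PATH = ("open_invoice", "click_send", "confirm_recipient", "invoice_sent")
GOAL_EVENT = "invoice_sent"

# Hypothetical exported sessions: session id -> ordered event names.
sessions = {
    "s1": ["open_invoice", "click_send", "confirm_recipient", "invoice_sent"],
    "s2": ["open_invoice", "duplicate_last_invoice", "invoice_sent"],
    "s3": ["open_invoice", "duplicate_last_invoice", "invoice_sent"],
    "s4": ["open_invoice", "click_send"],  # abandoned before the goal
}


def paths_to_goal(sessions, goal):
    """Count the distinct event sequences that ended at the goal event."""
    counts = Counter()
    for events in sessions.values():
        if goal in events:
            # Keep everything up to and including the first goal event.
            counts[tuple(events[: events.index(goal) + 1])] += 1
    return counts


def unintended_paths(sessions, intended, goal):
    """Return (path, count) pairs that reached the goal off the designed path."""
    return [
        (path, count)
        for path, count in paths_to_goal(sessions, goal).most_common()
        if path != intended
    ]


for path, count in unintended_paths(sessions, INTENDED_PATH, GOAL_EVENT):
    print(f"{count} session(s): {' -> '.join(path)}")
```

A path that shows up repeatedly and is not the designed one is a candidate workaround. The data only says that it happens; finding out why still takes an interview or an observed session.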

Common Pitfalls

Leading the user. Asking "do you like this design?" or "is this clear?" produces answers shaped by the question. Asking "show me how you would use this to accomplish X" produces answers shaped by behavior.¹
Demo mode. Users who know they are being tested behave differently from users who are just trying to get something done. Field observation, when it can be done, is more reliable than lab studies.
Talking to the wrong users. Testing with the team's most engaged customers tells you what your power users think. Casual and new users are a much larger population and the one most products lose, and their feedback is the hardest to get.
Recency bias in interpretation. The most memorable thing a user said gets quoted in the readout, even if it was an outlier. Look for patterns across users, not for the most quotable comment.
Treating opinion as behavior. What a user says they will do is often different from what they actually do. The reliable signal is in observed behavior, not stated intent.
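One way to guard against quoting the outlier is to tally coded observations by distinct participant before writing the readout. The sketch below is a minimal illustration, assuming session notes have already been tagged by hand. The participant ids and issue tags are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-coded notes: (participant_id, issue_tag) pairs.
observations = [
    ("p1", "missed-save-button"),
    ("p1", "missed-save-button"),      # same user, same issue: counted once
    ("p2", "missed-save-button"),
    ("p3", "confusing-export-label"),
    ("p4", "missed-save-button"),
    ("p2", "wants-dark-mode"),         # memorable quote, but only one user
]


def issue_reach(observations):
    """Return (issue, distinct participant count), widest reach first."""
    users_by_issue = defaultdict(set)
    for participant, issue in observations:
        users_by_issue[issue].add(participant)
    return sorted(
        ((issue, len(users)) for issue, users in users_by_issue.items()),
        key=lambda pair: (-pair[1], pair[0]),  # by reach, then alphabetically
    )


for issue, count in issue_reach(observations):
    print(f"{issue}: {count} participant(s)")
```

Counting distinct participants rather than mentions keeps one talkative user from dominating the summary. An issue seen with three of four participants is a pattern; a quote from one participant is a lead to follow up.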

What This Looks Like in Practice

Make research continuous, not project-bound. Teresa Torres's "continuous discovery habits" framing argues for small, ongoing user contact rather than large, infrequent research projects.² Continuous discovery produces a working model of the user; sporadic discovery produces a snapshot that ages quickly.
Show, don't ask. Wherever possible, watch users do the task rather than ask them about it. Behavior is more reliable than self-report.
Test prototypes before code. A clickable prototype tested with five users will surface most of the same issues as a built feature tested with the same users, at a fraction of the cost.
Bring engineers and product into research. A summary of research is much less useful than seeing the research happen. The team learns the user by encountering them, not by reading about them.
Look for negative results. A study that confirms what the team already thought is less valuable than one that surprises them. Surprises are where the team's mental model was wrong, and that is the only place new information lives.

Key principle

Users optimize for their goals, not the designer's assumptions. The discipline of behavior testing is to find out, repeatedly, what those goals are and how the product is helping or hurting them. Without it, the team is designing against a model of the user that the user has never confirmed.

¹ Rob Fitzpatrick, The Mom Test (2013). On how to talk to customers without being lied to, including a sharp critique of validation questions ("would you buy this?") that produce comforting and useless answers, and concrete techniques for getting truthful evidence instead.
² Teresa Torres, Continuous Discovery Habits (Product Talk LLC, 2021). The canonical articulation of small, ongoing customer research as a working practice rather than an episodic activity.