To paraphrase Mark Twain, it is not what people know about accident investigations, causality, and corrective actions that gets them into trouble (or leads to weak corrective action and thus the same problems over and over again), it is what they know that just is not so. This post is based on Sidney Dekker’s “The Field Guide to Understanding Human Error.” Only read further if you dare to have your worldview of critiques and corrective action challenged. You may conclude that everything you think you know about human error investigations (also known as critiques and fact findings) is wrong or in need of serious revision.
Most of what you think you know (called “the Old View” by Dekker) about investigating problems and human factors is probably wrong, will not lead to the outcomes you think you want, or, worse yet, will actually make effective progress toward safety harder to achieve. This is not because of any evil intent, merely poor understanding (very few people get formal training in causal analysis), error carried forward (learning about human error by watching what others do), and ignorance (which can be addressed by reading the reference books recommended below). There is no escaping that the hopes many organizations have placed in traditional post-event “remedies,” like adding automation or procedures, reprimanding miscreants, retraining, and adding supervision (only temporarily, until we are sure they “get it”), are typically ineffective. Do any of these corrective actions look familiar? There is little long-term effect on organizational performance if the basic conditions that people work under are left unchanged.
Systems are not basically safe. Safety is created by the people in them. People hold together the patchwork of technologies introduced into their worlds. They are the only ones who can make it all work in actual practice; human errors are at the heart of system failure because people are at the heart of making these systems work in the first place. Human error is not a cause; it is a symptom of trouble deeper inside a system. Safety is never the only concern, or even the primary concern, of any system. Systems do not exist to be safe; they exist to make money, render a service, or provide a product. Besides safety, there are multiple other objectives and pressures on our complex systems.
People are limited cognitive processors (there are no unlimited cognitive processors in the entire universe). People do not know and see everything all the time. So their rationality is bounded.
The point of a human error investigation is to understand why people did what they did, not to judge them for what they did not do. Learning from failure is the ultimate goal of an investigation: failures represent opportunities for learning, opportunities that often fall by the wayside because of the overconfidence people feel in post-event reconstructions of reality. The mystery in an investigation is not how people could have been so unmotivated or so stupid as to miss the things that you can decide were critical in hindsight. The mystery is to find out what was important to the people in the system at the time, and why.
Hindsight and other fallacies allow accident investigators and others to transform an uncertain and complex sequence of events into a simple, linear series of obvious options (not so obvious to operators who do not live in your after-the-fact reality). Just finding and highlighting people's mistakes explains nothing. Saying what people did not do does not explain why they did what they did. In order to push a well-defended system over the edge (or make it work safely), a large number of contributory factors are necessary and only jointly sufficient. A “root cause” is an illusion; what you call "root cause" is simply the place where you stopped looking any further.
High reliability organizations do not try to constantly close the gap between procedures and practice by exhorting people to stick to the rules. Instead, they continually invest in their understanding of the reasons for the gap. This is where they try to learn about ineffective guidance, about novel, adaptive strategies and where they do and do not work.
Coming up with meaningful post-event recommendations is a kind of experiment. Human error is systematically connected to features of the tasks and tools that people work with, and to features of the environment in which they carry out their work. Recommendations propose to change some of these features. Your recommendations essentially propose to re-tool or re-shape parts of the operational or organizational environment in the hope of altering the behavior that goes on within it. You propose to modify something and you implicitly predict it will have a certain effect on human behavior. The strength of your prediction hinges on the connection you have between the observed human errors and critical features of tasks, tools and environment.
Error, by any other name, or by any other human, is the start of your probe, not its conclusion.
Common “Causes” that Mean Nothing
All error investigations reach conclusions about the causes of the problem in order to create corrective action. This is a good thing, of course. The problem arises when investigators are satisfied with familiar sounding causes that really have no specific meaning.
The assumption in error investigations (typically unstated and unquestioned) is that errors of judgment or poor decisions are things that are "out there," existing independently of anyone looking at them. But errors are only definable in hindsight, with greater knowledge of the situation than the people in it had at the time. To believe that words like "poor judgment" and "error" name essential properties that really are "out there," so that all you need is a good method to access them and start counting or identifying them in a report, runs counter to the direction the science of human factors has taken in the 21st century. Jens Rasmussen notes that if you see people take an action and it does not make sense to you, it does not mean they are making an error or exercising poor judgment. It means your perspective does not allow you to see theirs. Your perspective is contaminated by hindsight and by larger knowledge of the situation. If they had known then what you know now, they probably would not have done what they did. Judgment can only turn out to be poor in hindsight; the actors did not have that knowledge at the time. There is a danger in using labels like "poor judgment" and "error" as if they were uncontestable (i.e., agreed upon by everyone). "Human error" is a psychological attribution that is ours, that we make about other people's behavior after the fact. It does not belong to their behavior; it is something we import with the way we talk about it. The point is that when you use this language, recognize that it is you putting the label on other people's performance. In hindsight, the actors may agree with your characterization, but they did not have it in foresight.
Situation awareness is a folk model. Folk models are highly useful because they collapse complex, multidimensional problems into simple labels that everybody can relate to. But this is also where the risk lies, certainly when researchers pick up a folk label and attempt to investigate and model it scientifically. One feature of folk models is that nonobservable constructs are endowed with the necessary causal power without much specification of the mechanism responsible for that causation. According to Kern (1998), for example, complacency can cause a loss of situation awareness. In other words, one folk problem causes another folk problem. Such assertions leave few people any wiser. Charles Billings warned against this danger in 1996: "We must avoid this trap: deficient situation awareness doesn't 'cause' anything. Faulty spatial perception, diverted attention, inability to acquire data in the time available, deficient decision-making, perhaps, but not a deficient abstraction!" (p. 3). Situation awareness is too "neat" and "holistic": it lacks the level of detail that would supply a psychological mechanism connecting features of the sequence of events to the outcome. The folk model, however, was coined precisely because practitioners (pilots) wanted something "neat" and "holistic."
Another folk concept is complacency. Labeling behavior as "complacent" does not answer the real question: why does people's vigilance decline over time, especially when they are confronted with repetitive stimuli?
The resistance of folk models against falsification is known as immunization. Folk models leave assertions about empirical reality underspecified, without a trace for others to follow or critique. Although falsifiability may at first seem like a self-defeating criterion for scientific progress, the opposite is true: The most falsifiable models are usually also the most informative ones, in the sense that they make stronger and more demonstrable claims about reality. In other words, falsifiability and informativeness are two sides of the same coin.
Other commonly listed causes without clear meanings: inattention to detail, not following procedures, and poor judgment.
Folk models like “complacency” or “loss of situational awareness” explain nothing, and labeling causes with them does not make you any safer. Yet not being able to find a cause is profoundly distressing: it creates anxiety because it implies a loss of control.
Errors to Avoid in Investigations
Errors in investigating human error are easily made. All of them stem, in one way or another, from the hindsight bias. Try not to make these errors:
a) The counterfactual reasoning error. You will say what people could or should have done to avoid the mishap ("If only they ..."). Saying what people did not do but could have done does not explain why they did what they did.
b) The data availability-observability error. You will highlight the data that was available in the world surrounding people and wonder how they could have possibly missed it. Pointing out the data that would or could have revealed the true nature of the situation does not explain people's interpretation of the situation at the time. For that you need to understand which data was observed or used and how and why.
c) The micro-matching error. You will try to match fragments of people's performance with all kinds of rules and procedures that you excavate from written documentation afterward. And of course, you will find gaps where people did not follow procedures. This mismatch, however, does not at all explain why they did what they did. And, for that matter, it is probably not even unique to the sequence of events you are investigating.
d) The cherry-picking error. You identify an over-arching condition in hindsight ("they were in a hurry"), based on the outcome, and trace back through the sequence of events to prove yourself right. This is a clear violation of rule of engagement 2: leave performance in the context that brought it forth. Don't lift disconnected fragments out to prove a point you can really only make in hindsight.
Conclusions (based on the writings of James Reason)
Organizations that address problems primarily through blaming individuals without trying to understand the organizational context of the error suffer these penalties:
- Failure to discover latent conditions
- Failure to identify error traps
- Management has its eye on the wrong ball
- A blame culture and a reporting culture cannot co-exist
Real-world organizational situations and contingencies seem impervious to easy risk-management solutions. Human error is stubborn and enduring, so it is often the case that highly sophisticated, discrete solutions to human deficiencies are promptly succeeded by even more sophisticated manifestations of human error.
According to James Reason,
Errors are merely the downside of having a brain. Each error tendency (inattention, forgetfulness, strong habit intrusions, etc.) lies on the debit side of a mental balance sheet that stands very much in credit. Each recurrent error form is rooted in highly adaptive mental processes. For example, trial-and-error learning is an essential step in acquiring new skills or dealing with novel situations. It is not the error tendency that is the problem, but the relatively unforgiving and artificial circumstances (from an evolutionary perspective) in which errors are made. Errors are fundamentally useful and adaptive things. Without them, we could neither learn nor acquire the skills that are essential to safe and efficient work.
Learning from problems and failures is important because the lessons are universal and can be applied to any workplace. The key to understanding problems is to look beyond the simple solutions to the underlying causes. Many people, especially those in management, love the simple solution. Organizational problems are more difficult to deal with. It is easier to deal with an accident or disaster cause such as "the cause was the failure of a pressure seal" (the Challenger disaster) than to investigate and address the failure of the management decision-making process. Disasters show us the importance of an organization's safety culture.
James Reason notes:
You cannot change the human condition, but you can change the conditions in which humans work. The problem with errors is not the psychological processes that shape them, but the man-made and sometimes unforgiving workplaces that exist within complex systems.
For Further Reading
The Field Guide to Understanding Human Error, Sidney Dekker
Ten Questions About Human Error, Sidney Dekker
Human Error, James Reason
Managing the Unexpected, Karl Weick and Kathleen Sutcliffe