PATIENT SAFETY NOW

PROVIDING A REFRESHING VIEW OF SAFETY

Decades of learning

Following on from my blog titled “An Organisation with a Memory”, it is worth dipping into some of the things we have tried over the years.

There are a number of tools and techniques used in the Safety-I approach.  These include Heinrich’s triangle, the Swiss cheese model, the ‘5 whys’ and root cause analysis.

Heinrich’s triangle

Heinrich’s triangle – which states that the likelihood of a fatality rises in line with the number of incidents – has, I am told, no basis in fact or research.  Heinrich thought of accident causation as a chain of events and believed that the chain could be broken in order to stop an accident from happening.  He studied accidents and concluded that in 88% of cases the workers were the cause – he called it ‘man failure’.  He also believed that there was a hierarchy of accidents: a large number of minor accidents, a much smaller number of serious accidents and even fewer that were fatal.  This hierarchy is what became known as Heinrich’s triangle, and the triangle has a set of figures that represent the progression.  However, he apparently had no actual data to prove it and, as has since been asserted, ‘made the figures up’ (Dekker and Conklin 2022).
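
The figures usually quoted for the triangle are a ratio of roughly one major injury to 29 minor injuries to 300 no-injury incidents.  Purely to show what those figures claim (not to endorse them – as above, the data behind them is disputed), here is a minimal sketch using that commonly quoted ratio:

```python
# A minimal sketch of the figures commonly attributed to Heinrich's triangle
# (roughly 1 major injury : 29 minor injuries : 300 no-injury incidents).
# The ratio itself is exactly what is being questioned above - this is
# illustrative only, not evidence that the ratio holds in practice.

HEINRICH_RATIO = {"major injuries": 1, "minor injuries": 29, "no-injury incidents": 300}

def scale_triangle(major_injuries: int) -> dict[str, int]:
    """Scale the claimed ratio from a given number of major injuries."""
    return {level: count * major_injuries for level, count in HEINRICH_RATIO.items()}

print(scale_triangle(2))
# {'major injuries': 2, 'minor injuries': 58, 'no-injury incidents': 600}
```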

Swiss Cheese Model

Professor James Reason’s Swiss cheese model has even been critiqued by Reason himself.  We believe that there are warning signs prior to incidents that we need to pay attention to but, as Reason would say, they don’t really line up and flow through the holes of a system that also simply lines up for them to fall through.  It is far more complex than that.  Reason helped us to understand the difference between active failures, or errors of the individual, and latent failures, the contributory factors built into the system.  We identified with the latent failures, which relate to decisions made some time before an error, such as staffing levels or resource allocation, and crucially we understood that in order to prevent individual errors you should fix the system.  He used the ‘Swiss cheese model’ to try to explain the relationship between latent failures and accidents.  The model is used to explain the role of defences and suggests that sometimes the defences we build to stop errors from affecting our care don’t work: the holes (in the cheese) that are constantly there, but usually made safe by these defences, occasionally open up when the defences fail, and an accident happens.
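
As a rough illustration of the idea (and of how much it simplifies things), the model is often pictured in probabilistic terms: each defensive layer blocks most failure trajectories, and harm only reaches the patient when the holes in every layer happen to be open at the same moment.  A minimal sketch, using entirely hypothetical layers and failure probabilities:

```python
import random

# A minimal sketch of the Swiss cheese idea: several defensive layers, each
# with a small chance of a "hole" being open at any given moment.  Harm only
# gets through when the holes in every layer line up.  The layers and the
# probabilities are hypothetical, and treating them as fixed and independent
# is precisely the simplification criticised below.

LAYERS = {
    "prescribing check": 0.05,  # hypothetical chance this defence fails
    "pharmacy check": 0.03,
    "bedside check": 0.10,
}

def harm_reaches_patient(rng: random.Random) -> bool:
    """True only if every defensive layer fails at the same time."""
    return all(rng.random() < p_hole for p_hole in LAYERS.values())

rng = random.Random(42)
trials = 100_000
breaches = sum(harm_reaches_patient(rng) for _ in range(trials))
print(f"{breaches} breaches in {trials:,} trials")  # expect roughly 15 (0.05 x 0.03 x 0.10)
```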

The problem with the Swiss cheese model is that it will never provide a detailed analysis of why something has happened.  It doesn’t really explain the nature of the holes in the cheese and their inter-relationships – what they are, where they come from, how they arise, how they change over time and how they get lined up to produce an incident – and it does not take into account the complexity of relationships and the dynamic nature of the system.

The Swiss cheese model has been used since its inception in safety presentations to demonstrate the impact of decisions made upstream leading to incidents at the frontline.  However, the latent factors or conditions may never be clearly identified, as many latent factors could lead to incidents in the future.  Some latent factors may take many years before an incident happens, and many variables will have changed along the way.  The model has had its doubters because, while decisions made sometimes years before can lead to incidents happening today, incidents in complex adaptive systems occur as a result of a multitude of factors.  No one model will apply to all of them.

Professor James Reason himself wrote that it was a great model for raising awareness of the systems approach to safety but should not be relied upon to truly explain why incidents happen.  The Swiss cheese model is a great tool for communicating the issues associated with safety.  However, like a lot of safety theories, it is great in theory but difficult to apply in the actual day-to-day world of healthcare.

Five whys

The ‘5 whys’ technique is one of the most widely taught tools within the root cause analysis (RCA) methodology in healthcare.  It originates in the Toyota Production System, where the approach is to ask ‘why’ five times whenever a problem is found.  The claim is that by repeating ‘why’ five times, the nature of the problem as well as its solution becomes clear, and that this allows people to find a single root cause that might not have been obvious at the outset.
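
As a way of seeing how mechanical the technique is, here is a minimal sketch of the procedure.  The chain of answers is hypothetical and supplied by hand, because in real use each ‘why’ is answered by an investigator – which, as discussed below, is exactly where the trouble starts:

```python
from typing import Callable

# A minimal sketch of the '5 whys' procedure as usually taught: keep asking
# "why?" and treat the fifth answer as 'the root cause'.  The answers below
# are hypothetical and hard-coded purely for illustration.

def five_whys(problem: str, ask_why: Callable[[str], str], depth: int = 5) -> list[str]:
    """Walk a single linear chain of 'why' answers and return it."""
    chain = [problem]
    for _ in range(depth):
        chain.append(ask_why(chain[-1]))
    return chain  # chain[-1] is what the method labels 'the root cause'

canned_answers = {
    "A dose of medication was missed": "The prescription chart was not updated",
    "The prescription chart was not updated": "The ward round was interrupted",
    "The ward round was interrupted": "A deteriorating patient needed urgent review",
    "A deteriorating patient needed urgent review": "Staffing was short that shift",
    "Staffing was short that shift": "Two nurses were off sick",
}

chain = five_whys("A dose of medication was missed", canned_answers.get)
print(" -> ".join(chain))
# The method would point at "Two nurses were off sick" and stop there.
```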

The problem with the ‘5 whys’ is that it oversimplifies the process of problem exploration.

Many argue that it should not be used at all.  Like many safety tools, though, its reputation is not the result of any evidence that it is effective; it is used because it is simple.  When using the 5 whys, depending upon where you start, any investigator could come up with five completely different questions and therefore five completely different answers.  It forces people down a single pathway, picked more or less at random for any given problem, seeks a single root cause and assumes that the fifth ‘why’ on the causal pathway is the root cause and the place to aim the solution.  There is no logic to this conclusion.  Incidents rarely, if ever, have a single root cause.  Sometimes you may never even know what the root cause or causes are.

An article in the BMJ, ‘The problem with “5 whys”’ (2016), provides an example that is often used to explain the technique.

Problem: The Washington Monument is deteriorating

  1. Why? Harsh chemicals are being used to clean the monument
  2. Why? The monument is covered in pigeon droppings
  3. Why? Pigeons are attracted by the large number of spiders at the monument
  4. Why? Spiders are attracted by the large number of midges at the monument
  5. Why? Midges are attracted by the fact that the monument is first to be lit at night

Solution: Turn on the lights one hour later.

However, according to others, many of the details of this example are incorrect.  The monument was actually the Lincoln Memorial, and it was not being damaged by the use of harsh chemicals; the real problem was water.  Pigeons were not an issue at all, and while there were spiders at the memorial, they were not a major problem.  Instead, most of the cleaning was done because swarms of midges were dazzled by the lights and flew at high speed into the walls of the memorial, leaving it splattered with bits of the insects and their eggs.  The answers are also incomplete in a number of more important ways.  For instance, they only address one potential source of deterioration: the water used for cleaning.  The first ‘why’ could just as easily have asked about rain or acid rain, rising damp, erosion from windborne particles or damage from freeze-thaw cycles.

If the goal had been to prevent harm to future monuments, the first ‘why’ could have focused on the use of marble as a building material and the choice of building site.  The solution they chose, changing the timing of the lights, upset tourists, and after they complained the lights went back to their previous timing.

Researchers therefore suggest that the 5 whys is too simplistic a tool for the complexity of the real world.  Systems thinking requires both depth and breadth of analysis.
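
To make the ‘depth and breadth’ point concrete, one alternative to a single chain is a branching map of contributory factors, each of which can itself be questioned.  A minimal sketch, using factors drawn loosely from the monument example above and included for illustration only:

```python
# A minimal sketch contrasting a single '5 whys' chain with a branching,
# breadth-and-depth map of contributory factors.  The factors are drawn
# loosely from the monument example above and are illustrative only.

contributory_factors = {
    "Monument surface deteriorating": [
        "Frequent washing with water",
        "Rain, acid rain and freeze-thaw cycles",
        "Erosion from windborne particles",
        "Choice of marble and of building site",
    ],
    "Frequent washing with water": [
        "Midge swarms drawn to the evening lighting",
        "Bird droppings and spiders",
    ],
}

def walk(factor: str, depth: int = 0) -> None:
    """Print every branch, not just one chain ending in a single 'root cause'."""
    print("  " * depth + factor)
    for child in contributory_factors.get(factor, []):
        walk(child, depth + 1)

walk("Monument surface deteriorating")
```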

Root cause analysis

Root cause analysis is a technique used to find out why things happened, to identify the root cause of the problem and to fix it.  Assigning causes to an incident makes us happy because it means we have an explanation, in particular an explanation we can share with those who scrutinise us or who are anxious for answers.  But there are many instances when the cause may never be found.  Very few people can accept that in many instances things ‘just happen’, and when a cause has not been found it calls into doubt the credibility of the investigator or the investigation.

In his seminal paper How Complex Systems Fail (Cook, 1998), Richard Cook put it this way:

Catastrophe requires multiple failures—single point failures are not enough. The array of defences work. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners. 

There is no root cause.  The problem with the term isn’t just the implication that there is a single root, or that the word ‘root’ is misleading; trying to find causes to explain an incident can limit what you will find and learn.  In the last two decades healthcare has tried to adopt models used in other high-risk industries, especially aviation, in order to explain why incidents happen.  The method most used in healthcare is root cause analysis.  However, root cause analysis is built on the idea that incidents can be fully understood and, as I mentioned earlier, they can’t.  What is tricky to get our heads around is that thousands, if not millions, of investigations have been carried out using root cause analysis over the last twenty years, yet sadly the approach has not truly helped us understand safety any more than we already knew in 2000.

The approach we should be taking is to separate out the different causes and the multiple contributing factors.  We will then be able to see that the things that led to an incident are either always or transiently present; it is just that this time they combined into a perfect storm of normal things that went wrong at the same time.

As humans we like to find neat answers.  There is a belief that when something goes wrong there must be ‘a’ cause, and we assume we will find the preceding cause.  Everyone likes a cause (even better if it is a single cause), and this means that investigators may latch on to a superficial cause to the exclusion of more fundamental ones.  For example, if they find that people didn’t follow a policy, didn’t communicate well or didn’t perform a task well, then the recommendations are to ‘tell people to follow the policy’, provide ‘communication training’ and ‘retrain staff’ in relation to the task.  The search for information stops when an acceptable explanation has been found, even though this may be incomplete or incorrect.

Also, the term root cause is almost always used in the context of negative outcomes or failures, and not in situations where an outcome is deemed a success.  We don’t do an analysis to find the root cause of success.  Successful outcomes in complex adaptive systems arise from many factors that come together in a positive way.

It takes enormous skill to conduct an investigation well.  Investigators need to help people remember what happened, what they did and what others did.  They need to carry out the investigation in an unbiased way – unbiased, for example, by the outcome of the incident, by hindsight and by their own confirmation bias, all of which skew their ability to see the truth.  They need to try to see beyond these, get beneath the surface of what can be seen, learn from the data that isn’t there and go beyond the lessons that are superficial.

Investigators need to learn and use an accident causality model.  The underlying assumption of these models is that there are common patterns in incidents and that they are not simply random events.  The problem with these models is that they perpetuate the myth that there is a neat chain of events or failures, each directly causing or leading to the next in the chain.

Most of the time investigations find shallow contributory factors rather than deep root causes, and while addressing these contributory factors may help, it will not prevent things from going wrong in the future.  A report with a list of recommendations (the more the better, whether implementable or not) enables people to shut down any further need for study.  So the search for a root cause is a fallacy, another myth, and it stops us from working on what matters; we end up working on something that is falsely labelled ‘the cause’.  Also, the changes we put in place, however good or bad they are, erode over time – we are very good at focusing intensely on something for a short while, but we take our eyes off the ball and revert to our original habits and behaviours unless we make fundamental design changes to the system that make it hard for people to slip back into old ways.

Interestingly, the things we assign causes to are going on all of the time; sometimes they go right and sometimes they go wrong.  In fact, there are very few things that can be deemed a preventable root cause, and very few that can be addressed so that something will never happen again.  This is because systems are complex and adapt all of the time; outcomes emerge from a complex network of contributory interactions and decisions, not from a single causal factor or two.  Incidents are disordered, and there is no such thing as find, analyse and fix.  It is also important to note that, given the adaptive nature of complex systems, the system after an incident is not the same as the system before it: many things will have changed, not only as a result of the outcome but simply with the passing of time.  So, when it comes to incident investigations, healthcare is challenging to understand (let alone measure, optimise and improve) because the investigator has to truly understand the variabilities and dynamics of the system and its often vague or shifting performance.  There will always be a gap between how we think incidents happen and how they actually happen.

In a complex system you cannot assume that because two events occur together, or one after the other, there is a correlation or a causal relationship between them.  By claiming that one event must have caused the other there is a danger of reaching a wrong conclusion, or of missing another, unlooked-for event.  Nor can you assume that there is only one explanation for the observation being made, when in fact there will undoubtedly be many different explanations.  In general, work evolves over time, and prescribed work proves too inflexible or too fragile to cope with real conditions.  Over the longer term, these adaptations may result in a drift from the prescribed policy, procedure, standard or guideline, assuming any such prescription is in place.

Causality gets confused with correlation – for example, the assumed link between solutions and causes.  If incident reports reduce, there is a danger of assuming that the solutions put in place as a result of the incident investigation resulted in increased safety, whereas there could be, and probably are, multiple variables that need to be considered.  As Hollnagel says, we assign positive and negative attributes depending upon the outcome: if the outcome was bad then the cause must have been bad; if the outcome was good then the cause must have been good.  This makes us feel as if there is an order in the system.
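
A minimal sketch of that trap, using entirely synthetic numbers: two series that both happen to drift downwards over the same twelve months will show a strong correlation even if neither has anything to do with the other.

```python
from statistics import correlation  # available from Python 3.10

# Entirely synthetic, illustrative numbers: monthly incident reports and
# monthly training sessions both drift downwards for unrelated reasons.
# The correlation coefficient comes out strongly positive, which is exactly
# the trap described above - it says nothing about what caused what.

months = range(12)
incident_reports = [40 - 2 * m + (3 if m % 3 == 0 else -1) for m in months]
training_sessions = [24 - m + (2 if m % 4 == 0 else 0) for m in months]

r = correlation(incident_reports, training_sessions)
print(f"correlation = {r:.2f}")  # close to 1.0, yet proves no causal link
```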

Biases

When we report or investigate incidents we have to consider the fact that our biases can skew our thinking, make us see things that are not there or miss things that are, and lead us to judge things incorrectly.  In short, work-as-judged is affected by how we think about outcome and baseline frequency, the quality of our judgement, our understanding of others’ mental states, the way information is presented, individual characteristics, and penalties and rewards.  We will never get rid of our biases, but we can reduce them; some training can help, but it is mainly feedback that reduces bias.

I have a blog dedicated to biases – so here are some of the biases that need to be considered in every aspect of Safety-I and Safety-II:

  • Outcome bias – to judge a decision based on the eventual outcome instead of the quality of the decision at the time it was made.  For example, ten times the dose of vitamin C for an adult is highly unlikely to lead to harm whereas ten times the dose of morphine to a neonate is highly likely to lead to serious harm.  These will be dealt with very differently when in fact they are the same type of incident.
  • Neglect of probability – to disregard probability when making a decision under uncertainty – there is a need to stand back and consider the likelihood or possibility of the risks associated with the decision coming to fruition.
  • Omission bias – to judge harmful actions as worse, or less moral, than equally harmful omissions. For example, if you do something wrong then you are judged more harshly than if you forgot to do something.  It is almost as if the wrongful act is seen as purposeful or a choice when both are unlikely to be intentional.
  • Naïve realism – to think we are objective.  None of us is objective; we all interpret the world differently, even based on the same knowledge and experience.  You only have to watch a group of landscape painters painting the exact same view to see the difference in interpretation.  This also relates to our view that we can be impartial.  Again, we cannot: we all come with our own take on what we see and whether it fits with our own beliefs and attitudes.  All of us are subjective.
  • Overconfidence effect – to be overconfident in the accuracy of our judgements and performance – the belief that we are better than others because of our experience or knowledge or status.
  • Bandwagon effect – to believe things because many others do.  Sometimes this is linked to the ‘wisdom of the crowd’.
  • Confirmation bias – to search for, interpret, focus on, and remember information in a way that confirms our preconceptions.  For example, if we have made our minds up that in a medication safety incident a calculation error has occurred we will seek only data that confirms this and ignore any other data that might point to a different ‘cause’.
  • Hindsight bias – to believe, after the event, that what happened was predictable at the time, and to believe that we would have acted differently if it had been us.  We tend to ignore the fact that we now know far more after the incident than the people who were involved at the time.
  • Continued influence effect – to believe previously learned misinformation even after it has been corrected. This is where we stay convinced about information we first heard despite the fact that new information has been found that contradicts this first view.
  • Illusory truth effect – to believe that a statement is true if it has been stated multiple times.  Politicians use this technique all of the time: even if a statement is false, if you say it often enough people will believe it, sometimes to the point of forming false memories.
  • Framing effect – to draw different conclusions from the same information depending on how it is communicated or presented, in particular whether it is ‘framed’ positively or negatively.  For example, information or statistics can be presented in a variety of ways to influence people’s thinking or actions.
  • Group attribution error – to make assumptions about people based on group membership.  This is where we might say ‘all surgeons are…’ or ‘all managers are…’.
  • Defensive attribution hypothesis – to be biased against people who are different to us when evaluating an event.
  • Just world hypothesis – to assume that a person’s actions inherently bring morally fair consequences to that person. As in, people get what they deserve.