Is Fewer Incidents Always Good?

Hamed Silatani | May 8, 2025
Tags: Best Practices, Blog, Incident Management
Ready to make incident response your competitive advantage?

See how Uptime Labs builds provable, scalable incident response capability across your organisation.

The Zero-Incident Illusion

First, I’m going to go out on a limb and speculate that most technology folks, given the option, would prefer to experience fewer incidents: fewer outages, fewer degradations and fewer emergency 3am wake-up calls.

From this uncontroversial start, it’s just a small leap before you find yourself nodding along to the argument that the ideal future state is one where the trajectory of ‘fewer’ incidents meets its ultimate destination, ‘zero’ incidents. While most would acknowledge this to be impractical, many would consider it to be (at least) directionally desirable.

Given this ambitious goal, the next logical step is to consider monitoring progress along the journey. Such consideration demands measurement, perhaps in the form of an incident count or frequency measure. When plotted over time on a graph, such measures tell us whether our incident numbers are heading in the right direction, and how quickly.

Next, assuming that the intrinsic motivation of fewer incidents isn’t inspiring enough, one may choose to up the ante by setting targets. Perhaps focus will be sharpened by a stretch target of 50% fewer Sev1s by the end of Q2? And for the icing on the motivational cake, let’s link performance-related bonuses to the target.

The recipe outlined above is guaranteed to do two things:

  1. Reduce the number of reported incidents.
  2. Destroy organisational resilience.

This particular road to hell, though paved with good intentions, is based on several questionable assumptions.

Questionable Assumption 1: Fewer incidents is better than more incidents

While less downtime is almost always better than more downtime, the same doesn’t necessarily hold for incidents. Fewer incidents isn't necessarily better than more incidents.

The definition of an incident differs between organisations. However, incidents are almost always categorised as such because they’re notable or meaningful in some way and represent a departure from an expected or desirable (explicit or implicit) condition. These notable and meaningful events are powerful learning opportunities; opportunities which tend to be exploited only if raised prominently.

Dr Richard Cook famously observed that ‘complex systems run in degraded mode’. In other words, it's normal for complex systems to behave in undesirable ways.

What’s more, the work done every day to deal with or prevent undesirable outcomes is frequently hidden from view (see David Woods’ Law of Fluency).

It’s hard to learn from something that’s invisible, so raising incidents when a learning opportunity arises is a very good idea.

Questionable Assumption 2: Zero Incidents is a Desirable Goal

Envisioning a future devoid of undesirable or unexpected events is appealing but unrealistic. Hidden within the premise of zero-incident attainability is the assumption that every possible eventuality within a complex system can be foreseen and ‘designed for’. While system reliability can improve, resilience will always be tested by unforeseen surprises.

Plus, as systems gain capacity and efficiency, organisational demand doesn’t stay still. Rather, it tends to expand to fill that capacity, stretching the system beyond its expected, or designed-for, operating boundaries (David Woods’ Law of Stretched Systems).

So, while the unexpected is possible, and while businesses are competitive, incidents will still be a thing.

Questionable Assumption 3: Counting incidents or measuring incident frequency is useful

The value of a measurement is primarily determined by its ability to inform valuable decisions.

While it's possible that an increase in incident frequency might be a signal that informs decision making, it's equally likely that it won’t be.

These metrics can be useful, especially when they prompt deeper reflection. However, they also risk triggering assumption 4, which can be actively harmful if left unchecked.

Questionable Assumption 4: Setting targets on incident measures makes it more likely that the intended objectives will be achieved

Goodhart's Law states:

'When a measure becomes a target, it ceases to be a good measure.'

Setting targets can introduce biases, favouring metrics that align with goals over those that don't. For example, aiming to reduce Sev1 incidents by 20% might influence how incidents are classified, rather than addressing their root causes.
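To make the classification effect concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the incident log, the 10-minute threshold and the helper names are invented for illustration and don't come from any real dataset or tool. It shows how quietly downgrading a single Sev1 is enough to hit a 20% reduction target while total downtime stays exactly where it was.

```python
from collections import Counter

# Hypothetical incident log: (severity, minutes of downtime). All values are invented.
incidents = [
    ("sev1", 40), ("sev1", 25), ("sev1", 0), ("sev1", 60), ("sev1", 15),
    ("sev2", 10), ("sev2", 0),
]


def reclassify(log, threshold_minutes):
    """Downgrade any Sev1 with downtime below the threshold to Sev2."""
    return [
        ("sev2", minutes)
        if severity == "sev1" and minutes < threshold_minutes
        else (severity, minutes)
        for severity, minutes in log
    ]


def summarise(log):
    """Return the Sev1 count, the total incident count and the total downtime."""
    counts = Counter(severity for severity, _ in log)
    return {
        "sev1": counts["sev1"],
        "total": len(log),
        "downtime": sum(minutes for _, minutes in log),
    }


before = summarise(incidents)
after = summarise(reclassify(incidents, threshold_minutes=10))

print("before:", before)
print("after: ", after)
```

In this made-up log the Sev1 count drops from 5 to 4, which meets the target, yet nothing about what actually happened has changed.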

This issue intensifies when external rewards, like bonuses or accolades, are tied to meeting these targets.

As statistician and management guru W. Edwards Deming is frequently quoted as saying:

'People with numerical targets and jobs dependent on meeting them will meet the targets, even if they have to destroy the enterprise to do it.'

More Incidents is More…

So while downtime is almost exclusively bad news, incidents (many of which don't involve downtime) are valuable signals of notable and meaningful events, offering rich opportunities for learning. These learning opportunities in turn carry the potential to improve organisational resilience.

Counting incidents is easy, and it's attractive to imagine that a graph showing fewer incidents over time is revealing a good-news story of improving resilience. Truthfully, however, this story is likely to be merely wishful thinking.

Incentivising a reduction in reported incidents may decrease the reports but not the actual occurrences, leading to missed opportunities for learning and organisational growth.

The goal is to learn from what actually happens rather than how we imagine or hope things might be.

Here are some strategies to try:

  • Celebrate an increase in reported incidents
  • Treat near misses with the same seriousness as actual outages
  • Pay attention to low severity incidents, not just the Sev1s
  • Consider whether the measures you're tracking are actually informing decision making
  • Celebrate especially high-quality incident reports & write-ups

And finally, if targets are really your thing, what might happen if you set a minimum target for reported incidents?

Hamed Silatani

Hamed is the co-founder and CEO of Uptime Labs. He has 20 years of experience in engineering leadership, reliability engineering and IT operations. Having spent the majority of his career at the sharp end of incident response in financial services, he's looking to help all companies master the unexpected.
