
I’ve been the poor soul on support overnight, trying to make sense of the world while being constantly distracted by managers desperate for information.
I have also been the manager who felt frustrated and was caught between a rock and a hard place, managing pressure from the C-level and wanting to give space to responders.
Major incidents often feel like a lose-lose game for mid-level managers, depending on the organisation’s setup for incident management.
Today, my story takes a different turn. Our business revolves around creating incidents and witnessing their resolution. This is how we earn our bread, so for once, as both a business leader and an engineer, I want a lot more incidents :)
Incident response requires specific skills that differ from those we use daily as engineers. Once these skills are well understood and regularly practised, incident response becomes much less stressful and an amazing opportunity for learning and team bonding!
Though 'effective' can be an ambiguous term, in this post, I’ll highlight two skills we pay close attention to during incident drills. These are skills that participants can refine and master through repeated practice.
- Communication
- Ability to progress the incident
Communication
Communication in incident response means more than just sharing updates - it’s about tailoring information to the right people at the right time. It includes keeping senior IT leadership informed (and involving them in decision-making when necessary), maintaining a shared understanding with fellow responders, and managing expectations for business stakeholders.
To quote a recent blog post:
'We frequently ask: “What are you actually noticing when you’re noticing great incident response?”
One of the most frequent answers we hear is, ‘Good Communication’. Everyone agrees on the value of good (or excellent, even!) communication, and most people have a keen sense of when they’re witnessing it. Ask people to describe what it is they’re witnessing, however, and things tend to become rather more tenuous.'
The truth is, each audience comes to incident communication with its own priorities - and uses that communication to achieve a different goal.
Business stakeholders
Business stakeholder communication should be concise, clear and calm.
- Inform them that there is a problem with a super clear scope of impact on key business services (knowing what is working is as important as knowing what is broken)
- Assure them that it is being handled competently (summarising the problem, the action and the time of the next update).
- Give them enough information to manage customers
- Keep them away from incident managers and responding teams (do not ring us!).
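As a rough illustration of the points above, here is a minimal Python sketch of a business-facing update. The `StakeholderUpdate` class and its field names are hypothetical, invented for this example rather than taken from any tool; the only idea it encodes is that every update states what is broken, what is working, what is being done and when the next update is due.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StakeholderUpdate:
    """Hypothetical structure for a business-facing incident update."""
    broken: list[str]           # key business services known to be impacted
    working: list[str]          # key business services confirmed healthy
    problem_summary: str        # one-line, jargon-free description
    current_action: str         # what responders are doing right now
    sent_at: datetime           # when this update goes out
    next_update_in: timedelta   # promised interval until the next update

    def render(self) -> str:
        # Stating the next update time up front reduces "any news?" calls.
        next_update = self.sent_at + self.next_update_in
        lines = [
            f"Impacted: {', '.join(self.broken) or 'none confirmed'}",
            f"Working normally: {', '.join(self.working) or 'not yet confirmed'}",
            f"What we know: {self.problem_summary}",
            f"What we are doing: {self.current_action}",
            f"Next update by: {next_update:%H:%M}",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    update = StakeholderUpdate(
        broken=["Card payments"],
        working=["Online banking login", "Account balances"],
        problem_summary="Some card payments are being declined.",
        current_action="Payments team is rolling back this morning's change.",
        sent_at=datetime(2024, 1, 1, 9, 0),
        next_update_in=timedelta(minutes=30),
    )
    print(update.render())
```

The services and timings in the usage example are made up; the useful part is that an update missing any of the fields cannot be constructed at all.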
Senior IT management
The primary objective is to keep them as far away as possible from incident responders while making sure they hear about the problem, with a clear scope of impact, from you first (nothing is more humiliating for them than someone from the business telling them that there is a problem).
Here’s an example of communication with senior IT management and business stakeholders going wrong:
Picture this: a financial trading firm. An alert is raised about delays in trade execution confirmations. At first glance, there’s no hard evidence of regulatory consequences - just a potential risk that needs investigating.
As the incident lead begins gathering information, a senior department head - copied into the internal distribution list and sitting near the CEO - mentions the situation casually. The CEO, alarmed by what they’ve just heard, immediately calls the CTO. The CTO, blindsided and hearing about the issue from above rather than from their own incident team, feels undermined.
From that moment, the incident lead’s focus is pulled toward managing senior stakeholder anxiety instead of addressing the root problem. All of this unfolds within the first five minutes, before the initial status update could even be sent.
Though miscommunication between different parties during incidents is common, it can be mitigated by:
- Educating senior stakeholders on communication protocol (this is as important as educating incident responders)
- Providing key technical facts known about the incident
- Leaving no room for rumours or side-channel information
- Giving assurance that the risk is understood and being handled
- Reassuring them that lessons will be learned.
Ultimately, the comms should provide answers rather than raising more questions.
Fellow responders
Though easier said than done, there are various ways to keep communication with fellow responders calm and constructive:
- Maintaining a common ground of understanding
- Providing clear context and facts when escalating and engaging other responders
- Frequently updating on thought processes and new information (i.e. surfacing changes to mental models).
- Clearly distinguishing facts from opinions. Julian Wiegmann has a useful framework for differentiating between facts, assumptions and opinions:
‘A fact is by definition “a thing that is indisputably the case”
An assumption “a thing that is accepted as true or certain to happen without proof”.
A belief “is to think (!not know!) that you know something is true, correct, or real.”'
One of the easiest ways for communication to go wrong is to misrepresent or mis-contextualise the situation to fellow responders. It’s an easy mistake to make - if you’ve done it before, don’t blame yourself.
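One way to make the fact / assumption / belief distinction visible in an incident channel is to label each statement explicitly. The Python sketch below is only an illustration: the `Kind` and `Statement` names are hypothetical, and the rule that a fact must carry evidence is our own simplifying assumption rather than part of the quoted framework.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FACT = "FACT"              # indisputably the case
    ASSUMPTION = "ASSUMPTION"  # accepted as true without proof
    BELIEF = "BELIEF"          # something we think, but do not know


@dataclass
class Statement:
    """Hypothetical labelled message for a responder channel."""
    kind: Kind
    text: str
    evidence: str = ""  # e.g. a dashboard or log reference

    def render(self) -> str:
        # Assumption for this sketch: a fact without evidence is rejected,
        # which nudges the author to relabel it as an assumption or belief.
        if self.kind is Kind.FACT and not self.evidence:
            raise ValueError("a fact needs evidence; otherwise label it an assumption")
        suffix = f" (evidence: {self.evidence})" if self.evidence else ""
        return f"[{self.kind.value}] {self.text}{suffix}"


if __name__ == "__main__":
    print(Statement(Kind.FACT, "Confirmation latency is above 5s",
                    evidence="latency dashboard").render())
    print(Statement(Kind.ASSUMPTION, "The queue backlog is the cause").render())
```

The example messages are invented; the design choice worth noting is that the label comes first, so a reader skimming the channel sees how much weight to give each line before reading it.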
Aside from mastering communication within the incident response chain, there’s another skill needed for effective incident response:
Ability to progress the incident
It comes down to feeling comfortable with ambiguity and engaging in an iterative exercise of evolving a working theory of what is going on.
We observed that, regardless of their level of domain knowledge, people who start by building a factual picture of the incident’s impact - sizing up the issue - are more effective. By leveraging knowledge available to them, whether their own or gathered from others, they can better understand the flow and build a working theory.
The working theory is the foundation for good communication and teamwork. Once formed, the responding team will naturally try to validate the working theory’s assumptions and evolve it until the service is restored.
In most careers, it takes years of exposure to different types of incidents before these abilities become second nature, embedded in your muscle memory.
The trouble is, waiting for those experiences to happen naturally isn’t always practical. And while incident drills can be a great shortcut, they’re often hard to arrange and, all too frequently, fail to capture the urgency and unpredictability of a real outage.
That’s why we run immersive, gamified incident simulations that put you right in the middle of the action. In just ten minutes, you can experience the pressure, pace, and decision-making challenges of a genuine incident - but in a safe, engaging and even fun environment.





