Too Soon or Too Late: The Incident Escalation Dilemma

Hamed Silatani | May 22, 2025
Tags:
Best Practices
Incident Management

Ready to make incident response your competitive advantage?

See how Uptime Labs builds provable, scalable incident response capability across your organisation.

The short answer? Finding the exact right time to escalate or involve more people is an unsolvable problem. It’s always going to be too late, too early, or deemed completely unnecessary.

For folks working in offshore teams, escalation (especially out of hours) feels like Russian roulette: you don't know what kind of reaction you'll get from someone you've never met in person (i.e., someone you have a weak relationship with).

Anyone with operational responsibilities - including leaders - has probably either had a conversation like the ones below or heard something similar:

A: “If you’re not sure, just escalate. You can’t sit on a major incident.”
B: “I didn’t know it was a major incident - no customer issues were reported.”
A: “Because the markets weren’t open yet!”

OR

A: “Why are there so many people on the call and only a few contributing? We're wasting productive time.”
B: “We thought it was a major incident, so we included everyone who could possibly help.”

I confess this has been my hardest post so far. To begin with, I couldn’t find many studies on this specific topic, apart from the great work of Michael Wettic (2024), Dr. Laura Maguire (2020), and a handful of blog posts.

As I dug deeper and asked more questions, I started to see that escalation - and recruiting people to help during an incident - is messy. The terms are overloaded, and there aren’t clear expectations.

In all the cases I looked into, frustration over escalation was caused by people using the same terms to mean completely different things. Even the same person may interpret a term differently depending on the situation. A few examples:

  • Does escalation mean the same thing as asking for help?
  • “If in doubt, escalate” - but what is doubt? Who can be 100% sure of anything during an incident?
  • “Do the basics before escalating/asking for help” - what activities or questions are enough to cover the basics?
  • “If it’s serious, call people in” - which people? What defines serious?

[Table: incident challenges, assumptions and impact]

Much is left to the judgment of the incident handler in the early stages of an incident, when the least is understood. I've come to the following conclusion:

Whenever someone escalates or asks for help, it makes sense to them at that moment. In that sense, escalating or asking for help always happens at the right time: the time that made sense to the incident resolver, given what they knew then.

This shifts the question from “When is the right time to escalate/ask for help?” to “How does an incident resolver decide to recruit others into the incident?” Once we understand that, the next question becomes: “What can I, as a leader, do to ensure that it makes sense for incident resolvers to enlist help at a point that aligns more closely with the organisation’s interests?”

“How does an incident resolver make the decision to recruit others into the incident?”

Michael Wettic identifies a number of goals an incident resolver may have when recruiting others into incident response:

  • Understanding what’s going on (Diagnosis), e.g. “I’m stuck and can’t make sense of the issue.”
  • Repairing the broken parts, e.g. “We need the DB team to do XYZ.”
  • Getting a second pair of eyes, e.g. to reduce the risk of mistakes or to cross-check decisions or information.
  • Acquiring new information, e.g. “How does Service A work? What are its dependencies?”
  • Seeking approval, e.g. “Is it okay if I bypass the change protocol?”
  • Lack of capacity, e.g. “I can’t both investigate and keep the business informed,” or “There are 50 instances to check manually - can we divide and conquer?”

So what triggers someone to call for help? There are various motivators:

  • Perceived impact, e.g. “This is going to get very ugly, very soon,” or “This is Sev 1” (based on quick judgment or a priority matrix).
  • Time-based trigger, e.g. “If I can’t figure this out in 15 minutes, I escalate.”
  • Confidence in one's ability to diagnose or repair, e.g. “I have no idea where to start,” or “I don’t have access to the DB.”
  • Organisational rules/team norms, e.g. “If it’s Sev 2 or higher, escalate,” or “If it’s not fixed in 30 mins, escalate.”

There are also psychological and cultural factors that incident resolvers are likely to consider. Speaking from my own perspective:

  • Senior role impact - If I escalate to someone more senior, I worry about how they perceive my ability - especially if they have influence over my career.
  • Trust - If I know and trust someone, it’s much easier to call for help.
  • Self-confidence - My self-confidence and sense of security (or insecurity) affect my decision - have I done enough to not look incompetent?
  • Cultural background - If I come from a culture with a strong hierarchy, or where asking for help is seen as a weakness, it’s much harder to escalate. I can recall many examples early in my career where I was nervous to ask for help because I thought I was paid to ‘know my stuff.’
  • Organisational culture - How others are treated (or how I’ve been treated) in the past for calling for help influences whether I do it again.

There are other important aspects of escalation - like the organisational structure and help-recruitment mechanisms. For the purposes of this post, I’ve kept them out of scope (these factors become more relevant once the decision to recruit help has already been made).

“What can I, as a leader, do to help incident resolvers escalate or ask for help at a time more aligned with organisational interests?”

Earlier this month, I asked folks in the Resilience in Software community which practices they’ve seen work best. At the time, I couldn’t explain why some of them worked. Now, with a better understanding of how incident resolvers decide to escalate or ask for help, I can see why these practices are effective:

Educate and practise

More specifically, there are three strategies for education and incident practice: creating a shared understanding of key concepts, teaching communication skills, and practising escalation.

1. Create a shared understanding of key concepts

Definitions can be specific to your organisation, but it’s crucial that everyone shares the same understanding. When people understand the goals behind calling for help, it becomes easier to decide when to do it - and to explain why.

For example:

  • Escalation is used to raise awareness and generate organisational focus.
    → Business impact is the main criterion
  • Asking for help can be for:
    • Diagnosis - triggered by time-boxing
    • Repair - as soon as you realise you need help
    • Second pair of eyes - same
    • Acquiring new information - same
    • Approval - same
    • Lack of capacity - same

2. Teach communication skills

  • Helping others help you means:
    • Sharing rich context
    • Asking a well-formed question or problem statement
    • Showing what diagnostic steps you’ve already taken

These skills must be taught deliberately. It's far easier to recruit help when everyone knows what information is expected.

3. Practise escalation

  • Run drills where incident resolvers must make escalation decisions and provide context to onboard someone new.

Provide simple, clear guidance (before the incident)

At 3am, no one is going to read a process document or playbook. But simple rules are easy to remember and follow:

  • “If within 20 minutes you don’t know how to fix the issue, get help.”
  • “As soon as you suspect multiple users are (or will be) affected, escalate.”
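Rules this simple can also be turned into reminders in whatever chat bot or incident tool a team already uses. The sketch below is a hypothetical illustration in Python: the `IncidentState` fields, the `prompts` function and the `HELP_TIMEBOX_MINUTES` constant are assumptions rather than part of any real tool, and the output is meant as a nudge to the resolver, not an automatic page.

```python
from dataclasses import dataclass

# Hypothetical time-box taken from the first example rule; tune to your organisation.
HELP_TIMEBOX_MINUTES = 20


@dataclass
class IncidentState:
    minutes_elapsed: int            # time since the resolver started working the incident
    fix_identified: bool            # does the resolver know how to fix it yet?
    multiple_users_suspected: bool  # suspicion (not proof) that more than one user is or will be affected


def prompts(state: IncidentState) -> list[str]:
    """Return reminders a hypothetical incident bot could post; the decision stays with the resolver."""
    reminders = []
    # Second rule: escalate on suspected multi-user impact, without waiting for confirmation.
    if state.multiple_users_suspected:
        reminders.append("escalate")
    # First rule: once the time-box has passed without a known fix, get help.
    if state.minutes_elapsed >= HELP_TIMEBOX_MINUTES and not state.fix_identified:
        reminders.append("get help")
    return reminders


print(prompts(IncidentState(minutes_elapsed=25, fix_identified=False, multiple_users_suspected=False)))
```

Note that the escalation check fires on suspicion rather than confirmed impact, matching the wording of the rule; whether to act on the reminder is still the resolver's judgment.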

Foster a generative culture

This is the most important step. A psychologically safe culture - where people share fears, admit mistakes, and continuously improve - starts with leadership.

1. Invest in relationships

Give people (including senior staff) opportunities to meet in person and build trust. It’s not always possible, but even one personal interaction can lower the barrier to asking for help.

2. Make it safe to ask

Leaders must actively communicate that asking for help is a strength, not a weakness. They can back it up by:

  • Modelling the behaviour themselves
  • Calling out when it’s missing
  • Celebrating help-seeking as a core value

3. Create a learning environment

Stop asking: “Why did you escalate too early or too late?”
Start asking: “Why did escalation (or lack of it) make sense at the time?”

That is the question that will generate insight and lead to better escalation practices.

Hamed Silatani

Hamed is the co-founder and CEO of Uptime Labs. He has 20 years of experience in engineering leadership, reliability engineering and IT operations. Having spent the majority of his career at the sharp end of incident response in financial services, he's looking to help all companies master the unexpected.
