Where do I start?
Navigating the high seas of being an on-call engineer and managing IT incidents is no small feat.
This journey, spanning over a decade, has taken me from the front lines of support and triage to the strategic echelons of L3 escalation and incident command. The initial buzz of an incident call invariably sets off a cascade of emotions, often all at once:
- The anticipation of bad news
- The dread of disrupted plans, both personal and professional
- And the apprehension of the inevitable stress that follows.
And then the critical question, “Where do I start?” emerges. The question may be coupled with concerns about expectations, feedback mechanisms and the daunting possibility of exacerbating the situation.
Reflecting on these experiences, it’s clear that the intensity of these emotions has ebbed and flowed with experience. Yet, they never fully dissipate, serving as faithful companions through every alert and alarm.
It’s like walking blindfolded on an unfamiliar road
This emotional journey is not unique to me. In a decade of engaging with fellow practitioners, I’ve come to realise that it is a common thread among those in our field.
For some, the chaos and frenetic pace of incident response are invigorating. Yet, for many, it’s a source of significant stress, with tangible impacts on mental and physical well-being.
The root of this stress lies in the inherent uncertainty and ambiguity of incidents. Our brains crave clarity, causality and a clear path forward. Without these, navigating incidents can feel akin to walking blindfolded on an unfamiliar road.
The stress is an industry problem
The crux of the matter lies not in the inevitability of these challenges but in our preparedness to face them. The IT industry has not provided structured avenues for acquiring the skills needed to navigate this uncertainty. Thus, many learn through the crucible of experience, often at the expense of customers, employers and personal well-being.
Example: The journey of becoming an incident responder
Observation
For most people, incident response begins passively - shadowing senior engineers, attending war rooms, etc. Sometimes a mentor walks you through what’s happening; other times, you're thrown into the chaos and expected to pick it up.
The daylight phase
The next phase often starts with handling daytime support issues, where stakes are lower and experienced help is just a Slack message away. You learn the language and flow of a minor incident.
The night shift
Then comes your first out-of-hours rotation. You’re technically not alone (you’ll have a colleague to escalate to), but it feels like you are. There’s a moment of hesitation: Do I wake them? What if it’s nothing? What if I look stupid? That moment is a rite of passage and (unfortunately, often) a mental wall.
The long curve of building confidence
Building incident judgment takes time. Months, sometimes years. It’s not just about how many incidents you’ve handled, but how many types of failure you’ve seen.
If incident response isn’t your primary role - e.g. if you’re on a rota - it’s easy to go dark between shifts. But the longer you’re away, the more the rust creeps in. Incident response is a practice, not a title. And sometimes, that first real alert isn’t a gentle ramp-up. It’s a baptism of fire.
Finding a practical way forwards
However, there is a silver lining. Observations indicate that those with extensive experience in handling high-severity incidents develop a certain finesse and confidence in their approach. This is not merely a function of time but of exposure to a variety of critical situations.
The takeaway is clear: the skills to manage the uncertainty of incidents can be learned and honed. While it’s impossible to encapsulate the breadth of required skills in a single post, I can share a couple of insights gleaned from the best in the business:
1. Embrace the unknown
Recognise that it’s perfectly normal to feel disoriented at the outset of an incident. You’re not alone in this feeling; it’s a universal starting point for incident responders.
In high-pressure moments, the mental barrier isn’t always the incident itself; it’s the quiet dread of “I don’t understand this system.”
That hesitation can delay escalation, even when the impact is clear. But a systematic approach can cut through the fog: go from panic to uncertainty, then toward forming a working theory.
And if that theory’s wrong? That’s fine. Form another. What matters is momentum. Understanding impact helps unlock that shift - and asking for help early isn’t a weakness. You can always downgrade later, but you can’t unburn trust.
As engineering has become more specialised, that fear of not knowing has only grown. It’s why resilience today isn’t just about systems - it’s about psychology.
2. Adopt an iterative approach
Incident response is not a linear process but an iterative one, involving the development and refinement of working theories.
These theories are continuously tested against new information, obtained both from sources such as colleagues, monitoring systems and change logs, and through active interventions in the system’s state.
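For the more code-minded, the loop described above can be pictured roughly as follows. This is only an illustrative sketch, not a tool or a prescribed method: the `Theory` class, the `refine` function, the `judge` callback and the sample evidence strings are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Theory:
    """A working theory about the cause of an incident (hypothetical structure)."""
    statement: str
    supporting: list = field(default_factory=list)
    contradicting: list = field(default_factory=list)

    def is_viable(self) -> bool:
        # A theory stays alive until some evidence contradicts it.
        return not self.contradicting


def refine(theories, evidence, judge):
    """Test every theory against one new piece of evidence.

    judge(theory, evidence) returns True (supports), False (contradicts)
    or None (not relevant). Returns the theories that are still viable.
    """
    for theory in theories:
        verdict = judge(theory, evidence)
        if verdict is True:
            theory.supporting.append(evidence)
        elif verdict is False:
            theory.contradicting.append(evidence)
    return [t for t in theories if t.is_viable()]


def judge(theory, evidence):
    # Toy rules for illustration only; in a real incident this is human judgment.
    if evidence == "no deploys in the last 24h":
        return False if theory.statement == "bad deploy" else None
    if evidence == "db connection pool exhausted":
        return True if theory.statement == "database saturation" else None
    return None


theories = [Theory("bad deploy"), Theory("database saturation")]

# Evidence arrives over time from change logs, monitoring, colleagues and so on.
for evidence in ["no deploys in the last 24h", "db connection pool exhausted"]:
    theories = refine(theories, evidence, judge)

print([t.statement for t in theories])
```

The point of the sketch is the shape of the loop rather than the code itself: hold several theories at once, let each new signal strengthen or eliminate them, and form a new theory when none survive.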
But how can you integrate these insights into a more effective incident response practice?
Writing runbooks sounds like a solid starting point. That is, until you’re mid-incident, the clock is ticking, and the playbook just doesn't match what’s happening. Every incident is unique, shaped by timing, dependencies and the unknowns that emerge in real time. That’s why effective response isn’t just about memorising scripts; it’s about continuously interpreting new signals as they arrive. Practice helps - not just in theory, but through live drills, war games and structured chaos that forces teams to adapt under pressure.
A useful framework can guide you, but what really matters is building a configurable, holistic and constantly evolving understanding of your distributed systems.
Summary
The industry demands peak performance under extreme stress, often without adequate training or even a clear outline of expected competencies. My own experiences, marked by stress-induced physical discomfort, underline the urgency of addressing this gap.
For those looking to refine these skills in a supportive environment, Uptime Labs is here to assist. Our foundation stems from a recognition of the unfair expectations placed on incident responders. Each drill we host is our response to this challenge, aiming to ensure that no incident responder feels ill-equipped or unsupported in the face of chaos.





