
My on-call experience started by accident in the mid-2000s. It was 5pm on the Friday at the end of my first week as a software engineer at a financial services company in London. I was closing down my IDE for the weekend when my boss sauntered up to my desk, grinning awkwardly. He was carrying an IBM ThinkPad, a BlackBerry, and the unmistakable burden of a man who needed to call in a favour.
“Oh, hey Stu, so the guy who was meant to be on-call this weekend has just gone home with food poisoning. I don’t suppose you’d…?”
“Erm, well yeah, sure, I can do that I guess… oh wait, I’ve only just moved to London, and my flat doesn’t have broadband yet. That… and I’m not sure how much use I’d be, given I’ve only got 5 days’ experience on our systems…”
“Oh, not to worry, you’ll be fine! This weekend looks like it’s going to be quiet, so you probably won’t be called, and about that broadband…”
He shuffled the laptop and phone onto my desk while fumbling around in his back pocket, from which he conjured a 3G network card.
And with that, I was on-call.
“It’s going to be quiet, so you probably won’t be called”
My boss was correct about the weekend; it was quiet. Quiet until 02:58 on Saturday morning, when the unfamiliar ringtone of the BlackBerry woke me in a panic. The alien voice of an as-yet-unmet colleague apologetically informed me that there were some unresolved issues following scheduled Friday night downtime, and they needed help to bring systems back online.
Though I was too preoccupied to realise it at the time, the distress I felt in that moment emerged from deep within the same ancient, reptilian part of my brain that kicked into action thousands of years ago inside the skulls of early Homo sapiens as they fled for their lives across the savannah, with sabre-toothed tigers in hot pursuit. The alarm of my BlackBerry’s ringtone was met by my brain’s own alarm system, the amygdala. Detecting the situation as threatening, my amygdala signalled the hypothalamus, which activated my sympathetic nervous system and caused my adrenal glands to release adrenaline (or epinephrine if you’re American). This neurological cascade almost instantly triggered my physiological symptoms: increased heart rate and blood pressure, dilated pupils, sharpened senses and slowed digestion, accompanied by nausea, all without so much as a “meow” from a sabre-toothed tiger.
This excitement in the reptilian part of my brain also had the effect of dampening activity within the modern, ‘executive’ part of my brain, the prefrontal cortex. The prefrontal cortex handles rational decision making, analysis, and weighing up pros, cons and trade-offs: exactly the kind of functions that come in handy during production IT incidents.
In the following minutes and hours, a hormonal chain reaction governed by the hypothalamus, pituitary, and adrenal glands resulted in the release of cortisol, a hormone that raises blood sugar and produced a feeling of wired wakefulness, despite the ungodly hour. The cortisol further reduced activity in my prefrontal cortex and impaired my working memory, just as I needed to be on top of my incident diagnosis and resolution game.

Monday
From the 02:58 phone call on Saturday onwards, I worked on the production incident for the next 48 hours without fully resolving it. As a recently hired employee, I was unprepared in several ways:
- I had general technical expertise but had barely begun to explore the specifics of my employer’s domain.
- I knew my immediate teammates but had yet to meet most colleagues.
- Being new to the company, I could only guess at the severity or criticality of observed problems.
- I only had a passing familiarity with the conventions of the incident response process.
Importantly, I also had limited awareness of the organisation’s culture, and of how the people who mattered might view a long weekend outage.
As the weekend progressed, I managed to pull in help from other colleagues, and by the early hours of Monday morning we’d stabilised the production environment in a degraded but acceptable state.
Throughout the weekend, my performance steadily declined through sleep deprivation. My attention and focus dropped dramatically, my prefrontal cortex lost much of its ability to regulate my emotions, and I became… let’s say grumpy!
By Sunday evening, paranoia had crept in, and I became convinced that I was going to be fired on Monday morning. My limited experience did nothing to persuade me otherwise.
When I arrived in the office on Monday, I was met by my boss. Fearing the worst, I apologised for not being able to help more effectively over the weekend. His response astonished me. He thanked me profusely for my efforts, told me I should be proud of my performance, shook my hand, and told me to take a day or two off to rest.
That response went a long way towards making me comfortable about being on call in the future.
The psychological reality of incident response
Of course, this was around 20 years ago, and we’ve learned a lot about ‘on-call hygiene’ and effective incident response since then. However far we’ve come, it’s worth remembering that incidents can be stressful, especially for those lacking experience or with short tenures in an organisation. Stress is not universally bad: a little stress can (in some people) enhance focus. But incidents typically generate plenty of stress simply by being time-critical, high-pressure, uncertain and ambiguous, so much of our energy is wisely directed towards bringing stress down.
Uptime Labs exists to help organisations and individuals manage the challenge of incident response, and much of that challenge is psychological, in addition to the more visible technological and social ones. It’s important to understand that everyone responds differently to pressure and stress. There is no single set of signals to observe, and no single set of interventions to address it. People are different.
We’ll talk in more depth in the future about the psychology and neuroscience of incident response, but in the meantime, here are a few things you might like to try as you endeavour to build confident incident responders.
Nurture a culture of blamelessness and psychological safety
While my boss’s eagerness to put me on-call as an unprepared newbie might have been questionable, his response on Monday was a great practical introduction to the blameless culture that I became grateful for. The goal here is not to avoid uncovering problems or the source of problems; rather, the goal is to nurture a culture of safety where folks can discuss things openly without fear of reprisal.
Tune alerting with care
Noisy alerting, where false alarms outnumber genuine signals, can cause frequent cortisol spikes. Over time this can lead to chronic stress, or to a mistrust of alerting that results in real issues being missed (the ‘boy who cried wolf’ problem). Tuning alerting is extremely context-specific, but consider its psychological effects while optimising your approach.
Reduce on-call rotation length
Consider your on-call rotation length. Finding a sweet spot can be tricky, but being on call means living in a state of heightened anticipation, which can become chronically stressful. Every organisation will be different, depending on its size, skills, and how often responders are called out, but shorter on-call periods are generally better than longer ones.
Ensure you have handover conventions for long incidents
Humans can typically maintain sustained attention for only 20 to 50 minutes at a time. Incidents are ‘full-on’ experiences that demand a level of focus that cannot be maintained indefinitely. Ensure that responders do not work continuously over extended periods without a handover. This practice also requires that the current state of the incident be communicated effectively to a relief team. How might you do that?
Practice together, under pressure
Guess what? Practice works! The worst time for your first experience of collaborating on an incident is in production, for real. When the stress of a real incident hits, the dampening of your prefrontal cortex may make it difficult to think clearly. Ideally, you’ll want to fall back on learned behaviours or muscle memory for much of the incident mechanics, leaving your remaining cognitive capacity for the hard work of diagnosis, coordination and recovery. Uptime Labs exists precisely for this reason: to let individuals and teams practise incident response in a realistic but risk-free environment, so they’re prepared for the unexpected.
There’s plenty more to say on this topic. We’ve barely scratched the surface. We’ll be writing more about managing the stress of incidents, but for now, remember that your incident responders are remarkable. Now go take a rest.





