
Nearly 100 years ago, Edwin Link, a piano maker's son from Binghamton, New York, invented the first flight simulator, known as the "Link Trainer" or "Blue Box." Flight training was expensive and dangerous, and he saw an opportunity to let pilots practise basic flight manoeuvres safely on the ground.
The device was remarkably simple by today's standards: a small aircraft fuselage mounted on a universal joint that could pitch, roll, and yaw. It had functional cockpit instruments (including glowing, radioactive radium dials) and allowed pilots to practise instrument flying without leaving the ground. Link initially struggled to sell the concept; the military and airlines were sceptical.
In 1934, after a series of fatal crashes during U.S. Air Mail Service operations killed several pilots, the military realised that many pilots lacked instrument flying skills. The U.S. Army Air Corps purchased six Link Trainers that year. By 1939, as WWII approached, they had ordered over 10,000 units. This rudimentary simulator proved decisive in WWII, with RAF Chief of Staff Air Marshal Robert Leckie quoted as saying, "The Luftwaffe met its Waterloo on all the training fields of the free world where there was a battery of Link Trainers".

A Link Trainer
Aviation wasn't alone
Other industries faced the same challenge: when the stakes are high and safety is critical, learning on the job is insufficient. Nuclear power operators began training in control room simulators; surgeons started practising procedures on high-fidelity mannequins; bridge simulator training became mandatory for maritime officers following the Exxon Valdez disaster; and astronauts trained in simulators from as early as Mercury and Apollo.
Today, no commercial pilot flies passengers without hundreds of simulated hours. No surgeon operates without practising on simulators. No nuclear plant operator touches a reactor without extensive control room simulation. No ship's officer navigates without bridge simulator certification. In most industries where the cost of failure is high, practice through simulation is foundational, and it is mandated through regulation.
A recent study by New Relic estimates that the average cost of a major IT outage is $2 million per hour, hardly a low cost of failure. Yet most IT teams face their first major incident the same way pre-Link-era pilots faced their first engine failure: in production, unrehearsed and underprepared.
What do other industries know about simulation that IT has yet to learn?
A common challenge to major investment in IT resilience is the claim that, compared to aviation, nuclear power, or the operating theatre, the stakes are lower, so IT is less ‘safety critical’. Listen to how Resilience Engineering pioneer Dr Richard Cook addressed this reasoning in 2012.
What's happening here is the lifeblood of commerce, the core of the economic system we now experience. Do you think that’s unimportant? The key question is whether healthcare's importance matches that of web operations—not the reverse.
While Dr Cook’s point was not about simulation, he noted that technology failure has a high impact, even without the obvious visceral associations of medical failure. He also recognised that practitioners across domains face similar challenges in emergencies: “People struggle to do a good job in hyper-complex, conflicted situations, and that’s a tough place to be. People lose sleep over that in all these worlds.”
What does simulation provide versus other ways of training?
Safety-critical industries also employ other methods of training in addition to simulation. Classroom teaching, e-learning and apprenticeship, among other methods, are all used and are all valuable. These methods are particularly suited to the development of theory, knowledge and skills. Simulation, on the other hand, is especially effective at nurturing expertise. To understand simulation's unique value, we must first distinguish between knowledge, skills, and expertise.
Knowledge is what we understand — the concepts, models, and facts that explain how a system is meant to work and what “good” looks like. Skills are what we can do — the repeatable techniques and procedures we can perform reliably, such as following a runbook, using tooling, or coordinating tasks. Expertise is different: it’s the ability to apply knowledge and skills well when conditions are messy, time-pressured, and unfamiliar. It shows up in judgement, prioritisation, and adaptation, noticing what matters, anticipating second-order effects, and reshaping the plan as the situation evolves. Knowledge and skills travel well in stable conditions; expertise is what holds up when the situation doesn’t.
As such, simulation is well suited to the development of expertise in dealing with crisis situations. Learning to fly an aircraft straight and level is already expensive and dangerous; learning to land in a crosswind with a double engine failure is a different level altogether. Simulation not only allows practitioners to experience such scenarios without the obvious danger (or cost), it also exposes them to uncertainty and ambiguity, which itself acclimatises people to dealing with the unknown.
Here are some other areas where simulation excels.
Deliberate Practice of Rare but Critical Events
Expertise develops over time through repeated exposure to situations, allowing pattern recognition and schema development. In complex systems, however, many critical events (major outages, cascading failures, security breaches) are rare. You can work in IT for years without encountering specific scenarios (analogous to an aircraft engine failure). Simulation allows practitioners to experience a range of unusual events that might otherwise take lifetimes to encounter.
Managing Cognitive Load Under Stress
During incidents, stress can overload and degrade working memory, making knowledge retrieval harder. Simulation can be tuned to expose practitioners to varying degrees of stress with different cognitive demands. Experience under such conditions enables the practical development of strategies to manage cognitive load.
Developing Joint Cognitive Systems
Modern IT incident response is fundamentally a joint cognitive system. The "intelligence" doesn't reside in individuals but in the coordination between them and their interaction with technology and artefacts. Yet most training focuses on individual knowledge. Simulation allows teams to practise together in safety. Airline crews, for example, practise Crew Resource Management (CRM), which focuses on communication, leadership, teamwork and workload management.
Safe Exploration of the Problem Space
Experts develop deep mental models by exploring the boundaries of systems, understanding what breaks them, where the edge cases live and what happens when rules are violated. Simulation offers a safe environment to explore system boundaries without exposure to existential risk.
Accelerated Feedback Loops
Learning requires tight feedback loops: you do something, you see results, you adjust. In real IT operations, feedback is often delayed or ambiguous. As well as offering more frequent exposure to challenging scenarios, simulation can provide faster feedback, which speeds up learning from them.
Repeatability
There is no such thing as a repeat incident, but simulation does allow for the same scenario to be experienced multiple times. This allows a single individual to try multiple approaches to the same problem, and it offers the opportunity for multiple individuals to separately experience the same scenario and to compare and contrast the approaches. Imagine being able to expose multiple teams to the same incident scenario. What might you learn?
Building Metacognitive Skills (Thinking About Thinking)
Experts aren't just technically skilled; they monitor their own cognitive processes, recognise when they're confused, know when to slow down, and adapt their strategies. These metacognitive skills are invisible in classroom training. Simulation (and reflection upon simulated scenarios) provides:
- Recognition of confusion: "I thought I understood this, but under pressure I'm lost"
- Bias awareness: Discover your own confirmation bias, anchoring, or tunnel vision
- Strategy adaptation: Learn when your default approach isn't working
- Calibration: Develop accurate confidence in your own judgments
Perceptual Learning and Situation Awareness
Experts develop perceptual expertise: they see situations differently than novices. An experienced DBA glances at database metrics and immediately recognises a pattern that suggests a specific issue. This perceptual learning requires extensive exposure. Simulation provides:
- Cue recognition training: Experience what signals matter
- Pattern exposure: See the same underlying problem manifest in different ways
- Attention direction: Learn where to look, what to monitor
The Temporal Nature of Incidents
Expertise development requires experiencing the full temporal arc of events: how small problems grow, how cascading failures unfold, how recovery takes time. Incident simulation exposes practitioners to this temporal arc, and can also facilitate:
- Fast-forward: Compress hours of slow degradation into minutes
- Slow-motion: Expand split-second decisions into teachable moments
- Temporal pattern recognition: See how events unfold over different timescales
- Pacing practice: Learn to work deliberately under pressure without rushing
Developing Adaptive Expertise vs. Routine Expertise
Runbooks and playbooks outline routine steps to follow during incidents. They are undoubtedly helpful, but should be seen as a floor rather than a ceiling for effective incident response. Real expertise manifests as the ability to adapt in real time to the situation rather than sticking rigidly to the runbook. Simulation is the perfect environment to learn this skill.
An Underexploited Opportunity in Incident Response
While it would be inaccurate to say that simulation is completely absent from the routine development of incident response expertise, it is fair to say that it is underexploited. Tabletop exercises and game days both employ simulation techniques to give practitioners valuable experience in low-risk settings. However, they tend to be expensive and disruptive to run, and are therefore used far less than simulation is in aviation, medicine and nuclear power.
Uptime Labs is the first platform to bring these benefits in a convenient, cost-effective form that makes simulation practical as a routine discipline, rather than an occasional event. By reducing the time, coordination and organisational disruption typically associated with running game days and tabletop exercises, it enables teams to train more frequently and more deliberately. The result is that simulation can move from being an exceptional activity to a standard practice: embedded into operational strategy, repeated often enough to build genuine expertise, and scaled across teams without the usual overhead.




