
Today, I’d like to share a valuable lesson about resilience in software. A lesson that was taught to me by a eucalyptus tree.
For the avoidance of doubt, the eucalyptus tree, like many of the best teachers, demonstrated a bias towards showing rather than telling, but its lesson was all the more powerful for it.
I bought the tree as a young sapling. Thin, perhaps 3 feet tall, somewhat scrawny but full of potential. I planted it in a large pot in the garden: large enough to accommodate plenty of growth.
As it was a young tree, I considered it important to provide some support for its spindly trunk, especially given the commonly blustery conditions in my Yorkshire garden. So I added some stakes, providing robust rigidity as it grew.
The tree grew quickly, and over the course of 6 months, reached around 8 feet in height. As it grew, it became more susceptible to damage by the wind, so I added more support in the form of longer and thicker stakes. I felt quietly pleased with myself.
It continued to grow strongly - until an autumn storm brought fierce winds. The supporting stakes snapped, and the tree flopped pathetically, folding halfway up its weedy trunk and eventually slumping, roots exposed next to its pot and a pile of earth. I then felt loudly disappointed with myself.

The Eucalyptus tree - still alive, barely.
Turns out, in my pursuit of a ‘robust’ tree, I’d inadvertently (some would say stupidly) stunted the tree’s ‘resilience’. The stakes that I deployed to help support the tree were indeed robust. But they also shared an attribute common to many robust systems: they were brittle. They provided plenty of support until a threshold was reached, at which point they failed suddenly and catastrophically.
Resilient systems, on the other hand, adapt more flexibly as they approach a threshold. The bending of a tree in the wind is actually necessary to prevent it from failing. Bending not only redistributes the force of the wind; it is also picked up by specialised mechanoreceptors that detect physical stress. These sensors trigger a cascade of biochemical responses - resulting in a thicker trunk, deeper roots and stronger wood.
So the development of strength in trees is an adaptive response to mechanical stress. The process has a name, which is too cool not to share: it’s called thigmomorphogenesis.
So what did the eucalyptus teach me about resilience in software?
This horticultural lesson underlines the point that resilience is a process. A process that can be nurtured and, equally, can be impeded.
In my case, I traded the opportunity to develop resilience for robustness. While this appeared (at least to my inexperienced tree-nurturing eyes) successful for a while, it led to sudden, brittle failure.
Organic, adaptive processes in biology have benefited from millions of years of evolution. Yet, software’s lineage is measurable in mere decades. Many of the techniques that we commonly associate with resilience, such as:
- redundancy
- failover
- autoscaling
…are probably more accurately described as robustness rather than resilience, in that they’re designed to cater for known scenarios rather than supporting adaptation to novel situations. As such, while they’re indispensable, they may not be sufficient.
However, software systems are socio-technical systems (the melding of humans and technology). The technical part might not have evolved adaptive capacity; yet, humans have been around long enough to learn a thing or two about that. It’s the humans in the system that are the source of resilience and potentially (depending on the choices one makes) its destroyers.
Just as I, a broken human with a broken eucalyptus tree, made choices that impeded its resilience, so too can I make choices that nurture or support resilience in software and organisations.
Examples of What to Avoid When Nurturing Resilience
Let's take a look at some examples (think of these as analogous to my over-staking of the tree):
Over-automation
While automation is great, if it isn’t paired with feedback mechanisms that improve the system, vital signals will be lost, resulting in a brittle system.
Overly rigid control structures
While control (such as release processes) is important, overly rigid structures can reduce the frequency of feedback associated with making changes, reducing the likelihood that adaptation will occur based on these signals.
Shallow or narrow resourcing (no slack)
If teams are utilised to 100% capacity, there will be little capacity to adapt, and brittle failure is likely.
KPIs that incentivise a reduction in reporting of incidents
Incidents are signals. Listened to, they can lead to adaptation; ignored or left unreported, they cannot.
Suppressing weak signals of strain
Frequently, the effort invested in learning from incidents is proportional to the severity of the incident. Yet minor incidents and near misses also have the potential to trigger adaptation, provided they are noticed.
Strict metrics and KPIs that undervalue adaptation
KPIs are valuable, but over-incentivisation can lead to missed opportunities for adaptation if signals fail to emerge in a KPI.
Over-codification of best practices
Best practices without context are meaningless, and context is frequently shifting. Avoid prioritising adherence to best practices over adaptation.
Over-reliance on AI
We’re all punch-drunk on AI and it’s marvellous, but we need to ensure that we’re not short-circuiting human adaptive capacity by overly delegating to AI.
One thing that such practices have in common is that they impede the feedback loop between a signal and an opportunity to adapt to that signal. By over-staking my tree, I prevented the tree from receiving the signals associated with mechanical strain, therefore depriving it of the opportunity to adapt.
Tactics for Organically Growing Resilience
So, beyond practices to avoid, what can we do to cultivate resilience? Here are some things to consider:
Learning from incidents
Invest time and energy in learning about what actually happened during incidents, so that adaptations can be made to strengthen the system.
Continuous delivery
Deliver frequently in small batch sizes, regularly exposing the system to focused change, giving it the opportunity to adapt rapidly.
Pair/Mob programming
Pairing and mobbing are the ultimate real-time feedback loop: peers provide feedback during the creation process itself.
SLOs and error budgets
Error budgets acknowledge that errors are unavoidable. Rather than trying to eliminate them, error budgets raise signals when error rates reach agreed thresholds, creating an opportunity for adaptation.
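To make the idea of an agreed threshold concrete, here is a minimal sketch of how an error budget might be turned into a signal. It is an illustration rather than a prescribed implementation: the `ErrorBudget` class, the `budget_signal` function, and the 50% warning threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""
    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that failed in the same window

    @property
    def allowed_failures(self) -> float:
        # The budget: how many failures the SLO tolerates in this window.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Share of the budget already spent (1.0 means fully spent).
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / self.allowed_failures


def budget_signal(budget: ErrorBudget, warn_at: float = 0.5) -> str:
    """Turn budget consumption into a signal the team can act on.

    The thresholds are assumptions for illustration; in practice they
    are whatever the team has agreed on.
    """
    consumed = budget.consumed_fraction
    if consumed >= 1.0:
        return "exhausted"  # e.g. pause risky changes, prioritise reliability
    if consumed >= warn_at:
        return "warning"    # e.g. review recent incidents and near misses
    return "ok"


if __name__ == "__main__":
    window = ErrorBudget(slo_target=0.999, total_requests=1_000_000,
                         failed_requests=600)
    print(budget_signal(window))
```

The "warning" state is the interesting one: it surfaces strain while there is still room to adapt, rather than waiting for the budget to run out.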
Practising incident response
Game days and simulations are valuable methods of exposing teams to a variety of scenarios that can trigger reflection and adaptation. Incidents are fertile ground for adaptation, and the more you can become comfortable with dealing with situations of surprise and ambiguity, the more your resilience will increase.
Beyond these examples, the first thing is to ask yourself whether your practices impede or nurture adaptation. It's important to state that the answer is not an either/or. It’s a balance, and like any balance, it’s constantly shifting and requires continual attention. That’s another reason why resilience isn’t a fixed state; it’s a continual practice.
Note: This post was highly influenced by Dr Richard Cook’s amazing talk: The Bone Talk: Resilience and resilience engineering, in which he describes bone as the archetype of resilience.



