Organisations are complex socio-technical systems. Two words in that phrase carry fundamental implications for AI adoption strategy: socio-technical and system.
The former means people are part of the system. The latter means a functioning whole: a set of parts that only produces its result together, not individually. You can have the best individual components and still end up with a dysfunctional system.
I'm an early adopter by nature. As an engineer, I love using new ideas to do more good, faster. But this time around, something feels off. I'm getting a lot more done, yet I can't see proportional productivity in terms of economic value. I'm more stressed, I work longer hours, I pay more (AI tokens) and yet the output doesn't reflect any of it.
Then, in what felt like a coincidence, two separate talks gave me an explanation.
The first was by Adam Bender, who applies a systems thinking lens to AI adoption in software engineering. The second was by Daron Acemoglu, an economist who uses Weak Link Theory to explain the economic impact of technological revolutions. After watching both, I realised they were saying the same thing: your productivity is limited by the weakest link in the system.
Building and operating software is a function of a socio-technical system, which is itself a subsystem of a bigger whole called an organisation. If you map out how software is actually created and operated in your organisation, my bet is you'll end up with a complex, interconnected system that no single person fully understands. And yet, somehow, it works. Here's my own list at Uptime Labs:
- Mission of Uptime Labs: Why we do what we do
- Engineering principles: influencing product, architectural, and technical choices
- Understanding of all the technical decisions made so far
- Product requirements and user research
- Repository structure
- Quality gates
- Workflows
- Development environment and testing strategy
- Continuous deployment, post-deployment tests and rollback mechanisms
- Writing code and code reviews
- Communicating changes to customers
- Observability
- Incident response process and tooling
That list is already intimidating, and we're a startup. I can't even list everything, let alone map all the relationships between the parts.
Unless we can claim that AI eliminates the need for human input in every single one of those components, we will always end up with a socio-technical system that involves humans. Now assume AI can automate and scale all the non-human parts. Humans immediately become the weak link. At least three things are likely to happen:
- People will feel unprecedented pressure
The system accelerates around them to the point of breaking, i.e. a jet engine bolted onto a tractor. You can ignite it, but the tractor wasn't built for that.
- The value of the weak link rises dramatically
Engineers who address the bottleneck unlock value across the whole system. Contrary to much of the public discourse, those skills become more valuable, not less.
- A lot of money gets wasted
We pay to generate 10x more code because we can. But if the system's output doesn't increase, and may actually decrease due to pressure on the weak link, we've invested heavily in one part while starving the rest.
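To make the weak-link argument concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which the system's output is capped by its slowest stage. The stage names and capacity numbers are hypothetical, chosen only for illustration, and are not measurements from Uptime Labs or from either talk.

```python
from __future__ import annotations


def system_throughput(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the throughput it allows.

    Simplifying assumption: every change passes through every stage,
    so the whole system moves at the pace of its slowest stage.
    """
    bottleneck = min(capacities, key=capacities.get)
    return bottleneck, capacities[bottleneck]


# Hypothetical capacities, in changes per week.
before = {
    "writing_code": 20,
    "code_review": 15,
    "deploy_and_observe": 12,
    "incident_response": 10,
}

# Assume AI makes code generation 10x faster and changes nothing else.
after = {**before, "writing_code": before["writing_code"] * 10}

print(system_throughput(before))
print(system_throughput(after))
```

With these made-up numbers, both calls report the same bottleneck and the same throughput: the 10x investment in one stage leaves the system's output unchanged. A real organisation is not a simple pipeline, so treat this as an intuition aid rather than a model.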
A few examples related to other parts of the system:
Take the principle of releasing small pieces of code, noticing a problem and rolling back quickly. What happens when you're releasing far more code, far faster, without time to observe the impact in production? Some problems take hours to surface.
Or consider code reviews; the cognitive load scales with the volume of code.
Or shared system understanding, which is critical to effective incident response. If nobody fully understands what was just deployed, how does the team reason their way through an outage?
This is where the left-over principle becomes important. It points to the skills that remain uniquely human once AI handles everything else: sense-making, coordination, communication under pressure. When an incident hits, what determines whether it takes 20 minutes or 4 hours to resolve isn't usually the tooling. It's whether the people involved can orient quickly, communicate clearly, and make decisions in uncertainty. Those skills aren't a nice-to-have. As AI raises the ceiling on everything else, they become the rate-limiting factor.
My point isn't to slow down AI adoption. It's to think about the whole system and not waste money investing heavily in one part while the real bottleneck goes unaddressed. The examples I mentioned here could all be opportunities for innovation. Daron Acemoglu argues that at best, we're decades away from scaling all parts of the software engineering system. And even if the technology were ready tomorrow, change takes time to propagate through a global ecosystem.
At Uptime Labs, we think carefully about what AI adoption means for operations and incident response: not just the technology itself, but the evolving relationship between humans, AI agents, software, and customers. Which skills become more critical as AI handles more of the rest? Operations and incident response are just one part of the picture, but they're a revealing one.
If you've made it this far, you deserve the biggest prize of this post: 4 golden minutes from Dr Russell Ackoff:
“The system is a product of the interaction of its parts”
I’d genuinely love to know what you think.
This is a critical moment for software engineering and IT operations. As practitioners, we have both the motivation and the duty to figure this out. Because it's quite literally affecting our lives.



