A page in which to capture one telling of the story I'm trying to tell.
This page is a 🌿 Budding Essay.
Something is going on in the software industry that I want to explain, and it comes with a warning. But to get there I need to take you on a bit of a journey through some insights from Resilience Engineering.
There's a property of nuclear reactors that featured in the partial meltdown at Three Mile Island. It's called going solid.
Some nuclear reactors include a pressurizer which acts as a kind of shock absorber for sudden increases in pressure. The pressurizer contains some water and some steam. The steam can compress when pressures increase. By contrast, water barely compresses at all. If the pressurizer becomes completely filled with water, it is said to have "gone solid" and the system becomes unable to absorb increases in pressure.
The engineers at Three Mile Island were fixated on the risk that the pressurizer might go solid. In their fixation, reinforced by incorrect signals in the instrumentation, the engineers took decisions that contributed to the partial meltdown of the reactor core. The nuclear power industry never recovered in the United States.
It turns out their fixation was itself a consequence of effective training in the US Navy. All of the engineers on duty at Three Mile Island had operated nuclear reactors on board Navy submarines. That training regarded preventing the pressurizer from going solid as the single most important priority.
This jargon of "going solid" was applied to the analysis of hospitals in a bed crunch. See Saturation & Going Solid.
> We have observed situations where an entire hospital is saturated with work, creating a system wide bed crunch. The result is a dramatic change in workplace operational characteristics that creates new demands and increases the stakes for practitioner decisions and actions. ...
>
> "Going solid" in hospitals creates problems that are similar in many respects to those that accompany going solid in nuclear power plants. In particular, "going solid" in a hospital creates a series of critical relationships, tightly coupling the units of the hospital together so that events in one place have direct implications for the operations of all the others.
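To make the coupling concrete, here is a toy sketch of my own, not taken from the paper quoted above. The names `Unit` and `admit` and all of the numbers are invented for illustration. It assumes that a unit with no free beds passes new arrivals on to the next unit. While there is slack, a surge is absorbed where it lands; once slack is nearly gone, the same surge touches every unit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Unit:
    """A hypothetical hospital unit with a fixed number of beds."""
    name: str
    capacity: int
    occupied: int

    @property
    def slack(self) -> int:
        # Free beds: the "steam" that lets the unit absorb a surge.
        return self.capacity - self.occupied


def admit(units: list[Unit], start: int, arrivals: int) -> tuple[list[str], int]:
    """Admit arrivals at units[start], spilling overflow to later units.

    Returns the names of the units whose occupancy changed and the
    number of arrivals that could not be placed anywhere.
    """
    touched: list[str] = []
    remaining = arrivals
    for offset in range(len(units)):
        if remaining == 0:
            break
        unit = units[(start + offset) % len(units)]
        taken = min(unit.slack, remaining)
        if taken > 0:
            unit.occupied += taken
            remaining -= taken
            touched.append(unit.name)
    return touched, remaining


# With slack, three arrivals stay inside the unit where they land.
roomy = [Unit("ER", 10, 6), Unit("ICU", 8, 5), Unit("Ward", 20, 12)]
print(admit(roomy, 0, 3))   # (['ER'], 0)

# Near saturation, the same three arrivals involve every unit.
tight = [Unit("ER", 10, 9), Unit("ICU", 8, 7), Unit("Ward", 20, 19)]
print(admit(tight, 0, 3))   # (['ER', 'ICU', 'Ward'], 0)
```

The model leaves out everything that makes real hospitals hard, but it shows the shape of the problem: the event is identical in both runs, and only the amount of slack determines how far it travels.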
How can we apply this pattern of going solid to software?
In the past few years, nearly all of the biggest players in the software industry have laid off significant parts of their workforce. The consequences of this loss of expertise can go unnoticed for years, hidden by perverse incentives that emerge in systems approaching overload.
These problems are entangled with other changes over the past decade. We have been slowly connecting all of our businesses through rapid adoption of software-as-a-service and cloud computing. We have created a series of critical relationships, tightly coupling our businesses together so that events in one can cascade to others.
The largest-scale example of this tight coupling was revealed in the global outage of July 2024, which arose from the entanglement between CrowdStrike, Microsoft, and an extraordinary number of their mutual customers.
We can safely predict that other failures are likely, though we cannot predict where or how they will present.
Resilience Engineering has suggestions for what to do when you can anticipate turbulent times. See The Strategic Agility Gap.
> Organizational systems succeed despite the basic limits of plans in a complex, interdependent and changing environment because responsible people adapt to make the system work despite its design—SNAFU catching. The ingredients are:
>
> * anticipation—seeing developing signs of trouble ahead to begin to adapt before the evidence is definitive (waiting till evidence is definitive almost guarantees being slow and stale);
> * contingent synchronization—adjusting how different roles at different levels coordinate their activities to keep pace with tempo of events;
> * readiness to respond—developing deployable and mobilizable response capabilities in advance of surprises;
> * proactive learning—learning about brittleness and sources of resilient performance before major collapses or accidents occur by studying how surprises are caught and resolved.
# Four Capabilities for Continuous Adaptation
balance expression of initiative as pressures wax and wane... prioritize some goals, sacrifice others when conflicts across goals intensify.
build reciprocity across roles and levels.
learn to guide adaptability through tangible experience of surprise. learn to recognize SNAFU... Situation Normal. Surprises reveal how re-prioritization emerges amid goal conflicts.
proactive study of existing resilience. Specifically, invest in learning from the small surprises, especially by studying sets of surprises that reveal resilient performance: the SNAFU catching. Invest also in lightweight mechanisms to spread the learning about SNAFU catching across roles and levels.