Check out the episode:
You can find the shownotes through this link.
Are you interested in how to create antifragile systems?
Our summary today works with the articles titled An Introduction to Residuality Theory: Software Design Heuristics for Complex Systems and Residuality Theory, Random Simulation, and Attractor Networks from 2020 and 2022, by Barry M. O’Reilly, presented at the International Workshop on Computational Antifragility and Antifragile Engineering.
This is a great preparation for our next interview with Barry O’Reilly in episode 360, talking about how residuality theory can be used for the future of cities.
Since we are investigating the future of cities, I thought it would be interesting to see what Residuality Theory is and its connection to antifragility. These articles present Residuality Theory, a novel approach to software design, and propose that complex software systems inevitably encounter unprecedented stressors, so it is best to design systems that can handle even those.
[intro music]
Welcome to today’s What is The Future For Cities podcast and its Research episode; my name is Fanni, and today we will introduce a research paper by summarising it. The episode really is just a short summary of the original investigation, and, in case it is interesting enough, I would encourage everyone to check out the whole documentation. This conversation was produced and generated with Notebook LM as two hosts dissecting the whole research.
[music]
Speaker 1: Have you ever been there? You’re relying on some software, maybe it’s mission critical for your business, and it just stops. It breaks down, not necessarily because you clicked the wrong thing, though, yeah, that’s one kind of stressor we could talk about, but because it hits something totally unexpected out in the real world. Or maybe you’ve been on the other side, on a dev team. Despite all the planning, the agile sprints, the project hits the rocks because the system just can’t cope with the unknown. It’s frustrating, it costs money, and it happens way too often. So today we’re gonna dig into something pretty groundbreaking, actually. It aims to turn that frustration into something stronger: residuality theory. It’s this really innovative approach to designing software systems that don’t just withstand unexpected stress, but can potentially learn from it, maybe even get stronger. Our mission today is to unpack this residuality theory for you. We want to get a grip on the core ideas and really see how it offers a concrete and actionable way to build software that can thrive when things get messy and unpredictable. We’re trying to move past just saying we need resilience into something you can actually build and maybe measure. Okay, so let’s unpack this a bit. For decades, the software engineering field has been all about features, functionality. We build things to do specific stuff. What’s the hidden cost there? What gets left behind when we’re chasing those shiny new features?
Speaker 2: Oh, it’s a huge blind spot. You look at traditional methods, they often treat system behaviour, what we sometimes call non-functional requirements, almost as an afterthought.
Speaker 1: Right? Something you bolt on later or fix when it breaks.
Speaker 2: Exactly. It’s something you deal with after it’s in production, usually after you start seeing problems. And it’s precisely this complex behaviour, especially how the system reacts to the unexpected stuff in its environment, that’s so often behind project failures.
Speaker 1: So it’s not the feature itself failing, but how the whole system behaves when the environment throws a curveball at it. Does this problem extend down into the tools and paradigms we use?
Speaker 2: It really does. The very tools we use are often tied directly to the tech itself. Even things like the agile movement, which is great for embracing some kinds of change. It often still produces what the theory calls a naive architecture.
Speaker 1: Naive? How so?
Speaker 2: Naive because any approach that fundamentally relies on predicting change in these really complex environments is doomed from the start. Predicting the truly unknown is, by definition, impossible.
Speaker 1: Okay, that makes sense. I’ve also seen this term platonic folding come up in relation to this. It sounds a bit abstract but important. Can you break that down?
Speaker 2: Platonic folding is this tendency we have to simplify complex situations prematurely. Imagine trying to understand, say, a whole city just by looking at blueprints of individual buildings.
Speaker 1: you’d miss all the interactions, the traffic,
Speaker 2: Precisely. You’d miss the power grid, the water system, the flows. In software, we split systems into smaller parts, often assuming the interactions between them are simple or negligible. It gives this illusion of speed and efficiency during development. This oversimplification, this folding, can lead directly to what the papers call difficult errors and black swans, those big, unforeseen, high-impact events, because your simplified model just didn’t account for the messy reality.
Speaker 1: Wow. Okay. That paints a pretty stark picture for traditional design approaches. If predicting is impossible and simplifying is dangerous, what’s the big shift? What’s the radical idea residuality theory brings?
Speaker 2: The radical idea is fundamentally a change in perspective. Instead of starting with what you want the system to do or how you think it should be structured, residuality theory says, hang on, let’s start by understanding what remains. When this thing is put under stress, that remaining part, that’s what’s called a residue.
Speaker 1: Okay, so residuality is the property of having these residues when stress hits. The idea is something will remain. And a residue isn’t just code, right? The papers describe it as this collection of components, infrastructure, people, information flows. That feels like turning design thinking completely inside out. Why is that so powerful? What does starting there actually achieve?
Speaker 2: It’s incredibly powerful for a few reasons. First, it immediately decouples your design thinking from the underlying technology. If you start by defining residues, what needs to survive, your initial analysis is completely independent of whether you’re gonna use Java or Python or a specific database.
Speaker 1: So you’re not locked into tech choices right at the start.
Speaker 2: Exactly, which is huge for modern cloud systems where you’re often mixing paradigms anyway. You get genuine flexibility baked in from day one. And maybe more importantly, this approach forces you to see architecture not just as a static 2D component diagram, but as a dynamic, multidimensional thing. It makes you consider the business context, the software itself and the infrastructure altogether, holistically. It breaks down those silos that often prevent us from seeing the whole picture of how a system behaves under real-world stress.
Speaker 1: The classic disconnect between dev ops and the business side.
Speaker 2: Precisely, and there’s a critical insight here from complexity theory. These residues aren’t just parts of the same system. They’re actually different linked systems interacting with each other. A lot of IT failures happen because we use reductionism. We study one part, assuming everything else stays static. Residuality theory forces you to see it as multiple interacting systems, the residues, right from the start.
Speaker 1: So instead of finding out what these residues are, when something blows up in production, which is expensive and stressful,
Speaker 2: yes
Speaker 1: we can actually analyse for them upfront, proactively.
Speaker 2: That’s the goal. It’s just not sensible, financially or organizationally, to wait for production failures to reveal your system’s true structure. Residuality theory provides a way to do this kind of shadow analysis during design, cheaply, easily, using what’s called stressor analysis. You’re essentially shining different lights on your design to see the hidden layers, the bits that stick around.
Speaker 1: That makes a lot of sense. Okay, so how does this connect to the bigger ideas we hear a lot about, like resilience and antifragility?
Speaker 2: Resilience in software? It’s often a bit vague, right? Generally it means the system can survive a disruption and get back to functioning. But residuality theory aims higher than just survival. It’s aiming for antifragility.
Speaker 1: Right, antifragility. That’s Nassim Nicholas Taleb’s concept: systems that don’t just resist stress, but actually learn from it, get stronger because of it.
Speaker 2: Exactly. So residuality theory isn’t just about designing for resilience. It provides a way to actually confirm that your proposed structure is resilient, or at least has the characteristics of resilience, during the design phase. Yeah, before you build anything. It inherently pushes you towards establishing those key properties needed for antifragility: things like modularity, having weak links that can fail safely, redundancy, diversity. The design process itself aims to build these in, to outline how the system will respond when failures happen, not just hope they won’t.
Speaker 1: Okay. This is fascinating. We’ve talked about the what and why of residues. Mm. But how does an architect actually do this? The theory brings in some pretty deep concepts, like hyperliminality.
Speaker 2: Hyperliminality basically describes the fundamental situation we’re in. You have an ordered system, your software, which exists inside a much larger disordered, dynamic, unpredictable system: the enterprise context, the market, the real world.
Speaker 1: Okay. The software is neat, the world is messy.
Speaker 2: Pretty much, and as an architect, you’re constantly moving between these two realities: the predictable code you can test, and this chaotic, ultimately unknowable environment it has to live in. Now, within this hyperliminal setup, you get something called hyperliminal coupling. Okay? This is like an invisible connection, an unknown dependency, between two parts of your software. Why is it invisible? Because they are both interacting with some common stressor out there in the messy environment, and because you often don’t know about that specific stressor until it actually happens, that coupling, that dependency, remains hidden from you during design until, you know, it’s too late.
Speaker 1: That raises the really big question then. If the environment is unpredictable and creates these invisible traps, how on earth do we design for that?
Speaker 2: And that is the core proposition of residuality theory. It argues that software engineering in these hyperliminal environments can be understood, and maybe even managed, as a combination of two things: first, random simulation of the environment, and second, analysis of the software’s architecture as a network.
Speaker 1: Okay, random simulation and network analysis. Let’s take the first one. How is that different from say, requirements gathering or risk management?
Speaker 2: Traditional methods try to reduce uncertainty. They focus on precise statements, known risks, maybe assigning probabilities. Residuality theory flips that. It says, look, stakeholders guessing about the future is already a kind of random simulation. Instead of trying to make it less random, let’s embrace the randomness. Let’s enhance it. How? By using what the papers call random nonsense stressors and impossible events.
Speaker 1: A random nonsense stressor, like: this server gets hit by lightning while simultaneously experiencing a DNS outage caused by squirrels?
Speaker 2: Maybe not that specific, but the idea is to move beyond just the probable stressors. It sounds counterintuitive, right? Why waste time on things that almost certainly won’t happen.
Speaker 1: Yeah, it does seem odd,
Speaker 2: But it’s crucial for tackling the curse of dimensionality. There are just too many possible interacting factors in a complex environment to predict them all. By throwing in these less probable, even seemingly nonsense stressors, you probe the system structure in ways you wouldn’t otherwise think of. You uncover those hidden hyperliminal couplings. Even a near-zero-probability event might reveal a critical structural weakness or pattern. It’s less about predicting the future mathematically and more about collecting rich narratives about how your system might behave under a wider range of strange conditions.
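As a toy illustration of that “enhance the randomness” idea, here is a minimal Python sketch. Everything in it, the stressor names and the one-to-three bundling, is invented for illustration, not taken from the papers; the point is simply that combinations are drawn without any weighting by probability, so even absurd co-occurrences get generated.

```python
import random

# Hypothetical stressor catalogue -- every name here is invented for illustration.
stressors = [
    "database outage", "traffic spike", "regulation change",
    "supplier bankruptcy", "network partition", "currency collapse",
]

def random_stressor_events(catalogue, n_events, seed=0):
    """Draw random stressor combinations, deliberately ignoring probability.

    Each event bundles one to three stressors, so even 'nonsense'
    co-occurrences with near-zero real-world likelihood get generated
    and can probe the design.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(n_events):
        k = rng.randint(1, 3)
        events.append(tuple(sorted(rng.sample(catalogue, k))))
    return events

for event in random_stressor_events(stressors, 5):
    print(event)
```

The fixed seed just makes a brainstorming session repeatable; in practice you would rerun with fresh seeds to keep widening the range of scenarios.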
Speaker 1: Okay, interesting. So you stress it in weird ways to see hidden connections. What about the second part? Network analysis?
Speaker 2: This means modelling your software not just as components, but as a network. A network of residues, and within those residues, networks of components and functions and flows. This ties into concepts like attractors from complexity science. These are the stable states your system tends to return to, even when perturbed. Your software structure being more ordered has fewer attractors than the wild environment it lives in.
Speaker 1: So the random stressors can still point to important stable or perhaps unstable patterns within the software itself.
Speaker 2: They help you map out these attractors, understand the system’s inherent behavioural patterns under stress. And this brings us to something called NKP analysis. This is where it gets really practical, I think. NKP: N is the number of nodes or components, K is the number of connections between them, and P is the bias of a node towards a particular outcome. NKP analysis gives you a way to actually start quantifying those traditionally vague concepts like loose coupling and cohesion.
Speaker 1: So you could actually measure coupling?
Speaker 2: The analysis provides metrics that relate directly to those concepts. It gives architects a much clearer, more measurable way to understand and engineer these desirable properties, providing a real blueprint for resilience before you even write code.
Speaker 1: This links to that idea of designing for the edge of chaos, that balance point
Speaker 2: Precisely. You’re using NKP analysis to tune the network structure. You don’t want it too rigid and ordered, which is brittle, nor too chaotic and interconnected, which is fragile, prone to contagion. You’re aiming for that sweet spot, the edge of chaos. A system there has stability, but it also has affordances: the ability to shift between different stable states, to adapt when things get seriously stressful. It’s robust, but also adaptable.
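The N, K and P parameters echo Kauffman-style random Boolean networks from complexity science. As a rough, assumed-for-illustration sketch (the papers’ exact construction may differ), the following Python builds such a network and iterates it until it falls into an attractor, the repeating cycle of states the system settles into:

```python
import random

def make_nkp_network(n, k, p, seed=0):
    """Build a random Boolean network: n nodes, k inputs per node, and a
    lookup table per node biased with probability p towards outputting 1.
    (A rough Kauffman-style sketch; the papers' construction may differ.)
    """
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    all_keys = [tuple((j >> b) & 1 for b in range(k)) for j in range(2 ** k)]
    tables = [{key: int(rng.random() < p) for key in all_keys} for _ in range(n)]
    return inputs, tables

def find_attractor(inputs, tables, state):
    """Synchronously update every node until a state repeats; the repeated
    segment of the trajectory is an attractor (a stable cycle)."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = tuple(
            tables[i][tuple(state[j] for j in inputs[i])]
            for i in range(len(state))
        )
    return trajectory[seen[state]:]

inputs, tables = make_nkp_network(n=6, k=2, p=0.5)
print(find_attractor(inputs, tables, (0, 1, 0, 1, 0, 1)))
```

Because the state space is finite, the walk always terminates; counting and comparing the attractors reached from many starting states is one way to probe how ordered or chaotic a given N, K, P configuration is.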
Speaker 1: Incredibly powerful, but definitely a different way of thinking. How do architects actually do this network analysis in practice? What tools does the theory point to?
Speaker 2: The theory suggests using two main types of matrices, simple but powerful tools: adjacency matrices and incidence matrices. Adjacency matrices help you look at dependencies between elements of the same type, so component-to-component dependencies, or function-to-function, or flow-to-flow. They help you spot unnecessary links, or nodes that are so tightly coupled they should probably be combined. It helps you refine N and K, the structure. Then you have incidence matrices. These map elements of different types against each other. Specifically, you map your stressors against your system elements: the residues, the flows, the functions, the components.
Speaker 1: So that shows you what gets hit by what
Speaker 2: Exactly. It gives you this clear visual map of which parts of your system are most vulnerable to which kinds of stress. And crucially, it helps you group components that react similarly to the same stressors. That’s how you reveal that hidden hyperliminal coupling we talked about earlier.
Speaker 1: I see. So by looking at both kinds of matrices,
Speaker 2: By combining the insights, yeah, architects can see clear patterns. They can see where refactoring is needed to reduce dangerous dependencies, or how to group components within residues to prevent problems from spreading, to limit contagion, long before it ever gets near production.
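The grouping step described above can be sketched with a tiny stressor/component incidence matrix. All the component names, stressors and hit values below are hypothetical; the idea is just that components sharing an identical stressor profile surface as candidates for the same residue, and as candidates for hidden hyperliminal coupling.

```python
from collections import defaultdict

# Hypothetical stressor/component incidence matrix: 1 means the stressor
# hits that component. All names and values are invented for illustration.
incidence = {
    "order service":   {"traffic spike": 1, "db outage": 1, "law change": 0},
    "billing service": {"traffic spike": 0, "db outage": 1, "law change": 1},
    "report job":      {"traffic spike": 0, "db outage": 1, "law change": 1},
}

def group_by_stress_profile(incidence):
    """Group components that react to exactly the same stressors.

    A shared column pattern in the incidence matrix flags candidate
    hyperliminal coupling: parts linked only through a common stressor.
    """
    groups = defaultdict(list)
    for component, hits in incidence.items():
        groups[tuple(sorted(hits.items()))].append(component)
    return list(groups.values())

for group in group_by_stress_profile(incidence):
    print(group)
```

Here the billing service and the report job end up in the same group because they respond identically to every listed stressor, even though nothing in the code connects them directly.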
Speaker 1: Okay. That makes it feel much more concrete. What does residual analysis look like step by step? What are the most unique parts?
Speaker 2: It’s an iterative process. Remember, you typically start with what the theory calls a naive architecture, basically your current best-guess design using more traditional methods. Then you describe that system in terms of information flows between its main parts. You brainstorm, list out all the potential stressors you can think of: technical, business, environmental, whatever. Now, here’s a really unique step. You design the residues. You define these collections of components, flows, et cetera, that need to survive specific stressors. But crucially, you do this without recourse to probability.
Speaker 1: Without probability? So you don’t care how likely a stressor is?
Speaker 2: For the purpose of designing the residue, no. You’re covering a wider range of potential impacts, designing for the effect of the stress, not just the likelihood of one specific cause. This forces a more robust, adaptable design than just focusing on the top five most likely risks. After defining residues, you dig into the component structure within them using those adjacency and incidence matrices. You figure out how stress hits them, how sensitive they are to contagion spreading from other components or residues. You make decisions here about sharing components, isolating functions, et cetera.
Speaker 1: Okay, makes sense. Then what?
Speaker 2: Then comes perhaps the most crucial step. This borrows directly from machine learning concepts: you iterate using training and testing sets of stressors.
Speaker 1: like training an AI model.
Speaker 2: Yeah, yeah. You hold back some of your brainstormed stressors. You use the others to train your design, refining the residues and component structures based on the matrix analysis. Then you use the held-back test stressors to see how well your design copes with things it wasn’t explicitly designed or tweaked for.
Speaker 1: So you’re checking if it handles unknown unknowns, at least simulated ones.
Speaker 2: Exactly. You’re verifying if the structural changes you made actually improved resilience against stressors the system wasn’t specifically taught about. It’s about shaping the system based on the impact of multiple stressors, not just trying to encapsulate each risk individually.
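The train/test idea described above maps onto a simple split of the brainstormed stressor list. A minimal sketch, assuming nothing beyond what was said: the function name, the example stressors, and the 30% test fraction are all arbitrary illustrative choices.

```python
import random

def split_stressors(stressors, test_fraction=0.3, seed=0):
    """Shuffle the brainstormed stressors and hold back a share as a test
    set, mirroring a machine-learning train/test split. The 30% default
    is an arbitrary illustrative choice, not a figure from the papers."""
    rng = random.Random(seed)
    shuffled = list(stressors)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical brainstormed list -- names are invented for illustration.
brainstormed = ["db outage", "traffic spike", "law change", "fire",
                "key staff quit", "api deprecation", "ddos", "flood"]
train, test = split_stressors(brainstormed)
print(train)
print(test)
```

The design is refined against the training stressors only; the held-back test set then stands in for the “unknown unknowns” the design was never tuned for.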
Speaker 1: And this leads to something measurable, a way to know if you actually improved things.
Speaker 2: Yes. That’s the beauty of it. The process allows for the calculation of the residual index, or RI. This is a tangible metric. It compares the resilience performance, how well it handles those test stressors, of your new residual design against your original naive design. If RI is greater than zero, it indicates your design interventions were successful: the system is measurably more likely to survive unknown forms of stress. And the papers mention lab runs showing positive RI values, giving it some empirical backing. Plus, a nice side effect is the architecture stays relevant. You can add new residues later as new stressors emerge, without necessarily invalidating the whole design.
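One plausible reading of the residual index is as a difference in survival rates over the held-back test stressors. This sketch is an assumption about the metric’s general shape, not the papers’ exact formula, and the predicates and stressor names are invented:

```python
def residual_index(naive_survives, residual_survives, test_stressors):
    """Survival rate of the residual design minus that of the naive design
    over held-back test stressors. A value above zero suggests the design
    interventions improved resilience to stress the system was not
    explicitly trained on. (An assumed shape, not the papers' formula.)
    """
    n = len(test_stressors)
    naive_rate = sum(map(naive_survives, test_stressors)) / n
    residual_rate = sum(map(residual_survives, test_stressors)) / n
    return residual_rate - naive_rate

# Toy predicates: the naive design only survives stressors it was built for.
known = {"db outage", "traffic spike"}
ri = residual_index(
    naive_survives=lambda s: s in known,
    residual_survives=lambda s: True,  # pretend the residual design copes with all
    test_stressors=["db outage", "law change", "solar flare", "traffic spike"],
)
print(ri)  # 0.5
```

In a real analysis the two predicates would come from walking each design’s matrices under each test stressor, not from hand-written lambdas.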
Speaker 1: that is compelling. So bringing it all together, what’s the big takeaway here for someone listening, thinking about building more robust software? What are the key benefits of really digging into residuality theory?
Speaker 2: Well, I think the core benefits are pretty significant. First, you get concrete, potentially demonstrable resilience and antifragility. You have that residual index as a measure. Second, it forces you to elevate those non-functional properties, the system’s behaviour under stress, right up front in the design process. That directly lowers the risk of project failure later on, ’cause you’re tackling the hard stuff early. Third, it shifts your focus. You stop trying to do the impossible, predict every change, and start focusing on the more manageable task of understanding and designing for sensitivity to stress.
Speaker 1: which leads to different risk management.
Speaker 2: Exactly. It encourages non-linear risk management. It lets you model the system based on its stress sensitivity, not just its features or use cases. Hugely important is that it helps break down those persistent silos between infrastructure and code teams, because you’re looking at stressors that affect both together. The architecture you design has a better chance of staying relevant longer. And critically, maybe the biggest thing, it provides actual measures you can use to reason about the resilience of a design before you commit to building it. That’s incredibly valuable. Finally, it’s unique in that it really pushes you away from simply applying standard design patterns. It forces you back to first principles to design solutions specifically tailored to your system’s context and vulnerabilities.
Speaker 1: Okay. This deep dive really shows that residuality theory offers a genuinely transformative way to think about software design, especially in this messy, unpredictable world we operate in. It really is about fundamentally shifting our focus: stop obsessing only about what breaks and start understanding what remains when things get tough. And it leaves you with a final thought, doesn’t it? The theory suggests that successful long-lived architectures, even if they didn’t use this terminology, probably intuitively followed some of these principles. They likely ended up with resilient structures through experience, maybe even luck. But what if we could apply this explicitly, systematically, using powerful tools, maybe even algorithms for simulation and analysis, as the theory hints might be possible in future research? Imagine building systems that are maybe not literally unbreakable, but truly designed to be immune to the vast majority of the unknown unknowns out there. What if the next big leap in software isn’t just about building faster, but about building fundamentally stronger, more enduring systems?
[music]
What is the future for cities podcast?
Episode and transcript generated with Descript assistance (affiliate link).

