Looking From All Sides – P-Circle – The Six Moves – Part 6

This is Part 6 of a series where I apply six systems thinking moves to the AI landscape. In Part 5 we cracked open relationships with the RDS Barbell. Now we step back and ask a different question entirely. Not what, not how, but who.

The sixth move from the DSRP framework is the P-Circle. You take a topic and you lay out all the perspectives around it. Every perspective has two elements: a point (the observer, the one who is looking) and a view (what they see from where they stand). Different points, different views. Same reality, seen differently.

Looking at a cylinder from two directions.
The models of “reality” will also differ.

This move is deceptively simple. Put the topic in the center. Place the perspectives around it. Like, in a circle. But the real power is not in listing who’s looking. It’s in noticing who is not looking. Or who is looking but nobody is listening to. If you want to understand what they see, you can apply the RDS Barbell move from Part 5 to every view.

Let’s put the coding agent in the center and see who’s standing around it.

The Perspectives

The developer. They see a productivity tool. Something that helps with boilerplate, speeds up exploration, handles the boring parts. Some see a threat to their craft. Some see a crutch they’re becoming dependent on. Their view is shaped by daily interaction with the agent.

The junior developer. A perspective that is rarely separated from “the developer,” but it should be. They see a shortcut that works. But they also miss what they’re not learning. The debugging instincts, the architectural reasoning, the pattern recognition that comes from writing code by hand for years. Their learning path is being reshaped by a tool that doesn’t teach. It just produces.

The tester. They see risk. Non-deterministic output. Code nobody fully understands. Test cases written by the same agent that wrote the code. They ask: who validates this? How do we test something when the person who prompted it can’t fully explain what it should do?

The tech lead or architect. They see architecture drift. Code that works in isolation but doesn’t fit the bigger picture. Patterns that the model favors because they were common in training data, not because they’re right for this system. They worry about coherence over time.

The CTO or CIO. They see strategic positioning. Cost reduction. Competitive pressure. The board is asking about AI. Other companies are rolling out coding agents. There’s a fear of being left behind. Their view is shaped by market pressure and budget cycles.

The CEO. They see the narrative. Coding agents as a signal of modernization to shareholders, to the market, to potential hires. Fewer engineers, more output, better numbers in the next quarterly report.

The vendor. They see revenue. Market share. Lock-in. Every company that integrates their coding agent deeper is a company that will find it harder to leave. The demo is always impressive. The edge cases are never in the demo.

The investor. They see money. AI is a multiplier on any company’s story right now. Coding agents are the most visible proof that a company “does AI.” They need the hype. Their returns depend on it. Their view is shaped by portfolio performance, not by whether the agent actually improves the product.

Operations and support. They see the fallout. The bugs that ship because nobody reviewed the agent’s output properly. The 2am incidents caused by code nobody understands. Their view is shaped by being at the receiving end of everyone else’s velocity.

The end user. They see a product. Does it work? Is it reliable? Is it better than before? They don’t care whether a coding agent wrote the code. They care whether the button does what it’s supposed to do.

In this example we have only looked at the roles of people. But perspectives don’t have to be human. What is the perspective of the consuming service you integrate with? What is the perspective of the third-party library you use?

Leverage in Perspectives

Not all perspectives carry the same weight in the conversation. The CEO, the vendor, and the investor are the loudest voices. They shape the narrative. They set the agenda. They define what “success” looks like. And their incentives are aligned: more adoption, faster adoption, bigger numbers.

The developer has a voice but it’s fragmented. Some are enthusiastic, some are skeptical, most are somewhere in between and too busy to write about it.

The tester, the junior developer, operations, and the end user are the quiet ones. Their perspectives surface only when something goes wrong. When the bug ships. When the junior can’t debug without the agent. When the customer notices that the product got worse while the velocity metrics got better.

The P-Circle doesn’t tell you which perspective is right. They’re all valid from where they stand. What it does is show you where the leverage sits: which perspectives are shaping decisions and which are being ignored.

Back to the Love Reality Loop

In Part 0 I introduced the Love Reality Loop. The idea that your mental model should point at reality, not at what you wish reality to be. The P-Circle shows you why this is so hard in practice.

When the only perspectives in the room are the ones that benefit from adoption, the feedback that reaches decision-makers is filtered. The signals from reality, from testers, from operations, from end users, arrive late, muffled, or not at all. The Love Reality Loop breaks. Not because reality stopped sending signals. But because nobody in the room is listening or wants to listen to those signals. The signals might not suit their perspective.

This is the last of the six moves. If you’ve followed this series, you now have a basic toolkit. Draw boundaries. Zoom in. Zoom out. Connect the parts. Examine the relationships. And check whose perspective is in the room and whose is missing.

These moves won’t give you answers. They give you better questions. And in a landscape as noisy as AI, better questions are worth more than confident answers.

Crack Open the Arrow – RDS Barbell – The Six Moves – Part 5

This is Part 5 of a series where I apply six systems thinking moves to the AI landscape. In Part 4 we connected the parts and watched them interact. Now we pick up one of those connections and look inside.

The fifth move from the DSRP framework is the RDS Barbell. Three letters, three steps. Start with a Relationship between two things. Say, a developer and a coding agent. There’s a connection between them. An arrow. Most people leave it at that.

Step two: you make a Distinction. You take that arrow, that relationship, and you treat it as a thing in its own right. You give it a name. “Trust.” Now it’s no longer just a line between two boxes. It’s a box itself.

Step three: you treat it as a System. You ask: what are the parts of this thing called “trust”? What is it made of? And suddenly a simple arrow becomes something you can open up and look inside.

The barbell image comes from the visual: two things on each end, connected by a bar in the middle. Most people only see the two ends. The RDS move says: grab the bar, turn it into something you can open, and look inside.

This beautiful representation of the RDS Barbell is by the man himself, Dr. Derek Cabrera, in the YouTube video RDS Barbell | Master the Move.

The Cabreras use the example of a door hinge. A door is related to a door frame. The hinge is that relationship. But the hinge is also a distinct thing with its own parts: leaves, knuckles, a pin, screw holes. The relationship between door and frame is not a simple line. It’s a system. And if you want to understand why the door squeaks or doesn’t close properly, you need to look at the hinge, not just at the door or the frame.

So let’s look at some hinges in the AI world.

“Developer trusts coding agent”

Two things, Developer and Coding Agent. An arrow in one direction, Trust. But what is “trust” made of here?

There is perceived accuracy. The output looks like code a human would write. It compiles. It follows recognizable patterns. Looking right and being right are not the same thing, but the brain treats them as close enough.

There is repetition without failure. It worked the last five times. Nothing broke. So you review a little less each time. This is not earned trust. It’s the absence of observed failure. A very different thing.

There is the understanding gap. When you understand the generated code, you can judge it. When you don’t, you tend to trust it more, not less. Questioning something you don’t understand feels like admitting ignorance. So the trust increases precisely where it should decrease.

Three parts of one relationship. And suddenly “trust” looks less like a solid foundation and more like a set of assumptions that nobody is testing.
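If it helps to see the move in code, here is a toy sketch, entirely made up and not part of the DSRP material: the arrow “trusts” gets promoted to a thing of its own, with the three parts above as fields you can inspect and question individually.

```kotlin
// Toy illustration of the RDS move: Relationship -> Distinction -> System.
// The arrow between developer and agent becomes a box with parts of its own.
data class Developer(val name: String)
data class CodingAgent(val model: String)

// The relationship, named and treated as a system with parts.
data class Trust(
    val perceivedAccuracy: String,        // "it looks like code a human would write"
    val runsWithoutObservedFailure: Int,  // "it worked the last five times"
    val understandingGap: Boolean         // "I can't fully judge the output"
)

data class TrustsRelationship(
    val from: Developer,
    val to: CodingAgent,
    val madeOf: Trust
)

fun main() {
    val arrow = TrustsRelationship(
        Developer("you"),
        CodingAgent("some-model"),
        Trust(perceivedAccuracy = "looks right", runsWithoutObservedFailure = 5, understandingGap = true)
    )
    println(arrow.madeOf) // the bar of the barbell, opened up
}
```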

“AI replaces human work”

You hear this everywhere. Simple arrow. But what’s in the bar?

There is task automation. Some tasks genuinely get automated. Boilerplate, repetitive formatting, simple lookups. Real and useful.

There is skill substitution. The AI does something a person used to do. But the person also did adjacent things while doing that task. They noticed inconsistencies. They built context. They asked questions. They reacted and adjusted to the smallest deviations. The substitution removes not just the task but the side effects of a human doing the task.

There is institutional knowledge loss. When you replace the person, you don’t just lose the task execution. You lose what they knew. The history, the “why” behind decisions, the mental model of the system. I wrote about this as the Knowledge Exodus in my lock-in post. You can’t get that back by hiring someone new and handing them an AI tool.

“Replacement” is not one thing. It’s different dynamics with different risk profiles. I only gave three examples to keep it short.

The Pattern

The RDS Barbell is a precision tool. It doesn’t ask you to map the whole system. It asks you to pick one connection, one arrow, one relationship, and examine it properly. Derek Cabrera said “Nature hides its secrets in relationships.” The same is true for systems we build. The interesting stuff is rarely in the boxes. It’s in the lines between them.

When someone says “developers trust AI” or “AI replaces jobs” they’re handing you an arrow. Your job is to grab that arrow and ask: what is this actually made of?

Next up: Part 6, the P-Circle. We’ve dissected the parts, the systems, and the relationships. Now we ask: who is looking at all of this? From where? And whose perspective is missing entirely?

Responsible Use of AI to Gain Personal Efficiency

I’ve been warning about the dangers of AI a lot lately. And I stand by all of it. But I realized that I’ve been painting an incomplete picture. Because AI, used well, is genuinely useful. So let me try the other side for a change.

The key question isn’t whether to use AI. It’s how. And the answer, as always: it depends! Here are a few examples that work for me in my context. The central theme:

Use AI to assist the human, not to replace the human.

Let’s start with a stack trace that makes no sense. Often a stack trace gives enough information to know where to look. But then there are some that you’ve been staring at for twenty minutes. Now feed that stack trace to an AI and ask it to explain what’s going on. Chances are, it’ll point you in the right direction in seconds. Not because it’s smarter than you, but because it’s good at pattern matching across a huge corpus of known errors. You still do the debugging. You still decide the fix. But you get to the root cause faster.

Next example is writing code. You need a complicated SQL statement with three joins and a subquery. Or a regex that doesn’t make your eyes bleed. Let the AI draft it. Then read it. Understand it. Test it. Change it line by line if needed. The AI gives you a starting point, you provide the quality gate. I’ve used this in Kotlin a few times, when a complicated method chain results in a List<String?> that just doesn’t match the expected List<String>. What am I doing wrong? A short inline chat later and I know.
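A minimal sketch of that kind of mismatch, with made-up names rather than the actual code from my tool: the chain produces List<String?>, the signature wants List<String>, and mapNotNull is the usual way out.

```kotlin
data class User(val name: String?)

fun upperNames(users: List<User>): List<String> {
    // users.map { it.name?.uppercase() } would give a List<String?> and not compile here.
    // mapNotNull drops the nulls and fixes the element type.
    return users.mapNotNull { it.name?.uppercase() }
}

fun main() {
    val users = listOf(User("ada"), User(null), User("linus"))
    println(upperNames(users)) // [ADA, LINUS]
}
```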

This is where AI shines for learning. Use it to explain code you don’t understand. Use it to review your code and spot things you missed. Use it to rewrite a messy function into something cleaner. These are all tasks where you stay in control. You see the input, you see the output, you judge the result.

And it’s great for creating deterministic scripts. Need a build pipeline, a data migration script, a test setup? Let the AI draft it. Then go through it line by line. Understand every step. Change what needs changing. Run it, verify it, own it. The result is a script that does the same thing every time. That’s the kind of work AI is genuinely good at accelerating: quick helper scripts.
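To make “deterministic” concrete, here is the kind of quick helper I mean. It’s a made-up example, not from a real project, but it behaves identically on every run over the same input:

```kotlin
import java.io.File

// Hypothetical helper script: copy every *.csv from ./exports to ./archive.
// Same input folder, same result, every single run.
fun main() {
    val source = File("exports")
    val target = File("archive").apply { mkdirs() }

    source.listFiles { f -> f.extension == "csv" }
        .orEmpty()
        .sortedBy { it.name }   // fixed order, so even the log output is reproducible
        .forEach { file ->
            file.copyTo(File(target, file.name), overwrite = true)
            println("copied ${file.name}")
        }
}
```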

Now here’s where people go wrong. They take that same AI and let it run unsupervised. They build a testing agent, set it loose, and walk away. And then they trust whatever comes back. But here’s the thing: the execution is not deterministic. Run a test script twenty times and you know exactly what it does. Every single time. Now run an AI agent twenty times. You get twenty different routes through your application. Swap the model and you get twenty more. You don’t know what it does. You don’t know what it skipped. You don’t know what it hallucinated along the way. And the prompts need to be so precise to make it predictable that you could usually just code the thing yourself. That’s not responsible use, that’s hope-driven development.

The better approach? Let your dev team explore how AI should fit into their workflow. Don’t mandate it from above, don’t ban it from below. Give people room to experiment. Let them figure out where it helps them personally. Someone might love it for code reviews. Someone else might use it to draft documentation. A third person might find it useless for their specific work. All of that is fine.

The pattern I keep coming back to is this: AI for the repetitive cognitive tasks, humans for the judgment calls. Let it crunch, summarize, draft, and translate. But keep your eyes on the output. Stay in the loop. Because the moment you stop looking at what the AI produces, you’ve stopped being a professional and started being a spectator.

I know this sounds obvious. But look around. How many teams are already running AI-generated code straight into production without a proper review? How many people paste AI output into documents without reading it twice? The tool is only as responsible as the person using it.

So yes, use AI. Use it to learn better, debug faster, and write faster, more elegant code. But use it like a power tool in a woodworking shop. It makes you more efficient, it doesn’t make you a craftsperson. You still need to know what you’re building and why.

Pick it up. But keep both hands on it.

PS: Personal anecdote time. The best blog post ideas come when I’m not at my desk. If you wonder why I’m writing so much lately: I finally found a workflow that supports me. I get blog post ideas on my morning walk. I ponder them in my head, what to write, what conclusion to reach. Three minutes or one squirrel, bird, or tree fungus later, the post is forgotten. I tried using the dictate function on my phone. Forget it. I hate it. But the speech-to-text function from Claude – not sponsored – is quite okay. Together with the LLM’s auto-correction it’s good enough. So I created a skill for blog post ideas that converts my stammering into a sorted scaffold, capturing all ideas. When I come home, I can take the scaffold and put it into WordPress. I can go back to Claude for help with research and formulation, or to review the article for obvious issues. Typos, grammar, all inclusive. I have not lost a blog post idea in the last two weeks. I usually come back with 2-5 new posts now. And I can finally process my thoughts further and put them in writing.

Is AI the New Drug, or How We Are Building Dependencies We Can’t Escape

I like AI. I use it every day. It helps me understand tricky things, write code, and capture ideas in new ways. And that’s exactly what makes me nervous.

Because there’s a pattern here that I’ve seen before. Not with technology, but with any substance or habit that starts out making your life better and slowly, quietly, becomes something you can’t function without. Let me be provocative for a moment: is AI becoming our new drug? Or a bad habit?

Think about it. Over the past two decades, we’ve handed over enormous parts of our lives to a handful of big tech corporations. Our communication, our documents, our photos, our social connections, our shopping habits. We traded convenience for dependency, and most of us didn’t even notice it happening. Now AI is accelerating that same pattern, but at a pace that should give us pause.

Let me give you a concrete example from my world. Software teams are now generating entire code bases with AI. And it’s impressive, honestly. You can spin up features in hours that used to take days. But here’s the thing nobody talks about: those code bases are becoming unmaintainable without AI. The code is generated so fast, in such volume, with so little human understanding of what’s actually in there, that no developer can reasonably read, debug, or evolve it on their own anymore. You need AI to understand the code that AI wrote. And you need AI to fix the code that AI broke. That’s not a tool. That’s a dependency loop.

It’s like hiring a contractor to build your house, but the contractor uses a construction method that only they understand. If they leave, or raise their prices, or change their terms, you’re stuck. You can’t maintain your own house. You can’t even understand how the walls are holding together.

And this isn’t just about code. Think about all the crafts and skills we’re replacing. Writing, illustration, translation, data analysis, customer support, legal research. Each of these is a craft that people spent years learning. When we replace the practitioners with AI, we don’t just save money. We lose the human understanding of how those things work. And once that understanding is gone, we can’t go back without starting from scratch.

This is where I think the systems thinking perspective is helpful. In systems terms, we’re creating a reinforcing feedback loop. The more we use AI, the less we invest in human skills. The less we invest in human skills, the more we depend on AI. The more we depend on AI, the more power shifts to the companies that control it. And those are, once again, the same handful of big tech corporations that already have an extraordinary grip on our digital lives.

I’m not saying we should stop using AI. That would be silly, and I’d be a hypocrite. But I think we need to be honest about what’s happening. There’s a difference between using a tool and becoming dependent on a tool. A good carpenter uses a power saw, but they also understand wood, joints, and structure. If the power saw breaks, they can still use hand tools to build something. Can we say the same about our relationship with AI?

The question I keep coming back to is this. Are we using AI to become more capable, or are we using it to avoid building capability in the first place? Because one of those paths leads to empowerment, and the other leads to a dependency that makes the last twenty years of big tech consolidation look like a warm-up act.

So yes, I’ll keep using AI. But I’ll try to use it the way I use a good tool in my workshop. To extend what I can do, not to replace what I should know. And I’d encourage you to ask yourself the same question. Because the best time to notice a dependency is before you can’t live without it.

More Is Not Better, or How AI Kills the Art of Leaving Things Out

There’s a principle that every good craftsperson knows. Whether they work with wood, with code, or with words: the magic is in what you leave out. A well-turned wooden bowl isn’t beautiful because you added more material or decoration. It’s beautiful because you removed everything that wasn’t the bowl. And you stopped adding when it was enough.

Software used to work like that. At least, that was the aspiration.

Good software craftsmanship was never about writing more code. It was about writing the right code. About structure. About asking yourself: do we actually need this? Does this feature earn its place? Is this module pulling its weight, or is it just sitting there, adding complexity for the sake of completeness? The best developers I’ve worked with weren’t the ones who produced the most lines. They were the ones who looked at a problem and said, “We can solve this with less.”

And for the user? Same story. The best interfaces weren’t the ones packed with options and toggles and settings. They were the ones where someone had the courage to say no. No, we’re not adding that button. No, that menu doesn’t need a submenu. Less was more. That wasn’t just a design slogan, it was a discipline.

Now here comes AI. And suddenly, more is easy.

I see it in myself. I have a small tool I built by hand. I crafted it carefully over time, adding things when they were needed, keeping it lean. Then I started using an AI coding agent. And now? I’m extending that tool every other hour. A new idea pops into my head, I prompt it, and boom, there’s a new feature. It’s almost frictionless. The cost of adding something has dropped to near zero.

But here’s the thing about cost. When the cost of adding is low, you stop asking whether you should add. You skip the most important question in software development. Is this actually necessary?

Let me give you an example from woodworking. When you’re turning a bowl on a lathe, every cut matters. You can’t just undo a bad decision. You remove material deliberately, with intention. If you mess up, you work with what’s left. That constraint forces you to think before you act. It forces a kind of respect for the material and the process.

AI removes that constraint from software. There’s no friction. No penalty for adding. No reason to pause and ask, “Wait, does this belong here?” And so you end up with a tool that’s overcrowded. Features piled on top of features. The original clarity buried under layers of “wouldn’t it be cool if…”

I think there’s a real danger here. Not the dramatic, existential AI danger that makes headlines. Something quieter. AI is eroding our sense of reductionism. Our ability to focus on what truly matters. Because when everything is possible, nothing forces you to choose. And choosing, deliberately leaving things out, that was always the hard part. That was the craft.

Think about it from a systems thinking perspective. Every feature you add is a new part in your system. New parts create new relationships. New relationships create new complexity. New complexity creates new ways for things to break, confuse users, or drift away from the original purpose. You’re not just adding a feature. You’re changing the system. And if you do that every other hour, without stepping back to look at the whole, you’re not crafting anymore. You’re accumulating.
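A back-of-the-envelope way to see this: with n parts there are up to n(n-1)/2 potential pairwise relationships, so the web of interactions grows much faster than the feature list. A quick sketch, numbers purely illustrative:

```kotlin
// Potential pairwise relationships between n parts: n * (n - 1) / 2.
// The feature count grows linearly; the space of possible interactions doesn't.
fun pairwiseRelationships(parts: Int): Int = parts * (parts - 1) / 2

fun main() {
    listOf(5, 10, 20, 40).forEach { n ->
        println("$n parts -> up to ${pairwiseRelationships(n)} relationships")
    }
}
```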

So what do we do? I don’t think the answer is to stop using AI. That ship has sailed. But I think we need to bring that old discipline back, consciously. Before you prompt the next feature, ask yourself: does my tool need this, or do I just want it because it’s easy? Is this addition making the whole better, or just bigger?

The art was never in adding more. The art was in knowing when to stop.

Try stopping.

I Don’t Trust This Output Farther Than I Can Throw a Washing Machine

Over the last few months I had a weekly pair testing session with James Thomas. When I pair with James Thomas, we talk. A lot. Not about the weather or what we had for lunch, but about what we’re doing and why. “I’m clicking here because I noticed this behaves differently when…” or “I want to see that, because there is a system coming later in the flow that….” The cursor moves, and the reasoning follows. We learn from each other’s thought processes and complement each other. I see how James approaches a problem, what he notices that I missed, what mental models guide his exploration. He sees the same from me. That’s not a side effect of pairing. That’s the point.

I did something similar over a decade ago with Session-Based Test Management in my team back then. The real value was in the debrief. Sitting down with a tester after their session and asking: What did you explore? What did you find? What worried you? What did you decide not to look at, and why? The trust you build isn’t primarily in the outcome. It’s in the coverage. It’s in understanding the mental model of the person who ran the session. You can only evaluate that if you understand the thought process behind the exploration.

This is how testers built confidence. Not by looking at results alone, but by understanding how those results came to be.

And now here comes AI, generating outputs, making decisions, even “explaining” its reasoning. Every LLM response has a kind of logic to it. Ask Claude or ChatGPT why it said something, and you’ll get an answer. Coherent. Plausible. Sometimes even insightful.

But here’s the thing: I don’t trust this output farther than I can throw a washing machine.

The explanation is there. But is it the real reason? Or is it a post-hoc rationalization, a story the model tells because it’s good at telling stories? When I watch James’s cursor move and hear him think out loud, I’m getting the actual process. When I read an LLM’s explanation of its own reasoning, I’m getting… what, exactly? A reconstruction? A guess? A hallucination about its own internals?

I genuinely don’t know. And that’s the problem.

Now multiply this uncertainty. We’re not just talking about one model answering one question anymore. We’re talking about agentic systems. Architectures where one agent coordinates, another reviews, several others execute tasks, and they all pass information back and forth. A system of systems. Layers upon layers of black boxes talking to each other.

How much traceability do we have in such a setup? Where did a decision actually get made? Which agent introduced the error? Which review step missed it? When something goes wrong (and it will), where do you even start looking? How much are the artifacts worth that the bots create to communicate?

I’m working in the context of Software as Medical Device, under the EU Medical Device Regulation (MDR). Under MDR, decisions need to be traceable. Reviews need to be traceable. Validation and verification activities need to be documented in a way that lets you reconstruct what happened and why. This isn’t bureaucracy for its own sake. You need that kind of information to do a root cause analysis in case of an incident. Without traceability, you can’t learn. You can’t improve. Identifying the actual root causes becomes a guessing game, depending on luck and having the right people around the table. If you don’t have any evidence of traceability, you’re just staring at a bad outcome with no map back to its origin.

How do you do lessons learned when the magic happened inside a black box? How do you know what to adjust?

This is what’s happening right now in the wild. People experiment with LLMs. They try different prompts, different system configurations, different agent architectures. Eventually, they find something that works. Output looks good. Users seem happy. Ship it.
Then they sell it. As a product. As a service. As influencer content. As expertise.

But here’s the uncomfortable truth: they often have no clue why it works. Only that it worked, in one context, for a while. They found a combination that produced acceptable outputs, but they don’t understand the mechanism. They can’t explain which parts of the prompt matter and which are cargo cult. They can’t predict what will break it. And when it does break (or worse, when it starts failing silently), there’s no pointer to what needs adjusting to keep it working. No theory. No model. Just trial and error with extra steps.

This is alchemy, not engineering.

Back in 2019, I gave a talk at TestBash Brighton called “Testing Machine Learning Algorithms 101”. I used the hello world example of machine learning: hand-written digit recognition. And the beautiful thing about that model was that I could look inside. I could visualize what each layer of the neural network was “seeing”. I could trace how the input image got transformed, step by step, until the model arrived at its classification. The complexity was high, the math made my brain go to my happy place, absolutely. But it was graspable, even for me. I could point at a specific layer and say “this is where the model starts recognizing curves” or “this is where it distinguishes 3s from 8s”. I could test hypotheses about its behavior. I could explain it.

With LLMs? Forget it.

The scale is incomprehensible. Billions of parameters. Attention mechanisms. Context windows spanning thousands of tokens. The architecture itself resists inspection. We don’t know what prompts or contexts are added in tools, UIs, APIs, pre- and post-processors. We’ve traded graspable complexity for raw capability, and somewhere in that trade, we lost something important: the ability to understand what we’ve built.

This is where Explainable AI research becomes relevant. I was inspired to write this post when I read about the work my previous company QualityMinds is doing in this space. The post is only available in German. But I guess you know a model that can translate it for you. 😉
Their AI team is asking exactly the right question: not just what a model decides, but how it internally arrives at that decision. What happens in those hidden layers before the output appears? Can we make those internal states visible? Just like I did when learning about the simple neural network I used in my talk. Can we connect them to concepts humans can understand?

This is the testing mindset applied to AI internals. Not just checking outputs against expectations, but trying to understand the process that produced them. Building a foundation for trust that goes beyond “it seems to work”.

The research is still early. Many approaches are experimental, model-specific, hard to generalize. Explainable AI delivers insights right now, not easy answers. But the direction matters. Because if we can’t eventually explain how these systems work, we can’t really test them. We can only observe them and hope.

If you’re testing systems that incorporate LLMs, or AI agents, or any machine learning component, you need to ask yourself some hard questions. Can I trace how this decision was made? Can I reproduce the conditions that led to this output? If something goes wrong in production, will I have any idea where to start looking?

If the answer is “no” to all of these, you’re not testing. You’re hoping. And hope is not a good strategy.

The hype train doesn’t wait for understanding. New models ship weekly. New agent frameworks appear daily. The pressure to adopt, to integrate, to ship something with AI in it is immense. FOMO is real. But we should resist the urge to skip the hard questions. As testers, we’ve always been the ones asking “but how do you know?” Now we need to ask it louder. Because the machines that answer our questions can’t explain themselves any better than we can explain them.

And where are the observability folks in all of this? We spent years building cultures around logging, tracing, metrics, dashboards. We learned that you can’t operate what you can’t observe. But now we’re deploying systems where the most important decisions happen in places we fundamentally cannot see. The irony is bitter.

If you can’t explain it, you can’t trust it. Not really.

So maybe, before we get too excited about what AI can do, we should get a little more serious about understanding how it does it. The testers who figure this out first will be the ones who actually add value. The rest will just be along for the ride, hoping nothing breaks.

PS: Why do you think Peter Steinberger, the Austrian developer who invented Clawdbot, Moltbot, OpenClaw, or whatever it’s called today, was hired by OpenAI? He found a way to achieve something that the companies behind the LLMs, who should have a quadrillion tons more insight into how their own models work, were not able to achieve. For me, that’s a sign of how unpredictable LLMs still are.

When the Parts Start Talking to Each Other – Part Party – The Six Moves – Part 4

This is Part 4 of a series where I apply six systems thinking moves to the AI landscape. In Part 2 we zoomed in and found six parts inside a coding agent. In Part 3 we zoomed out and found layers both below and above. Now comes the fun part.

The fourth move from the DSRP framework is the Part Party. You’ve identified the parts. Now you draw the relationships between them. How do they influence each other? What flows from one to another? Where are the feedback loops? Where are the tensions?

This is where systems thinking really starts to show its teeth. Because in isolation, parts look manageable. A model selection here, a review mode there, a button over there. Fine. But when you connect them, when you see how they interact, patterns emerge. And some of those patterns are not pretty.

Let’s Throw a Party

Take the parts from the last two posts and put them in a room together. The developer. The prompt window. The model. The output. The review mode. The repository. The team. The delivery process. Production. And let’s see what happens when they interact.

The developer writes a prompt. The prompt goes to the model. The model generates code. The code appears in the output. So far, simple.

Only it isn’t quite that simple. The text input (the prompt) is bundled with context: the files you selected, maybe the whole code base or at least large chunks of it, plus additional skills you have defined and other input beyond the immediate visibility of the coding agent window.

You have selected a model. That model comes with a pricing factor. Is it an old, cheap model, a current one, or a brand-new, fancy, token-devouring one? This influences the price, but also the quality of the result. Is the model suitable for the given prompt? Is the code base small or complex? Do I need a reasoning model? Is a simple model good enough for a search and replace?

The mode you selected determines whether the agent is restricted to the provided context or can read the whole code base. It determines whether the agent comes back with a plan or makes the changes immediately. Some agents can edit files successfully through the tools the IDE provides. Others tend to use shell commands to do the job.

Pressing the button means you now send a bundle of tokens out into the world. Pre-processors read, adjust, and enrich the prompt with input you have no idea about. It passes through settings you don’t know about, settings that could be different on the next prompt. Guardrails that someone else defined. A so-called temperature setting that influences the randomness.
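To make that invisible bundle a bit more tangible, here is a rough sketch of what such a request might contain. All the field names are made up for illustration; every tool and provider shapes this differently, and mostly out of sight:

```kotlin
// Illustrative only: a made-up shape for what actually leaves your machine when you hit enter.
// Field names and defaults are hypothetical; real providers and plugins vary, and rarely show you this.
data class AgentRequest(
    val model: String,                // the model you picked in the dropdown
    val userPrompt: String,           // the text you actually typed
    val context: List<String>,        // selected files, maybe large chunks of the repo
    val systemPrompt: String,         // injected by the tool or provider, not written by you
    val guardrails: List<String>,     // rules defined somewhere else
    val temperature: Double = 0.7     // randomness knob you may never have touched
)

fun main() {
    val request = AgentRequest(
        model = "some-current-model",
        userPrompt = "Rename this service and update all call sites",
        context = listOf("OrderService.kt", "OrderController.kt"),
        systemPrompt = "You are a helpful coding agent...",
        guardrails = listOf("do not print secrets")
    )
    println(request) // only a fraction of this came from you
}
```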

Output is generated. Further actions are performed. Depending on your settings you have to approve and confirm each and every one, or only a few, or none. The agent writes a lot of output in between to signal busyness and a thinking process. I call it BS and a distraction. I wonder how many tokens this fake and useless output alone swallows.

In the end you get a report on what was done. Or a statement that it’s done. You are in a special review mode for the changed code. This is not the IDE’s comparison mode. You see old and new code next to each other. Makes reviewing easier. But it probably also sends feedback to the model provider that the changes have been accepted.

When the code is accepted, with good or bad review practices, we leave the system of the coding agent and follow the output on its journey.

The code goes into the repository. Other team members now interact with it. They read it, they build on it, they debug it. But if the original developer didn’t fully understand the generated code, and the next developer assumes it was written with intent, now two people don’t understand it. The lack of understanding propagates through the team like a slow infection.

The code moves through the delivery process. Tests run. Maybe they pass. But who wrote those tests? If the coding agent wrote the tests for its own code, you have a system testing itself. The agent doesn’t know what it doesn’t know. It will write tests for the happy path it imagined, not for the edge cases it never considered. The delivery process gives you a green checkmark. And you trust it. Because green means good. Right?

And then production. Real users. Real data. Real edge cases that nobody, human or machine, anticipated. Something breaks. Or worse, something is subtly wrong and nobody notices for weeks. By the time someone traces it back, the original prompt is long forgotten. The context is gone. The agent has amnesia anyway. Debugging AI-generated code you don’t understand, in a system that has moved on since, is archaeology.

The Pattern

The Part Party reveals something that looking at parts in isolation never can. It shows you the dynamics. The flows. The accumulations. The reinforcing loops that quietly make things worse while every individual step looks reasonable.

No single part in this system is broken. The agent works. The prompt window works. The delivery pipeline works. Production is up. But the relationships between the parts create emergent behavior that none of the parts intended. More output with less understanding. More speed with less control. More trust with less verification.

That’s the nature of systems. The behavior emerges from the interactions, not from the components.

Next up: Part 5, the RDS Barbell. We’ve seen the relationships. Now we pick one, crack it open, and ask: what is this relationship actually made of?

The Wolves in Your Company, or Why Replacing People with AI Breaks Things You Can’t See

In 1995, scientists reintroduced gray wolves into Yellowstone National Park. The wolves had been absent for about 70 years, hunted to local extinction because people saw them as a problem. What happened after their return stunned everyone. The elk changed their behavior. Trees grew back along the rivers. Beavers returned. Songbirds came back. Even the rivers changed their course because stronger root systems stabilized the banks. Scientists call this a trophic cascade: remove one element, and the entire system shifts in ways nobody predicted.

Now here’s the thing. When they removed the wolves decades earlier, nobody anticipated any of this. People looked at the wolf and saw a predator that kills elk. That’s the visible part. What they didn’t see were the hundreds of indirect relationships the wolf maintained just by being there. The wolf wasn’t just killing elk. The wolf was shaping the entire ecosystem through its presence.

Does this remind you of something?

Right now, companies everywhere are racing to replace people with AI. And they’re using the same logic that got the wolves killed. They look at a person and they see a function. “This person writes reports. AI can write reports. Let’s remove this person.” It’s clean. It’s rational. It fits on a spreadsheet.

But you’re not removing a function. You’re removing a node from a living network of relationships. That person who writes reports also notices when a colleague is struggling. They ask the odd question in a meeting that changes the direction of a project. They remember that three years ago, a similar approach failed and why. They have lunch with someone from another team and accidentally solve a problem nobody formally assigned to them. None of this is in their job description. None of this shows up on your org chart.

And this is where it gets tricky. You won’t notice what you’ve broken. Not immediately. Just like in Yellowstone, where it took years to understand that the disappearing wolves were connected to eroding riverbanks, the damage in your company will show up slowly, in places you wouldn’t think to look. A team that used to be creative becomes strangely stuck. Decisions that used to be good start going sideways. Knowledge that everyone assumed was documented turns out to have lived in someone’s head and in the conversations they had over coffee.

From a systems thinking perspective, what’s happening is painfully obvious. You’re seeing the person as a distinct, isolated element. And you’re ignoring the organizational ecosystem they’re embedded in. You see the part but not how it connects to the whole. And the relationships are where the real work of an organization happens. The formal org chart is just the skeleton. The relationships are the nervous system, the blood flow, the immune system. Remove a node, and you don’t just lose that node. You lose every connection it had.

Using AI to replace people is a fundamentally different thing than using AI to support people. One preserves the ecosystem. The other rips out the wolves and hopes the elk will behave.

The scientists who killed Yellowstone’s wolves were not stupid. They just couldn’t see the system. They saw parts. And that’s exactly what’s happening in boardrooms today. Leaders see costs, outputs, and efficiency metrics. They don’t see the invisible web of trust, knowledge, and human connection that holds everything together.

Before you replace someone with AI, ask yourself: do I really understand all the relationships these people have within the system? If the answer is no, and it almost certainly is, then maybe pause. Because once you’ve broken those connections, rebuilding them won’t be as simple as reintroducing a few wolves.

It took Yellowstone decades to recover. Your company might not get that long.

Is Plumber Availability a New Signal for AI?

Here’s a sentence I never expected to write. One indicator for the state of artificial intelligence might be whether you can get a plumber to fix your kitchen sink. At least in the US. But it shows how weirdly connected systems can be.

BlackRock just invested $100 million in training plumbers, electricians, and HVAC technicians. Not because they suddenly developed a passion for copper pipes, but because the AI boom is eating through skilled trade workers at an alarming rate. Nvidia’s Jensen Huang put it bluntly: the labor needed to build AI factories is already in short supply. And Meta? They’re backing a data center in Louisiana expected to be four times the size of Manhattan’s Central Park. Four times Central Park. For servers.

I find this fascinating and deeply troubling at the same time. Not because building infrastructure is bad. Quite the opposite. Humanity has always built big things. We built railroads across continents. We built large harbors. We built highway systems connecting entire nations. We built power grids that electrified the world. And here’s the thing about all of that. When the initial excitement faded, when the technology matured, when the economics shifted, we still had the tracks. We still had the roads. We still had the power lines. That infrastructure could be maintained, repurposed, expanded. It served generations. It still does.

But what exactly will be left when these massive AI data centers are no longer needed in their current form?

This isn’t a hypothetical question. Industry experts already warn that many facilities could be functionally obsolete within seven to ten years. GPU hardware cycles are running at two to three years before the next generation makes the old one look like a pocket calculator. Companies are signing ten-year leases on buildings designed for rack densities that will be outdated before the concrete has fully cured.
Some 18 years or so ago, I managed the testing side of a “hardware refresh”. What used to be the huge, American-fridge-sized housing of a Sun Fire 15K became a 2U server sitting, together with the rest of the whole environment, in one small 19″ rack.

And then there’s the carbon footprint. Concrete can represent up to 80% of embodied carbon emissions in data center construction. Cement production alone accounts for roughly 8% of global carbon emissions. Every ton of cement produces about 1.25 tons of CO2. Building a new facility creates eight times more carbon than repurposing an existing one. So when we build something the size of a small city to house servers that might be obsolete in a few years, we’re not just making an economic bet. We’re making an irreversible environmental one.

I’m not an expert on construction or energy policy. But I can see patterns. When we built an Autobahn in Germany (historic reasoning behind it aside) or the Interstate Highway System in the US, the investment was enormous, and the payoff lasted decades, arguably forever. When we built the electrical grid, it transformed civilization permanently. These were bets on infrastructure that could evolve with the technology running on top of it. A road doesn’t care if the car running on it has a combustion engine or an electric motor. The electrical grid is probably a different beast that we will cover sometime in the near future.

But how long will these data centers, built for today’s GPU racks, endure the test of time? I remember the data center from my example above; temperature control suddenly became a huge issue. Can you adjust the building?
What do you do with a building complex the size of four Central Parks? Look at Detroit, where some of the old car factories from the 50s are still empty.

So here’s my uncomfortable question: are we confusing activity with progress? Is pouring billions into concrete shells that might become the digital ruins of the 2040s really the smartest way to build the future? Or are we just building very expensive monuments to a moment in time?

The next time you can’t get a plumber for your bathroom renovation, remember: they might be busy building the infrastructure for an AI that will be obsolete before your new tiles need regrouting. And that should make all of us think.

Systems Don’t Exist in a Vacuum – Zooming Out – The Six Moves – Part 3

This is Part 3 of a series where I apply six systems thinking moves to the AI landscape. In Part 2 we zoomed in and discovered six parts inside a coding agent. Now we reverse the direction.

The third move from the DSRP framework is Zooming Out. Instead of asking “what are the parts?”, you ask “what is this thing a part of?” What larger system does it sit in? What is around it? What does it depend on? What depends on it?

Zoom In and Zoom Out are two sides of the same coin. Together they give you the vertical axis of understanding. Down into the details, up into the context. In Part 2 we went down. Now we go up. And with a coding agent, there’s a lot of “up” to explore. So, let’s go.

Zooming Out Through the Technical Stack

In Part 2 we looked at the parts you interact with directly. The mode selector, the model dropdown, the prompt window, the context, the output, the review mode. But behind all of that sits a technical stack that you never see. Every time you hit enter on a prompt, a chain of things happens.

Your prompt leaves the plugin and travels through an API to a server you don’t control. This is a network call over the internet. Latency, availability, and data privacy are all in play now. Your code, or parts of it, leaves your machine. Depending on the provider’s terms, it might be logged. It might pass through infrastructure in a jurisdiction you didn’t choose.

On the other side of that API sits a processing layer. Your prompt goes through access control, safety filters, rate limiting, and more. There might be system prompts that the provider added, that you didn’t write and can’t see. The provider shapes the conversation before the model even starts generating.

Then there’s the model itself. The part everyone talks about. It predicts the most probable next tokens based on your input. It doesn’t understand your code. It produces statistically plausible output. Sometimes good, sometimes wrong, but always very confident. Non-deterministic, as I wrote about in earlier posts. And you can’t reliably predict which one you’ll get.

The model runs on infrastructure. Servers in a data center. The ginormous ones they talk about in the news. Managed by the provider or a cloud partner. GPU availability, load balancing, regional routing.

And underneath all of it sits the training data. Code from GitHub, Stack Overflow, documentation, books, and who knows what else. This is where licensing questions, intellectual property concerns, and pattern biases come from. If the training data over-represents certain frameworks or languages, the model will too. If it includes buggy code, the model learned from buggy code. You inherit all of that, invisibly. And depending on your contract, your code might feed the next training round.

That’s five layers you don’t see, don’t control, and mostly can’t inspect. And yet the output of all of these layers is what you accept or reject in your review mode. If you even use it.

Zooming Out Through Your World

Now let’s zoom out in the other direction. Not down through the technical stack, but up and outward from where you sit.

The coding agent is a plugin in your IDE. That’s where you interact with it. And where the coding agent gets access to the code. The code it changes or produces doesn’t stay there. It moves.

The code goes into a repository where it lives alongside other code. Repositories often follow certain conventions, patterns, and architectural decisions. The generated code has to fit in there. And other people will read it, build on it, and depend on it.

That repository is part of a product. A solution for some problem. The coding agent has no concept of the product. It doesn’t know the purpose, the user, the constraints. It generates code. Whether that code makes sense in the context of the product is entirely your problem.

The product is built by a team. People with different roles, different knowledge, different perspectives. The coding agent is used by some of them, maybe all of them. But it doesn’t participate in the team. It doesn’t join the standup. It doesn’t hear the discussion about why we decided against that approach last sprint. The team carries context that the agent never has.

Then there are the delivery processes. Code reviews, pull requests, CI/CD pipelines, test stages, approvals. In regulated environments like mine, these processes exist for good reasons. The coding agent doesn’t know about any of them. It produces code. What happens to that code afterwards, the reviews, the checks, the sign-offs, is invisible to the tool.

And at the end of that chain sits production. Real users. Real systems. Real consequences. The code that started as a prompt in a chat window is now running somewhere, doing something, affecting someone.

That’s six steps from the coding agent to production. Six steps where context gets added, where decisions get made, where things can go wrong. And the coding agent is aware of nothing beyond the prompt window and the code files and context you provide.

Why Both Directions Matter

When you zoom in, you understand the tool. When you zoom out, you understand the context. You need both.

In our example, zooming out can take different routes. The system we look at is most probably part of many other systems. A coding agent is not just a plugin. It’s a node in a network of technical, organizational, economic, and regulatory systems. And every one of those systems influences what happens when you hit enter.

Next up: Part 4, Part Party. We’ve identified the parts. We’ve seen the larger systems. Now we make the parts interact. How do they relate to each other? Where are the feedback loops? Where does it get messy?
