Where is the improved software?

Developers now use language model–based tools for software development. These tools can write, refactor, and explain code. The advantages are obvious: development is faster, much of the boilerplate disappears, and it is easier to get from idea to working code.

If that is true, it feels natural to expect that the amount and quality of software should have gone up. The tools and apps we use should have become better and more capable. We should see new tools in areas where we did not have any before. In general, there should simply be more good software.

But do we actually experience that?

When we look at the apps and tools we use every day, many of them feel much the same as a few years ago. Some have gained “smart” features and assistants. A few things are smoother or slightly more automated. But we do not see a clear wave of fundamentally better tools, or a flood of new kinds of software in areas that were previously untouched.

Something seems to block the translation from better development tools to noticeably better products. Part of the explanation is that writing code has never been the only hard part. Understanding users, deciding what to build, designing something simple enough to use, dealing with legacy systems and regulations — none of that disappears just because code is easier to generate.

Incentives also matter. Many companies optimize for engagement, lock-in, and short‑term metrics, not for maximum quality or radically better user experience. Extra development capacity can end up in internal projects, experiments, and incremental changes, instead of visible improvements to the tools we rely on.

This does not mean language models make no difference. For developers, a lot has changed: faster prototyping, easier learning, less time spent on repetitive tasks. There are also behind‑the‑scenes improvements in infrastructure and internal tools that users rarely see directly.

If we really used this new capability to its full potential, we might expect more obvious results: simpler and more reliable everyday apps, more tools for niche workflows and small professions, more software adapted to specific languages, regulations, and local needs.

The question “Where is the improved software?” is therefore less about the tools themselves and more about what we choose to build with them. The technical ability to create more good software is increasingly there. Whether we actually get that software depends on priorities, incentives, and the willingness to aim for genuinely better tools, not just more code.

Default Do Nothing Decision Pattern

When an agent needs to make a decision, it considers the possible actions against its goal. Each option is evaluated: does this move us toward the goal, and is it good enough compared to some minimum requirement or threshold?

The Default Do Nothing (DDN) pattern says that if none of the available decisions can be judged as good enough, the agent should default to doing nothing. Instead of forcing a low-quality or risky choice, the agent deliberately chooses not to act.

Doing nothing is treated as a real, explicit decision. The agent has evaluated its options and concluded: “None of these actions meet the required standard. Therefore, I will not act right now.” This is not a failure or an error, but a strategy.

When DDN is triggered, the agent waits to see if conditions change. That might mean more information becomes available, the situation changes, or new options appear. The idea is that by postponing action, a better decision may become possible later.

If conditions do not change, the agent's decision not to act stands. In that case it is betting that doing nothing was still the best available choice compared to the poor alternatives. In many situations, especially where the risk of a bad decision is high, inaction can be better than a wrong or unsafe action.

The core of the DDN pattern is simple:
when no option can be evaluated as good enough relative to the goal, default to doing nothing, wait for possible changes in the situation, and accept that sometimes the least harmful and most rational decision is not to act at all.
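A minimal sketch of the pattern in Python, assuming each option can be given a single numeric score for progress toward the goal and that "good enough" is a fixed threshold. The names `Option`, `decide`, and `decide_over_time` are hypothetical. Returning `None` is the explicit "do nothing" decision, and the loop models waiting for conditions to change before evaluating again.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Option:
    name: str
    score: float  # assumed: estimated progress toward the goal, higher is better


def decide(options: Sequence[Option], threshold: float) -> Optional[Option]:
    """Pick the best option that clears the threshold; None means 'do nothing'."""
    acceptable = [o for o in options if o.score >= threshold]
    if not acceptable:
        return None  # DDN: inaction is an explicit, deliberate result
    return max(acceptable, key=lambda o: o.score)


def decide_over_time(rounds: Iterable[Sequence[Option]], threshold: float) -> Optional[Option]:
    """Re-evaluate each time conditions change; act only once something is good enough."""
    for options in rounds:
        choice = decide(options, threshold)
        if choice is not None:
            return choice
    return None  # conditions never improved, so the decision not to act stands


# Hypothetical example: the first round only offers poor options, the second adds a good one.
rounds = [
    [Option("rushed_fix", 0.3), Option("risky_workaround", 0.5)],
    [Option("risky_workaround", 0.5), Option("planned_fix", 0.8)],
]
print(decide(rounds[0], threshold=0.7))         # None
print(decide_over_time(rounds, threshold=0.7))  # Option(name='planned_fix', score=0.8)
```

In a real agent the scores and the threshold are the hard part; the sketch only shows that "do nothing" is a first-class return value rather than an error.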

Process Over Result

Many people struggle to change a process, even when they can see it will not lead to a good outcome. Instead of adjusting, they hold tightly to the process details and argue that sticking to the process is more important than the result. That raises a simple question: is a process actually good if it leads to bad results?

One reason we cling to processes is comfort. A process tells us what to do next. If something goes wrong, we can say, “At least I followed the steps.” In some organizations, this is even rewarded. You can fail as long as you did things “by the book,” and you can be criticized for breaking the process even if the outcome was better. Over time, this teaches people to defend the process instead of improving the result.

Ego and identity also play a role. The process might be “our way of doing things” or something we designed ourselves. Admitting that it doesn’t work can feel like admitting we were wrong. To avoid that, we double down: we argue that following the process is what really matters, even if the outcome is clearly poor.

This has real costs. Teams waste time and resources continuing with a process that everyone quietly knows is not working. Opportunities are missed because better ideas don’t fit the existing way of doing things. “We followed the process” becomes an excuse, not a learning moment. When the process becomes a shield, reflection and improvement stop.

If we are honest, a good process is not one that is detailed or strict. A good process is one that increases the chances of getting the results we actually care about. If a process repeatedly leads to bad outcomes, it is not a good process, no matter how official, familiar, or well-documented it is.

There are some warning signs that a process is failing. Outcomes are consistently worse than expected. People complain informally that “this is pointless, but we have to do it.” Most discussions focus on whether the steps were followed, instead of whether they made sense or helped. In moments like this, it helps to ask simple questions: what result are we trying to achieve, and is this still the best way to get there, given what we know now?

Shifting from process-first to outcome-first does not mean chaos. It means treating process as a tool, not a belief system. You can define in advance when you will review and possibly change your approach: if a certain result has not happened by a certain time, you revisit the process. You can make it normal for someone on the team to challenge the way you work and suggest changes. And you can reward people not just for following the process, but for improving it when reality shows it is not working.

In the end, the core idea is simple: a process that leads to bad results is not a good process. The goal is not to worship the process, but to reach meaningful outcomes. Next time you are tempted to defend a poor result with “but we followed the process,” stop and ask: what needs to change in our process so we don’t end up here again?

LLM as generator or as a knowledge system

Many people use a language model as a tool that “writes things for you”: emails, code, summaries, blog posts. In that way of working, the model is a generator. You ask it to produce specific content for direct use, and the output is the product.

Used as a generator, a language model creates text, code, or summaries that you can use more or less as they are. You might ask it to draft an email, write a short article, generate a function in Python, or summarize a long document into a few bullet points. The goal is fast production of specific content, where you mainly edit and polish what the model gives you.

There is another way to use these tools: as a knowledge system. Here, you are not asking the model to deliver the final product. Instead, you are using it to find information, explore possibilities, learn, and think. The focus is on working with knowledge rather than generating finished text.

As a knowledge system, a language model can help you find relevant information on a topic, highlight options you might not have considered, and point you to potential opportunities. You can use it to learn something new by asking for explanations, examples, and comparisons, and by letting it guide you step by step through complex material. You can also use it to work with and improve what you are already doing: ask for feedback on your draft, suggestions for better structure, or ways to clarify your arguments or design.

In this knowledge mode, the final product is not generated by the model. You stay the author and decision-maker. The model supports your thinking and helps you refine your work, but it does not replace it. The main value is in discovery, understanding, and improvement, not in ready-made output.

Both ways of using a language model are useful. As a generator, it helps you produce concrete content quickly. As a knowledge system, it helps you find information, see new possibilities, learn, work with knowledge, and improve what you are already doing—while keeping the actual product in your own hands.

Original Error Analysis

When something goes wrong, we usually fix what is broken right now and move on. The bug is patched, the incident is closed, the meeting is rescheduled. But many errors are caused by other, earlier errors. What we see is often just a symptom.

Behind the visible problem there is often a root cause—the original error. This is the first actionable mistake in the chain of events. It might be a decision, a missing check, a wrong assumption, or something we forgot to do. If we find and fix that original error, we often solve many problems at once and prevent new ones from appearing.

In practice, we often don’t do this. Time pressure, habits, and routines push us to focus on the immediate problem. We close the ticket and move on to the next task. As a result, the same types of errors keep coming back in slightly different forms, and we end up firefighting instead of improving.

The original error is also where we can learn the most and get better. It is the point in the chain where we can ask: what should we have done differently? Which assumption was wrong? What was missing in our way of working? This gives a real basis for change and improvement, instead of vague ideas like “we should be more careful”.

Original error analysis is a simple method for doing this. You start by describing the visible error clearly: what happened, when, and who or what was affected. Then you repeatedly ask “what caused this?” until you reach the earliest point where a realistic action could have prevented the problem. That point is a good candidate for the original error.

It is important to check that this is actually an original error, and not just another symptom. A useful test is: if we fix this, will it significantly reduce the chance of similar errors in the future? If the answer is yes, you have probably gone deep enough. You do not need to go all the way to abstract causes you cannot change.
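The "what caused this?" loop can be sketched as walking a chain of causes back from the visible error. This is a minimal illustration, assuming each event has already been judged as actionable or not by the team (that judgment is the real work and is not automated here). `Event`, `find_original_error`, and the example chain are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    description: str
    actionable: bool                   # could a realistic action have prevented it?
    caused_by: Optional[Event] = None  # answer to "what caused this?", if known


def find_original_error(symptom: Event) -> Optional[Event]:
    """Walk 'what caused this?' back from the symptom; return the earliest actionable cause."""
    candidate: Optional[Event] = None
    current: Optional[Event] = symptom
    while current is not None:
        if current.actionable:
            candidate = current  # an earlier actionable cause replaces a later one
        current = current.caused_by
    return candidate  # abstract, non-actionable causes further back are ignored


# Hypothetical chain, from the abstract background cause to the visible symptom.
pressure = Event("Deadlines are always tight", actionable=False)
no_review = Event("Config changes skip review", actionable=True, caused_by=pressure)
bad_value = Event("Wrong timeout deployed", actionable=True, caused_by=no_review)
outage = Event("Checkout requests time out", actionable=False, caused_by=bad_value)

print(find_original_error(outage).description)  # Config changes skip review
```

Fixing "Wrong timeout deployed" would close this one ticket; fixing "Config changes skip review" addresses the whole class of bad config values, which is the point of going one step further back.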

Once you have identified the original error, you can decide what to change. That might be a small adjustment to a process, clearer communication, a checklist, a review step, or some simple automation. The goal is to change the conditions that allowed the original error to happen, so that you avoid a whole class of future errors.

This way of thinking is also useful for “wicked problems” – complex, messy problems that are hard to define and hard to solve. You may not find a single clean root cause, but original error analysis can still help you see earlier decisions and structures that shaped the situation, and where you still have room to act.

Original error analysis works best when it becomes a habit. After something goes wrong, take a few minutes to ask: what was the first mistake in this chain? What can we change so this does not happen again? Over time, this shifts focus from firefighting to learning and improvement, and helps prevent many errors from occurring in the first place.

Don’t Play Winner Takes It All Games

Some initiatives, projects, solutions, businesses, products, and markets are by their nature built so that only one or a few can win. The upside can be huge: money, status, visibility. But the odds are correspondingly poor. For most people who join, the most likely outcome is to lose almost everything they put in: time, energy, and sometimes money.

If you want to succeed in a way where you are almost certain to get something back for your effort, you need to think about what kind of “game” you are entering. If you want to be able to live with not being the best, but simply being good enough, then you need to avoid games that only reward the single winner.

That means choosing arenas where more than one person, company, or product can do well at the same time. Places where competence, reliability, and steady work are enough to give you a decent result, even if you never become number one.

The good news is that most activities in work and business are like this. They are not glamorous. They are often normal, common, even a bit boring. But they are also safer. You don’t have to crush everyone else to have a good outcome. You can do solid work, be good enough, and still be rewarded.

There is nothing wrong with aiming high or trying something ambitious. But doing it blindly in a winner-takes-all setting means you are accepting a very high chance of “all or nothing.” Before you commit to a path, ask yourself whether you are entering a game where only a few can win, and whether that is actually what you want.

Most of the time, especially if you care about stability and a livable life, it makes more sense to choose the steady, common, “boring” games. They may not look as exciting from the outside, but they let you build something that lasts, without needing to win it all.

When to build an agent

“Let’s build an agent for that.”

That’s a common reaction to almost any problem right now. If there’s a repetitive task, someone suggests wrapping a language model in an “agent” and letting it handle everything end to end.

But an agent is only one option. You can build an agent to automate a task. You can build a normal program. Or you can simply add the functionality to the tools people already use.

The real question is: when should you build an agent, when should you build a regular program, and when should you just extend the existing tool?

Imagine a simple, old-fashioned proofreading workflow. You sit and write a document in a basic text editor. There is no built-in spellcheck, no grammar help, no automatic feedback. When the document is ready to go to print, you print it out on paper. You send this printout to a colleague for proofreading. The colleague reads the document, underlines spelling mistakes, marks typos and other errors, and writes corrections in the margins. Then you get the paper back and sit down at your computer again. You go through the marked-up pages and manually correct the original document in the text editor.

This is a separate, clearly defined step: first you write, then you print, then someone else proofreads, then you correct.

A naive “agent” approach would be to replace the colleague with a language model agent, but keep the rest of the workflow more or less the same. You still write in the same simple text editor. When you are done, you print the document or export it to a file. You send this to a system that scans or reads the document. The agent runs through the text, finds spelling mistakes and other errors, and produces a new, corrected version, maybe with underlines and suggested fixes. You then print or download this corrected version and use it to update your original document.

On paper, this sounds modern: the human colleague is replaced with an automated agent. In reality, it keeps a lot of unnecessary steps. You still print or export. You still send the document somewhere else. You still jump between different tools and representations of the same text. You have built a separate system around the work instead of improving the place where the work actually happens: the text editor.

The smarter solution is the one the world discovered quite quickly for text processing: build spellcheck into the editor itself. Instead of a separate proofreading step, the editor underlines spelling mistakes as you type. You get suggestions directly in the tool where you write. You fix errors immediately, without printing, scanning, or sending anything to a colleague or an agent. Spellcheck is not its own system; it is part of the work tool.
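To make "spellcheck is part of the work tool" concrete, here is a toy sketch in which the check runs inside the editor on every change instead of as a separate step afterwards. It assumes a tiny in-memory word list and only flags unknown words; real spellcheckers use large dictionaries and also suggest corrections. `Editor` and `misspellings` are hypothetical names.

```python
import re

WORD = re.compile(r"[A-Za-z']+")


def misspellings(text: str, dictionary: set[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, word) for every word not found in the dictionary."""
    return [
        (m.start(), m.end(), m.group())
        for m in WORD.finditer(text)
        if m.group().lower() not in dictionary
    ]


class Editor:
    """Toy editor: the check runs inside the tool on every change, not as a later step."""

    def __init__(self, dictionary: set[str]) -> None:
        self.dictionary = {w.lower() for w in dictionary}
        self.text = ""
        self.underlines: list[tuple[int, int, str]] = []

    def write(self, chars: str) -> None:
        self.text += chars
        # Re-check after each edit so mistakes are underlined while you write.
        self.underlines = misspellings(self.text, self.dictionary)


editor = Editor({"the", "report", "is", "ready"})
editor.write("The reprot is ready")
print(editor.underlines)  # [(4, 10, 'reprot')]
```

There is no export, no hand-off, and no second copy of the text: the feedback lands at the exact position where the writer is already working.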

This is the core idea: not everything that can be automated needs an agent. Often, the right move is to bring the functionality into the tool where the user already works, instead of wrapping the old workflow in something that looks clever from the outside.

You might choose an agent when the task really does span multiple tools and systems over time, and there is no obvious main place to put the functionality. You might choose a normal program when it is a separate job with clear input and output. But when the task is tightly connected to what the user is already doing in one tool—like proofreading while writing—the natural place for that functionality is inside that tool, as part of the workflow, not as a separate agent.
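The rule of thumb above can be written as a small decision function. This is a simplification, assuming the situation can be reduced to a few yes/no questions; the `Task` fields, the order of the checks, and the fallback answer are illustrative assumptions, not a fixed method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    tools_involved: int        # how many tools/systems the work touches
    spans_time: bool           # does it run across several steps or sessions?
    has_main_tool: bool        # is there one obvious tool where the work happens?
    clear_input_output: bool   # is it a standalone job with defined input and output?


def choose_approach(task: Task) -> str:
    # Work tied to a tool the user already works in belongs inside that tool.
    if task.has_main_tool:
        return "extend the existing tool"
    # A separate job with clear input and output fits a normal program.
    if task.clear_input_output:
        return "build a normal program"
    # Only work spread over several tools and over time points to an agent.
    if task.tools_involved > 1 and task.spans_time:
        return "build an agent"
    return "unclear: look at the workflow again"  # hypothetical fallback


proofreading = Task(tools_involved=1, spans_time=False, has_main_tool=True, clear_input_output=False)
print(choose_approach(proofreading))  # extend the existing tool
```

Checking for a main tool first reflects the argument of the post: the agent is the last option to consider, not the first.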

How to Choose the Best Decision Agent

Imagine you have several decision-making agents, and you want to find out which one is the best. A simple idea is to test them all on the same task and keep the one that performs best. For example, you could ask each agent to predict a series of coin flips: heads or tails.

You give all agents the same number of coin tosses under the same conditions. Then you measure how many they get right, or you look at who gets the longest streak of correct predictions. The agent that makes the fewest mistakes, or has the longest streak, might look like the best decision-maker.

It is tempting to conclude that this agent is better at making decisions, because it “proves” itself on the test by getting more predictions right. From this point of view, the agent with the most correct answers, or the longest run of correct predictions, is simply the best.

But there is a crucial question: is this agent really better at making decisions—or did it just have the most luck?

If all agents are effectively guessing on a fair coin, then someone will, just by chance, get a long streak of correct answers. If you test many agents, one of them will almost always stand out with an impressive result, even if none of them has any real skill. In that case, you have not found the best decision-maker; you have found the luckiest one.
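A short simulation makes the effect visible. It assumes every agent has no skill at all (each prediction is a fair guess) and uses 1,000 agents with 100 flips each; these numbers and the function names are arbitrary choices for illustration.

```python
import random


def longest_streak(hits: list[bool]) -> int:
    """Length of the longest run of consecutive correct predictions."""
    best = run = 0
    for hit in hits:
        run = run + 1 if hit else 0
        best = max(best, run)
    return best


def guessing_agent_trial(n_flips: int, rng: random.Random) -> list[bool]:
    """One agent with no skill: it guesses a fair coin, so each hit has probability 0.5."""
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    guesses = [rng.random() < 0.5 for _ in range(n_flips)]
    return [f == g for f, g in zip(flips, guesses)]


rng = random.Random(42)  # fixed seed so the run is reproducible
n_agents, n_flips = 1000, 100

results = [guessing_agent_trial(n_flips, rng) for _ in range(n_agents)]
winner = max(range(n_agents), key=lambda i: longest_streak(results[i]))
print("winner's longest streak:", longest_streak(results[winner]))
print("winner's accuracy:", sum(results[winner]) / n_flips)

# The "winner" has no skill either, so a fresh test is just another guessing run.
retest = [guessing_agent_trial(n_flips, rng) for _ in range(20)]
retest_accuracy = sum(sum(r) for r in retest) / (20 * n_flips)
print("winner's accuracy on 20 fresh runs:", retest_accuracy)
```

With this many agents, the winner's streak is typically far longer than what a single guesser usually gets, while its accuracy on fresh runs stays close to 0.5. The impressive streak came from the number of agents tested, not from the winner.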

This matters in practice. If you run a single test, pick the apparent winner, and trust it as “the best”, you may be basing your decision on randomness. You might over-trust an agent that got lucky in one experiment and ignore others that would do better over time.

To choose a genuinely good decision-making agent, you need more than one short test. You should look for performance that is consistent across repeated trials and different tasks, not just one nice streak. You should also compare against simple baselines, like random guessing or basic rules, to see whether the agent is actually doing better than chance.
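One simple way to compare against the random-guessing baseline is a one-sided binomial test: how likely is it to get at least this many correct predictions by pure chance? The sketch below assumes a fair coin (p = 0.5) and adds a Bonferroni-style correction, one standard way to account for having compared many agents. Ideally the test runs on fresh trials, not on the same run used to pick the winner. The function names are hypothetical.

```python
from math import comb


def p_value_vs_chance(correct: int, total: int, p: float = 0.5) -> float:
    """Probability of at least `correct` hits out of `total` by pure guessing (one-sided binomial test)."""
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(correct, total + 1))


def beats_chance(correct: int, total: int, agents_compared: int, alpha: float = 0.05) -> bool:
    """Bonferroni-style check: the more agents you compared, the stricter the bar."""
    return p_value_vs_chance(correct, total) < alpha / agents_compared


print(round(p_value_vs_chance(60, 100), 3))         # 0.028
print(beats_chance(60, 100, agents_compared=1))     # True
print(beats_chance(60, 100, agents_compared=1000))  # False
```

60 correct out of 100 looks convincing for a single agent tested once. But if 1,000 pure guessers are tested, roughly 28 of them (1,000 × 0.028) would be expected to reach that score by luck alone, which is why the bar has to rise with the number of agents compared.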

The simple coin-flip example shows the core idea: testing agents on the same task and picking the one with the best streak does not automatically mean you found the best decision-maker. It might just be the one that had the most luck.

Is Everything Just a .md File with a Prompt?

When you start working with language model–based systems, a simple pattern often appears: almost everything seems to be a .md (Markdown) file with some text instructions. The agent is a .md file, skills are .md files, instructions are .md files, and policies are .md files. That quickly raises the question: is everything in this universe just a .md file with a prompt?

There are practical reasons for using Markdown. It is human-readable, easy to version-control with Git, and easy to review in pull requests. Putting the “prompt” in a file separates behavior from code, which often means you can adjust how an agent behaves without redeploying the application. It also lets non-developers participate by editing the .md files directly.

An agent, in this setup, is usually defined as a .md file that describes who the agent is and what it is supposed to do. You can think of it as a role description for the language model. The file typically contains the agent’s identity (“You are a customer support assistant”), its responsibilities (“Help users solve product issues”), its tone (“Be clear and calm”), and its boundaries (“Do not give legal or medical advice”).

A skill is also a .md file with a prompt, but it represents a specific capability instead of a role. A skill describes what the system can do in a reusable way. For example, a “summarize ticket” skill file might explain that when given a long support ticket, the model should return a short bullet-point summary. The file usually defines the purpose of the skill, when it should be used, what the input looks like, and what format the output must have.

An instruction is again a .md file with text, but with a narrower focus. It describes how responses should look in a particular context. Instructions might set formatting rules (Markdown, JSON, plain text), length limits, language and tone preferences, or other local constraints. For example, an instruction file might say “keep answers under 200 words, write in English, and respond in Markdown.”

A policy is a .md file that defines rules and constraints the system must always follow. It covers safety, compliance, and domain-specific restrictions. A policy file might say “do not output personal data,” “do not provide medical or financial advice,” or “refuse to help with illegal activities.” Policies typically override anything else: if an agent or skill would violate a policy, the policy should win.

So from one angle, yes: agents, skills, instructions, and policies can all be “just” .md files with prompts in them. But they are not the same thing. In practice, it helps to treat them as different concepts: agents as roles, skills as capabilities, instructions as local guidelines, and policies as global rules.

A simple project structure might mirror this way of thinking, for example:

agents/support-agent.md
skills/summarize-ticket.md
instructions/chat-formatting.md
policies/safety.md

In a typical interaction, the system might load the support agent, apply the summarize skill for long tickets, format the answer according to the chat instructions, and enforce the safety policy. All of that behavior is driven by separate .md files, each with a clear purpose.
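A minimal sketch of how such a system might assemble one prompt from the separate files, assuming the simplest possible strategy: concatenate role, skills, and instructions, then append policies last with an explicit note that they take precedence. The file contents reuse the examples above; `load` and `build_prompt` are hypothetical. A prompt can state precedence but cannot enforce it, so hard rules may also need checks outside the model.

```python
import tempfile
from pathlib import Path


def load(root: Path, relative: str) -> str:
    return (root / relative).read_text(encoding="utf-8").strip()


def build_prompt(root: Path, agent: str, skills: list[str],
                 instructions: list[str], policies: list[str]) -> str:
    """Assemble role, capabilities, local guidelines, then global rules, in that order."""
    parts = [load(root, agent)]                      # role
    parts += [load(root, s) for s in skills]         # capabilities
    parts += [load(root, i) for i in instructions]   # local guidelines
    rules = "\n".join(load(root, p) for p in policies)
    parts.append("These policies override everything above:\n" + rules)  # global rules last
    return "\n\n".join(parts)


# Hypothetical file contents, reusing the examples from the text.
files = {
    "agents/support-agent.md": "You are a customer support assistant. Help users solve product issues. Be clear and calm.",
    "skills/summarize-ticket.md": "When given a long support ticket, return a short bullet-point summary.",
    "instructions/chat-formatting.md": "Keep answers under 200 words, write in English, and respond in Markdown.",
    "policies/safety.md": "Do not output personal data. Do not provide medical or financial advice.",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for relative, text in files.items():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text + "\n", encoding="utf-8")
    prompt = build_prompt(
        root,
        agent="agents/support-agent.md",
        skills=["skills/summarize-ticket.md"],
        instructions=["instructions/chat-formatting.md"],
        policies=["policies/safety.md"],
    )

print(prompt)
```

Keeping the four kinds of files in separate directories means the loading order, and therefore the precedence, is decided in one place rather than inside each file.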

There are limits to this .md-file view. Tools, APIs, code execution, state, and workflows usually live outside Markdown. Still, as a mental model, “a universe of .md files with prompts” is useful, as long as you remember that not everything is literally just a file. The important part is not Markdown itself, but the structure: clear definitions of agents, skills, instructions, and policies that you can read, review, and evolve over time.

Agents and Priorities

When we build systems of agents based on language models, we often start with a simple idea: split a big problem into smaller parts, give each agent a clear task, and let them work together. But each agent will still optimize for something, and what it optimizes for matters a lot. If goals are set wrong or unbalanced, the overall system will miss its main purpose, even if each agent “does its job.”

Agents have different objectives and make choices and evaluations based on those. Some agents have very local goals that are limited to their own specific task. They focus on short-term, concrete outcomes like “summarize this document,” “classify this ticket,” or “extract these fields.” Other agents have broader goals and consider a larger whole, not just a single subtask. They might care about things like “improve user satisfaction” or “help the user solve their problem effectively.”

For a system of agents to function and reach an overarching goal, you need a combination of these local, constrained goals and more composite, system-level goals. The local goals give clarity and focus, while the global goals ensure that the system is moving in the right direction as a whole. Together, they should form the basis for the evaluations and decisions the agents make.

If you only have agents with narrow, local, short-term goals, the system easily becomes unbalanced. Those agents will tend to reach their local goals: tasks will be completed, short-term metrics will look good, and each agent can claim success. But the overall outcome is often worse. You can end up with fast but unhelpful answers, lots of extracted data that is not actually useful, or content that is technically correct but misses what the user really needs. The main goal of the system is not reached, even though each agent hits its own target.

The opposite imbalance also creates problems. If you only prioritize global, overarching goals like “maximize user value” or “ensure project success,” agents may make poor evaluations and decisions in practice. Global goals are often abstract and not clearly connected to local, short-term realities. When an agent has only a broad mission, it may not know how to act in a specific situation: Should it be brief or detailed? Strict or flexible? Conservative or creative? Different agents might interpret the same global goal in different ways, and decisions become inconsistent and hard to control.

The key is to connect local and global goals explicitly. Each agent should have a clear local objective that defines its own task: what it is responsible for, when its task is “done,” and under what constraints. At the same time, that local goal should be designed so it supports the overarching purpose of the system, and is limited by constraints that come from the global goal.

For example, in a support system, a triage agent might have the local goal “classify and route tickets accurately,” while a response agent has “provide a clear, actionable answer.” Both of these local goals should be grounded in a higher-level goal such as “resolve user issues effectively without unnecessary delay.” That global goal can add constraints: routing should prioritize correctness over speed when in doubt, and responses should prioritize resolving the issue over being as short as possible.
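The triage example can be sketched as a local objective (route the ticket) bounded by a constraint derived from the global goal (when in doubt, prefer correctness over speed). This assumes the classifier reports a confidence score; the 0.8 threshold, the queue names, and the keyword classifier standing in for a model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Classification:
    queue: str
    confidence: float  # the classifier's own estimate, 0..1


# Global goal "resolve user issues effectively" expressed as a local constraint:
# when in doubt, routing should favor correctness over speed.
MIN_CONFIDENCE_TO_AUTO_ROUTE = 0.8  # hypothetical threshold


def route(ticket: str, classify: Callable[[str], Classification]) -> str:
    """Triage agent: local goal is accurate routing, bounded by the global constraint."""
    result = classify(ticket)
    if result.confidence >= MIN_CONFIDENCE_TO_AUTO_ROUTE:
        return result.queue      # confident: route immediately
    return "manual-review"       # in doubt: slower, but more likely to be correct


def keyword_classifier(ticket: str) -> Classification:
    """Stand-in for a model-based classifier; purely illustrative."""
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return Classification("billing", 0.9)
    if "error" in text or "crash" in text:
        return Classification("technical", 0.85)
    return Classification("general", 0.4)


print(route("Please refund my last invoice", keyword_classifier))  # billing
print(route("Something feels off lately", keyword_classifier))     # manual-review
```

Without the threshold, the agent would meet its local goal by routing every ticket instantly, including the ones it is unsure about, which is exactly the "every part looks fine" failure described below.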

If the system becomes unbalanced, you get predictable patterns. With too much focus on local, short-term goals, every part looks fine, but the overall result is poor: the system reaches local targets but fails the main purpose. With too much focus on global, abstract goals, decisions become vague and ungrounded: agents struggle to translate the overarching aim into good local decisions, and the connection between what they do now and what the system should achieve later is unclear.

Designing a good agent system means thinking about both levels at the same time. You define the overarching goal of the system, and then design local goals for each agent that clearly contribute to this goal and are consistent with it. This balance between local and global goals is what allows many agents with different responsibilities to work together and move the whole system toward its intended outcome.