LLM Tools as Expert Systems

Most tools built on large language models today are generic. They are more like frameworks than finished solutions. They are designed to be flexible, configurable, and reusable across many different domains.

Because of this, they can be set up to generate text that looks like expertise in almost any area. With the right instructions, a model can be told to act like a lawyer, a doctor, an engineer, or a consultant. On the surface, it will often look and sound like a real expert system.

You can instruct a language model to present itself as an expert in a specific field, and it will produce answers that non-specialists are likely to believe. The output uses the right language, has the right structure, and appears confident. For many people, this is enough to trust it.

This illusion is very attractive for companies that develop these tools. By keeping the core product generic, they can sell the same tool to many different customers in many different industries. With some light configuration or domain-specific prompts, it can be marketed as a solution for almost any field.
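The "light configuration" mentioned above can be sketched in a few lines. This is a hypothetical illustration, not any real product's code: one generic template is instantiated with different role names, and only the surface changes.

```python
# Illustrative sketch: one generic template, many "experts".
# The template text and role names are invented for illustration;
# they show how surface-level configuration works, nothing more.

GENERIC_TEMPLATE = (
    "You are an experienced {role}. Answer the user's question "
    "using the terminology and tone of a {role}.\n\nQuestion: {question}"
)

def make_expert_prompt(role: str, question: str) -> str:
    """Instantiate the same generic template for any 'expertise'."""
    return GENERIC_TEMPLATE.format(role=role, question=question)

legal = make_expert_prompt("contract lawyer", "Is this clause enforceable?")
medical = make_expert_prompt("general practitioner", "Should I worry about this rash?")

# Both prompts come from the same template: only the surface changed.
assert legal != medical
assert "contract lawyer" in legal
```

The point of the sketch is that nothing domain-specific was built; swapping the role string is all the "expertise" there is.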

The vendors also avoid the heavy work of truly understanding each professional domain and making sure the tool is actually good enough for it. They sidestep the difficult part: guaranteeing quality and reliability for specific tasks. It is easier to leave that burden to the users.

Customers, however, often want something different. Many of them do not want to develop their own solution on top of a tool. They have a concrete problem they want solved. They want a solution, not a flexible framework they have to shape into a solution themselves.

So there is no shortcut to building a real solution. Making a language model look and sound like an expert system does not turn it into a proper, domain-ready system; it only changes the surface. The harder, more important work remains: understanding the domain, defining what “good enough” means, and building something that reliably solves a specific problem.

Why Productivity Gains from Language Models Are Hard to See

Most tools based on language models today are about individual productivity. People use them to find information, improve text, correct errors, draft suggestions, write code, and so on. These things absolutely create efficiency gains for the individual.

But they do not automatically create efficiency gains for the whole organization or the overall system.

Even if someone gets their job done faster by using a tool like ChatGPT, that doesn’t necessarily mean they get more done in total. The gain can leak away in several ways. One possibility is that the person simply spends the freed-up time on other things: tasks that are not very important, or even things that have nothing to do with work at all.

Another possibility is that the time goes into over-improving the result. Because it’s so easy to generate and refine text or code, you can keep polishing endlessly, spending just as much time as before on fine-tuning details that don’t matter for the outcome, long after the point where there is any real benefit.

So when you try to measure the effect of language-model tools in a company, the gains can be hard to find. They are real at the individual level, but they can quietly disappear. Time savings are not necessarily used to increase output, shorten lead times, or improve something that shows up in the company’s metrics. They are often absorbed into other activities or into unnecessary extra quality.

This is why leaders often hear that people like the tools and feel more productive, while the organization as a whole does not see a clear productivity boost. The system is the same as before. The work processes, bottlenecks, and priorities are unchanged. Only the individual has a new tool.

Without changing how work is organized and without being explicit about what should happen with the time saved, the effect of language models will mostly stay hidden at the system level, even if many employees experience that their personal work has become easier and faster.

Accepting Delay and Inaccuracy

When we run a process or a traditional program, we expect certain things. We expect a fast answer. We expect an accurate answer. And we expect that the same input will always give the same result. If a calculator takes several seconds to answer 2 + 2, or sometimes says 5, we don’t think “close enough” – we think it’s broken.

This is how we normally relate to software: as tools that should be deterministic and reliable. A banking app should always show the correct balance. A ticket booking system should clearly confirm or reject your order. Speed, precision, and consistency are the baseline expectations.

Something interesting happens when we start giving programs more human-like qualities and call them “agents”, often powered by a language model. We stop thinking of them only as tools and begin to relate to them more like we relate to people. We “ask” them for help. We say they “didn’t understand” or that they “misinterpreted” something.

With that small shift in language and framing, our expectations change. It suddenly feels more acceptable that the agent takes a bit longer to respond, as if it is “thinking”. We tolerate that it might be less precise, giving approximate or partial answers. And we accept that it does not always give exactly the same result every time, even for the same question.

In other words, when we see something as a traditional program, we expect fast, accurate, and consistent answers. When we see it as an agent built on a language model, we open up for slower, more imprecise, and less consistent behavior. The core functionality might not have changed that much, but the way we talk about it and think about it makes delay and inaccuracy easier to accept.
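The contrast between the two framings can be shown with a toy sketch. The "agent" here is not a real language model, just a weighted random choice standing in for temperature-based sampling; the answer scores are invented for illustration.

```python
import math
import random

def calculator(a: int, b: int) -> int:
    """A traditional program: same input, same output, every time."""
    return a + b

def sampled_answer(scores: dict, temperature: float, rng: random.Random) -> str:
    """A toy stand-in for LLM decoding: sample one answer from a
    softmax over scores. Higher temperature means more variation."""
    weights = [math.exp(s / temperature) for s in scores.values()]
    return rng.choices(list(scores.keys()), weights=weights, k=1)[0]

# The calculator is perfectly consistent.
assert all(calculator(2, 2) == 4 for _ in range(100))

# The "agent" is not: repeated calls can return different answers.
rng = random.Random(0)
scores = {"4": 2.0, "about 4": 1.0, "5": 0.2}  # invented scores, illustration only
answers = {sampled_answer(scores, temperature=1.0, rng=rng) for _ in range(50)}
assert len(answers) > 1
```

Same question, fifty calls, more than one answer: exactly the behavior we would call "broken" in a calculator, and "thinking" in an agent.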

Three Agent Types – Builder, Maintainer, Discarder

In a living system, agents can be divided into three main types: the Builder, the Maintainer, and the Discarder.

The Builder is the one who builds, makes, and creates. The Builder values what is new and different the most. This type is drawn to change, novelty, and the creation of something that was not there before. In practice, the Builder shows up as the person who starts new projects, suggests new ideas, or creates new structures. The Builder’s focus is on bringing something new into the system.

The Maintainer is the one who maintains, preserves, and takes care of what already exists. The Maintainer values what is already there the most. This type is drawn to stability, continuity, and reliability. In practice, the Maintainer is the person who keeps things running, looks after existing processes, and makes sure systems do not fall apart. The Maintainer’s focus is on protecting and supporting what has already been built.

The Discarder is the one who throws away, removes, and scraps what is not needed. The Discarder values removing from the system what is no longer needed. This type is drawn to cleaning up, simplifying, and making space. In practice, the Discarder is the person who says, “We don’t need this anymore,” who shuts down old projects, and who removes things that no longer have value. The Discarder’s focus is on clearing out what the system can do without.

All three agent types are necessary in a living system. Without Builders, nothing new appears. Without Maintainers, what exists falls apart. Without Discarders, the system fills up with things that are no longer needed. Understanding these three roles can help us see our own tendencies more clearly and notice which type might be missing or undervalued in the systems we are part of.

When you spend a lot of compute on a language model but skip encryption

People often think of encryption as something expensive that should be used only when absolutely necessary. The assumption is that encryption burns a lot of CPU, adds overhead, and risks increasing latency. Because of that, teams sometimes choose to skip extra encryption, even when it would be the smart thing to do.

Public/private key cryptography does use more CPU than sending data in the clear, or than “just” sending everything over HTTPS without any extra layer. Symmetric encryption also has a cost, even if it is usually small on modern hardware. So yes, encrypting content is not free.

But when you compare that cost to what you are already spending to run a language model, the picture changes completely. A typical request that sends data to a model involves tokenization, network transfer, and, most importantly, heavy inference compute on GPUs or other accelerators. That inference step dominates the resource usage by a huge margin.

If a request/response workflow is mostly about sending data to and from a language model, is it really worth dropping encryption to save some CPU cycles? You are already paying for massive amounts of compute to run the model. In that context, the overhead of encrypting a few kilobytes or even megabytes of text is a rounding error.
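The "rounding error" claim can be put in rough numbers. This is a back-of-envelope sketch with loudly assumed round figures (about 2 CPU cycles per byte for hardware-accelerated AES-GCM, about 2 FLOPs per parameter per generated token for transformer inference); cycles and FLOPs are not the same unit, so this only compares orders of magnitude of operation counts, not measured wall-clock time.

```python
# Back-of-envelope comparison with assumed round numbers (not measurements):
#   - AES-GCM with hardware support: ~2 CPU cycles per byte
#   - transformer inference: ~2 FLOPs per parameter per generated token

def encryption_cycles(payload_bytes: int, cycles_per_byte: float = 2.0) -> float:
    """Rough operation count for encrypting a payload."""
    return payload_bytes * cycles_per_byte

def inference_flops(params: float, tokens: int) -> float:
    """Rough operation count for generating a response."""
    return 2.0 * params * tokens

enc = encryption_cycles(100 * 1024)       # encrypt a 100 KB payload
inf = inference_flops(7e9, tokens=500)    # one response from a 7B-parameter model

ratio = inf / enc
# The model burns many orders of magnitude more compute than the encryption.
assert ratio > 1e6
```

Even with these crude assumptions, the inference step is tens of millions of times larger than the encryption step, which is the whole argument in one division.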

This is where the usual reasoning breaks down. People worry about “extra CPU” for encryption, and as a result avoid using public/private key technologies or additional encryption layers, even for sensitive prompts and responses. But if the data is important enough to send to a powerful model, it is usually important enough to protect properly on the way there and back.

A more realistic way to think about it is: if you can afford the compute cost of the model, you can almost certainly afford the CPU cost of encrypting the content around it. The trade-off is clear: a tiny increase in CPU usage versus a potentially large improvement in privacy and security.

So when your system is already spending a crazy amount of compute on running a language model, skipping encryption because of CPU considerations is rarely a good argument. Instead of asking “Is encryption too expensive here?”, the better question is: given what we already spend on model compute, is it really worth not encrypting this data?

Is DRY Always a Good Idea?

In programming, we often become obsessed with reuse and with having only one of everything. The ideal is: define it once, reuse it everywhere, and then you only have to change it in one place.

But this can easily go too far, to a point where it’s not even smart from a software perspective. Everything ends up hanging together with everything else, and you get a spaghetti of dependencies. In that kind of system, the idea that “you only need to change it in one place” loses its value, because that one place is connected to so many things that any change becomes risky and complex. So DRY is not a good idea in every situation; it always needs judgment and context.

If you look at other areas, like communication, documentation, and getting information across, “repeat yourself” is often a good idea. Repetition helps convey the message. The recipient doesn’t catch everything the first time. Restating key points underlines what is important and makes it more likely that people will remember it. In writing and teaching, never repeating yourself often makes things harder to understand, not easier.

Coming back to program code, things have changed here as well. With language models and other tools, we end up reading more code than we write. That makes the communication aspect of code more important. Code is not just instructions for machines; it is also communication for humans and for agents that need to understand the code. In that light, a bit of repetition or duplication can be useful if it makes the code easier to read and reason about in isolation. Being explicit in several places can be better than hiding everything behind one shared abstraction that connects unrelated parts of the system.
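Here is a small hypothetical example of deliberate duplication. The two formatters below share most of their layout, and a DRY instinct would merge them into one parameterized helper; kept separate, each one can be read, and changed, in isolation.

```python
# Hypothetical example: two report formatters that share some structure.
# The duplication is deliberate so each function stands on its own.

def format_invoice_line(name: str, amount: float) -> str:
    # Repeats the layout on purpose: readable without looking anywhere else.
    return f"{name:<20} {amount:>10.2f} EUR"

def format_refund_line(name: str, amount: float) -> str:
    # Nearly the same layout, duplicated so refunds can diverge freely
    # (here, a minus sign) without touching invoice formatting.
    return f"{name:<20} {-amount:>10.2f} EUR"

assert format_invoice_line("Consulting", 1200.0).endswith("EUR")
assert "-1200.00" in format_refund_line("Consulting", 1200.0)
```

A single shared `format_line(kind, ...)` helper would be "DRYer", but then every change to refunds would have to be checked against invoices, which is exactly the coupling the text warns about.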

This is why DRY quickly ends up being a principle with limited value on its own. It is one trade-off among many, not a goal in itself. Other aspects are often more important: clarity, maintainability, loose coupling, and how easy it is to understand code when you read it later. Sometimes the right choice is to repeat yourself a little in the code, so that both people and language models can understand what is going on without having to untangle a web of DRY abstractions.

Software is expensive. But the real question is: expensive where?

When people think about the cost of software, they usually picture developers writing code. That’s where the action is: new features, pull requests, tests. But if you look at the total cost of a software system or app over its lifetime, the picture changes. Coding is just one part of a much larger whole – and often not the biggest one.

The total cost is split into different areas. There is the direct build cost: developing the software, writing code, and testing. Then there is deployment or shipping: getting the system onto servers or delivering it to end users as apps. After that come the operational costs: hardware, cloud infrastructure, monitoring, backups, and everything needed to keep it running. Over time, there are also continuous changes: new features, bug fixes, regulatory updates, and adaptations as requirements evolve. Around all of this sits the cost of the organization itself: people, coordination, support functions, management, and the processes to keep everything moving. And the list goes on.

If you look at the total cost over time, the pure programming part is actually quite small. The rest – deployment, operations, change, and organizational overhead – quietly dominates. Language models are very good at automating all or parts of the programming work, and many teams already use them for that: generating code, writing tests, refactoring, or explaining tricky parts of the codebase. That is useful, but it only touches a small piece of the total cost.
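The point can be made concrete with arithmetic, as long as the numbers are understood to be purely illustrative: the cost shares below are invented for this sketch, not industry data.

```python
# Purely illustrative lifetime-cost split (invented numbers, not a study).
lifetime_costs = {
    "initial coding and testing": 15,
    "deployment and shipping": 10,
    "operations and infrastructure": 25,
    "ongoing change and maintenance": 30,
    "organizational overhead": 20,
}

total = sum(lifetime_costs.values())
coding_share = lifetime_costs["initial coding and testing"] / total

# Even in this rough split, coding is a small slice of the whole:
# a tool that only speeds up coding touches at most that slice.
assert total == 100
assert coding_share < 0.25
```

Whatever the real split is for a given system, the structure of the argument is the same: a coding assistant can only address the coding slice, however fast it makes it.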

Deployment and shipping is one area where there is a lot of manual work that could be reduced. Setting up pipelines, handling configuration, managing environments, preparing releases, and communicating changes all take time. A language model can help generate and update deployment scripts, explain existing setups, and create clearer release notes and runbooks based on commits and tickets.

Operations and infrastructure is another big cost center. Running servers and cloud resources, handling incidents, looking at logs and metrics, doing routine maintenance – all of this adds up. Here, a language model can help by turning scattered technical data into understandable summaries, suggesting possible root causes, and drafting or updating operational documentation.

Change over time is often where costs really grow. Every system accumulates history and complexity. People leave, documentation goes out of date, and nobody fully remembers why things were done in a certain way. A language model can help developers understand existing code, answer “where is this implemented?” questions, generate overviews from code and configuration, and support safer refactoring and impact analysis. Making it easier and safer to change a system can be more valuable than simply speeding up initial coding.

Then there is everything around the actual building and running of the system: the organization. Product management, security, legal, finance, HR, customer support, and internal support all contribute to the total cost. A lot of this work is about communication and information: writing and reading documents, reporting status, answering questions, and coordinating between roles. Language models can help by summarizing long threads and reports, drafting documentation and FAQs, assisting support staff with suggested replies, and helping people find relevant information faster.

If you only think of a language model as a “coding assistant”, you automatically limit its impact to a small slice of the cost. A better approach is to first ask: where do we actually spend time and money across the whole lifecycle of our systems? The biggest opportunities are often in repetitive processes, communication-heavy workflows, and areas where knowledge is locked in the heads of a few people.

Programming is just one part of the total cost of a software system. Over its lifetime, deployment, operations, change, and organizational work take up a much larger share. Language models are excellent for helping with code, but they might be even more valuable when used across these other areas that represent a bigger part of the real cost.

Software’s Unique Power of Easy Replication

When organisations talk about software systems, cloud platforms and vendors, the conversation often sounds like we’re dealing with unique, heavy, physical things that are hard to build more of and even harder to move. People say “we’re a [vendor] shop” or “we’re locked in” as if the system were a nuclear power plant or a railway line: a one-off construction that can’t realistically be copied.

But software is not like that. Software has a unique property that most other products do not have: it is easy to create copies. Once a system exists, making another instance is basically a matter of copying bits. You can run the same software in more than one place, and you can move it from one environment to another.

Even the hardware that runs software, while physical, is very different from big physical constructions like nuclear power plants, oil platforms or train lines. Modern hardware and infrastructure are built on well-established standards for servers, storage, networks and so on. This makes it far easier to replicate than large, bespoke physical constructions.

From a technical perspective, this means there is no fundamental reason you must be dependent on someone else’s system. In principle, you could run your own copy, or an equivalent system, if you wanted to. Technically, the system is not unique and immovable.

So where does the dependency come from? In practice it comes from other factors: licenses and ownership, people and competence, and the surrounding organisation. Licenses and contracts can restrict your right to copy, modify or run the software yourself. Intellectual property rules can mean you are only renting access, not owning what you use.

On top of that, there is the human side. Operating complex systems requires personnel, skills and experience. Many organisations do not have the competence to run certain systems themselves, or they do not prioritise building that competence. This makes them more dependent on vendors, not because the software cannot be replicated, but because the people and organisation around it are not prepared to do so.

Organisational structures and processes also play a big role. Workflows are built around specific tools. Risk aversion, “we’ve always used this provider”, and lack of incentives to change all contribute to staying with a given vendor. None of this is about what software can or cannot do technically; it is about how humans and organisations choose to structure things.

Because software and much of the hardware are relatively easy to replicate, it should be entirely possible to achieve real digital sovereignty and ownership. That means owning your own data, having control over your own digital services, and not being completely dependent on a single external provider for critical functions.

Achieving this is mainly about addressing the legal, organisational and competence barriers. Technically, the path is open: systems can be copied, moved and reimplemented. If we stop thinking about software as a unique physical object and start seeing it as something that can be replicated, it becomes much easier to imagine and design for digital sovereignty and true ownership of the digital services we depend on.

Which task to automate

When we think about automation, we often start with the wrong thing: the task that is closest to us. We look at what we do every day and ask: “How can I automate this?” We don’t always ask whether the task actually matters for the bigger goal of the process or the organisation. That means we risk spending time and effort automating work that does not really need to be done in the first place.

Imagine this chain of work. We decide to automate the programming of a program. That program is supposed to automate the sending of an email. That email is meant to remind someone that there is something they need to do. The person who receives the reminder then has a task: they go through a list of possible problems that have occurred in the last month and pick out which ones it is possible to do something about. This list is collected and reported from across the entire organisation.

If we stop here, it looks like the obvious thing to automate is the programming of the program that sends the email. It is close to us, it is visible, and it feels concrete. But that is only one step in a longer chain. Before we decide to automate, we should trace the chain backwards and ask why each step exists.

Why do we need the email reminder? Because otherwise the person might forget to review the list. Why do we need the monthly review of the list? To decide which problems we should act on. Why do we collect a list of possible problems from across the organisation? To have an overview of issues and opportunities for improvement. Why do we need that overview? To improve how the organisation works and support its overall goals.

When we walk the chain backwards like this, we might discover that some links are weak. Maybe the list is reviewed, but almost nothing is followed up. Maybe the same problems appear every month without any action. Maybe the review is something “we have always done”, but it does not actually lead to decisions that change anything important. In that case, automating the programming of the reminder system does not create much value. We are just making it cheaper and faster to do something that might not need to be done at all.

A more useful approach is to start from the goal instead of the task. What are we actually trying to achieve? Better service for customers, fewer incidents, lower cost, less risk? From there, we can ask which decisions support that goal, which information is needed for those decisions, and which tasks are needed to produce that information. Only then does it make sense to ask what should be automated.

Before automating any task, it can help to ask a few simple questions. What concrete goal does this task support? What would happen if we stopped doing it for a month? Is there a simpler way to reach the same outcome? If the task does not clearly connect to a real goal, or nothing would break if we paused it, maybe it should be changed or removed instead of automated.
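The backwards walk through the chain can be sketched as a tiny data structure. The task names and links below are hypothetical, modeled on the reminder-email chain described above: each task points at what it supports, and the goal points at nothing.

```python
# Hypothetical "why" chain, modeled on the reminder-email example:
# each task maps to the thing it supports; the goal maps to None.
chain = {
    "write program": "send reminder email",
    "send reminder email": "person reviews problem list",
    "person reviews problem list": "decide which problems to act on",
    "decide which problems to act on": "improve how the organisation works",
    "improve how the organisation works": None,  # the goal itself
}

def trace_to_goal(task, supports):
    """Walk the 'why' chain from a task up to the goal it serves."""
    path = [task]
    while supports.get(path[-1]) is not None:
        path.append(supports[path[-1]])
    return path

path = trace_to_goal("write program", chain)
# Five links sit between the task we wanted to automate and the actual goal.
assert path[-1] == "improve how the organisation works"
assert len(path) == 5
```

If any link in the path turns out to be weak, say the review never leads to action, everything below it is a candidate for removal rather than automation.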

Sometimes, when you follow the chain all the way back to the organisation’s purpose, you find that a whole series of tasks — collecting lists, reporting them, sending reminders, reviewing them — is not really needed. Then the best “automation” is to avoid building anything at all.

Quality of Knowledge and Information

There are different levels of quality in knowledge and information. Not everything we “know” is equally reliable. Some things are well-checked and stable, other things are based on quick impressions, misunderstandings, or old data. On top of that, different people can understand the same information in different ways.

One simple way to think about quality is to look at the degree of confirmed correctness. In practice, that means asking: how sure are we that this is actually true? Is it something someone just said once, or something that has been checked and confirmed? Everyone has experienced this in communication: what one person meant, what another person said, and what a third person understood are not always the same. Messages can easily be misunderstood or distorted when they are communicated and interpreted.

Because of this, it is important to have an explicit relationship to the quality of the information and knowledge we work with. Instead of treating everything as simply “true” or “false”, we can ask how certain we are, what this certainty is based on, and how likely it is that something has been misunderstood or misinterpreted.

This can be supported with practical tools and habits. For example, using gradings or levels of certainty (“uncertain”, “partly confirmed”, “confirmed”), doing simple checks and controls, and asking clarifying questions about source and context. We can also look at how new or old the information is, and mark recency or freshness (“last updated”, “based on data from…”), because information can lose relevance over time.

It can also help to ask for confirmations when something is important: who has checked this, and how? Has more than one person or source confirmed it? At the same time, we can consider how important the information is: some things can be approximate without causing problems, while other things really need to be correct because they are critical for decisions, safety, or legal reasons.
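The gradings, freshness markers, and confirmations described above can be made explicit in a small structure. The field names and certainty levels here are this sketch's own invention, not any standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Hypothetical structure for making information quality explicit.
# Levels and field names are invented for this sketch.
CERTAINTY_LEVELS = ("uncertain", "partly confirmed", "confirmed")

@dataclass
class Claim:
    text: str
    certainty: str = "uncertain"              # one of CERTAINTY_LEVELS
    last_updated: date | None = None          # freshness marker
    confirmed_by: list = field(default_factory=list)  # who checked it

    def confirm(self, source: str) -> None:
        """Record a confirmation; upgrade certainty as sources accumulate."""
        self.confirmed_by.append(source)
        self.certainty = (
            "confirmed" if len(self.confirmed_by) >= 2 else "partly confirmed"
        )

claim = Claim("The export job runs nightly", last_updated=date(2024, 1, 15))
claim.confirm("ops runbook")
assert claim.certainty == "partly confirmed"
claim.confirm("on-call engineer")
assert claim.certainty == "confirmed"
```

The exact rule for upgrading certainty is a placeholder; what matters is that certainty, freshness, and sources travel with the claim instead of living in someone's head.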

In everyday work, a small shift can make a big difference: instead of only asking “Is this correct?”, also ask “How sure are we that this is correct, and how do we know?”. By being more conscious of the quality of knowledge and information, we reduce misunderstandings, improve decisions, and communicate more clearly.