
How Systems Develop Over Time

Systems develop gradually, step by step, bit by bit. They change through small improvements, small adaptations, and also small deteriorations. A process is adjusted slightly. A tool gets a small fix. A routine is modified to handle an exception “just this once”. None of these feel dramatic on their own, but over time they add up and can make a system very different from how it started.

This gradual evolution doesn’t happen in just one place. It happens in many different systems at the same time. Inside a company, several processes, tools, and ways of working are all changing in parallel. Across a market or industry, multiple products and solutions are also evolving at the same time. Even among similar systems trying to solve the same problem, each one develops slightly differently as people make different choices and apply different small adjustments.

These systems also compete. They compete for users, attention, resources, and trust. Some of the similar systems that evolve in parallel turn out to be much, much better than others. Small differences in decisions and adaptations accumulate, and over time a few variants become clearly superior, while others stagnate or quietly get worse through layers of workarounds and compromises.

From the outside, it does not always look gradual. For a long time, differences between systems may seem small or invisible. Then, suddenly, development no longer looks step by step. It looks like a jump, a revolution, a quantum leap. This apparent leap happens when one system replaces another. The “new” system is usually not new in an absolute sense; it has been developing in parallel with others, but at some point it crosses a threshold where it is so much better that it rapidly displaces the alternatives.

The similar systems that were evolving at the same time did not all evolve in the same way. Some became much better and suddenly replaced the others. From a distance, this looks like sudden change. From up close, it is the result of many small improvements, adaptations, and deteriorations interacting over time, until one system wins the competition and the shift becomes visible.

High-level standard components

Adding login to an app or an API is something developers keep doing over and over. You pick an identity provider, set up configs, implement a login flow, validate tokens, handle errors, add logging, and integrate it all into an existing system. None of this is especially unique per project, but it’s still a lot of work each time.

To get a login flow working, a developer typically needs to choose which identity/auth provider to use, register an app or API with that provider, set up separate dev and prod configuration, and make sure the app and server can read that configuration correctly. This means wiring environment variables, config files, and secrets, and keeping everything consistent between environments.
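As a minimal sketch of that configuration wiring, assuming environment variables prefixed per environment. The variable names (`DEV_AUTH_CLIENT_ID` and so on) are purely illustrative; real names depend on your provider and setup:

```python
import os

def load_auth_config(env: str = None) -> dict:
    """Load auth config for the given environment ('dev' or 'prod').

    Hypothetical variable names: DEV_AUTH_CLIENT_ID, PROD_AUTH_ISSUER, etc.
    """
    env = env or os.environ.get("APP_ENV", "dev")
    prefix = f"{env.upper()}_"
    required = ["AUTH_CLIENT_ID", "AUTH_CLIENT_SECRET", "AUTH_ISSUER"]
    config = {}
    missing = []
    for name in required:
        value = os.environ.get(prefix + name)
        if value is None:
            missing.append(prefix + name)
        else:
            config[name.lower()] = value
    if missing:
        # Fail fast at startup instead of at the first login attempt.
        raise RuntimeError(f"Missing auth configuration: {', '.join(missing)}")
    config["env"] = env
    return config
```

Failing fast on missing variables at startup is exactly the kind of consistency check that is easy to forget when wiring each project by hand.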

Then you have to program the actual login flow. That usually includes a login page or button, redirecting the user to the provider, handling the callback endpoint, dealing with state, and setting up sessions or tokens. On the server side, you need components or endpoints to receive and process the login response and connect it to how your app represents users.

On top of that come all the cross-cutting concerns. You have to code token validation (for example verifying JWT signatures, issuer, audience, expiry and scopes). You have to handle error situations such as invalid tokens, expired sessions, misconfiguration, or provider errors. You need logging and observability so you can see when login fails and why. Finally, you must integrate everything into the existing application or system, protecting endpoints, checking roles or permissions, and mapping identities to whatever domain model you already have.
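The claim checks can be sketched in a few lines. This assumes the token's signature has already been verified by a proper JWT library; the function below only inspects the decoded claims, and the claim names follow common OAuth/OIDC conventions:

```python
import time

def validate_claims(claims: dict, *, issuer: str, audience: str,
                    required_scopes: set = frozenset()) -> list:
    """Return a list of problems; an empty list means the claims pass.

    Signature verification is assumed to have happened already
    (e.g. via a JWT library); this only checks the decoded claims.
    """
    problems = []
    if claims.get("iss") != issuer:
        problems.append("wrong issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("wrong audience")
    if claims.get("exp", 0) <= time.time():
        problems.append("token expired")
    granted = set(str(claims.get("scope", "")).split())
    missing = required_scopes - granted
    if missing:
        problems.append(f"missing scopes: {sorted(missing)}")
    return problems
```

Returning a list of problems rather than raising makes it easy to log exactly why a login failed, which feeds the observability concern mentioned above.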

Most of these tasks are very similar from project to project. Developers keep deciding and implementing the same things repeatedly. The real choice in many cases is only which identity/auth provider to use and which flows to support. The rest is largely the same glue code and setup. There is little reason to solve all of this from scratch every time.

Instead of re-implementing everything, you could use a high-level standard component for login. In such a model, the only thing a developer really needs to do is decide which identity/auth provider to use and provide the minimal configuration for it. The component would handle dev and prod configuration, reading config in the app and on the server, the login flow (pages and server parts), token validation, error handling, logging, and simple hooks for integrating with existing applications.

The goal is to avoid manually setting up and coding all of these pieces in each new project. By treating login as a reusable, high-level standard component, you reduce the work to what is actually necessary: choosing a provider and specifying the few details that are truly specific to the application. Everything else becomes implementation detail that the component takes care of, instead of a chore repeated in every single app.
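As a rough illustration of what such a component's surface could look like. The class and hook names here are invented for the sketch, not an existing library; the point is that the developer supplies a provider and a few hooks, and nothing else:

```python
class LoginComponent:
    """Hypothetical high-level login component.

    The developer chooses a provider and registers integration hooks;
    config loading, flows, validation, and logging would live inside.
    """

    def __init__(self, provider: str, client_id: str, client_secret: str):
        self.provider = provider
        self.client_id = client_id
        self._client_secret = client_secret
        self._hooks = {}

    def on(self, event: str, callback):
        """Register a hook, e.g. 'login_success' to map the provider
        identity onto your own domain user model."""
        self._hooks[event] = callback
        return self

    def _emit(self, event: str, payload):
        # Called internally when the flow reaches an integration point.
        hook = self._hooks.get(event)
        return hook(payload) if hook else payload
```

Usage would then be a couple of lines: construct the component with provider credentials, register a `login_success` hook, and let it handle the rest.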

Will a language model tool say no

When we ask a language model tool to do something, it usually just does it.
“Improve this text.”
“Rewrite this email.”
“Refactor this code.”

But will it ever say: “No, that’s not necessary” or “What you have is already good enough”?

If you ask a model to improve some code or polish a paragraph, it will almost always produce a new version. It does not stop and say: “This is fine as it is.” It does not tell you that the code you want improved is already clean, or that the text you want generated is unnecessary. The tool does not, on its own, take a position on whether the task needs to be done.

Today, these tools are designed to follow instructions. They do the task that is requested: generate, rewrite, expand, refine. They treat the request as a given. The logic is simple: you asked for something, so they try to deliver it. They do not usually question if the code is already good enough, or if the text you are asking for adds any real value.

There are some situations where a model will say no, but those are mostly about safety or policy. It may refuse to answer because something is not allowed. That is different from saying “You don’t need this” or “This is unnecessary.” The refusal is about what it is allowed to do, not about whether your request makes sense or is worth doing.

If you want the tool to act differently, you have to ask for it. You can say: “First, check if this text actually needs improvement. If it is already good enough, just tell me that and don’t rewrite it.” Or: “Review this code. Only suggest changes if there is a clear benefit. If not, say no changes are needed.” In other words, you have to explicitly invite the model to evaluate whether the task is necessary, not just to perform it.
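A small helper can bake this invitation into every request, so you do not have to remember it each time. The exact wording below is just one possible phrasing of the instruction:

```python
def wrap_with_necessity_check(task: str, content: str) -> str:
    """Build a prompt that explicitly allows the model to say no.

    The sentinel phrase 'NO CHANGES NEEDED' is arbitrary; pick anything
    your calling code can detect reliably.
    """
    return (
        "First, judge whether the material below actually needs the "
        "requested work.\n"
        "If it is already good enough, reply exactly: NO CHANGES NEEDED, "
        "followed by one sentence explaining why.\n"
        "Only do the task if there is a clear benefit.\n\n"
        f"Task: {task}\n\n"
        f"Material:\n{content}"
    )
```

The calling code can then check whether the response starts with the sentinel phrase and skip the rewrite entirely.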

By default, a language model tool does not decide whether your request is needed. It runs the task you give it. If you want a tool that sometimes says “No, this is already good enough,” that behavior has to be part of your prompt or the way the tool is set up—not something it will do on its own.

The importance of access control on information

Access control on information is becoming much more important now that we are using all these new language-model–based tools. This is especially true when the results are not just for your own use, but are going to other people. Imagine using a copilot to summarise what the board of a housing association has done in 2025, to inform all the co‑owners. The copilot has access to everything stored in the association’s documentation system, plus email, Vibbo messages, Vibbo posts, and other sources. You ask it for a summary, it gives you a nice text, you skim it, think “good enough,” and send it out. It’s just the housing association, how bad can it be?

Then you discover that the summary includes a paragraph saying that the association has been plagued by a lot of noise from a named co‑owner, and that measures for forced sale of the apartment have been initiated. In reality, nothing like that has been formally decided or made official. Maybe there are complaints, maybe someone has mentioned forced sale as a possibility in an internal email, but the tool has mixed together drafts, discussions, and documents and presented it as if it were a fact ready to be communicated to everyone.

This kind of obvious mistake might be caught if someone reads carefully. But it becomes much harder when it is not so clear what information can be given to whom, or when it is unclear what is considered official and what is not. A language model does not understand the difference between internal discussion and official communication. It just sees text and tries to answer the question you asked.

Some organisations try to handle this by forcing all information to be classified. For example, every document must be marked as open, internal, or restricted. In theory this should help, but in practice it is very easy to mix things up. People mislabel or forget to label. Different teams use the labels in different ways. Over time the classification becomes something you click past, not something that is actually trusted. For tool developers and system administrators, it can be even harder, because they often have technical access to everything and must somehow make sure the tools respect labels that are not consistently used.

What really matters is control over which context information belongs to, how it is used, and when it is allowed to be used. Information that belongs in a board context should not automatically be available in a public context. Internal complaints and draft legal assessments should not suddenly show up in a summary meant for all residents. The same piece of information can be appropriate in one context and completely wrong in another.

To handle this, we need more than simple labels. We need to think in terms of contexts and audiences. Is this for the board only, for internal staff, for all residents, or for the general public? Is this a draft or an approved decision? Tools should not have free access to everything just because the data is technically stored in the same system. There should be technical “watertight compartments” between different types of information and different uses. When a user asks for a text to be sent to all residents, the tool should only be allowed to use information that is safe for that audience, and exclude sources that belong to internal discussions.
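One way to sketch this technically: tag each source with the context it belongs to and an official/draft flag, and filter before anything reaches the model. The context names and the policy table below are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class Source:
    text: str
    context: str    # e.g. "board", "residents", "public"
    official: bool  # approved decision vs draft/discussion

# Which source contexts each audience may see; an illustrative policy.
ALLOWED = {
    "public":    {"public"},
    "residents": {"public", "residents"},
    "board":     {"public", "residents", "board"},
}

def sources_for_audience(sources, audience: str, official_only: bool = True):
    """Return only the sources safe for the given audience.

    By default drafts and internal discussions are excluded too,
    so a residents' summary can never quote an internal email.
    """
    allowed = ALLOWED[audience]
    return [s for s in sources
            if s.context in allowed and (s.official or not official_only)]
```

The key design point is that the filter runs before the tool sees anything: the model cannot leak a board draft it was never given.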

This is not just a technical problem. It is also about culture and routines. People need to understand that these tools are powerful and can mix information from many places, and that they will not automatically know what is sensitive or unofficial. Generated texts should be treated as drafts that must be read with the same care as anything else you send out. If we combine clear thinking about context and audience with technical separation of data, we can use new tools without accidentally leaking, inventing, or exposing information that should never have left its original context.

Maintaining Flow of Work

Most of us like to finish things completely before we stop working. We close all the loops, tick off the task, and then shut everything down. It feels good in the moment, but it often makes it harder to start again next time.

When you come back to the work, you sit down and have to ask yourself:
Where do I start? What was I thinking last time? What is the next step?
That small resistance can easily lead to delay and distraction.

A simple alternative is this: don’t end a work session by finishing everything. End it by beginning a little bit of the next task or step.

That means you:

  • Stop while you still know exactly what to do next
  • Don’t complete everything in each round
  • Intentionally leave a small “loose thread” you can pick up later

This loose thread can be very small:

  • The first sentence of the next section
  • A few bullet points about what you want to do next
  • A short comment to your future self: “Next: explain X with an example”
  • A function signature in code, or a clear TODO
  • A draft of the next email with just the greeting and one line written

The point is to avoid ending on a full stop. End on a comma. When you return, you don’t have to think hard about how to begin. You just continue what you already started.

There is another benefit: by starting the next step, you give yourself something to mentally and unconsciously work on between sessions. When the next step is clear and written down, your mind can keep turning it over in the background. Ideas and solutions often show up later, when you are not actively working – in the shower, on a walk, or while doing something else.

To use this in daily work, you can make a tiny end-of-session habit:

  • Write down the next concrete step
  • Do a very small piece of it (one sentence, one bullet, one comment)
  • Leave it visible so it is the first thing you see when you come back

This applies to writing, coding, analysis, planning, and creative work. The details differ, but the principle is the same: always leave something small and clear to continue.

The only things to watch out for are:

  • Don’t leave things in a chaotic state with no clear next step
  • Make the next step specific, not vague
  • Don’t create too many loose threads on too many projects at once

The core idea is simple:
End each work session by beginning the next step, and leave yourself a small thread to pull on. You give your future self an easier start, and your mind something to quietly work on in the meantime.

Trust but verify

Trust is good. Control is better.

Most of us know why. A report that looks fine but contains one critical error. A deployment that “should work” but breaks in production. A language model that gives a confident answer that turns out to be wrong. We want to trust our colleagues, our systems, and our tools. But everyone can make mistakes, both humans and machines. To have real trust in ourselves and others, we need to check and double‑check.

Controls can be difficult to carry out everywhere. They cost time and resources, often without any immediate visible benefit. People experience them as extra work and bureaucracy. Under time pressure, checks can be rushed or skipped. At the same time, we know that the lack of control can be even more costly: errors in production, wrong decisions, compliance issues, and loss of trust.

The goal is not to remove control, but to make it more effective. That means focusing checks where the risk is highest, keeping them as lean as possible, and building them into normal workflows. Instead of many manual reviews and repeated approvals, we need efficient, automatic ways to perform verification.

Automation can handle a large part of routine control. Systems can validate inputs and outputs, apply standard rules, and monitor for unusual patterns. Data quality can be checked continuously. For language models and agents, we can structure requests, ask for reasoning, and automatically validate formats and basic facts, while still using human review where the risk is higher.
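A sketch of what cheap automatic checks on a model response might look like, assuming the response is expected to be a JSON object with certain required fields. Anything that fails gets routed to human review:

```python
import json

def check_model_output(raw: str, required_fields: set) -> list:
    """Cheap automatic checks on a model response that should be JSON.

    Returns a list of problems; an empty list means the output passed
    the routine checks. Non-empty means: send to human review.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    problems = []
    missing = required_fields - data.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    return problems
```

Checks like this cost nothing per request, catch the most common failure modes automatically, and reserve the expensive human review for outputs that actually look wrong.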

When control is integrated and to a large extent automated, people don’t experience it as a separate layer. It becomes part of how work happens. That way, “trust, but verify” is not about distrust, but about accepting that mistakes are normal and using smart verification to catch them early.

Trust is good. Control is better. Real trust is built on checking, not on hoping nothing goes wrong.

When language models remember too much and context gets sticky

Many language model tools have some form of memory or long‑term context. They store previous conversations and information you’ve worked with, so they can pick up the thread where you left off. This makes the agent feel like a conversation partner that remembers what you’ve talked about earlier. That can be very useful in many situations.

But it quickly gets messy. The context that has been built up for one purpose can be completely useless—or even harmful—in another. What was helpful memory in one situation becomes annoyingly sticky in another. When you switch tasks or projects, the agent still drags in old assumptions, documents, and details that don’t belong in the new setting.

Often you don’t have a good way to get rid of this. You can’t easily clear only the parts of the memory that are in the way. In many tools your choice is either to keep everything or delete everything permanently. There is no real possibility for proper context switching—not just for the language model, but also for everything around it. Projects and other groupings help to some extent, but even there, information is easily mixed together and ends up in the wrong place.

In practice this leads to common problems. You move from one project to another, but the agent still assumes the first project is relevant. You change role from casual brainstorming to formal documentation, but the tone and assumptions from the earlier work bleed into the new task. Old information keeps showing up, even after it has become outdated. The result is that the tool feels less predictable and harder to trust, because you never quite know what context it is actually using.

What we really need is a more deliberate way of handling context. Instead of one sticky memory that grows over time, you should be able to define what context applies right now: the current goal, which information and documents are relevant, and which constraints (tone, audience, domain, privacy) should apply. You should be able to start a new task with a clean slate, attach only the information you want, pause one context and resume it later, and throw away or archive a context when you’re done with it.
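As a sketch of the idea (not any particular product's implementation), a context can be modelled as an explicit object that you create, switch between, and archive, with only the active one visible at any time:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """A context as a first-class object with its own goal, documents,
    constraints, and history."""
    name: str
    goal: str = ""
    documents: list = field(default_factory=list)
    constraints: dict = field(default_factory=dict)  # tone, audience, etc.
    history: list = field(default_factory=list)
    archived: bool = False

class ContextManager:
    def __init__(self):
        self._contexts = {}
        self.active = None  # the agent only ever sees self.active

    def create(self, name: str, **kwargs) -> Context:
        ctx = Context(name=name, **kwargs)
        self._contexts[name] = ctx
        return ctx

    def switch(self, name: str) -> Context:
        # Switching replaces the visible context wholesale; nothing
        # from the previous context leaks into the new one.
        self.active = self._contexts[name]
        return self.active

    def archive(self, name: str):
        self._contexts[name].archived = True
        if self.active is self._contexts[name]:
            self.active = None
```

The essential property is that visibility is a consequence of an explicit switch, not of whatever happens to have accumulated in a shared memory.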

There are ways to solve this. The “context” concept in kontext.cloud is one example. Here, a context is a first‑class object: something you explicitly create, switch, and manage. Each context has its own documents, instructions, and conversation history. When you switch context, the agent only sees what belongs to the active context. Other projects, clients, or personal notes are invisible by default.

This reduces unwanted stickiness and cross‑contamination of information. You avoid mixing data between clients and projects. You get better separation between work and personal material. And you get a clearer understanding of what the agent “knows” in each situation, instead of depending on a fuzzy, hidden memory that you can’t really control.

Memory in language model tools can be very useful, but without proper context switching it turns sticky and annoying. Treating context as something explicit—rather than a side effect of past chats—gives you more control, less confusion, and a tool that behaves more like a reliable partner than a forgetful one.

From Demo Code to Production Quality

With modern language model tools for coding, it’s very quick to get something up and running. You describe what you want, generate some code, add a few features in a partially working state, and before long you have a decent demo. The workflow is straightforward and follows the normal use of the app: you focus on the main user journey, tweak things here and there until it works “well enough,” and you stay mostly on the happy path. For a demo, that’s perfectly fine.

The situation changes completely when you need software that is good enough for production. Especially production at larger scale, with many users or critical functionality. This includes systems that handle important data, affect life and health, control physical devices, or involve a lot of money. In those cases, it’s not enough to stay on the happy path in your development flow. You have to go into the details, consider variants and edge cases, and build real robustness. You need to ask: do the transactions hold under failure and concurrency? Does the performance hold up under real load? Is the security actually good enough against realistic threats? “What if this happens?” becomes the question you ask everywhere.

Language model tools can be excellent for this kind of work as well. You can use them systematically to go through all components, methods, functions, and calculations. You can look for obvious errors and missing checks, find logical weaknesses, and identify improvements at the component level. You can also use them to think about how components work together: where data can get out of sync, how errors propagate through the system, and where you need stronger guarantees.

The important point is that the tool does not decide how you work or what you focus on. The developer does. You set the quality bar. You choose whether to stop at a demo that works on the happy path, or to go further and build something that can handle real-world conditions. Being satisfied with a happy-path demo version is completely fine, as long as you treat it as a demo. What you should not do is present that demo as finished, production-quality software—especially not when there are many users, critical functions, or real consequences involved.

System instructions versus other instructions for a language model

Many APIs and tools for language models expose a special place for “system instructions” or something similar. It is usually separated from normal messages, labeled clearly, and highlighted in documentation.

This easily gives the impression that system instructions are a higher level of authority: that they always take precedence, that they define what the model must follow, and that other instructions are secondary. Users start to think there is some strong technical guarantee behind that separation.

In practice, there might not be such a big difference. In some cases, there might be no real difference at all.

From the model’s point of view, everything ends up as text in a single prompt. System instructions, user instructions, previous messages: it is all just context the model reads and then uses to guess what it should do next. The model is not applying a fixed rule engine where “system beats user” in all cases. It is pattern matching on the combined text.

That also means it does not really matter where the instructions are given to the model. They can be sent in a separate “system” field, or they can be written into the first user message. Either way, they become part of the same input. The separation exists mostly for humans and tools, not as a hard technical boundary inside the model.
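A toy illustration of this flattening. Real APIs differ in formatting details, but conceptually the separately labelled fields end up as one stream of text that the model reads top to bottom:

```python
def flatten_messages(messages: list) -> str:
    """Join role-tagged messages into the single text stream the model
    effectively reads. Each message is a dict with 'role' and 'content'.

    The bracketed role markers are illustrative; actual prompt formats
    vary by provider.
    """
    return "\n\n".join(f"[{m['role']}]\n{m['content']}" for m in messages)
```

Seen this way, it is clear why a "system" instruction and a later contradicting "user" instruction are just two passages in the same document, with no rule engine enforcing priority between them.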

When instructions conflict, this becomes very visible. You might have a system instruction saying “Always answer in Norwegian,” and later a user message saying “Answer in English instead.” Or a framework might inject one set of system rules, while the application or end user adds different or even opposite instructions in normal messages.

In those situations, the model does not strictly enforce a priority based on where each instruction came from. It tries to guess what it is supposed to do overall. Often, it will lean toward what seems most likely or most recent in the conversation. Sometimes it follows the system instruction, sometimes the user, sometimes it ends up in between. The result is that you cannot rely on the location alone to resolve conflicts.

So what can you actually do with system instructions?

They are still useful as a way to define shared baseline behavior: things like role, tone, and general constraints you want across many conversations. They are also a convenient place for tools and frameworks to inject their own configuration. But you should not assume that “because it is in the system field, the model will always obey it.”

The practical takeaway is: think of all instructions together, not as layers with strict power. Avoid contradictory instructions if you want predictable behavior. If something really matters, make it clear, simple, and repeat it where necessary, instead of trusting that a single system instruction will always win.

Hands-on Automation

Many automation efforts fail because nobody really understands the task in enough detail. People jump straight to tools and agents and try to automate based on assumptions. A hands-on, “manual first” approach starts from how the work is actually done in real life and gives you both learning and documentation before you automate anything.

Start by doing the task that should be automated manually several times. Click through the systems, fill out the forms, send the messages, move the files. If you cannot do it yourself, do it together with someone who has the domain expertise. Ask them to talk through what they are doing and why at each step. This is how you uncover the small rules and decisions that are usually not written down anywhere.

While you perform the task, write down the details of what is done, step by step. For each step, note what you look at, what you decide, and what the result is. Capture which tools or systems are used and who is involved. Do not worry about perfect structure at this point; focus on capturing what actually happens.

Then look at which variations of the task can occur. Real processes are rarely linear. Identify different paths: what happens if something is missing, delayed, in the wrong format, or unusual? List these variations and try each one manually at least once. This shows you how the process changes and which conditions and branches exist.

The result of this work is both learning and documentation. You get a clearer understanding of what needs to be done and a written description of what is done and how it is done. You can turn your notes into a simple description, checklist, or flow with the key steps, decision points, rules, and exceptions. If you worked with a domain expert, review this together and adjust it until it matches reality.

The next step is to identify what can be automated. Go through the documented process and separate repetitive, rule-based steps from judgment-heavy or unclear steps. Some parts are good candidates for full automation, some are better suited for human-in-the-loop, and some are not worth automating at all. How the automation should be solved must be evaluated based on the concrete task: sometimes a simple script or integration is enough, other times you might use a workflow tool or a language model or agent to assist with unstructured parts like text or documents.
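The triage of documented steps can be sketched as a simple partition. The step kinds and the routing table here are illustrative; real processes need finer distinctions:

```python
def split_for_automation(steps):
    """Partition documented process steps by how they should be handled.

    Each step is a (name, kind) pair, where kind is one of:
    'rule_based' (automate fully), 'judgment' (human-in-the-loop),
    or 'rare' (not worth automating).
    """
    routing = {
        "rule_based": "automate",
        "judgment": "human_in_loop",
        "rare": "keep_manual",
    }
    buckets = {"automate": [], "human_in_loop": [], "keep_manual": []}
    for name, kind in steps:
        buckets[routing[kind]].append(name)
    return buckets
```

Even a crude partition like this forces the conversation the text describes: which steps are genuinely rule-based, and which only look that way until the variations appear.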

By starting with manual execution, detailed notes, and explicit variations, you avoid automating based on guesswork. You build automation on a solid understanding of the real process, which makes the result more robust, useful, and easier to improve over time.