Knowledge can be represented through language: text, speech, articles, blogs, stories, fairy tales, documentation. This is how we usually write things down, explain details, and share information. Language is good at being precise, describing steps, and capturing logic.
Knowledge can also be expressed visually: images, figures, diagrams. Visuals help us see structure, relationships, and patterns quickly. They give us overviews that are hard to get from text alone. As the saying goes, a picture is worth a thousand words.
Visual knowledge is not just a visual representation of what is already written. It includes knowledge that cannot be expressed effectively or efficiently in writing. Things like complex interactions, flows, and spatial layouts are often easier to understand in a diagram than in paragraphs of text.
To scale knowledge, these two forms can work together. A language model can be paired with a visual model. The language model handles text: describing, explaining, and structuring knowledge in words. The visual model handles diagrams and images: turning descriptions into visual structures, and turning visual input into something that can be interpreted and explained.
Together, they can move back and forth between text and visuals. Text can be turned into diagrams, and diagrams can be turned into clear explanations. This pairing makes it easier to express, share, and understand knowledge in the form that fits best: sometimes as words, sometimes as pictures, and often as both.
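The round trip between text and visuals can be illustrated with a toy sketch. It is not a language model or a visual model: plain deterministic functions stand in for both, and the names `Edge`, `to_dot`, `from_dot`, and `to_sentences` are hypothetical, introduced only for this illustration. The one real convention used is Graphviz DOT, a text format that diagram tools can render as a picture. The sketch assumes that each piece of knowledge is a simple "source, relation, target" statement.

```python
import re
from typing import NamedTuple


class Edge(NamedTuple):
    """One statement of knowledge: source --label--> target."""
    source: str
    label: str
    target: str


def to_dot(edges: list[Edge], name: str = "knowledge") -> str:
    """Text -> visual: emit Graphviz DOT that a renderer can draw."""
    lines = [f"digraph {name} {{"]
    for e in edges:
        lines.append(f'  "{e.source}" -> "{e.target}" [label="{e.label}"];')
    lines.append("}")
    return "\n".join(lines)


# Matches only the exact edge lines written by to_dot above.
# Assumption: names and labels contain no double quotes.
_EDGE_RE = re.compile(r'"([^"]+)" -> "([^"]+)" \[label="([^"]+)"\];')


def from_dot(dot: str) -> list[Edge]:
    """Visual -> structure: recover the edges from the DOT text."""
    return [Edge(src, lbl, tgt) for src, tgt, lbl in _EDGE_RE.findall(dot)]


def to_sentences(edges: list[Edge]) -> list[str]:
    """Structure -> text: explain each edge as a plain sentence."""
    return [f"{e.source} {e.label} {e.target}." for e in edges]


if __name__ == "__main__":
    facts = [
        Edge("The language model", "describes", "the diagram"),
        Edge("The visual model", "draws", "the description"),
    ]
    dot = to_dot(facts)                 # the "picture", as DOT source
    print(dot)
    print(to_sentences(from_dot(dot)))  # back to words
```

In a real pairing, the two directions would be handled by learned models rather than string templates. The sketch only shows the shape of the exchange: a shared structure that can be rendered either as a diagram or as sentences.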