Three Worlds for Little Guys

OpenClaw, Gas Town, and Cantrip through the lens of ontological hardness


Introduction

In my post Hard Worlds for Little Guys I developed a vocabulary for diagnosing agent environments.

The diagnostic axis is a single distinction: Advice versus Physics. A speed limit sign addresses the driver. A speed bump addresses the road. One can be ignored; the other cannot.

In agent systems, advice takes the form of system prompts, skill.md files, and other natural-language instructions. Physics takes the form of sandboxes, rate limits, and anything else that mechanically prevents an action regardless of what the model attempts. Asking which of the two a given constraint is tells us how hard a world is. Currently agent harnesses mostly address the driver; the interesting thing to me is the road.

This post takes my thinking from Hard Worlds and applies it to three agent frameworks. Security posture, scalability, and developer ergonomics are all legitimate questions, but they are all outside-in. Asking “what is the world like from the inside” reveals structural features that this conventional framing tends to miss. I am also aware that many people who read this blog have never used any of these tools, so hopefully this tour will give you an idea of what each of them does and what they are like to use.


OpenClaw: The Soft Room with Hard Walls

OpenClaw is an open-source, self-hosted personal AI assistant. You run it on your own computer, where it sits between a language model and the rest of your digital life, acting as a kind of smart switchboard; messages arrive, the model decides what to do, and OpenClaw routes the action to the right tool or service.

The way OpenClaw tells an agent what it can do is through skills: individual markdown files that describe a capability. Each skill contains instructions, tool declarations, constraints, and completion criteria, mostly written in prose.

When the model first wakes up inside OpenClaw, it scans all available skills, but only loads their metadata into the context. At that point the model becomes an agent. It reads roughly the first 75 words of each skill, about a hundred tokens; enough to learn what actions are possible without reading every instruction in full. It’s like skimming a dictionary: you discover that certain words exist and what they are for. The full instructions remain unread until the agent decides a particular skill is relevant to its current task, at which point it opens the full file. OpenClaw calls this progressive disclosure.
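A minimal sketch of progressive disclosure, under stated assumptions: the `Skill` class, `build_index`, and `open_skill` are hypothetical names for illustration, not OpenClaw's API, and the 75-word cutoff simply mirrors the figure given above.

```python
from dataclasses import dataclass

PREVIEW_WORDS = 75  # assumption: mirrors the "first 75 words" described above


@dataclass
class Skill:
    """Hypothetical stand-in for one skill markdown file."""
    name: str
    body: str  # full text of the skill file

    def preview(self) -> str:
        """Return only the opening words, the part loaded up front."""
        return " ".join(self.body.split()[:PREVIEW_WORDS])


def build_index(skills: list[Skill]) -> dict[str, str]:
    """What the agent sees on waking: one short preview per skill."""
    return {s.name: s.preview() for s in skills}


def open_skill(skills: list[Skill], name: str) -> str:
    """The full text is read only once the agent picks the skill."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)


skills = [
    Skill("email", "Send and triage email. " + "detail " * 200),
    Skill("calendar", "Create and move calendar events. " + "detail " * 200),
]
index = build_index(skills)
```

The index costs a fixed number of words per skill no matter how long the files are. That is what keeps the dictionary cheap to hold in context.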

These skills are written in prose; natural-language guidance about what to do and not to do. A line like “do not delete my inbox under any circumstances” sits alongside “prefer safe, reviewable changes over shortcuts.” One is a rule the world ought to enforce, the other is a suggestion about character. But to the agent, reading both in the same register, the distinction is invisible. A skill file is less like code and more like advice.

A hardness vocabulary makes some parts of OpenClaw easier to understand. Its walls are hard. The agent runs inside a Docker sandbox; a sealed-off software container that gives the world it inhabits a kind of physics. If the agent attempts to read a protected file, Docker’s filesystem will not allow it. The switchboard enforces explicit allow/deny policies on certain actions, called tool calls. Any call not previously whitelisted by the human user is blocked before it can do anything on the real computer. And if the agent creates smaller helper versions of itself to work on parts of a problem, they remain inside the same boundaries. These are speed bumps built into the road.
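To make the allow/deny idea concrete, here is a small sketch of a policy check that runs before any tool call is executed. The names (`ToolPolicy`, `dispatch`, `ToolCallBlocked`) and the tool identifiers are invented for illustration; OpenClaw's real policy format may look quite different.

```python
from dataclasses import dataclass, field


class ToolCallBlocked(Exception):
    """Raised by the harness, not the model: the call never runs."""


@dataclass
class ToolPolicy:
    """Hypothetical allow/deny policy set by the human user."""
    allowed: set[str] = field(default_factory=set)
    denied: set[str] = field(default_factory=set)

    def check(self, tool: str) -> None:
        # Deny wins over allow; anything not explicitly allowed is blocked.
        if tool in self.denied or tool not in self.allowed:
            raise ToolCallBlocked(tool)


def dispatch(policy: ToolPolicy, tool: str, run):
    """Check the policy first, then run the side-effecting callable."""
    policy.check(tool)  # physics: enforced before any side effect
    return run()


policy = ToolPolicy(
    allowed={"calendar.read", "email.send"},
    denied={"email.delete"},
)
```

The check lives in `dispatch`, outside anything the model reads or writes. The model can ask for `email.delete` as persuasively as it likes; the callable is never invoked.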

The walls might be hard, but the room is soft. The skill format does not distinguish between optional guidance and invariant constraints. An agent cannot promote a sentence into a wall by force of interpretation alone. Only the surrounding system can make a boundary real.

From the agent’s point of view, this is a difficult way to inhabit a world. It is given a task by the user, and then it has to read the skill it might need, infer what it means, and guess at which parts are firm constraints and which are merely advice. These two things should really be split apart. First, the skill should be a description of the workflow and capabilities available to the agent. Second, the constraints involved should be applied by updating the physics of the world itself. What the harness should enforce is everything that must hold whether the actor remembers it or not. OpenClaw does not yet do this. Its skills arrive as a single undifferentiated document, and the agent is left to sort advice from law on its own.

OpenClaw is designed as an always-on assistant, so “chat history” keeps growing for as long as it is left running. To stop the context window from filling up, a Context Compactor periodically condenses older parts of the conversation into a dense summary and replaces the original records. This keeps the agent usable over long sessions, but it rewrites the past. If the summary is too aggressive, the agent’s memory becomes whatever the summary says happened, not what actually happened. And because OpenClaw acts on live external systems (sending real emails, modifying real calendars), the effects cannot be undone simply by restoring the record. The agent’s trajectory through time becomes lossy, while the world it acts on is irreversible.

OpenClaw currently builds the speed bump and posts the speed limit sign, but places them in different layers of the software. The gap between the soft room and the hard wall is where failures accumulate. An agent that misreads the instructions may attempt something the container will eventually stop, but not before several turns of confusion have unfolded.


Gas Town: The Hardest World, the Smallest Room

If you are aware of Gas Town at all, it is probably due to the social media cycle in January after Steve Yegge’s essay “Welcome to Gas Town” left a great many people asking whether its creator had succumbed to AI psychosis. I read it and understood exactly what he was talking about, so who knows what that means for me. Here I am, writing about agents as little guys inside of worlds.

Gas Town is a multi-agent orchestration system for coordinating large swarms of AI coding agents working in parallel on the same codebase. Where OpenClaw gives a single agent broad access to your digital life, Gas Town throws dozens of agents at a software project at once.

The system has a hierarchy. A coordinating agent called the Mayor dispatches work. Polecats do the coding; ephemeral workers spun up in their own rooms for a specific task and discarded when done. Additional monitoring agents watch out for other agents who get stuck.

Consider what a Polecat wakes up into. It is a very small room somewhere in Gas Town, and in it are three things: a copy of the codebase, a terminal through which it can act, and a Bead, a compact record of what needs doing and where the task stands. For the Polecat, the entire world is: this task, this workspace, and this terminal. Nothing else.

The Polecat’s vocabulary of actions is fixed. It cannot make new tools for itself or extend its own capabilities as it goes. If a problem demands a new approach, the Mayor or the human has to change the world from above and put a new Polecat inside it. This prevents the dictionary from expanding at the point of use, but it also means the intelligence about what to do lives outside the model. The agents are deliberately small. The world decides.

The world of Gas Town embodies a principle it calls the GUPP: if there is work on your hook, you must bite and run with it. This is one of the main principles of its physics. Language models are conversational by training; they tend to pause, confirm, ask permission. Gas Town overrides that tendency mechanically. It sends the next instruction straight back into the agent’s working session and forces the loop to continue. The road pushes the car forward.

When multiple agents are changing the code at the same time, a merge-management system called the Refinery, with another little guy inside it, handles the queue. Its job is to combine those changes back into the main project in an orderly way. If two changes clash, the system doesn’t ask the agent who made the change to improvise; it just aborts the change.

In Hard Worlds I argued that hardness should not be confused with rigidity. Gas Town is an interesting test of that claim, because it is both the hardest and the most rigid system examined here. Its hardness comes from structural enforcement; its constraints are mechanical, rather than advisory. But its rigidity runs deeper. The agent cannot shape its world at all. For a swarm of ephemeral coding workers, this is a legitimate design choice. The Polecat doesn’t need to be a generalist. All it needs to do is write code, commit it, and then get out of the way. But that also means Gas Town cannot easily generalise to tasks that require improvisation, tool-making or exploratory behaviour.

Where OpenClaw tries to preserve the continuity of the agent through time by updating its memory, Gas Town does it by preserving the trajectory of the work, in the form of Beads. Since its agents are disposable, all it is really preserving is the unfolding state of the world. When an agent crashes or times out, its replacement little guy doesn’t try to figure out what happened; it wakes up, looks at the current state of the task, sees what needs doing, and continues. This is a much harder approach than OpenClaw’s, but it only works because the agent has so little freedom.
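The idea of keeping continuity in the work rather than in the worker can be sketched in a few lines. This is an illustration only: the `Bead` fields and `run_worker` are hypothetical, and Gas Town's real Beads carry far more than a list of steps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bead:
    """Hypothetical task record: the steps, and which ones are done."""
    steps: list[str]
    done: list[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        for step in self.steps:
            if step not in self.done:
                return step
        return None


def run_worker(bead: Bead, crash_after: Optional[int] = None) -> None:
    """A disposable worker: read the bead, do work, record progress.

    `crash_after` simulates the worker dying after that many steps.
    """
    completed = 0
    while (step := bead.next_step()) is not None:
        if crash_after is not None and completed == crash_after:
            raise RuntimeError("worker died")
        bead.done.append(step)  # progress lives in the world, not the worker
        completed += 1
```

A replacement worker needs no account of what its predecessor was thinking. Calling `run_worker` again on the same bead picks up at the first unfinished step.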

Reversibility follows naturally from the medium. A codebase is a kind of world that can often be cleanly rewound. Branches can be dropped, failed changes abandoned, and the whole project restored to an earlier version. In Gas Town, undo is a core part of the world’s physics. This is very different from OpenClaw, where undo usually means taking another action in the hope of repairing the first.

OpenClaw and Gas Town are of course solving different problems: one is a personal assistant, and the other a software engineering team. But they share the same basic move: both put LLMs in a harness and turn them into Agents.

One tension worth noting: Gas Town is expensive. Coordinating dozens of concurrent agents burns through API credits quickly, and the system requires you to spread work across multiple accounts with multiple model providers to stay within rate limits. As with so much else in our economy, money is one of the forces that gives the world its shape.


Cantrip: The Formal World

Cantrip is a different kind of world from the other two, and it is also very new. What makes it worth examining alongside OpenClaw and Gas Town is that Cantrip is the only framework in this post that treats the distinction between Physics and Advice as a named, first-class architectural concept.

In Cantrip, the little guy wakes up in a room called a Circle; the bounded space in which it can perceive, think, and act. The room contains a Medium, which is the material it works through, such as code or conversation. The room may also contain things called Gates, which are controlled openings to the outside world. And it is shaped by Wards; the hard limits that define what the agent cannot do, how long it can act for, and how far its reach extends.

What makes this world feel formal is that Cantrip keeps character and constraint separate. Identity shapes how the little guy approaches a problem: its style, priorities, and general way of behaving. All the hard limits live in the Wards. If one of those limits is reached, the world itself stops the run. Cantrip calls that Truncation. It treats this as different from Termination, which is when the entity decides for itself that the task is complete. In other words, being stopped by the world is not the same as choosing to stop, and Cantrip preserves that distinction.
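The Truncation/Termination distinction is easy to lose in an ordinary agent loop, so a sketch may help. Everything here is assumed for illustration: the `Outcome` enum, the `run` function, and a single turn-limit ward are not Cantrip's actual interface.

```python
from enum import Enum


class Outcome(Enum):
    TERMINATION = "the entity decided the task was complete"
    TRUNCATION = "a ward stopped the run"


def run(entity, max_turns: int) -> tuple[Outcome, int]:
    """Drive a loop under a turn-limit ward.

    `entity(turn)` is a stand-in for the agent; it returns True when it
    chooses to stop. `max_turns` is the ward, enforced by the loop itself.
    """
    for turn in range(1, max_turns + 1):
        if entity(turn):
            return Outcome.TERMINATION, turn
    # The entity never chose to stop; the world stopped it.
    return Outcome.TRUNCATION, max_turns
```

Returning the outcome as a separate value means that whatever reads the result can tell a finished task from one that simply ran out of road.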

From the little guy’s point of view, the size of the world is settled in advance. Cantrip expresses this formally as Medium + Gates - Wards. The action space is: what the room contains, plus what its doors allow, minus what the laws of the world forbid. In other words, the Dictionary is made explicit. Only when it tries to affect something beyond the Circle does it have to pass through a Gate. The little guy never touches the outside world directly.
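Read literally, Medium + Gates - Wards is set arithmetic, and can be sketched as such. This is a simplification: it treats Wards purely as a set of forbidden verbs, whereas the Wards described above also cover time and reach. The verb names are invented.

```python
def action_space(
    medium: set[str], gates: set[str], wards: set[str]
) -> frozenset[str]:
    """What the room contains, plus what its doors allow, minus what is forbidden."""
    return frozenset((medium | gates) - wards)


medium = {"read_file", "write_file", "run_code"}  # verbs of the material
gates = {"http_get", "send_email"}                # openings to the outside
wards = {"send_email", "write_file"}              # verbs the world forbids

actions = action_space(medium, gates, wards)
```

Because the result is computed before the run starts and frozen, the dictionary is fixed up front rather than discovered by trial and error.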

Cantrip also has a careful way of letting the world grow. If the little guy meets a problem beyond its immediate scope, it does not simply add new powers to itself. Instead, it can create a child entity in a new, separate room to handle the sub-task. This is Cantrip’s answer to adding new words to the dictionary: new capability appears, but in a governed and encapsulated form. The new verb lives there, inside that temporary little world, rather than permanently expanding the parent. And because child rooms inherit their limits from the parent, delegation tightens the limits rather than relaxing them. In that sense, Cantrip lets the dictionary grow without letting it get too big.
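One way to get "delegation tightens" by construction is to derive a child's limits from the parent's, so that a child can only ever end up with the same or less. The `Wards` class below is a hypothetical sketch of that rule, not Cantrip's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wards:
    """Hypothetical limits for one room: a turn budget and forbidden verbs."""
    max_turns: int
    forbidden: frozenset[str]

    def child(
        self, max_turns: int, forbidden: frozenset[str] = frozenset()
    ) -> "Wards":
        """A child may ask for anything; it gets at most what the parent has."""
        return Wards(
            max_turns=min(self.max_turns, max_turns),  # budget never grows
            forbidden=self.forbidden | forbidden,      # bans only accumulate
        )


parent = Wards(max_turns=10, forbidden=frozenset({"send_email"}))
helper = parent.child(max_turns=50, forbidden=frozenset({"http_get"}))
```

Here `helper` asked for fifty turns and received ten, and it inherits the ban on `send_email` alongside its own ban on `http_get`.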

In OpenClaw, the little guy often has to infer the world by reading prose and navigating messy situations; the gap between intention and action is much wider. In Gas Town, the workers live in tiny, rigid rooms with very fixed jobs. Cantrip sits somewhere between the two. Like Gas Town, it controls the boundary between the room and the outside world through architecture rather than guesswork. But unlike Gas Town, it leaves the agent enough room inside the circle to think, compose, and explore.

Lastly, Cantrip keeps an append-only record of everything that happened. It calls this the Loom. If the Circle is the room the little guy wakes up in, the Loom is the thread that gives that room a history. Every turn is preserved as the run unfolds. When the context window gets too full, older material can be folded into the environment or compacted out of immediate view, but the underlying record remains intact. The agent may not always be able to see the whole past at once, but the world still retains it. It gives both agent and world continuity through time.
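The difference between compacting the view and compacting the record can be shown with a toy log. The `Loom` class here is an assumption for illustration, not Cantrip's implementation: it keeps every turn and only shortens what is handed back for the context window.

```python
class Loom:
    """Hypothetical append-only log: turns can be added, never edited."""

    def __init__(self) -> None:
        self._turns: list[str] = []

    def append(self, turn: str) -> None:
        self._turns.append(turn)

    def record(self) -> tuple[str, ...]:
        """The full history, returned as an immutable copy."""
        return tuple(self._turns)

    def view(self, window: int) -> list[str]:
        """What fits in context: a marker for older turns, then recent ones."""
        if window < 1:
            raise ValueError("window must be at least 1")
        older = len(self._turns) - window
        if older <= 0:
            return list(self._turns)
        return [f"[{older} earlier turns compacted]"] + self._turns[-window:]


loom = Loom()
for n in range(1, 6):
    loom.append(f"t{n}")
```

Compare this with a compactor that replaces the original records: in the sketch, `view` can be as aggressive as it likes, because `record` still answers the question of what actually happened.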


Why Hardness Is a Design Pattern

The conventional way to evaluate agent frameworks is from the outside in. Security, scalability, developer experience. These are legitimate concerns. But they miss a class of questions that only becomes visible when you ask what the world is like from the agent’s point of view.

OpenClaw, Gas Town, and Cantrip are all reaching toward the same insight in different ways: constraints must be structural, not rhetorical. But they arrive from different directions. OpenClaw builds hard walls around a soft room. Gas Town builds the smallest, hardest room it can and treats the agent as disposable inside it. Cantrip writes the formal grammar of what a hard world should be. Each reveals a different part of the design space.

What the inside-out view catches, in my opinion, is where the seams are. OpenClaw’s seam is between the room and the wall. Gas Town’s is between authored physics and accidental physics. Cantrip’s is between the grammar of the world and the implementation. You do not see these things clearly in a security audit or a scalability benchmark. You see them by asking what the agent encounters when it tries to act.

The vocabulary of Ontological Hardness is a diagnostic lens. It lets you look at any harness and ask new questions. Where is the hardness? What is its source? Where is the boundary real, and where is it only described? Where do the world’s physics come from?

These are questions about architecture before they are questions about safety or capability. And they become more pressing as agents are granted wider reach over codebases, financial systems, and personal data. The more capable the actor, the more the structure of its world matters. We do not need softer worlds for smarter agents. We need harder ones.

Ontological hardness is not a property we should be measuring after the fact. It is a design principle; one that tells us where to put the speed bumps before the car is on the road.


Jay Springett / @thejaymo

Strategist, producer, and cultural theorist. Working across technology, narrative, worldrunning, digital culture, artificial intelligence, and internet culture.

Host of the 301-second-long podcast Permanently Moved, and interview show Experience.Computer.

