I hate Selenium. There, I said it.
I spent last Saturday night debugging a scraper that had been running perfectly for six months. The target website didn't even do a major redesign. They just changed a single CSS class name on their login button. One tiny change, and my entire script exploded.
If you write automation scripts, you know this feeling. It is the absolute worst part of the job. For years, we have been building bots that are "blind." We tell them to click coordinate X or find ID Y. They do exactly what they are told, which is usually the problem.
That is why I stopped scrolling when I saw a repository called hkuds/clawwork popping up all over my GitHub feed recently. It promised to fix the one thing that makes browser automation a nightmare: fragility.
I decided to dig into it because I kept seeing people search for "what is Clawwork" or specifically looking for the hkuds version. I pulled the repo, burned through a bunch of API credits, and gave it a spin.
Here is the honest breakdown of what this thing actually is and if it is worth your time.
Why We Are All Tired of "Dumb" Bots
The problem with tools like Puppeteer or Playwright isn't that they are bad tools. They are amazing. The problem is that your scripts end up hard-wired to the DOM: IDs, class names, and selectors that the site can change at any time.
You are basically playing a game of "Where's Waldo" with HTML tags. You write a script that says "Find the blue button." But tomorrow, the button might be green. Or it might sit behind an absolutely positioned div that intercepts the click.
Traditional bots crash when things change. They have no intuition.
Clawwork is different. It is part of this new wave of "Agentic AI." Instead of reading the markup, it uses large language models and computer vision to look at the actual pixels. It sees the page like you do.
So, What is Clawwork (hkuds)?
Okay, let's get technical for a second. Clawwork is an open-source browser automation agent. There are a bunch of versions floating around, but the hkuds/clawwork repo is the one everyone is talking about right now because it is clean and it actually works.
Think of it as a wrapper around Playwright. But instead of you writing the steps, you hook it up to a brain like GPT-4o or Claude 3.5 Sonnet.
You don't write: await page.click('#submit-btn');
You write: "Go to Amazon, search for a mechanical keyboard under $100, and add the one with the best reviews to the cart."
The agent takes a screenshot. It sends that image to the AI. The AI looks at it and says, "Okay, the search bar is at the top. I need to click there."
It is weirdly human. It calculates coordinates based on what it sees, not just what is in the code.
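If you want to picture that loop in code, here is a minimal sketch. To be clear, this is my own illustration and not the hkuds/clawwork API: `BrowserDriver`, `Decide`, and `runAgent` are made-up names, the screenshot is a plain string standing in for image bytes, and I kept everything synchronous to keep it short (a real agent would await the browser and the model).

```typescript
// Hypothetical types for illustration only; not taken from the repo.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface BrowserDriver {
  screenshot(): string;          // stand-in for real image bytes
  perform(action: Action): void; // click or type in the page
}

// The "brain": given the goal and a screenshot, pick the next action.
type Decide = (goal: string, screenshot: string) => Action;

function runAgent(
  goal: string,
  browser: BrowserDriver,
  decide: Decide,
  maxSteps: number = 20
): Action[] {
  const taken: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = browser.screenshot();  // observe
    const action = decide(goal, shot);  // think
    taken.push(action);
    if (action.kind === "done") break;  // the model says the goal is met
    browser.perform(action);            // act
  }
  return taken;
}
```

The `maxSteps` cap is there because every pass through the loop is a paid model call, so you want a hard stop if the agent gets stuck.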
Why the "hkuds" Repo Specifically?
I was wondering this too. Why this specific fork?
It comes down to control. Most "AI Agent" tools right now are expensive SaaS products. You have to pay a monthly fee and you don't really know what they are doing with your data.
The hkuds version is open source. You can clone it. You can run it on your own laptop. You bring your own API keys. It handles the messy stuff like managing the browser context and the vision processing so you don't have to build that from scratch.
I Tried to Break It (And It Kinda Surprised Me)
I didn't want to give it an easy test. I sent it to a travel booking site. You know the kind. Popups everywhere, dynamic date pickers, weird overlays.
My prompt was simple: "Find a non-stop flight from New York to London for next Tuesday."
I watched the browser open on my second monitor.
It sat there for a few seconds. That is the "thinking" phase. Then the mouse moved. It didn't jump instantly like a robot usually does. It slid over to the "One Way" toggle. Clicked it. Then it opened the calendar.

This is the part where my Selenium scripts usually die. The calendar date for "next Tuesday" changes every week. You can't easily hard-code it.
Clawwork looked at the calendar, figured out what today's date was, calculated next Tuesday, and clicked the right number. It was slow, sure. But it worked. It felt a little ghostly seeing the mouse move by itself based on visual cues.
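For comparison, this is roughly the date math you would write by hand in a traditional script. It is a small sketch with one assumption baked in: I am treating "next Tuesday" as the first Tuesday strictly after today, so if today is a Tuesday you get the one a week out. The function name `nextWeekday` is mine.

```typescript
// Returns the first date strictly after `today` that falls on `weekday`
// (0 = Sunday ... 6 = Saturday, matching Date.prototype.getDay()).
function nextWeekday(today: Date, weekday: number): Date {
  // (difference + 6) % 7 + 1 always lands in the range 1..7.
  const daysAhead = ((weekday - today.getDay() + 6) % 7) + 1;
  // Copy the date at local midnight so the input is not mutated.
  const result = new Date(today.getFullYear(), today.getMonth(), today.getDate());
  result.setDate(result.getDate() + daysAhead); // setDate rolls over months
  return result;
}

const TUESDAY = 2;
const nextTuesday = nextWeekday(new Date(), TUESDAY);
```

Computing the date is the easy half. The hard half in a selector-based script is finding and clicking the right cell in whatever custom calendar widget the site uses, which is the part the agent handled visually.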
Under the Hood
If you are a dev, you probably want to know how the sausage is made. The architecture in the hkuds repo is actually pretty smart.
It has three main layers:

1. The Driver (The Hands): This is standard Playwright. It handles the actual clicking, typing, and scrolling. Nothing new here.
2. The Perception (The Eyes): This is the cool part. It captures the viewport state. Early agents tried to turn the HTML into text, which was messy. This version leans on vision. It overlays a grid or numeric tags on interactive elements. So the AI sees a button labeled "42" and just tells the driver "Click 42."
3. The Brain (The Wallet): This is where your API key comes in. The agent sends the state to OpenAI or Anthropic. The model decides what to do next, for example: "I see a popup, I should close it."
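The numeric-tag trick is easier to see in code. Here is a rough sketch of how a reply like "Click 42" could be turned back into a point on the screen. I have not copied this from the repo: `Box`, `tagElements`, and `resolveClick` are names I made up, and the reply format is an assumption.

```typescript
// Hypothetical shapes for illustration; not the repo's actual types.
interface Box { x: number; y: number; width: number; height: number }
interface TaggedElement { tag: number; box: Box }

// Number each interactive element's bounding box, starting at 1.
function tagElements(boxes: Box[]): TaggedElement[] {
  return boxes.map((box, i) => ({ tag: i + 1, box }));
}

// Turn a model reply like "Click 42" into the center point of element 42.
// Returns null if the reply has no click command or the tag is unknown.
function resolveClick(
  reply: string,
  tagged: TaggedElement[]
): { x: number; y: number } | null {
  const match = /click\s+(\d+)/i.exec(reply);
  if (!match) return null;
  const wanted = parseInt(match[1], 10);
  const hit = tagged.filter((t) => t.tag === wanted)[0];
  if (!hit) return null;
  return {
    x: hit.box.x + hit.box.width / 2,
    y: hit.box.y + hit.box.height / 2,
  };
}
```

From there the driver layer only needs the point. In Playwright that would be something like `page.mouse.click(x, y)`. Returning null for an unknown tag matters, because models do occasionally name a number that is not on the screen.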
The Real Talk: Pros and Cons
I am not going to sell you a dream here. This tech is new and it has some rough edges.
The Good Stuff: It heals itself. If the website moves the login button to the left, Clawwork sees it and clicks it anyway. You don't have to update your code. That is huge. It also handles logic that is hard to code, like "Choose the best-looking option."
The Pain Points: It is not cheap. Every single step is an API call with image data. If you are scraping ten thousand pages, you are going to burn through your wallet fast. It is also slow. The "Observe -> Think -> Act" loop takes a few seconds. You are not going to use this for high-frequency trading.
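If you want to sanity-check a job before you launch it, the math is just multiplication. Here is a back-of-the-envelope sketch. The numbers I plug in are placeholders I picked for illustration, not real pricing or anything I measured, so swap in your own model's rates and your own timings.

```typescript
interface RunEstimate { apiCalls: number; costUsd: number; hours: number }

// Every agent step is one model call, so cost and time scale with steps.
function estimateRun(
  pages: number,
  stepsPerPage: number,
  usdPerStep: number,
  secondsPerStep: number
): RunEstimate {
  const apiCalls = pages * stepsPerPage;
  return {
    apiCalls,
    costUsd: apiCalls * usdPerStep,
    hours: (apiCalls * secondsPerStep) / 3600,
  };
}

// Placeholder inputs, not measured prices: 10,000 pages, 5 steps each,
// $0.01 and 3 seconds per step.
const estimate = estimateRun(10000, 5, 0.01, 3);
```

With those made-up inputs you get 50,000 calls, about $500, and a little under 42 hours if you run the steps one after another. A plain selector-based scraper would finish the same job for close to nothing, which is why I would only reach for an agent when the page is genuinely hostile to scripting.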
When Should You Use It?
Don't use this to scrape Wikipedia. That is overkill.
Use it for the stuff that makes you want to pull your hair out. Use it for QA testing where you need to simulate a confused user. Use it for complex workflows that involve multi-step forms. Use it for legacy internal tools that have terrible code structures.
Getting It Running
I am not going to lie, the setup can be a bit tricky if you haven't messed with Node or Docker environments before. You need to get your environment variables right or the browser will just close immediately.
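One cheap habit that would have saved me some time is checking the environment before anything launches, so you get a readable error instead of a browser that silently closes. Here is a small sketch of that idea. The variable names in `REQUIRED` are hypothetical placeholders I made up, not the names the repo actually expects, so use whatever the project's setup docs list.

```typescript
// Returns the names from `required` that are unset or blank in `env`.
function missingEnvVars(
  required: string[],
  env: { [name: string]: string | undefined }
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical variable names, for illustration only.
const REQUIRED = ["OPENAI_API_KEY", "BROWSER_HEADLESS"];

// Usage in Node, before launching the browser:
//   const missing = missingEnvVars(REQUIRED, process.env);
//   if (missing.length > 0) {
//     throw new Error("Missing env vars: " + missing.join(", "));
//   }
```

Passing `env` in as a parameter, instead of reading `process.env` inside the function, keeps the check easy to test with a plain object.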
I spent a few hours banging my head against the wall getting the Docker container to play nice with my local network.

Since I already went through that pain, I wrote a specific guide just for the installation.
> Click here to read my full ClawWork Installation Guide
I break down exactly how to configure the .env file and how to avoid the common errors I ran into.
The web is changing. We are moving away from rigid scripts to autonomous agents. hkuds/clawwork is one of the first tools that actually makes this accessible to normal developers. It is messy, it is fun, and honestly, it is probably the future.
Check out the repo, but definitely grab my guide first so you don't waste your evening debugging config files like I did.