Why am I doing this?
Over the years, I kinda set myself up for failure by opening probably too many side projects on GitHub.com. With all my life commitments, the hours I can realistically devote to them can barely keep up with the throughput I want. Shower ideas, refactoring, feature development, bug fixes, etc.: every single one of them could easily eat up a day for me, and there are several hundred of them.
The rise of AI coding agents shed some light on this for me.
This series of challenges is not about shitting on AI coding agents, but rather my journey to find a working solution that allows me to create and maintain side projects at 10x or even 100x the throughput I am currently capable of. These are experiments to find the sweet spot between me and those agents.
Setting up the scene
The scene of today’s round is pretty simple. Every agent starts with a directory that is empty except for a `prompt.md` that describes the challenge. One-shot means that, other than minor fixes, no further iteration is allowed before comparing the results.
Below is the `prompt.md`; all agents use the exact same copy of it.
## Preface
Hello, whoever you're, maybe you're Amp, Cursor, Claude Code, Windsurf, or Gemini. Doesn't matter (don't be upset if you're not one of them), do exactly as what I say here and DO NOT OVERENGINEER the shit (like running `docker-compose up` for local dev, that's insane).
Thank you!
## Tech stack
- The backend is written in Go,
  - Web framework: https://github.com/flamego/flamego
  - Configuration: https://github.com/go-ini/ini
- The database is PostgreSQL,
  - Use https://github.com/go-gorm/gorm as the ORM
  - Use https://github.com/pressly/goose for the database migration
- The frontend uses
  - React with React Router
  - Vite for local dev server
  - Tailwind CSS v4 and Shadcn UI
  - Use pnpm
- The frontend and backend uses RESTful API and JSON as data format
## System requirements
A web application that notifies file changes in commits of a GitHub repository via Slack messages.
In addition, it provides a web UI that allows:
- Enter the GitHub URL to add more repositories to watch.
- Add watch rules. The syntax of rules should be the same as GitHub's CODEOWNERS file.
- Whenever a rule is updated, check there are actually matching files.
- Periodically, recheck for rules in case files are moved/renamed, and notify the user too.
- It should allow me to preview matchings of a rule when I add or update.
- It should support watching multiple GitHub repositories.
- No secrets should be stored in database but in the configuration file.
## Technical details
This section contains non-exhaustive mentions of technical details, but when mentioned, think carefully and apply accordingly.
- For code structure and frontend setup (ESLint, Vite, everything), copy and adopt how https://github.com/pgrok/pgrok does things. Including how it proxy Vite requests to Vite via the Go server.
- Use `Taskfile.yml` to set up common commands, it must include one named `task build` to build the backend server.
- Use `Procfile` and https://github.com/DarthSim/overmind for process management, all processes should start when run `overmind s`
- When you done working on something, check build errors and fix all of them.
When unsure, study hard as hell how https://github.com/pgrok/pgrok does things.
You might notice how atypical the choice of tech stack and implementation is (compared to the mainstream, though it is my typical setup), and how I “weirdly” ask the agents to follow a “template repository”. That’s ALL ON PURPOSE. I want to see how far those agents can go on things they were likely not trained on extensively, but are instead given as context in the prompt.
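To give a feel for what the “watch rules” and “preview matchings” requirements ask of the agents, here is a minimal sketch of CODEOWNERS-style matching in plain Go. It is not what any of the agents produced, and it only covers a simplified subset of the syntax (leading `/`, trailing `/`, `*`, and `**`); the function names `compileRule` and `preview` and the file list are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileRule translates a simplified CODEOWNERS-style pattern into a regexp.
// Supported subset (an assumption of this sketch, not the full GitHub syntax):
//   - a leading "/" or a "/" in the middle anchors the pattern to the repo root
//   - a trailing "/" matches everything under that directory
//   - "*" stays within one path segment, "**" may cross segments
//   - a pattern without "/" matches at any depth
func compileRule(pattern string) (*regexp.Regexp, error) {
	anchored := strings.HasPrefix(pattern, "/")
	p := strings.TrimPrefix(pattern, "/")
	dirOnly := strings.HasSuffix(p, "/")
	p = strings.TrimSuffix(p, "/")
	if p == "" {
		return nil, fmt.Errorf("empty pattern %q", pattern)
	}
	if strings.Contains(p, "/") {
		anchored = true
	}

	var b strings.Builder
	b.WriteString("^")
	if !anchored {
		b.WriteString("(?:.*/)?") // allow any leading directories
	}
	rs := []rune(p)
	for i := 0; i < len(rs); i++ {
		switch {
		case strings.HasPrefix(string(rs[i:]), "**/"):
			b.WriteString("(?:.*/)?") // zero or more directories
			i += 2
		case strings.HasPrefix(string(rs[i:]), "**"):
			b.WriteString(".*")
			i++
		case rs[i] == '*':
			b.WriteString("[^/]*")
		default:
			b.WriteString(regexp.QuoteMeta(string(rs[i])))
		}
	}
	last := p[strings.LastIndex(p, "/")+1:]
	switch {
	case dirOnly:
		b.WriteString("/.*$") // only files below the directory
	case strings.Contains(last, "*"):
		b.WriteString("$") // a wildcard segment does not descend further
	default:
		b.WriteString("(?:/.*)?$") // a plain name matches a file or a whole directory
	}
	return regexp.Compile(b.String())
}

// preview returns the files a rule would match, in input order.
func preview(pattern string, files []string) ([]string, error) {
	re, err := compileRule(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, f := range files {
		if re.MatchString(f) {
			matched = append(matched, f)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical file list; a real implementation would fetch it from GitHub.
	files := []string{"main.go", "internal/db/db.go", "docs/guide.md", "docs/api/rest.md"}
	for _, rule := range []string{"*.go", "/docs/", "docs/*"} {
		matched, err := preview(rule, files)
		if err != nil {
			fmt.Println("invalid rule:", err)
			continue
		}
		fmt.Printf("%-8s -> %v\n", rule, matched)
	}
}
```

An empty preview result is exactly the “check there are actually matching files” case from the prompt, so the same function could back both the preview UI and the periodic recheck.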
Then, the one-shot prompt is also exactly the same:
Read the prompt.md and do your thing.
Amp
The first agent I asked was Sourcegraph’s Amp.
- Good
  - Obviously, my thread can be easily shared with you thanks to its built-in sharing feature. You can check my thread at https://ampcode.com/threads/T-5876a5f1-b34b-4286-96bd-b850a43d55d3 without an account.
  - Among the three, it got the closest to the project structure that I pointed it to mimic.
  - It gave me good next steps (set up DB, start dev server, etc.) once it was done working.
- Bad
  - Still read from the `PORT` env var despite creating a port config value in the `config.ini`, and couldn’t figure out why in a follow-up. I had to manually fix it up. The root cause was that the `PORT` env var was also used for controlling the Vite server, so there was a conflict.
  - No actual CSS was written, nor was any Tailwind CSS style used.
  - The only one that got the routing syntax completely wrong on the initial try. Fixed after asking it to.
  - Hung on running the HTTP server (`go run main.go`). Left zombie processes after hitting Ctrl+C twice.
  - The only one that did not check build errors autonomously.
  - Was asked to proxy Vite requests via the Go backend but did not do that initially (the “template repository” has an example of that). It kinda worked after being asked to, but with mistakes in the catch-all routing syntax.
  - Created migration scripts, but did not use `goose` as told to, and decided to use GORM’s auto migrate feature instead.
  - Ultimately, it doesn’t work as a web application. It is OK that the index page got no CSS, but not even hyperlinks to the pages that it wrote?
- Strange
  - Randomly stopped in the middle of TODOs, and I had to ask it to finish (not the first time I have encountered this in the past month).
  - Used a Node.js env var to determine dev mode, while the backend is in Go.
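The `PORT` conflict above is the kind of thing that is easy to rule out by design. Below is a minimal sketch, assuming the port has already been parsed from `config.ini`; `resolvePort` and the `APP_HTTP_PORT` env var are hypothetical names for illustration, not something from the prompt or from Amp’s output.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// resolvePort picks the port for the Go HTTP server.
// The value from the configuration file is the default, and APP_HTTP_PORT
// (a hypothetical, app-specific env var) may override it. The generic PORT
// env var is never consulted, because something else in the dev setup
// (the Vite server, in the case above) already relies on it.
func resolvePort(configPort int, getenv func(string) string) (int, error) {
	port := configPort
	if v := getenv("APP_HTTP_PORT"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, fmt.Errorf("parse APP_HTTP_PORT %q: %w", v, err)
		}
		port = n
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("port %d is out of range", port)
	}
	return port, nil
}

func main() {
	// 2830 stands in for whatever value was read from config.ini.
	port, err := resolvePort(2830, os.Getenv)
	if err != nil {
		fmt.Println("invalid port:", err)
		return
	}
	fmt.Println("backend will listen on port", port)
}
```

Passing the env lookup in as a function keeps the precedence rule easy to exercise without touching the real environment.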
Claude Code
Then, I asked Anthropic’s Claude Code.
- Good
  - Its command and website allowlist is at the project level, which is pretty nice. Not all commands are equal.
  - The only agent that got `overmind start` working on the first try, no build errors or anything.
  - The only agent that gave me a working web UI. Jumping between pages works, values are stored correctly in the database, etc.
- Bad
  - No actual CSS was written, nor was any Tailwind CSS style used.
  - Did not follow the code structure I asked it to mimic, like, at all. All the Go files live in the root directory.
  - Was asked to proxy Vite requests via the Go backend but did not do that initially (the “template repository” has an example of that). It kinda worked after being asked to, but with mistakes in the catch-all routing syntax.
  - Created migration scripts, but did not use `goose` as told to, and decided to use GORM’s auto migrate feature instead.
- Strange
  - Used a Node.js env var to determine dev mode, while the backend is in Go.
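For context on the Vite proxying and dev-mode items, here is a rough sketch of the idea using only Go’s standard library. It is neither how pgrok does it nor how Flamego routing works; it only illustrates “API paths go to the backend, everything else goes to Vite in dev”, with dev mode decided on the Go side. The names `frontendHandler`, `isAPIPath`, and `newRouter` are hypothetical, and `http://localhost:5173` is Vite’s default dev address.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// frontendHandler returns the handler for every non-API request. In dev mode
// it forwards to the Vite dev server; otherwise it serves the built assets.
// devMode is meant to come from the Go side (e.g. the config file), not from
// a Node.js env var.
func frontendHandler(devMode bool, viteURL string, static http.Handler) (http.Handler, error) {
	if !devMode {
		return static, nil
	}
	target, err := url.Parse(viteURL)
	if err != nil {
		return nil, fmt.Errorf("parse Vite URL: %w", err)
	}
	if target.Host == "" {
		return nil, fmt.Errorf("Vite URL %q has no host", viteURL)
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// isAPIPath reports whether a request path belongs to the JSON API.
func isAPIPath(p string) bool {
	return p == "/api" || strings.HasPrefix(p, "/api/")
}

// newRouter sends API paths to api and uses frontend as the catch-all.
func newRouter(api, frontend http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isAPIPath(r.URL.Path) {
			api.ServeHTTP(w, r)
			return
		}
		frontend.ServeHTTP(w, r)
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"ok":true}`)
	})
	frontend, err := frontendHandler(true, "http://localhost:5173", http.NotFoundHandler())
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	router := newRouter(api, frontend)

	// Exercise the API route in-process; a real server would instead pass
	// router to http.ListenAndServe.
	rec := httptest.NewRecorder()
	router.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/api/repos", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the catch-all as a plain fallback, rather than a router-specific wildcard pattern, sidesteps the routing-syntax mistakes mentioned above, at the cost of not using the framework’s own routing.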
Gemini
Lastly, I asked Google’s Gemini.
- Good
  - It gave me the choice of what kind of editing mode I want (auto, review), and even supports “Modify with external editor”.
  - The only one that tried to use Shadcn UI, but… read on.
- Bad
  - It downloaded a bunch of dependencies to use Shadcn UI and made a `button.tsx`, but the button looks identical to the one that raw HTML would have given me.
  - It kept messing up the path to load `config.ini`, and I had to apply a manual fix.
  - The only one that didn’t provide me a working `config.example.ini`: the value for the database user was empty.
  - Very poor summary and no clear next steps to test the app.
  - Multiple manual fixes for some minor things, like bad string concatenation for the listen address.
  - The web app was full of errors and couldn’t actually load, missing lots of dependencies. It briefly worked after many manual edits, which I made just to see what it looked like.
  - Extremely poor and short `Taskfile.yml` compared to the other two.
  - No error handling on the database layer in the initial version. It then added error handling as part of trying to fix web app errors, which is, again, strange because the errors were in JavaScript.
  - Was asked to proxy Vite requests via the Go backend but did not do that initially (the “template repository” has an example of that). It kinda worked after being asked to, but with mistakes in the catch-all routing syntax.
  - Created migration scripts, but did not use `goose` as told to, and decided to use GORM’s auto migrate feature instead.
  - Ultimately, the web application doesn’t work. I can’t even provide a screenshot here.
- Strange
  - Instead of fetching the template repository as web content, it tried to clone it locally. Not complete nonsense, putting it here for consistent grouping.
  - Used a Node.js env var to determine dev mode, while the backend is in Go.
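On the listen address item: I am not reproducing what the broken concatenation looked like, but the standard library already has a helper that avoids hand-rolled string building. A small sketch, where `listenAddr` and the port number are just illustrative:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listenAddr builds a "host:port" string for http.ListenAndServe and friends.
// net.JoinHostPort takes care of the details, such as wrapping IPv6 literals
// in brackets, that manual concatenation tends to get wrong.
func listenAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(listenAddr("0.0.0.0", 2830)) // 0.0.0.0:2830
	fmt.Println(listenAddr("::1", 2830))     // [::1]:2830
	fmt.Println(listenAddr("", 2830))        // :2830
}
```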
The winner?
If I judge by whether there is a working web UI, then Claude Code is clearly the winner. But given all the mixed signals, is it actually a winner? The challenge is scoped to one-shot, but practically, the story won’t end there. At the very least, it feels like a lot more back-and-forth would be needed to fix the project structure with Claude Code than to ask Amp to fix up the hyperlinks on the index page.
Closing thoughts
I’ve noticed some common symptoms, which are likely caused by some of my vague prompts. This is a clear area that I need to put more thought into; I think I might need to be way more prescriptive if my tech stack is not in JavaScript LOL. The way I point to a “template repository” doesn’t seem to work well enough, so I need to experiment more here as well. Also, I did not unlock the full power of each agent: for example, there was no AGENT.md or CLAUDE.md.
I only managed to run through three agents this time due to time constraints. I also only used the CLI versions to keep the comparison fair. Amp has a VS Code extension, which might do things much better when it has access to more diagnostics, for example; the same goes for Claude Code.
Next round, I hope to go more in-depth and add more agents to the bench. Stay tuned!