Tech

How to solve hard problems with AI

I'll probably keep updating this over time, but it has already helped me a ton when kicking off long-running agents.

  1. Find a hard problem. Something like creating a browser, adding raytracing to an old game, adding VR to an old game, decompiling an old game (are you starting to see a trend?)
  2. Get an agent structure that allows you to run an agent on an always-on machine. It should be the same platform as the machine that will run the “problem” - if it’s a game, probably Windows. I’m using Opencode + Kimaki to check in on Discord
  3. Ralph loops. I created a skill here. For the initial codebase, have it create a comprehensive PLAN.md, and then set up a Ralph Loop that runs until the PLAN.md is complete. You can have it keep kicking off agents until there are no - [ ] left in the document.
  4. (But not necessarily last): make it so that the agent can somehow verify its output. A screenshot harness and the ability to directly set state to a certain position are super important: as your topic gets more niche, AI is less likely to output code that works on the first try.
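
The loop in step 3 can be sketched roughly as follows. This is a minimal illustration, not the skill mentioned above: `count_unchecked`, `ralph_loop`, the `agent_cmd` argument, and the `max_iterations` safety cap are all hypothetical names I'm assuming for the example, and the actual command that launches an agent depends on your setup.

```python
import re
import subprocess
from pathlib import Path

# Matches an unchecked Markdown task line such as "- [ ] port the renderer".
UNCHECKED = re.compile(r"^[ \t]*[-*][ \t]+\[ \]", re.MULTILINE)


def count_unchecked(plan_text: str) -> int:
    """Return how many unchecked "- [ ]" items the plan text contains."""
    return len(UNCHECKED.findall(plan_text))


def ralph_loop(plan_path, agent_cmd, max_iterations=50) -> int:
    """Re-run the agent until the plan has no unchecked items left.

    agent_cmd is a placeholder argument list; substitute whatever
    launches your agent. max_iterations is an assumed safety cap so a
    stuck agent cannot loop forever. Returns the number of runs started.
    """
    runs = 0
    while runs < max_iterations:
        plan_text = Path(plan_path).read_text(encoding="utf-8")
        if count_unchecked(plan_text) == 0:
            break  # every task is checked off, so the plan is complete
        subprocess.run(agent_cmd, check=False)  # one agent pass over the plan
        runs += 1
    return runs
```

Calling `ralph_loop("PLAN.md", ["your-agent-command"])` would then keep starting agents until every checkbox in PLAN.md is ticked or the cap is reached.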

AI is now a magic decompiler

AI Agents are now magic decompilers. Previously, I ran an agent-in-the-loop to try to decompile Super Smash Bros. Melee in Dec 2024 with gpt-4o, but found that the model didn’t tend to learn from its mistakes. Since then:

  • Models have gotten smarter
  • Tools have gotten better
  • Other people have started to create skills & tools to help AI

It’s not just a couple of functions here or there. I’ve merged around 20 functions, and have 80 more in review. Previously, it would take me 1 day a function. My results aren’t even the most impressive though. The writer of the decomp skills set a record for the most matches in a PR. Tons of people are using their own custom agents or just Claude Code in the Discord (channel is #smash-bros-melee) every day with great results. If this is something you’re interested in, come pop on by! We could always use more tokens 🤠.

Read more >

opencode or Claude Code?

Just so that people don’t get confused, I mean this opencode. I’m not a shill, I promise.

TL;DR: if you have time to experiment, use opencode with sonnet-4. Otherwise, use Claude Code.

I’ve spent a lot of time with opencode as well as Claude Code. I’m going to use this as a live document talking about the tradeoffs of using either tool.

First, Claude Code is roughly SotA for a terminal AI editor for fullstack work (my domain). I’ve also tried:

Read more >

ChatGPT isn’t a decompiler… yet

Previous article: What I’m up to

Abstract / Results

It feels a bit pretentious to open a blog post with an abstract. However, I wanted to communicate up front concisely what I tried to do, and what the open areas of exploration are. Those who are interested can dig more.

I wanted to make ChatGPT into a magic decompiler for PowerPC assembly to supercharge the Super Smash Bros. Melee (“Melee”) decompilation project. I observed over a year ago that ChatGPT was surprisingly good at understanding PowerPC assembly language and generating C code that was logically equivalent. I also saw other papers that were attempting to use LLMs as decompilers.

Read more >

What I’m up to

A lot of people have been asking what I’ve been up to since I left Plaid at the beginning of this month. I was at Plaid for 4 years, which were amazing, and I am very thankful for the people I’ve met and the work I’ve been able to do.

I am not funemployed, and I don’t want to evoke concepts related to that. I’m grinding harder than I did while employed. It’s such a gift to be able to have software engineering skills that have been forged in a real tech company, and then let loose on personal projects. I’m working on learning as much as I can about the AI space and debating if I should make that my next 4-year move. AI has been moving faster and faster, and there are so many toy projects I want to build:

Read more >

Choosing a Blogging Platform: Aesthetic and Technical Considerations

In my journey as a blogger, I’ve published posts across platforms like Medium, Substack, and other proprietary blogging stacks. When writing more and more technical stuff, I realized that some stacks were definitely better than others.

When consuming other people’s blog posts, the first thing that stood out to me was aesthetics. You get an impression about the platform and the person simply by the details of how their text looks. Does their code have great, language-specific highlights? Do they use monospace + does their platform support it? How is the image formatting? What about the base color scheme?

Read more >

Why pay for Notion’s AI? I built my own auto-tagging tool in a week!

Originally posted on Medium.

I built the thing I talk about in this blog post — if you want to check it out, it’s here!

Notion, like every tech company, has been shoving AI features down our throats for the last couple of months at the cost of customer UX. So I disabled them. You can do this yourself by just messaging support; I got the idea from this Reddit thread, which is one of many. Ever since I disabled them, the UX has at least returned to normal, and editing performance has improved (have you ever noticed how Notion lags a bit every time you press SPC so that it can show you the AI toolbar?).

Read more >

How I do window management in Mac OS X

Originally posted on Substack.

Brief doc I’m sending to my friendos

For transparency, I’m going to recommend a paid window switcher I use for Mac OS X called Contexts. It’s saved me so much time and has made using my computer a breeze; so much so, that I’ve bound it to CMD+Tab. I’ll attempt to justify this in the doc.

The default window management paradigm in Mac OS X, for me, left much to be desired. I grew up using Windows, which has a pretty different pattern for how you Alt+Tab between windows.

Read more >

I made a web app to get better at adding half-steps to notes

Originally posted on Substack.

Try it out here if you like pressing buttons as much as I do! GitHub if you like reading code.

During the holidays, I wanted to get better at answering questions like “what is 7 half steps up from A?” I often found myself in the situation of having these problems as a lot of guitar chord sheets are written something like “A capo 7” which means you put the big barre thing on your guitar and play an A-shape chord. When using a capo, the actual underlying chord is ‘A + 7 half steps’. This means if you’re collaborating with another instrument or with someone not using a capo, you need to communicate the actual chord you’re playing. This requires some mental math, which I found slightly embarrassing as I didn’t always immediately know what chord I was playing.
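
The mental math here is addition modulo 12 over the chromatic scale. A minimal sketch, assuming sharps-only note spelling; `NOTES` and `transpose` are names made up for this illustration, not code from the web app:

```python
# The 12 chromatic notes, spelled with sharps only (an assumption for
# simplicity; flats such as "Bb" are not handled).
NOTES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def transpose(note: str, half_steps: int) -> str:
    """Return the note that is half_steps above (or below, if negative) note."""
    index = NOTES.index(note)
    # Wrap around the octave: 12 half steps up lands on the same note name.
    return NOTES[(index + half_steps) % len(NOTES)]


print(transpose("A", 7))  # E
```

So an A-shape chord with a capo on fret 7 sounds as an E.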

Read more >

My Notes on Google’s TrueTime

Originally posted on Substack.

edit: this blogpost was initially wrong when I published it. Thanks to some comments I got, I learned I didn’t fully understand TrueTime or Spanner — I’ve spent some time learning and understanding the core concept again, and have updated this artifact. This is an externalized resource for me that I hope can be helpful for others.

When I was reading the famous paper on Spanner, Google’s globally distributed linearizable database, I really struggled with the concept of TrueTime, which is a core component of why they were able to get their guarantees. After trying to wrap my head around it, I created the following artifact (IMO, TrueTime deserves a mini-ish paper or post on its own):

Read more >