Hi, I’m Stephen 👋

Can AI Agents build a graphics pipeline autonomously?

2026-03-01

I started working on adding raytracing (RTX) to The Legend of Zelda: Ocarina of Time. As in my other posts, I’m generally trying to drive the project via autonomous LLM agents.

I was hoping the project could be as simple as running Claude Opus 4.6 in a loop, saying “implement RTX into this engine, make no mistakes.” I got surprisingly far with this technique. However, I started noticing that it hit a wall, and that it seemed to be bottlenecked by its ability to understand the engine’s screenshot output, identify issues, and figure out what to work on next.

How to solve hard problems with AI

2026-02-22

tech

This is probably something I’ll be updating over time, but it has helped me a ton when kicking off long-running agents.

1. Find a hard problem: something like creating a browser, adding raytracing to an old game, adding VR to an old game, or decompiling an old game (are you starting to see a trend?).
2. Get an agent setup that lets you run an agent on an always-on machine. It should be the same platform as the machine that will run the “problem”; if it’s a game, that’s probably Windows. I’m using Opencode + Kimaki to check in on Discord.
3. Ralph loops. I created a skill here. For the initial codebase, have the agent create a comprehensive PLAN.md, and then set up a Ralph loop that runs until the PLAN.md is complete. You can have it keep kicking off agents until there are no - [ ] left in the document.
4. (But not necessarily last): make it so that the agent can somehow verify its output. A screenshot harness and the ability to directly set state to a certain position are super important: as your topic gets more niche, AI is less likely to output code that works on the first try.
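
Here is a minimal sketch of the Ralph loop described above: re-run an agent until PLAN.md has no unchecked `- [ ]` items left. This is not the skill mentioned in the list. The names `count_unchecked` and `ralph_loop`, the `agent_cmd` argument, and the `max_iterations` cap are assumptions made for illustration, so swap in whatever command actually starts your agent.

```python
import re
import subprocess
from pathlib import Path

# Matches an unchecked Markdown task item such as "- [ ] port the renderer".
UNCHECKED = re.compile(r"^[ \t]*[-*] \[ \]", re.MULTILINE)


def count_unchecked(plan_text: str) -> int:
    """Return how many "- [ ]" items are still open in the plan text."""
    return len(UNCHECKED.findall(plan_text))


def ralph_loop(plan_path: Path, agent_cmd: list[str], max_iterations: int = 50) -> int:
    """Re-run the agent until the plan has no open items or the cap is hit.

    Returns the number of agent runs that were started.
    """
    runs = 0
    while runs < max_iterations:
        if count_unchecked(plan_path.read_text(encoding="utf-8")) == 0:
            break
        # agent_cmd is a placeholder (hypothetical); it is expected to do some
        # work and tick off items in the plan file before it exits.
        subprocess.run(agent_cmd, check=False)
        runs += 1
    return runs
```

You would call it as something like `ralph_loop(Path("PLAN.md"), ["my-agent", "--continue"])`, where `my-agent --continue` is a made-up command line. The iteration cap is there so that a plan the agent can never finish does not burn tokens forever.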

AI is now a magic decompiler

2026-02-10

tech

AI Agents are now magic decompilers. Previously, I ran an agent-in-the-loop to try to decompile Super Smash Bros. Melee in Dec 2024 with gpt-4o, but found that the model didn’t tend to learn from its mistakes. Since then:

- Models have gotten smarter
- Tools have gotten better
- Other people have started to create skills & tools to help AI

It’s not just a couple of functions here or there. I’ve merged around 20 functions, and have 80 more in review. Previously, it would take me 1 day a function. My results aren’t even the most impressive though. The writer of the decomp skills set a record for the most matches in a PR. Tons of people are using their own custom agents or just Claude Code in the Discord (channel is #smash-bros-melee) every day with great results. If this is something you’re interested in, come pop on by! We could always use more tokens 🤠.

Re: Introducing…the Scrub

2025-10-13

gaming

Original Article

I’ve felt this tension between “scrubs” and people trying to have fun, especially when playing competitive Smash. I do agree that it’s frustrating when people purport to be competitive but are actually scrubs. There are some points I think are missing though:

- Games can change over time, especially when you complain a ton. Within the last two decades in Melee, wobbling was discovered (2006) and then finally banned (2019). The technique was degenerate and hated at almost all levels of play. If people had just sucked it up and dealt with it, I don’t think it would have been banned. If you’re not familiar, wobbling leaves you unable to move while you get pummeled until you die, sometimes for minutes at a time.
- Being competitive is great, but it’s not like being competitive and “having fun” are more or less honorable than each other. When you engage in an activity with someone else, the goal isn’t always to “completely crush them.” Sometimes, it’s just to have fun, right? In the case where some techniques aren’t “fun”, it seems fine to limit yourself from using them.
  - If anything, it’s a bit socially awkward if you can’t sense the mismatch in competitive drive and don’t tailor your play to what your gaming partner is expecting.

Hollow Knight: much better with a guide!

2025-09-27

gaming

Score: 8/10

Hollow Knight: Silksong came out recently, and I got so much FOMO, as I had never completed the previous entry. I’ve attempted to clear the game twice, and just started my third playthrough. As of finishing this entry, I beat the final boss and ended up getting the 2nd best ending.

Hollow Knight is hard. If you only have a cursory interest in video games, I don’t recommend the game. As someone who is mildly masochistic and isn’t exactly good enough for Elden Ring, Hollow Knight is at the perfect threshold in difficulty for me where I can clear bosses after around 10 tries and a crashout in my journal.

`opencode` or Claude Code?

2025-08-29

tech

Just so that people don’t get confused, I mean this opencode. I’m not a shill, I promise.

TL;DR: if you have time to experiment, use opencode with sonnet-4. Otherwise, use Claude Code.

I’ve spent a lot of time with opencode as well as Claude Code. I’m going to use this as a live document talking about the tradeoffs of using either tool.

First, Claude Code is roughly SotA for a terminal AI editor for fullstack work (my domain). I’ve also tried:

Learning to read historical texts with empathy

2025-01-02

christianity lemma

Reading the Bible in an academic fashion has taught me a skill that has been extremely useful when reading other historical texts, but that modern readers seem to be losing. I notice this especially when people struggle to read anything older than a certain year (e.g. 1960) because it has concepts that they fundamentally disagree with.

I call it “empathy for the author.” When reading a text and trying to understand it, my first goal is to try to learn as much about the author as possible and their worldview. Then, I attempt to understand the points they’re making. I think other people should do the same.

ChatGPT isn’t a decompiler… yet

2024-12-10

tech

Previous article: What I’m up to

Abstract / Results

It feels a bit pretentious to open a blog post with an abstract. However, I wanted to communicate up front concisely what I tried to do, and what the open areas of exploration are. Those who are interested can dig more.

I wanted to make ChatGPT into a magic decompiler for PowerPC assembly to supercharge the Super Smash Bros. Melee (“Melee”) decompilation project. I observed over a year ago that ChatGPT was surprisingly good at understanding PowerPC assembly language and generating C code that was logically equivalent. I also saw papers that were attempting to use LLMs as decompilers.

What I’m up to

2024-09-23

tech

A lot of people have been asking what I’ve been up to since I left Plaid at the beginning of this month. I was at Plaid for 4 years, which were amazing, and I am very thankful for the people I’ve met and the work I’ve been able to do.

I am not funemployed, and I don’t want to evoke concepts related to that. I’m grinding harder than I did while employed. It’s such a gift to be able to have software engineering skills that have been forged in a real tech company, and then let loose on personal projects. I’m working on learning as much as I can about the AI space and debating if I should make that my next 4-year move. AI has been moving faster and faster, and there are so many toy projects I want to build:

Choosing a Blogging Platform: Aesthetic and Technical Considerations

2024-09-14

tech meta

In my journey as a blogger, I’ve published posts across platforms like Medium, Substack, and other proprietary blogging stacks. When writing more and more technical stuff, I realized that some stacks were definitely better than others.

When consuming other people’s blog posts, the first thing that stood out to me was aesthetics. You get an impression about the platform and the person simply by the details of how their text looks. Does their code have great, language-specific highlights? Do they use monospace + does their platform support it? How is the image formatting? What about the base color scheme?
