Loop Engineering: Designing the Systems That Prompt AI Agents

Loop engineering is a big change in how people work with AI agents.

Instead of sitting there and typing a prompt, waiting for the answer, then typing another prompt, you now build small systems called loops. These loops find tasks on their own, give work to the agents, check if the work is good, and keep going until the goal is reached.

Loop engineering means replacing yourself as the person who prompts the agent. You design the system that does it instead.

You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.

I don’t prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops.


It Didn’t Start in 2026

The idea had been around for a while before it got a name.

Back in July 2025, Geoffrey Huntley showed a very simple trick he called a “Ralph loop.”¹ It was basically a short script that looked like this:

while :; do cat PROMPT.md | claude-code ; done

Every time the agent finished, the script started it again. The agent used the files on your computer as its memory instead of trying to remember everything in one long chat. Huntley made sure the agent only worked on one small task at a time and used real tests to check its own work.
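
Here is a minimal sketch of that idea in Python, not Huntley's actual script. It assumes a hypothetical plan file with "- [ ]" checkboxes and two placeholder callables, run_agent and tests_pass, that stand in for a real agent command and a real test run. The max_runs cap is also an assumption; the original one-liner runs forever.

```python
from pathlib import Path
from typing import Callable, Optional

TODO = "- [ ] "
DONE = "- [x] "


def next_task(plan: Path) -> Optional[str]:
    """Return the first unchecked item in the plan file, or None if all are done."""
    for line in plan.read_text(encoding="utf-8").splitlines():
        if line.startswith(TODO):
            return line[len(TODO):]
    return None


def mark_done(plan: Path, task: str) -> None:
    """Tick the task's checkbox so the next fresh run skips it."""
    lines = plan.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if line == TODO + task:
            lines[i] = DONE + task
            break
    plan.write_text("\n".join(lines) + "\n", encoding="utf-8")


def ralph_loop(
    plan: Path,
    run_agent: Callable[[str], None],
    tests_pass: Callable[[], bool],
    max_runs: int = 20,
) -> int:
    """Restart the agent on one small task at a time; return how many runs happened."""
    runs = 0
    while runs < max_runs:
        task = next_task(plan)  # the file on disk is the memory, not the chat
        if task is None:
            break
        runs += 1
        run_agent(task)  # placeholder for starting a fresh agent session
        if tests_pass():  # only real test results can tick the box
            mark_done(plan, task)
    return runs
```

Because the checkbox only flips when the tests pass, a task that fails simply gets picked up again on the next run.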

Then in March 2026, Andrej Karpathy released a project called autoresearch.² He let an agent change machine-learning code, run experiments, measure whether the results improved, and repeat hundreds of times with almost no help from him.

In early June 2026, Addy Osmani wrote an essay that connected all these ideas and showed how the new features in tools like Claude Code and OpenAI Codex supported them.³ That’s when people started calling it “loop engineering.”

The Simple Idea Behind It

There are two kinds of loops.

The first kind is inside the agent itself.

Reason: the agent thinks
Act: the agent does something
Observe: the agent sees what happened

Repeat.

This reason-act-observe pattern dates back to 2022, when the ReAct research paper described it.
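
The inner loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular agent is built: reason stands in for the model call, tools is a plain dictionary of functions, and the "finish" action name is made up for this example.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, tool input)


def agent_loop(
    goal: str,
    reason: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[str]:
    """Run reason, act, observe until the agent finishes or runs out of steps."""
    observations: List[str] = []
    for _ in range(max_steps):
        tool, arg = reason(goal, observations)  # Reason: pick the next action
        if tool == "finish":
            break
        result = tools[tool](arg)  # Act: call the chosen tool
        observations.append(f"{tool}({arg}) -> {result}")  # Observe: record what happened
    return observations
```

Each observation is fed back into the next reasoning step, which is what lets the agent react to what actually happened.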

Loop engineering is really about the second kind of loop, the one that wraps around the agent.

Instead of you telling the agent what to do every single time, this outer loop handles the managing part. It looks for new work on its own. For example, it can check for bugs or open tasks every morning or every few minutes. It also gives each agent its own safe workspace so they do not mess up each other’s changes.
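
As a rough sketch, the "find work, then hand each task its own workspace" part might look like this. The names isolated_workspace, dispatch, find_tasks, and run_agent are hypothetical. Real setups often use git worktrees for isolation; this example copies the directory instead so it runs without git, and it assumes task ids are safe to use in a directory name.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def isolated_workspace(repo: Path, task_id: str) -> Path:
    """Copy the project into a fresh directory that belongs to one task."""
    root = Path(tempfile.mkdtemp(prefix=f"loop-{task_id}-"))
    workspace = root / repo.name
    shutil.copytree(repo, workspace)
    return workspace


def dispatch(
    repo: Path,
    find_tasks: Callable[[], Iterable[str]],
    run_agent: Callable[[str, Path], None],
) -> List[Path]:
    """Discover open tasks and run one agent per task in its own copy."""
    workspaces: List[Path] = []
    for task_id in find_tasks():  # e.g. open bugs or failing tests
        workspace = isolated_workspace(repo, task_id)
        run_agent(task_id, workspace)  # the agent only touches its own copy
        workspaces.append(workspace)
    return workspaces
```

A scheduler such as cron could call dispatch every morning or every few minutes. The original project stays untouched until someone reviews and merges a change.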

The loop keeps special instruction files that explain how your project should work. These files help the agent follow your rules without you having to explain everything again. It can also connect to the tools you already use, such as GitHub or a task board. This lets the loop open pull requests or update tickets by itself.

Often the loop uses more than one agent working together. One agent writes or changes the code. Another agent checks the work carefully to make sure it is correct. All the important details — like what has been done and what still needs work — get saved in a simple file or board. This way the loop can remember everything even after it stops and starts again later.
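
One way to picture the writer, the reviewer, and the saved state together is the sketch below. It assumes a hypothetical JSON state file and two placeholder callables, write and review, standing in for the two agents. The status names "approved" and "needs_work" are made up for this example.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_state(path: Path) -> Dict[str, str]:
    """Read the saved task statuses, or start empty on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(path: Path, state: Dict[str, str]) -> None:
    """Write statuses to disk so a restarted loop can pick up where it left off."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")


def run_round(
    tasks: Iterable[str],
    state_path: Path,
    write: Callable[[str], str],
    review: Callable[[str, str], bool],
) -> Dict[str, str]:
    """One pass over the tasks: the writer drafts, the reviewer approves or rejects."""
    state = load_state(state_path)
    for task in tasks:
        if state.get(task) == "approved":
            continue  # finished in an earlier run, skip it
        draft = write(task)  # first agent produces the change
        state[task] = "approved" if review(task, draft) else "needs_work"
        save_state(state_path, state)  # persist after every task, not just at the end
    return state
```

Saving after every task means a crash or restart loses at most one task's worth of work.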

Modern tools make it easy to run loops like this. In Claude Code, you can give the agent a clear goal, such as “keep working until all tests pass and there are no errors.”³ The agent will keep going, fixing things and checking the results, until it actually reaches that goal. Other tools have similar features that let the agent run until the job is truly finished.
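
The same "run until the goal is met" idea can be sketched outside any specific tool. The version below is an illustration, not Claude Code's mechanism. It treats the exit codes of check commands as the goal, and it assumes a placeholder step callable for one round of agent work plus a max_attempts cap.

```python
import subprocess
from typing import Callable, Sequence


def checks_pass(commands: Sequence[Sequence[str]]) -> bool:
    """True only if every check command exits with status 0."""
    return all(
        subprocess.run(list(cmd), capture_output=True).returncode == 0
        for cmd in commands
    )


def work_until(
    goal_checks: Sequence[Sequence[str]],
    step: Callable[[int], None],
    max_attempts: int = 10,
) -> bool:
    """Keep doing agent steps until the checks pass; report whether the goal was met."""
    for attempt in range(1, max_attempts + 1):
        if checks_pass(goal_checks):
            return True
        step(attempt)  # placeholder for one round of agent work
    return checks_pass(goal_checks)  # final check after the last attempt
```

For a Python project, goal_checks might be something like [["pytest", "-q"]], assuming pytest is installed. The point is that the loop asks the test runner whether the goal was reached, not the agent.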

Why People Are Excited

A well-built loop can do real work while you sleep.

It can look at failing tests from last night, open a safe copy of the project, ask an agent to fix the problem, run the tests again to check, and open a pull request with a link back to the original issue.

Companies like Stripe are already doing this at scale. Their background agents merge more than a thousand machine-written pull requests per week.⁴

But There Are Real Risks

Loops make both good work and bad work happen faster.

Token costs can get very high if the loop keeps trying the same thing. Agents sometimes say “I’m done” even when the work is only half finished. This is sometimes called the Ralph Wiggum problem — the agent is trying to be helpful but stops too early.
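
A small guard can catch the runaway-cost case. The sketch below is one possible design, with made-up names and thresholds: it stops the loop when a token budget is spent or when the same failure shows up several times in a row. How tokens are counted and how a failure is summarized are left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopGuard:
    token_budget: int  # hard ceiling on total tokens for this loop
    max_repeats: int = 3  # identical failures in a row before giving up
    tokens_used: int = 0
    failures: List[str] = field(default_factory=list)

    def record(self, tokens: int, failure: str) -> None:
        """Log the cost and the failure message of one unsuccessful attempt."""
        self.tokens_used += tokens
        self.failures.append(failure)

    def stop_reason(self) -> Optional[str]:
        """Return why the loop should stop, or None if it may continue."""
        if self.tokens_used >= self.token_budget:
            return "budget exhausted"
        recent = self.failures[-self.max_repeats:]
        if len(recent) == self.max_repeats and len(set(recent)) == 1:
            return "same failure repeated"
        return None
```

The loop would call record after each failed attempt and check stop_reason before starting the next one, handing the task to a human when it returns a reason.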

Another risk is that you stop understanding the code you ship. Because the loop is doing so much, it becomes easy to lose track of what actually changed and why.

What Comes Next

The next step people are talking about is called “loopcraft”: learning how to stack several loops on top of each other. One loop might find problems. Another might fix small ones. A third might study what worked and improve the instructions for the other loops.
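
To make the stacking concrete, here is a deliberately simple sketch. All names are hypothetical: find is the loop that finds problems, fix is the loop that tries to repair them, and improve_instructions plays the third loop by adding a note for each problem the fixer could not solve.

```python
from typing import Callable, Dict, Iterable, List


def run_fixer(
    find: Callable[[], Iterable[str]],
    fix: Callable[[str, List[str]], bool],
    instructions: List[str],
) -> Dict[str, bool]:
    """First two loops: find problems, then try to fix each one."""
    return {problem: fix(problem, instructions) for problem in find()}


def improve_instructions(
    outcomes: Dict[str, bool], instructions: List[str]
) -> List[str]:
    """Third loop: turn failed attempts into notes for the next round."""
    updated = list(instructions)  # never mutate the caller's list
    for problem, fixed in outcomes.items():
        note = f"Earlier attempt failed on: {problem}"
        if not fixed and note not in updated:
            updated.append(note)
    return updated
```

A real learning loop would do something smarter than appending notes, but the shape is the same: the output of one loop becomes the input of the next.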

As the tools get better, the skill that matters most is no longer writing the perfect prompt. It is designing small, trustworthy systems that can keep working safely on their own.

The simple rule that keeps coming up is this: build powerful loops, but build them like someone who still plans to stay in control.

¹ Ralph Wiggum as a "software engineer" by Geoffrey Huntley: the July 2025 post that introduced the Ralph loop.
² autoresearch by Andrej Karpathy: an autonomous ML experiment loop that edits code, runs experiments, and repeats with minimal human input.
³ Loop Engineering by Addy Osmani: the June 2026 essay that named the practice and mapped its primitives to Claude Code and OpenAI Codex.
⁴ Minions: Stripe's one-shot, end-to-end coding agents by Stripe Engineering, on background agents that merge more than a thousand machine-written pull requests per week.