Write Loops, Not Prompts

In June 2026, Boris Cherny, who leads Claude Code at Anthropic, put it this way: “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.” Addy Osmani at Google Cloud and Peter Steinberger at OpenAI have been saying versions of the same thing in recent weeks. The emerging shorthand is “loop engineering.” The models write most of the prompts; the humans design the loop.

This is the shape of the latest phase of knowledge work. And it raises the question: what do you need to know to write useful loops?

What a Loop Requires

A loop is a small program with intent. You specify what the agent is trying to accomplish, what tools it can use, what counts as evidence that it succeeded, what to do when it fails, what it should remember between runs, and when it should stop. A useful loop has several design surfaces: discovery, delegation, verification, persistence, scheduling, and state. Osmani frames the implementation layer as automations, worktrees, skills, plugins/connectors, sub-agents, and memory.

Discovery requires you to know what context the agent needs and where to find it.
Delegation requires you to know which sub-tasks belong to which agents and what they hand off to each other.
Verification requires you to know what “done” looks like, and to write the test that catches a fake done.
Persistence requires you to know what the system should remember and what it should forget.
Scheduling requires you to know when the work should happen and what should trigger it.
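
One way to make these design surfaces concrete is to write them down as a data structure before writing any agent code. The sketch below is illustrative only. The LoopSpec name, its fields, and the sample values are assumptions made for this example, not part of any framework.

```python
from dataclasses import dataclass


@dataclass
class LoopSpec:
    """Hypothetical container for the design surfaces of a loop."""
    goal: str                      # what the agent is trying to accomplish
    tools: list[str]               # discovery: where the context comes from
    delegates: dict[str, str]      # delegation: sub-task -> agent that owns it
    done_checks: list[str]         # verification: tests that catch a fake "done"
    remember: list[str]            # persistence: what survives between runs
    trigger: str                   # scheduling: when the work should happen
    max_cycles: int = 5            # stopping rule so the loop cannot run forever
    on_failure: str = "escalate"   # what to do when verification keeps failing

    def missing_surfaces(self) -> list[str]:
        """Return the names of the surfaces that were left empty."""
        required = ["goal", "tools", "delegates", "done_checks", "remember", "trigger"]
        return [name for name in required if not getattr(self, name)]


spec = LoopSpec(
    goal="Draft a weekly status email",
    tools=["calendar", "ticket tracker"],
    delegates={"summarize tickets": "summarizer"},
    done_checks=[],                # no definition of done yet
    remember=["last week's email"],
    trigger="Fridays 4 p.m.",
)
print(spec.missing_surfaces())     # ['done_checks']
```

A spec that reports an empty verification surface is a prompt with extra steps. Filling in that list is where the opinions go.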

A loop will run even if you bring no strong opinions to it, and its output may look plausible. It also has a very good chance of being wrong. You know the old saying: “If you ask the wrong question (or don’t properly specify your goals), you’re going to get the wrong answer.”

A Quick Vocabulary

Before going further, it is worth pinning down the three terms that have been used interchangeably in nearly every AI conversation for the past two years. Anthropic draws a useful line between workflows and agents. I’ll add “task” as the base case.

A task is a single model call. Summarize this email. Classify that document. Extract these fields. One question in, one answer out.

A workflow is multiple model calls in a predefined control flow. You decide the steps, and the model fills them in.

An agent is a model using tools in a loop, deciding its own trajectory. You give it a goal, a set of tools, and a system prompt.

Put simply: With a workflow, you own the plumbing. With an agent, the model owns the plumbing. Every other tradeoff is based on that single structural choice. When you can map the decision tree in advance, build the workflow. When the task is ambiguous enough that the decision tree cannot be mapped, write the loop.
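
The structural difference is easier to see side by side. In this minimal sketch, fake_model and scripted_choice are stand-ins I made up for a real model call and a real model decision, and the tool names are placeholders. The only point is where the control flow lives.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Stand-in for a model call; a real system would call an LLM API here."""
    return f"<answer to: {prompt}>"


def run_workflow(document: str, model: Callable[[str], str]) -> str:
    """Workflow: the human fixes the order of the steps in code."""
    fields = model(f"extract fields from {document}")
    draft = model(f"draft a reply using {fields}")
    return model(f"format {draft}")


def run_agent(
    goal: str,
    tools: dict[str, Callable[[str], str]],
    choose: Callable[[str, list[str]], str],
    max_steps: int = 5,
) -> list[str]:
    """Agent: the model (here `choose`) picks the next tool on every turn."""
    history: list[str] = []
    for _ in range(max_steps):
        tool_name = choose(goal, history)
        if tool_name == "stop":
            break
        history.append(tools[tool_name](goal))
    return history


def scripted_choice(goal: str, history: list[str]) -> str:
    """Hypothetical policy standing in for the model's own decision."""
    plan = ["search", "read", "stop"]
    return plan[len(history)] if len(history) < len(plan) else "stop"


tools = {"search": lambda g: f"searched {g}", "read": lambda g: f"read {g}"}
print(run_agent("find pricing", tools, scripted_choice))
# ['searched find pricing', 'read find pricing']
```

In run_workflow the sequence is written by the human. In run_agent the human writes only the boundary: the tool set, the step limit, and the stop signal.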

For well-defined work, a well-designed workflow can match or beat an agent on quality, predictability, cost, and auditability. Agents matter when the path cannot be mapped in advance. The difference is who stays in the loop, on what cadence, and how the work scales. Workflows ask you to be present at almost every junction; agents ask you to be present at the rubric.

My Own Practice

We run several agents inside The Palmer Group. They draft emails, monitor competitor moves, score my own writing against our corpus, summarize meeting notes, audit invoices, prioritize inbox traffic, handle expense receipts, write our internal morning brief, and dozens of other things. To make this work, you need strong opinions about every aspect of every output.

The evolution of our proposal writer is a good case study. It moved through all three definitions above as we got more capable, and the output quality stayed consistently good at every stage. What changed was the amount of human bandwidth required to produce each proposal and the volume of proposals we could produce in a given week.

Version 1, a task. When generative AI was new, the proposal writer was a single prompt. An RFP came in, we opened a chat window, pasted the RFP, asked for a first draft, and read what came back. The first answer was generic, the voice was generally sub-optimal, and the pricing was made up. We iterated the prompt, fed in voice samples, added pricing context, and over a session of guided back-and-forth we could land on a usable draft. The output, on the days it worked, was good. The human bandwidth was high. Producing one proposal could take a couple of hours. Throughput was bounded by how many of those sessions a senior person had time to run.

Version 2, a workflow. We broke the work into a sequence and owned the plumbing ourselves. Step one, an LLM call to extract requirements, scope, due date, and decision criteria from the RFP. Step two, a retrieval call to pull the five most relevant past proposals from our corpus. Step three, an LLM call to assemble a first draft using the retrieved examples as voice anchors. Step four, compute pricing and timelines. Step five, a formatter to apply the proposal template. Step six, human review. Because a senior person designed each step and reviewed each output, the drafts came out at the same quality level Version 1 reached on its best day, only consistently and predictably. The human bandwidth per proposal dropped meaningfully; work that used to take a couple of hours took under an hour. Throughput rose.

Version 3, an agent. We wrote a loop. The loop watches our inboxes. When an RFP lands, an orchestrator spawns four sub-agents in parallel. One parses the RFP. One retrieves and ranks past proposals by similarity to the new request. One pulls our capabilities from our knowledge base. One estimates pricing and timelines. The orchestrator passes their outputs to a writer sub-agent that produces a draft. A separate critic sub-agent then scores the draft against four explicit criteria: voice match to our corpus, full coverage of every requirement in the RFP, clarity of the pricing section, and compliance with our preferred format. If any score falls below threshold, the critic returns specific revisions, the writer regenerates only the affected sections, and the loop runs again. When every score passes, the draft lands in the client folder and the system sends a notification. I review, I edit, I send. All SME edits feed back into the rules file so the next loop runs with the new lesson included.
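
The critic-and-writer cycle described above can be sketched in a few lines. This is a simplified illustration, not our production code: it assumes one draft section per criterion, a pass mark of 0.8 (the real threshold is not stated here), and stub functions in place of the critic and writer sub-agents.

```python
from typing import Callable

CRITERIA = ["voice", "coverage", "pricing_clarity", "format"]
THRESHOLD = 0.8  # assumed pass mark, chosen only for this example


def revise_until_pass(
    draft: dict[str, str],
    critic: Callable[[dict[str, str]], dict[str, float]],
    rewrite: Callable[[str, str], str],
    max_cycles: int = 5,
) -> tuple[dict[str, str], bool]:
    """Score the draft, regenerate only the failing sections, and repeat.

    Gives up after max_cycles scoring rounds and reports that it did not pass.
    """
    for _ in range(max_cycles):
        scores = critic(draft)
        failing = [c for c in CRITERIA if scores[c] < THRESHOLD]
        if not failing:
            return draft, True
        for criterion in failing:
            draft[criterion] = rewrite(criterion, draft[criterion])
    return draft, False


def stub_critic(draft: dict[str, str]) -> dict[str, float]:
    """Placeholder critic: a section passes once it has been revised."""
    return {c: 0.9 if "revised" in draft[c] else 0.5 for c in CRITERIA}


def stub_rewrite(criterion: str, text: str) -> str:
    """Placeholder writer: marks the section as revised."""
    return text + " (revised)"


draft = {c: f"{c} section" for c in CRITERIA}
draft["voice"] = "voice section (revised)"   # already good, must not be touched
final, passed = revise_until_pass(draft, stub_critic, stub_rewrite)
print(passed, final["voice"])
```

The section that already passed is left alone, and the cycle limit keeps a stubborn draft from looping forever.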

Version 3 is designed to make output quality comparable to Version 2: the critic encodes the same standards I would apply in a human review. It still needs human review, but the review starts later in the process and at a higher level. What is different is what the human is doing. Version 2 needs a person or multiple people for every cycle; Version 3 needs a person to design the rubric once and review finished drafts. The system runs on its own schedule, runs many proposals in parallel, and improves over time as the rules file accumulates lessons. Volume stopped being a function of headcount. It became a function of how many RFPs we choose to respond to.

Rubric-Driven Improvement

SME edits feed back into the rules file, so the next run starts with better instructions. Some loops can also improve their own prompts, tests, or procedures when the success criteria are objective. That is not magic and it is not guaranteed. It works only when the loop has a reliable verifier, regression checks, and a human-owned standard for what “better” means.

People With Agency Create Better Agents

An SME with agency produces good output. An SME with agency who has built the right rubric and spawned a fleet of agents produces a lot more good output, on more parallel fronts than any single human could supervise step by step. The quality comes from the SME’s opinion about what good looks like, and the scale comes from encoding that opinion into a loop the system can enforce on its own.

The Org Chart Is Now an Agent Chart

I run multiple terminal windows on my laptop and use tmux on my phone to manage them when I am away from a desk. At any given moment, several agents are working on something for me in parallel. Some are sub-agents inside an orchestrator. Some are independent loops running on a schedule. Some are one-off agents I spun up because I wanted a specific task handled while I was on a call.

This is heading toward everyone running a small organization of agents, with an org chart that looks a lot like the org chart of a small business. Some agents will be permanent staff (they run every day on a schedule, like the morning brief). Some will be specialists you call in for a project (a proposal writer, a deal evaluator). Some will be temps you spin up for an afternoon and shut down when you are done. Each one has a job description, a verification standard, and a budget. The work of the person in charge is to know what to assign to which agent, how to know when each agent has finished, and how to learn from each completed run.

Leading a workforce of agents sounds different from the leadership we grew up with, but the cognitive skills are the same. We are managing capability. We are setting standards. We are deciding when work is done.

What to Do Right Now

Pick one task you do every week with clear inputs and clear outputs. A weekly report. A status email. A standard contract review. A pricing analysis. Something repeatable, something with a discoverable definition of “done.”

If you can map the steps you would take to do it, build a workflow. You will own the plumbing. The throughput gain comes from the speed of automation; the quality comes from your design of the steps.

If the work is ambiguous enough that the steps would change from one instance to the next, write the loop and let the model own the plumbing. The throughput gain comes from running the loop on its own schedule with no human in the per-step path; the quality comes from your design of the rubric.

Either way, do not write a prompt. Specify the goal. Specify the tools. Specify what success looks like, in enough detail that the system can test for it without you. Specify what failure looks like and what should happen when it occurs. Specify when the work should stop.

Run it once. Read the output. Find what is wrong with it. Feed it back into the system. Run it again. Poof! You’re a loop engineer.

The step from this first loop to a working agent is smaller than you think. The step from one agent to forty-three is mostly a matter of repetition. The step from being someone who prompts to being someone who builds loops is the giant leap for AI. Write loops, not prompts.

Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.

Loop Engineering Example: A Daily Sector Intelligence Loop

Here’s an example working spec for a recurring loop that produces a daily internal briefing on a defined public-equity coverage universe.

This is not a prompt. It is not a fixed pipeline. It is an agentic loop with a goal, tools, memory, verification rules, and stopping conditions. The human specifies what good work looks like. The agent decides what to inspect next.

A workflow says: run these five steps every morning. A loop says: here is the objective, here are the tools, here is the definition of done, here is what failure looks like, now keep working until the brief is right or until you hit a stop condition.

The Goal

Every trading day, produce a one-page internal brief on a defined sector for a business analyst. The brief answers five questions:

What changed?
Why did it change?
Which companies matter today?
What is still unknown?
What should the analyst investigate next?

The brief lands in the analyst’s Drive folder by 6:30 a.m. ET, ready to read before the market opens.

The Coverage Universe

AI infrastructure coverage universe.

Tickers: NVDA, AMD, AVGO, MRVL, ANET, MU, ASML, TSM, ARM, ORCL, DELL, SMCI, CRDO, ALAB.

The ticker list lives in the rules file. The agent may recommend additions, removals, or watchlist changes, but it may not alter the coverage universe on its own.

What Makes This a Loop

The system does not run the same investigation every morning. It starts with a blank investigation ledger and builds the day’s work from evidence.

The loop follows a simple cycle:

Observe. Pull fresh signals from markets, filings, news, transcripts, calendars, and internal notes.
Decide. Rank open leads by materiality, uncertainty, and analyst relevance.
Act. Choose the next tool or spawn a temporary specialist agent.
Verify. Check each claim against sources, thresholds, and the analyst’s rubric.
Update. Close resolved leads, open new leads, mark gaps, and revise the draft.
Repeat. Continue until the brief passes the rubric or hits a stop condition.

That is the work the human used to do manually: notice, investigate, check, revise, decide whether enough is known. The loop does it inside explicit boundaries.
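
As a rough skeleton, the six-step cycle might look like the code below. Everything here is a placeholder: the alphabetical ranking stands in for real lead scoring, the string "investigated ..." stands in for a tool call, and RunState is a name invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    open_leads: list[str] = field(default_factory=list)
    verified: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)


def run_cycle(
    signals: list[str],
    can_verify: Callable[[str], bool],
    max_turns: int = 10,
) -> RunState:
    state = RunState(open_leads=list(signals))   # observe: fresh signals become leads
    for _ in range(max_turns):                   # repeat, up to a hard turn limit
        if not state.open_leads:
            break
        lead = min(state.open_leads)             # decide: placeholder ranking (alphabetical)
        finding = f"investigated {lead}"         # act: placeholder for a tool call
        if can_verify(lead):                     # verify: check the finding
            state.verified.append(finding)
        else:
            state.gaps.append(lead)              # unverified work becomes a gap
        state.open_leads.remove(lead)            # update: close the lead
    return state


result = run_cycle(["NVDA filing", "MRVL move"], lambda lead: "filing" in lead)
print(result.verified)   # ['investigated NVDA filing']
print(result.gaps)       # ['MRVL move']
```

Even in this toy form, the loop never drops a lead silently. It ends up either verified or recorded as a gap.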

The Tools

The loop may use:

A market-data API: prior close, premarket or latest available price, volume, relative volume, 52-week range, and historical volatility.
A news API with structured filtering: Bloomberg, Reuters, Benzinga, Polygon, FactSet, AlphaSense, or equivalent.
An SEC EDGAR and disclosure connector: annual, quarterly, current, and foreign-issuer filings as appropriate for each company.
An earnings-transcript and investor-presentation repository.
A corporate event calendar: earnings dates, investor days, product events, conference appearances, lockups, analyst days, and regulatory deadlines.
The firm’s internal research notes: analyst memos, prior briefs, thesis documents, call notes, and model-change logs.
A source registry: allowed sources, blocked sources, paywall rules, citation requirements, and source hierarchy.
A claim ledger: every generated claim, its source, confidence level, and verification status.
A rules file: coverage universe, thresholds, writing style, materiality rules, and standing instructions.
A lesson file: human-approved corrections from prior runs.
A writer model for synthesis.
A critic model for verification.
A final auditor model for source and format checks.
A Drive write API for publication.
A Slack, Teams, or Telegram API for notification.

The State Objects

The loop maintains four live objects during each run.

The investigation ledger tracks open leads. Each lead has a ticker, source, signal type, materiality score, uncertainty score, next action, and status.

The claim ledger tracks every sentence-level claim that may appear in the brief. No claim can enter the final draft unless the ledger links it to a source or marks it as an explicit uncertainty.

The gap ledger tracks unresolved questions, missing data, source conflicts, tool failures, and anything the analyst should not treat as settled.

The lesson file stores human-approved improvements from prior runs. The agent reads it at the start of each run but may not write durable lessons into it without analyst approval.

The state is what turns a prompt into a loop. The system knows what it has tried, what it has proved, what it still doubts, and when to stop.
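
The claim-ledger rule is simple enough to express directly. In this sketch the Claim class and its field names are assumptions for illustration, and the three sample claims are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source: Optional[str] = None     # link or reference, if any
    is_uncertainty: bool = False     # True when stated as an explicit unknown


def admissible(claim: Claim) -> bool:
    """A claim may enter the draft only if it is sourced or flagged as uncertain."""
    return bool(claim.source) or claim.is_uncertainty


def split_ledger(claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Return (claims allowed in the draft, claims held back)."""
    allowed = [c for c in claims if admissible(c)]
    held = [c for c in claims if not admissible(c)]
    return allowed, held


ledger = [
    Claim("MRVL fell premarket", source="market-data API"),
    Claim("The cause of the move is not identified", is_uncertainty=True),
    Claim("A downgrade drove the move"),          # no source, not flagged
]
allowed, held = split_ledger(ledger)
print([c.text for c in held])   # ['A downgrade drove the move']
```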

The Loop Logic

Trigger: 5:30 a.m. ET, every U.S. trading day.

At startup, the loop reads the rules file, lesson file, yesterday’s brief, yesterday’s gap ledger, and the current coverage universe.

It then performs a first-pass scan. This scan is not the brief. It is a search for leads.

The scan asks:

Which tickers moved beyond their normal range?
Which companies filed something new?
Which companies appeared in credible news?
Which themes appeared across multiple companies?
Which previous gaps now have new evidence?
Which internal analyst questions remain open?
Which source conflicts need resolution?

The agent turns those signals into the initial investigation ledger.

Then the loop begins.

Lead Selection

At each turn, the agent chooses the highest-value open lead.

A lead’s priority rises when:

The price move is large.
The news comes from a primary or high-trust source.
The event affects more than one company.
The topic maps to a standing analyst theme.
A source conflict exists.
The item changes a prior thesis.
The market reaction has no identified cause.
The lead connects to yesterday’s unresolved gaps.

A lead’s priority falls when:

The story is a republication.
The item is routine.
The claim cannot be sourced.
The affected company is outside the coverage universe.
The news is old, promotional, or immaterial.

This is where the loop earns its keep. It does not summarize everything. It decides what deserves attention.
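
A hedged sketch of how that ranking could be computed follows. The weights are invented for illustration, only a subset of the factors above is modeled, and in a real system the numbers would live in the rules file.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    ticker: str
    large_move: bool = False
    primary_source: bool = False
    multi_company: bool = False
    unexplained_move: bool = False
    republication: bool = False
    routine: bool = False
    in_universe: bool = True


# Illustrative weights only; these values are assumptions, not part of the spec.
RAISES = {"large_move": 3, "primary_source": 2, "multi_company": 2, "unexplained_move": 3}
LOWERS = {"republication": 3, "routine": 2}
OUT_OF_UNIVERSE_PENALTY = 5


def priority(lead: Lead) -> int:
    """Add the weights of factors that raise priority, subtract those that lower it."""
    score = sum(w for name, w in RAISES.items() if getattr(lead, name))
    score -= sum(w for name, w in LOWERS.items() if getattr(lead, name))
    if not lead.in_universe:
        score -= OUT_OF_UNIVERSE_PENALTY
    return score


def next_lead(leads: list[Lead]) -> Lead:
    """Pick the highest-value open lead."""
    return max(leads, key=priority)


leads = [
    Lead("MRVL", large_move=True, unexplained_move=True),
    Lead("NVDA", primary_source=True, routine=True),
    Lead("XYZ", large_move=True, in_universe=False),   # hypothetical off-list ticker
]
print(next_lead(leads).ticker)   # MRVL
```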

Dynamic Specialist Agents

The loop may spawn temporary specialist agents, but only when a lead requires them. There is no fixed daily roster.

Examples:

Price-Move Investigator. Used when a ticker moves beyond threshold and the cause is unclear. It checks news, filings, peer moves, index moves, factor moves, and recent analyst notes. If it cannot find a cause, it must say so.

Disclosure Reader. Used when a company files a new disclosure. It extracts what changed, what numbers matter, what language differs from prior filings, and whether the filing creates an analyst question.

Theme Mapper. Used when one story may affect several companies. For example: export controls, memory pricing, hyperscaler capex, supply constraints, margin pressure, power availability, packaging capacity, or customer concentration.

Transcript Reader. Used when a new earnings call, investor-day transcript, or conference transcript appears. It extracts management commentary on demand, supply, pricing, capex, gross margin, backlog, customer concentration, and guidance.

Source Adjudicator. Used when credible sources disagree. It ranks the sources, identifies the contradiction, and decides whether the brief can resolve it or must mark it as unresolved.

Internal-Research Comparator. Used when new information may change the firm’s existing thesis. It compares today’s evidence with prior analyst notes and flags conflicts.

Each specialist receives a narrow job, writes a structured result into the investigation ledger, and exits. The orchestrator decides whether the result closes the lead, opens a new lead, or requires another tool call.
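
The "spawn only when a lead requires it" rule can be sketched as a lookup from signal type to specialist. The two specialist functions below are empty placeholders, and the signal-type strings are assumptions chosen for this example.

```python
from typing import Callable


def price_move_investigator(lead: dict) -> dict:
    """Placeholder: a real specialist would call market-data and news tools."""
    return {"ticker": lead["ticker"], "specialist": "price_move", "cause": None}


def disclosure_reader(lead: dict) -> dict:
    """Placeholder: a real specialist would read and diff the filing."""
    return {"ticker": lead["ticker"], "specialist": "disclosure", "changes": []}


# Hypothetical mapping from signal type to the specialist that handles it.
SPECIALISTS: dict[str, Callable[[dict], dict]] = {
    "unusual price move": price_move_investigator,
    "new filing": disclosure_reader,
}


def dispatch(lead: dict, ledger: list[dict]) -> bool:
    """Spawn a specialist only if the lead's signal type calls for one.

    The specialist writes one structured result into the ledger and exits.
    """
    specialist = SPECIALISTS.get(lead["signal"])
    if specialist is None:
        return False
    ledger.append(specialist(lead))
    return True


ledger: list[dict] = []
dispatch({"ticker": "MRVL", "signal": "unusual price move"}, ledger)
dispatch({"ticker": "AMD", "signal": "routine news"}, ledger)   # no specialist needed
print(len(ledger))   # 1
```

A result with cause set to None is the structured form of "if it cannot find a cause, it must say so."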

Example Lead Path

Suppose MRVL is down 3.8% premarket and no primary-source news explains the move.

The loop opens a lead:

Ticker: MRVL
Signal: unusual price move
Known cause: none
Priority: high
Next action: investigate source of move

The Price-Move Investigator checks the market-data API, news API, filings, peer moves, index moves, and internal notes.

It finds three possibilities: a peer read-through, a negative analyst note, and a broader semiconductor selloff. The analyst note is behind a source the firm does not license. The peer read-through appears in two credible sources. The broader selloff is real but smaller than MRVL’s move.

The Source Adjudicator marks the analyst note as “not usable,” the peer read-through as “supported,” and the broad selloff as “partial context.”

The claim ledger now permits this sentence:

“MRVL underperformed the group premarket; available sources point to peer read-through and sector weakness, but the loop did not identify a company-specific primary-source catalyst.”

The gap ledger adds:

“Check whether sell-side downgrade note is available through licensed research terminal.”

The brief gets a useful sentence and a useful analyst question. It does not invent certainty.

The Brief Format

The final brief follows the same format every day, even though the investigation path changes.

Sector summary. Three to four sentences on what changed and why it matters.
Top themes. Maximum five bullets. Each theme must touch at least two companies or one company with sector-level implications.
Company watch. Ordered by materiality, not alphabetically. Every ticker receives a status: material update, minor update, no material update, or unresolved.
Market moves without identified cause. Any large move that lacks a credible explanation gets called out explicitly.
Source conflicts and data gaps. Anything the loop could not verify.
Outstanding analyst questions. Maximum seven. Every question must name a ticker, source, event, number, date, or unresolved claim.
Source appendix. Links or references for every source used in the claim ledger.

The Rubric

A critic agent scores the draft against six criteria.

Coverage. Every ticker in the coverage universe has a status. No company disappears because the agent found nothing interesting. Pass test: ticker set in draft equals ticker set in rules file.
Evidence. Every factual claim has a source in the claim ledger. Unsupported claims are removed or converted into explicit uncertainty. Pass test: every sentence with a factual assertion maps to a source, a calculation, or a gap flag.
Materiality. The brief separates signal from noise. Routine items do not crowd out market-moving items. Pass test: every top item explains why it matters to the analyst.
Causality discipline. Price moves without confirmed causes are not explained with fake confidence. Pass test: every flagged move has a named cause, partial cause, or “no identified cause” label.
Analyst utility. The questions are investigable. They are not generic. Pass test: every question contains at least one ticker, source, number, filing, named entity, event, or date.
Format. The brief is one page, uses the standard section order, and includes data quality flags. Pass test: the final auditor can parse every required section.

If the draft fails, the critic does not simply say “make it better.” It returns a structured failure:

Failed criterion
Affected section
Specific claim or missing item
Required next action: gather more evidence, rewrite, remove, escalate, or mark as unresolved

The loop then decides what to do next.
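
Two of the pass tests are mechanical enough to sketch in code. The Failure class mirrors the four fields of the structured failure above. The question check is deliberately simplified: it looks only for a covered ticker or a digit, not for the full list of sources, filings, entities, events, and dates.

```python
import re
from dataclasses import dataclass


@dataclass
class Failure:
    criterion: str
    section: str
    item: str
    next_action: str


def check_coverage(draft_tickers: set[str], universe: set[str]) -> list[Failure]:
    """Coverage passes only when the draft's ticker set equals the universe."""
    failures = [Failure("coverage", "company watch", t, "gather more evidence")
                for t in sorted(universe - draft_tickers)]
    failures += [Failure("coverage", "company watch", t, "remove")
                 for t in sorted(draft_tickers - universe)]
    return failures


def check_questions(questions: list[str], universe: set[str]) -> list[Failure]:
    """Simplified utility test: a question must name a covered ticker or a number."""
    failures = []
    for q in questions:
        tokens = set(re.findall(r"[A-Z]{2,5}", q))
        if not (tokens & universe) and not re.search(r"\d", q):
            failures.append(Failure("analyst utility", "questions", q, "rewrite"))
    return failures


universe = {"NVDA", "AMD", "MRVL"}          # shortened universe for the example
print(check_coverage({"NVDA", "AMD"}, universe))
print(check_questions(["What is next for the sector?", "Did MRVL guide lower?"], universe))
```

The first call reports MRVL as missing from the company watch. The second flags only the generic question.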

Revision Logic

When the critic finds a problem, the loop does not automatically rewrite the whole brief.

It chooses one of five actions, matching the next actions the critic can request:

Gather more evidence when a claim is plausible but under-sourced.
Rewrite when the evidence is sufficient but the prose is unclear.
Remove when the claim is unsupported or immaterial.
Escalate when the issue requires human judgment, a licensed source, or a source the loop cannot access.
Mark as unresolved when the evidence stays partial or conflicting and the brief should say so.

This prevents the most common agent failure: polishing bad information until it sounds true.
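
The decision rule can be sketched as a small function that uses the action names from the critic's structured failure. The support labels ("full", "partial", "none") and the budget flag are assumptions introduced for this example.

```python
def choose_action(
    support: str,              # "full", "partial", or "none" (hypothetical labels)
    material: bool,
    clear: bool,
    needs_human: bool = False,
    budget_left: bool = True,
) -> str:
    """Map a flagged claim to the next revision action."""
    if needs_human:
        return "escalate"
    if not material or support == "none":
        return "remove"
    if support == "partial":
        # Plausible but under-sourced: dig further only while budget remains.
        return "gather more evidence" if budget_left else "mark as unresolved"
    if not clear:
        return "rewrite"
    return "keep"


print(choose_action("partial", material=True, clear=True))    # gather more evidence
print(choose_action("full", material=True, clear=False))      # rewrite
print(choose_action("none", material=True, clear=True))       # remove
```

Note the order of the checks. Rewriting comes last, so prose is polished only after the evidence question has been settled.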

Stop Conditions

The loop exits when one of the following occurs:

The critic and auditor both pass the brief.
The clock reaches 6:25 a.m. ET.
The loop hits its tool-call or cost budget.
A primary data source is unavailable.
A high-severity contradiction cannot be resolved.
A required source is blocked by licensing rules.
The loop reaches five revision cycles without convergence.

If the loop exits without a clean pass, it still ships the best available brief, but the top line must say:

“Partial brief: unresolved gaps remain.”

The gap ledger appears immediately below the sector summary.

No quiet failures. No hidden uncertainty. No infinite loops.
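
Here is a minimal sketch of the exit check, covering five of the seven conditions above. The tool-call budget of 200 is an assumed number, the clock is assumed to already be in Eastern Time, and LoopStatus is a name invented for the example.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class LoopStatus:
    now: time                        # current wall-clock time, assumed to be ET
    passed_critic: bool = False
    passed_auditor: bool = False
    tool_calls: int = 0
    revision_cycles: int = 0
    primary_source_up: bool = True


CUTOFF = time(6, 25)
MAX_TOOL_CALLS = 200                 # assumed budget; the spec does not give a number
MAX_REVISIONS = 5


def stop_reason(s: LoopStatus) -> Optional[str]:
    """Return why the loop should exit, or None to keep going."""
    if s.passed_critic and s.passed_auditor:
        return "clean pass"
    if s.now >= CUTOFF:
        return "cutoff reached"
    if s.tool_calls >= MAX_TOOL_CALLS:
        return "budget exhausted"
    if not s.primary_source_up:
        return "primary source unavailable"
    if s.revision_cycles >= MAX_REVISIONS:
        return "no convergence"
    return None


def top_line(reason: str) -> str:
    """Any exit other than a clean pass must be announced at the top of the brief."""
    return "" if reason == "clean pass" else "Partial brief: unresolved gaps remain."


print(stop_reason(LoopStatus(now=time(5, 45))))                      # None
print(stop_reason(LoopStatus(now=time(6, 25))))                      # cutoff reached
print(top_line("cutoff reached"))
```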

Persistence

The loop remembers what helps future runs and forgets what would pollute them.

The rules file remembers:

Coverage universe
Materiality thresholds
Source hierarchy
Blocked sources
Preferred brief structure
Standing analyst themes
Known false positives
Escalation rules
Budget limits
Holiday calendar

The state file remembers:

Yesterday’s brief
Yesterday’s unresolved gaps
Recent theme history
Recent unusual price moves
Recent source conflicts
Rubric failures from prior runs

The lesson file remembers only human-approved corrections, such as:

“Do not treat routine Form 144 filings as material unless insider-sale thresholds are met.”
“Always check hyperscaler capex commentary when NVDA, AVGO, ANET, or DELL move on the same morning.”
“When ASML or TSM appears in the brief, identify whether the signal came from U.S. filings, local-market disclosures, or company materials.”
“Do not attribute a premarket move to an analyst note unless the note is available through a licensed source.”

The agent may propose new lessons at the end of a run. The analyst decides whether they become durable memory.
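
The approval gate on durable memory can be modeled as two lists with one rule between them. The LessonFile class and its method names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LessonFile:
    approved: list[str] = field(default_factory=list)   # durable, read at startup
    proposed: list[str] = field(default_factory=list)   # waiting for the analyst

    def propose(self, lesson: str) -> None:
        """The agent may only queue a lesson; it cannot make it durable."""
        if lesson not in self.approved and lesson not in self.proposed:
            self.proposed.append(lesson)

    def review(self, lesson: str, approve: bool) -> None:
        """The analyst approves or rejects each queued lesson."""
        self.proposed.remove(lesson)
        if approve:
            self.approved.append(lesson)


lessons = LessonFile()
lessons.propose("Check peer moves before naming a company-specific cause.")
lessons.propose("Treat every premarket move as material.")
lessons.review("Check peer moves before naming a company-specific cause.", approve=True)
lessons.review("Treat every premarket move as material.", approve=False)
print(lessons.approved)
```

Only the first lesson survives. The rejected one leaves no trace in what the next run reads.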

Scheduling

Trigger: 5:30 a.m. ET on U.S. trading days.
Cutoff: 6:25 a.m. ET.
Publication target: 6:30 a.m. ET.
Manual override: the analyst can trigger the loop at any time with a focus instruction, such as “Run only on NVDA, AVGO, and hyperscaler capex” or “Re-run after Powell remarks.”
Holiday calendar: skip U.S. market holidays unless manually triggered.
Emergency mode: if a major event occurs outside the normal schedule, the analyst can run a focused version that produces an event brief instead of the full daily brief.
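
The schedule rules above reduce to a small gate. In this sketch the holiday set holds a single sample entry (New Year's Day 2026) in place of the real holiday calendar from the rules file, and timestamps are assumed to already be in Eastern Time.

```python
from datetime import date, datetime, time

# Sample entry only; the real holiday calendar lives in the rules file.
HOLIDAYS: frozenset[date] = frozenset({date(2026, 1, 1)})
TRIGGER = time(5, 30)
CUTOFF = time(6, 25)


def is_trading_day(d: date, holidays: frozenset[date] = HOLIDAYS) -> bool:
    """Weekdays that are not on the holiday calendar."""
    return d.weekday() < 5 and d not in holidays


def should_start(now: datetime, manual: bool = False) -> bool:
    """Decide whether to start a run. `now` is assumed to be in Eastern Time."""
    if manual:
        return True                  # manual override ignores the calendar
    return is_trading_day(now.date()) and TRIGGER <= now.time() < CUTOFF


print(should_start(datetime(2026, 1, 2, 5, 30)))               # True: a Friday
print(should_start(datetime(2026, 1, 1, 5, 30)))               # False: holiday
print(should_start(datetime(2026, 1, 3, 9, 0), manual=True))   # True: manual trigger
```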

What the Analyst Does

The analyst does not supervise every step. The analyst owns the standard.

The analyst:

Defines the coverage universe.
Sets materiality thresholds.
Approves sources and blocked sources.
Writes the rubric.
Reviews the final brief.
Corrects misses.
Approves or rejects proposed lessons.
Updates the rules when the loop drifts.

The analyst is in the loop at the level of judgment, not keystrokes.

How the Vocabulary Maps to This Example

Task. A single model call or tool call. Examples: classify this news item, extract capex commentary from this transcript, summarize this filing section, compute whether today’s move exceeds threshold.
Workflow. A predefined sequence inside the system. Examples: final audit, source check, formatting pass, Drive publication, notification.
Agent. The investigator that chooses which lead to pursue next, which tool to use, which specialist to spawn, and whether it has enough evidence to stop.
Loop. Observe, decide, act, verify, update, repeat.
Rubric. The encoded judgment of the analyst. It defines what counts as useful, sourced, material, and done.
Persistence. The memory that survives one run and improves the next one without turning speculation into fact.

Why This Is Loop Engineering

The human did not write: “Give me a daily sector brief.” The human wrote the operating system for the brief. The loop knows the goal. It knows the tools. It knows what evidence counts. It knows what uncertainty looks like. It knows when to ask for help. It knows when to stop. Prompting asks the model for an answer. Loop engineering designs the conditions under which the system can keep working until the answer is good enough to use.