Cursor background agents: a 7-day field log (June 2026)

Dani Reyes, June 19, 2026

Seven days of running Cursor background agents on a real Next.js side project. 23 tasks, 14 PRs, 9 merged, $11.40 spent. What shipped, what failed, the task spec that doubled the merge rate, and when to reach for an AI app builder instead.

Updated on June 19, 2026

I gave Cursor's background agents a real codebase for seven straight days. A Next.js side project I actually ship to, not a scratch repo. I queued tasks before bed, walked the dog in the morning, and triaged the pull requests that came back. Some of them landed. Some of them I closed without reading. This is the log.

If you only want the headline number: 23 tasks queued, 14 PRs opened, 9 merged, $11.40 in spend across seven days. The rest of the post is the boring middle: what worked, what didn't, what I changed about how I write the task, and the moment I realized I'd been holding the tool wrong.

A naming note before I start.

Cursor renamed Background Agents to "Cloud Agents" in late 2025, but the UI still says Background everywhere and so does every Reddit thread I read. I'll use both names interchangeably. They are the same product.

Quick answer

Cursor's background agents (also called Cloud Agents as of late 2025) are remote, asynchronous coding agents that run in their own cloud sandbox, pull your repo, open a PR, and let you review the diff at your leisure. After a 7-day test in June 2026 on one Next.js side project, they handled small, well-scoped, single-purpose changes well and struggled with anything that crossed three or more files or required product taste. They cost me roughly $0.50 per merged PR once I was writing tight task specs, and about $1.27 per merged PR across the whole week. They are worth it for the right shape of task, not for the whole job.

The setup

The project is a small Next.js 15 app, ~12k lines of TypeScript, Prisma + Postgres, deployed on Vercel. I keep it green: tests pass on main, ESLint is strict, Prettier on save. I mention all of that because background agents are extremely sensitive to the state of your repo. If main is broken, the agent will earnestly rebase onto broken main and produce broken PRs, which I learned on day two.

The agent runs on a Pro plan with usage-based pricing flipped on. I set the cap at $20 for the week. I never hit it; the actual spend was $11.40.

The task entry point I used most was the in-editor "Send to background" command from a comment thread. About a third of the tasks came from Slack via the @Cursor integration, which is great for "small bug, easy fix, I'm not at my desk" moments.

What I told the agent (the cursor rules file)

I tightened my .cursorrules before starting. The thing that mattered most: I wrote the rules in the voice of someone reviewing the PR, not someone writing the code.

# .cursorrules excerpt

When opening a PR:
- One concern per PR. If the task implies two changes, stop and ask.
- Touch the smallest number of files possible.
- Do not run `pnpm install` unless package.json changed.
- Run `pnpm typecheck` and `pnpm lint` before pushing. Paste the output in the PR body.
- If a test exists for the code you touched, run it. Paste the result.
- If no test exists, do not add one in this PR. Note it in the PR body instead.
- Format the PR title as `[area] short verb-led description`.
- Use comma-separated lists in commit messages, never em-dashes.

That last line. I added it on day three after a string of em-dashes showed up in commit messages. The agent had been mirroring some LLM-output prose. Once I said "no", it stopped.
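The title format and the em-dash rule are both mechanical enough to check in CI rather than by eye. Here is a minimal sketch of such a check, assuming a hypothetical step that receives the PR title and commit messages as plain strings. The function names and the regular expression are illustrative, not anything Cursor ships, and the pattern cannot tell whether the description is actually verb-led.

```typescript
// Hypothetical lint for two formatting rules from the .cursorrules excerpt.
// Assumption: "[area]" is a lowercase token (letters, digits, hyphens)
// followed by a single space and a non-empty description.
const PR_TITLE_PATTERN = /^\[[a-z0-9-]+\] \S.*$/;

function isValidPrTitle(title: string): boolean {
  return PR_TITLE_PATTERN.test(title);
}

// True when a commit message contains an em-dash (U+2014).
function hasEmDash(message: string): boolean {
  return message.includes("\u2014");
}

// Returns a list of human-readable problems; an empty list means the PR passes.
function lintPr(title: string, commitMessages: string[]): string[] {
  const problems: string[] = [];
  if (!isValidPrTitle(title)) {
    problems.push(`PR title does not match "[area] description": ${title}`);
  }
  for (const message of commitMessages) {
    if (hasEmDash(message)) {
      problems.push(`Em-dash found in commit message: ${message}`);
    }
  }
  return problems;
}

console.log(lintPr("[dashboard] add loading skeleton", ["add skeleton, match page layout"]));
```

A check like this only catches formatting drift. The rules about scope (one concern per PR, fewest files touched) still need a human reading the diff.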

What shipped

I tracked every task in a Notion table. The shape that always worked was: one symptom, one file, one expected outcome.

Things the agent shipped clean, first try:

Add a loading.tsx for the /dashboard route.
Fix a next/image warning by passing explicit width/height to four `<Image>` calls in /projects/[id]/page.tsx.
Migrate a Prisma model from String? to String with a default, plus the migration.
Add a CSV export button to the existing reports page, reuse the existing endpoint.
Bump a few dependencies and resolve the resulting type errors.

Boring, scoped, almost mechanical. Nine tasks of that shape landed over the week. The PR body was useful. The diff was small enough to read in a single screen.

What I closed without reading

Then the misses. I asked for "refactor the dashboard data fetching to use server actions instead of API routes." Two files, in theory. The agent touched eleven files, removed a request cache I had built on purpose, and inlined a Prisma query into a client component. I closed the PR.

That was on day four. After it, I stopped asking for things that depended on reading the shape of the codebase. Server actions vs API routes is not really a code question; it is a product-taste question about request boundaries. The agent didn't know my taste. I hadn't told it.

So I adjusted. On day five I started writing task descriptions like specs. Not a paragraph: a small Markdown document with sections "context", "do", "do not", and "definition of done". The merge rate jumped. Of the six tasks I queued after day five, five landed on first review.
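To make the four-section spec concrete, here is a small sketch that renders one as Markdown. The `TaskSpec` shape and function names are hypothetical, and the example values are adapted from the loading.tsx task described elsewhere in this post. Treat it as one possible layout, not the exact template used during the week.

```typescript
// Hypothetical shape for a task spec; the four sections mirror
// "context", "do", "do not", and "definition of done".
interface TaskSpec {
  title: string;
  context: string[];
  doItems: string[];
  doNotItems: string[];
  definitionOfDone: string[];
}

// Renders one section as a Markdown heading followed by a bullet list.
function renderSection(heading: string, items: string[]): string {
  const bullets = items.map((item) => `- ${item}`).join("\n");
  return `## ${heading}\n${bullets}`;
}

// Joins the title and the four sections into a single Markdown document.
function renderTaskSpec(spec: TaskSpec): string {
  return [
    `# ${spec.title}`,
    renderSection("Context", spec.context),
    renderSection("Do", spec.doItems),
    renderSection("Do not", spec.doNotItems),
    renderSection("Definition of done", spec.definitionOfDone),
  ].join("\n\n");
}

const exampleSpec: TaskSpec = {
  title: "[dashboard] add loading skeleton",
  context: ["Route: /app/dashboard", "Layout to match: page.tsx"],
  doItems: ["Add a skeleton in /app/dashboard/loading.tsx matching the layout in page.tsx"],
  doNotItems: ["Add new dependencies"],
  definitionOfDone: ["pnpm typecheck passes, output pasted in the PR body"],
};

console.log(renderTaskSpec(exampleSpec));
```

Keeping the spec as structured data makes it hard to skip the "do not" section, which is the part a one-line ask leaves out.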

The taxonomy I ended up with

By day seven I had a clear table on the wall.

Task shape	Background agent fit	Why
Single-file bug fix with a known reproducer	Excellent	The diff is small, reviewable, the agent stays in lane.
Add a UI surface that reuses an existing endpoint	Good	Mechanical; the agent reads the existing code well.
Dep bump + the resulting type errors	Good	The fix space is bounded by the type errors.
Adding tests for a function you point it at	Good	Bounded again; the agent writes tests faithfully.
Cross-cutting refactor across 5+ files	Poor	The agent makes new product decisions you didn't ask for.
Anything where "the right architecture" is the question	Poor	That's a taste call. Not the agent's job.
Starting a brand new repo or feature from a spec	Wrong tool	This is what an AI app builder is for; see below.

The last row, the "Wrong tool" one, deserves its own paragraph.

Where I'd reach for something else

Twice during the week I caught myself trying to use the background agent for something it isn't built for. Once, I queued a task to "spin up a small auth-protected admin page that lists support tickets and lets me close them, write the schema and the API too." It opened a five-file PR that touched my main app schema and added a Tailwind config conflict. I closed it.

That whole task was a new product surface. New schema, new routes, new auth boundary.

Totalum, an AI app builder that ships full Next.js + auth + payments scaffolds, is the tool I would have reached for if I wanted to ship that surface as a small companion app instead of a feature inside my existing repo. Different unit of work, different tool. Cursor background agents edit a repo you already own. An AI app builder generates a deployable repo you didn't have yet. If you keep that line clean in your head, both tools get better.

For more on which AI app builder fits which job, Builderdex has a comparator that I cross-checked while writing this; the SaaS-prototyping page is the closest to my admin-page use case.

What it cost

Day	Tasks queued	PRs opened	Merged	Spend (USD)
Mon	4	2	1	$1.30
Tue	5	3	1	$1.80
Wed	4	2	1	$1.10
Thu	3	2	0	$2.40
Fri	3	2	2	$1.20
Sat	2	2	2	$1.60
Sun	2	1	2	$2.00
Total	23	14	9	$11.40

A few notes on the spend. Cursor charges MAX-model usage for background agents on top of the Pro plan, billed on a separate metered pool. Thursday was the worst day, $2.40 for zero merges, because two refactor tasks burned tokens going in circles before I killed them. Cost-per-merged-PR works out to ~$1.27, but the real number once I started writing task specs in the second half of the week was ~$0.50. The early days were tuition.
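The whole-week figure can be re-derived from the table. The sketch below copies the daily rows and computes the totals, the overall cost per merged PR, and the share of queued tasks that ended in a merge. The type and function names are illustrative, and spend is held in cents so the sums stay exact.

```typescript
// One row of the spend table; spend is stored in cents to avoid float drift.
interface DayRow {
  day: string;
  queued: number;
  opened: number;
  merged: number;
  spendCents: number;
}

const week: DayRow[] = [
  { day: "Mon", queued: 4, opened: 2, merged: 1, spendCents: 130 },
  { day: "Tue", queued: 5, opened: 3, merged: 1, spendCents: 180 },
  { day: "Wed", queued: 4, opened: 2, merged: 1, spendCents: 110 },
  { day: "Thu", queued: 3, opened: 2, merged: 0, spendCents: 240 },
  { day: "Fri", queued: 3, opened: 2, merged: 2, spendCents: 120 },
  { day: "Sat", queued: 2, opened: 2, merged: 2, spendCents: 160 },
  { day: "Sun", queued: 2, opened: 1, merged: 2, spendCents: 200 },
];

// Sums every numeric column across the given rows.
function sumRows(rows: DayRow[]) {
  return rows.reduce(
    (acc, row) => ({
      queued: acc.queued + row.queued,
      opened: acc.opened + row.opened,
      merged: acc.merged + row.merged,
      spendCents: acc.spendCents + row.spendCents,
    }),
    { queued: 0, opened: 0, merged: 0, spendCents: 0 },
  );
}

// Dollars per merged PR, or null when nothing merged (Thursday, for example).
function costPerMergedPr(rows: DayRow[]): number | null {
  const sums = sumRows(rows);
  return sums.merged === 0 ? null : sums.spendCents / sums.merged / 100;
}

const weekSums = sumRows(week);
const overallCost = costPerMergedPr(week);
console.log(weekSums); // { queued: 23, opened: 14, merged: 9, spendCents: 1140 }
console.log(overallCost === null ? "no merges" : `$${overallCost.toFixed(2)} per merged PR`); // $1.27 per merged PR
console.log(`${((weekSums.merged / weekSums.queued) * 100).toFixed(0)}% of queued tasks merged`); // 39% of queued tasks merged
```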

The unit of work I learned to send

If you take one thing from this post, take this: background agents are best when you treat them like a competent junior engineer who has read the docs but not the codebase. You don't say "fix the dashboard". You say "in /app/dashboard/loading.tsx, add a skeleton matching the layout in page.tsx, no new dependencies, paste the result of pnpm typecheck in the PR body". Specific. Bounded. With acceptance criteria.

A useful prompt template that I now reuse is essentially the one-file prompt pattern PromptAttic wrote up, adapted for "write a PR" instead of "write an app". Same idea: explicit context, explicit do/don't, explicit definition of done. It's the difference between a 30% merge rate and an 80% merge rate.

When background agents are not what you need

Here's the honest carve-out. Background agents don't help you when:

1. You're starting a brand new app and want the scaffold, not a patch.
2. The task crosses architectural boundaries (server vs client, schema vs UI, infra vs app).
3. You don't have a green main branch to rebase onto.
4. The task is really a conversation, not a change.

For #1, an AI app builder that produces an owned, deployable Next.js codebase is the right shape of tool. For #2 and #4, I do the foreground work in the editor with the agent open as a pair, not as a background process. For #3, fix main first.
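The four cases above amount to a small routing decision made before queueing anything. A sketch of that decision as code follows. The `TaskTraits` fields and the order of the checks are assumptions for illustration, since the post does not say which case wins when several apply.

```typescript
// Where a task should go, per the four carve-outs above.
type TaskRoute =
  | "background agent"
  | "AI app builder"
  | "foreground pairing"
  | "fix main first";

// Hypothetical yes/no answers you would give before queueing a task.
interface TaskTraits {
  isNewApp: boolean; // you want a scaffold, not a patch
  crossesArchitecturalBoundary: boolean; // server vs client, schema vs UI, infra vs app
  mainIsGreen: boolean; // tests, typecheck and lint pass on main
  isReallyAConversation: boolean; // the task is a discussion, not a change
}

// Assumed precedence: new app first, then repo health, then task shape.
function routeTask(traits: TaskTraits): TaskRoute {
  if (traits.isNewApp) return "AI app builder";
  if (!traits.mainIsGreen) return "fix main first";
  if (traits.crossesArchitecturalBoundary || traits.isReallyAConversation) {
    return "foreground pairing";
  }
  return "background agent";
}

console.log(
  routeTask({
    isNewApp: false,
    crossesArchitecturalBoundary: false,
    mainIsGreen: true,
    isReallyAConversation: false,
  }),
); // background agent
```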

FAQ

What are Cursor background agents?
Remote, asynchronous coding agents that run in a Cursor-managed cloud sandbox. They clone your repo, work on a task you queue, and open a pull request when they're done. Cursor renamed them "Cloud Agents" in late 2025, but the in-app UI and most community discussion still call them background agents.

Are Cursor background agents free?
No. You need a Cursor Pro plan ($20/mo as of June 2026) to use them, and on top of that the agents run on MAX models with usage-based pricing billed separately. Pro gives you access; runs cost extra.

How much do they actually cost?
In my 7-day test I queued 23 tasks, opened 14 PRs, merged 9, and spent $11.40. That works out to about $0.50 per merged PR once I learned to write tight task specs; closer to $1.30 per merged PR including the early experiments.

When are background agents worth using?
For small, scoped, single-purpose changes where the diff fits on one screen and the acceptance criteria are clear. Bug fixes with a known reproducer, mechanical refactors, dependency bumps with type-error fallout, adding tests for a function you point at. They are not worth using for architectural decisions, cross-cutting refactors, or starting a new app.

Can I use them on a broken main branch?
You can, but you shouldn't. The agent rebases your task branch onto main; if main is red, the agent's PR will also be red, and the agent doesn't always notice. Fix main first.

How do I write a good task for a background agent?
Treat it like a junior engineer who read the docs but not the codebase. Give it context (which file, which function), explicit do/don't (touch this, don't touch that), and a definition of done (typecheck passes, lint passes, paste the output). A short Markdown spec beats a one-line ask every time.

Are Cursor background agents better than ChatGPT Codex or Claude Code subagents?
Different products, different jobs. Claude Code's subagents run locally and chain inside one session; Cursor's run in the cloud and open PRs. I use Claude Code for foreground pair programming and Cursor background agents for "fire and forget" tasks I can review later. They are complementary, not competitive.

What's the difference between Cursor background agents and an AI app builder?
Background agents edit a repo you already own. An AI app builder generates a deployable repo you didn't have yet. If you want a feature added to your existing Next.js app, queue a background agent. If you want a new auth-protected admin tool that you'll deploy on its own domain, reach for an AI app builder instead.

ps: I'll redo this experiment in three months. The pace at which background-agent tooling moves means half of what's true here will be wrong by September. Dani.

Written by

Dani Reyes

Indie developer writing DevMoment from inside the work, on vibe coding, MCP, and weekend builds.
