Cursor background agents: a 7-day field log (June 2026)
Seven days of running Cursor background agents on a real Next.js side project. 23 tasks, 14 PRs, 9 merged, $11.40 spent. What shipped, what failed, the task spec that doubled the merge rate, and when to reach for an AI app builder instead.
Updated on June 19, 2026

I gave Cursor's background agents a real codebase for seven straight days. A Next.js side project I actually ship to, not a scratch repo. I queued tasks before bed, walked the dog in the morning, and triaged the pull requests that came back. Some of them landed. Some of them I closed without reading. This is the log.
If you only want the headline number: 23 tasks queued, 14 PRs opened, 9 merged, $11.40 in spend across seven days. The rest of the post is the boring middle: what worked, what didn't, what I changed about how I write the task, and the moment I realized I'd been holding the tool wrong.
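A naming note before I start: Cursor's own name for these is now Cloud Agents, but I say "background agents" throughout because the in-app UI and most community discussion still do.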
Quick answer
Cursor's background agents (renamed Cloud Agents in late 2025) are remote, asynchronous coding agents that run in their own cloud sandbox, clone your repo, open a PR, and let you review the diff at your leisure. After a 7-day test in June 2026 on one Next.js side project, they handled small, well-scoped, single-purpose changes well and struggled with anything that crossed three or more files or required product taste. They cost me about $1.27 per merged PR over the whole week, and roughly $0.50 once I started writing tight task specs. They are worth it for the right shape of task, not for the whole job.
The setup
The project is a small Next.js 15 app, ~12k lines of TypeScript, Prisma + Postgres, deployed on
The agent runs on a Pro plan with usage-based pricing flipped on. I set the cap at $20 for the week. I never hit it; the actual spend was $11.40.
The task entry point I used most was the in-editor "Send to background" command from a comment thread. About a third of the tasks came from the @Cursor integration, which is great for "small bug, easy fix, I'm not at my desk" moments.
What I told the agent (the cursor rules file)
I tightened my .cursorrules before starting. The thing that mattered most: I wrote the rules in the voice of someone reviewing the PR, not someone writing the code.
# .cursorrules excerpt
When opening a PR:
- One concern per PR. If the task implies two changes, stop and ask.
- Touch the smallest number of files possible.
- Do not run `pnpm install` unless package.json changed.
- Run `pnpm typecheck` and `pnpm lint` before pushing. Paste the output in the PR body.
- If a test exists for the code you touched, run it. Paste the result.
- If no test exists, do not add one in this PR. Note it in the PR body instead.
- Format the PR title as `[area] short verb-led description`.
- Use comma-separated lists in commit messages, never em-dashes.
That last line. I added it on day three after a string of em-dashes showed up in commit messages. The agent had been mirroring some LLM-output prose. Once I said "no", it stopped.
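Two of those rules are mechanical enough to check before a PR ever reaches review. What follows is a minimal sketch, not something from my setup: the function names and the regex are made up for illustration, and the "verb-led" part of the title rule is left to human eyes because a regex can't judge it.

```typescript
// Hypothetical pre-review check mirroring two rules from the excerpt above.
// Names and regex are illustrative assumptions, not part of Cursor.

// Matches "[area] description": a bracketed lowercase area, one space, then text.
const TITLE_PATTERN = /^\[[a-z0-9-]+\] \S.*$/;

// U+2014 is the em-dash character the rules file bans from commit messages.
const EM_DASH = "\u2014";

// Returns a list of problems; an empty list means the title passes.
function checkPrTitle(title: string): string[] {
  return TITLE_PATTERN.test(title)
    ? []
    : ["title should look like `[area] short verb-led description`"];
}

// Returns a list of problems; an empty list means the message passes.
function checkCommitMessage(message: string): string[] {
  return message.includes(EM_DASH)
    ? ["commit message contains an em-dash; use a comma-separated list"]
    : [];
}

console.log(checkPrTitle("[dashboard] add loading skeleton")); // []
console.log(checkPrTitle("Add loading skeleton").length);      // 1
```

Returning a list of problems instead of a boolean keeps the output pasteable into a PR comment.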
What shipped
I tracked every task in a Notion table. The shape that always worked was: one symptom, one file, one expected outcome.
Things the agent shipped clean, first try:
- Add a `loading.tsx` for the `/dashboard` route.
- Fix a `next/image` warning by passing explicit width/height to four `<Image>` calls in `/projects/[id]/page.tsx`.
- Migrate a Prisma model from `String?` to `String` with a default, plus the migration.
- Add a CSV export button to the existing reports page, reuse the existing endpoint.
- Bump a few dependencies and resolve the resulting type errors.
Boring, scoped, almost mechanical. Nine of those landed. The PR body was useful. The diff was small enough to read in a single screen.
What I closed without reading
Then the misses. I asked for "refactor the dashboard data fetching to use server actions instead of API routes." Two files, in theory. The agent touched eleven files, removed a request cache I had built on purpose, and inlined a Prisma query into a client component. I closed the PR.
That was on day four. After it, I stopped asking for things that depended on reading the shape of the codebase. Server actions vs API routes is not really a code question; it is a product-taste question about request boundaries. The agent didn't know my taste. I hadn't told it.
So I adjusted. On day five I started writing task descriptions like specs. Not a paragraph: a small Markdown document with sections "context", "do", "do not", and "definition of done". The merge rate jumped. Of the six tasks I queued after day five, five landed on first review.
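To make the four-section spec concrete, here is a rough sketch of how one could be rendered from structured data. The section names are the ones above; the `TaskSpec` type, the function names, and the example task are hypothetical.

```typescript
// Hypothetical shape for a task spec. The four sections come from the post;
// the type, function names, and example content are made up for illustration.
interface TaskSpec {
  title: string;
  context: string[];
  do: string[];
  doNot: string[];
  definitionOfDone: string[];
}

// Renders one section as a Markdown heading followed by a bullet list.
function section(heading: string, items: string[]): string {
  const bullets = items.map((item) => `- ${item}`).join("\n");
  return `## ${heading}\n${bullets}`;
}

// Joins the title and the four sections into one Markdown document.
function renderTaskSpec(spec: TaskSpec): string {
  return [
    `# ${spec.title}`,
    section("Context", spec.context),
    section("Do", spec.do),
    section("Do not", spec.doNot),
    section("Definition of done", spec.definitionOfDone),
  ].join("\n\n");
}

const exampleSpec: TaskSpec = {
  title: "[dashboard] add loading skeleton",
  context: ["Route: /dashboard", "Layout to match: page.tsx in the same folder"],
  do: ["Add loading.tsx with a skeleton matching the page layout"],
  doNot: ["Add new dependencies", "Touch any other file"],
  definitionOfDone: ["pnpm typecheck passes", "pnpm lint passes", "Output pasted in the PR body"],
};

console.log(renderTaskSpec(exampleSpec));
```

Keeping "do not" as its own required field is the point: an empty list there is a visible sign the task is underspecified.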
The taxonomy I ended up with
By day seven I had a clear table on the wall.
| Task shape | Background agent fit | Why |
|---|---|---|
| Single-file bug fix with a known reproducer | Excellent | The diff is small, reviewable, the agent stays in lane. |
| Add a UI surface that reuses an existing endpoint | Good | Mechanical; the agent reads the existing code well. |
| Dep bump + the resulting type errors | Good | The fix space is bounded by the type errors. |
| Adding tests for a function you point it at | Good | Bounded again; the agent writes tests faithfully. |
| Cross-cutting refactor across 5+ files | Poor | The agent makes new product decisions you didn't ask for. |
| Anything where "the right architecture" is the question | Poor | That's a taste call. Not the agent's job. |
| Starting a brand new repo or feature from a spec | Wrong tool | This is what an AI app builder is for; see below. |
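The table reads naturally as a triage function. This is only a sketch under stated assumptions: the field names are hypothetical, the five-file threshold is taken from the "5+ files" row, and the table says nothing about two-to-four-file tasks, so defaulting those to "good" is my simplification.

```typescript
type Fit = "excellent" | "good" | "poor" | "wrong tool";

// Hypothetical description of a task before it is queued.
interface TaskShape {
  filesTouched: number;      // estimated number of files the change should touch
  newRepoOrFeature: boolean; // starting a brand new repo or feature from a spec
  architectureCall: boolean; // "the right architecture" is the real question
  knownReproducer: boolean;  // a bug with a known reproducer
}

// Threshold from the "5+ files" row. The quick answer puts the struggle point
// at three files, so lowering this is a defensible choice.
const POOR_FILE_THRESHOLD = 5;

function triage(task: TaskShape): Fit {
  if (task.newRepoOrFeature) return "wrong tool";
  if (task.architectureCall || task.filesTouched >= POOR_FILE_THRESHOLD) return "poor";
  if (task.filesTouched === 1 && task.knownReproducer) return "excellent";
  // Assumption: anything bounded that is left over counts as "good".
  return "good";
}

console.log(
  triage({ filesTouched: 1, newRepoOrFeature: false, architectureCall: false, knownReproducer: true }),
); // excellent
```

The checks run from most disqualifying to least, so a new-feature task is rejected before its file count is even considered.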
The last row, the "wrong tool" cell, deserves its own paragraph.
Where I'd reach for something else
Twice during the week I caught myself trying to use the background agent for something it isn't. Once, I queued a task to "spin up a small auth-protected admin page that lists support tickets and lets me close them, write the schema and the API too." It opened a five-file PR that touched my main app schema and added a Tailwind config conflict. I closed it.
That whole task was a new product surface. New schema, new routes, new auth boundary. 
For more on which AI app builder fits which job, Builderdex has a comparator that I cross-checked while writing this; the SaaS-prototyping page is the closest to my admin-page use case.
What it cost
| Day | Tasks queued | PRs opened | Merged | Spend (USD) |
|---|---|---|---|---|
| Mon | 4 | 2 | 1 | $1.30 |
| Tue | 5 | 3 | 1 | $1.80 |
| Wed | 4 | 2 | 1 | $1.10 |
| Thu | 3 | 2 | 0 | $2.40 |
| Fri | 3 | 2 | 2 | $1.20 |
| Sat | 2 | 2 | 2 | $1.60 |
| Sun | 2 | 1 | 2 | $2.00 |
| Total | 23 | 14 | 9 | $11.40 |
A few notes on the spend. Cursor charges MAX-model usage for background agents on top of the Pro plan, billed on a separate metered pool. Thursday was the worst day, $2.40 for zero merges, because two refactor tasks burned tokens going in circles before I killed them. Cost-per-merged-PR works out to ~$1.27, but the real number once I started writing task specs in the second half of the week was ~$0.50. The early days were tuition.
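For anyone who wants to redo the arithmetic, here is a small sketch that recomputes the weekly totals from the table above. The numbers are the table's; the type and function names are illustrative, and spend is stored in cents to avoid floating-point drift when summing.

```typescript
interface DayLog {
  day: string;
  queued: number;
  opened: number;
  merged: number;
  spendCents: number; // spend in US cents, taken from the table
}

const week: DayLog[] = [
  { day: "Mon", queued: 4, opened: 2, merged: 1, spendCents: 130 },
  { day: "Tue", queued: 5, opened: 3, merged: 1, spendCents: 180 },
  { day: "Wed", queued: 4, opened: 2, merged: 1, spendCents: 110 },
  { day: "Thu", queued: 3, opened: 2, merged: 0, spendCents: 240 },
  { day: "Fri", queued: 3, opened: 2, merged: 2, spendCents: 120 },
  { day: "Sat", queued: 2, opened: 2, merged: 2, spendCents: 160 },
  { day: "Sun", queued: 2, opened: 1, merged: 2, spendCents: 200 },
];

// Sums a run of days and derives cost per merged PR and merge rate.
function summarize(days: DayLog[]) {
  const total = days.reduce(
    (acc, d) => ({
      queued: acc.queued + d.queued,
      opened: acc.opened + d.opened,
      merged: acc.merged + d.merged,
      spendCents: acc.spendCents + d.spendCents,
    }),
    { queued: 0, opened: 0, merged: 0, spendCents: 0 },
  );
  return {
    ...total,
    // NaN when nothing merged or nothing was queued, so a bad day is not hidden as 0.
    costPerMergedUsd: total.merged > 0 ? total.spendCents / 100 / total.merged : NaN,
    mergeRate: total.queued > 0 ? total.merged / total.queued : NaN,
  };
}

const all = summarize(week);
console.log(all.costPerMergedUsd.toFixed(2));        // 1.27
console.log((all.mergeRate * 100).toFixed(0) + "%"); // 39%
```

Passing a slice such as `week.slice(4)` gives the same figures for part of the week.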
The unit of work I learned to send
If you take one thing from this post, take this: background agents are best when you treat them like a competent junior engineer who has read the docs but not the codebase. You don't say "fix the dashboard". You say "in /app/dashboard/loading.tsx, add a skeleton matching the layout in page.tsx, no new dependencies, paste the result of pnpm typecheck in the PR body". Specific. Bounded. With acceptance criteria.
A useful prompt template that I now reuse is essentially the one-file prompt pattern PromptAttic wrote up, adapted for "write a PR" instead of "write an app". Same idea: explicit context, explicit do/don't, explicit definition of done. It's the difference between a 30% merge rate and an 80% merge rate.
When background agents are not what you need
Here's the honest carve-out. Background agents don't help you when:
- You're starting a brand new app and want the scaffold, not a patch.
- The task crosses architectural boundaries (server vs client, schema vs UI, infra vs app).
- You don't have a green main branch to rebase onto.
- The task is really a conversation, not a change.
For the first, an AI app builder that produces an owned, deployable Next.js codebase is the right shape of tool. For the second and fourth, I do the work in the foreground, in the editor with the agent open as a pair, not as a background process. For the third, fix main first.
FAQ
What are Cursor background agents?
Remote, asynchronous coding agents that run in a Cursor-managed cloud sandbox. They clone your repo, work on a task you queue, and open a pull request when they're done. Cursor renamed them "Cloud Agents" in late 2025, but the in-app UI and most community discussion still call them background agents.
Are Cursor background agents free?
No. You need a Cursor Pro plan ($20/mo as of June 2026) to use them, and on top of that the agents run on MAX models with usage-based pricing billed separately. Pro gives you access; runs cost extra.
How much do they actually cost?
In my 7-day test I queued 23 tasks, opened 14 PRs, merged 9, and spent $11.40. That works out to about $0.50 per merged PR once I learned to write tight task specs; closer to $1.30 per merged PR including the early experiments.
When are background agents worth using?
For small, scoped, single-purpose changes where the diff fits on one screen and the acceptance criteria are clear. Bug fixes with a known reproducer, mechanical refactors, dependency bumps with type-error fallout, adding tests for a function you point at. They are not worth using for architectural decisions, cross-cutting refactors, or starting a new app.
Can I use them on a broken main branch?
You can, but you shouldn't. The agent rebases your task branch onto main; if main is red, the agent's PR will also be red, and the agent doesn't always notice. Fix main first.
How do I write a good task for a background agent?
Treat it like a junior engineer who read the docs but not the codebase. Give it context (which file, which function), explicit do/don't (touch this, don't touch that), and a definition of done (typecheck passes, lint passes, paste the output). A short Markdown spec beats a one-line ask every time.
Are Cursor background agents better than ChatGPT Codex or Claude Code subagents?
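Different products, different jobs. Claude Code subagents run locally and chain inside one session; Cursor background agents run in the cloud and open PRs. Use Claude Code for foreground pair programming, use Cursor background agents for fire-and-forget tasks you can review later. They are complementary.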
What's the difference between Cursor background agents and an AI app builder?
Background agents edit a repo you already own. An AI app builder generates a deployable repo you didn't have yet. If you want a feature added to your existing Next.js app, queue a background agent. If you want a new auth-protected admin tool that you'll deploy on its own domain, reach for an AI app builder instead.
ps: I'll redo this experiment in three months. The pace at which background-agent tooling moves means half of what's true here will be wrong by September. Dani.
Written by
Dani Reyes
Indie developer writing DevMoment from inside the work, on vibe coding, MCP, and weekend builds.