Back to Blog
Claude CodeAIDeveloper ToolsAgentsBuild In Public

Claude Code Routines Are Running My Codebase While I Sleep

Claude Code Routines launched five weeks ago. I set one up before bed, and woke up to three automated PRs waiting in my queue. Here's what changed.

May 18, 2026·7 min read

I closed my laptop at 11pm on a Tuesday.

Wednesday morning: three draft PRs sitting in my GitHub queue. Dependency bumps, a dead export flagged for removal, and a failing test pattern with a root cause hypothesis attached. All accurate. None of them touched by me.

That's what Claude Code Routines actually are.

What a Routine Actually Is

Not a cron job with an AI wrapper.

A routine is a saved Claude Code configuration: a prompt, one or more repos, and a set of connectors — packaged once and run automatically. On a schedule. On a webhook. On a GitHub event. On Anthropic's infrastructure, not your laptop, so the work keeps running when you're asleep or offline.

The distinction that matters: it's fully agentic. When a routine hits something unexpected mid-run, it doesn't abort and emit an error code. It reasons through the problem, tries alternatives, and adapts its approach. That's not a script hitting an API. That's judgment running on a schedule.

Routines shipped April 14th as a research preview. You create them from inside Claude Code with a /schedule command — conversational setup, no YAML, no CI config to wire up. I had one running the same night they launched.

Before Routines

I want to be honest about what maintenance looked like before.

I had a recurring task in my project tracker: "check deps." It moved from this week to next week to next week. When I actually did it, I'd open npm outdated, scan the list, google a few changelogs, bump the ones that seemed safe, run tests, commit. About an hour. Happened maybe once every six weeks.

Dead code was worse. I knew it was accumulating. I didn't know how much. I'd occasionally grep for an export that felt orphaned, but I'd never done a systematic sweep. Every so often I'd find a component that had been unused for months, quietly sitting there.

Test analysis: I'd look at failures when tests failed. Flake patterns I'd notice eventually, usually after losing half an hour to a test that kept failing for no good reason across multiple unrelated PRs.

None of this was neglect. There are only so many hours. These tasks kept losing to shipping.

Three Routines I Actually Use

Nightly dependency audit. Runs at 2am UTC. Claude scans package.json, checks npm for outdated packages, flags anything with a CVE from the last 30 days, and opens a draft PR with the proposed updates sorted by risk level — critical first, cosmetic last. I review it with my coffee, approve or dismiss. 90 seconds of my time. The task that used to happen every six weeks now happens every night.

Weekly dead code sweep. Every Sunday at 8am. Scans for exports with zero imports, components not referenced in any route, utility functions with no callers. Drops the list as a comment on a pinned GitHub issue. I'm not having it delete anything automatically — not yet. But I have a clean sweep every Monday morning instead of a growing pile I pretend doesn't exist.

Post-push test analysis. Triggered on every push to main. Reads test output, identifies tests that failed three consecutive runs, distinguishes flakes from patterns, and opens an issue with a root cause hypothesis and a link to the relevant diff. Not right every time. Right often enough that I act on it before it becomes a problem.

Setting up the first one took about two minutes inside a Claude Code session:

/schedule Create a routine called "nightly-deps" that runs daily at 2am UTC.
Repo: ishansa/ishansa.dev.v2
Prompt: Audit package.json for outdated deps and CVEs.
Open a draft PR with updates ranked by risk. Label it routine-output.

Claude walked me through confirming the details. I approved. That was it. No server to manage. No pipeline to maintain. No YAML I'll forget exists.

The Shift That Actually Matters

Interactive AI is still how I spend most of my time with Claude Code. Terminal open, prompts flying, iterating. That's not going anywhere.

But routines represent something structurally different: ambient AI. The model is doing work on your codebase at 2am, on Sunday morning, on every push to main — with no human in the loop. Not sometimes. On a schedule. Always.

That shifts the question. Not "when do I use AI to help with this?" but "what is my AI doing between sessions?"

Most developers don't have an answer to that second question yet. The teams that figure it out will run noticeably tighter codebases — not because they're smarter, but because their codebase is under active maintenance even when nobody is actively maintaining it.

The compounding here is real. Every dep that gets patched the night it ages is a dep that doesn't turn into a CVE six months later. Every dead export removed in a weekly sweep is context that doesn't bloat the next model call. These are small wins. They accumulate.

The Uncomfortable Part

Here's what most builders are going to do wrong with routines.

They'll wire one up, give it write access to a real branch, and assume it's fine because the first few runs looked fine. No audit log. No review step built in. Just: agent runs, code ships.

That will work until it doesn't. One day a routine will make a perfectly reasonable-looking refactor that breaks something subtle two levels down — and because there's no structured record of what the agent decided and why, you'll spend your morning debugging the codebase when you should be debugging the agent.

Set up observability before you set up routines.

I covered this a few months back in flight recorder for AI agents. Same principle applies here. Any time you hand an AI agent unsupervised autonomy over your codebase, you need a record: what it changed, what it decided, why it decided it. Claude Code's built-in session logs are a start. Add a Slack webhook that fires on every routine completion — the diff, the output artifact, a link to the session log. Takes 20 minutes to wire up and gives you the audit trail that makes trust possible.

Review every routine output for the first two weeks even when it looks fine. That's how you catch the edge cases before they become incidents. Once you've seen it run 15 times cleanly and understand where its edges are, you can start trusting it to run without review.

Don't skip this. The agents aren't untrustworthy. But trust should be earned with receipts, not assumed.

Usage Limits Worth Knowing

Pro users get 5 routine runs per day. Max gets 15. Team and Enterprise get 25.

If you're on Pro, that's enough for one or two well-scoped daily routines plus a handful of event-triggered ones. Don't try to automate everything on day one. One narrow routine that runs reliably for a month is worth more than five ambitious ones that occasionally make a mess you have to clean up.

Same pattern I saw with Skills and subagents: design deliberately, not all at once. Build the routine. Watch it run ten times. Earn the trust incrementally. Then expand.

Where to Start

Pick the maintenance task you defer most consistently.

The one that lives in your head as "I should do this every week" and hasn't been done in two months. Dependency audit, dead code sweep, stale TODO comments, failing test triage, checking that your environment variable docs match what's actually in .env.example — any of these work.

Scope it narrow. Give the routine a single, specific job. A clear output artifact: a draft PR, a GitHub issue comment, a Slack message. Something you can read, verify, and act on in under two minutes.

Run it. Watch it. Learn where its edges are. Give it a month before you add a second one.

The limiting factor isn't what Claude Code can do unsupervised. The limiting factor is how much autonomy you're prepared to grant to a process you can't watch in real time. Build that up with receipts, not wishful thinking.

Let the agent run at night.