The model is getting lost because the task you gave it is way up the context chain from what it is currently working on. It loses track of its task and starts working on other things that it notices.
The way to get around this is to never have the model just “do the thing”. Have it create a plan and create a todo list from the plan (it will typically do this on its own in Claude Code), then you “approve” the plan, and it starts working against that todo list and plan.
This ensures that the “task” is never very large (it is always just the next thing on the todo list, which has already been scoped to be small) and there is never any ambiguity over what to do next.
So for your prompt I would ask it to find all locations that use the old job api and put them in a planning document. For each location, have it note in the planning document whether it anticipates any difficulty transitioning to the new api. If you want to get fancy, have it use the Task tool so a subagent does the analysis; this keeps the context of the main model less cluttered. I usually use plan mode for this in Claude Code. Then look at the plan, approve it (or tweak it), and have it execute that plan.
When instructing it to execute the plan, I would even have it execute only step 1, then you review; then step 2, review, and so on. This prevents it from going down a rabbit hole, and you can adjust the plan's steps or change course more easily.
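The "find all locations" step can also be seeded mechanically before the model gets involved. A minimal sketch, assuming old call sites look like `JobDefinition.create(` (the demo file and paths here are made up for illustration):

```shell
# Sketch: seed the planning document with candidate call sites.
# Assumes call sites look like "JobDefinition.create(" -- adjust the pattern
# to your codebase. The demo file below stands in for your real source tree.
mkdir -p /tmp/jobdemo/src
cat > /tmp/jobdemo/src/email_job.ts <<'EOF'
const result = JobDefinition.create(batch);   // old signature
EOF
# One line per location, with file and line number, ready to paste into the plan
grep -rn "JobDefinition\.create(" /tmp/jobdemo/src | tee /tmp/jobdemo/plan-sites.txt
```

Handing the model a pre-built list like this means it spends its context on the analysis, not on the search.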
Are you providing the relevant files in your prompt as context? Additionally, even with files added as context, your prompt isn't very well focused. You essentially said, "do the thing", with a couple of details.
Outline through examples what you want the model to do. You mentioned there's a new signature to follow. Add that to the prompt.
Maybe try forcing it to properly plan ahead, break it down into small steps, and ask you to approve the plan.
Of course, add a CLAUDE.md, put clear development guidelines into it, and have it verify the git changes it made against those guidelines, plus the usual things like a linter.
It will go off the rails, especially after compaction, but you can make it correct mistakes on its own.
A few tips:
- Make tasks short and break them into smaller steps. E.g. don't say "Add a UI button, and a handler that does the thing". First add the button, confirm it's showing as expected, then move on to the handler, and so on.
- Do give warnings like "Don't modify unrelated code", but what the model considers related may not match what you consider related (this is much easier if the tasks are small, so see point 1).
- If the model keeps making similar mistakes or repeating the same broken thing, it's because it doesn't know how to solve the problem. These models don't have a way to tell you "I don't know", so they will just keep producing busted code. Give the model additional information to help it, like you would a coworker who can't seem to make progress.
While you aren't getting consistent, usable results from your setup, you shouldn't allow it to make any changes that you disagree with. For every change it proposes, review thoroughly and tell it what you'd prefer instead. It will get better during the session, and you can ask it to codify the rules you've communicated implicitly in the CLAUDE.md file.
Don’t tell it what not to do. Roughly, it doesn’t have the concept of ”not foobar”: mentioning such a negation in a prompt doesn’t do what a human would expect, and will instead cause ”foobar” activation and possibly also everything that is ”not” + ”foobar”, leading to inattention/off-task behavior as seen here.
You want your prompt to resonate with the desired output in a perfect harmony, and a ”don’t do X” is a bum note.
The task is pretty unclear. You've got one real ask, for it to remove a log statement, but you've buried it in irrelevant context.
It's better if you can let it read the before code, then read the after code, then give it the ask: remove the log, switch callers to the new interface.
Then have it write up a work/prompt plan, and keep progress in a markdown file.
You're asking Claude to refactor multiple different job types all at once, which creates too much complexity in a single pass. The prompt itself is also somewhat unclear about the specific transformations needed.
Try this:
1. Break it down by job type. Instead of "refactor the codebase to make use of the new JobDefinition.create", identify each distinct job type and refactor them one at a time. This keeps the context focused and prevents the agent from getting overwhelmed.
2. For many jobs, script it. If you have dozens or hundreds of jobs to refactor, write a shell script like this:
for job_type in "EmailJob" "DataProcessingJob" "ReportJob"; do
  claude --dangerously-skip-permissions -p "Refactor only ${job_type} to use the new JobDefinition.create signature: make it async, pass databaseClient at creation, remove return value and 'Job created' logs. Change ONLY ${job_type} files."
  git add -A && git commit -m "Refactor ${job_type} to new signature"
done
This creates atomic commits you can review/revert individually.
3. Consider a migration shim. Have Claude create a compatibility layer so jobs can work with either the old or new signature during the refactor. This lets you test incrementally without breaking everything at once.
4. Your prompt needs clarity. Here's a clearer version:
Refactor ONLY [SpecificJobName] class to match the new JobDefinition.create signature:
- OLD: create(batch) returns result, synchronous
- NEW: create(queue, databaseClient) returns void, async
- Remove any "Job created" console.log statements
- Do NOT modify unrelated code, reorder parameters, or rename variables
The issue with your original prompt is it doesn't clearly specify the before/after states or which specific files to target. Claude Code works best with precise, mechanical instructions rather than contextual descriptions like "Previously... Now it takes..."
Pro tip: Use Claude itself to improve your prompts! Try:
claude -p "Help me write a clearer prompt for this refactoring task: [paste your original prompt]"
and save the result to a markdown file for reuse.
The key insight is that agentic tools excel at focused, well-defined transformations but struggle when the scope is too broad or the instructions are ambiguous. "Don't do anything else" is not an instruction that Claude does a good job of interpreting. The "going off the rails" behavior you're seeing is Claude trying to be helpful by "improving" code it encounters, which is why explicit constraints ("ONLY do X") are crucial rather than specifying a broad directive concerning what it shouldn't do.
A while ago, a HN comment linked to this blog by Mario Zechner https://mariozechner.at/posts/2025-06-02-prompts-are-code/ which was exactly what I needed back then. The workflow helped me analyse about 300 files. It definitely made up stuff along the way and missed valuable context that I would have seen had I done the analysis manually, but it's a good starting point.
Without a structured workflow, as you said, after 5 files it starts going berserk.
That said, Claude has been really bad the last few weeks, especially in VS Code Copilot. If you ask it repeatedly, it admits that it's Sonnet 3.5 and not Sonnet 4. Not sure if that's true, but Sonnet's performance has degraded over the last few weeks.
You would get better results probably by using search and replace or a python script to make those changes.
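For the purely mechanical parts that's true. For example, deleting the "Job created" log lines is a one-line sed. A sketch on a throwaway demo file (GNU sed shown; macOS sed wants `-i ''`):

```shell
# Sketch: delete the "Job created" log lines mechanically, then review the diff.
# The demo file stands in for a real job module; GNU sed syntax assumed.
mkdir -p /tmp/refactor-demo
cat > /tmp/refactor-demo/email_job.js <<'EOF'
function create(batch) {
  console.log("Job created");
  return batch.enqueue();
}
EOF
sed -i '/console\.log("Job created")/d' /tmp/refactor-demo/email_job.js
cat /tmp/refactor-demo/email_job.js
```

The signature change itself (sync to async, new parameters) is not a clean regex job, so a script handles the boring deletions and the agent handles only what actually needs judgment.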
The comments that insist you'd get better results by talking to it like you would to a human, I hope they're trolling. The idea that the tone of your prompt has any influence at all is laughable.
$ claude
> Plan how to refactor the codebase to use the new JobDefinition.create function introduced in git commit <git commit hash>. Split task into subtasks, if needed. Write the plan to todo.md.
...
> Start working on the task in @todo.md. Write code that follows the "Keep it simple, stupid!" principle.
Be encouraging. Say please. Don't speak roughly. Models perform better when treated with respect. Speak to it as you would speak to someone you respect.
Have it create a plan, then verify its plan, then proceed to execute.
I agree with this, break it down further.
Telling the agent to very much not do something is a lost battle. It will make everything worse, not just the stuff it messed up already.
If you expect a genuine understanding of your instructions, you will be very disappointed, no matter what you do.
The way to success is not caring about those small issues and fixing them up in the review.
If you get 95% of the way there, then I'd say you did as well as you can hope for.
This is agentic programming; if you're not having fun, I suggest you jump ship.
It smells like AI in here.