Yes, but it's also something that proper training can fix, and that's the level at which the fix should probably be implemented.
The current behavior amounts to something like "attempt to complete the task at all costs," which is unlikely to provide good results, and in practice, often doesn't.
I was including RLHF in "training". And even the system prompt, really.
If it's true that models can be prevented from spiraling into dead ends with "proper prompting" as the comment above claimed, then it's also true that this can be addressed earlier in the process.
As it stands, this behavior isn't likely to be useful for any normal user, and it's certainly a blocker to "agentic" use.
The RLHF is happening too late, I think. The reinforcement learning needs to happen during the initial next-token prediction. On that note, we need something richer than language alone to represent a complex world state.
The model should generalize and understand when it's reached a roadblock in its higher-level goal. The fact that it needs a human to decide that for it means it won't be able to do it on its own. This is critical for the software engineering tasks we expect agentic models to do.
You seem to be getting downvoted, but I have to agree. I put it in my rules to ask me for confirmation before going down alternate paths like this: it's critically important not to "give up" and undo its changes without first making a case to me for why it thinks it ought to do so.
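For reference, a minimal sketch of what such a rule might look like in an agent's instructions file (the filename and exact wording here are my own, not from the thread):

```
# Workflow rules (e.g. in CLAUDE.md or .cursorrules)
- Never abandon the current approach or revert your changes on your own.
- If you hit a blocker, stop. Explain why you believe the current
  approach is failing, propose the alternate path you want to take,
  and wait for my explicit confirmation before proceeding.
```

The point is to make "giving up" a checkpoint that requires human sign-off rather than a silent fallback.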
Yeah, I don't understand why. It seems like people think that "everything should be in the model," which is just not true. Tuning the system prompt and user prompts to your needs is absolutely required before you'll have a great time with these tools.
Just take a look at zen-mcp to see what you can achieve with proper prompting and workflow management.
Imagine an intern did the same thing, and you say "we just need better instructions".
No! The intern needs to actually understand what they are doing. It is not just one more sentence "by the way, if this fails, check ...", because you can never enumerate all the possible situations (and you shouldn't even try), but instead you need to figure out why as soon as possible.