Yes, but it's also something that proper training can fix, and that's the level at which the fix should probably be implemented.
The current behavior amounts to something like "attempt to complete the task at all costs," which is unlikely to provide good results, and in practice, often doesn't.
I was including RLHF in "training". And even the system prompt, really.
If it's true that models can be prevented from spiraling into dead ends with "proper prompting" as the comment above claimed, then it's also true that this can be addressed earlier in the process.
As it stands, this behavior isn't likely to be useful for any normal user, and it's certainly a blocker to "agentic" use.
The RLHF is happening too late, I think. The reinforcement learning needs to happen during the initial next-token prediction. On that note, we need something richer than language alone to represent a complex world state.
The model should generalize and understand when it's reached a roadblock in its higher-level goal. The fact that it needs a human to decide that for it means it won't be able to do it on its own. This is critical for the software engineering tasks we expect agentic models to do.
You seem to be getting downvoted, but I have to agree. I put it in my rules to ask me for confirmation before going down alternate paths like this: it's critically important not to "give up" and undo its changes without first making a case to me for why it thinks it ought to do so.
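For reference, a minimal sketch of what such a rule might look like in an agent's instructions file (the filename and exact wording here are my own, not from the thread):

```
# Workflow rules (e.g. in CLAUDE.md or .cursorrules)
- Never abandon the current approach or revert your changes on your own.
- If you hit a blocker, stop. Explain why you believe the current
  approach is failing, propose the alternate path you want to take,
  and wait for my explicit confirmation before proceeding.
```

The point is to make "giving up" a checkpoint that requires human sign-off rather than a silent fallback.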
Yeah, I don't understand why. It seems like people think that "everything should be in the model," which is just not true. Tuning the system prompt and user prompts to your needs is absolutely required before you'll have a great time with these tools.
Just take a look at zen-mcp to see what you can achieve with proper prompting and workflow management.
Imagine an intern did the same thing, and you say "we just need better instructions".
No! The intern needs to actually understand what they are doing. It is not just one more sentence "by the way, if this fails, check ...", because you can never enumerate all the possible situations (and you shouldn't even try), but instead you need to figure out why as soon as possible.