
This is something that proper prompting can fix.


Yes, but it's also something that proper training can fix, and that's the level at which the fix should probably be implemented.

The current behavior amounts to something like "attempt to complete the task at all costs," which is unlikely to provide good results, and in practice, often doesn't.


But are LLMs even the right kind of model to learn such long-horizon goals, and to learn not to cheat at them?

I feel like we need a new base model where next-token prediction itself is dynamical and RL-based, to be able to handle this issue properly.


I was including RLHF in "training". And even the system prompt, really.

If it's true that models can be prevented from spiraling into dead ends with "proper prompting" as the comment above claimed, then it's also true that this can be addressed earlier in the process.

As it stands, this behavior isn't likely to be useful for any normal user, and it's certainly a blocker to "agentic" use.


The RLHF is happening too late, I think. The reinforcement learning needs to happen during the initial next-token prediction. On that note, we need something richer than just language to represent a complex world state.


That's running into the bitter lesson again.

The model should generalize and understand when it has reached a roadblock in its higher-level goal. The fact that it needs a human to decide that for it means it won't be able to do it on its own. This is critical for the software engineering tasks we expect agentic models to do.


"works with my prompt" is the new "works on my machine"


You seem to be getting downvoted, but I have to agree. I put it in my rules to ask me for confirmation before going down alternate paths like this: it's critically important not to "give up" and undo its changes without first making a case to me for why it thinks it ought to do so.

So far, at least, that seems to help.
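For reference, the kind of rule I mean is just a short directive in the agent's rules file. The exact wording below is mine, not a quote from any tool's docs; adapt it to your setup:

```
Before abandoning an approach or reverting your changes, stop and ask me
for confirmation. Explain why you believe the current path is a dead end
and what you propose to do instead. Never silently undo work.
```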


Yeah, I don't understand why; it seems like people think "everything should be in the model," which is just not true. Tuning the system prompt and user prompts to your needs is absolutely required before you'll have a great time with these tools.

Just take a look at zen-mcp to see what you can achieve with proper prompting and workflow management.


Because companies are claiming this stuff is intelligent.


Intelligence is one thing; context is the other. Prompts provide context and instructions, and are tailored to your needs.


Imagine an intern did the same thing, and you say "we just need better instructions".

No! The intern needs to actually understand what they are doing. It is not just one more sentence ("by the way, if this fails, check ..."), because you can never enumerate all the possible situations (and you shouldn't even try); instead, you need to figure out the "why" as soon as possible.


"you're holding the prompt wrong"
