Hacker News

> The finding I did not expect: model quality matters more than token speed for agentic coding.

I'm really surprised that this wasn't obvious.

Also, instead of limiting the context size to something like 32k, you can offload the MoE weights to the CPU with --cpu-moe, at the cost of roughly halving token generation speed.
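For concreteness, a sketch of what that looks like with llama.cpp's server (the model path and the other flag values here are placeholder assumptions, not from the comment above):

```shell
# Hypothetical llama-server launch; the model file is a placeholder.
# --cpu-moe keeps the MoE expert tensors in system RAM, so the remaining
# layers plus a large KV cache fit in VRAM. The trade-off, as noted above,
# is roughly half the token generation speed.
llama-server \
  -m ./models/model.gguf \
  --cpu-moe \
  -c 131072 \
  -ngl 99
```

The point being: without --cpu-moe you would instead shrink -c down toward 32k to stay within VRAM.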




Why would token speed matter for anything other than getting work done faster? It's in the name - "speed".

Yeah, it’s like drinking coffee when being really tired. You’re still tired, just “faster”, it’s a weird sensation.

It's even stranger that it's not obvious to someone who uses Codex extensively every day.

The rate-limiting step is the LLM going down stupid rabbit holes, or overthinking and getting decision paralysis.

The only time raw speed really matters is when you're trying to add many, many lines of new code. But if you're doing that at token-limited rates, you're going to approach the singularity of an AI-slop codebase in no time.




