> there's a lot more involved in senior dev work beyond producing code that work...

rurp · on Dec 16, 2024

> It's provably true that LLM's can produce working code.

You've restated this point several times but the reason it's not more convincing to many people is that simply producing code that works is rarely an actual goal on many projects. On larger projects it's much more about producing code that is consistent with the rest of the project, and is easily extensible, and is readable for your teammates, and is easy to debug when something goes wrong, is testable, and so on.

The code working is a necessary condition, but is insufficient to tell if it's a valuable contribution.

simianparrot · on Dec 16, 2024

The code working is the bare minimum. The code being right for the project and context is the basic expectation. The code being _good_ at solving its intended problem is the desired outcome, which is a combination of tradeoffs between performance, readability, ease of refactoring later, modularity, etc.

LLM's can sometimes provide the bare minimum. And then you have to refactor and massage it all the way to the good bit, but unlike looking up other people's endeavors on something like Stack Overflow, with the LLM's code I have no context why it "thought" that was a good idea. If I ask it, it may parrot something from the relevant training set, or it might be bullshitting completely. The end result? This is _more_ work for a senior dev, not less.

Hence why it has never passed my sniff test. Its code is at best the quality of code even junior developers wouldn't open a PR for yet. Or if they did they'd be asked to explain how and why and quickly learn to not open the code for review before they've properly considered the implications.

munk-a · on Dec 16, 2024

> It's provably true that LLM's can produce working code.

This is correct - but it's also true that LLMs can produce flawed code. To me the cost of telling whether code is correct or flawed is larger than the cost of me just writing correct code. This may be an AuDHD thing but I can better comprehend the correctness of a solution if I'm watching (and doing) the making of that solution than if I'm reading it after the fact.

nrawe · on Dec 16, 2024

As a developer, while I do embrace intellisense, I don't copy/paste code, because I find typing it out is a fast path to reflection and finding issues early. Copilot seems to be no better than mindlessly copy/pasting from StackOverflow.

From what I've seen of Copilots, while they can produce working code, I've not seen that much that it offers beyond the surface level which is fast enough for me to type. I am also deeply perturbed from some interviews I've done for senior candidates recently who are using them and, when asked to disable them for collaborative coding task, completely fall apart because of their dependency over knowledge.

This is not to say I do not see value in AI, LLMs or ML (I very much do). However, I code broadly at the speed of thought, and that's not really something I think will be massively aided by it.

At the same time, I know I am an outlier in my practice relative to lots around me.

While I don't doubt other improvements that may come from LLM in development, the current state of the art feels less like a mechanical drill and more like an electric triangle.

lukev · on Dec 16, 2024

Code is a liability, not an asset. It is a necessary evil to create functional software.

Senior devs know this, and factor code down to the minimum necessary.

Junior devs and LLMs think that writing code is the point and will generate lots of it without worrying about things like leverage, levels of abstraction, future extensibility, etc.

danenania · on Dec 17, 2024

LLMs can be prompted to write code that considers these things.

You can write good code or bad code with LLMs.

skydhash · on Dec 17, 2024

The code itself, whether good or bad, is a liability. Just like a car is a liability, in a perfect world, you'd teleport yourself to your destination, instead you have to drive. And because of that, roads and gas stations have to be built, you have to take care of the car, etc,.... It's all a huge pain. The code you write, you will have to document, maintain, extend, refactor, relearn, and a bunch of other activities. So yo do your best to only have the bare minimum to take care of. Anything else is just future troubles.

danenania · on Dec 17, 2024

Sure, I don’t dispute any of that. But it's not a given that using LLMs means you’re going to have unnecessary code. They can even help to reduce the amount of code. You just have to be detailed in your prompting about what you do and don’t want, and work through multiple iterations until the result is good.

Of course if you try to one shot something complex with a single line prompt, the result will be bad. This is why humans are still needed and will be for a long time imo.

lukev · on Dec 17, 2024

I'm not sure that's true. A LLM can code because it is trained on existing code.

Empirically, LLMs work best at coding when doing completely "routine" coding tasks: CRUD apps, React components, etc. Because there's lots of examples of that online.

I'm writing a data-driven query compiler and LLM code assistance fails hard, in both blatant and subtle ways. There just isn't enough training data.

Another argument: if a LLM could function like a senior dev, it could learn to program in new programming languages given the language's syntax, docs and API. In practice they cannot. It doesn't matter what you put into the context, LLMs just seem incapable of writing in niche languages.

Which to me says that, at least for now, their capabilities are based more on pattern identification and repetition than they are on reasoning.

danenania · on Dec 17, 2024

Have you tried new languages or niche languages with claude sonnet 3.5? I think if you give it docs with enough examples, it might do ok. Examples are crucial. I’ve seen it do well with CLI flags and arguments when given docs, which is a somewhat similar challenge.

That said, you’re right of course that it will do better when there’s more training data.

DaiPlusPlus · on Dec 17, 2024

> It's provably true that LLM's can produce working code

ChatGPT, even now in late-2024, still hallucinates standard-library types and methods more-often-than-not whenever I ask it to generate code for me. Granted, I don’t target the most popular platforms (i.e. React/Node/etc; I’m currently in a .NET shop, which is a minority platform now, but ChatGPT’s poor performance is surprising given the overall volume and quality of .NET content and documentation out there.

My perception is that “applications” work is more likely to be automated-away by LLMs/copilots because so much of it is so similar to everyone else’s, so I agree with those who say LLMs are only as good as there are examples of something online, whereas asking ChatGPT to write something for a less-trodden area, like Haskell or even a Windows driver, is frequently a complete waste of time as whatever it generates is far beyond salvaging.

Beyond hallucinations, my other problem lies in the small context window which means I can’t simply provide all the content it needs for context. Once a project grows past hundreds of KB of significant source I honestly don’t know how us humans are meant to get LLMs to work on them. Please educate me.

I’ll declare I have no first-hand experience with GitHub Copilot and other systems because of the poor experiences I had with ChatGPT. As you’re seemingly saying that this is a solved problem now, can you please provide some details on the projects where LLMs worked well for you? (Such as which model/service, project platform/language, the kinds of prompts, etc?). If not, then I’ll remain skeptical.

qup · on Dec 17, 2024

> still hallucinates standard-library types and methods more-often-than-not whenever I ask it to generate code for me

Not an argument, unsolicited advice: my guess is you are asking it to do too much work at once. Make much smaller changes. Try to ask for as roughly much as you would put into one git commit (per best practices)-- for me that's usually editing a dozen or less lines of code.

> Once a project grows past hundreds of KB of significant source I honestly don’t know how us humans are meant to get LLMs to work on them. Please educate me.

https://github.com/Aider-AI/aider

Edit: The author of aider puts the percentage of the code written by LLMs for each release. It's been 70%+. But some problems are still easier to handle yourself. https://github.com/Aider-AI/aider/releases

DaiPlusPlus · on Dec 18, 2024

Thank you for your response - I've aksed these questions before in other contexts but never had a reply, so pretty-much any online discussion about LLMs feels like I'm surrounded by people role-playing being on LinkedIn.

JTyQZSnP3cQGa8B · on Dec 17, 2024

> It's provably true that LLM's can produce working code

Then why can't I see this magical code that is produced? I mean a real big application with a purpose and multiple dependencies, not yet another ReactJS todo list. I've seen comments like that a hundred times already but not one repository that could be equivalent to what I currently do.

For me the experience of LLM is a bad tool that calls functions that are obsolete or do not exist at all, not very earth-shattering.