
I see some parallels between writing articles and pull requests.

We are moving to a point where we don't care whether the PR was written by AI. We care that the author understands what it's about, that they tested it, and in general, we want ownership.

It's the same with articles. I don't care if it was written by AI; if the content is interesting and AI makes it easier to digest, that's a win-win.

The problem is not the presentation. It's the content.


If the prompt is interesting, why not just post the prompt?

In my case, the prompt is normally a collection of ideas connected over time. AI groups, structures, challenges, and helps me organize those ideas. Then, once I see something I consider worth sharing, I ask it to draft a blog post. Twenty iterations later, I have a blog post.

The prompt is normally longer than the content generated.


Unpopular opinion.

Until now, ideas were only relevant when the owner was able to communicate them, regardless of the impact of the idea.

LLMs "democratize" (VC term) sharing ideas, as people with weak communication skills can be heard.


Could you give an example?

Myself.

LLMs help me communicate my ideas better.

Thinking from different angles, focusing on the main idea, structuring a post series... It constantly challenges my mess.

Opus and I iterate over 20 times on a single blog post.


For most users who wanted to run LLMs locally, Ollama solved the UX problem.

One command, and you are running models, even with the ROCm drivers, without knowing it.

If llama.cpp provides such a UX, they failed terribly at communicating it. Starting with the name. llama.cpp: that's a C++ library! Ollama is the wrapper. That's the mental model. I don't want to build my own program! I just want to have fun :-P


Llama.cpp now ships a GUI by default. It previously lacked one. Times have changed.

Having read the above article, I just gave llama.cpp a shot. It is as easy as the author says now, though definitely not documented quite as well. My quickstart:

    brew install llama.cpp
    llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000

Go to localhost:8000 for the Web UI. On Linux it accelerates correctly on my AMD GPU, which Ollama failed to do, though of course everyone's mileage seems to vary on this.


Was hoping it was so easy :) But I probably need to look into it some more.

    llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
    llama_model_load_from_file_impl: failed to load model

Edit: @below, I used `nix-shell -p llama-cpp`, so it's not brew related. Could indeed be an older version! I'll check.


As has been discussed in a few recent threads on HN, whenever a new model is released, running it successfully may require changes in the inference backends, such as llama.cpp.

There are two main reasons. One is the tokenizer, where new tokenizer definitions may be mishandled by older tokenizer parsers.

The second reason is that each model may implement tool invocations differently, e.g. by using different delimiter tokens and different text layouts for describing the parameters of a tool invocation.

So running the Gemma-4 models hit various problems during the first days after their release, especially for the dense 31B model.

Solving these problems required both a new version of llama.cpp (and of other inference backends) and updates to the models' chat template and tokenizer configuration files.

So anyone who wants to use Gemma-4 should update to the latest version of llama.cpp and to the latest models from Huggingface, because the latest updates were made only a couple of days ago.
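To make the delimiter-token point concrete: each model family brackets messages with its own control tokens, and a backend that renders the wrong template feeds the model text it was never trained on. A toy sketch with made-up templates (not the real Gemma ones), just to show why the backend must know each model's format:

```python
# Toy chat templates: each model family wraps a user message in its own
# control tokens. These two templates are invented for illustration.
TEMPLATES = {
    "model-a": "<|user|>{msg}<|end|><|assistant|>",
    "model-b": "[INST] {msg} [/INST]",
}

def render(model, msg):
    """Render a single user turn with the given model's template."""
    return TEMPLATES[model].format(msg=msg)

print(render("model-a", "hi"))  # → <|user|>hi<|end|><|assistant|>
print(render("model-b", "hi"))  # → [INST] hi [/INST]
```

Feed model-b's rendering to model-a and it sees `[INST]` as ordinary text, not a turn boundary; that is the class of bug a llama.cpp update plus refreshed chat-template files fixes.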


I just hit that error a few minutes ago. I build my llama.cpp from source because I use CUDA on Linux. So I made the mistake of trying to run Gemma-4 on an older version I had, and I got the same error. It's possible brew installs an older version which doesn't support Gemma-4 yet.

Ah it was indeed just that!

I'm now on:

    $ llama --version
    version: 8770 (82764d8)
    built with GNU 15.2.0 for Linux x86_64

(From Nix unstable)

And this works as advertised, nice chat interface, but no OpenAI API I guess, so no opencode...



Good stuff, thanks!

And that's exactly why llama.cpp is not usable by casual users. They follow the "move fast and break things" model. With Ollama, you just have to make sure you're getting/building the latest version.

It's not possible to run the latest model architectures without "moving fast". The only thing broken here is that they were trying to use an old version with a new model.

and Ollama suffered the same fate when wanting to try new models

What fate?

the impedance mismatch between when models are released and when Ollama and other servers become capable of running them.

I'm a bit unsure what that has to do with someone running an outdated version of the program while trying to use a model that is supported in the latest release.

While that might be true, for as long as its name is “.cpp”, people are going to think it’s a C++ library and avoid it.

This is the first I'm learning that it isn't just a C++ library.

In fact the first line of the wikipedia article is:

> llama.cpp is an open source software library


It would make sense to just make the GUI a separate project, they could call it llama.gui.

It would make even more sense to rename it to ollama, register a trademark for the name, and watch the thieves complain they've been robbed :>


LlamaBarn is the macOS app, not the HTTP API server, which is "llama-server".

On non-Apple PCs, "llama-server" is what you use, and you can connect to it either with a browser or with an application compatible with the OpenAI API.

Perhaps using "llama-server" as the name of the project would have been less confusing for newbies than "llama.cpp".

I confess that when I first heard about "llama.cpp" I also thought that it is just a library and that I have to write my own program in order to implement a complete LLM inference backend.
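To make the "compatible with the OpenAI API" part concrete: llama-server exposes an OpenAI-style `/v1/chat/completions` endpoint, so a stock OpenAI client can point at it. A minimal sketch, assuming a server started with `--port 8000` on localhost (the helper name is mine; llama-server serves whichever model it loaded, so the "model" field is largely cosmetic):

```python
import json
import urllib.request

API_PATH = "/v1/chat/completions"  # llama-server's OpenAI-compatible endpoint

def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build the HTTP request for a single chat turn against llama-server."""
    body = json.dumps({
        "model": "local",  # ignored by llama-server; it serves its loaded model
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        base_url + API_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Say hello in one word.")
print(req.full_url)  # → http://localhost:8000/v1/chat/completions
```

Sending it with `urllib.request.urlopen(req)` and reading `choices[0].message.content` from the JSON response gives the reply, the same shape as the hosted OpenAI API returns.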


This looks nice, but it is macOS only.

This is correct, and I avoided it for this reason; I did not have the bandwidth to get into any C++ rabbit hole, so I just used whatever seemed to abstract it away.

Wait, it isn't? The name very strongly suggests that it is a text file containing C++ source code; is that not the case?

Frankly, I think the CLI UX and documentation are still much better for Ollama.

It makes a bunch of decisions for you so you don't have to think much to get a model up and running.


I don't care about the GUI so much. Ollama lets me download, adjust and run a whole bunch of models and they are reasonably fast. Last time I compared it with Llama.cpp, finding out how to download and install models was a pain in Llama.cpp and it was also _much_ slower than Ollama.

That is not true.

If you visit a model's page on Huggingface today, the site will show you the exact one-liner you need to run it on llama.cpp.

I didn't measure it, but both download and inference felt faster than Ollama. One thing that was definitely better was memory usage, which may be important if you want to run small models on an SBC.


"LM Studio… Jan… Msty… koboldcpp…"

Plenty of alternatives listed. Can anyone with experience suggest the likely successor to Ollama? I have a Mac Mini but don't mind a CLI tool.

I think, as was pointed out, Ollama won because of how easy it is to set up and pull down new models. I would expect similar for a replacement.


If you don't want to have to think about it, LM Studio is probably the best choice.

How about kobold.cpp then? Or LM Studio (I know it's not open source, but at least they give proper credit to llama.cpp)?

Re curation: they should strive not to integrate broken support for models, and avoid uploading broken GGUFs.


> For most users that wanted to run LLM locally, ollama solved the UX problem

This does not absolve them from the license violation


Agree. We can easily compare it with Docker. Of course people can use runc directly, but most people choose not to and use `docker run` instead.

And you could blame Docker in a similar manner. LXC existed for at least 5 years before Docker, but Docker was just much more convenient for the average user.

UX is a huge factor in the adoption of technology. If a project fails at creating the right interface, there is nothing wrong with creating a wrapper.


>solved the UX problem.

>One command

Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`, and that running things in the terminal is already a gigantic UX blocker (Ollama's popularity comes from the fact that it has a GUI), why are you putting the blame back on an open source project that owes you approximately zero communication?


> Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`

There is a TON of difference. Ollama downloads the model from its own model library server, sticks it somewhere in your home folder with a hashed name and a proprietary configuration that doesn't use the built-in metadata specified by the model creator. So you can't share it with any other tool, you can't change parameters like temp on the fly, and you are stuck with whatever quants they offer.


This was my issue with the current client ecosystem. I get a .gguf file. I should be able to open my AI client of choice and File -> Open and select a .gguf, same as opening a .txt file. Alternatively, I have cloned a HF model; all AI clients should automatically check for the HF cache folder.

The current offerings have interfaces to HuggingFace or some model repo. They get you the model based on what they think your hardware can handle and save it to %user%/AppData/Local/%app name%/... (on Windows). When I evaluated running locally, I ended up with 3 different folders containing copies of the same model in different directory structures.

It seems like HuggingFace uses %user%/.cache/..; however, some of the apps still fetch the HF models and save them to their own directories.

Those features are "fine" for a casual user who sticks with one program, but it seems designed from the start to lock you into their wrapper. In the end they are all using llama.cpp, ComfyUI, OpenVINO etc. to abstract away the backend. Again, this is fine, but hiding the files from the user seems strange to me. If you're leaning on HF, then why not use its .cache?

In the end I grab the latest llama.cpp releases for CUDA and SYCL and run llama-server. My best UX has been with LM Studio and AI Playground. I want to try LocalAI and vLLM next. I just want control over the damn files.
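The "it's just a file" point is literal: a GGUF model is a single self-describing file whose header begins with a 4-byte ASCII magic ("GGUF") followed by a uint32 format version, so any client could open one directly. A minimal sketch of checking that header (the helper name is mine; demonstrated on an in-memory fake header rather than a real multi-gigabyte model):

```python
import io
import struct

def read_gguf_magic(f):
    """Check the GGUF magic and return the header's format version."""
    magic = f.read(4)
    if magic != b"GGUF":  # GGUF files start with these 4 bytes
        raise ValueError("not a GGUF file")
    # Little-endian uint32 format version follows the magic.
    (version,) = struct.unpack("<I", f.read(4))
    return version

# Demo on a fake in-memory header claiming version 3.
buf = io.BytesIO(b"GGUF" + struct.pack("<I", 3))
print(read_gguf_magic(buf))  # → 3
```

The rest of the header (metadata key-value pairs, tensor descriptors) is equally self-contained, which is exactly why renaming the file with a hash and hiding its metadata in a sidecar config feels backwards.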


Check out Koboldcpp. The dev has a specific philosophy about things (minimal or no dependencies, no installers, no logs, don't do anything to the user's system they didn't ask for explicitly) that I find particularly agreeable. It's a single executable and includes the kitchen sink, so there is no excuse not to try it.

That's one of my major annoyances with the current state of local model infrastructure: all the cruft around what should be a simple matter of downloading and using a file. All these cache directories and file renamings and config files that point to all of these things. The special, bespoke downloader CLI tools. It's just awkward from the point of view of someone who is used to simple CLI tools that do one thing. Imagine if sqlite3 required all of these paths and hashes and downloaders and configs rather than letting you just run:

   sqlite3 mydatabase.db

> Ollama's popularity comes from the fact that it has a GUI

It's not the GUI, it's the curated model hosting platform. Way easier to use than HF for casual users.


It also made it easy for casual users to think that they were running DeepSeek.

LM Studio also offers curation, while giving credit to llama.cpp, and easy search across all of Huggingface's GGUFs.

But if you’re just a GUI wrapper then at least attribute the library you created the GUI for

But if Ollama is much slower, that's cutting into your fun, and you'll have more fun with a faster GUI.

You’ve completely missed the point.

Whip that llama! Oh wait, that's a different program.


Click on any YouTube video from any web page on Android. If you press anything that is not the back button immediately, you will lose the option to go back.

So this coming from Google... it's funny. Welcome, but funny.


Thanks, I agree.

I’ve been asking myself the same thing for years. My take:

1. Peter Principle: people get promoted to their level of incompetence.

2. In many companies, it’s the only way to increase salary.

3. Some developers think it gives them more leverage or impact.

But honestly, most of the time it’s simpler: stakeholders want more output, and the best dev gets pushed into leading because there’s no one else.

It’s often less a “promotion” and more a gap the company needs to fill.


This feels like jumping on the complain-about-Europe train, and fully biased.

If I apply my own bias, I would call it:

    American companies abuse their dominance by enforcing undocumented requirements, making other companies unable to reach their users.


GitHub says 2.8k files when selecting C (including headers...) https://github.com/search?q=repo%3Asystemd%2Fsystemd++langua...

If the project even has to be split into different parts that you need to understand... that already makes the point.


Well to be fair, you don't need to understand how SystemD is built to know how to use it. Unit files are pretty easy to wrap your head around, it took me a while to adjust but I dig it now.

To make an analogy: another part of LFS is building a compiler toolchain. You don't need to understand GCC internals to know how to do that.


> Well to be fair, you don't need to understand how SystemD is built to know how to use it.

The attitude that you don't need to learn what is inside the magic black box is exactly the kind of thing LFS is pushing against. UNIX traditionally was a "worse is better" system, where it's seen as better design to have a simple system whose internals you can understand, even if that simplicity leads to bugs. Simple systems that fit the needs of users can evolve into complex systems that fit the needs of users. But you (arguably) can't start with a complex system that people don't use and get users.

If anyone hasn't read the full Worse Is Better article before, it's your lucky day:

https://www.dreamsongs.com/RiseOfWorseIsBetter.html


LFS is full of packages that fit your description of a black box. It shows you how to compile and configure packages, but I don't remember them diving into the code internals of a single one.

I understand not wanting to shift from something that is wholly explainable to something that isn't, but it's not the end of the world.


No, it's not the end of the world. And I agree, LFS isn't going to be the best resource for learning how a compiler works, or cron, or NTP. But the init process and systemd are so core to Linux that I can certainly see the argument that they should be part of the "from scratch" parts.


You still build it from scratch (meaning you compile from source); they don't dive into Linux kernel code internals either.

They still explain what an init system is for and how to use it.


The problem is ultimately that by choosing one, the other gets left out, so whatever is left out gets one more nail in its coffin. With LFS being the more-or-less official how-to guide for building a Linux system, sysvinit is now essentially "officially" deprecated by Linux. This is what is upsetting people here.

I'm OK with that in the end because my system is a better LFS anyhow. The only part that bothers me is that the change was made with reservations, rather than him saying no and putting his foot down, insisting that sysvinit stay in regardless of Gnome/KDE. But I do understand the desire to get away from having to maintain two separate versions of the book.

Ultimately I just have to part ways with LFS for good, sadly. I'm thankful for these people teaching me how to build a Linux system. It would have been 100x harder trying to do it without them.


Linux is just a kernel that does not ship with any sort of init system, so I don't see how anything is being deprecated by Linux.

The LFS project is free to make any decisions that they want about what packages they're going to include in their docs. If anyone is truly that upset about this then they should volunteer their time to the project instead of commenting here about what they think the project should do IMO.


The whole point of LFS is to understand how the thing works.


Nothing is actually stopping people from understanding systemd's init except a constant, poorly justified flame war. It's better documented than pretty much everything that came before it.


Searching for "postgres unikernel", it seems some people are trying it seriously...

https://nanovms.com/dev/tutorials/running-postgres-as-a-unik...


I don't know how it is in the US, but in Europe the amount of scams is growing. Twitter's blue checkmark was created to distinguish real humans from scammers.

The fine was to protect users from that scam.

I like paying taxes to protect users who don't have the ability to detect scams that most of us here have (most of the time).

The EU misses the point just as much as the US Congress when non-tech people believe they can regulate (or are just lobbied).

But in this case, there would have been no problem if Twitter had decided to use another checkmark for pro accounts.


That it's a macOS-ONLY app.


macOS, iOS, Windows, and Linux


I was going to comment on the Mac exclusivity too, which might be a bad idea now that Linux is on the rise. But you're right, there's a Linux beta now too. Thanks for the pointer.

