I think it's absurd to pretend like you can know how a stranger thinks.
If I had to predict either way, I would guess that it is significantly AI generated, but that isn't the same thing as being sure.
Almost every link submitted to HN gets a comment claiming the content is AI generated, and many of those links are not. I would rather talk about the "tells" than make confident assertions that I can't prove.
For example, https://mydetector.ai/ai-code-detector/ says 90% likely AI. Not that I trust the tools, but there are telltales to me in this function from the site:
Certain ergonomics are hard to miss since a human who writes heavy FP would opt for a `(r) => r.date` lambda, where the computer has no problem writing out inline `function(r)`-style declarations. Similarly, the HTML mapping function could go either way, but mixing in large sets of text with hard constants would be really uncommon for humans to write.
JavaScript is always a mess, but it's a _different_ mess between humans and AI, and this function `loadCommunityReports` really reads AI-first to me.
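To make the contrast concrete, here is a minimal sketch in TypeScript. The `Report` shape and the sample data are made up for illustration; this is not the site's actual `loadCommunityReports` code, just the two styles side by side.

```typescript
// Hypothetical record shape, invented for this example only.
interface Report {
  date: string;
  title: string;
}

const reports: Report[] = [
  { date: "2024-01-02", title: "first" },
  { date: "2024-03-04", title: "second" },
];

// Older style: `var` plus an inline function expression.
var datesOldStyle = reports.map(function (r) {
  return r.date;
});

// The style someone who writes a lot of FP would more likely reach for.
const datesArrowStyle = reports.map((r) => r.date);
```

Both produce the same array, so the difference is purely a style signal, which is why it can only ever be a "tell" and not proof.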
I’ve only seen this snippet (on phone so no source access), but var + no fat arrow could also indicate someone who learned JS a long time ago and uses what they’re used to.
> you will just never be open to a personal friendship with anyone you ever work with?
Building relationships with colleagues is possible but I have tried to be careful. I have made some friends over time that were once co-workers. However, they were only able to move to full friend once they moved on to other teams or companies. I don't see someone I work with day-to-day as a personal friend. I compartmentalize them, keep the relationship professional and cordial.
Moving someone to a personal friend has risks, especially if there is a chance you may work for or with them again. Some personal friendships may be able to outlast work drama, but so far I haven't had that happen for me. I've lost a few along the way due to negative conditions at work.
Have you had a personal friend that stayed around after leaving a bad situation at work? Any pointers?
My best friend is someone i worked with, and we hit it off immediately. He also was one of the people who interviewed me before hire, too. I left the company because of medication induced issues with co-workers (long boring story... careful with SSRIs kids!)
and we still ... actually he just called so i gotta cut this short
we talk 5 hours a week on the phone plus we run a PBX and chat server and stuff so we're constantly in contact.
If you want LLMs to have knowledge of the Norwegian language, wouldn't the most obvious thing to do be to build a good training dataset and make the dataset widely available? Why go to the expense of training your own model, especially when it will be inferior to state of the art models?
I task GPT/Claude with researching stuff that pertains to very specific cultural or legal aspects in French politics, on a daily basis.
Even though French is a way more common language globally than Norwegian, these models still haven't figured out that, no matter the language I myself speak to them (German or English depending on my mood) their web searches need to be done in French to return reasonable results. I have to remind them every time lest they come back with "uh, didn't find anything relevant, here take some hallucinations instead."
So, given the anglo-centrism of current models, my confidence in American providers giving any shits about non-american users/use-cases is pretty low. And lower the smaller the language community is.
I've noticed that it also imposes American moral judgements on certain things, even though it reasons (sometimes) in the native language.
I was trying to work out how and when to use swear words, and their relative power index. It translated English swear words into the target language, then lectured me on not using them.
It took a bunch of prodding for it to actually think as the target language to then get the (mostly) correct response.
Would be curious about the model and the prompt for this.
Not kidding at all. I had a similar issue with a project where I needed to classify images into specific demographics, and Gemini, while capable, was entirely not going to do the task… until in my JSON response I left room for it to tell me why this was not a good idea and why it was culturally insensitive. Then boom… full JSON array: hair color, eye color, skin color, fitness level, likely ethnicity, likely country of origin, and about 10 other values.
You’re probably wondering what on earth I was working on. I was matching AI-generated headshots to AI voices so that in an app the voice picker had human (AI) faces.
Aren’t you already using English in the LLM convo? Telling the model to use French for research or to find resources in French seems like a reasonable step.
If you’re doing this on a daily basis, then you should have an AGENTS.md that accumulates directional instructions like this.
This is how you use the tool correctly.
There’s this weird pattern I’ve noticed where people expect LLMs to require zero effort or proficiency on their part, and when the LLM isn’t perfect without it, the verdict is: of course it wasn’t, LLMs suck.
The issue is that French, Italian, African, Japanese people shouldn't have the inconvenience of instructing the LLM tool to get the basic facts about their own culture. They should use an LLM that has already been trained like that by default. Nobody has an obligation to use a tool that thinks it is talking to an American. If I go to Google for example I want to get facts about my own country in my own language.
Wouldn't those people be asking the questions in their own language in the first place? The model will reply in the language you use. This thread is about people asking for information about a language that is not the one they are messaging the LLM in.
Even if the model will reply in my language, I often notice it searching in english. Or thinking in english. There's always something lost in translation. Sometimes it's just minor nuances. Other times it mangles the legal facts with those of other countries.
This sounds like the problem of people calling "911" as the emergency number which they see in so much US-American media but which is not the emergency number in their own country.
I remember being bored as a teenager on a family holiday to New Zealand in the 1990s, so I went and dialled 911 from a payphone to see what would happen. I got a recorded message saying that in New Zealand, the emergency number isn’t 911, it is 111. Dialling 000 (the Australian emergency number) produced a similar recorded message.
They always sound like an obnoxious American tourist talking through a translator, the chatbot training dataset is the same and foundation models are always built with >50% American English data for some reason.
> Aren’t you already using English in the LLM convo? Telling the model to use French for research or to find resources in French seems like a reasonable step.
Most ordinary people will just use their native language and they have no way of knowing that the model always reasons in English and therefore is strongly biased toward using English search terms. So they don't know they have to remind the model to search in their local language.
If you ask in French, it searches in French, right?
I have the opposite problem, where I'll ask in English, about something in a foreign country, the results it finds will all be in that foreign language, and the LLM will switch languages and respond in that language (which I don't speak).
So then I have to ask it "can you repeat that in English please."
I keep waiting for the new GPT-Definitely-AGI-For-Real-This-Time to fix it but it's still there.
> If you ask in French, it searches in French, right?
not necessarily. i often prompt Claude in German and then see the reasoning happening in English. of course it will eventually reply in German, but that does not mean that the tooling in the background was using German.
Same for me - I mostly ask stuff in English but sometimes add specific terms or names in Japanese as needed. My Japanese is intermediate, but it will often switch immediately and reply only and entirely in Japanese. I'm pretty sure they have a system prompt with hairline triggers for foreign languages BECAUSE of the overrepresentation of English in the training corpora.
> their web searches need to be done in French to return reasonable results.
I wonder how much of this is also just the search engine's region setting.
It's a big problem I regularly have with Google. I almost always want English language, US-centric results, so I have my region set to the US. But occasionally I want results relevant to my actual country, and even searching in my native language usually yields much worse results than just opening an incognito tab and letting it default to my real location.
I have the opposite problem. I often have to ask ChatGPT about things related to Norway and I have to constantly correct it when it keeps switching to responding in Norwegian no matter how many times I tell it to only answer in Norwegian when I request it.
What incentives does OpenAI have to make sure the AI actually works well with Norwegian beyond capturing a (small) Norwegian market? What incentives do they have to take Norwegian values into consideration, or to preserve Norwegian culture into the future? The matter is also a question of national sovereignty, so to simply release the data and nicely ask foreign companies to solve the problem for you would be a fool's move.
It's also a bit funny because Norway definitely has enough money to hire a team of Anthropic's best to go out there and train them a model that does whatever they want. They probably have enough money to fund their own Anthropic competitor.
I highly doubt that hiring people who don't even speak the language would result in a better model for Norwegian. If anything, they could pay Anthropic for some tips and tricks for training. But that does not seem necessary as DeepSeek & co detail everything for free.
Considering the fact that the US is complaining about Norway putting too much money into the US market, imagine what would happen if all that money was spent in Norway. It would be chaos.
It was tried in the early 1980s and nearly drove every non-oil-related industry in the country to extinction.
Norway has a manpower bottleneck. The UK had spent its oil windfall domestically and it barely registered. But for a nation of then some 4 million the economy melts down with so much monetary mass.
So blaming population is a cheap excuse that doesn't hold water. Especially since you can always import the skilled people you lack, when you have virtually unlimited money and some of the highest standards of living in the world.
Yeah, was about to comment that too. Instead of training a new model and new weights exclusively for Norwegian (and expecting/wanting every other small/medium-sized country to do the same), which seems infinitely harder, they could have made high quality transcriptions and translations of the stories currently described only in Norwegian into English, and made it all public. I guess there still would be a worry that it'd be counted as "less important" compared to other history, news and culture about other countries.
Oddly enough, my wife was recently involved in a project to translate historical crime novels from Norwegian; since all the available late 20th century Scandinavian crime novels have already been translated and turned into popular TV series, the plan was to go further back. Into the 1930s. The first cut was done with LLMs, but encountered the problem that (a) Norwegian itself has changed noticeably since then, in both major dialects, and (b) the machine translation deteriorated on large sections, resulting in entirely missing paragraphs and pages in a few places. Not to mention the usual translation issues (what police role does lensman map to?) and localisation (to what extent should the casual antisemitism be left in or removed?)
Translation is never a bijective process. It's never quite the same experience in translation as it is in the original, due to the cultural differences between reader and writer. Larger in this case because 1930s Norway is very different even from 2020s Norway.
Ultimately this was not a success due to marketing difficulties; it is very difficult to get a book noticed.
Sorry if I was unclear. I didn't want to give the impression that I think translation, or even transcription in some cases, is easy, or without problems, or not painstakingly time-consuming; it very much is.
I just think building an LLM from scratch is even harder, with more potential problems that are harder to solve, more time-consuming and even more resource-intensive.
It would require an investment, but those will pay dividends later, as it becomes easier to train LLMs on/for Norwegian. If we need to translate everything to English we might as well just drop using Norwegian altogether. Practically everyone speaks English fluently already...
> as it becomes easier to train LLMs on/for Norwegian
Why would it be easier in the future? The advances we see with LLMs today require a huge amount of data, and it's getting hard to get that amount of data in any language. I'm having a hard time seeing how it'd get easier for Norwegians to build their own LLM, unless they seriously start to ramp up how much Norwegian content they're putting out.
> If we need to translate everything to English we might as well just drop using Norwegian altogether. Practically everyone speaks English fluently already...
Yeah I mean with that black and white perspective you can pretty much do anything and it won't matter for anything :) I think for the rest of us, what we speak daily and what we rely on professionally, can differ, and that's OK. But maybe this is just my broken Swedish mind being so used to using English professionally but then conversing in Spanish outside of work daily, YMMV.
These models will never compete with frontier models and do not need to - it is about hitting a good-enough, not being the best.
Behind the frontier, getting to a certain performance level is getting easier over time - both sample and compute efficiency are going up.
Furthermore one can reuse investments in data (agreements, infrastructure and datasets), compute (GPUs, servers) and know-how (training scripts, experienced engineers).
But are you seriously under the belief that all of that, plus all the other things you're forgetting about, is easier, cheaper and faster than transcriptions and translations?
I understand and agree building the LLMs yourself comes with more benefits, long-term ones especially, but still it's harder, more expensive and really time consuming work.
I do not know which is easier. I am not sure it is even well established in research, for generative text tasks, whether a translation-first or a native-language-first approach is the more sample efficient.
But for a national lab I think it is money well spent to figure out the possibilities and limitations of native-language LLMs for languages with on the order of 5M-10M speakers.
Yes, why wouldn't it be easier to transcribe and translate, skills humanity had for centuries, compared to LLMs that we've only learnt to build these last few years, and even require a frikken computer to do? Of course one of these is harder than the other...
Look at it through this lens: translating and transcribing these stories hasn't happened in the centuries they have existed, while as you point out the skills were always there. In contrast, LLMs have been here for a few years at most and everyone and their dogs are trying to get in on the "race".
With absolutely no insight into why, which one has better odds to happen first is obvious to me.
Sure, it isn't as "hot" to translate stuff as it used to be some hundreds of years ago, and building LLMs surely is "hot" today, I don't doubt more people are attempting to build LLMs today than translating huge datasets, especially if we narrow the two to exclusively "In Norwegian".
Having insight into translation, transcription, and attempting to build LLMs myself, I'm fairly sure which effort would be successful first, regardless of how many attempt it first.
Copyrights and statutes don't allow them to do that. The mandate of the National Library maybe permits them to make an LLM, though (and I won't be at all surprised if someone sues them anyway).
> wouldn't the most obvious thing to do be to build a good training dataset and make the dataset widely available?
Only if you believe other people will value that enough to expend the effort necessary to use it. If you believe other people will see it as low value and ignore it then you'd be better off doing the training yourself in order to guarantee it happens.
There's also a secondary benefit that your team doing the work will learn some useful skills while they do it.
Permissions, probably. Copyrights and statutes. Knowing the librarians, unfortunately the prestige of their job is more vested in denying you access than giving you access.
I mean it's their job to give people access to information, and they certainly do, but the mark of a professional, in their eyes, is guarding information. It's much more embarrassing for them professionally to give too much access than too little.
LLM training gives them a "respectable" way of bypassing that and giving the world their information (which, in fairness, they probably all really want to do if they could).
If they wanted to, they all have scanners and access to information on how to create torrents. Setting the information free isn't complicated, so it'd seem most of them do not want to.
Where do you seed a 60 petabyte torrent? I'm sure some choice cuts of what individuals feel is important have made it to Anna's, but I don't think refusal to go on a full data liberation spree is evidence they don't care.
Any kind of radio control should be discounted when attacking a US carrier fleet, they will just be jammed.
Autonomous optically guided missiles/drones would fare better, but those are still vulnerable to being blinded by laser systems like HELIOS[0], and of course being shot down by anti air missiles or CIWS.
that would be an odd criticism because we never generated any meaningful amount of electricity from oil (and started importing Russian fossil resources 30 years before we turned nuclear power plants off). The chief source for energy in Germany was coal. Gas is primarily an industry and heating input rather than a source of power generation, gas plants have only become more popular in recent years.
What replaced all other fossil fuel sources are renewables, which at 50% are now by far the single largest source of electricity.
In 2021, Russia was the source of 52% of Germany's gas. Following the expansion of the Ukraine conflict in 2022 imports from Russia quickly dropped. The biggest suppliers of gas to Germany last year were Norway, Netherlands and Belgium. LNG from the United States and other countries has increased to 10% which is not nearly enough to replace the Russian imports. The phaseout was accomplished by drastically reducing overall gas usage.
Lots of brown coal, which famously emits more radioactivity than any nuclear power plant ever did. The anti nuclear movement is a fucking joke, and the people I know personally here in Germany who oppose(d) nuclear still think the danger is an atomic bomb level explosion, or generational crippling mutations, or losing their house.
And I'm increasingly concerned that one could vibe-code a massive payload that does all of these at once - including subtle things like trying to get itself installed into personal projects and forks, so it can persist across a system wipe. We're only seeing the beginning of these attacks.
Pi (pi.dev) is fine. I'm using it with DS v4 right now. It's not close to Claude Code but I think that's the point.
By the way, the OpenRouter version is very slow for some reason. The DeepSeek platform is faster (and cheaper with the discount) if you don't mind passing your credit card number / email to this company.
As sibling said, Pi is great, and you can absolutely run it directly (there's even a plugin to use itself as a sub agent), but I mainly run it as a sub agent from other harnesses, for example running a more capable model in copilot, and then delegating simpler chunks to pi (using a cheaper model) as the sub agent. I've tried gas town and some others but never got into that way of working. I'm going to try opencode though as a less vendor specific harness than copilot/claude/gemini.
I've been using OpenCode for a few days now, I like it. It doesn't feel any "less heavy" than Claude Code (they're both massive piles of vibe-coded typescript) but for me it's essentially a 1:1 replacement for Claude Code.
Sidenote, I've been trying deepseek-v4-flash and I'm blown away. It's no Opus, but it's as cheap as tap water and punches far above its weight as a Flash model. I keep throwing tasks at it out of curiosity and it keeps solving them.