WatchDog's comments

By the same token, you aren't entitled to see the website content.

True, but that's at the discretion of the content author/publisher, not Cloudflare Turnstile.

It's the publisher that enabled Turnstile.

Your theory about token density seems reasonable, but your data doesn't seem to really match it.

Very little difference between TypeScript and JavaScript, which are essentially the same language, just one has more tokens.

Functional languages like Clojure and OCaml are pretty dense, I would have expected them to feature lower.

Kotlin is in some ways a more token dense version of Java, yet Kotlin leads, and Java is almost last.


It might be, I'm not sure.

The code is interesting though, it's not minified, it's very readable, and nicely indented with lots of comments.

The curated data center list is just some inline JSON.

The javascript uses var instead of let or const, I'm not sure if this is just style choice, or there is some code post processing.

It doesn't use react, AI seems to almost always opt for react for front end design, unless told otherwise.
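For context on why `var` versus `let`/`const` is not purely cosmetic, here is a minimal sketch of the scoping difference. The function names `withVar` and `withLet` are made up for illustration and are not from the site's code.

```javascript
// Illustrative only: these helpers are hypothetical, not taken from the site.

// `var` is function-scoped, so every closure shares the same `i`.
function withVar() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

// `let` is block-scoped, so each loop iteration gets its own `i`.
function withLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((cb) => cb());
}

console.log(withVar()); // [ 3, 3, 3 ]
console.log(withLet()); // [ 0, 1, 2 ]
```

In code that never closes over a loop variable the two behave the same, so `var` on its own probably says little about who or what wrote it.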


[flagged]


I think it's absurd to pretend like you can know how a stranger thinks.

If I had to predict either way, I would guess that it is significantly AI generated, but that isn't the same thing as being sure.

Almost every link submitted to HN has a comment about the content being AI generated, many of which are not, I would rather talk about the "tells" rather than make confident assertions that I can't prove.


For example, https://mydetector.ai/ai-code-detector/ says 90% likely AI. Not that I trust the tools, but there are telltales to me in this function from the site:

     Object.values(zipBuckets).forEach(function(b) {
       var latest = b.reports.map(function(r){return r.date;}).sort().reverse()[0];
       // ...
       var popup =
         '<div style="font-family: Inter, sans-serif; min-width:220px; max-width:280px;">' +
         (noteLines ? '<ul style="font-size:12px; color:#3a4a2a; padding-left:16px; margin:0;">' + noteLines + '</ul>' : '') +
       // ...
     }
Certain ergonomics are hard to miss since a human who writes heavy FP would opt for a `(r) => r.date` lambda, where the computer has no problem writing out inline `function(r)`-style declarations. Similarly, the HTML mapping function could go either way, but mixing in large sets of text with hard constants would be really uncommon for humans to write.

JavaScript is always a mess, but it's a _different_ mess between humans and AI, and this function `loadCommunityReports` really reads AI-first to me.
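To make the stylistic point concrete, here is a rough side-by-side of the two styles computing the same "latest date". The `reports` array and its ISO date strings are assumed sample data, not the site's real data.

```javascript
// Hypothetical sample data; the real shape of `b.reports` is an assumption.
const reports = [
  { date: "2024-03-01" },
  { date: "2025-01-15" },
  { date: "2024-11-30" },
];

// Style from the quoted snippet: `var` plus inline function expressions.
var latestOld = reports.map(function (r) { return r.date; }).sort().reverse()[0];

// Arrow-function style that someone writing heavy FP might reach for.
const latestNew = reports.map((r) => r.date).sort().reverse()[0];

// Both rely on ISO-8601 date strings sorting lexicographically.
console.log(latestOld, latestNew); // 2025-01-15 2025-01-15
```

The behavior is identical, so the difference is only a stylistic tell, not a functional one.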


I’ve only seen this snippet (on phone so no source access), but var + no fat arrow could also indicate someone who learned JS a long time ago and uses what they’re used to.

People who provide nothing but comments like “this is ai!” actually contribute far less than AI responses somehow.

> ...I never will do is cross the "just business"/"personal" line with anyone I may or am working with.

Just in an interview situation, or you will just never be open to a personal friendship with anyone you ever work with?


> you will just never be open to a personal friendship with anyone you ever work with?

Building relationships with colleagues is possible but I have tried to be careful. I have made some friends over time that were once co-workers. However, they were only able to move to full friend once they moved on to other teams or companies. I don't see someone I work with day-to-day as a personal friend. I compartmentalize them, keep the relationship professional and cordial.

Moving someone to a personal friend has risks, especially if there is a chance you may work for or with them again. Some personal friendships may be able to outlast work drama, but so far I haven't had that happen for me. I've lost a few along the way due to negative conditions at work.

Have you had a personal friend that stayed around after leaving a bad situation at work? Any pointers?


My best friend is someone i worked with, and we hit it off immediately. He also was one of the people who interviewed me before hire, too. I left the company because of medication induced issues with co-workers (long boring story... careful with SSRIs kids!)

and we still ... actually he just called so i gotta cut this short we talk 5 hours a week on the phone plus we run a PBX and chat server and stuff so we're constantly in contact.


If you want LLMs to have knowledge of the Norwegian language, wouldn't the most obvious thing to do be to build a good training dataset and make the dataset widely available? Why go to the expense of training your own model, especially when it will be inferior to state of the art models.

I task GPT/Claude with researching stuff that pertains to very specific cultural or legal aspects in French politics, on a daily basis. Even though French is a way more common language globally than Norwegian, these models still haven't figured out that, no matter the language I myself speak to them (German or English depending on my mood) their web searches need to be done in French to return reasonable results. I have to remind them every time lest they come back with "uh, didn't find anything relevant, here take some hallucinations instead."

So, given the anglo-centrism of current models, my confidence in American providers giving any shits about non-american users/use-cases is pretty low. And lower the smaller the language community is.


I've noticed that it also imposes american moral judgements on certain things, even though it reasons (sometimes) in the native language.

I was trying to work out how and when to use swear words, and the relative power index of them. it translated english swear words into the target language then lectured me on not using them.

It took a bunch of prodding for it to actually think as the target language to then get the (mostly) correct response.


Would be curious about the model and the prompt for this.

Not kidding at all. I had a similar issue with a project where I needed to classify images into specific demographics, and Gemini, while capable, was entirely not going to do the task… until in my JSON response I left room for it to tell me why this was not a good idea and why it was culturally insensitive. Then boom… full JSON array: hair color, eye color, skin color, fitness level, likely ethnicity, likely country of origin, and about 10 other values.

You’re probably wondering what on earth I was working on. I was matching Ai gen headshots to Ai voices so that in an app the voice picker had human (Ai) faces.


Aren’t you already using English in the LLM convo? Telling the model to use French for research or to find resources in French seems like a reasonable step.

If you’re doing this on a daily basis, then you should have an AGENTS.md that accumulates directional instructions like this.

This is how you use the tool correctly.

There’s this weird pattern I’ve noticed where people expect LLMs to require zero effort or proficiency on their part, and when the LLM isn’t perfect without it, of course it wasn’t; LLMs suck.


The issue is that French, Italian, African, Japanese people shouldn't have the inconvenience of instructing the LLM tool to get the basic facts about their own culture. They should use an LLM that has already been trained like that by default. Nobody has obligation to use a tool that thinks it is talking to an American. If I go to Google for example I want to get facts about my own country in my own language.

Wouldn't those people be asking the questions in their own language in the first place? The model will reply in the language you use. This thread is about people asking for information about a language that is not the one they are messaging the LLM in

Even if the model will reply in my language, I often notice it searching in english. Or thinking in english. There's always something lost in translation. Sometimes it's just minor nuances. Other times it mangles the legal facts with those of other countries.

This sounds like the problem of people calling "911" as the emergency number which they see in so much US-American media but which is not the emergency number in their own country.

I remember being bored as a teenager on a family holiday to New Zealand in the 1990s, so I went and dialled 911 from a payphone to see what would happen-I got a recorded message saying that in New Zealand, the emergency number isn’t 911, it is 111. Dialling 000 (the Australian emergency number) produced a similar recorded message.

In a lot of countries, they redirect the number or put a voice message to the correct emergency number

They always sound like an obnoxious American tourist talking through a translator, the chatbot training dataset is the same and foundation models are always built with >50% American English data for some reason.

>Nobody has obligation to use a tool that thinks it is talking to an American

Very very emphatic agree from my end, thanks.


> Nobody has obligation to use a tool that thinks it is talking to an American.

Then add top-level instructions saying what country you're from, what country you live in now, and which language you speak. This isn't that hard.


None of that even addresses the problem described, because none of the languages you mentioned would be French in the described example.

> Aren’t you already using English in the LLM convo? Telling the model to use French for research or to find resources in French seems like a reasonable step.

Most ordinary people will just use their native language and they have no way of knowing that the model always reasons in English and therefore is strongly biased toward using English search terms. So they don't know they have to remind the model to search in their local language.


If you ask in French, it searches in French, right?

I have the opposite problem, where I'll ask in English, about something in a foreign country, the results it finds will all be in that foreign language, and the LLM will switch languages and respond in that language (which I don't speak).

So then I have to ask it "can you repeat that in English please."

I keep waiting for the new GPT-Definitely-AGI-For-Real-This-Time to fix it but it's still there.


> If you ask in French, it searches in French, right?

not necessarily. i often prompt Claude in German and then see the reasoning happening in English. of course it will eventually reply in German, but that does not mean that the tooling in the background was using German.


Same for me - I mostly ask stuff in English but sometimes add specific terms or names in Japanese as needed. My Japanese is intermediate, but it will often switch immediately and reply only and entirely in Japanese. I'm pretty sure they have a system prompt with hairline triggers for foreign languages BECAUSE of the overrepresentation of English in the training corpora.

> their web searches need to be done in French to return reasonable results.

I wonder how much of this is also just the search engine's region setting.

It's a big problem I regularly have with Google. I almost always want English language, US-centric results, so I have my region set to the US. But occasionally I want results relevant to my actual country, and even searching in my native language usually yields much worse results than just opening an incognito tab and letting it default to my real location.


I gave up on Google's language and region settings a long time ago, years before giving up on google as a product.

To this day they still think I'm in Sweden sometimes, in Paris other times, or in Germany, while I haven't lived in any of those places for years.


Have you tried asking it to translate the prompt to French, and then feeding it the translated prompt?

I have the opposite problem. I often have to ask ChatGPT about things related to Norway and I have to constantly correct it when it keeps switching to responding in Norwegian no matter how many times I tell it to only answer in Norwegian when I request it.

What incentives does OpenAI have to make sure the AI actually works well with Norwegian beyond capturing a (small) Norwegian market? What incentives do they have to take Norwegian values into consideration, or to preserve Norwegian culture into the future? The matter is also a question of national sovereignty, so to simply release the data and nicely ask foreign companies to solve the problem for you, would be a fool's move

It's also a bit funny because Norway definitely has enough money to hire a team of Anthropic's best to go out there and train them a model that does whatever they want. They probably have enough money to fund their own Anthropic competitor.

I highly doubt that hiring people who don't even speak the language would result in a better model for Norwegian. If anything, they could pay Anthropic for some tips and tricks for training. But that does not seem necessary as Deepseek & co detail everything for free

>They probably have enough money to fund their own Anthropic competitor.

Which is bizarre to me: Norway doesn't have a booming tech sector, with all that wealth fund acting as the biggest VC.

They instead use their wealth fund to invest in US's tech sector. Baffling.


The point of the fund is to invest outside of Norway so as to avoid the Norwegian economy overheating and increasing inflation

Considering the fact that the US is complaining about Norway putting too much money into the US market, imagine what would happen if all that money was spent in Norway. It would be chaos.

> imagine what would happen if all that money was spent in Norway.

It would create jobs, sovereignty, intellectual property and soft power?

Instead it goes to strengthening the tech monopoly of a country that threatens to invade your neighbour.


It was tried in early 1980s and nearly drove any non oil-related industry in the country extinct.

Norway has a manpower bottleneck. The UK had spent its oil windfall domestically and it barely registered. But for a nation of then some 4 million the economy melts down with so much monetary mass.


There's only so much you can do with 5 million people. Especially in a field where network effects and scale matter a lot.

Finland has the same population as Norway and way less money, but has 3x the scaleups. The difference is even bigger vs the Netherlands.

Even Norway themselves admit they're the underperformers of the Nordics. https://skywlkr.no/wp-content/uploads/2019/10/TechScaleupNor...

So blaming population is a cheap excuse that doesn't hold water. Especially since you can always import the skilled people you lack, when you have virtually unlimited money and some of the highest standards of living in the world.


The fund is specifically mandated to not invest inside Norway to avoid making an enormous bubble and sky high inflation.

Yeah, was about to comment that too. Instead of training a new model and new weights exclusively for Norwegian (and expecting/wanting every other small/medium-sized country to do the same), which seems infinitely harder, they could have made high quality transcriptions and translations of the stories currently described only in Norwegian into English, and made it all public. I guess there still would be a worry that it'd be counted as "less important" compared to other history, news and culture about other countries.

Oddly enough, my wife was recently involved in a project to translate historical crime novels from Norwegian; since all the available late 20th century Scandinavian crime novels have already been translated and turned into popular TV series, the plan was to go further back. Into the 1930s. The first cut was done with LLMs, but encountered the problem that (a) Norwegian itself has changed noticeably since then, in both major dialects, and (b) the machine translation deteriorated on large sections, resulting in entirely missing paragraphs and pages in a few places. Not to mention the usual translation issues (what police role does lensman map to?) and localisation (to what extent should the casual antisemitism be left in or removed?)

Translation is never a bijective process. It's never quite the same experience in translation as it is in the original, due to the cultural differences between reader and writer. Larger in this case because 1930s Norway is very different even from 2020s Norway.

Ultimately this was not a success due to marketing difficulties; it is very difficult to get a book noticed.

( https://www.amazon.co.uk/Iron-Chariot-Nordic-Crime-Library/d... )


Sorry if I was unclear, I didn't want to give the impression I think translations or even transcriptions in some cases is easy, or without problems, or not painstakingly time-consuming, it very much is.

I just think building an LLM from scratch is even harder, with more potential problems that are harder to solve, more time-consuming and even more resource-intensive.


It would require an investment, but those will pay dividends later, as it becomes easier to train LLMs on/for Norwegian. If we need to translate everything to English we might as well just drop using Norwegian altogether. Practically everyone speaks English fluently already...

> as it becomes easier to train LLMs on/for Norwegian

Why would it be easier in the future? The advances we see with LLMs today require a huge amount of data, and it's getting hard getting the amount of data just using any language, I'm having a hard time seeing how it'd get easier for Norwegians to build their own LLM, unless they seriously start to ramp up how much Norwegian content they're putting out.

> If we need to translate everything to English we might as well just drop using Norwegian altogether. Practically everyone speaks English fluently already...

Yeah I mean with that black and white perspective you can pretty much do anything and it won't matter for anything :) I think for the rest of us, what we speak daily and what we rely on professionally, can differ, and that's OK. But maybe this is just my broken Swedish mind being so used to using English professionally but then conversing in Spanish outside of work daily, YMMV.


These models will never compete with frontier models and do not need to - it is about hitting good-enough, not being the best. Behind the frontier, getting to a certain performance level is getting easier over time - both sample and compute efficiency are going up.

Furthermore one can reuse investments in data (both agreements, infrastructure and datasets), compute (GPUs, servers) and know-how (training scripts, experienced engineers).


But are you seriously under the belief that all of that, plus all the other things you're forgetting about, is easier, cheaper and faster than transcriptions and translations?

I understand and agree building the LLMs yourself comes with more benefits, long-term ones especially, but still it's harder, more expensive and really time consuming work.


I do not know which is easier. I am not sure it is even well established in research, for generative text tasks, whether a translation-first or native-language-first approach is the more sample efficient.

But for a national lab I think it is money well spent to figure out the possibilities and limitations of native-language LLMs for languages with on the order of 5M-10M speakers.


> in both major dialects

Nynorsk and bokmål are not dialects but variants of written Norwegian.


> high quality transcriptions and translations of the stories currently described only in Norwegian into English

You make it sound like an easier task than training an LLM. I'd argue it's not obvious, and would assume the contrary.


Yes, why wouldn't it be easier to transcribe and translate, skills humanity had for centuries, compared to LLMs that we've only learnt to build these last few years, and even require a frikken computer to do? Of course one of these is harder than the other...

Look at it from this lens: translating and transcribing these stories hasn't happened for the centuries they existed, while as you point out the skills were always there. In contrast LLMs have been here for a few years at most and everyone and their dogs are trying to get in on the "race".

With absolutely no insight into why, which one has better odds to happen first is obvious to me.


Sure, it isn't as "hot" to translate stuff as it used to be some hundreds of years ago, and building LLMs surely is "hot" today, I don't doubt more people are attempting to build LLMs today than translating huge datasets, especially if we narrow the two to exclusively "In Norwegian".

Having insights into both translations, transcriptions and attempting to build LLMs myself, I'm fairly sure which effort would be successful first, regardless of how many attempt it first.


Copyrights and statutes don't allow them to do that. The mandate of the National Library maybe permits them to make an LLM though (though I won't at all be surprised if someone sues them anyway).

absolutely. somebody online was wanting an LLM with Georgian language support, and that's exactly what i suggested: start digitizing Georgian text.

wouldn't the most obvious thing to do be to build a good training dataset and make the dataset widely available?

Only if you believe other people will value that enough to expend the effort necessary to use it. If you believe other people will see it as low value and ignore it then you'd be better off doing the training yourself in order to guarantee it happens.

There's also a secondary benefit that your team doing the work will learn some useful skills while they do it.


Because state of the art models are owned and controlled by foreign agents.

Because you have so much money you don’t know what to do with it any more.

Permissions, probably. Copyrights and statutes. Knowing the librarians, unfortunately the prestige of their job is more vested in denying you access than giving you access.

I mean it's their job to give people access to information, and they certainly do, but the mark of a professional, in their eyes, is guarding information. It's much more embarrassing for them professionally to give too much access than too little.

LLM training gives them a "respectable" way of bypassing that and give the world their information (which, in fairness, they probably all really want to do if they could).


If they wanted to, they all have scanners and access to information on how to create torrents. Setting the information free isn't complicated, so it'd seem most of them do not want to.

Where do you seed a 60 petabyte torrent? I'm sure some choice cuts of what individuals feel is important have made it to Anna's, but I don't think refusal to go on a full data liberation spree is evidence they don't care.

> Why go to the expense of training your own model, especially when it will be inferior to state of the art models.

Uuh.. No? Especially if the training data, as in this case, is of better quality.


> Why go to the expense...

Answer: idiocy of decision makers and the desire to get resources by those who created the proposal.

I assumed Scandinavia has better decision processes but apparently I was wrong.


Any kind of radio control should be discounted when attacking a US carrier fleet, they will just be jammed.

Autonomous optically guided missiles/drones would fare better, but those are still vulnerable to being blinded by laser systems like HELIOS[0], and of course being shot down by anti air missiles or CIWS.

[0]: https://en.wikipedia.org/wiki/High_Energy_Laser_with_Integra...


If the suggested political impact of this music is to be believed, the music might be one of the biggest environmental disasters of all time.

Germany has been pretty widely criticized for decommissioning its nuclear power program, only to replace it with Russian oil.


that would be an odd criticism because we never generated any meaningful amount of electricity from oil (and started importing Russian fossil resources 30 years before we turned nuclear power plants off). The chief source for energy in Germany was coal. Gas is primarily an industry and heating input rather than a source of power generation, gas plants have only become more popular in recent years.

What replaced all other fossil fuel sources are renewables, which at 50% are now by far the single largest source of energy.


>> only to replace it with Russian oil

with Russian gas.


s/Russian/American/

Either way, Germany has perfected the efficient foot bullet, at least.

I could imagine Kraftwerk devising a stonkin’ “Fußkugel” track, actually ..


In 2021, Russia was the source of 52% of Germany's gas. Following the expansion of the Ukraine conflict in 2022 imports from Russia quickly dropped. The biggest suppliers of gas to Germany last year were Norway, Netherlands and Belgium. LNG from the United States and other countries has increased to 10% which is not nearly enough to replace the Russian imports. The phaseout was accomplished by drastically reducing overall gas usage.

[0] https://www.bundesnetzagentur.de/SharedDocs/Pressemitteilung... [1] https://www.bundesnetzagentur.de/SharedDocs/Pressemitteilung...


Check your facts, please: Germans figuratively shoot themselves in the knee, not the foot ;)


See also: the back!

:)


Lots of brown coal, which famously emits more radioactivity than any nuclear power plant ever did. The anti nuclear movement is a fucking joke, and the people I know personally here in Germany who oppose(d) nuclear still think the danger is an atomic bomb level explosion, or generational crippling mutations, or losing their house.


There are a million ways that malware can persist without root.


And I'm increasingly concerned that one could vibe-code a massive payload that does all of these at once - including subtle things like trying to get itself installed into personal projects and forks, so it can persist across a system wipe. We're only seeing the beginning of these attacks.


Their main DNS is 1.1.1.1 but their secondary is 1.0.0.1 not 1.1.0.0, so close but not quite.


What coding agent (ideally CLI) have people found works well with this?

Occasionally I go and try different agents with openrouter models, but nothing seems to really get close to the proprietary ones like claude-code.


Pi (pi.dev) is fine. I'm using it with DS v4 right now. It's not close to Claude code but I think that's the point.

By the way OpenRouter version is very slow for some reason. DeepSeek platform is faster (and cheaper with the discount) if you don't mind passing the credit card number / email to this company.


As sibling said, Pi is great, and you can absolutely run it directly (there's even a plugin to use itself as a sub agent), but I mainly run it as a sub agent from other harnesses, for example running a more capable model in copilot, and then delegating simpler chunks to pi (using a cheaper model) as the sub agent. I've tried gas town and some others but never got into that way of working. I'm going to try opencode though as a less vendor specific harness than copilot/claude/gemini.


I've been using OpenCode for a few days now, I like it. It doesn't feel any "less heavy" than Claude Code (they're both massive piles of vibe-coded typescript) but for me it's essentially a 1:1 replacement for Claude Code.

Sidenote, I've been trying deepseek-v4-flash and I'm blown away. It's no Opus, but it's as cheap as tap water and punches far above its weight as a Flash model. I keep throwing tasks at it out of curiosity and it keeps solving them.

