There wasn't a study or analysis. It was just lazy speculation that felt good because it could be bound up in a "evil white countries exploiting the developing world" narrative. Where exploiting was "paying to do a job".
Again, there is effectively zero real data showing this. Further, RLHF isn't likely to reinforce such word selection regardless.
A more logical, likely scenario is that training data is biased heavily towards higher grade level material, so word selection veers towards writings that you find in those realms.
> It was just lazy speculation that felt good because it could be bound up in a "evil white countries exploiting the developing world" narrative. Where exploiting was "paying to do a job".
Exploitation like that is in fact happening (see pretty much everything having to do with social media content moderation and RLHF to avoid disturbing content.
Also "paying to do a job" is not the moral panacea you seem to think it is.
tinfoil had theory: they implanted watermarks already, so that AI generated text can be flagged for future training runs or as a service, such that some phrases are coaxed to become statistical beacons.
That's not really a tinfoil hat theory. That's been possible for some years and OpenAI reportedly does watermark their outputs, and can detect it. They just haven't released it as a service because it'd annoy all the users who are using it for cheating :)
Yeah I would like to see some evidence of this too. It's just asserted as truth in the article. Delve doesn't seem like a particularly unusual word to me, especially in the context of scientific abstracts, and LLMs could totally learn random weird things. How common is "it's important to remember" in Nigeria?
then again, most history consists of whitewashing back when northern countries were exploiting everywhere else in various ways: imperialism, colonialism, neocolonialism, capitalism, financialization,...
typical people prefer to pretend this is simply "order" and "progress"; seemingly blind to their own ideological baggage like fish in water
It was submitted as https://news.ycombinator.com/item?id=40623629
Again, there is effectively zero real data showing this. Further, RLHF isn't likely to reinforce such word selection regardless.
A more logical, likely scenario is that training data is biased heavily towards higher grade level material, so word selection veers towards writings that you find in those realms.