If GPT-5.5 Pro really was Spud, and two years of pretraining culminated in one release, WOW, you cannot feel it at all from this announcement. If OpenAI wants to know why they feel like they've fallen behind Anthropic on vibes, they need look no further than their marketing department. This makes everything feel like a completely linear upgrade in every way.
Clearly they felt a big backlash when version 5 was released. Now they are afraid of another response like this. And effectively, for the user it will likely only be a small update.
1. You can't understand the nuances, but there is a general pattern: new inventions may make us slightly less proficient at specifics, yet more powerful overall
2. Imagine a hunter-gatherer is time-travelled to 2026. You go to a cafe for lunch with him, and he learns that food is cheap, delicious, and abundant. He sees your house, and thinks it's amazing compared to his cave. He thinks that 2026 must be absolute paradise. You explain to him: well, kinda, but also not really. Is the hunter-gatherer right?
Alternatively he sees that you live in your house alone and feel lonely all the time. Maybe you have a small family and a few friends but it's nothing compared to the tribal life he knows.
He sees you spend your day working but rarely get to go outside or do anything active. Even when you're not working you sit behind a desk staring at a screen.
He wonders why you bother with all the technology when it made your life worse. Is he right?
I agree partially, but that also misses the wonder he would have for: relaxing bathtubs, funny livestreams, wireless earbuds, huge libraries, and even globes.
And yeah, you could make a list of struggles we have today he never did. But that’s kind of my point - it’s complicated.
Yeah I think we're actually in agreement with the point about it being complicated. In reality I think different people would react differently, but they would all have mixed feelings. So it's impossible to ask "would they be right?" in a sense. Their feelings would be as valid as they would be varied.
Alternatively, he sees you alone and thinks how excellent it is to not have to deal with tribesmen: the elders and their rules, the children and their needs, the other hunters and their mind-numbing chatter…
The hunter-gatherer will wonder why you spend so much time working. He only spends 2-3 hours a day gathering and preparing food, maybe an hour maintaining tools and shelter, with the rest dedicated to leisure and social activities.
> 1. You can't understand the nuances, but there is a general pattern: new inventions may make us slightly less proficient at specifics, yet more powerful overall
No. It's not a phenomenon with a pattern. Maybe there's a coincidental pattern to some subset of inventions, but there's no logical reason it would apply to some arbitrary next invention (e.g. the pattern of biotechnology inventions having allowed us to live longer and healthier lives... until some guy invents an experimental pathogen that wipes out the species).
> 2. Imagine a hunter-gatherer is time-travelled to 2026....
You're kinda missing my point. Many people smugly assume the present is better than the past, and can point to cherry-picked this-and-that to feel confident about their claim. But almost every modern person has no sense of what was lost, and what prior generations mourned losing. There's a temptation to smugly dismiss the thoughts of those who lived through those transitions as stupid and ignorant, but they have insight that's no longer available to us first-hand.
Some of these inventions we're so proud of may not have had a net-positive effect on our lives, but we don't have the experience to realize that anymore (like someone in a community that's been living knee-deep in shit all the time doesn't have the experience to realize it's a terrible life compared to his distant ancestors').
I’ve been thinking about AI robotics lately… if internally at labs they have a GPT-2, GPT-3 “equivalent” for robotics, you can’t really release that. If a robot unloading your dishwasher breaks one of your dishes once, this is a massive failure.
So there might be awesome progress behind the scenes, just not ready for the general public.
I ended up watching Bicentennial Man (1999) with Robin Williams over the weekend. If you haven't seen it, I thought it was a good and timely thing to watch, and it's kid-friendly. Without giving away the plot: the scene where it was unloading the dishwasher... take my money!
> If a robot unloading your dishwasher breaks one of your dishes once, this is a massive failure.
That's a bit exaggerated, no? Early Roombas would get tangled in socks, drag pet poop all over the floor, break glass stuff and so on, and yet the market accepted that, evolved, and now we have plenty of cleaning robots from various companies, including cheap spying ones from China.
I actually think that there's a lot of value in being the first to deploy bots into homes, even if they aren't perfect. The amount of data you'd collect is invaluable, and by the looks of it, can't be synth generated in a lab.
I think the "safer" option is still the "bring them to factories first, offices next and homes last", but anyway I'm sure someone will jump straight to home deployments.
VLA models essentially take a webcam screenshot + some text (think "put the red block in the right box") and output motor control instructions to achieve that.
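A minimal sketch of that interface, with a mocked-up policy standing in for the real model (the `Action` format and `vla_policy` function are illustrative inventions for this comment, not any lab's actual API):

```python
from dataclasses import dataclass

@dataclass
class Action:
    joint: str        # which actuator to move
    delta_rad: float  # commanded rotation in radians

def vla_policy(image: bytes, instruction: str) -> list[Action]:
    """Stand-in for a VLA model: camera frame + text in, motor commands out."""
    # A real VLA runs a vision-language backbone over the image and text;
    # here we fake the mapping just to show the shape of the interface.
    if "red block" in instruction:
        return [Action("shoulder", 0.12), Action("gripper", -0.5)]
    return []

actions = vla_policy(b"<jpeg bytes>", "put the red block in the right box")
print([a.joint for a in actions])  # → ['shoulder', 'gripper']
```

The point is only the signature: perception and language go in, low-level motor commands come out, with no hand-written controller in between.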
Note: "Gemini Robotics-ER" is not a VLA, though Gemini does have a VLA model too: "Gemini Robotics".
If someone paid 100 grand for you to load and unload the dishwasher, and the research to be able to do it cost hundreds of billions, decades of research, and hundreds of thousands of researchers, and that was the ONLY thing you could do, then yes, you WOULD be a massive failure.
From an economic standpoint, industry is by far the most relevant market anyway. It's easier because the environment is a lot more controlled, professionals configure and maintain the robots, and they buy in bulk and have more money.
My concern with a household robot is not the dishwasher but the TV screen, the glass door, the glass table, animals (fish/aquarium), etc. that the robot might walk through, touch, or fall onto.
> If a robot unloading your dishwasher breaks one of your dishes once, this is a massive failure.
Depending on the rate of breaking dishes, this would be a massive improvement on me, a human being, since I break a really important dish I needed to use roughly twice a month on average.
This would have been an amazing release 6 months ago. But the industry moves so fast, this is a trite release. Maybe it’s best for Meta to sell their superintelligence division. I don’t think Zuck’s vision is particularly compelling.
If the model is truly on par with Opus 4.6/Gemini 3.1/GPT 5.4 (beyond benchmarks), this still puts MSL in the frontier-lab category, which is no small feat given that they pretty much rebooted last year.
Many labs aren't able to keep up with the frontier (e.g. xAI, Mistral).
Fourth place means you're not reliant on any of the external providers for internal AI use, which is important for organizational health and negotiating with those other providers.
I’m not sure it’s useful for negotiating, the capex to build it was surely orders of magnitude more than it would cost to just use one of the other frontier models.
It’s like someone negotiating by saying, “I’ll waste even MORE money to build something worse if you don’t give me a deal.”
I’m not discounting there may be other advantages to doing it. I just don’t think negotiating is one.
Why would you use this instead of the other more proven models? Unless it's significantly cheaper. The general population mostly wants it free, and the more professional users are willing to pay for good/better responses.
You wouldn't use this as an API. You would "use" this inside the Meta properties. Have a shop on FB Marketplace? Now you have copy, images, support, chat, translations, ERP, ESP, FPS and all the other acronyms :) and so on for your mom-and-pop shop at $200/mo. Probably worse than, say, Claude/Gemini, but it's right there, one button away. "Click here to upgrade to AI++" or something.
I won't use it, but I'm excited to see it for the same reason why I'm excited to see a near-frontier open-source release: more competition pushes prices down and reduces monopoly/cartel risk. I won't use Muse or Grok or GLM at this point but they're good for the ecosystem.
Their new Contemplating mode gives this model a Deep Research ability (akin to existing models from GPT and Gemini) that might make it quite comparable to the just-announced Mythos.
I never understood why meta decided to join the race. They don’t sell compute like Google or Microsoft. Why not let others do the hard work and integrate their LLMs in your systems if needed?
I assume it's because they have Instagram, Facebook, WhatsApp, and Threads data and feel they should be the ones using it for training, but it's really not obvious how having a frontier AI lab benefits their business.
Adtech Money. They've got GPUs, they've got the infrastructure, and they've got the advertisement platform, and the point is getting AI that can exploit the adtech and create a flywheel effect, maximizing return from the data they collect from Insta, WhatsApp, Facebook, etc.
It's not just about LLMs, it's about being able to model consumers and markets and psychology and so on. Meta is also big in the manipulation side of things, any sort of cynical technological exploitation of humans you can imagine but that is technically legal, they're doing it for profit.
> I never understood why meta decided to join the race.
I can think of at least two reasons: price and customizability. If they train their own models on their own data, they potentially have a better model at a better price, and they're not at the mercy of Anthropic's decisions when Anthropic decides to raise prices. Additionally, if you use someone else's model, you use it the way they create it and permit you to use it. In a couple of years, who knows how these models will be used. Arguably, a company the size of Meta should be in control of its AI models.
You basically have to be involved if you're meta. Even if there's only 5% chance this AI stuff is as disruptive as the labs claim it is, you can't afford to miss out. Even if you're lagging frontier, you must develop the competency internally. Otherwise you ignored a 5% chance of total annihilation, probably even exposing you to shareholder lawsuits.
Because there's a realistic chance this is the only important software technology moving forward, and it commoditizes Meta's entire business, which is software.
Meta's business is human attention, human connections, and all derived data. They can use AIs for their systems, but the question is why they feel the need to spend billions training and running their own frontier model.
From what I heard, Meta is spending hundreds of millions each month on Claude credits for developers. So that's a huge saving if they have their own models that match Opus.
The heavy spending on Claude and the recent token benchmarks came WELL after Meta's huge investments in compute infrastructure for AI, as well as the long history of language-model development inside the company's science divisions.
LLMs/Chat-based systems will reach a point where Facebook, WhatsApp, Threads, Instagram, etc. are all unnecessary. The idea of opening a browser or a specific app to do a thing will seem antiquated. You can do it all with your chat-based agent. Meta wants to be part of that.
2) Decent ML is critical to categorising content at scale; the more accurate and fast the categorisation, the finer the recommendations can be (i.e. instead of "woman, outside" as tags for a video: woman, age, hair colour, location, subjects in view, main subject of video, video style). Doing that as fast as possible with as little energy as possible is mission-critical.
3) The Llama leak basically evaporated the moat around OpenAI, who _could_ have become a competitor.
4) For the AR stuff, all of these models (and vision models) are required to make the platform work. They also need complete ownership so the models can be distilled to run on tiny hardware.
5) Dick swinging.
6) They genuinely want to become an industrial behemoth, so robots, hardware, etc. are now all in scope.
I think they just want to be a winner in the “next thing.” They hit social networking, but missed mobile operating systems and didn’t compellingly win at social media. Eventually an ambitious person with a bazillion dollars wants a clear win, right?
First and most importantly is the fact they have a lot of very valuable data they wouldn't want to siphon to a competitor. This data is a key strategic asset in the space where they do business.
Secondly though, I think it has to do with the fact Meta is big enough to worry about vertical integration and full control of their business.
The whole reason they've been trying to make AR/VR happen for over a decade now is the assumption of a worst-case and a best-case scenario. The worst case is that Apple and Google want them gone. This isn't as far-fetched as it seems: Google has historically been Meta's biggest competitor and even tried to release its own social network back when Meta was threatening them. If either pulls Meta's apps from their respective stores, it'd be an immense blow to Meta; their whole trillion-dollar business depends on competitors' platforms.
Meta tried making inroads into the phone business but failed; it is a very crowded market, after all. So they changed their strategy. Instead of playing catch-up, they'd invent "the next iPhone" and be first to a brand-new market. This is the best-case scenario: they invent a new platform where they can be dominant from day 1 and stop depending on competitors' hardware, not only removing that risk factor, but also unlocking a new market they can control.
AI ties into all this because it appears to be key for this next platform to happen. You will communicate with these smart glasses via voice, hand gestures, or subtle movements that a model will have to interpret. The features that could make them stand out as more than just a screen on your face are all AI-related: object detection, world understanding, context awareness, etc. If all this were done via a 3rd party, Meta would effectively be back at square one: a competitor could easily yank away its model access, or sell it to a rival. Meta would again be at the mercy of others.
Compared to other big-tech players, I think it's easy to see how Meta is in a riskier position. There's little Google or Microsoft can do to kill the iPhone. There's little Apple or Google can do to kill Amazon's online store. There's little Amazon or Apple can do to kill Microsoft's business deals. But Google and Meta are primarily in the business of capturing people's data and attention and selling ads, and both Google and Apple could do quite some damage to Meta. Beyond expanding it, it's important for them to invest in ways to protect their money-printing machine.
You don't understand why Zuck, who paid $1B for Instagram when they had no revenue and 7 employees because he is paranoid about platform shifts, decided to join the race for (what now seems quite possibly) the biggest platform shift in human history?
My job may have become part of the training data with how much coverage there is around it. Perhaps another career would be a better test of LLM capabilities.
The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at differences between 4.6 and 4.5, it’s hard to see the subtle details.
A year ago today, Sonnet 3.5 (new), was the newest model. A week later, Sonnet 3.7 would be released.
Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.
Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?
If you've been using them, each new step is very noticeable, and so is the shift in mindshare. Around Sonnet 3.7, Claude Code-style coding became usable and very quickly gained a lot of market share. Opus 4 could tackle significantly more complexity. Opus 4.6 has been another noticeable step up for me: suddenly I can let CC run significantly more independently, allowing multiple parallel agents where previously too much babysitting was required for that.
I think this is where there's a huge distinction between ability/performance/benchmark figures and utility. You can have smooth improvements to performance, but marked step changes in utility as they cross thresholds where you're able to use them for new tasks.
In terms of real work, it was the 4 series models. That raised the floor of Sonnet high enough to be "reliable" for common tasks and Opus 4 was capable of handling some hard problems. It still had a big reward hacking/deception problem that Codex models don't display so much, but with Opus 4.5+ it's fairly reliable.
Until an hour ago, I hadn't used Claude much since probably before GPT-5.
I had only been using Gemini the last 3 months.
Sonnet 4.6 extended on the free plan is just incredible. I am just completely floored by it. The conversation I just had with it was nuts. It started from Dario mentioning something like a 20% chance Claude is conscious, or something crazy like that. I have always tried that conversation with previous models, but it got boring so fast.
There is something with the way it can organize context without getting lost that completely blows Gemini away.
Maybe even more so, it was the first time it felt like a model pushed back a little, and the answers were not just me ultimately steering it toward certain answers. For the free plan, that is nuts.
In terms of being conscious, it is the first time I would say I am not 100% certain it is just a very useful, very smart stochastic parrot. I wouldn't want to say more than that, but 15-20% doesn't sound as insane to me as it did 2 hours ago.
I always grew up hearing “competition is good for the consumer.” But I never really internalized how good fierce battles for market share are. The amount of competition in a space is directly proportional to how good the results are for consumers.
Remember when GPT-2 was “too dangerous to release” in 2019? That could have still been the state in 2026 if they didn’t YOLO it and ship ChatGPT to kick off this whole race.
I was just thinking earlier today how in an alternate universe, probably not too far removed from our own, Google has a monopoly on transformers and we are all stuck with a single GPT-3.5 level model, and Google has a GPT-4o model behind the scenes that it is terrified to release (but using heavily internally).
Before ChatGPT was even released, Google had an internal-only chat tuned LLM. It went "viral" because some of the testers thought it was sentient and it caused a whole media circus. This is partially why Google was so ill equipped to even start competing - they had fresh wounds of a crazy media circus.
My pet theory though is that this news is what inspired OpenAI to chat-tune GPT-3, which was a pretty cool text generator model, but not a chat model. So it may have been a necessary step to get chat-llms out of Mountain View and into the real world.
> some of the testers thought it was sentient and it caused a whole media circus.
Not "some of the testers." One engineer.
He realized he could get a lot of attention by claiming (with no evidence and no understanding of what sentience means) that the LLM was sentient and made a huge stink about it.
He was unfairly labelled as a lunatic early on. I'd implore anyone reading this thread to see what he had to say for yourself and form your own opinion: https://youtube.com/watch?v=kgCUn4fQTsc
Is that really the case in the last few years/decades?
My understanding is that any company that can (read: has enough money for good lawyers), will prefer to use trade secrets for a combination of reasons, a big one being that competitors cannot use that technology after 10 years/when the patent expires.
Admittedly this was from my entrepreneurship classes in a European uni, so I'm not sure how it is in different places in the world.
Patents in the US are 20 years. Given how short sighted modern companies are, I can’t imagine anyone at any large company is even planning for something 20 years in the future, much less placing much value in an outcome that far out.
They didn't YOLO ChatGPT. There were more than a few iterations of GPT-3 over a few years which were actually overmoderated, then they released a research preview named ChatGPT (that was barely functional compared to modern standards) that got traction outside the tech community because it was free, and so the pivot ensued.
In 2019 the technology was new and there was no 'counter' at that time. The average person was not thinking about the presence and prevalence of AI the way we do now.
It was kinda like having muskets against indigenous tribes in the 1400s-1500s vs a machine gun against a modern city today. The machine gun is objectively better but has not kept pace with the increase in defensive capability of a modern city with a modern police force.
That's rewriting history. What they said at the time:
> Nearly a year ago we wrote in the OpenAI Charter : “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas. -- https://openai.com/index/better-language-models/
Then over the next few months they released increasingly large models, with the full model public in November 2019 https://openai.com/index/gpt-2-1-5b-release/ , well before ChatGPT.
> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT‑2 along with sampling code.
"Too dangerous to release" is accurate. There's no rewriting of history.
> Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.
I wouldn't call it rewriting history to say they initially considered GPT-2 too dangerous to be released. If they'd applied this approach to subsequent models rather than making them available via ChatGPT and an API, it's conceivable that LLMs would be 3-5 years behind where they currently are in the development cycle.
Competition is great, but it's so much better when it is all about shaving costs. I am afraid that what we are seeing here is an arms race with no moat, something that will behave a lot like an all-pay auction: the competitors all sink their investments, a winner takes all, and since it never makes sense to stop investing at the margin while you think you have a chance to win, ultimately more resources are spent than the value ever created.
This might not be what we are facing here, but seeing how little moat anyone in AI has, I just can't discount the risk. And then instead of the consumers of today getting a great deal, we zoom out and see that 5x was spent developing the tech than it needed to, and that's not all that great economically as a whole. It's not as if, say, the weights from a 3-year-old model are useful capital to be reused later, like when in the dot-com boom we ended up with way more fiber than was needed, but which could be bought and turned on profitably later.
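The worry can be put in toy numbers: a winner-take-all race where every lab sinks its investment but only the leader captures the prize (all figures invented for illustration):

```python
# Invented numbers: each lab's sunk investment vs. a single winner-take-all prize.
prize = 100.0
investments = {"lab_a": 60.0, "lab_b": 55.0, "lab_c": 40.0}

winner = max(investments, key=investments.get)
total_spent = sum(investments.values())

# Everyone pays; only one wins. Collectively the race spends more
# than the prize is worth, even though each bid looked rational.
print(winner, total_spent)  # → lab_a 155.0
```

Here 155 is spent chasing a prize worth 100: each lab's marginal spend made sense while it still had a shot, but in aggregate the race destroyed value.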
Three-year-old models aren't useful because there are (1) cheaper models that are roughly equivalent, and (2) better models.
If Sonnet 4.6 is actually "good enough" in some respects, maybe the models will just get cheaper along one branch, while they get better on a different branch.
It's funny, it sure seems like software projects in general follow the Lindy effect: considering their age and mindshare, I can safely predict gcc, emacs, SQLite, and Python will still be running somewhere ten, 20, 30 years from now. Indeed, people will choose to use certain software specifically because it's been around forever; it's tried and true.
But LLMs, and AI-related tooling, seem to really buck that trend: they're obsoleted almost as soon as they're released.
AI-related tooling is pretty fungible, but AI models get immediately obsoleted due to the unit economics around training models... as well as the fact that nobody releases their datasets or training paradigms in useful detail (the best we get is the model weights, because of copyright etc.).
People are rapidly learning how to improve model capabilities and lower resource requirements. The models we throw away as we go are the steps we climbed along the way.
The really interesting part is how often you see people on HN deny this. People have been saying the token cost will 10x, or that AI companies are intentionally making their models worse to trick you into consuming more tokens. As if making a better model weren't the most cut-throat competition (probably the most competitive market in human history) right now.
I mean, enshittification has not quite begun yet. Everyone is still raising capital so current investors can pass the bag to the next set. As soon as the money runs out, monetization will overtake valuation as the top priority. Then suddenly, when you ask any of these models "how do I make chocolate chip cookies?" you will get something like:
> You will need one cup King Arthur All Purpose white flour, one large brown Eggland’s Best egg (a good source of Omega-3 and healthy cholesterol), one cup of water (be sure to use your Pyrex brand measuring cup), half a cup of Toll House Milk Chocolate Chips…
> Combine the sugar and egg in your 3 quart KitchenAid Mixer and mix until…
All of this will contain links and AdSense looking ads. For $200/month they will limit it to in-house ads about their $500/month model.
While this is funny, the actual race already started in how companies can nudge LLM results towards their products. We can't be saved from enshittification, I fear.
I'm concerned for a future where adults stop realizing they themselves sound like LLMs because the majority of their interaction/reading is output from LLMs. Decades of corporations being the ones molding the very language we use is going to have an interesting effect.
They did, but Uber is no longer cheap [1]. Is the parent’s point that it can’t last forever? For Uber it lasted long enough to drive most of the competition away.
Uber's in a business where you have some amount of network effect - you need both drivers available using your app, as well as customers hailing rides. Without a sufficient quantity of either, you can't really turn a profit.
LLM providers don't, really. As far as I can tell, their moat is the ability to train a model, and possessing the hardware to run it. Also, open-weight models provide a floor for model training. I think their big bet is that gathering user-data from interactions with the LLM will be so valuable that it results in substantially-better models, but I'm not sure that's the case.
Their other genius was to operate illegally, make the service so popular that politicians had no choice but to change the laws, and in the process make taxi licences, that used to cost as much as a house, worthless.
Unfortunately, people naively assume all markets behave like this, even when the market, in reality, is not set up for full competition (due to monopolies, monopsonies, informational asymmetry, etc).
And AI is currently killing a bunch of markets intentionally: the RAM deal for OpenAI wouldn't have gone through the way it did if it wasn't done in secret with anti-competitive restrictions.
There's a world of difference between what's happening and RAM prices if OAI and others were just bidding for produced modules as they released.
This is a bit of a tangent, but it highlights exactly what people miss when talking about China taking over our industries. Right now, China has about 140 different car brands, roughly 100 of which are domestic. Compare that to Europe, where we have about 50 brands competing, or the US, which is essentially a walled garden with fewer than 40.
That level of internal fierce competition is a massive reason why they are beating us so badly on cost-effectiveness and innovation.
It's the low cost of labor, in addition to the lack of environmental regulation, that made China a success story. I'm sure the competition helps too, but it's not the main driver.
Oh, then explain to me how China is leading in both robotics and AI. If it were because of "low cost of labor in addition to lack of environmental regulation", you'd see countries like India beating the US and EU.
Which isn't particularly unique. It's comparable to something like some subset of Americans getting black lung, or the health problems from the train explosion in East Palestine.
It took a lot of work for environmentalists to get some regulation into the US, Canada, and the EU. China will get there eventually.
It isn’t. I just bring it up to state there is a very good reason the rest of the world doesn’t just drop their regulations. In the future I imagine China may give up many of these industries and move to cleaner ones, letting someone else take the toxic manufacturing.
Only if you take consumer electronics out of the equation, because this AI arms race has wreaked havoc in the market for consumer GPUs, RAM, SSDs and HDDs.
If you take the arms race externalities into account, I'm very much unconvinced that we're better off than last year.
At a certain point though we can't only blame the free market or the companies. Consumers should know better than to choose products that are anti-consumer. The fact that they don't know better and don't care is the bigger problem. Until we figure out what to do about that any solution is going to be dangerously paternalistic.
Or the same number of tokens in less time. Kinda feels like the CPU / modem wars of the 90s all over again - I remember those differences you felt going from a 386 -> 486 or from a 2400 -> 9600 baud modem.
We're in the 2400 baud era for coding agents and I for one look forward to the 56k era around the corner ;)
There are hundreds of Game Boy emulators available on GitHub that they've been trained on. It's quite literally the simplest piece of emulation you could do. The fact that they couldn't do it before is an indictment of how shit they were, but a Game Boy emulator should be a weekend project for anyone even ever so slightly qualified. Your benchmark was awful to begin with.
Your expectations are wild. Most software engineers could not write a Game Boy emulator, and now you need zero programming skills whatsoever to write one.