The bottom line: This new Claude model is not yet capable enough to autonomously do AI research — but it's closer than any previous model, and Anthropic is nervous about it.
What's the "automated AI-R&D capability threshold"?
Anthropic has defined a danger line: if an AI can independently do the work of AI researchers, that's a big deal — because then AI could start improving itself without humans in the loop. This assessment is asking: has this model crossed that line?
Why are they less confident than usual?
With past models, the answer was a comfortable "no." This time, they're saying "no, but..." — it's a much closer call. They're hedging.
Anthropic's researchers designed tests to evaluate whether the model could do their real day-to-day work. Mythos scored well on the structured tests, but the researchers themselves acknowledge that structured tests don't capture the non-linear, intangible aspects of AI research. So: interesting results, but the model can't replace them yet, and fully automated AI research is still some way off.