Hacker News: CJefferson's comments

I'm a professor at a university, and this is what is happening -- many students are never really learning. Then they crash into exams at the end of term, when they don't have their AI, and they bomb. I'm seeing failure rates like never before.

Now, part of me thinks 'is not letting students have AI like not letting them have a calculator?'. On the other hand, if I just let the AI do the exam, well, I don't really need the student at all, do I?


When kids learn arithmetic, they are indeed not allowed to use a calculator.

The same is true for your field now. When kids are learning things the AI already knows, it's clear they can't use the AI.

If you want them to become smarter than the AI, they will have to pass through a period where they are dumber than the AI, and it's clear at that point they can't use it.

AI raised the bar, that's all. But it's still a bar that can be passed with human intelligence, and your job is to get them past that.


> it's still a bar that can be passed with human intelligence

Can you expand on this?


As a developer becomes better, they become better than an LLM, able to deal with more complex things than an LLM can handle. Some people will not be able to clear that bar, but others will.

If there is ever AGI (I don't think it can be achieved with the current architecture; it will take another AI breakthrough), then we might not be able to surpass it, much like chess engines today.


Oh, that is cool, I’ve never seen that. I might add that to an extended version of the post sometime, I’ll be sure to credit you.

The chances of significant bugs in Lean which lead to false answers to real problems are extremely small (this bug just caused a crash, which is still bad). Many, many people try very hard to break Lean, and think about how proofs work, and fail. Is it foolproof? No. It might have flaws; it might even be that logic itself is inconsistent.

I often think of the 'news level' of a bug. A bug in most code wouldn't be news. A bug which caused Lean to claim a real proof someone cared about was true, when it wasn't, would be the biggest news in the proof community in a decade.


Yes, I've found tests are the one thing I need to write myself. I then also need to keep 'git diff'ing the tests, to make sure Claude doesn't decide to 'fix' the tests when its code doesn't work.

When I am rigorous about the tests, Claude has done an amazing job implementing some tricky algorithms from difficult academic papers, saving me time overall, but it does require more babysitting than I would like.


Give Claude a separate user, and make the tests not writable by it. Generally, you should limit Claude to write access only to the specific things it needs to edit; this will also save you tokens, because it will fail faster when it goes off the rails.
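A minimal sketch of the 'tests not writable' idea (the directory and file names here are made up for illustration, using a throwaway temp directory rather than a real repo): strip the write bits from test files so a tool editing the repo can't quietly rewrite them.

```python
import pathlib
import stat
import tempfile

# Demo: create a fake test file, then remove all write permission bits,
# so an agent running as the same (or another) non-root user cannot
# silently modify it.
with tempfile.TemporaryDirectory() as d:
    test_file = pathlib.Path(d) / "test_example.py"
    test_file.write_text("def test_ok():\n    assert True\n")

    mode = test_file.stat().st_mode
    test_file.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

    # Record the remaining write bits: should be zero after the chmod.
    writable_bits = test_file.stat().st_mode & (
        stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    )
```

On a real project you'd run the equivalent chmod over your tests directory once, and re-apply it after you edit the tests yourself.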

You don't even need a separate user if you're on Linux (or WSL): just use the sandbox feature, which lets you specify allowed directories for read and/or write.

The sandbox is powered by bubblewrap (the same tool Flatpak uses), so I trust it.


You might want to look into property-based testing, e.g. python-hypothesis if you use that language. It's great, and it even finds minimal counter-examples.
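As a sketch of what that looks like (the run-length encoder here is invented for illustration), Hypothesis generates random inputs for a stated property and shrinks any failure down to a minimal counter-example:

```python
from hypothesis import given, strategies as st

def encode(s):
    # Run-length encode a string into (char, count) pairs.
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def decode(pairs):
    # Inverse of encode: expand (char, count) pairs back to a string.
    return "".join(ch * n for ch, n in pairs)

@given(st.text())
def test_round_trip(s):
    # Property: decoding an encoding gives back the original string.
    assert decode(encode(s)) == s
```

If you break `encode` so it drops the last run, Hypothesis won't just tell you it failed on some long random string; it will shrink the input to something like `'0'`.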

To me, the thing WordPress offers is the GUI. I help a few people with WordPress sites, and I've ended up setting up a private WordPress install and then running a script which mirrors the website statically. This is moderately hacky, and I'm sure it could be done better, but as long as I hide the private WordPress install, it means I don't need to worry about keeping it up to date.

I haven't found a static site generator with a WYSIWYG interface as nice as WordPress's.


It's very hard in most languages to portably handle endianness -- almost by definition, if your code's behaviour depends on endianness, it's not portable.

I tend to take another viewpoint (while I understand yours) -- if it's not tested, it doesn't work. And nowadays it's really hard to test big-endian code. I don't have a big-endian machine, and I find running different-endian software in QEMU really annoying and difficult.
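One way to keep endianness out of the 'untestable' category is to never use native byte order in serialization in the first place. In Python, for example, struct format strings let you spell the order out explicitly:

```python
import struct

value = 0x11223344

# Explicit byte order: these produce the same bytes on every host.
be = struct.pack(">I", value)   # big-endian
le = struct.pack("<I", value)   # little-endian

# Native order ('=') depends on the host -- this is where hidden
# endian assumptions creep in.
native = struct.pack("=I", value)

assert be == b"\x11\x22\x33\x44"
assert le == b"\x44\x33\x22\x11"
assert struct.unpack(">I", be)[0] == value
```

Code written this way can be tested for correctness on a little-endian machine, because there's no host-dependent path left to exercise.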


In my experience, as someone who has gone through this as the maintainer of two decent-sized projects, that simply doesn't work.

The author of the 'port' probably doesn't know your whole codebase like you do, so they are going to need help getting their code polished and merged.

For endian issues, the bugs are often subtle and can occur in strange places (it's hard to grep for 'someone somewhere made an endian assumption'), so you often get dragged into debugging.

Now let's imagine we get everything working and CI set up, and then I make a PR which breaks the big-endian build. My options are:

1) Start fixing endian bugs myself -- I have other stuff to do!

2) Wait for my 'endian maintainer' to find and fix the bug -- might take weeks, they have other stuff to do!

3) Just disable the endian tests in CI, eventually someone will come complain, maybe a debian packager.

At the end of the day I have finite hours on this earth, and there are just so few big-endian users -- I often think there are more packagers who want to make software work on their machine, in a kind of Pokémon-style 'gotta catch 'em all', than actual users.


There are many, many users for every one of us packagers. We (at least the four I am aware of, including myself) are not doing 'gotta catch 'em all'; we're doing "we have been notified by users that this package (is not|no longer) working". It only looks like 'gotta catch 'em all' because there are still so many users.

There are new users asking how to get Raspberry Pis into aarch64be mode in the Gentoo ARM project channels. There are thousands and thousands of Power Macs. SPARC servers with ridiculous numbers of cores and plenty of compute power are super cheap on eBay because Oracle ended support for them -- and this is a great way to get a huge thread count cheaply, if your software actually runs on it.

Make the BE CI optional if you need to. That way, the maintainer has time to find and fix the breakage, and you can still merge other changes while it runs. What binutils did was have the BE CI run separately and specifically ping the BE maintainers -- that way, they know the build's failing, and no one else is bothered by it.


It's very different, and it depends on what you are targeting. I love love2d.

I think love2d is better if what you love is coding: everything is code, and love2d just executes Lua.

If what someone wants to do is make (for example) a 2D platformer, or definitely for 3D, and the coding is just something you need to do to make your game, Godot is better: it includes so many batteries, has a built-in GUI level editor, etc.

One big advantage of love2d (although ironically not loved by much of its audience) is that it is the AI-friendly engine, as AIs love text and hate GUIs.


However, unless you are a super-programmer, it's very easy to introduce subtle bugs. Software I write has hit this occasionally: someone somewhere does something like cast an int to bytes to do some bit-twiddling. Checking your whole codebase for this is incredibly hard.
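A minimal Python illustration of the kind of cast that bites (the values here are made up): code that converts an int to bytes using the host's native byte order passes every test on little-endian x86/ARM, and silently means something different on big-endian.

```python
import sys

value = 0x0A0B0C0D

# Non-portable: bakes the host's byte order into the serialized form.
raw = value.to_bytes(4, sys.byteorder)
# On little-endian hosts raw[0] is 0x0D; on big-endian it is 0x0A.
# Code that indexes raw[0] expecting 0x0D works everywhere it gets
# tested, and breaks only on the machines nobody tests.

# Portable: pick an order explicitly and use it on both ends.
portable = value.to_bytes(4, "big")
assert portable == b"\x0a\x0b\x0c\x0d"
assert int.from_bytes(portable, "big") == value
```

The nasty part is that nothing in the non-portable version looks wrong when you grep for it; only testing on an actual big-endian machine flushes it out.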

My modern choice is just to make clear to BE users that I don't support them, and that while I will accept patches, I'll make no attempt to bugfix for them, because every time I try to get a BE VM running a modern Linux it takes a whole afternoon.


You are correct; honestly, I couldn't disagree more with the article. At this point I can't imagine why it's important.

It's also increasingly hard to test, particularly when you have large, expensive test suites which run incredibly slowly on these emulated machines.

