I'm not familiar with the jargon either, but based on some reading it comes down...

anal_reactor · 2026-04-05T07:46:29 1775375189

Why does it only appear on arm64 and not x86?

adrian_b · 2026-04-05T08:14:39 1775376879

It was not architecture-related. Running without huge pages also reproduced the regression on x86.

I do not know why using huge pages mitigates the regression, but it may simply be that an application using huge pages takes spinlocks much less frequently, so the additional delays do not accumulate enough to cause a significant performance reduction.

tux3 · 2026-04-05T08:58:03 1775379483

The problem is the spinlock holder being interrupted by a minor fault (you're touching a page of memory for the first time, and the kernel only sets the page up when it's actually used).

If your pages are 1GB instead of 4kB, this happens much less often.
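
To put rough numbers on "much less often", here is a small sketch. It assumes at most one minor fault per page on first touch, which is a simplification: transparent huge pages or fault-around in the kernel can lower the real count. The helper names (first_touch_faults, measure_minor_faults) are made up for this illustration, and the measurement part only works on Unix-like systems.

```python
import mmap
import resource

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def first_touch_faults(region_bytes: int, page_bytes: int) -> int:
    """Upper bound on minor faults needed to touch every page of a region once.

    Assumes one fault per page (a simplification, see the note above).
    """
    return -(-region_bytes // page_bytes)  # ceiling division


def measure_minor_faults(region_bytes: int) -> int:
    """Write one byte per base page of a fresh anonymous mapping and return
    how many minor faults the process took while doing so."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, region_bytes)  # anonymous mapping, not yet populated
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, region_bytes, page):
        buf[offset] = 1  # first write to a page may trigger a minor fault
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    buf.close()
    return after - before


if __name__ == "__main__":
    for name, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
        count = first_touch_faults(GIB, size)
        print(f"{name:>6} pages: {count:>7} faults to touch 1 GiB")
    # The measured value depends on the kernel and its huge page settings.
    print("measured on a 64 MiB mapping:", measure_minor_faults(64 * MIB))
```

Under that assumption, touching 1 GiB costs 262144 faults with 4 KiB pages, 512 with 2 MiB pages, and 1 with a 1 GiB page, so a lock holder has far fewer chances of being stalled by a fault.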