That's more or less what the "repeated" scenario (roughly the middle of the post) measures. It's not in work order, but it is the same order every time, so the caches get a chance to work. And there you can see that the working set size matters.
Note that the base setup has zero cache reuse, because each run touches a completely different, cold part of memory. That makes the result more of an upper bound on the needed chunk size.
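To make the difference between the two setups concrete, here is a rough sketch of the access patterns only (no timing). It is not the post's benchmark code: the function names, the parameters, and the use of a seeded shuffle to stand in for "not in work order" are all my assumptions.

```python
import random


def base_offsets(num_runs, chunks_per_run, chunk_size):
    """Base setup (as assumed here): every run gets its own fresh region,
    so no chunk offset is ever touched twice across runs."""
    runs = []
    for r in range(num_runs):
        region_start = r * chunks_per_run * chunk_size
        runs.append([region_start + i * chunk_size for i in range(chunks_per_run)])
    return runs


def repeated_offsets(num_runs, chunks_per_run, chunk_size, seed=0):
    """'Repeated' scenario (as assumed here): one shuffled order over a fixed
    working set, replayed identically on every run."""
    order = list(range(chunks_per_run))
    random.Random(seed).shuffle(order)  # hypothetical stand-in for "not in work order"
    one_run = [i * chunk_size for i in order]
    return [list(one_run) for _ in range(num_runs)]


def working_set_bytes(chunks_per_run, chunk_size):
    """Bytes touched by one run; in the repeated case this is what has to
    fit in cache for reuse to pay off."""
    return chunks_per_run * chunk_size
```

In the base case the total memory touched grows with the number of runs, while in the repeated case it stays at `working_set_bytes(...)`, which is why the working set size shows up in those results.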