Unless I'm overlooking something, the only thing the demo needs from DOSBox is a machine with a predictable execution speed. There are no DOS interrupt calls that I can see. Beyond that, the program could probably be trivially modified to fit in a floppy disk boot sector and run without an underlying OS.
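For what it's worth, the boot-sector idea is mostly a packaging question. A rough sketch in Python, assuming a classic BIOS that loads the first 512-byte sector at 0000:7C00 and checks for the 55 AA signature in the last two bytes. `make_boot_sector` is a made-up helper and the payload is a stand-in, not the demo's code:

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # expected by the BIOS at offsets 510-511


def make_boot_sector(payload: bytes) -> bytes:
    """Pad a code payload to one sector and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(payload) > room:
        raise ValueError(f"payload is {len(payload)} bytes, only {room} fit")
    return payload.ljust(room, b"\x00") + BOOT_SIGNATURE


# Stand-in payload (NOT the demo): set mode 13h, then spin forever.
payload = bytes([
    0xB8, 0x13, 0x00,  # mov ax, 0013h
    0xCD, 0x10,        # int 10h
    0xEB, 0xFE,        # jmp $  (there is no OS to return to)
])
sector = make_boot_sector(payload)
```

A 256-byte program leaves plenty of room. The actual porting work would be re-basing any absolute addresses from 100h (where a .COM file loads) to 7C00h, and not relying on whatever register and segment values DOS happens to set up.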
To be more exact (in an excessive way), it uses the BIOS's code to set the video mode (INT 10h), which is probably a few dozen bytes (at least?), although I have been remiss in never actually reading it. And it depends on DOS setting up the memory space so that there's an INT 20h call (to terminate the program) at a place that's easy to RET to. But, yeah, very little extra. I'm not being negative at all: this is pretty nice code, and on the impressive side of 256-byte demos from the 80s and 90s (and onward).
Yes, this is very minimal. If it were self-booting, the INT 20h call wouldn't be needed, but there's no getting around the INT 10h unless you specialize for very specific hardware.
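To make the RET trick concrete, here is a toy model in Python. It assumes the usual .COM conventions: DOS builds a 256-byte PSP that starts with an INT 20h instruction, loads the program at offset 100h, and pushes a zero word on the stack before jumping in. The program bytes are an illustrative stand-in, not the demo:

```python
PSP_SIZE = 0x100  # .COM code starts right after the 256-byte PSP

# Stand-in .COM program: set mode 13h and return.
com_program = bytes([
    0xB8, 0x13, 0x00,  # mov ax, 0013h
    0xCD, 0x10,        # int 10h  (BIOS video service)
    0xC3,              # ret      (near return)
])

# Toy memory image of the program's segment as DOS would set it up.
segment = bytearray(PSP_SIZE + len(com_program))
segment[0:2] = b"\xcd\x20"       # PSP offset 0: int 20h (terminate)
segment[PSP_SIZE:] = com_program

stack = [0x0000]                 # DOS pushes a zero word before entry

# What the final RET does: pop a word into IP and continue there.
ip = stack.pop()
next_instruction = bytes(segment[ip:ip + 2])
```

So the one-byte RET lands on the INT 20h at offset 0, which saves a byte over encoding the two-byte INT 20h in the program itself.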
The entire 5150 BIOS fit in 8 KB, so even if it were laden with BIOS calls (which it's not), that would be an upper bound.
Also, MIDI - I'm not very familiar with demo programming, but I guess using MIDI saves a lot of bytes compared to trying to do something similar with only the PC speaker?
Sure, it saves a lot of bytes compared to PCM-encoded waveform data, but it's not really cheating anything, unless we also consider the red, green and blue parts of the computer monitor to be cheating because we're not outputting colours as raw wavelengths and the monitor is instead decoding compressed signals into actual colours.
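Rough numbers for the byte savings, as a Python sketch. The MIDI messages follow the standard three-byte note-on/note-off layout. The PCM format (8-bit mono at 11025 Hz) and the note are arbitrary choices for comparison, not anything taken from the demo:

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Standard MIDI note-on: status 0x9n, then note and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel: int, note: int) -> bytes:
    """Standard MIDI note-off: status 0x8n, note, release velocity 0."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])


def pcm_size(seconds: float, sample_rate: int = 11025,
             bytes_per_sample: int = 1) -> int:
    """Bytes of raw mono PCM needed for the given duration."""
    return int(seconds * sample_rate * bytes_per_sample)


midi_bytes = note_on(0, 60, 100) + note_off(0, 60)  # one middle C
pcm_bytes = pcm_size(1.0)                            # one second of samples
print(len(midi_bytes), "bytes of MIDI vs", pcm_bytes, "bytes of PCM")
```

That is 6 bytes against 11025 for a single one-second note, with the synth on the other end of the MIDI link doing all the waveform work.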
What is this "cheating" you speak of? I wasn't expressing any judgement, just saying that using MIDI helps save bytes. But now that you mention it, the bitmapped graphics that we take for granted nowadays also help: they give you a whole memory space to work with that doesn't count towards the length of your program, rather than having to "race the beam" (https://en.wikipedia.org/wiki/Racing_the_Beam). Not sure if there's a demoscene for the Atari 2600, but that would probably be the most "bare-metal" you could get...