Including it just excludes the old 'program that returns a compressed size of 0 for a common test case because it contains the common test case within itself' scenario. There are variations of that too. GPT is fantastic at predicting text, and you could drive an arithmetic coder with those predictions to get a very small encoding for some common English text. But is that really fair? The GPT model essentially contains that English text. Including the program size adds little overhead in most cases (code size is generally small), but it ensures we catch this kind of cheating, which is why Kolmogorov complexity includes it.
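To make the "prediction plus arithmetic coding" point concrete: an arithmetic coder driven by a predictive model can encode a text in roughly -sum(log2 p(symbol | context)) bits, the Shannon bound. A rough sketch, using a toy add-one-smoothed unigram model trained on the text itself as a stand-in for GPT (the model names and structure here are illustrative assumptions, not any real compressor):

```python
import math
from collections import Counter

def ideal_code_length_bits(text, predict):
    """Shannon bound: an arithmetic coder driven by `predict` can get
    within a couple of bits of -sum(log2 p(symbol | context))."""
    bits = 0.0
    for i, ch in enumerate(text):
        bits += -math.log2(predict(text[:i], ch))
    return bits

def make_unigram_predictor(corpus):
    """Toy stand-in for a strong predictor: an add-one-smoothed
    unigram model trained on the very text it will compress,
    mimicking a model that 'contains' the test case."""
    counts = Counter(corpus)
    total = sum(counts.values())
    vocab = 256  # assume byte-level alphabet
    def predict(context, ch):
        return (counts[ch] + 1) / (total + vocab)
    return predict

text = "the quick brown fox jumps over the lazy dog"
predict = make_unigram_predictor(text)
print(f"{ideal_code_length_bits(text, predict):.1f} bits "
      f"vs {8 * len(text)} bits raw")
```

The cheat is visible in the accounting: the predictor was fit to the text, so the reported bit count looks great, but the model's own size (here the counts table; for GPT, gigabytes of weights) is never charged. Adding the decompressor size to the total is exactly what closes that loophole.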
If you're cheating your own rough estimate, what are you even doing?
Custom cheats and large language model predictors are outside the realm of "most compression methods".
My thought is that lots of things you might want to compress are pretty small, so if you toss a 200KB not-ultra-size-optimized decompressor on top, you get a very misleading number. Leaving it out will tend to understate the true complexity a bit, but current compression algorithms are also far from perfect, so that counterbalances the effect.
I think I disagree with you about how big a blob of text is getting compressed in "most cases". (If we look at photographs or videos, the accounting works better, but we would still want to arrange a minimal decompressor instead of grabbing a typical implementation. And the lossy/lossless distinction makes the whole thing far more complicated.)