It's desirable for compilation speed, but not for performance. The blessing & curse of templates is that while you pay for them in compilation speed, you also get every scrap of localized optimization possible without needing to run a much slower, and less capable, LTO pass.
There isn't a "strictly better" here, it's a trade-off. If you want a split implementation, make one. That's the beauty of C++ - there's nothing special the standard library can do that you can't. Android, for example, has a split implementation: https://cs.android.com/android/platform/superproject/+/maste... (I wouldn't use it, though, it's pretty outdated & shitty, but you can still do what you're talking about).
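To make "split implementation" concrete, here's a rough sketch of the general shape: a non-template core that does the byte shuffling, plus a thin template wrapper that only adds types. The names (RawVector, SplitVector) are made up for illustration and this is not Android's actual API. It also assumes memcpy-able element types to keep it short, which a real implementation wouldn't.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Non-template core (hypothetical name). In a real project these member
// definitions would live in one .cpp file, so they are compiled once
// instead of once per element type.
class RawVector {
public:
    explicit RawVector(std::size_t elem_size) : elem_size_(elem_size) {}
    ~RawVector() { std::free(data_); }
    RawVector(const RawVector&) = delete;
    RawVector& operator=(const RawVector&) = delete;

    // Append one element by copying elem_size_ bytes from elem.
    void push_back_raw(const void* elem) {
        if (size_ == capacity_) grow();
        std::memcpy(static_cast<char*>(data_) + size_ * elem_size_, elem, elem_size_);
        ++size_;
    }

    // Pointer to the bytes of element i (no bounds check in this sketch).
    const void* at_raw(std::size_t i) const {
        return static_cast<const char*>(data_) + i * elem_size_;
    }

    std::size_t size() const { return size_; }

private:
    // Double the capacity, starting from 4 elements.
    void grow() {
        std::size_t new_cap = capacity_ ? capacity_ * 2 : 4;
        void* p = std::realloc(data_, new_cap * elem_size_);
        if (!p) throw std::bad_alloc();
        data_ = p;
        capacity_ = new_cap;
    }

    void* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t elem_size_;
};

// Thin typed wrapper (hypothetical name). This is the only part that gets
// instantiated per T, and it just forwards to the shared core.
template <typename T>
class SplitVector {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles memcpy-able types");
    static_assert(std::is_default_constructible<T>::value,
                  "at() copies out into a default-constructed T");
public:
    SplitVector() : impl_(sizeof(T)) {}
    void push_back(const T& v) { impl_.push_back_raw(&v); }
    // Returns a copy of element i.
    T at(std::size_t i) const {
        T out;
        std::memcpy(&out, impl_.at_raw(i), sizeof(T));
        return out;
    }
    std::size_t size() const { return impl_.size(); }
private:
    RawVector impl_;
};
```

That's the trade-off in miniature: the growth logic is compiled once, but the core only sees an opaque element size, so the compiler can't specialize the copies per type the way it can with a fully templated vector.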