Agreed, there does not have to be a smoking gun. Current and previous attempts are just ham-fisted.
However, assembling a prompt out of less overt inputs that test just as well as the overt prompt would help, and not getting your system prompt yoinked would go a long way towards deniability.
Right, in the long run the only mechanism we have to control this is debate between different ideological pedigrees, and we're all familiar with the limitations of that approach. Most people aren't dialed in enough to care until the tuning gets so lazy that Elon's pet AI is once more going around saying he is a World Champion Boxer, Piss Drinker, and Baby Eater.