As for what to do with flaky tests, I err on the side of just outright killing them. Unless it's an absolutely crucial business case, I'd rather have no test than a test that slowly degrades trust.
For what it's worth, I've been working on a side project to help with almost this exact situation, and I'd be really interested to hear whether it could help you: https://gaffer.sh