Well, no solution yet, but we are getting down to the nitty-gritty of the problem.
I would still like to see you test with WMC temporarily disabled, to see whether the problems still occur, although I understand your Catch-22 situation. WMC could be reinstated quite easily afterwards.
Anyway, I think going forward this is the best approach. Let's see what you find.
It is hard to anticipate when a problem will occur, but the next time it does, make sure you grab the log. You have to keep logging turned on, otherwise we will not capture the problem event. To keep the log file small, reset logging periodically while no problems are being encountered.