madVR does show the OSD bitmaps "immediately", but that only means they're shown with the next frame that gets rendered/presented. If you have e.g. a 12-frame presentation queue in madVR FSE which is nicely "full" all the time, then with Blu-ray playback there is in practice a latency of 12 video frames (roughly half a second at 24 fps) until the OSD images make it to the screen. This is the price to pay for pre-presenting so many frames in advance. The basic problem with D3D FSE presentation is that you can't "take back" any frames which were already delivered to D3D. The only way to make OSD changes react quicker would be to lower the number of pre-presented frames, but that gives up the benefit of pre-presenting in the first place. So basically there's no good solution here.
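To put numbers on that, here is a rough sketch of the arithmetic. It assumes the simple model described above (the OSD bitmap rides on the next rendered frame, and every frame already queued has to be displayed first). The function name and the constant are made up for illustration; nothing here is madVR code.

```python
def osd_latency_seconds(queued_frames: int, frame_rate: float) -> float:
    """Estimated delay until a new OSD bitmap is visible.

    Assumes the bitmap is attached to the next rendered frame and that
    all frames already in the presentation queue are shown before it.
    """
    if queued_frames < 0:
        raise ValueError("queued_frames must not be negative")
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return queued_frames / frame_rate


# Typical film rate on Blu-ray (assumption for the example).
BLU_RAY_FPS = 24000 / 1001

if __name__ == "__main__":
    for depth in (2, 4, 8, 12):
        latency_ms = osd_latency_seconds(depth, BLU_RAY_FPS) * 1000
        print(f"{depth:2d} queued frames -> {latency_ms:.0f} ms")
```

With a full 12-frame queue this comes out at about half a second, and it shrinks in direct proportion as you lower the number of pre-presented frames.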
For subtitles you could simply modify the subtitle timestamps to account for the latency. However, the latency varies with the fill state of the queues: if the queues are nearly empty, latency is very low, but if the queues are large and full, latency is high. madVR could report the fill state of the queues. I have to say that modifying the subtitle timestamps according to the fill state of the madVR queues doesn't "feel good" to me, but it would be possible and would probably work OK. I would not expect the timing to be perfectly exact this way, though. Errors of 1-2 video frames could still occur, because by the time you have queried the fill state of the queues, that information could already be outdated again.
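A minimal sketch of what that compensation could look like, and where the 1-2 frame error comes from. All names are hypothetical, and the fill state is passed in as a plain number, since how madVR would report it is not defined here.

```python
def compensated_start(subtitle_start: float,
                      reported_fill: int,
                      frame_duration: float) -> float:
    """Shift a subtitle's start time earlier by the estimated queue latency.

    reported_fill is the number of queued frames at the moment of the query
    (hypothetical value; no real madVR call is used here).
    """
    return subtitle_start - reported_fill * frame_duration


def timing_error_frames(reported_fill: int, actual_fill: int) -> int:
    """Error caused by a stale fill state, in video frames.

    Positive: the subtitle shows up late. Negative: it shows up early.
    """
    return actual_fill - reported_fill


if __name__ == "__main__":
    frame_duration = 1001 / 24000  # seconds per frame at ~23.976 fps

    # The queue reported 12 frames, so submit the subtitle ~0.5 s early.
    submit_at = compensated_start(10.0, 12, frame_duration)
    print(f"submit at {submit_at:.4f} s for a 10.0000 s subtitle")

    # If the queue had really dropped to 10 frames by then,
    # the subtitle appears 2 frames too early.
    print(f"error: {timing_error_frames(12, 10)} frames")
```

The compensation itself is trivial. The weak spot is that reported_fill and the real fill state at presentation time are two different measurements, and nothing guarantees they match.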
Of course the ideal solution would be to use the new interface, which allows you to tag each subtitle image with the correct timestamp. That way no timestamp manipulation would be needed at all and madVR would internally take care of everything, resulting in "perfect" sync. But I do understand that switching interfaces would cost development work and would come with the risk of running into new bugs. After all, the new subtitle interface in madVR is not well tested yet.
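To illustrate why timestamped subtitle images sidestep the queue problem, here is a small sketch of the general idea: the renderer picks the subtitle by the timestamp of the frame it is rendering, so it no longer matters how many frames are queued ahead. This is not the actual madVR subtitle interface; the type and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TimedSubtitle:
    start: float  # seconds, inclusive
    stop: float   # seconds, exclusive
    image: str    # stand-in for the actual bitmap data


def subtitle_for_frame(frame_time: float,
                       subs: Sequence[TimedSubtitle]) -> Optional[TimedSubtitle]:
    """Return the subtitle that belongs on the frame with this timestamp.

    The lookup depends only on the frame's own timestamp, not on when the
    frame is rendered or how full the presentation queue is.
    """
    for sub in subs:
        if sub.start <= frame_time < sub.stop:
            return sub
    return None


if __name__ == "__main__":
    subs = [TimedSubtitle(1.0, 3.0, "hello")]
    for t in (0.5, 1.0, 2.9, 3.0):
        hit = subtitle_for_frame(t, subs)
        print(t, hit.image if hit else None)
```

Because the image is matched to the frame at render time, a frame rendered 12 frames ahead of presentation still carries the right subtitle when it finally reaches the screen.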