...but TV Shows are not the only 'special case'.
I'm not suggesting it is. What I am suggesting is that attempting to provide for 'special cases' is just going to make it unnecessarily complicated. I'm further suggesting [Media Sub Type]=[TV Show] is 'special' in the sense that its captions are unique,
and it's used by many, if not most, users. If I'm wrong, and the same is true of podcasts and music videos and other things, then forget it and restrict it to unique [Media Types].
If the case can be made for a more sophisticated system—and this is a huge leap from the single inert little input box we have now—I suppose it should be a flexible tree-like hierarchy of rules. Something like...
- Default
  - Audio
    - [Genre]=[Classical]
    - [Media Sub Type]=[Audiobook]
  - Image
  - Video
    - [Media Sub Type]=[Music Video]
    - [Media Sub Type]=[Podcast]
    - [Media Sub Type]=[TV Show]
...with a caption expression associated with each. If a rule is applicable, its children are checked. If one or more of them apply, the caption of the first matching child is used. If none of them apply, the parent's caption is used.
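To make that lookup concrete, here is a rough Python sketch of how such a rule tree might be resolved. Everything in it is hypothetical: the `Rule` class, the `field_is` helper, the `resolve` function and the caption strings are placeholders I made up for illustration, not anything that exists in the program.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Media = Dict[str, str]  # a file's fields, e.g. {"Media Type": "Video"}


@dataclass
class Rule:
    name: str
    caption: str                      # placeholder caption expression
    applies: Callable[[Media], bool]  # does this rule match the file?
    children: List["Rule"] = field(default_factory=list)


def field_is(field_name: str, value: str) -> Callable[[Media], bool]:
    """Build a test of the form [Field]=[Value]."""
    return lambda media: media.get(field_name) == value


def resolve(rule: Rule, media: Media) -> str:
    """Descend into the first matching child; otherwise use this rule's caption."""
    for child in rule.children:
        if child.applies(media):
            return resolve(child, media)
    return rule.caption


# The hierarchy from the list above, with made-up captions.
tree = Rule("Default", "[Name]", lambda media: True, [
    Rule("Audio", "[Artist] - [Name]", field_is("Media Type", "Audio"), [
        Rule("Classical", "[Composer]: [Name]", field_is("Genre", "Classical")),
        Rule("Audiobook", "[Author] - [Name]", field_is("Media Sub Type", "Audiobook")),
    ]),
    Rule("Image", "[Name] ([Date])", field_is("Media Type", "Image")),
    Rule("Video", "[Name] ([Year])", field_is("Media Type", "Video"), [
        Rule("Music Video", "[Artist] - [Name]", field_is("Media Sub Type", "Music Video")),
        Rule("Podcast", "[Name] ([Date])", field_is("Media Sub Type", "Podcast")),
        Rule("TV Show", "[Series] - [Name]", field_is("Media Sub Type", "TV Show")),
    ]),
])

tv_caption = resolve(tree, {"Media Type": "Video", "Media Sub Type": "TV Show"})
rock_caption = resolve(tree, {"Media Type": "Audio", "Genre": "Rock"})
```

A TV show falls through Default and Video to the TV Show rule, while a rock track matches Audio but neither of its children, so it gets the Audio caption.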
This helps a little in dealing with a complex set of possibilities. But it also quickly reduces to a matter of specifying a caption for a subset of media that's only shown in one view anyway. Where that's the case, it would be more straightforward to specify the caption at the view level. This is particularly true considering that the design of the caption and that of the view often go hand-in-hand. So what's more practical is striking a balance: specify default captions by media type, then override those at the view level where needed. That still leaves the choice between simple general captions that are frequently overridden at the view level, and more complex general captions that rarely need to be overridden.
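By comparison, the simpler scheme amounts to a single lookup with an optional override. Again, this is only a sketch under my own assumptions: `DEFAULT_CAPTIONS`, `FALLBACK_CAPTION`, `caption_for` and the caption strings are invented names, not real settings.

```python
from typing import Optional

# Hypothetical per-media-type defaults; the caption strings are placeholders.
DEFAULT_CAPTIONS = {
    "Audio": "[Artist] - [Name]",
    "Image": "[Name] ([Date])",
    "Video": "[Name] ([Year])",
}
FALLBACK_CAPTION = "[Name]"


def caption_for(media_type: str, view_override: Optional[str] = None) -> str:
    """Use the view's own caption if it has one, else the media type default."""
    if view_override:
        return view_override
    return DEFAULT_CAPTIONS.get(media_type, FALLBACK_CAPTION)


# A TV Shows view overrides the Video default; a generic video view does not.
tv_view_caption = caption_for("Video", view_override="[Series] - [Name]")
generic_caption = caption_for("Video")
```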
This also makes it easier to specify captions that only vary when necessary, and are otherwise consistent, even for different media types presented in different views. Consistent captions become familiar, and therefore generally easier to understand. This is important for the kind of Theatre View configuration this issue is applicable to: the one with many views handling a variety of media types. So, that's another strike against the more sophisticated configuration system. Why create something that will only make it easier to produce an undesirable result?