firstly, could I fill in those channels via OTA EPG (without wiping out the IceTV data)
Yes. You can run IceTV for all the channels they support, and then the MC OTA EPG collection for only the remaining channels. You can manage that in MC TV Setup using two processes, with an IceTV XML file import and a separate MC OTA EPG Collection.
Someone has written about that previously around here somewhere.
and more importantly, could meta data like missing series/episodes be somehow supplemented from other sources.
Yes. You could run EPGCollector for the remaining channels instead of the MC OTA EPG collection, then use the built-in Metadata lookup in EPGCollector to add data. However, if the OTA EIT data is really rubbish, neither EPGCollector nor any other tool is going to improve the metadata much. You may find another source for the EPG for the missing channels, but that isn't likely in Australia. There are some still available, I think, but they are a lot of work.
Then you just run two XMLTV file imports. One for IceTV and one for EPGCollector.
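If you would rather check the result as a single file before importing, the same "fill the gaps without touching IceTV" idea can be sketched in Python. This is only a rough illustration, not something MC or EPGCollector does for you: `merge_xmltv` is a made-up helper name, and it assumes both files use the same channel `id` values, which may well not be true for your two sources.

```python
import xml.etree.ElementTree as ET


def merge_xmltv(primary_xml: str, supplement_xml: str) -> str:
    """Keep everything in the primary guide; add only the gaps from the supplement.

    Assumption: both files identify a channel with the same `id` string.
    """
    primary = ET.fromstring(primary_xml)
    supplement = ET.fromstring(supplement_xml)

    # Channels that already have guide data in the primary (e.g. IceTV) file.
    covered = {p.get("channel") for p in primary.findall("programme")}
    # Channel definitions already present in the primary file.
    known = {c.get("id") for c in primary.findall("channel")}

    # XMLTV expects all <channel> elements before any <programme> elements.
    merged = ET.Element("tv", dict(primary.attrib))
    merged.extend(primary.findall("channel"))
    merged.extend(
        [c for c in supplement.findall("channel") if c.get("id") not in known]
    )
    merged.extend(primary.findall("programme"))
    merged.extend(
        [p for p in supplement.findall("programme") if p.get("channel") not in covered]
    )
    return ET.tostring(merged, encoding="unicode")
```

Anything the primary file already covers is left exactly as it was; the supplement only contributes programmes for channels with no primary data at all.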
You can also pass the IceTV XMLTV file to EPGCollector as part of its run, where it collects just for the missing channels. I think you could also run the additional metadata lookup for all data that way, but it probably won't improve the IceTV data, and may break it. You would have to try it if you wanted to.
I haven't implemented IceTV since their revival, so I haven't implemented the above solutions. YMMV.
Note: I read somewhere that MHEG5 data is still available OTA, but I haven't tested that. It may have just been for New Zealand. If MHEG5 data is available in Australia, it is much better than EIT, and gets improved a lot by the EPGCollector lookup process.
MC has an option not to record duplicates or reruns etc based on a field like name, description or series/episode. If those fields are blank or default/generic values that MC has previously recorded, will MC see all such future instances as "this is the same value as a previous recording" and therefore NOT record?
No. The metadata selected must match.
Is it better to choose multiple fields or just one?
Less is better when using the option not to record duplicates. Series, Season and Episode are best. Series and Name are next best. Using the Description is bad at any time, as it varies too much. Broadcasters deliberately obfuscate repeat shows all the time. Even our beloved ABC doesn't give Season and Episode metadata at the moment for most programs, and all repeats have generic names, often for both the Series and Name. e.g. Doctor Who just gets the Series and (Episode) Name for new programs, which is enough for EPGCollector to find additional metadata. But for repeats both Series and Name are just set to "Doctor Who". Often the Description as well. @#!&^%@)*&
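To show why the field choice matters, here is a naive exact-match comparison in Python. It is not MC's actual rule, just a sketch: `Programme`, `duplicate_key` and `would_skip` are made-up names, and treating a blank field as "never a match" is my assumption.

```python
from typing import NamedTuple, Sequence


class Programme(NamedTuple):
    series: str
    name: str
    season: str = ""
    episode: str = ""


def duplicate_key(p: Programme, fields: Sequence[str]) -> tuple:
    """Build a comparison key from the selected metadata fields."""
    return tuple(getattr(p, f).strip().lower() for f in fields)


def would_skip(candidate: Programme, recorded: Sequence[Programme],
               fields: Sequence[str]) -> bool:
    """True if the candidate matches an earlier recording on every selected field."""
    key = duplicate_key(candidate, fields)
    if "" in key:
        # Assumption: a blank field can never prove two airings are the same.
        return False
    return any(duplicate_key(r, fields) == key for r in recorded)


# Good metadata: Series + Season + Episode tells the episodes apart.
recorded = [Programme("Doctor Who", "Rose", "1", "1")]
new_episode = Programme("Doctor Who", "The End of the World", "1", "2")

# Generic repeat metadata: Series and Name are both just "Doctor Who",
# so two different episodes produce the same Series + Name key.
generic_a = Programme("Doctor Who", "Doctor Who")
generic_b = Programme("Doctor Who", "Doctor Who")
```

With Series, Season and Episode selected, `would_skip(new_episode, recorded, ...)` is false and the new episode records. With Series and Name selected, `generic_b` compares equal to `generic_a` even if it is a different episode, so the generic listings give an exact-match check nothing to work with.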