INTERACT FORUM
More => Old Versions => Media Center 16 (Development Ended) => Topic started by: mark_h on July 22, 2011, 02:53:08 am
-
7. Changed: Export Playlist > Text File (delimited) uses UTF-16 instead of ANSI for better international character support.
OK, I'm in a lot of pain now >:( This has broken every single one of my Perl scripts for processing exported playlists. I cannot even look at the files in Linux because it claims they are binary files due to the UTF-16 encoding :(
Frantically trawling Google for help on supporting UTF-16 in Perl... Anybody help!?!
OK, so I've updated my scripts to support UTF-16LE, but a big issue is that my environment doesn't natively support UTF-16LE. As above, all export files are now treated as binary, and tools like sed, grep etc. cannot process them unless I first convert back to UTF-8 or ANSI, which defeats the object.
I'm not convinced UTF-16 is the best format to use, but I don't have an alternative that supports all characters so I guess I'll defer to your decision here.
But even so, OUCH - the change broke everything at my end. I'm guessing I won't be the only one to suffer this??
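For anyone in the same position, here is a minimal Python sketch of the convert-first workaround described above: it re-encodes a UTF-16 export as UTF-8 so that grep and sed can read it. The function name is made up for illustration, and treating BOM-less input as little-endian is an assumption based on the UTF-16LE files mentioned here, not something confirmed about the exporter.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 without a BOM (hypothetical helper)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM, picks the byte order,
        # and drops the BOM from the decoded text.
        text = data.decode("utf-16")
    else:
        # Assumption: a file without a BOM is little-endian (UTF-16LE).
        text = data.decode("utf-16-le")
    return text.encode("utf-8")
```

To use it, read the export with `open(path, "rb")`, pass the bytes through the function, and write the result to a new file opened in `"wb"` mode.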
-
We could use UTF-8 instead. I don't have a strong preference.
I wasn't sure that Windows Notepad would work with UTF-8, whereas UTF-16 is very common on Windows.
However, I just tested UTF-8 in Notepad and at least Windows 7 gets it right.
Thoughts?
-
UTF-8 is the native format for my setup, so would get my thumbs up...
Mark
-
My experience (which is fair, but not super extensive) is that UTF-8 is the more common format. I think the primary reason is that with UTF-16 you need the BOM to say whether the data is big-endian or little-endian. UTF-8 doesn't need that, and is a bit easier to use for that reason.
I believe more software supports UTF-8 than UTF-16.
Brett
-
UTF-16 is Windows' native format for Unicode storage, but any recent Windows has a Notepad which can read UTF-8 as well, probably even XP.
Actually, the BOM is one of the things that's nice about UTF-16: it immediately identifies a UTF-16 document as such, and you know exactly what you are dealing with.
UTF-8 may have a BOM, but it's not required. A lot of things are saved as UTF-8 without a BOM, so the editor has to "guess" the encoding - it cannot immediately know that it's UTF-8.
That said, any tool claiming to support Unicode should also have no issue reading UTF-16. I don't care.
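To illustrate the BOM point, a rough Python sketch of how a tool might check the first bytes of a file. The function name is hypothetical, and this only covers UTF-8 and UTF-16; it ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16LE one.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a codec name if the data starts with a known BOM, else None."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    # No BOM: the caller has to guess (e.g. try UTF-8, then fall back).
    return None
```

A `None` result is exactly the "guess" case: the bytes might be UTF-8 without a BOM, ANSI, or something else.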
-
In build 16.0.137 (and newer):
Changed: Export Playlist > Text File (delimited) uses UTF-8 instead of UTF-16.
Note that we don't write a BOM when outputting UTF-8.
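For scripts that read the new export, a small Python sketch follows. It assumes the fields are tab-delimited (the delimiter is not stated in this thread) and uses a made-up function name. Decoding with `utf-8-sig` accepts UTF-8 both with and without a BOM, so the same code should keep working if a file has been re-saved by an editor that adds one.

```python
import csv
import io


def read_export(data: bytes, delimiter: str = "\t"):
    """Parse a delimited UTF-8 export into a list of rows (hypothetical helper)."""
    # "utf-8-sig" strips a leading BOM if present and is otherwise plain UTF-8.
    text = data.decode("utf-8-sig")
    # newline="" leaves line endings alone so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```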