INTERACT FORUM


Author Topic: UTF-16 text change in MC 16.0.136  (Read 1422 times)

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1854
UTF-16 text change in MC 16.0.136
« on: July 22, 2011, 02:53:08 am »

7. Changed: Export Playlist > Text File (delimited) uses UTF-16 instead of ANSI for better international character support.

OK, I'm in a lot of pain now  >:(  This has broken every single one of my Perl scripts for processing exported playlists.  I cannot even look at the files in Linux because the tools there treat them as binary files due to the UTF-16 encoding  :(

Frantically trawling Google for help on supporting UTF-16 in Perl...  Anybody help!?!

OK, so I've updated my scripts to support UTF-16LE.  The bigger issue is that my environment doesn't natively support UTF-16LE, so, as above, all export files are now treated as binary, and tools like sed and grep cannot process them unless I first convert back to UTF-8 or ANSI, which defeats the object.
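For anyone stuck on the same convert-back step, here is a minimal sketch in Python rather than Perl. It assumes the export starts with a UTF-16 BOM (the thread does not confirm this), and the names `utf16_to_utf8` and `convert_file` are hypothetical helpers, not anything MC provides.

```python
from pathlib import Path


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 without a BOM.

    Assumption: the input starts with a BOM. The "utf-16" codec reads it
    to pick the byte order and strips it from the decoded text.
    """
    text = data.decode("utf-16")
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    # Work on raw bytes so line endings (e.g. CRLF) pass through unchanged.
    dst.write_bytes(utf16_to_utf8(src.read_bytes()))
```

Once converted, the file can be passed to sed, grep and similar tools as ordinary UTF-8 text.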

I'm not convinced UTF-16 is the best format to use, but I don't have an alternative that supports all characters so I guess I'll defer to your decision here.

But even so, OUCH - the change broke everything at my end.  I'm guessing I won't be the only one to suffer this??



Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: UTF-16 text change in MC 16.0.136
« Reply #1 on: July 22, 2011, 08:54:20 am »

We could use UTF-8 instead.  I don't have a strong preference.

I wasn't sure that Windows Notepad would work with UTF-8, whereas UTF-16 is very common on Windows.

However, I just tested UTF-8 in Notepad and at least Windows 7 gets it right.

Thoughts?
Matt Ashland, JRiver Media Center

mark_h

  • MC Beta Team
  • Citizen of the Universe
  • *****
  • Posts: 1854
Re: UTF-16 text change in MC 16.0.136
« Reply #2 on: July 22, 2011, 09:06:56 am »

UTF-8 is the native format for my setup, so would get my thumbs up...

Mark

stottle

  • Junior Woodchuck
  • **
  • Posts: 71
Re: UTF-16 text change in MC 16.0.136
« Reply #3 on: July 22, 2011, 09:39:17 am »

My experience (which is fair, but not super extensive) is that UTF-8 is the more common format.  I think the primary reason is that with UTF-16 you need the BOM to say whether the data is big-endian or little-endian.  UTF-8 doesn't need that, and is a bit easier to use for that reason.

I believe more software supports utf-8 than utf-16.

Brett

Hendrik

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 10933
Re: UTF-16 text change in MC 16.0.136
« Reply #4 on: July 22, 2011, 09:48:36 am »

UTF-16 is Windows' native format for Unicode storage, but any recent Windows has a Notepad which can read UTF-8 as well, probably even XP.

Actually, the BOM is one of the things that's nice about UTF-16: it immediately identifies a UTF-16 document as such, and you know exactly what you are dealing with.
UTF-8 may have a BOM, but it's not required. A lot of things are saved as UTF-8 without a BOM, so the editor has to "guess" the encoding; it cannot immediately know that it's UTF-8.
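The BOM check described here can be sketched in a few lines of Python. This is an illustration only: `sniff_bom` is a hypothetical helper, and it deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16LE one.

```python
import codecs

# BOMs this sketch recognises, paired with the codec name to decode with.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present.

    Assumption: the data is not UTF-32; that case is not handled here.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    # No BOM: the caller has to guess, e.g. try UTF-8 and fall back.
    return None, 0
```

A result of `(None, 0)` is exactly the "guess" case: the bytes may well be UTF-8, but nothing in the file says so.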

That said, any tool claiming to support Unicode should also have no issue reading UTF-16. I don't care either way.
~ nevcairiel
~ Author of LAV Filters

Matt

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 42373
  • Shoes gone again!
Re: UTF-16 text change in MC 16.0.136
« Reply #5 on: July 22, 2011, 09:59:15 am »

In build 16.0.137 (and newer):
Changed: Export Playlist > Text File (delimited) uses UTF-8 instead of UTF-16.

Note that we don't write a BOM when outputting UTF-8.
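Since the UTF-8 export carries no BOM, a reader has to be told the encoding rather than detect it. A minimal Python sketch follows; `read_export` is a hypothetical name and the tab delimiter is an assumption, because the thread does not say which delimiter the export uses. The `utf-8-sig` codec accepts input with or without a BOM, so the same code also copes with files that have been re-saved by an editor that adds one.

```python
import csv
import io


def read_export(data: bytes, delimiter: str = "\t"):
    """Parse a delimited UTF-8 export into a list of rows.

    Assumptions: tab-delimited fields (pass another delimiter if needed)
    and UTF-8 input, with or without a leading BOM.
    """
    # "utf-8-sig" strips a BOM if one is present and is a no-op otherwise.
    text = data.decode("utf-8-sig")
    # newline="" leaves line endings alone so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```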
Matt Ashland, JRiver Media Center