INTERACT FORUM


Author Topic: Asian Character support in UPnP server  (Read 7830 times)

horse

  • Regular Member
  • World Citizen
  • ***
  • Posts: 212
Asian Character support in UPnP server
« on: January 15, 2007, 07:11:17 pm »

The more I play with the features of MC12 the more I find it to be more powerful and flexible than me!
However, I have imported some Asian CDs into the library, and with the Asian language support loaded they are displayed perfectly. OK, methinks, now time to get them to play on the Denon AV amp (made in Japan, so one would hope it can display the characters :-) ).

The characters show up as "????". I checked to see what was being sent by MC and found it is sending the info as ?. I have tried checking and unchecking "Filter International Characters" under options and restarting the server, but my guess is that the option is meant for single-byte characters outside the normal ASCII set.

I know I'm stretching it to expect MC to handle double-byte characters, but I thought I'd ask, as it works fine for the main display. Which is cool.


John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #1 on: January 15, 2007, 08:36:43 pm »

Hi Horse,

Filter International Characters replaces everything except 7-bit ASCII with a ?.
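
As a rough illustration of what a filter of that kind does (a Python sketch only, not MC's actual code; the function name `filter_international` is made up for the example):

```python
def filter_international(text):
    """Replace every character outside 7-bit ASCII (0x00-0x7F) with '?'."""
    return "".join(ch if ord(ch) < 0x80 else "?" for ch in text)


print(filter_international("café 聽"))  # caf? ?
```

Accented Latin letters and CJK characters are treated alike here: anything above 0x7F becomes a single ?.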

The UPnP devices didn't handle international characters very well when I put that feature in.
Do you think the Denon does? What character set?

Thanks,

j

horse

  • Regular Member
  • World Citizen
  • ***
  • Posts: 212
Re: Asian Character support in UPnP server
« Reply #2 on: January 17, 2007, 11:29:44 am »

Short answer
The character set is UTF-8 / Unicode which is already what the header from MC indicates.

Long answer
The title I'm trying to display is 聽見藍山的味道. When I run it through the tools at http://www.mandarintools.com/chardict_u8.html, set to show the output with Unicode values, I get 807D 898B 85CD 5C71 7684 5473 9053, and looking those values up in the code tables at http://www.unicode.org/charts/PDF/U4E00.pdf gives the correct characters.

So that was the theory. In practice, I went back through the other UPnP servers I tried before finding MC12, to see whether any of them support Asian fonts and whether the Denon will actually display them on either the front panel (which would probably look poor, as it is only 8 pixels high and would need 16 to do it nicely) or the monitor output (resolution up to 1080p, so it will depend on the character generator). Most of them would not even display the CD correctly on Windows, so I did not go further. WMP 10 did display correctly in Windows; however, the bytes sent as the title did not make sense to me. They were not a string of 0x3F, but they were not understood by the Denon either. The same was true using Twonky. The coding for the CD title from each is below. Neither result is what I expected, nor is it the expected number of bytes, and neither displayed correctly.

WMP 10
e4 b8 8b e5 8d 88 e4 b8 89 e9 bb 9e e5 9b 9b e5 8d 81 e5 88 86

Twonky
e8 81 bd e8 a6 8b e8 97 8d e5 b1 b1 e7 9a 84 e5 91 b3 e9 81 93
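
For anyone who wants to check the two byte strings, here is a small Python sketch (not part of the original traces) that treats both as UTF-8 and lists the resulting code points. The helper name `code_points` is just for this example.

```python
title = "聽見藍山的味道"

wmp = bytes.fromhex("e4 b8 8b e5 8d 88 e4 b8 89 e9 bb 9e e5 9b 9b e5 8d 81 e5 88 86")
twonky = bytes.fromhex("e8 81 bd e8 a6 8b e8 97 8d e5 b1 b1 e7 9a 84 e5 91 b3 e9 81 93")


def code_points(raw):
    """Decode raw as UTF-8 and return each character's code point as hex text."""
    return ["%04X" % ord(ch) for ch in raw.decode("utf-8")]


print("Twonky:", " ".join(code_points(twonky)))
print("WMP:   ", " ".join(code_points(wmp)))
print("Twonky matches title:", twonky.decode("utf-8") == title)
```

Both strings decode as valid UTF-8, three bytes per character. The Twonky bytes give back 807D 898B 85CD 5C71 7684 5473 9053, the code points of the title. The WMP bytes decode to seven different characters, which suggests (as a guess only) that WMP was sending a different string rather than using a different encoding.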

Not sure how easy it is for you to pass Unicode from the MC12 Library without changing or filtering? I don't want to waste your time. I will do some more playing as I don't expect a call to Denon will help much as they still have failed to respond to the more basic request regarding the url headers!

Thanks

John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #3 on: January 17, 2007, 12:47:43 pm »

Hi Horse,

Unchecking the option sends UTF8 to the device. If you can find something on the web that indicates a different character set works, let me know and I'll try it out...

j

p.s. I updated the wiki for this:
http://wiki.jrmediacenter.com/index.php/UPnP_Server_and_Devices_%28Media_Receivers%29#Options

John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #4 on: January 17, 2007, 01:04:44 pm »

It just struck me: the reason that WMP and Twonky are sending different characters is that they are specifying different character sets in the XML.

The XML that gets sent has the character set in the header, so you can specify different ones. Try sniffing if you want to see the details:
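
Once a response body has been captured, the declared character set can be pulled out of the XML declaration with a few lines of Python. This is only a sketch: the sample payloads are made up, and `declared_encoding` is a hypothetical helper, not something from MC or the wiki.

```python
import re


def declared_encoding(payload):
    """Return the encoding named in the XML declaration, or None if there is none."""
    match = re.match(
        rb'\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', payload
    )
    return match.group(1).decode("ascii") if match else None


# Hypothetical captured bodies, shortened for the example.
print(declared_encoding(b'<?xml version="1.0" encoding="utf-8"?><s:Envelope/>'))
print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><s:Envelope/>'))
print(declared_encoding(b'<s:Envelope/>'))
```

The declaration only says what the sender claims to be using; the bytes in the body still have to be checked separately.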
http://wiki.jrmediacenter.com/index.php/Sharing_Plug-in_Debugging_Hints

j

horse

  • Regular Member
  • World Citizen
  • ***
  • Posts: 212
Re: Asian Character support in UPnP server
« Reply #5 on: January 17, 2007, 06:12:01 pm »

John,
BTW you should (or maybe I should) update the wiki with regard to Ethereal. The website is gone and the developers are now working on Wireshark. More at http://www.wireshark.org/   same thing, new name :-)

I checked the xml coding in the SOAP headers and they both indicate UTF-8 for WMP and Twonky.
Doesn't explain the difference. Will keep trying to figure that out :-)

I double-checked MC 12: with or without the international filter, the XML encoding is set to ISO 8859-1 (Latin-1) in both traces. With filtering disabled, I can see the Chinese get converted to ? (0x3F); I also ripped another track with a French title, and I see the é (0xE9) in the trace. With filtering enabled, é is replaced by a ? (0x3F).
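
The bytes seen in those traces are what a plain Latin-1 or ASCII conversion with ? substitution would produce. A Python sketch to reproduce them (this mimics the observed output and is not a claim about how MC does it internally):

```python
def hex_bytes(raw):
    """Format bytes as space-separated lowercase hex, like a packet trace."""
    return " ".join("%02x" % b for b in raw)


text = "é聽"

# Filter off: Latin-1 keeps é as one byte; the Chinese character has no
# Latin-1 equivalent and is replaced by '?'.
print(hex_bytes(text.encode("iso-8859-1", errors="replace")))  # e9 3f

# Filter on: everything outside 7-bit ASCII becomes '?'.
print(hex_bytes(text.encode("ascii", errors="replace")))  # 3f 3f
```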

The Denon is using UTF-8 as its encoding.

Apologies: yesterday I was sure I had looked at the MC12 trace and seen UTF-8. This time I restarted the server each time I enabled or disabled the international filter option, and confirmed the change had taken effect by looking for the é. Too many traces to look at while also doing my paid day job :-)


John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #6 on: January 18, 2007, 09:36:55 am »

I'll try to get a version out for you to test that does UTF-8 then, see if it works. That would be nice if it did.
Hopefully sometime today. Can you drop me an e-mail at gateley at jriver.com?

Thanks,

j

John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #7 on: January 18, 2007, 09:41:20 am »

Oh yeah, thanks for the heads-up about ethereal. I fixed the wiki. I always like the name ethereal though...

j

horse

  • Regular Member
  • World Citizen
  • ***
  • Posts: 212
Re: Asian Character support in UPnP server
« Reply #8 on: January 18, 2007, 04:56:35 pm »

Hi John,

Short answer: the private build indicates UTF-8 coding, but the characters are not encoded correctly for UTF-8. It looks like the coding never changed. (I may be wrong, still learning this.)
I am looking at two characters, é and 聽. The é is coded as 0xE9 and the 聽 is coded as 0x3F (ASCII ?), as before. I also figured out that Twonky did it right and that the Denon can't display what I need anyway.

Longer answer, for those who want to know, or who also have a Denon and want more than basic ASCII

I taught myself a little more about Unicode and UTF-8, as I had misunderstood one website's description. Reading http://www.faqs.org/rfcs/rfc3629.html cleared that up.
I also found a useful site for quick conversion from Unicode to UTF-8, to save pen and paper: http://www.ltg.ed.ac.uk/~richard/utf-8.html
My original confusion was believing that Unicode values were already UTF-8 and needed no more "encoding". Big mistake.

Based on this, all standard ASCII (00 - 7F) stays the same. In my case the following should occur:

Character    Unicode Value    UTF-8 Hex Value
?            0x3F             0x3F
é            0xE9             0xC3A9
聽           0x807D           0xE881BD

As you can see the E8 81 BD is the same string that the Twonky server used for the 聽. Have no idea what Microsoft was using :-)
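
The conversion can also be done by hand following the bit layout in RFC 3629. A minimal Python sketch (the function name `utf8_bytes` is made up, and it skips the validity checks a real encoder would do, such as rejecting surrogates and values above 10FFFF):

```python
def utf8_bytes(cp):
    """Encode one Unicode code point as UTF-8 using the RFC 3629 bit layout."""
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "?é聽":
    print("%04X -> %s" % (ord(ch), utf8_bytes(ord(ch)).hex().upper()))
```

This prints 3F, C3A9 and E881BD for the three characters, and the results agree with Python's built-in `str.encode("utf-8")`.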

Based on that, the Denon is not capable of displaying the CJK character set. Just for the record, this uncovered another couple of bugs in the Denon: the XML charset is UTF-8, yet when the character é is (incorrectly, I think) coded as Latin-1 0xE9, the front panel still displays é, while the OSD shows just a plain "e". I will tell Denon about this, but I guess the response will be the same as for the headers issue. <sound of silence>

Based on this, unless I'm wrong about the coding, I don't see much point in continuing. This has proved that even if MC did code other charsets, the Denon would not display them anyway. It can't even handle Latin-1 characters on the OSD. If others want to see this support (some people may, as Roku can display this), I am more than happy to continue testing; however, it will be done using Wireshark and the tools on the web to decode what is sent :-)

Thanks, John, for your help, much appreciated. All I can say is that I'm very happy with the sound of my Denon and its other features, but I'm very disappointed with their UPnP support, especially as they are a Japanese manufacturer and can't handle international characters! MC12 rocks, and I would not even consider moving to Twonky even if it did display correctly.

JimH

  • Administrator
  • Citizen of the Universe
  • *****
  • Posts: 71338
  • Where did I put my teeth?
Re: Asian Character support in UPnP server
« Reply #9 on: January 18, 2007, 05:56:08 pm »

Horse,
Thanks very much for the detailed description.  I've added you to our Hall of Flames thread here:
http://yabb.jriver.com/interact/index.php?topic=24031.msg260605#msg260605

John Gateley

  • Citizen of the Universe
  • *****
  • Posts: 4957
  • Nice haircut
Re: Asian Character support in UPnP server
« Reply #10 on: January 19, 2007, 10:37:44 am »

Thanks a lot Horse. It's what I found last time I tried this (though I'm pretty sure it was a different device).

I believe what you are calling Unicode is actually UTF-16, right? Unicode is the name of a family of character sets, including UTF-8, UTF-16, and UTF-32. But the UTF-16 value of '?' is 0x003F, not 0x3F. UTF-16 requires 2 bytes per character.

Hopefully UPnP devices will do a better job in the future.

j

horse

  • Regular Member
  • World Citizen
  • ***
  • Posts: 212
Re: Asian Character support in UPnP server
« Reply #11 on: January 19, 2007, 11:39:47 pm »

Almost :-)  BTW, anyone reading this thread without Asian fonts loaded will not be seeing the Chinese character I keep using in my examples. Sorry.

For those of us who speak modern languages, the Unicode characters are in tables that need 2 bytes to encode them, so UTF-16 = Unicode for a very large % of the characters. Latin and Latin-1 characters have a 0x00 prefix on their single-byte equivalent, and yes, ? would be 003F in both Unicode and UTF-16 coded Unicode. I like UTF-16 as it needs fewer brain cells to understand, even though it uses up twice the space for many characters.
If you need the ancient scripts, their values lie above FFFF and no longer fit in 2 bytes, so they need 4 UTF-8 bytes, or 2 UTF-16 double bytes (a surrogate pair), or a single UTF-32 quad byte.

The following shows various characters; the 4th line is a Unicode Old Persian character:

Character      Unicode    UTF-8          UTF-16
?              003F       3F             003F
é              00E9       C3 A9          00E9
聽             807D       E8 81 BD       807D
<no display>   103A0      F0 90 8E A0    D800 DFA0

The weird stuff happens with UTF-8, which passes normal ASCII (0x00 - 0x7F, i.e. Unicode 0x0000 - 0x007F) through unchanged as single bytes, but needs anything in the 0x80 - 0xFF range (Latin-1) to be coded as multiple bytes using the rules illustrated above.
Hence from the private build I could see the header indicated UTF-8, however the é was still encoded E9 and not C3 A9 as it should have been for UTF-8, not that the Denon noticed :-)
I quickly retested with Twonky using a file with é and 聽, checked the UTF-8 encoding with Wireshark, and it was correct. The Denon did not display the 聽, just ??, but did interpret the é on the front panel, while on the OSD it was an e. Tested with â and ä and I get a and a :-) It seems the Denon is not strict about the coding, and the front panel display is more accurate than the OSD.
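
A mismatch like this, where the header says UTF-8 but é is sent as a bare E9, is easy to detect because a lone E9 is not valid UTF-8. A Python sketch, assuming the raw title bytes have already been copied out of a Wireshark trace (the helper name `looks_like_utf8` and the sample bytes are illustrative):

```python
def looks_like_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(looks_like_utf8(b"caf\xe9"))      # Latin-1 style é: False
print(looks_like_utf8(b"caf\xc3\xa9"))  # UTF-8 style é: True
```

Pure ASCII passes this check under either encoding, so it only tells the two apart when the title contains at least one character above 0x7F.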

Unicode is the character code table value and UTF-nn defines how it should be encoded (to indicate how many bytes or double bytes are used to define the Unicode character) so there is no confusion when multiple bytes (UTF-8) or double bytes (UTF-16) are needed to encode a string.
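
To make the distinction concrete, here is a small Python sketch that takes the four code points used in this thread and derives the UTF-16 units by hand, with UTF-8 from the built-in codec for comparison. The function name `utf16_units` is made up for the example.

```python
def utf16_units(cp):
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if cp < 0x10000:
        return [cp]                  # fits in a single 16-bit unit
    v = cp - 0x10000                 # 20 bits left to split
    return [0xD800 | (v >> 10),      # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]    # low surrogate: bottom 10 bits


for cp in (0x3F, 0xE9, 0x807D, 0x103A0):
    utf8 = " ".join("%02X" % b for b in chr(cp).encode("utf-8"))
    utf16 = " ".join("%04X" % u for u in utf16_units(cp))
    print("%05X | %-11s | %s" % (cp, utf8, utf16))
```

The last line comes out as F0 90 8E A0 in UTF-8 and D800 DFA0 in UTF-16: one code point, three different byte sequences depending on the encoding chosen.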

It has been fun understanding this, and I have a newfound appreciation for the engineers who work in localization :-) Until more UPnP devices can display other character sets (I think only Roku claims to so far), it will be an area for the future. UTF-8 is a base requirement for all UPnP devices, and from the standards body it looks like they are working on UTF-16 becoming a future base requirement.

I would be interested to hear how some of the European users get on with local titles, album names and artist names.

OK, now to try and explain this all to Denon . . . . . . .