Topic: Computer Speech And Speech Recognition (Read 5238 times)

KingSparta · « **on:** February 05, 2006, 09:54:57 am »

A few years Ago I Remember IBM And Others Were Working On Computer Speech And Speech Recognition.

If you remember Scotty talking into a mouse in Star Trek IV, we may have laughed at him, but it seemed that something would be done with it down at the user level. I feel Microsoft has dropped the ball on realistic speech. Have You Heard The Microsoft Reader? Microsoft's Stock Voices Sound Computer Generated, Not Real.

Dave, "Media Center"

Media Center, Yes Dave?

Dave, Play

Media Center, Play What Dave?

Dave, Play "Love Machine"

Media Center, I Love You Too Dave

Lately I Have Been Playing With TextAloud With AT&T's And NeoSpeech's Realistic Voices, Using Text Files And Allowing The Software To Read Typed Stories.

I went To Microsoft But I Have Not Found Any Recent Projects That Microsoft Has Been Working On. I Know J River Went To The Computer Show A Few Months Back.

Any New Projects?

Or Does Speech Recognition Seem To be Dead?

A Realistic Voice Of The Above Typed Text:
http://www.spartasoftware.com/Message-01.mp3

alanl · « **Reply #1 on:** February 05, 2006, 10:28:27 am »

Opera have been using it to control their browser, with limited success in my case, but you're right, KS, it seems to have dropped below the horizon in the mainstream. Or are we both missing something?

Mr ChriZ · « **Reply #2 on:** February 05, 2006, 11:26:50 am »

Quote from: KingSparta on February 05, 2006, 09:54:57 am

A Realistic Voice Of The Above Typed Text:
http://www.spartasoftware.com/Message-01.mp3

Which text-speech converter made that one?
It's probably the best one I've heard so far, but it's still not quite the sexy
voice from Star Trek: Voyager. I have always been surprised that we've not been
able to better emulate a human voice...

KingSparta · « **Reply #3 on:** February 05, 2006, 11:35:52 am »

That Was Paul From NeoSpeech, It Is A SAPI 5 (A Speech Standard) Voice And I Used It With TextAloud.

I Also Have Kate From NeoSpeech That May Be Better For Your Late Night Dreams.

In The Future I Was Wondering If JRiver Or Other Companies Would Or Could Include Speech To Read Options And Or Information In "Notes", "Bios" Etc...

Or Integrate Speech Commands Into Their Products.

Mr ChriZ · « **Reply #4 on:** February 05, 2006, 12:06:55 pm »

It would be good if good speech recognition could be built into something like Girder.
Maybe there's a way to do that already?

JONCAT · « **Reply #5 on:** February 05, 2006, 12:51:59 pm »

Isn't Dragon NaturallySpeaking really good? They came out with version 9 or 10 a while back.

Dr. C

KingSparta · « **Reply #6 on:** February 05, 2006, 01:10:14 pm »

Warning, Sharp Objects Ahead

Ouch On The Price Tag

I think that's a bit much for most users.

richard.e.morton · « **Reply #7 on:** February 05, 2006, 01:30:30 pm »

Hi,

This is actually the area in which I work. I produce call centre automation using Speech Recognition and Text-To-Speech. The best TTS available (in my opinion, and this is a very subjective area) was made by a company called Rhetorical. Rhetorical and a host of other companies (one called Nuance) were bought by Scansoft, and Scansoft has now rebranded itself as Nuance.

The Rhetorical product was integrated with their existing products; the result is not as good, but it is more scalable and less resource intensive (RealSpeak 4 - ftp://ftp.scansoft.com/products/realspeak/eng_daniel.wav or, if your call centre should be in India - ftp://ftp.scansoft.com/products/realspeak/eni_sangeeta.wav)

However the TTS uses a GB of memory and is very processor intensive!

The problem with generating realistic speech comes down to the fact that we subtly change the way we say sounds to blend them. We can't get a system to artificially produce these sounds, so we concatenate recorded sounds together, thousands of them, chosen depending on tempo and whether the pitch is rising, flat or falling. It all gets complicated very quickly.
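
To make the concatenation idea above a bit more concrete, here is a toy sketch in Python. It is not how RealSpeak or any other commercial engine works; the `Unit` fields, the pitch-only join cost and the six-entry inventory are all assumptions invented for illustration. It simply picks one recording per sound so that the pitch jumps where recordings are joined add up to as little as possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded snippet (hypothetical structure, for illustration only)."""
    sound: str          # which sound the recording contains
    start_pitch: float  # pitch in Hz at the start of the recording
    end_pitch: float    # pitch in Hz at the end of the recording
    name: str           # label so we can see which recording was chosen


def select_units(sounds, inventory):
    """Choose one unit per target sound, minimising the total pitch jump
    across joins. Uses dynamic programming over all candidate recordings."""
    if not sounds:
        return [], 0.0
    candidates = [[u for u in inventory if u.sound == s] for s in sounds]
    if any(not c for c in candidates):
        raise ValueError("no recording available for at least one sound")

    # best[i][j] = (cheapest cost of a path ending in candidates[i][j],
    #               index of the previous unit on that path)
    best = [[(0.0, None) for _ in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            row.append(min(
                (best[i - 1][k][0] + abs(prev.end_pitch - unit.start_pitch), k)
                for k, prev in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Walk back from the cheapest final unit to recover the whole path.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Made-up inventory: two recordings of each sound with different pitch shapes.
INVENTORY = [
    Unit("h", 100.0, 110.0, "h_rise"),
    Unit("h", 120.0, 100.0, "h_fall"),
    Unit("e", 110.0, 120.0, "e_rise"),
    Unit("e", 100.0, 90.0, "e_fall"),
    Unit("l", 120.0, 120.0, "l_flat"),
    Unit("l", 95.0, 80.0, "l_fall"),
]

if __name__ == "__main__":
    chosen, cost = select_units(["h", "e", "l"], INVENTORY)
    print([u.name for u in chosen], cost)
```

With only pitch considered and six recordings this is trivial. A real system also has to weigh tempo, neighbouring sounds and much more across thousands of recordings, which is where the memory and processor cost comes from.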

Microsoft has concentrated on their speech server product and spent less time on TTS and Speech Reco, mainly as there are a few companies which are really good already - IBM and Scansoft (now Nuance).

Speech Reco Engines are also very CPU intensive, but the type of engine I am exposed to is very different to the type used on a Home PC for dictation (Dragon NaturallySpeaking and ViaVoice are both sold through Nuance).

If you wanna have a go at integrating speech components, I'll be happy to help!

R

JohnT · « **Reply #8 on:** February 06, 2006, 09:36:30 am »

As a developer here at JRiver, I've been toying with speech reco in "command and control" mode for controlling Media Center. So far I've been playing around with the Microsoft Speech SDK 5.1, which seems fairly good for command and control purposes. Going with the Microsoft product would be low cost for development and end users. I've also looked a little at Carnegie Mellon's Sphinx project, which is open source. I would welcome links to other promising and economical products/projects in this area.
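
For anyone wondering what "command and control" mode amounts to, here is a rough sketch in Python. It does not call the Microsoft Speech SDK or Sphinx at all; it assumes some recognizer has already turned audio into text, and the `COMMANDS` table, the action names and the `match_command` helper are hypothetical. The point is only that the engine has to pick from a short fixed list of phrases rather than transcribe free dictation, which is why this mode tends to be more forgiving.

```python
import difflib

# Hypothetical phrase list: spoken phrase -> action identifier.
COMMANDS = {
    "play": "PLAY",
    "pause": "PAUSE",
    "stop": "STOP",
    "next track": "NEXT",
    "previous track": "PREVIOUS",
}


def match_command(heard, cutoff=0.75):
    """Map recognizer output text to an action, or None if nothing in the
    phrase list is similar enough (cutoff is a similarity ratio, 0 to 1)."""
    phrase = " ".join(heard.lower().split())  # normalise case and spacing
    hits = difflib.get_close_matches(phrase, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[hits[0]] if hits else None


if __name__ == "__main__":
    for text in ["Play", "next trak", "love machine"]:
        print(repr(text), "->", match_command(text))
```

Anything that does not resemble a listed phrase is rejected instead of being guessed at, so a song title on its own does nothing until a command such as "play" is recognised first.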

- John T.

Mr ChriZ · « **Reply #9 on:** February 06, 2006, 09:43:44 am »

Quote from: JohnT on February 06, 2006, 09:36:30 am

I've also looked a little at Carnegie Mellon's Sphinx project which is open source. I would welcome links to other promising and economical products/projects in this area.
- John T.

Back in September I was searching for a project idea for my dissertation,
I looked at the possibilities of creating a Media Centre Plugin that used Voice
Recognition to control Media Centre. I also looked at that open source project,
but for the life of me I could not get it to understand what I was saying.
Eventually I dropped the idea because I was unsure if I would be able to create
plugins for MC, or if there was a great enough scope
to make a dissertation out of it.
Maybe you'll get further with Sphinx than I did!

richard.e.morton · « **Reply #10 on:** February 06, 2006, 09:48:51 am »

Sphinx is a well known project.

The other you may wish to look at is Festival
http://www.cstr.ed.ac.uk/projects/festival/

Rich

JohnT · « **Reply #11 on:** February 06, 2006, 10:01:09 am »

Quote from: richard.e.morton on February 06, 2006, 09:48:51 am

Sphinx is a well known project.

The other you may wish to look at is Festival
http://www.cstr.ed.ac.uk/projects/festival/

Rich

Thanks for that link.

Pink Waters · « **Reply #12 on:** February 06, 2006, 11:02:09 am »

http://www.pcworld.com/reviews/article/0,aid,124162,pg,4,00.asp#voice

KingSparta · « **Reply #13 on:** February 08, 2006, 02:09:23 pm »

Speech Recognition

Speech Recognition in Windows Vista empowers you to interact with your computer by voice. It allows you to significantly limit your use of mouse and keyboard while maintaining or increasing productivity. You can dictate documents and e-mail messages in mainstream applications, fill out forms on the web using voice commands, and seamlessly manage Windows Vista and applications by saying what you see.

More At:

http://www.microsoft.com/windowsvista/features/foreveryone/speech.mspx#more

richard.e.morton · « **Reply #14 on:** February 19, 2006, 08:28:39 am »

XP has these features as well; Vista takes it further, of course. I am not sure how you're supposed to connect to the speech engine, or how the speech engine knows when to control the active application or the OS.

"Play"
"Switch Application"

I tend not to use dictation software, I find it too fragile and not too great with accuracy (for me, I know others swear by it)...

You need an engine which knows the context of the application or subject of the dictation...

That's a major problem with speech rec which annoys us: the lack of understanding of context. Even very advanced systems lack this.

I'm interested to see Vista, I was looking at the minimum specs and they look horrendous!

I'm no Mac lover (the interface looks good, but is illogical in major places, rather than Windows being illogical in minor places), but it looks like MS is playing catchup with many of the features and requires more processing power to achieve it...

For another example of M$ power consumption requirements (probably due to architecture methodology), look at the spec differences of the hardware platforms of Symbian or Palm Phones versus Windows Mobile...

Rich

Pink Waters · « **Reply #15 on:** February 19, 2006, 09:09:00 am »

The only way to use speech commands in XP is via Microsoft Plus!

Rob L · « **Reply #16 on:** February 19, 2006, 05:54:28 pm »

Or use XP Tablet edition, it's standard in there...

JONCAT · « **Reply #17 on:** February 22, 2006, 12:39:58 pm »

We are looking at Auralog in my French class right now...looks very interesting.

Dr. C

wadek · « **Reply #18 on:** March 26, 2006, 10:52:16 pm »

I am new to posting in the forums, but after coming across this product this evening, I got excited about the possibilities. I am relatively new to "home automation" and still in the researching phase. So far I have purchased the CAV and HAI Omni and a 15" touchscreen. I only set up the CAV this weekend and I'm still waiting to receive my Omni. I haven't been able to make a decision as far as an end user interface. I've been looking a lot at Windows Media Center, Cinemar, and HAL, and now JRiver. Any suggestions? Anyway, talking about voice control, I am not a programmer, but it seems that HAL is good for voice recognition. Does it make sense to create a plugin and team up with their product?

JimH · « **Reply #19 on:** March 27, 2006, 06:27:29 am »

Welcome to Interact.

You might want to take a look at this:
Zoner/Hyslopc's Multi-zone with Russound CAV6.6 in-wall keypads
http://www.avsforum.com/avs-vb/showthread.php?s=&threadid=534233
