Hi,
This is actually the area in which I work. I produce call centre automation using Speech Recognition and Text-To-Speech. The best TTS available (in my opinion, and this is a very subjective area) was made by a company called Rhetorical. Rhetorical and a host of other companies (one called Nuance) were bought by Scansoft, and Scansoft has now rebranded itself as Nuance.
The Rhetorical product was integrated with their existing products; the result is not as good, but it is more scalable and less resource intensive. RealSpeak 4 samples:
ftp://ftp.scansoft.com/products/realspeak/eng_daniel.wav
or, if your call centre should be in India:
ftp://ftp.scansoft.com/products/realspeak/eni_sangeeta.wav
However, the TTS uses a GB of memory and is very processor intensive!
The problem with generating realistic speech comes down to the fact that we subtly change the way we say sounds in order to blend them. We can't get a system to produce these sounds artificially, so we concatenate recorded sounds together instead: thousands of them, varying by tempo and by whether the pitch is rising, flat or falling. It all gets complicated very quickly.
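To give a rough feel for the concatenation idea, here is a toy sketch in Python. It is not how RealSpeak or any real engine works: the unit inventory, the (sound, contour) keys and the function names are all made up for illustration, and short sine sweeps stand in for recorded sounds. It only shows the basic shape of the problem: look up a recorded unit for each target, then blend the joins so they don't click.

```python
import math

SAMPLE_RATE = 8000  # telephony-style rate, assumed for illustration


def tone(freq_start, freq_end, ms):
    """Stand-in for a recorded unit: a short sine sweep between two pitches."""
    n = SAMPLE_RATE * ms // 1000
    samples, phase = [], 0.0
    for i in range(n):
        freq = freq_start + (freq_end - freq_start) * i / n
        phase += 2 * math.pi * freq / SAMPLE_RATE
        samples.append(math.sin(phase))
    return samples


# Hypothetical inventory: one "recording" per (sound, pitch contour) pair.
# A real system holds thousands of these, also split by tempo and context.
UNITS = {
    ("h", "flat"): tone(200, 200, 40),
    ("e", "rising"): tone(200, 260, 60),
    ("e", "falling"): tone(260, 200, 60),
    ("l", "flat"): tone(220, 220, 50),
    ("o", "falling"): tone(240, 180, 80),
}


def crossfade(a, b, overlap):
    """Join two sample lists, linearly blending the last/first `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    mixed = [
        a[start + i] * (1 - i / overlap) + b[i] * (i / overlap)
        for i in range(overlap)
    ]
    return a[:start] + mixed + b[overlap:]


def synthesize(targets, overlap=16):
    """Concatenate the unit chosen for each (sound, contour) target."""
    out = []
    for key in targets:
        out = crossfade(out, UNITS[key], overlap)
    return out


if __name__ == "__main__":
    audio = synthesize(
        [("h", "flat"), ("e", "rising"), ("l", "flat"), ("o", "falling")]
    )
    print(len(audio), "samples,", len(audio) / SAMPLE_RATE, "seconds")
```

Even in this toy the inventory needs a separate entry for each pitch contour of the same sound, which is where the unit count (and the memory use) starts to blow up.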
Microsoft has concentrated on their speech server product and spent less time on TTS and Speech Reco, mainly as there are a few companies which are really good already - IBM and Scansoft (now Nuance).
Speech Reco engines are also very CPU intensive, but the type of engine I am exposed to is very different from the type used on a home PC for dictation (Dragon NaturallySpeaking and ViaVoice are both sold through Nuance).
If you wanna have a go at integrating speech components, I'll be happy to help!
R