Just found this news tweeted by @TheNextWeb: Japanese researchers invent automatic animated sign language
system, and I just had to blog about it!
As you may not know, apart from my research work on text analytics methodologies,
I studied speech processing until, a few years ago, the rigid nomenclature of
the French university system forced me to choose between specializing in
Natural Language Processing applied to text or applied to speech.
I still have a strong interest in what goes on in the field of speech processing and its applications (conversational agents, lip-sync systems, voice search engines), even though I work on text for now. And I particularly enjoy applications that merge text and speech processing. So I could not help but be drawn into writing these lines about the latest innovation from the NHK Science & Technology Research Laboratories, which is, imho, just an awesome example of what can be done by merging text and speech processing. Let's take a closer look:
The NHK Science & Technology Research Laboratories is developing technology that automatically generates animated sign language, in order to expand the use of sign language in news broadcasts.
Simply put, it is almost like a lip-sync system, but for the hands.
The system is actually built on a text-to-text correspondence module that
converts Japanese text to signed text; another correspondence module then
associates text spans with "hand-codes" (I don't know the exact term, and suggest
this one by analogy with "mouth-codes", used in animation for developing
lip-sync systems).
The cherry-on-top idea? Incorporating a translation memory to
enhance the system's output with expert knowledge: this takes the form of a
user interface through which a human can enrich the lexicon or refine the
combination rules for hand gestures.
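To make the architecture a bit more concrete, here is a minimal sketch of how I picture those three pieces fitting together. Everything in it is my own assumption: the class name `SignPipeline`, its methods, the English toy words and the hand-code labels are all invented for illustration, and have nothing to do with NHK's actual implementation or with real Japanese Sign Language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignPipeline:
    """Hypothetical two-step pipeline: text -> signed text -> hand-codes."""

    # Toy lexicons; every entry is invented for illustration only.
    gloss_lexicon: Dict[str, str] = field(default_factory=dict)
    hand_codes: Dict[str, List[str]] = field(default_factory=dict)

    def to_signed_text(self, tokens: List[str]) -> List[str]:
        # Text-to-text step: keep only tokens that have a known sign gloss.
        return [self.gloss_lexicon[t] for t in tokens if t in self.gloss_lexicon]

    def to_hand_codes(self, glosses: List[str]) -> List[str]:
        # Second correspondence step: expand each gloss into its hand-codes.
        codes: List[str] = []
        for gloss in glosses:
            codes.extend(self.hand_codes.get(gloss, ["<UNKNOWN>"]))
        return codes

    def add_entry(self, token: str, gloss: str, codes: List[str]) -> None:
        # Stand-in for the expert interface: a human enriches the lexicon.
        self.gloss_lexicon[token] = gloss
        self.hand_codes[gloss] = list(codes)

    def run(self, tokens: List[str]) -> List[str]:
        return self.to_hand_codes(self.to_signed_text(tokens))


pipeline = SignPipeline()
pipeline.add_entry("rain", "RAIN", ["H_OPEN_DOWN", "H_REPEAT"])
pipeline.add_entry("tomorrow", "TOMORROW", ["H_INDEX_FORWARD"])

# "the" has no entry in the toy lexicon, so it is simply skipped.
result = pipeline.run(["tomorrow", "the", "rain"])
print(result)
```

The point of `add_entry` is only to show where the human expert comes in: the lexicon is data, not code, so it can be enriched without touching the two correspondence steps.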
Oh yes! I teased with "speech2text", but wait... there is no speech-to-text module in this system! Let's think about it: only one brick is missing. Indeed, once the speech signal's complexity is reduced to text material (words, phrases or any other suitable text span), the whole system would be able to handle speech as input. Developing this kind of phonetization process is no longer a problem in itself nowadays.
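Here is a rough sketch of what "adding the missing brick" could look like: the speech-to-text step is just one more stage placed in front of the existing text pipeline. The function names are hypothetical and both stages are deliberately fake stubs (the "recognizer" only decodes bytes); a real system would call an actual speech recognizer and the actual sign-generation module instead.

```python
from typing import Callable, List


def make_speech_pipeline(
    transcribe: Callable[[bytes], str],
    text_to_signs: Callable[[str], List[str]],
) -> Callable[[bytes], List[str]]:
    """Chain a speech-to-text stage in front of a text-to-signs stage."""

    def pipeline(audio: bytes) -> List[str]:
        return text_to_signs(transcribe(audio))

    return pipeline


def fake_transcribe(audio: bytes) -> str:
    # Placeholder only: pretends the audio bytes already contain the text.
    return audio.decode("utf-8")


def fake_text_to_signs(text: str) -> List[str]:
    # Placeholder only: upper-cased words stand in for sign glosses.
    return [word.upper() for word in text.split()]


speech_to_signs = make_speech_pipeline(fake_transcribe, fake_text_to_signs)
signs = speech_to_signs(b"rain tomorrow")
print(signs)
```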
And if we think a bit further, I'd say it is reasonable to hope that this
kind of system will handle "text2speech" output too, even though "text2speech"
is not as easy to handle for now if one expects a natural, non-robotic
output. That would be very useful for blind people (of course, they
can hear broadcast news, but hey, what if they want to refresh their experience
of accessing written info on the web?), social gaming applications (texting
messages to your animated, talking avatar while being temporarily or
permanently unable to speak, so that it can talk in-game) or home automation
applications (texting messages to your home that are eventually delivered with
your avatar and voice, for example), to mention just a few. (I am skipping the
3D motion part, as I have no experience in this domain.)
I am quietly but eagerly waiting for this kind of initiative to develop and
reach a mainstream audience. Startup founders with NLProc backgrounds in text AND
speech processing should begin to combine their skills and think about the next
opportunities to come up with an innovative solution: multimodal
NLProc is on its way.
