Just found this info tweeted by @TheNextWeb: Japanese researchers invent automatic animated sign language system, and I just had to blog about it!

As you may not know, apart from my research work on text analytics methodologies, I studied speech processing until, a few years ago, the rigorous nomenclatures of the French university system forced me to choose between specializing in Natural Language Processing applied to textual material or to speech material.

I still have a strong interest in what goes on in the field of speech processing and its applications (conversational agents, lip-sync systems, vocal search engines), even though I work on textual material for now. And I particularly enjoy applications that merge text and speech processing. So I could not help but be drawn into writing these lines on the latest innovative development from the NHK Science & Technology Research Laboratories, which is, imho, just an awesome example of what can be done by merging text and speech processing. Let's take a closer look:

The NHK Science & Technology Research Laboratories is coming up with technology that automatically generates animated sign language in order to expand sign language in news broadcasts.

Simply put, it is almost like a lip-sync system, but for the hands :) The system is actually built on a text-to-text correspondence module that converts Japanese text to signed text; another correspondence module then associates text spans with "hand-codes" (I don't know the exact term, and suggest this one by analogy with "mouth-codes", used in animation for lip-sync systems development).
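To make that two-stage idea concrete, here is a minimal, purely illustrative Python sketch under my own assumptions: the dictionaries, gloss labels and hand-code identifiers are invented for the example and are not NHK's actual data or API.

```python
# Stage 1: map Japanese text spans to sign-language "glosses" (the signed text).
# Stage 2: map each gloss to a "hand-code", i.e. an identifier of a stored
# hand/arm motion that the animated avatar can play back.
# All entries below are made up for illustration.
TEXT_TO_GLOSS = {
    "地震": "EARTHQUAKE",
    "津波": "TSUNAMI",
    "警報": "WARNING",
}

GLOSS_TO_HANDCODE = {
    "EARTHQUAKE": "hand_042",
    "TSUNAMI": "hand_107",
    "WARNING": "hand_015",
}

def text_to_handcodes(spans):
    """Convert a list of Japanese text spans into a sequence of hand-codes."""
    glosses = [TEXT_TO_GLOSS[s] for s in spans if s in TEXT_TO_GLOSS]
    return [GLOSS_TO_HANDCODE[g] for g in glosses if g in GLOSS_TO_HANDCODE]

print(text_to_handcodes(["津波", "警報"]))  # ['hand_107', 'hand_015']
```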

The cherry-on-top idea? Incorporating a translation memory to enhance the system's outputs with expert knowledge: this takes the form of a user interface through which a human can enrich the lexicon or refine the combination rules for hand gestures.
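Here is a rough sketch, again under my own assumptions, of how such a translation memory could sit on top of the automatic mapping: expert-validated entries simply take precedence over the rule-based fallback.

```python
class TranslationMemory:
    """Toy translation memory: Japanese phrase -> expert-validated gloss sequence."""

    def __init__(self):
        self.entries = {}

    def add(self, phrase, glosses):
        # Called from the editing UI when a human expert validates or refines a mapping.
        self.entries[phrase] = glosses

    def lookup(self, phrase):
        return self.entries.get(phrase)

def translate(phrase, memory, fallback):
    """Prefer the expert entry; otherwise fall back to the automatic mapping."""
    return memory.lookup(phrase) or fallback(phrase)

tm = TranslationMemory()
tm.add("津波警報", ["TSUNAMI", "WARNING"])  # a human expert enriches the lexicon
print(translate("津波警報", tm, lambda p: ["UNKNOWN"]))  # ['TSUNAMI', 'WARNING']
```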

Oh yes! I teased with "speech2text", but wait... there is no speech-to-text module in this system! Let's think about it: only one brick is missing! Indeed, once the speech signal's complexity is reduced to text material (words, phrases or any other accurate text span), the whole system would be able to handle speech material as input. Developing this kind of transcription process is not an issue in itself nowadays.
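Just as a sketch of where that missing brick would slot in: recognize_speech below is a placeholder standing in for any off-the-shelf speech-to-text engine (it is not a real API, and the canned transcript is only there so the example runs), while text_to_handcodes is the toy function from the first sketch.

```python
def recognize_speech(audio):
    """Stand-in ASR step: a real engine would transcribe the audio signal."""
    return "津波 警報"  # canned transcript, for illustration only

def speech_to_handcodes(audio, text_to_handcodes):
    """Speech -> text -> text spans -> hand-codes, reusing the text pipeline."""
    transcript = recognize_speech(audio)
    spans = transcript.split()  # naive segmentation into text spans
    return text_to_handcodes(spans)

# With text_to_handcodes from the first sketch:
# speech_to_handcodes(b"<raw audio>", text_to_handcodes)  -> ['hand_107', 'hand_015']
```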

And if we think a bit further, I'd say it is reasonable to hope that this kind of system will handle "text2speech" outputs too, even if "text2speech" is not as easy to get right for now, at least if one expects a natural, non-robotic output. That would be very useful for blind people (of course, they can hear broadcast news, but hey, what if they want a fresh way of accessing written info on the web?), social gaming applications (texting messages to your animated, talking avatar while being temporarily or permanently speechless, so that it can talk in-game) or home automation applications (texting messages to your home that are rendered with your avatar and voice in the end, for example), to mention just a few. #I skip the 3D motion part, as I am completely inexperienced in this domain#

I am quietly but eagerly waiting for this kind of initiative to develop and reach a mainstream audience. Startup founders with NLProc backgrounds in text AND speech processing should start combining their skills and thinking about the opportunities to come, so as to come up with innovative solutions: multimodal NLProc is on its way :)