This Monday, I attended a talk by Biing-Hwang (Fred) Juang of the Georgia Institute of Technology. You can find the announcement on our school's website. The talk was titled "Prospects of Speech & Multimedia Research for the New Digital Era." Here are some notes from it.

Speech Research

Fred said there is a paradigm shift in speech synthesis. Storage keeps getting cheaper, and he argued this makes speech synthesis of little use, because we can simply store recordings of all the words we need in advance.

Then he gave a text-to-speech demo: an email-reading program. He first played a traditional version, which sounded quite mechanical, and then a much better version that sounded close to a real human. The challenge is to make the computer understand the document.

Fred went on to pose a question: is speech recognition a language problem, or just a math or signal-processing problem?

Mathematical models have reached their limits. When humans and machines are compared, humans often do the better job. He mentioned two approaches: 1) a biological approach; 2) more interaction. Interaction means giving the machine more steps in which to refine its result.

As for the future of speech research, Fred said speech technology is about humans, so it should go beyond simple conversion between sound and text.

Media Research

Fred first introduced some new telecom paradigms: ubiquitous computing; converged digital services; and the right content for the right person in the right context.

Then he quickly reviewed some of the research projects at GIT.

  • Spatialization & the perceptual dimension. He gave a demo of a multi-channel system transmitting a male voice and a female voice at the same time.
  • Content processing. They use region-of-interest processing to cope with low resolution on mobile devices.
  • Embedded and layered coding. They use multi-stream coding to cope with high bit-error rates.

In summary, these topics lie at the center of future telecom research.