halfbakery
Professional croissant on closed course. Do not attempt.
Current voice compression codecs are good,
but for the ultimate low-bandwidth voice codec,
perform speech recognition on what each user
says, send the resulting text string across the
Internet, and when the other user's computer
receives the text string, it uses a speech
synthesizer to convert
the text back into speech.
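
To put a rough number on "low-bandwidth", here is a back-of-envelope sketch. The speaking rate, the bytes-per-word figure, and the 8 kbit/s comparison codec are all assumptions picked for illustration, not measurements from this post.

```python
# Back-of-envelope bitrate for sending recognized text instead of audio.
# All three inputs below are illustrative assumptions, not measurements.
WORDS_PER_MINUTE = 150         # assumed conversational speaking rate
BYTES_PER_WORD = 6             # assumed: about 5 letters plus a space, 1 byte each
CODEC_BITS_PER_SECOND = 8000   # assumed comparison point: an 8 kbit/s speech codec


def text_bits_per_second(words_per_minute: float, bytes_per_word: float) -> float:
    """Bits per second needed to carry the transcript of continuous speech."""
    return words_per_minute * bytes_per_word * 8 / 60


text_bps = text_bits_per_second(WORDS_PER_MINUTE, BYTES_PER_WORD)
ratio = CODEC_BITS_PER_SECOND / text_bps
print(f"text stream: {text_bps:.0f} bit/s")
print(f"roughly {ratio:.0f}x smaller than the assumed codec rate")
```

Under these assumptions the transcript needs about 120 bit/s, before any text compression and before any intonation metadata is added on top.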
A naive implementation of this would probably
suck -- nobody likes to listen to a
robotic-sounding synthesized voice for very long. But a
slightly better version would detect and send
intonation metadata along with the text (e.g. "at
this point in the sentence, lower the pitch by
50 Hz and slow the speech rate by 10%"); this
would make the recreated speech sound more
natural and lose less connotation.
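
One way the intonation metadata might travel with the text is sketched below. The `ProsodyMark` fields, the short JSON keys, and the choice of JSON itself are hypothetical, made up for illustration; the post does not specify a wire format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class ProsodyMark:
    """Hypothetical intonation hint attached to one word of the transcript."""
    word_index: int       # which word the change starts at (0-based)
    pitch_shift_hz: int   # e.g. -50 means "lower the pitch by 50 Hz"
    rate_change_pct: int  # e.g. -10 means "slow the speech rate by 10%"


def encode_utterance(text: str, marks: List[ProsodyMark]) -> bytes:
    """Serialize text plus marks as compact UTF-8 JSON (an assumed wire format)."""
    payload = {"t": text, "m": [asdict(m) for m in marks]}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def decode_utterance(data: bytes) -> Tuple[str, List[ProsodyMark]]:
    """Inverse of encode_utterance; the result would be handed to a synthesizer."""
    payload = json.loads(data.decode("utf-8"))
    return payload["t"], [ProsodyMark(**m) for m in payload["m"]]


packet = encode_utterance(
    "I suppose that could work",
    [ProsodyMark(word_index=3, pitch_shift_hz=-50, rate_change_pct=-10)],
)
text, marks = decode_utterance(packet)
print(len(packet), "bytes:", text, marks)
```

JSON is used here only because it is easy to read; a binary layout would be smaller, at the cost of being harder to debug.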
In addition to the above, an advanced version
could pre-download voiceprint 'skins' for each
person you talk to, so that your computer could
more accurately simulate Uncle Fred's voice and
mannerisms using canned data.
|
|
And this is better than the phone because _________ ? |
|
|
...because you can listen to it over again? |
|
|
The big advantage is high compression. When compressing data further and further you generally hit a limit where you start losing significant content. Most people will have seen over-compressed JPEGs where the artifacts (distortions caused by compression) so distort the image that you can't tell what it is any more. |
|
|
The only way to compress further is with a 'common data dictionary': the sender and receiver both hold a large, identical dictionary of fragments commonly found in sound files, images or whatever you are compressing. |
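
A small-scale version of the shared-dictionary idea can be tried with the preset-dictionary option (`zdict`) in Python's standard `zlib` module. The phrase list and the message below are made up for illustration, and general-purpose DEFLATE stands in for whatever a real voice codec would use.

```python
import zlib
from typing import Optional

# Hypothetical dictionary of common phrases that both ends already hold.
SHARED_DICT = (
    b"good morning how are you doing today thank you very much "
    b"see you later talk to you soon"
)


def compress(message: bytes, zdict: Optional[bytes] = None) -> bytes:
    """DEFLATE-compress message, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(message) + comp.flush()


def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress; the receiver must supply the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


message = b"good morning, how are you doing today? thank you very much, talk to you soon"
plain = compress(message)
with_dict = compress(message, SHARED_DICT)

print(len(message), "bytes raw")
print(len(plain), "bytes compressed without the shared dictionary")
print(len(with_dict), "bytes compressed with it")
assert decompress(with_dict, SHARED_DICT) == message
```

The gain comes entirely from data the receiver already has, so it only works if both sides hold byte-identical dictionaries.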
|
|
Jeremi's idea is a development of this: you share speech data in a common compact format (text), then enhance it using a 'data dictionary' of what the voice sounds like. This gives you massive compression with high quality. One application would be clearer, higher-quality mobile phones that use less bandwidth and are therefore cheaper to run. |
|
|
My croissant already given, I would suggest one change, though. Rather than trying to render the speech into text, render it only as far as phonetics. That would allow the data to be communicated without the codec needing to understand the sense of the sentence. Render it back using a Phonetic Markup Language (I almost posted that a while back). |
|
|
I'm picturing a sort of verbal MIDI. |
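
A "verbal MIDI" stream could look something like the sketch below: a sequence of fixed-size events, each carrying a phoneme, a pitch and a duration, in the way MIDI carries note events. The phoneme table, the field widths and the sample values are all hypothetical.

```python
import struct

# Hypothetical phoneme table; a real system would use a full inventory.
PHONEMES = ["h", "eh", "l", "ow", "sil"]
PHONEME_ID = {p: i for i, p in enumerate(PHONEMES)}

# One event = phoneme id (1 byte), pitch in Hz (2 bytes), duration in ms (2 bytes).
# "<" selects little-endian with no padding, so each event is exactly 5 bytes.
EVENT = struct.Struct("<BHH")


def encode_events(events):
    """Pack (phoneme, pitch_hz, duration_ms) tuples into a byte string."""
    return b"".join(EVENT.pack(PHONEME_ID[p], pitch, dur) for p, pitch, dur in events)


def decode_events(data):
    """Unpack a byte string back into (phoneme, pitch_hz, duration_ms) tuples."""
    return [(PHONEMES[i], pitch, dur) for i, pitch, dur in EVENT.iter_unpack(data)]


# Made-up values for the word "hello".
hello = [("h", 120, 60), ("eh", 125, 90), ("l", 120, 70), ("ow", 110, 160)]
stream = encode_events(hello)
print(len(stream), "bytes for", len(hello), "phoneme events")
assert decode_events(stream) == hello
```

Because the events describe sounds and not words, the receiver never needs to know what language is being spoken, which is the point of stopping at phonetics.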
|
|
I thought of this myself one day. Very interesting idea, I must say. Probably quite doable. Actually, I think it's partially baked already. There is software that does voice recognition, types into chat programs, and reads the messages aloud to you. Like that T2 head. |
|
|
Set it up so you could use it as one of those internet phones; the compression would still be fast on a 56k modem. Like me! :) |
|
|
A 56K modem is fast enough for traditional VoIP phones. |
|
|
Very intriguing idea. I like it. Maybe it could be used for podcasts if it is too slow to be done in real time. |
|
|
Would the metadata not be best sent as inflection marks in the text itself, much like the timing information supplied in sheet music? I would prefer symbolising all that is required in the message rather than breaking the message down into phonetics; no one reads phonetically. |
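
As a sketch of what inline inflection marks might look like, the parser below reads a made-up notation in which a word can be prefixed with `^` (raise pitch), `_` (lower pitch) or `~` (lengthen). These three symbols are placeholders invented for this example; the annotation does not propose specific ones.

```python
# Hypothetical inline marks; the symbols and their meanings are invented here.
MARKS = {"^": "raise_pitch", "_": "lower_pitch", "~": "lengthen"}


def parse_marked_text(line):
    """Split a marked-up line into (word, [hints]) pairs.

    Any run of mark characters at the start of a word is stripped off
    and translated into hint names, in the order the marks appear.
    """
    parsed = []
    for token in line.split():
        hints = []
        while token and token[0] in MARKS:
            hints.append(MARKS[token[0]])
            token = token[1:]
        parsed.append((token, hints))
    return parsed


for word, hints in parse_marked_text("are you ^sure about _~that"):
    print(word, hints)
```

The message stays readable as ordinary text with a few extra symbols, which is the sheet-music property the annotation asks for.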
|
|
Would give a [+] for setting out the symbols required to encode the metadata in the input speech! |
|
| |