h a l f b a k e r yExperiencing technical difficulties since 1999
I believe the most perfect voice recognition could be achieved by creating a couch potato computer. A consumer grade desktop computer equipped with a consumer grade TV Tuner card could watch TV all day long. A neural network could be programmed to correlate the voice in the audio with its text transcription coming from the Closed Captioning, also available via the TV Tuner card. This AI system could then train itself merely by watching TV around the clock. Simple, cheap, and once programmed there would be little work to do.
Since the neural network could be exposed to a wide variety of content, from the straight talk of CNBC financial news to the slang of MTV's Real World, the resulting product should understand a wide variety of voices and accents, and would have a huge vocabulary - everything from "S&P 500 Index Fund" to "Get jiggy wit' it".
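For the curious, the pairing step might be sketched roughly as below. This is only an illustration, not a working recognizer: the `CaptionCue` type and `make_training_pairs` function are hypothetical names, and the sketch assumes each caption already carries start and end times that line up with the audio, which a real broadcast may not guarantee.

```python
from dataclasses import dataclass


@dataclass
class CaptionCue:
    """One closed-caption line with its (assumed) on-air timing in seconds."""
    start_s: float
    end_s: float
    text: str


def make_training_pairs(samples, sample_rate, cues):
    """Slice the audio under each caption cue into (audio_clip, text) pairs.

    `samples` is any sliceable sequence of audio samples and `sample_rate`
    is the number of samples per second. Cues with no text or no audio
    underneath them are skipped.
    """
    pairs = []
    for cue in cues:
        lo = int(cue.start_s * sample_rate)
        hi = int(cue.end_s * sample_rate)
        clip = samples[lo:hi]
        text = cue.text.strip()
        if len(clip) > 0 and text:
            pairs.append((clip, text))
    return pairs
```

Each resulting pair is one labelled example that a speech model could be trained on; the model itself is left out of the sketch.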
Integrating Visual, Audio and Text Analysis for News Video
http://www.microsof...ng/MediaCom2/v1.pdf [prometheus, Jan 22 2002, last modified Oct 04 2004]
(???) Language tools for fight on terror
http://news.bbc.co....hnology/3399087.stm Jan 31 2004: This is being done with text-to-text translations, so why not voice to text? Look for "Language Weaver" in this story. Quoting: "The idea is to train the software using existing human translations. In a sense, the program learns to translate in a more human fashion, the more information is fed to it." (WTAGIPBAN) [krelnik, Oct 04 2004]
And when your computer gets fluent enough, it will call your cable TV company and order the super deluxe every channel imaginable package so that it doesn't get bored. |
|
|
Sort of half baked in the film Short Circuit. |
|
|
Unfortunately, our current methods of trying to achieve artificial intelligence are tantamount to trying to row a boat to the moon. |
|
|
You'll be waiting a while. :) |
|
|
Ha! Has anyone done this with a Furby? |
|
|
Trouble is, there's so much background noise on most of today's TV it'd be hard to train the network to filter this out first, especially when some of it's other people's chatter. |
|
|
On the other hand, a lot of programmes like soap operas refuse to have more than one character talk at the same time, lest they baffle their viewers. Pick the right kind of program and it shouldn't get too confused. |
|
|
alx: voice recognition systems should be able to filter out background noise, so this is a good thing. |
|
|
You could start the system on storytelling programmes for children (clear speaking little background) and move onto the news (good enunciation but more background) before tackling MTV and live sports events. |
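A minimal sketch of that easy-to-hard ordering, assuming each recorded clip is tagged with a genre. The genre labels and difficulty ranks here are made up for illustration; nothing in the idea fixes them.

```python
# Hypothetical difficulty ranks, following the order suggested above:
# clear speech with little background first, noisy live content last.
DIFFICULTY = {
    "children_storytelling": 0,
    "news": 1,
    "music_tv": 2,
    "live_sports": 3,
}


def curriculum(clips):
    """Sort (genre, clip_id) pairs easiest-first; unknown genres go last."""
    return sorted(clips, key=lambda clip: DIFFICULTY.get(clip[0], len(DIFFICULTY)))
```

The training loop would then walk the sorted list in order, so the network sees clean speech before it sees noisy speech.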
|
|
The biggest fly in the ointment is that captioning (well, teletext subtitles anyway) does not always follow dialogue word for word. If the sentence is particularly complex the subtitles will sometimes have a simpler sentence construction to fit them on screen more easily. I always thought that this must cause problems for those learning to lip read. |
|
|
Would you really want to talk to a computer whose total knowledge of human interaction comes from a bunch of soaps? |
|
|
RobertKidney: I'd prefer a computer that learns on the curve in any venue, as opposed to one that flatlined on year two of Springer or the 700 Club. |
|
|
There's something inherently wrong with this concept, but I can't put my finger on it. |
|
|
Perhaps the notion that something can gain intelligence by watching television? |
|
|
It's a cool idea, but TV is designed to be interpreted by the most advanced computer known to man - the human brain. The neural networks that allow processing of data into information have been evolved over more than a few million cycles of a 32-bit processor... We've got a long way to go before we have a means to generate anything worthwhile through a genetic algorithm. So unless someone figures out a way to reverse engineer his or her head, they're going to have to have some fun writing such an algorithm... But don't let that discourage you programmers from trying. |
|
|
It's interesting that a lot of A.I. algorithms don't include a very important part of the human program: the Ego. |
|
|
Or alternatively I could be just bitching. I don't mean to... Like I said - Cool Idea 8 ) |
|
|
It's interesting how people seem (to my understanding) to have misinterpreted carpe diem's use of the term AI. I don't imagine that the point of this is to make the computer in any way "understand" what it hears, only to recognise the words used, and associate them with other representations of these words (i.e. phonetic sounds or combinations of letters). A second aim might be to recognise and resolve different voices belonging to different people. |
|
|
The end aim and result, in any case, would/should be algorithms to improve the accuracy and speed of speech recognition - not artificial intelligence itself. What I understand carpe to mean by the term AI is the self-governing, self-learning form of algorithm evolution which, in this case, would evaluate the accuracy of current methods by comparing the results with the closed captioning, then try several similar methods to see which is better, evaluate these methods, pick the best, and so on. |
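The evaluate-and-pick-the-best loop described above might look something like this sketch. It scores each candidate by word error rate against the caption text, which is one common way to measure recognition accuracy and is my choice here, not something the idea specifies. The recognizers are stand-in callables, and `word_error_rate` and `pick_best` are hypothetical names.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # reference word dropped
                cur[j - 1] + 1,                        # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / max(len(ref), 1)


def pick_best(recognizers, clips):
    """Return the name of the recognizer with the lowest mean error.

    `recognizers` maps a name to a callable taking audio and returning text;
    `clips` is a non-empty list of (audio, caption_text) pairs.
    """
    def mean_wer(recognize):
        scores = [word_error_rate(caption, recognize(audio)) for audio, caption in clips]
        return sum(scores) / len(scores)

    return min(recognizers, key=lambda name: mean_wer(recognizers[name]))
```

Repeating this with slightly varied copies of the winner each round gives the "try several similar methods, pick the best, and so on" cycle. Since captions are sometimes paraphrased rather than word for word, the score is a noisy yardstick at best.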
|
|
Yeah, yamahito is right. What's wrong with you guys, I thought half-bakers were smart people! |
|
|
This seems like a good idea to me. There might be a few glitches (sound effects, music, "note symbols" in the text, etc.), but this might be a great source of text-matched voice data. |
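Those text glitches could be scrubbed before training with something like the sketch below. It assumes sound effects appear in square brackets or parentheses and that music is marked with a note symbol; actual caption conventions vary by broadcaster, so treat the patterns as guesses. `clean_caption` is a hypothetical helper name.

```python
import re

MUSIC_NOTE = "\u266a"  # the eighth-note symbol often used to mark music or lyrics


def clean_caption(text):
    """Strip non-speech markers from one caption line, leaving only spoken words."""
    text = text.replace(MUSIC_NOTE, " ")
    # Drop bracketed or parenthesised cues such as [APPLAUSE] or (laughs).
    text = re.sub(r"[\[(][^\])]*[\])]", " ", text)
    # Collapse the leftover runs of whitespace.
    return " ".join(text.split())
```

A line that comes back empty, such as a bare `[MUSIC]`, has no speech to match and can simply be skipped.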
|
|
Generally a good idea, with the exception that I think a consumer grade (mid-range) system probably wouldn't be up to the task. Realtime analysis of sounds to words would initially take a bit more than average computing power. A system with 2-4 processors, 16 GB of RAM and a very large hard disk array might do the trick. |
|
|
(caveat: TV personalities' voices are trained so listeners' minds have minimal work to do to understand them; the normal crap that comes out of people's mouths doesn't usually compare well) |
|
|
Even more useful would be a YouTube video watcher. That way it would be able to understand what the Guangzhouish salesperson is saying, or what steps the Ahmedabad-based support person is proposing you take. |
|
|
3 years later: Don't let it learn from the automatic captions. |
|
|
22 years later: the AI is writing the captions |
|
|
(But then, an AI wrote this, and may have written a bunch more on the HB. Hard to tell sometimes.) |
|
| |