h a l f b a k e r y
Experiencing technical difficulties since 1999


Couch Potato Voice Recognition

Neural Network learns to decipher voice by watching TV
(+38, -1)

I believe the most accurate voice recognition could be achieved by creating a couch potato computer. A consumer-grade desktop computer equipped with a consumer-grade TV tuner card could watch TV all day long. A neural network could be programmed to correlate the voice in the audio with its text transcription coming from the closed captioning, which is also available via the TV tuner card. This AI system could then train itself merely by watching TV around the clock. Simple, cheap, and once programmed there would be little work to do.

Since the neural network could be exposed to a wide variety of content, from the straight talk of CNBC financial news to the slang of MTV's Real World, the resulting product should understand a wide variety of voices and accents, and would have a huge vocabulary - everything from "S&P 500 Index Fund" to "Get jiggy wit' it".

carpe diem, Jan 22 2002
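
As a rough illustration of the pairing step in the idea above, the sketch below slices an audio stream into clips using caption timing, yielding (audio, text) training pairs. The `CaptionCue` type, the `make_training_pairs` function, and the premise that each caption arrives with start and end times that line up with the audio are all assumptions made for illustration; real closed captions often lag the speech and would need extra alignment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CaptionCue:
    """Hypothetical caption record: when the text was shown, and what it said."""
    start_s: float
    end_s: float
    text: str


def make_training_pairs(
    samples: Sequence[float],
    sample_rate: int,
    cues: List[CaptionCue],
) -> List[Tuple[Sequence[float], str]]:
    """Cut the audio into one clip per caption cue and pair it with the cue text."""
    pairs = []
    for cue in cues:
        lo = int(cue.start_s * sample_rate)
        # Clamp to the end of the recording in case a cue runs past it.
        hi = min(int(cue.end_s * sample_rate), len(samples))
        text = cue.text.strip()
        # Skip empty captions and cues that fall outside the audio.
        if text and lo < hi:
            pairs.append((samples[lo:hi], text))
    return pairs
```

Each resulting pair is one labelled example; whatever learning system is used would then be trained on these clips with the caption text as the target.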

Integrating Visual, Audio and Text Analysis for News Video http://www.microsof...ng/MediaCom2/v1.pdf
[prometheus, Jan 22 2002, last modified Oct 04 2004]

(???) Language tools for fight on terror http://news.bbc.co....hnology/3399087.stm
Jan 31 2004: This is being done with text-to-text translations, so why not voice to text? Look for "Language Weaver" in this story. Quoting: "The idea is to train the software using existing human translations. In a sense, the program learns to translate in a more human fashion, the more information is fed to it." (WTAGIPBAN) [krelnik, Oct 04 2004]


       And when your computer gets fluent enough, it will call your cable TV company and order the super deluxe every channel imaginable package so that it doesn't get bored.
mwburden, Jan 22 2002
  

       Sort of half baked in the film Short Circuit.
lubbit, Jan 22 2002
  

       Unfortunately, our current methods of trying to achieve artificial intelligence are tantamount to trying to row a boat to the moon.   

       You'll be waiting a while. :)
seal, Jan 22 2002
  

       Seize the croissant.
phoenix, Jan 23 2002
  

       Ha! Has anyone done this with a Furby?
entremanure, Jan 23 2002
  

       Trouble is, there's so much background noise on most of today's TV it'd be hard to train the network to filter this out first, especially when some of it's other people's chatter.   

       On the other hand, a lot of programmes like soap operas refuse to have more than one character talk at the same time, lest they baffle their viewers. Pick the right kind of program and it shouldn't get too confused.
-alx, Jan 23 2002
  

       alx: voice recognition systems should be able to filter out background noise, so this is a good thing.   

       You could start the system on storytelling programmes for children (clear speaking, little background) and move on to the news (good enunciation but more background) before tackling MTV and live sports events.   

       The biggest fly in the ointment is that captioning (well, teletext subtitles anyway) does not always follow dialogue word for word. If the sentence is particularly complex the subtitles will sometimes have a simpler sentence construction to fit them on screen more easily. I always thought that this must cause problems for those learning to lip read.
st3f, Jan 23 2002
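
One possible way around the paraphrased-subtitle problem st3f raises is to keep only clips where the caption roughly agrees with what the recogniser currently hears. The sketch below is an assumption-laden illustration, not a tested recipe: the function names and the 0.8 threshold are hypothetical, and the word-level similarity comes from `difflib.SequenceMatcher` in the Python standard library.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def caption_agreement(hypothesis: str, caption: str) -> float:
    """Word-level similarity in [0, 1] between a transcript guess and a caption."""
    hyp_words = hypothesis.lower().split()
    cap_words = caption.lower().split()
    return SequenceMatcher(None, hyp_words, cap_words).ratio()


def keep_verbatim(
    pairs: List[Tuple[str, str]], threshold: float = 0.8
) -> List[Tuple[str, str]]:
    """Keep (hypothesis, caption) pairs whose wording is close enough.

    The threshold is an illustrative guess, not a recommended value.
    """
    return [(h, c) for h, c in pairs if caption_agreement(h, c) >= threshold]
```

This only helps once the recogniser is already partly trained, which fits the staged approach of starting with clearly spoken programmes.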
  

       Would you really want to talk to a computer whose total knowledge of human interaction comes from a bunch of soaps?
RobertKidney, Jan 23 2002
  

       ¯PeterSealy: MST3K?   

       ¯RobertKidney: I'd prefer a computer that learns on the curve in any venue, as opposed to one that flatlined on year two of Springer™ or the 700 club™
reensure, Jan 23 2002
  

       There's something inherently wrong with this concept, but I can't put my finger on it.
waugsqueke, Jan 23 2002
  

       Perhaps the notion that something can gain intelligence by watching television?
sera, Jan 24 2002
  

       It's a cool idea, but TV is designed to be interpreted by the most advanced computer known to man - the human brain. These neural networks that allow processing of data into information have been evolved over more than a few million cycles of a 32-bit processor... We've got a long way to go before we have a means to generate anything worthwhile through a genetic algorithm. So unless someone figures out a way to reverse engineer his or her head, they're going to have to have some fun writing such an algorithm... But don't let that discourage you programmers from trying.   

       It's interesting that a lot of A.I. algorithms don't include a very important part of the human program. The Ego.   

       Or alternatively I could be just bitching. I don't mean to... Like I said - Cool Idea 8 )
Terrabus, Oct 07 2002
  

       It's interesting how people seem (to my understanding) to have misinterpreted carpe diem's use of the term AI. I don't imagine that the point of this is to make the computer in any way "understand" what it hears, only to recognise the words used, and associate them with other representations of these words (i.e. phonetic sounds or combinations of letters). A second aim might be to recognise and resolve different voices belonging to different people.   

       The end aim and result, in any case, would/should be algorithms to improve the accuracy and speed of speech recognition - not artificial intelligence itself. What I understand carpe to be using the term AI to represent is the self-governing, self-learning form of algorithm evolution which, in this case, would evaluate the accuracy of current methods by comparing the results with the closed captioning, then try several similar methods to see which is better, evaluate these methods, pick the best, and so on.
yamahito, Oct 07 2002
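
The evaluate-and-pick loop yamahito describes could be sketched roughly as follows: score each candidate recogniser by its word error rate against the captions, then keep the one with the lowest average. Word error rate is a common metric swapped in here for illustration; the thread does not name one. The `pick_best` helper and the idea of passing recognisers as plain callables are likewise assumptions.

```python
from typing import Callable, Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word missing from hypothesis
                cur[j - 1] + 1,          # extra word in hypothesis
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / max(len(ref), 1)


def pick_best(
    candidates: Dict[str, Callable[[object], str]],
    clips: List[Tuple[object, str]],
) -> str:
    """Return the name of the candidate with the lowest mean WER on the clips."""
    def mean_wer(recognise: Callable[[object], str]) -> float:
        total = sum(word_error_rate(cap, recognise(audio)) for audio, cap in clips)
        return total / len(clips)

    return min(candidates, key=lambda name: mean_wer(candidates[name]))
```

In the scheme described above, the winner would then be varied slightly to produce the next batch of candidates, and the comparison repeated.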
  

       Damn, this idea rocks!
koolcj291, Dec 02 2003
  

       Yeah, yamahito is right. What's wrong with you guys, I thought half-bakers were smart people!
Size_Mick, Feb 26 2004
  

       This seems like a good idea to me. There might be a few glitches (sound effects, music, "note symbols" in the text, etc.), but this might be a great source of text-matched voice data.
Predictor, Jan 07 2005
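
The glitches Predictor lists could be partly handled by cleaning the caption text before using it as a training label. The sketch below assumes, purely for illustration, that sound effects appear in square brackets or parentheses and that music is marked with note symbols; actual captioning conventions vary by broadcaster.

```python
import re

# Assumed conventions: [SOUND EFFECT], (sound effect), and musical note symbols.
_NON_SPEECH = re.compile(r"\[[^\]]*\]|\([^)]*\)|[♪♫]")


def clean_caption(text: str) -> str:
    """Strip non-speech markers and collapse the leftover whitespace."""
    text = _NON_SPEECH.sub(" ", text)
    return " ".join(text.split())
```

A caption that comes back empty after cleaning marks a clip with no usable speech label, so it can simply be dropped. Note that stripping note symbols leaves sung lyrics in place, which may or may not be wanted.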
  

       Generally a good idea, with the exception that I think a consumer-grade (mid-range) system probably wouldn't be up to the task. Real-time analysis of sounds to words would initially take a bit more than average computing power. A system with 2-4 processors, 16 GB of RAM and a very large hard disk array might do the trick.
shahcat, Sep 23 2009
  

       wow, good idea [+]   

       (caveat: TV personalities' voices are trained so listeners' minds have minimal work to do to understand them; the normal crap that comes out of people's mouths doesn't usually compare well)
FlyingToaster, Sep 23 2009
  

       Even more useful would be a YouTube video watcher. That way it would be able to understand what the Guangzhouish salesperson is saying, or what steps the Ahmedabad-based support person is proposing you take.
pashute, Jul 02 2017
  

       3 years later: Don't let it learn from the automatic captions.
pashute, May 30 2023
  

       22 years later: the AI is writing the captions
yamahito, Dec 20 2024
  

       [yamahito] Bravo!!!   

       (But then, an AI wrote this, and may have written a bunch more on the HB. Hard to tell sometimes.)
minoradjustments, Dec 20 2024
  


 
