I just tried to find a short segment in a 2-hour-long podcast. (If you're curious, it was a podcast of a show called The Space Programme, and there was a short segment on the N-Prize in it). Even skipping forward a few seconds at a time, it took quite a while to locate what I was looking for.
So.
Why not have a piece of software which can perform speech-to-text analysis of a podcast (or any file containing spoken word), and can then allow the user to search for a given word or phrase? Yes, I know that speech recognition software is not great, but it's not bad (especially if it can run slowly to "subtitle" the podcast in realtime or slower). I would then have searched for "N-prize" or (if that failed, being an unusual and hard-to-recognize phrase) "prize", or a few other relevant phrases.
If the software were good, it would allow for fuzziness. For example, in searching for "prize", it would also search for "pries" and "price" (easily confused by speech recognition software). It might then show me the points in the podcast where the word or phrase was found, and allow me to click to hear that part of the programme.
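A minimal sketch of that fuzzy search, assuming the recognizer has already produced a list of (start time, word) pairs. The transcript data, the `fuzzy_find` name, and the 0.7 threshold are all made up for illustration; only `difflib.SequenceMatcher` is a real standard-library call.

```python
from difflib import SequenceMatcher

# Hypothetical recognizer output: (start time in seconds, recognized word).
transcript = [
    (3601.2, "offered"), (3601.7, "a"), (3601.9, "price"),
    (3602.3, "of"), (3602.5, "fifth"), (3602.9, "team"),
    (3603.2, "hundred"), (3603.6, "dollars"),
]

def fuzzy_find(words, query, threshold=0.7, context=3):
    """Return (start time, surrounding words) for each word resembling the query."""
    hits = []
    for i, (start, word) in enumerate(words):
        # Similarity in [0, 1]; 1.0 means the two strings are identical.
        score = SequenceMatcher(None, query.lower(), word.lower()).ratio()
        if score >= threshold:
            nearby = words[max(0, i - context): i + context + 1]
            hits.append((start, " ".join(w for _, w in nearby)))
    return hits

for start, snippet in fuzzy_find(transcript, "prize"):
    print(f"{start:.1f}s: ...{snippet}...")
```

Here a search for "prize" matches the mis-heard "price", and the start time is what a player would seek to. Letter similarity is only a stand-in for sound similarity, so the threshold would need tuning against real recognizer output.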
I suspect there are plenty of speech-to-text programmes out there, but this would combine the speech recognition with a search and playback function.
-- MaxwellBuchanan, May 26 2008

http://www.podscope.com/
Looks dead now, had some buzz in 2006. Does anyone know whether it actually worked? [jutta, May 26 2008]

Service formerly known as podzinger http://www.everyzing.com/
I vaguely remember actually using this to find things in podcasts. Now it's a corporation targeting enterprise markets. [jutta, May 26 2008]

Halfbakery: Speech To Text Processing Speech_20To_20Text_20Processing
I knew we'd been over this before. [jutta, May 26 2008]

baked by intelligence-gathering agencies the world over
-- FlyingToaster, May 26 2008

OK - so I just need to download the relevant transcripts from the CFBA?
-- MaxwellBuchanan, May 26 2008

Jutta - from the looks of it, that podscope site would be excellent - I wonder why it ne'er took off?
What I had in mind was more a tool for use on your own machine (it would 'index' spoken-word mp3 or wav files, either on the fly or to create an archive on your machine). However, a web-wide search tool would have many additional applications, especially if it could drop you into the selected podcast just before the searched phrase occurs.
-- MaxwellBuchanan, May 26 2008

Word spotting as technology is still very fragile. This is much harder than text to speech, and the results are, well, spotty. (Sometimes doing something very badly is worse than not doing it at all.)
There are some things in this space I'd love to try - like first translating the user query into phoneme salad, then spotting the phoneme salad; then trying to parse the surrounding sentence with some sort of meaningful grammar. We're on the verge of being able to do this usably well, but it's still not easy.
-- jutta, May 26 2008

I know it's a difficult field. How does real-time subtitling on TV work? I'd always assumed it was by speech recognition, because it makes the sort of mistakes you'd expect it to make.
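The "phoneme salad" idea jutta describes above could be prototyped roughly like this. It is a toy only: a real system would use an actual phoneme model, whereas the `salad` function below is an invented stand-in that drops vowels and folds "c" and "z" into "s". The `heard` data and the function names are hypothetical.

```python
VOWELS = set("aeiouy")
FOLD = {"c": "s", "z": "s"}  # assumed, very rough "sounds alike" table

def salad(text):
    """Crude sound key: letters only, vowels dropped, similar consonants folded."""
    return "".join(FOLD.get(ch, ch) for ch in text.lower()
                   if ch.isalpha() and ch not in VOWELS)

def spot(query, words):
    """Return start times where the query's key occurs in the key stream,
    ignoring word boundaries in the recognized text."""
    stream = ""
    starts = []  # starts[k] is the start time of the word behind stream[k]
    for start, word in words:
        key = salad(word)
        stream += key
        starts.extend([start] * len(key))
    target = salad(query)
    hits = []
    pos = stream.find(target) if target else -1
    while pos != -1:
        hits.append(starts[pos])
        pos = stream.find(target, pos + 1)
    return hits

# Hypothetical mis-transcription of "came as a big surprise to him".
heard = [(120.0, "came"), (120.3, "as"), (120.5, "a"), (120.6, "big"),
         (120.9, "sir"), (121.2, "prize"), (121.6, "to"), (121.8, "him")]

print(spot("surprise", heard))  # the match starts at the word "sir"
```

Because matching runs over one continuous key stream, a query for "surprise" can still land on the mis-split "sir prize". The cost is more false positives, which is where the grammar-parsing step would have to earn its keep.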
Also, this algorithm doesn't have to be perfect. Suppose it consistently mis-hears "prize" as "price"; then I search for "prize" - the engine will search for "prize", "price" and anything similar. It will not be perfect. But, suppose I search a podcast for "prize" and it comes up with the following poorly-transcribed phrases:
"would hefty pay the prize for not"
"came as a big sir prize to him"
"offered a price of fifth team hundred dollars"
It would then be very easy for me to see that the third one was probably what I was looking for, click on it, and be taken to part of the podcast that said "offered a prize of fifteen hundred dollars..."
-- MaxwellBuchanan, May 26 2008

Yeah, your examples are quite close to what podzinger's results felt like.
This is an existing area of research, but AFAIK, commercial real-time subtitling is done by humans (often stenographers with a little bit of software to translate steno back into normal written language), and the errors you see are human errors.
(You can tell from the fact that subtitlers quite often summarize, leave out uhms/ahs, rephrase expletives.)
-- jutta, May 26 2008

Hmm. I'm pretty sure I've seen some errors which I wouldn't expect a human to make (things like "big sir prize"), but maybe they are human slips or glitches in the downstream software.
According to Mr. Wiki, "Voice recognition technology has advanced so quickly in the United Kingdom that about 50% of all live captioning is through voice recognition as of 2005.", but it's also possible that this is done through a speaker re-voicing in realtime for clarity - it's not clear exactly what's done in practice.
There are certainly different types of subtitle; some appear in whole phrases, often colour-coded to the speaker and often condensed; others appear word by word, with no obvious condensation, and look much more computer-generated to my eye.
-- MaxwellBuchanan, May 26 2008

Yeah, respeaking would fit both, since it uses both STT and a human. (And since there's a significant problem with speaker-agnostic STT, this does make a difference.) Frustratingly, I can't find BBC statistics on any of this - they just say how much they're captioning, not what the quality of the captioning is, or how it's done.
-- jutta, May 26 2008

Same here - it's all precisely vague and exactly imprecise. But maybe the software isn't as good as I'd thought. The only hope for rescuing this might be the fact that the software needn't operate in realtime. However, I suspect that speed is not the main issue.
-- MaxwellBuchanan, May 26 2008