Annotate Attentive Listening Audio Interface

The problem: current audio interfaces are either
* an IVR ("press one for this, press two for that"), or
* speech recognition wizards that try to guess what you meant.
*- If they think they succeeded, they ask you whether that's what you meant and then carry out some predefined action.
*- If they don't, they give you a menu to choose from or go back to the beginning.

This is quite frustrating compared with the interaction we have with visual interfaces on screens.

There, if you could not read the result the first time, you simply read it again. There is always enough information to look around and choose what you want to do, and if not, you ask for a bit more and get tons more.

The solution should mimic the way we do conversation naturally:

We (A) ask for something (Q1). The listener (B) hears what we ask and understands some of it, usually not every single word, but gets the gist of what we are saying, and then comes up with a question (RQ2) in response.

This response (RQ2, given by B), which is itself a question, contains at least three new "sub-questions", three new bits of extra info (RQ2a, RQ2b, RQ2c...), so that when A answers RQ2 (with the response R3), B can be sure she understood A correctly, and everything A asked at the beginning (in Q1) is now clear. B can now take action, or ask another question if there are still any pending issues.

Example: Excuse me, what is the cheapest flight to New York?

Exasperating computer-aided personnel answering: When exactly do you want to fly, sir? (I don't know, some time within the next month, as long as it's in my budget!)

Typical smart human answer (after telling her about the discussion with the computer-aided personnel, including my afterthoughts):
[Summing up her thoughts, which were: OK, let me see what kind of packages we have coming up within the next month which may help you decide. I can then do a detailed comparison of the days and get back to you. OK, here goes...]
I can give you some details about the cheapest flights in the next month, but that will take me some time. Do you want me to check it out for you now, and get back to you when I have an answer?

Or: Typical Google answer to a search for "Litle Woman" - Did you mean "Little Women" - showing results 1-97875 for "Little Women". Search instead for "Litle Woman"

Human researcher: Are you referring to the book by Louisa May Alcott, to something in the news about a small woman, or to something else?

You: What was that?

HR: Louisa May Alcott, the author of the book "Little Women"?

You: You say Loo ee zah, not louse like house!

HR: OK, thank you. I'll remember that: Loo ee zah. So is it *that* book you were asking about?

You: No. What was your other suggestion?

HR: Oh, there's a news item going around now about a woman, a medical doctor named Thor Dou, who married a young man whose name is Mark Little. She adopted his family name and called herself Doc Thor Due Little, but now she has divorced, and there's a whole issue in the news about it. Is that what you were looking for?

You: No.

HR: Oh. But you're looking for the term "Little Women", right? (short pause) Did you mean women in the plural, or woman as in a single woman? And does Little mean small, or is it a family name? Or am I totally off?
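
To make the idea concrete, here is a minimal sketch of how the researcher's first reply could be put together: take the most likely readings of what was heard, and always leave an open "something else" way out. The `Candidate` type, the `clarifying_question` function and the scores are all made up for illustration; a real system would have to get its candidates and scores from a speech recognizer and some kind of language understanding layer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible reading of the request (hypothetical type)."""
    description: str
    score: float  # listener's confidence in this reading, 0.0 to 1.0


def clarifying_question(candidates, max_options=2):
    """Offer the most likely readings plus an open 'something else' option."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    options = [c.description for c in ranked[:max_options]]
    options.append("something else")
    if len(options) == 1:
        # Nothing was understood well enough to offer as a guess.
        return "Could you tell me a bit more about what you are looking for?"
    guesses = ", ".join("to " + o for o in options[:-1])
    return f"Are you referring {guesses}, or to {options[-1]}?"


# Scores are invented for this example.
candidates = [
    Candidate("something in the news about a small woman", 0.3),
    Candidate("the book by Louisa May Alcott", 0.6),
]
print(clarifying_question(candidates))
```

With these two candidates it prints the same three-way question the human researcher asked. The "something else" option is what keeps it from turning into a closed "press one, press two" menu.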

--------------------------

What programming technologies will I need to develop this kind of system?

Is there any open source stuff that will help me?

Do you think it's doable with today's technologies to get something at least close to this "intelligent" discussion, where the computer narrows down the possibilities without asking me in a mechanical "press one for this, press two for that" way?

---------------------

The thing is that I don't want to have to understand anything about phonemes and audio. I want a stream of text, like:

"What is the cheapest flight to New York"

Then I get a list of possible hearing mistakes and possible meanings (semantics):

Watt ease the chipset
fight / light / test jetflight
2 knew rock.
NYC / NYS

And:
Asking about a plane ticket...
Asking about a ticket for a boxing match
Probably gave location
Did not give time
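
As a rough sketch of the kind of structure I have in mind (not any existing library's API), the text, the hearing mistakes and the possible meanings could be carried around together like this. The `Utterance` and `Reading` types, the intent names, the slot names and the scores are all hypothetical, and how the input phrases are split up is just my guess.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One possible meaning of the utterance (hypothetical type)."""
    intent: str
    score: float  # invented confidence, 0.0 to 1.0
    slots: dict = field(default_factory=dict)  # slot name -> value, None if not given


@dataclass
class Utterance:
    text: str          # best transcription
    mishearings: dict  # heard phrase -> other ways it may have been heard
    readings: list     # candidate meanings


def best_reading(utterance):
    """Pick the meaning with the highest score."""
    return max(utterance.readings, key=lambda r: r.score)


def missing_slots(reading):
    """Names of the details the speaker did not give."""
    return [name for name, value in reading.slots.items() if value is None]


utterance = Utterance(
    text="What is the cheapest flight to New York",
    mishearings={
        "What is the cheapest": ["Watt ease the chipset"],
        "flight": ["fight", "light"],
        "to New York": ["2 knew rock", "NYC", "NYS"],
    },
    readings=[
        Reading("plane_ticket", 0.8, {"destination": "New York", "date": None}),
        Reading("boxing_ticket", 0.1, {"location": "New York", "date": None}),
    ],
)

reading = best_reading(utterance)
print(reading.intent, "is missing:", missing_slots(reading))
```

Knowing that the time is missing is what would let the system offer to look over the whole next month instead of demanding an exact date.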