The problem: The current audio interfaces are either
* an IVR (press one for this, press two for that), or
* audio speech recognition wizards that try to guess what you meant.
*- If they think they succeeded, they ask you if that's what you meant and proceed to do some preconceived action.
*- If they don't, they'll give you a menu to choose from or go back to the beginning.
This is quite frustrating compared with the interaction we have with visual interfaces on screens.
On a screen, if you did not take in the result the first time, you simply read it again. There's always enough info to look around and choose what you want to do, and if not, then you ask for a bit more and get tons more.
The solution should mimic the way we do conversation naturally:
We (A) ask for something (Q1). The listener (B) hears what we ask, understands some of it, usually not every single word, but gets the gist of what we are saying, and then comes up with a question (RQ2) in response.
This response (RQ2, given by B), which is itself another question, has in it at least three new "sub-questions", three new bits of extra info (RQ2a, RQ2b, RQ2c...), so that when RQ2 is answered (by A, with the response R3), B will be sure she understood A correctly, and everything A asked at the beginning (in Q1) is now clear. B can now take action, or ask another question if there are still any pending issues.
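The Q1 / RQ2 / R3 loop described above could be sketched roughly like this. It is only an illustration: the names Interpretation, build_rq2 and reject are made up for this sketch, the scores are invented, and it assumes some other component has already turned the audio into a ranked list of guesses about what A meant.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    label: str    # short description B can say aloud
    score: float  # B's confidence that this is what A meant (made-up numbers)


def build_rq2(candidates, max_guesses=2):
    """Build B's response question (RQ2) from its best guesses about Q1.

    The question offers the top guesses plus "something else", so A always
    gets at least three sub-questions when two or more guesses exist.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    options = [c.label for c in ranked[:max_guesses]] + ["something else"]
    if len(options) == 1:
        # No guesses left: fall back to an open question.
        return options, "Could you tell me more about what you are looking for?"
    head = ", ".join("to " + o for o in options[:-1])
    return options, f"Are you referring {head}, or to {options[-1]}?"


def reject(candidates, label):
    """Drop a guess A said 'no' to, so the next RQ2 moves on to other guesses."""
    return [c for c in candidates if c.label != label]


# Hypothetical guesses for a request that sounded like "Little Women".
candidates = [
    Interpretation("the book by Louisa May Alcott", 0.6),
    Interpretation("something in the news about a small woman", 0.3),
    Interpretation("a family named Little", 0.1),
]

options, question = build_rq2(candidates)
print(question)

# A answers "No" to the book (R3), so B narrows down and asks again.
candidates = reject(candidates, "the book by Louisa May Alcott")
options2, question2 = build_rq2(candidates)
print(question2)
```

The point of the sketch is that B never answers with a bare menu: every RQ2 carries B's own best guesses, and a "no" from A removes a guess instead of sending the conversation back to the beginning.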
Example: Excuse me, what is the cheapest flight to New York?
Exasperating computer-aided personnel answering: When exactly do you want to fly, sir? (I don't know, some time within the next month, as long as it's in my budget!)
Typical smart human answer (after I told her about the discussion with the computer-aided personnel, including my afterthoughts): [Summing up her thoughts, which were: OK, let me see what kind of packages we have coming up within the next month which may help you decide. I can then do a detailed comparison of the days and get back to you. OK, here goes...] I can give you some details about the cheapest flights in the next month, but that will take me some time. Do you want me to check it out for you now, and get back to you when I have an answer?
Or: Typical Google answer to a search for "Litle Woman": Did you mean "Little Women"? Finding results 1-97875 for "Little Women". Search instead for "Litle Woman".
Human researcher: Are you referring to the book by Louisa May Alcott, to something in the news about a small woman, or to something else?
You: What was that?
HR: Louisa May Alcott the author of the book "Little Women"?
You: You say Loo ee zah, not louse like house!
HR: OK Thank you. I'll remember that: Loo ee za. So is it *that* book you were asking about?
You: No. What was your other suggestion?
HR: Oh, there's a news item going around now about a woman, a medical doctor named Thor Dou, who married a young man whose name is Mark Little. She had adopted his family name and called herself Doc Thor Due Little, but now she has divorced and there's a whole issue in the news about it. Is that what you were looking for?
You: No.
HR: Oh. But you're looking for the term "Little Women", right? (short pause) Did you mean women in the plural, or woman like a single woman? And does Little mean small, or is it a family name? Or am I totally off?
--------------------------
What programming technologies will I need to develop this kind of a system?
Is there any open source stuff that will help me?
Do you think it's doable with today's existing technologies to get something at least close to this "intelligent" discussion, where the computer is narrowing down the possibilities but without asking me in a mechanical "press one for this, press two for that" way?
---------------------
The thing is that I don't want to have to understand anything about the phonemes and audio. I want a stream of text(s), like:
"What is the cheapest flight to New York"
Then I get a list of possible semantics and hearing mistakes:
Watt ease the chipset fight / light / test jetflight 2 knew rock. NYC /NYS
And:
Asking about a plane ticket...
Asking about a ticket for a game of boxing
Probably gave location
Did not give time
-- pashute, May 19 2016

No help here, but similar discussion? Collection_20boxing_20gloves [normzone, May 19 2016]
Open source REST semantic extraction from text http://www.nuxeo.co...ul-semantic-engine/ [pashute, May 20 2016]
Open Semantic Framework http://opensemanticframework.org/ Virtuoso (RDF), Solr (search), OWL API (ontologies) and GATE (tagging and NLP) [pashute, May 20 2016]
Natural voice interface at Nuance http://research.nuance.com/ Missing the "result" part of the interaction [pashute, May 22 2016]
Introduction to voice user interfaces http://ldt.stanford...xcerpt.Winograd.pdf [pashute, May 22 2016]

ian thanks! woe!
norm - I think after my edits it's a bit clearer, no? Or at least now...
-- pashute, May 19 2016

INVITATION: Dear Halfbaker. YOU are kindly invited to join the board, and get this going with me!!
It's going to replace the UI for many, many devices, including the general computer and smartphone, and at least in the first stage it will become extremely popular alongside the existing UI.
OK, I edited again. I think I found something that will be very useful, but I'm still looking for other tools.
-- pashute, May 20 2016

Here it is said by an outsider [link]:
VOICE USER INTERFACES have been tantalizing and disappointing us for decades. The promise is simple: to allow people to communicate with machines naturally using voice.
(Mike Sparandara, Lonny Chu and Jared Benson from Punchcut)
-- pashute, May 22 2016