halfbakery: Assume a hemispherical cow.
I have recently read //low end speech recognition// by jutta (link 1) and thought it was brilliant in its own right, and that it would make an excellent front end for something I was going to post at some point. The idea was to run the voice recognition software only to the point of recognizing the uttered sounds, then print them out without any higher-level processing, leaving the human recipient to make sense of it. Hippo suggested using the phonetic alphabet, to which I would add allowing for uncertainty, so that a sound might be scored as 70% d and 30% t.
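A minimal Python sketch of that uncertainty, assuming the recogniser reports a probability for each candidate phoneme in each sound slot. The slots, symbols and numbers below are made up, and the slots are treated as independent, which a real recogniser would not do. `best_readings` is a hypothetical helper that ranks whole phoneme strings:

```python
import heapq
import itertools
import math

# One dict per sound slot: candidate phoneme -> probability (hypothetical values).
Lattice = list[dict[str, float]]


def best_readings(lattice: Lattice, k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable phoneme strings.

    Brute force over every combination, treating slots as independent.
    Fine for a handful of slots; a real system would prune (beam search).
    """
    readings = []
    for combo in itertools.product(*(slot.items() for slot in lattice)):
        symbols = "".join(sym for sym, _ in combo)
        prob = math.prod(p for _, p in combo)
        readings.append((symbols, prob))
    return heapq.nlargest(k, readings, key=lambda r: r[1])


if __name__ == "__main__":
    lattice = [
        {"d": 0.7, "t": 0.3},  # the "70% d or 30% t" slot
        {"o": 0.8, "u": 0.2},
        {"g": 0.6, "k": 0.4},
    ]
    for symbols, prob in best_readings(lattice):
        print(f"{symbols}: {prob:.3f}")
```

Run as a script it prints dog: 0.336, dok: 0.224 and tog: 0.144, so the uncertain d/t slot survives as ranked alternatives instead of being forced to a single letter.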
Instead of printing this out for a human being to deal with, I intend to use it as the input to a different type of speech-to-text engine. Before someone reads the title and annotates that speech-to-text is well baked and widely known, let me just say: not in this flavour it isn't! And what is out there is only good enough to be used mainly by those for whom typing is not a practical option. If this idea works as badly as it might, it will generate texts that are about as coherent and meaningful as "A Whiter Shade of Pale" by Procol Harum or a set of David Bowie lyrics: a slight improvement on the state of the art of a few years ago :). If it works as well as I would hope, the output will be a block of text with perfect spelling and grammar that expresses the meaning of the original speech, though not necessarily using the original words. You may have gathered from this that it would be difficult to do on the fly: first the speech would go in, then the text would come out.
1) Blocks of 1 to 7 syllables are translated into a monosyllabic ideographic language, such as Old Chinese or Vietnamese (neither modern form is monosyllabic). The resulting logograms have semantic meaning. A number of small words, such as "the", have no equivalent in CHINESE, so not all of the syllables will be translated, and many will come through as belonging to several different words.
2) Use probability tables of which logograms occur in combination to identify the best fits in the new language. There is a slight problem with this, because the characters are in the wrong order from a Chinese point of view. There are two obvious solutions: a) create a probability table from scratch by running large amounts of English text through the full process; or b), the one I prefer, create a set of extra texts by treating each character as being one or two places either side of its recorded position, and then use a probability table derived from CHINESE literature.
3) The three or four best fits are translated back into English, and a human intelligence cuts and pastes from the choices available to produce a block of text that expresses the meaning of the original speech.
4) Use information about which bits make it into the finished document to adjust the probabilities: both the CHINESE probability table, for all users of English, and the probability of a particular sound being a particular syllable, for each individual speaker.
By CHINESE (in capitals) I mean a suitable language, possibly even ancient cuneiform? Over 80% of Chinese characters have a phonetic component; this in no way detracts from their use.
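Steps 2 to 4 could be prototyped along the following lines. This is a rough Python sketch under assumptions that are not in the idea: the CHINESE probability table is reduced to counts of character pairs; instead of literally generating shifted copies of texts as in option b), any two characters within two places of each other count as a pair regardless of order, which is meant to approximate the same tolerance; step 3 just keeps the top few candidates; and step 4 only updates the shared table (the per-speaker sound probabilities are not modelled). `PairTable` is a hypothetical name, and the two-sentence corpus is a toy.

```python
import math
from collections import Counter


class PairTable:
    """Toy stand-in for the CHINESE probability table (hypothetical design).

    Counts pairs of logograms that occur within `window` positions of each
    other, ignoring their order, so English-ordered candidates can still be
    matched against Chinese-ordered text.
    """

    def __init__(self, window: int = 2, smoothing: float = 0.1) -> None:
        self.window = window        # how many places apart two characters may be
        self.smoothing = smoothing  # add-k pseudo-count so unseen pairs are not impossible
        self.pairs: Counter = Counter()
        self.total = 0

    def _pairs_in(self, text: str):
        for i, a in enumerate(text):
            for b in text[i + 1 : i + 1 + self.window]:
                yield (a, b) if a <= b else (b, a)  # order-insensitive key

    def learn(self, text: str, weight: int = 1) -> None:
        """Add counts from corpus text (step 2) or from a kept result (step 4)."""
        for key in self._pairs_in(text):
            self.pairs[key] += weight
            self.total += weight

    def score(self, candidate: str) -> float:
        """Sum of add-k smoothed log pair frequencies over every pair in the window."""
        denom = self.total + self.smoothing * (len(self.pairs) + 1)
        return sum(
            math.log((self.pairs[key] + self.smoothing) / denom)
            for key in self._pairs_in(candidate)
        )

    def best(self, candidates: list[str], k: int = 3) -> list[str]:
        """Step 3: the k best-scoring candidates, offered to the human editor."""
        return sorted(candidates, key=self.score, reverse=True)[:k]


if __name__ == "__main__":
    table = PairTable()
    table.learn("我喝茶")  # toy corpus: "I drink tea"
    table.learn("我喝水")  # "I drink water"
    # Candidates differing by sound-alike characters (tones ignored):
    # 喝 "drink" vs 河 "river", 茶 "tea" vs 查 "check".
    candidates = ["我河茶", "我喝查", "我喝茶"]
    print(table.best(candidates))   # ['我喝茶', '我喝查', '我河茶']
    table.learn("我喝茶", weight=5)  # step 4: the human kept this reading
```

With the toy corpus, the reading 我喝茶 ("I drink tea") outranks the sound-alike readings, and feeding a kept reading back in with `learn(..., weight=5)` raises its score for next time.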
//low end speech recognition// by jutta
low-end_20speech_20recognition Thank you for the start. [j paul, Jun 28 2011, last modified Jun 30 2011]
Lookup Text Compression
Lookup_20Text_20Compression [j paul, Jun 30 2011]
Ancient Egyptian Hieroglyphs
http://www.nekhebet.../m_hieroglyphs.html Simple introduction to how they actually work... [prufrax, Jul 05 2011]
Annotation:

When you say "translate into Chinese" at step 1, do you actually mean translate, which would entail having semantic knowledge of the original speech, or do you mean transliterate, i.e. pick a Chinese character with approximately the same phonemic value (ignoring tone)?

... the point being that the major troubles with existing speech-to-text, which you are presumably trying to improve on, come down to not being able to automatically recover the full semantic meaning of the speech.

//it was brilliant in its own rite// - clever ironic use of phonetic spelling...

The idea is to get from a string of spoken sounds in English to several strings of Chinese characters, each of which represents a thing or an idea; then to get rid of all of the strings of ideas that are nonsense, and present the speaker with a few seemingly sensible possibilities.

The semantic gain is from the ideographic nature of the CHINESE language.

[hippo] No pun was intended. I will fix it.

//clever ironic use of phonetic spelling//
Clever ironic use of the word "ironic."

Perhaps an example might be useful.

In computer science, the phrase //time flies like an arrow, but fruit flies like an orange// is used to show the difficulty of making machines understand spoken English. A human being would easily recognise that the words "like", "flies" and "orange" have multiple possible meanings, but computers are not easy to program to do this in a generalised fashion. By using the equivalent of an English-to-Chinese dictionary to convert the spoken syllables into Chinese glyphs, there would be two different symbols for the spoken word "flies": one for travelling through the air, and one for the flying insects. Likewise for the words "orange" and "like". That gives eight possible combinations, but only one of them will stand out as right when compared against an appropriate probability table.

I have just read //Lookup Text Compression// (see the link) and I think it might offer a way to avoid using CHINESE.

But why would you want to avoid using CHINESE?

You can only convert the English words into Chinese by understanding what the words mean, so this does not gain you anything that you don't get from just using synthetic tokens associated with the set of possible meanings of each collection of syllables in English.

In addition, using synthetic tokens would stop the many alternative meanings of the Chinese words that are not possible meanings of the English from exploding your probability table.
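
The synthetic-token alternative can be sketched as an inventory that hands out an opaque token for each (English spelling, sense) pair, so a token can only ever carry meanings the English word actually has. A minimal Python sketch; `SyntheticLexicon`, the sense labels and the helper `combinations` are hypothetical:

```python
import math
from itertools import count


class SyntheticLexicon:
    """Opaque integer tokens, one per (English spelling, sense) pair."""

    def __init__(self) -> None:
        self._ids: dict[tuple[str, str], int] = {}
        self._next_id = count()

    def token(self, spelling: str, sense: str) -> int:
        """Return the token for this sense, creating it on first sight."""
        key = (spelling, sense)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    def candidates(self, spelling: str, senses: list[str]) -> list[int]:
        """Tokens for every English sense of one collection of syllables."""
        return [self.token(spelling, s) for s in senses]


def combinations(options: list[list[int]]) -> int:
    """Number of joint readings the probability table would have to rank."""
    return math.prod(len(o) for o in options)


if __name__ == "__main__":
    lex = SyntheticLexicon()
    options = [
        lex.candidates("flies", ["travel through the air", "insects"]),
        lex.candidates("like", ["similar to", "enjoy"]),
        lex.candidates("orange", ["fruit", "colour"]),
    ]
    print(options)                # [[0, 1], [2, 3], [4, 5]]
    print(combinations(options))  # 8
```

For "flies", "like" and "orange" with two senses each, this gives six tokens and 8 joint readings; the count grows only with the English senses, never with extra meanings a Chinese character might carry.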

[j_paul] Can the method you describe cope with the following text...

//In the computer sciences they use the phrase //time flies like an arrow, but fruit flies like an orange.// to show the difficulty of making machines understand the spoken English.//

... or does that require getting meta?

[pocmloc] I don't want to avoid CHINESE; I just thought it was putting people off the idea.

[prufrax] The reason for choosing a logographic, monosyllabic language is that it has, by definition, one symbol for each meaning. The point about semantic tokens is well made; I'm just looking for a different approach.

Translate: sorry, my mistake; it should have been transcribe into CHINESE and translate back into English. To see how this transcribing process might work in practice: jutta's low-end speech recognition idea generates a string of symbols that represent what was said. This is then used with an English-to-Chinese dictionary, and patterns of sounds that match Chinese glyphs are selected. All of the possibilities for a given syllable in conjunction with its neighbours are transcribed. Another way of looking at this: imagine you have an Egyptian-to-hieroglyph dictionary and a short piece written in Egyptian longhand. Without knowing the meaning of the text, you should be able to find probable matches for the Egyptian and note down the corresponding hieroglyphs.
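
The dictionary-matching step, which is also step 1 of the idea, amounts to finding every way to cover the syllable string with dictionary entries of 1 to 7 syllables. A minimal memoised Python sketch follows; the dictionary, the angle-bracket glyph labels and the syllabification of "orange" as "or" + "ange" are all assumptions for illustration:

```python
from functools import lru_cache

MAX_BLOCK = 7  # step 1: blocks of 1 to 7 syllables

# Hypothetical dictionary: syllable block -> candidate glyphs. The labels in
# angle brackets stand in for real characters; an empty list marks a small
# word with no equivalent, which is dropped rather than transcribed.
DICTIONARY: dict[tuple[str, ...], list[str]] = {
    ("fruit",): ["<FRUIT>"],
    ("flies",): ["<FLY:travel>", "<FLY:insect>"],
    ("fruit", "flies"): ["<FRUIT-FLY>"],
    ("like",): ["<LIKE:similar>", "<LIKE:enjoy>"],
    ("an",): [],
    ("or",): ["<OR>"],
    ("or", "ange"): ["<ORANGE:fruit>", "<ORANGE:colour>"],
}


def transcribe(syllables: list[str], dictionary=DICTIONARY) -> list[tuple[str, ...]]:
    """Every way of covering the syllables with dictionary blocks, as glyph strings.

    Pure pattern matching: no meaning is consulted, only which sound
    sequences the dictionary contains. Memoised on the start position.
    """
    n = len(syllables)

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> list[tuple[str, ...]]:
        if i == n:
            return [()]
        results = []
        for j in range(i + 1, min(n, i + MAX_BLOCK) + 1):
            block = tuple(syllables[i:j])
            if block not in dictionary:
                continue
            glyphs = dictionary[block] or [None]  # None: drop this word
            for rest in from_pos(j):
                for glyph in glyphs:
                    results.append(rest if glyph is None else (glyph,) + rest)
        return results

    return from_pos(0)


if __name__ == "__main__":
    readings = transcribe(["fruit", "flies", "like", "an", "or", "ange"])
    print(len(readings))  # 12
```

For "fruit flies like an orange" this yields 12 candidate glyph strings, including both the <FRUIT> <FLY:...> and the <FRUIT-FLY> segmentations, for the probability table to rank. Memoising on the start position avoids re-deriving shared tails, but the number of readings can still grow exponentially with utterance length, so a real system would prune as it goes.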

Pattern recognition.
Not understanding!

[mp] Yes! That is the purpose of the idea. And the output would be grammatically correct and properly spelled.

j_paul, then by your definition, Chinese is NOT a logographic, monosyllabic language.

Have a look in a hanzi dictionary sometime.

Also, Egyptian hieroglyphs don't work the way you want them to either.

A purely logographic script is impractical, so the symbols rapidly evolved to be reused metaphorically for abstract concepts, and then for their phonetic value too. This happened in both Chinese and Ancient Egyptian.

This especially happened in cuneiform, which was borrowed and reborrowed to write multiple unrelated languages until the symbols ended up having so many possible readings that extra symbols (determinatives) had to be added to indicate what type of reading was required...

Speech to text ALREADY uses probability tables. It is why the phrase:

"Mum, can you record 'Glee' for me?"

Gets changed by probability tables to:

"Mum, can you please call police for me?"

Thereby freaking out the recipient :) by making a judgement about which words the string of phonemes was most likely representing.
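
That judgement can be illustrated with the usual noisy-channel rescoring used in speech recognition: each candidate's acoustic likelihood is combined with a language-model probability, with a weight on the latter. A minimal Python sketch; the probabilities are invented, and `rescore` and the candidate fields are hypothetical names, not any recogniser's API.

```python
import math


def rescore(candidates: list[dict], lm_weight: float = 1.0) -> list[dict]:
    """Rank transcriptions by log P(sounds | words) + lm_weight * log P(words)."""

    def total(c: dict) -> float:
        return math.log(c["acoustic"]) + lm_weight * math.log(c["language"])

    return sorted(candidates, key=total, reverse=True)


# Invented numbers: the acoustics favour "Glee", but the language model
# finds "please call police" far more common.
CANDIDATES = [
    {"text": "Mum, can you record 'Glee' for me?", "acoustic": 0.30, "language": 0.0005},
    {"text": "Mum, can you please call police for me?", "acoustic": 0.10, "language": 0.02},
]

if __name__ == "__main__":
    print(rescore(CANDIDATES)[0]["text"])                 # the police version
    print(rescore(CANDIDATES, lm_weight=0.0)[0]["text"])  # the Glee version
```

With the language model at full weight the police version wins; with `lm_weight=0.0` the acoustically closer "Glee" version wins, which is the trade-off the annotation is joking about.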

I'd say we have a long way to go before we have perfect beach wreck ignition. ;)