halfbakery: With moderate power, comes moderate responsibility.
As most of you will know, Babelfish is a feature of the AltaVista search engine that lets you translate text from one language to another and view web pages in different languages. Unfortunately, it is notoriously bad. Parsing text in one language, forming a logical account of the meaning of a sentence (subject, verb, object, modifiers, etc.), and then converting this abstract representation into another language is very difficult, and it's rare to find anything in Babelfish that comes close to the meaning of the original document.
Rather than adding translation complexity, the translator could be improved by simplifying the procedure. What I'm proposing is a web-based translator that doesn't try to render the foreign language into a dubiously constructed version of your native tongue. Instead, all it does is parse the original sentence and display what it found: each word's meaning, function, and part of speech, its tense/mood/number, the grammatical construction it thinks is being used, and so on. The system would also present alternatives if a word has more than one meaning or if there is an ambiguity as to how the sentence could be parsed.
All this would be structured to be as easy to read as possible and as close as possible to the original document's format, so that a section where the meaning was obvious would be rendered almost as plain translated text, while a tougher section would include more part-of-speech marks, variant possibilities, and marginal notes.
This should not be hard to implement, since automatic translators must do this already. Dictionaries and grammars provide similar information, but not linked to a specific text. Displaying this information would make reading and understanding documents a slower process, but hopefully a more accurate one. The major challenge would be finding the most readable and clear display format.
(I know of one online English language parser and numerous dictionaries, but nothing that puts all this functionality into one website or tool.)
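To make the proposal a little more concrete, here is a minimal sketch of the kind of word-by-word display described above. Everything in it is an assumption for illustration: the `Analysis` record, the `annotate` function, and the three-word Latin lexicon are hypothetical, and a real tool would need a proper morphological analyser and dictionary behind it. It only shows the display rule: a plain gloss where a word has one reading, and every alternative where it has several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    gloss: str     # meaning in the reader's language
    pos: str       # part of speech
    features: str  # case/number/tense/mood etc.


# Hypothetical toy lexicon (Latin to English), for illustration only.
LEXICON = {
    "puella": [
        Analysis("girl", "noun", "nom sg"),
        Analysis("girl", "noun", "abl sg"),  # same spelling when vowel length is unmarked
    ],
    "rosam": [Analysis("rose", "noun", "acc sg")],
    "amat": [Analysis("loves", "verb", "3sg pres ind act")],
}


def annotate(sentence, lexicon=LEXICON):
    """Return one display line per word of the sentence.

    Unambiguous words get a single gloss; ambiguous words list every
    analysis so the reader can choose; unknown words are flagged.
    """
    lines = []
    for raw in sentence.lower().split():
        word = raw.strip(".,;:!?")
        analyses = lexicon.get(word, [])
        if not analyses:
            lines.append(f"{word}: ??? (not in lexicon)")
        elif len(analyses) == 1:
            a = analyses[0]
            lines.append(f"{word}: {a.gloss} [{a.pos}, {a.features}]")
        else:
            alternatives = " | ".join(
                f"{a.gloss} [{a.pos}, {a.features}]" for a in analyses
            )
            lines.append(f"{word}: AMBIGUOUS: {alternatives}")
    return lines


for line in annotate("Puella rosam amat."):
    print(line)
```

The hard part the idea points at, choosing a layout that stays readable on a whole document, is not addressed here; this only produces the raw annotations.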
(?) CLAWS part-of-speech tagger
http://www.comp.lan...el/claws/trial.html Part of the functionality I want. [pottedstu, Dec 05 2001, last modified Oct 21 2004]
(?) Babelfish
http://world.altavista.com/ Translate a wide variety of languages into gobbledegook in a variety of other languages. [pottedstu, Dec 05 2001, last modified Oct 21 2004]
Universal Chinese
http://www.halfbake...Universal_20Chinese What waugsqueke wants? [pottedstu, Dec 05 2001, last modified Oct 17 2004]
(?) Jurafsky & Martin, Speech and Language Processing, Ch. 1
http://www.cs.color...~martin/slp-ch1.pdf A good overview of natural language processing, describing the problems involved and the state of the art. You have to buy the rest of the book tho. [pottedstu, Dec 06 2001, last modified Oct 21 2004]
Wisgo PDA
http://e-dmec.com/WISGO.HTM english|chinese x20K words. [reensure, Dec 06 2001]
Ideas Futures: Machine Translation by 2015
http://www.ideosphe...in/Claim?claim=Tran If you would like to make a bet "about the availability of fluent machine translation" (and you have infinite patience with weird user interfaces), do it here. [jutta, Dec 07 2001]
(?) Maybe morphemes could "attract" their other-language counterparts?
http://www.shouldex...5456/1480&cid=36#36 Thank you pottedstu for referring me to this program elsewhere at b/2. Might zero in on web documents available in two or more languages. Sort of a rosetta stone effect? Not translation, but might yield interesting information? [LoriZ, Feb 24 2002, last modified Oct 17 2004]
waugs: I thought about that, but I'm not sure how much it would help. In theory, the elements of a language and the functions they perform (tense, number, gender, verb forms, noun function such as agent, location, or instrument, and meaning) are common among languages. But they're not all fully encoded in all languages: e.g. Japanese has no gender (if I remember correctly), while some Native American languages have 9 genders; and Latin has no preterite/simple past tense, while Greek and English do (though they're used slightly differently). Any practical intermediate language isn't going to encode all this data, and it's going to have to make guesses if it is forced to assign a gender or tense.
The Universal Chinese idea comes close to this, anyway.
Wow, cool idea [Waugs]. Maybe the intermediate language could be one especially created for this purpose that has, by necessity, a huge vocabulary. Rather than having multiple meanings for words, as most languages do, this intermediate language (Waugspeak) has an individual word for every meaning, yielding a much higher degree of meaning precision with reduced ambiguity when the translation from Waugspeak takes place. Of course translating TO this language might be tough as you'd have to know how to assign the various meanings of a word to the separate words in the intermediate language. This reminds me a bit of colorspace conversions in, say, Photoshop, where the LAB colorspace is (or should be) used as an intermediate step between CMYK and RGB colorspaces because LAB colorspace encompasses all of the colors in both CMYK and RGB colorspace.
Shouldn't it be Waugspeake?
Waugspeak(e) is doubleplus good.
There are many hard-to-define words: "will" in English, "hyggeligt" in Danish, and multitudes of other words in myriad languages. Can the best and brightest not figure out what the constituent parts of "will," or other difficult words like it, are? What the bits are? Is there not an atomistic level of language to be discerned, or to be created? I think that this is not an impossible task. Of course there are plenty of words with no direct translation to another language and that is why the intermediate language, Waugspeak, would be, by necessity, enormous--encompassing all of human verbal communication and designed for continuous extension. I mean, c'mon, this is the halfbakery here . . . this is possible--especially in light of the rapid increase in computing power. The world is now beginning to see Moore's law exceeded, and exceeded soundly, soon approaching a geometric rise in computing power. It seems to me that, through even inelegant, brute-force means, something like what is being described here shall be possible soon. While it may or may not be mediated through an intermediate language in the exact way described, it is hard for me to imagine that fluent machine translation of language will not happen in, say, the next decade. Crudely within 5 years, consummately within 10.
See 'ICML' at top right of page. (This saves me the embarrassment of linking my own idea again.)
<< it is hard for me to imagine that fluent machine translation of language will not happen in, say, the next decade. Crudely within 5 years, consummately within 10. >>
That's what they said in about 1970.
Waugsqueke: The notion of a universal intermediate language (UIL) is old and apparently unworkable. Generation is easy, understanding is hard; translating from X to UIL is about as hard as translating from X to Y. That means that UIL only makes sense if you can get humans to actually write in UIL, and if you can get everyone to agree on a common language, you don't need a translator anyway. I imagine that systems like Babelfish *do* have some kind of intermediate representation internally, but I bet it's not textual.
Peter, if this is so Baked, do you have an example of such a thing?
For a simple version of this, have Babelfish output the text it does now, but whenever it's not completely clear on the translation of some word (there are several alternates), make the word (or phrase) a drop-down. The most likely choice is the default, but the user can pull it down to see other interpretations.
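The drop-down version could be sketched roughly as below. This is only an illustration under stated assumptions: `render_with_dropdowns` is a hypothetical helper, and it assumes the translator already hands back, for each position, a list of candidate translations sorted most likely first. Unambiguous words are emitted as plain text; ambiguous ones become an HTML `<select>`, where browsers show the first option by default.

```python
from html import escape


def render_with_dropdowns(tokens):
    """Render translated text with drop-downs for ambiguous words.

    tokens: a list with one non-empty list of candidate translations per
    word or phrase, each sorted most likely first (an assumed input format).
    """
    parts = []
    for candidates in tokens:
        if len(candidates) == 1:
            # Only one reading: show it as ordinary text.
            parts.append(escape(candidates[0]))
        else:
            # Several readings: the first option is what the reader sees by default.
            options = "".join(f"<option>{escape(c)}</option>" for c in candidates)
            parts.append(f"<select>{options}</select>")
    return " ".join(parts)


html_fragment = render_with_dropdowns([["the"], ["bank", "shore"], ["is"], ["steep"]])
print(html_fragment)
```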
An ongoing struggle in the academic community is to make sure that processing components (translation engines, speech recognition systems, handwriting recognizers, etc.) don't just output their best guess but rather produce a complete spectrum of possibilities with probability annotations. Often a higher level of processing has information that can help disambiguate the possibilities better than the lower level processor could. You're proposing taking it one step further and exposing the ambiguity to the human (the highest level processor of all, so far anyway), which I think is a great idea.
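As a rough illustration of why the full spectrum matters, here is a minimal sketch in which a lower-level component reports probabilities for each interpretation and a higher level reweights them. The `rescore` function, the multiply-and-renormalise rule, and the numbers are all assumptions chosen for the example, not a description of how any real translation system combines its components.

```python
def rescore(candidates, context_weights):
    """Combine lower-level probabilities with higher-level context weights.

    candidates: dict mapping each interpretation to its lower-level probability.
    context_weights: dict mapping interpretations to a non-negative weight from
    the higher level; interpretations it says nothing about keep weight 1.0.
    Returns a new dict of renormalised probabilities.
    """
    combined = {c: p * context_weights.get(c, 1.0) for c, p in candidates.items()}
    total = sum(combined.values())
    if total == 0:
        # The higher level ruled everything out; fall back to the original spectrum.
        return dict(candidates)
    return {c: score / total for c, score in combined.items()}


# Hypothetical example: the word-level component slightly prefers "financial bank",
# but a document-level component believes the text is about rivers.
word_level = {"financial bank": 0.6, "riverbank": 0.4}
topic_level = {"financial bank": 0.1, "riverbank": 0.9}

reranked = rescore(word_level, topic_level)
best = max(reranked, key=reranked.get)
print(best, reranked)
```

Had the lower level reported only its single best guess, the higher level would have had nothing to rerank; the same reasoning applies when the "higher level" is the human reader.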
Bristolz: What the hell are you talking about? I think you're on crack. Moore's Law *is* a "geometric rise in computing power", if you hadn't noticed, and I see no evidence that we're leaving it behind. Furthermore, even if we do develop extremely fast computers, there's nothing in this idea (or anywhere else) that tells us how to simply apply brute-force algorithms to get great translation. If we could do that, we would -- there are plenty of people who would be willing to let a computer chug overnight to produce brilliant translations of documents.
Would you like to make a bet about the availability of fluent machine translation in the next decade?
Okay, egnor, maybe I got a little carried away. Yes, the problems in building superb translation are much more than just compute power. What sort of translation achievement or benchmark would have to be reached for you to consider something well-translated? (We'll leave "brilliant translation" out for now, as precious few people, let alone machines, are any good at that.)
<winces in pain at seeing this resurface>
It's your own fault...<grin>
Well, it's my fault that I wince in pain, but I think LoriZ added a link and resurfaced it.
Oh yes. This is where your crack habit got outed. But allow me to restate my appreciation for Waugspeak. :)
Ah, so. Hadn't realized the link was new...
You know, I like to translate something from English to Japanese, to English, to French, back to English, to Russian, and back again to English, then on to Greek, and finally back to English for the last time. Things get lost in translation.
Me, in Russian, the English back section which likes the fact that it translates for the second time in English, in French from English what from Japanese, in English and in Greek, and final in last English the back section knowing. Thing is gone with translation.
I, in the Russian, English supports the section which likes the fact that it translates for the second time into English, in French of English what of the Japanese, in English and in the Greek, and the final in last English back knowledge of section. The thing went hand in hand with the translation.
I, in the Russian, Englishes supports division whom it loves fact whom it transfers for the second time into English, in the French of the English of Japanese, in the English and in the Greek, and final examinations in the knowledge last English rear of division. Thing went hand in hand with the transfer.
I, in Russian, Englishes supports the department that loves the make that transports for the second time in the English, in the Frenches of English Japanese, in the English and in the Greek, and the final examinations in the last English rear department of knowledge of department. The thing went with the transport.
//final examinations in the last English rear department of knowledge of department.// See what I mean?