Babelfish scores very high in the usefulness ratings, but pretty poorly for accuracy.
Wikipedia scores highly on both usefulness and accuracy - the accuracy relying on hundreds of thousands of humans checking the database against their real-world knowledge.
So let's put the two technologies together.
When I read a Babelfished page, I'm basically doing half of the translation myself anyway. Babelfish has made a good start on it by translating most of the words, generally in the correct tense and sometimes with a nod to the correct grammar. I then complete the translation in my head using:
a) a knowledge of English, as she should be spoke
b) a knowledge of the context - for instance "fil" would translate from French as "string" in the context of an article about knots, but as "wire" in an article about electronics
c) some basic knowledge of French / Italian / Whatever
The Babelfished version should open in an editable window which will allow me to rewrite some or all of it and also tag it to give the machine an idea of context.
The edits will then be submitted back to the translation site, which will run the original text through several mutated variants of the translation engine. The mutation that produces the result closest to my submission will then be used for all future translations.
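As a rough sketch of what one such submission might look like when it goes back to the site (the field names and the Python shape are invented for illustration, not any real engine's interface):

```python
from dataclasses import dataclass

@dataclass
class CorrectionSubmission:
    """One human-edited translation sent back to the translation site.
    Every field name here is hypothetical; this only shows the shape of the data."""
    source_lang: str         # e.g. "fr"
    target_lang: str         # e.g. "en"
    source_text: str         # the sentence as originally written
    machine_output: str      # what the engine produced
    human_edit: str          # the reader's corrected version
    context_tags: list       # hints such as "electronics" vs. "knots"

# The "fil" example from above: context decides between "string" and "wire".
submission = CorrectionSubmission(
    source_lang="fr",
    target_lang="en",
    source_text="Le fil est trop court.",
    machine_output="The string is too short.",
    human_edit="The wire is too short.",
    context_tags=["electronics"],
)
```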
The crucial point here is that although the machine is not learning from a professional translator, it is learning from people with a good idea of what the finished translation should be, based on the rough translation, subject knowledge and knowledge of the output language. And hopefully it will be learning from thousands of corrections a day.
This idea makes a number of assumptions that seem to hold true for Wikipedia - e.g. that people will voluntarily donate their time to the project and that the majority of people will not deliberately feed in false data. It also makes some assumptions about the ability of translation engines (which are horrifically complex) to mutate without breaking.
Google Translate
http://translate.google.com asks you to "contribute a better translation" (after you enter a phrase) [swimswim, Nov 30 2009]
The skinny on Google Translate
http://googleresear...anslation-live.html Once again - google beats me to it. [wagster, Nov 30 2009]
Wiki Distributed Translation
Wiki_20Distributed_20Translation Similar concept. [bungston, Nov 30 2009]
Only one concern, though. Is there an engine powerful enough to learn from your feedback? Presumably, the existing translation software is quite complex, because it does make a faint stab at context.
So, how does the software extract information from your translation which it can then generalize to use in other translations?
It doesn't really need to extract information as such; it just has to compare one piece of text with several others and score them on similarity. The machine isn't really learning anyway: it's just breeding a number of mutant offspring, getting them to make the translation, and killing off all but the one that scores highest on similarity to the human-edited text. It's a blind watchmaker.
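A minimal sketch of that compare-and-score step, assuming the mutant engines' outputs are already in hand; difflib's ratio is just a stand-in for whatever similarity measure the real system would use:

```python
import difflib

def similarity(candidate: str, human_edit: str) -> float:
    """Crude similarity score between a machine output and the human-edited text."""
    return difflib.SequenceMatcher(None, candidate, human_edit).ratio()

# Hypothetical outputs from three mutated engines, plus the human correction.
mutant_outputs = {
    "engine_A": "The wire is too short to reach the socket.",
    "engine_B": "The string is too short for to reach the plug.",
    "engine_C": "Wire too short reach socket is.",
}
human_edit = "The wire is too short to reach the socket."

# Kill off all but the mutant whose output scores highest against the human edit.
survivor = max(mutant_outputs, key=lambda name: similarity(mutant_outputs[name], human_edit))
print(survivor)  # engine_A
```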
My main concern is that, code being different from living organisms, mutations might always regress.
Actually, couldn't you do some kind of mashup of Wikipedia? When I'm stuck for a word, I frequently look it up there, then go to the corresponding entry for the language I'm trying to use. Were that automated, it might work quite well, and the updating would occur without anyone needing to do it specifically for the translator.
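The MediaWiki API already exposes those cross-language ("interlanguage") links, so a rough sketch of the automated lookup might look like this in Python; the helper name, example term and target language are just illustrations:

```python
import json
import urllib.parse
import urllib.request

def corresponding_title(title: str, target_lang: str) -> list:
    """Look up an English Wikipedia article and return the title(s) of the
    corresponding article in the target language via interlanguage links."""
    params = urllib.parse.urlencode({
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllang": target_lang,
        "format": "json",
    })
    url = "https://en.wikipedia.org/w/api.php?" + params
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-translation-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    pages = data["query"]["pages"]
    return [link["*"] for page in pages.values() for link in page.get("langlinks", [])]

print(corresponding_title("Frog", "da"))  # whatever the Danish article is titled
```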
Is that much more useful than a dictionary, which all translators have already? I might try it next time I'm stuck and see if it's better.
Ian, what's the connection between your link to Wikipedia's explanation of the "Google bomb" concept and this idea?
I think Ian is suggesting that such a machine could be hijacked by a coordinated group of pranksters. Spanish-->English: "me gusta enanos con queso" ("I like dwarves with cheese") Google: "george bush is a big fat idiot"
Thanks [swimswim] - looks like the Google approach is pretty similar: "we feed the computer with billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model"
[Wagster], one advantage is that rather than just having a single word, you have context, so for example you don't end up translating the Danish "frø" to mean "seed" when you should be using "frog" or vice versa (there is a gender difference there though). I've found it very useful.
Right then, from now on I shall be using Google Translate instæd. Or Ian's Hungarian phrasebook...
How close is my Wiki Distributed Translation idea to this one? It seems very close to me.
Having read all of it, it's fairly similar, [bung], except for the Darwinian aspect. I must confess I only read the first half of your idea before posting.
Sounds good, except for the fact that a large group of people could screw with the system by submitting false translations back to the engine.