Template Translation

Most computerized translators translate a document word for word. They look up the first word of the document in a dictionary file, write down the corresponding word in the other language, move on to the second word, and so on, until the entire document has been converted. This gives the reader a vague idea of what the text says, but the result is not nearly as good as a human translation.
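For comparison, the word-for-word approach can be sketched in a few lines of Python. The `DICTIONARY` here is a hypothetical toy with just enough entries for the French example used later; it ignores grammar entirely and simply keeps the original word order.

```python
# Hypothetical toy French-to-English dictionary, for illustration only.
DICTIONARY = {
    "est-ce": "is it",
    "que": "that",
    "vous": "you",
    "parlez": "speak",
    "français": "French",
}


def word_for_word(sentence: str) -> str:
    """Translate each word in turn, keeping the original word order."""
    question = sentence.endswith("?")
    words = sentence.rstrip("?").split()
    # Unknown words are passed through unchanged.
    translated = [DICTIONARY.get(w.lower(), w) for w in words]
    result = " ".join(translated)
    result = result[:1].upper() + result[1:]  # capitalize the first letter
    return result + "?" if question else result


print(word_for_word("Est-ce que vous parlez français?"))
# prints "Is it that you speak French?"
```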

My idea is to build a set of templates for a language, along with a dictionary. When asked to translate a sentence, the translator would first consult its templates and, by determining the possible types of each word, find the template that fits the sentence best. Each template would have two parts: once the blanks in the first part were filled in from the sentence being translated, the blanks in the second part would be filled automatically from the same words. Only after the template had changed the word order or the words themselves would the computer fall back on word-for-word translation.
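A minimal sketch of one two-part template in Python, assuming a text format in which the two sides are separated by " : " and blanks (slots) are written as *name*. The names `TEMPLATE`, `DICTIONARY`, `match`, and `apply_template` are all hypothetical. The first part binds each slot to a run of one or more words; the second part is filled from those bindings, and the captured words are then translated one at a time. This version does not yet check what type of words a slot holds.

```python
from typing import Optional

# Hypothetical dictionary used for the final word-for-word step.
DICTIONARY = {"parlez": "speak", "français": "French"}

# Source side and target side of one template, separated by " : ".
TEMPLATE = "Est-ce que vous *verbPhrase1* ? : Do you *verbPhrase1* ?"


def is_slot(token: str) -> bool:
    return len(token) > 2 and token.startswith("*") and token.endswith("*")


def tokenize(sentence: str) -> list[str]:
    # Treat "?" as its own token so a template can match it.
    return sentence.replace("?", " ?").split()


def match(pattern: list[str], words: list[str]) -> Optional[dict[str, list[str]]]:
    """Bind each slot in pattern to a run of words, or return None."""
    if not pattern:
        return {} if not words else None
    head, rest = pattern[0], pattern[1:]
    if is_slot(head):
        # Try each possible length for this slot, shortest first.
        for end in range(1, len(words) + 1):
            bindings = match(rest, words[end:])
            if bindings is not None:
                return {head: words[:end], **bindings}
        return None
    if words and words[0].lower() == head.lower():
        return match(rest, words[1:])
    return None


def apply_template(template: str, sentence: str,
                   dictionary: dict[str, str]) -> Optional[str]:
    source, target = (side.split() for side in template.split(" : "))
    bindings = match(source, tokenize(sentence))
    if bindings is None:
        return None  # the template does not fit this sentence
    out: list[str] = []
    for token in target:
        if is_slot(token):
            # Word-for-word step on the captured words, order unchanged.
            out.extend(dictionary.get(w.lower(), w) for w in bindings[token])
        else:
            out.append(token)
    return " ".join(out).replace(" ?", "?")


print(apply_template(TEMPLATE, "Est-ce que vous parlez français?", DICTIONARY))
# prints "Do you speak French?"
```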

For example, take the French sentence "Est-ce que vous parlez français?" Translated word for word, it means "Is it that you speak French?" Now suppose you had the template "Est-ce que vous *verbPhrase1* ? : Do you *verbPhrase1* ?". The translator would be able to see that "Est-ce que vous" followed by a verb phrase means "Do you (phrase)?", and it could detect that the verb phrase in the French sentence is probably "parlez français." There might be a second template for verb phrases: "*verb* *verbModifier* : *verb* *verbModifier*". If the words in "parlez français" could be used as a verb followed by a verb modifier, the translator could be fairly certain that it had found the right template. Because both sides of this template are the same, the word order within the phrase would stay unchanged. Once the computer was sure that it had worked out the phrase, these words would THEN be replaced by their English counterparts and inserted into the English half of the Est-ce template, giving the better translation "Do you speak French?"
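The nested version of this example can be sketched by giving each slot a type and letting templates build larger types out of smaller ones. Everything here is an assumption for illustration: the `LEXICON` of possible word types, the `TEMPLATES` table, and the convention that a trailing digit in a slot name (as in *verbPhrase1*) only distinguishes repeated slots. A slot accepts a run of words only if it is a single word of that type or matches a template for that type, which is how "parlez français" gets recognized as a verb phrase.

```python
from typing import Optional

# Hypothetical lexicon: possible types of each French word, plus its English counterpart.
LEXICON = {
    "parlez": ({"verb"}, "speak"),
    "français": ({"verbModifier", "adjective"}, "French"),
}

# Two-part templates, indexed by the type each one builds.
TEMPLATES = {
    "sentence": [("Est-ce que vous *verbPhrase1* ?", "Do you *verbPhrase1* ?")],
    "verbPhrase": [("*verb* *verbModifier*", "*verb* *verbModifier*")],
}


def is_slot(token: str) -> bool:
    return len(token) > 2 and token.startswith("*") and token.endswith("*")


def slot_type(token: str) -> str:
    # "*verbPhrase1*" -> "verbPhrase"
    return token.strip("*").rstrip("0123456789")


def translate_as(kind: str, words: list[str]) -> Optional[list[str]]:
    """Return English words if `words` can be read as `kind`, else None."""
    # Base case: a single word whose lexicon entry allows this type.
    if len(words) == 1 and words[0] in LEXICON:
        types, english = LEXICON[words[0]]
        if kind in types:
            return [english]
    # Otherwise try every template that builds this type.
    for source, target in TEMPLATES.get(kind, []):
        bindings = match(source.split(), words)
        if bindings is not None:
            # Slots already hold English words; lay out the target side around them.
            return [w for t in target.split() for w in (bindings[t] if is_slot(t) else [t])]
    return None


def match(pattern: list[str], words: list[str]) -> Optional[dict[str, list[str]]]:
    """Bind each slot to the English translation of a run of words, or return None."""
    if not pattern:
        return {} if not words else None
    head, rest = pattern[0], pattern[1:]
    if is_slot(head):
        for end in range(1, len(words) + 1):
            english = translate_as(slot_type(head), words[:end])
            if english is None:
                continue  # this run of words cannot fill a slot of this type
            bindings = match(rest, words[end:])
            if bindings is not None:
                return {head: english, **bindings}
        return None
    if words and words[0] == head:
        return match(rest, words[1:])
    return None


def translate(sentence: str) -> Optional[str]:
    words = sentence.replace("?", " ?").split()
    english = translate_as("sentence", words)
    return None if english is None else " ".join(english).replace(" ?", "?")


print(translate("Est-ce que vous parlez français?"))
# prints "Do you speak French?"
```

Because the type check is recursive, the same mechanism lets larger templates be assembled from smaller ones. One caveat of this sketch: a template whose source side is a lone slot of its own type would recurse forever, so a fuller system would need to guard against that.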

Of course, there would be some problems. Writing a template for every possible sentence would be impractical, so you would need a scheme in which a template for any sentence could be built out of smaller basic templates, which could in turn be built out of even smaller ones, and so on. Also, a sentence might be ambiguous, having more than one possible meaning: more than one template might fit it well, or a word might have more than one counterpart in the other language. You might be able to get around this with word associations and classes of words, teaching the computer that certain words work together better than others. For example, a serial port carries data, but sequential harbors usually do not. In this case, you would associate serial ports with words expressing data transfer and computer hardware, so that when the computer read that a port was sending or receiving data to or from a peripheral, it would understand that the kind of port found on the back of a computer is the one most likely meant. The computer would also have to learn "serial port" as a single noun, not just as an adjective followed by a noun.
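Both of these fixes can be sketched in Python under simple, hypothetical assumptions: a phrase list matched longest-first, so that "serial port" becomes one token, and a set of associated words for each sense of "port", with the sense chosen by counting how many of its associated words appear in the sentence. The names `PHRASES`, `SENSES`, `tokenize`, and `choose_sense` are invented for this example, and an overlap count is only one crude way to use word associations.

```python
from typing import Optional

# Hypothetical multi-word nouns that should be read as a single token.
PHRASES = {("serial", "port"), ("parallel", "port")}
MAX_PHRASE = max(len(p) for p in PHRASES)

# Hypothetical associations: for each sense of "port", words that tend to appear near it.
SENSES = {
    "port": {
        "harbor": {"ship", "boat", "dock", "sea", "sail"},
        "computer port": {"data", "sending", "receiving", "peripheral", "cable", "computer"},
    },
}


def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and join known phrases, longest match first."""
    words = [w.strip(".,?!") for w in text.lower().split()]
    tokens: list[str] = []
    i = 0
    while i < len(words):
        for length in range(min(MAX_PHRASE, len(words) - i), 1, -1):
            phrase = tuple(words[i:i + length])
            if phrase in PHRASES:
                tokens.append(" ".join(phrase))
                i += length
                break
        else:  # no known phrase starts here, so keep the single word
            tokens.append(words[i])
            i += 1
    return tokens


def choose_sense(word: str, context: list[str]) -> Optional[str]:
    """Pick the sense whose associated words overlap most with the context."""
    senses = SENSES.get(word)
    if not senses:
        return None
    # On a tie, max() keeps the sense listed first.
    return max(senses, key=lambda s: len(senses[s] & set(context)))


print(tokenize("The serial port carries data."))
print(choose_sense("port", tokenize("The port was sending data to a peripheral.")))
```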

You could make the project files open source, so that anyone who noticed a mistranslation could write a few templates to fix the problem and update the translation resources. Although the translations would probably always fall short of what a human could do, with enough advances in language processing they could become accurate enough for almost any use. (I'm sure the government would still want to double-check treaties and such.)