Sunday, October 08, 2006

Machine Translation and Multilingual Search

If Google vertical markets director Jeff Levick isn’t exactly sounding a clarion call for translation technology, he should be. From Andy Atkins-Krüger of
Jeff pointed to the fact that there are twice as many Chinese speakers in the world as English – not all the world speaks English. And there are great information resources in Chinese and Arabic, he said.
Recognizing the need for increased development of translation tools is a far cry from indicating the direction Google’s research and development team is going to take as they augment their current search tools. Still, it’s quite possible that the demand for multilingual search is going to be an engine for translation research – one which Google in particular is well-placed to direct, and to benefit from.

Let’s imagine that I’m an Internet user in need of information about eels. It could happen. I’m a practiced searchologist, so I vary my queries: ‘edible fish’, ‘conger’, etc. But if I don’t know the Portuguese word for eels, no variety of English-language concepts and synonyms is going to call up all the Portuguese-language content pertaining to the prominence of that slimy fish in that nation’s cuisine. Imagine if my English-language search query could generate results drawn from all online content, without regard to the language of composition. Since a page-indexing bot doesn’t know its Mandarin from its Catalan, this is not a technically difficult feat. Web indexing software is blind to meaning; the search string a user inputs is utterly meaningless to the search engine, since search is a matching operation and not a comprehending one. The first steps might be the compilation of cross-referenced databases in which potential search terms in the input language are related to corresponding terms in other languages. Imagine if the search engine algorithm could automatically convert my search string into multiple searches in different languages… simultaneously scouring the Web for references to gia cam, poulet, ayam muda, and chicken, returning results posted online in Vietnamese, French, Malay, and English. The search engine operator would greatly expand the breadth of its search results as well as the size of its potential user population, and the user would benefit by gaining access to a wider swath of the world’s available information.
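For the curious, here is a minimal sketch in Python of the cross-referenced term table and query expansion imagined above. Everything in it is a made-up illustration: the names TERM_TABLE, DOCS, expand_query, and search are hypothetical, the "index" is five toy strings, and the matching is plain substring comparison, which is not how any real search engine works. The point is only that the lookup never needs to understand a word of what it matches.

```python
# Hypothetical cross-referenced term table: English term -> equivalents by language code.
# The chicken entries are the examples from the post; a real table would be enormous.
TERM_TABLE = {
    "chicken": {"vi": "gia cam", "fr": "poulet", "ms": "ayam muda"},
}

# Toy stand-in for an index of web pages (invented sample text).
DOCS = {
    "doc1": "Roast chicken with lemon",
    "doc2": "Poulet rôti au citron",
    "doc3": "Mon gia cam nuong",
    "doc4": "Resipi ayam muda goreng",
    "doc5": "Enguias fritas à portuguesa",
}

def expand_query(query, table=TERM_TABLE):
    """Turn one English query into one query per language found in the table."""
    key = query.strip().lower()
    expanded = {"en": key}
    expanded.update(table.get(key, {}))
    return expanded

def search(query, documents, table=TERM_TABLE):
    """Return (doc_id, language) for each document containing any expanded term.

    This is pure string matching; nothing here comprehends the text.
    """
    terms = expand_query(query, table)
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        for lang, term in terms.items():
            if term in lowered:
                hits.append((doc_id, lang))
                break  # one hit per document is enough
    return hits

if __name__ == "__main__":
    print(search("chicken", DOCS))  # pages in four languages
    print(search("eel", DOCS))      # nothing: the table has no entry for "eel"
```

Note what happens with the eel query: the Portuguese page is sitting right there in the toy index, but without a table entry linking the English term to its Portuguese equivalent, the search comes back empty. The matching is the easy part; the table is where the work lies.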

Of course, a search result in Vietnamese might not be immediately helpful for the English-speaking user. Hence the role multilingual search plays in instigating development of machine translation tools. If my search for chicken returns a Vietnamese-language page as the top result, I know that the information is desirable… that’s the job of the search engine itself, to return the most relevant results. But I can’t read it. If the search engine operator wants to attract and keep users, it’s going to be interested in adding a machine translation function to its page results. Google currently offers this option, but there is tremendous room for improvement.

John Dvorak is equally dissatisfied with current online translation services. Writing for PC Magazine:
…if computers can play a credible world-class game of chess, then they should be able to translate complex sentences written in the world’s major languages. … Exactly what’s the hang-up? … The computer revolution began a half-century ago. We should have been able to solve this problem by now.
The rules of language translation are orders of magnitude more complex than chess; it’s rather misleading to point to the success of chess-playing programs as evidence that “private industry can’t seem to manage” the resolve needed to provide reliable machine translation. Setting that small confusion aside, it’s easy to sympathize with Dvorak’s complaint that research toward improved machine translation has been under-funded. Neither Systran, nor Babelfish, nor WorldLingo is yet capable of producing reliable idiomatic translation. It’s a difficult problem, and progress is being made all the time. But maybe we should look for a silver lining in the otherwise inconvenient lag between demand for and delivery of machine translation. The persistence of problems in even the best machine translators keeps the pressure on for foreign language study… users still can’t relax, and need to have sufficient competence in the source language to be able to vet their translation results. This is a point lost on Dvorak, whose French “has been in decline since 1973.” While I’m personally eager for a time when I’ll be more easily able to browse foreign language content online, I can’t rally behind a critic who couldn’t be bothered to keep his own skills up to par. Let’s not be so complacent as to advocate machine translation without also pushing the case for multilingualism. Damn monoglots.

A quick recap: since an increasing portion of information online is in a language other than that of the user, multilingual capacities in both search and machine translation are increasingly in demand. Search is uncomprehending, and therefore the easier problem to tackle, but the better multilingual search becomes, the greater the demand for machine translation. The two functions are inextricably linked, and interdependent. Google is well-placed to capitalize on the market potential of superior translation services… how exciting if its corporate leadership were to recognize that translation is a natural companion to its book Library, Database, Scholar, and Search initiatives! To name just one benefit, the working literary translator would have an immensely easier time identifying resources for his or her work, just as the eel enthusiast would finally be directed straight to the webpages of eel aficionados based in Portugal. Sunny days ahead.
