By Nicola Cancedda, Principal Scientist, Xerox Research Centre Europe
I read with interest the post by Kate Dobbertin Bernola on the value of automated translation features on social networks and web browsers. I share her excitement for technology that helps lower language barriers… to the point that I made it my job!
Bing Translator and Google Translate are excellent examples of powerful automated translation tools. Do you know how they work? As a side effect of their web indexing operations (Bing and Google run search engines, after all…), they collect large amounts of “parallel data”: text in one language together with its translation in another. Many websites publish localized content: depending on your choice, or on where you connect from, they display their content in your language. It is relatively easy for Google and Microsoft to detect such sites and collect their content. They then run sophisticated statistical algorithms and build big tables in which sequences of words in one language are associated with sequences in the other. Think of a huge dictionary whose entries are not words but sequences of words, so that it can capture, for instance, that the word “jobs” has a different translation when it appears in “Steve Jobs,” “print jobs,” or “Government jobs.” When you ask for new text to be translated, these systems consider all the possible ways of combining the matching fragments, using statistics from the stored data as guidance. Everything is wonderfully engineered to give you the translation in a matter of seconds.
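To make the “dictionary of word sequences” idea concrete, here is a minimal sketch in Python. The phrase table, its English-to-French entries and its probabilities are all hypothetical, made up for illustration. Real systems learn millions of entries from parallel data, also score fluency with a target-language model and allow words to be reordered; this toy does none of that. It only tries every way of cutting the sentence into known fragments, left to right, and keeps the combination with the highest product of probabilities.

```python
import math

# Hypothetical toy phrase table: source phrase (tuple of lowercase words) ->
# list of (target phrase, probability). The numbers are invented for illustration.
PHRASE_TABLE = {
    ("jobs",): [("emplois", 0.6), ("tâches", 0.3), ("Jobs", 0.1)],
    ("steve",): [("Steve", 0.9)],
    ("steve", "jobs"): [("Steve Jobs", 0.95)],
    ("print",): [("imprimer", 0.7), ("impression", 0.3)],
    ("print", "jobs"): [("travaux d'impression", 0.9)],
    ("government",): [("gouvernement", 0.8)],
    ("government", "jobs"): [("emplois publics", 0.85)],
    ("cancel",): [("annuler", 0.9)],
    ("the",): [("les", 0.5), ("le", 0.5)],
}

MAX_PHRASE_LEN = 3  # longest source fragment we look up


def translate(sentence: str) -> str:
    """Monotone search over all segmentations into known phrases.

    best[i] holds (log-probability, target phrases) for the best
    translation of the first i source words.
    """
    words = sentence.lower().split()
    n = len(words)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_PHRASE_LEN), end):
            prev_score, prev_out = best[start]
            if prev_score == -math.inf:
                continue  # words[:start] cannot be covered by known phrases
            src = tuple(words[start:end])
            for tgt, prob in PHRASE_TABLE.get(src, []):
                score = prev_score + math.log(prob)  # log of the product of probabilities
                if score > best[end][0]:
                    best[end] = (score, prev_out + [tgt])
    if best[n][0] == -math.inf:
        raise ValueError("no segmentation covers the whole sentence")
    return " ".join(best[n][1])


if __name__ == "__main__":
    for s in ["jobs", "Steve Jobs", "print jobs", "Government jobs", "cancel the print jobs"]:
        print(s, "->", translate(s))
```

With these made-up numbers, “jobs” on its own comes out as “emplois”, while the longer entries win for “Steve Jobs”, “print jobs” and “Government jobs”: one confident two-word entry beats two less certain one-word entries.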
Google Translate and Bing Translator don’t know what users will ask them to translate. Facebook posts can be about anything, and full web pages even more so. These generic systems counter this uncertainty by absorbing translation examples from as many different sources as possible, but this also means that every time they translate, they risk misinterpreting the context and choosing the wrong translation among the many available.
But what if you know in advance that what you are translating is, say, email messages to the customer care centre at Xerox? Or job announcements on Monster.com? Do you still need to weigh all those potential translations of “jobs” equally, as in the examples above? This is precisely what our team specializes in. We develop algorithms similar to those used by Google and Microsoft, but with the objective of deploying automated translation in business processes where you know something about what will be translated. This often yields a level of accuracy that, while never perfect, is good enough for someone who does not know the language of an incoming message to complete the task it triggers, even in cases where a “generic” tool would not be adequate. Our tools are compatible with most business environments. For instance, we have a special email server that translates all the email it receives into the language of the user: you can then view the translation with the email tool of your choice (e.g. one integrated in a Customer Relationship Management tool, if this is what your business process calls for). We also have a software tool that sits on your desktop and automatically translates whatever is sent to the clipboard: just Ctrl-C a snippet from any application and its translation appears on screen. If what you copied is similar in topic and style to what the translation system has been trained on, you will get a high-quality translation.
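One common way to exploit knowing the domain in advance is to blend a phrase table learned from in-domain examples with a generic one by linear interpolation. The sketch below assumes this technique purely for illustration; it is not necessarily how the Xerox systems described here work, and the tables, the printer-support domain and the weight `lam` are all hypothetical.

```python
# Hypothetical generic table, as if learned from mixed web data.
GENERIC = {
    "jobs": {"emplois": 0.6, "tâches": 0.3, "Jobs": 0.1},
}

# Hypothetical in-domain table, as if learned from printer customer-care
# emails, where "jobs" nearly always means print jobs.
PRINTER_SUPPORT = {
    "jobs": {"travaux": 0.9, "emplois": 0.1},
}


def interpolate(in_domain: dict, generic: dict, lam: float) -> dict:
    """Blend two tables: p(t|s) = lam * p_in(t|s) + (1 - lam) * p_gen(t|s)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    merged = {}
    for src in set(in_domain) | set(generic):
        p_in = in_domain.get(src, {})
        p_gen = generic.get(src, {})
        # A target missing from one table contributes probability 0 from that table.
        merged[src] = {
            tgt: lam * p_in.get(tgt, 0.0) + (1.0 - lam) * p_gen.get(tgt, 0.0)
            for tgt in set(p_in) | set(p_gen)
        }
    return merged


def best_translation(table: dict, src: str) -> str:
    """Return the most probable target for a source phrase."""
    candidates = table[src]
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for lam in (0.0, 0.7):
        table = interpolate(PRINTER_SUPPORT, GENERIC, lam)
        print(lam, best_translation(table, "jobs"))
```

With `lam=0.0` only the generic table counts and “jobs” becomes “emplois”; with `lam=0.7` the in-domain evidence dominates and “travaux” (as in print jobs) wins. In practice the weight would be tuned on held-out in-domain text.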
Look around you: where in your work could you use a fast (if imperfect) translation? I would be happy to know!
I agree with and support all of the above, especially having had the opportunity some time ago to test our Xerox MT engine and benchmark it against the MS solution.
Just wanted to add that we should always bear in mind that sensitive (customer) content might not always be suited to ‘public domain’ MT services, due to confidentiality or other contractual constraints ;-)
Data translated online will stay in the MT provider’s corpora, open to everyone.