A research project is making headlines for finding that machine translation engines, like Google Translate, produce translations that introduce bias or ruin the source text’s intended meaning. This is a good time to call out a few points and then talk about how Lingotek can help. First, a quick summary of the story.
The Problems of Bias and Meaning in Machine Translation
Researchers delivered a paper at a recent conference workshop on Natural Language Processing (NLP) in Africa. Their research highlights concerns about some of the output created by machine translation (MT) engines, Google’s in particular. These concerns center on bias and meaning in translation, and they extend to all languages, not just the ones found predominantly in Africa. The research finds that MT engines might introduce biased language based on gender, race, or ethnicity where none previously existed. This is not news in the MT world, and Google, Microsoft, and other smart people in the field are working toward solutions. The researchers also confirmed that MT engines can struggle to keep a text’s meaning in the produced translations. And by “struggle”, we mean they can “completely reverse the intended meaning”. Unclear writing styles, contronyms, archaic terms, and other sources of ambiguity in the original text force the MT engine to “guess” at meaning. The resulting translations can range from embarrassing to controversial.
The “bad guy” in this story isn’t so much Google as it is the entire field of MT. Google just happens to be the biggest target and, to its credit, makes this sort of research easy to do. All MT engines have shortcomings in how they handle specific source-to-target language pairings, domain-specific content, or both. And, yes, meaning can suffer when text is processed by an MT engine. Extracting meaning from text is hard enough for humans. That a series of computer algorithms can come even close is a tremendous accomplishment.
Tremendous accomplishment or not, the issues brought up in this paper are very important if you depend on MT for translation. Should you avoid using machine translation because of these issues?
Plan Before Using Machine Translation
I am not going to argue against using MT. Using an inexpensive technology to reach underserved markets and stretch your translation budget is an easy choice to make. I will, however, always recommend that you have a plan when you use MT. Think about your content on a continuum and do some basic classification.
At one end of the continuum is your most important content. A professional translator should always be engaged if there are clear legal, financial, or health and safety implications to a mistranslation. Also, if the source text is complicated, ambiguous, or dependent on metaphor and cultural references, skip the MT and go for a pro.
On the opposite end of the continuum, there are large volumes of content that are ripe for machine translation. Product documentation, knowledge base articles, and community-contributed content are all good candidates. A minor mistranslation isn’t going to ruin anyone’s day.
Somewhere in the middle, there’s content that’s important to translate well, but you have trouble justifying the budget for full professional translation. Consider having a professional post-edit the MT output. It can sometimes be a great way to get decent quality at a lower cost.
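If it helps to see the continuum written down, here is a minimal Python sketch of the classification described above. It is an illustration only, not a Lingotek feature or API: the `ContentItem` fields, the `Workflow` names, and the `choose_workflow` function are all hypothetical, and your own plan will likely need more nuance than three yes/no flags.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    """The three points on the continuum (names are illustrative)."""
    PROFESSIONAL = "professional translation"
    MT_POST_EDIT = "machine translation with professional post-editing"
    MT_ONLY = "machine translation only"


@dataclass
class ContentItem:
    # Hypothetical flags you would set while classifying your content.
    name: str
    high_risk: bool = False         # legal, financial, or health and safety implications
    ambiguous_source: bool = False  # complicated, ambiguous, or reliant on metaphor and cultural references
    important: bool = False         # needs to read well, but budget for a full pro translation is hard to justify


def choose_workflow(item: ContentItem) -> Workflow:
    """Pick a translation workflow for one piece of content."""
    # Most important content: always engage a professional translator.
    if item.high_risk or item.ambiguous_source:
        return Workflow.PROFESSIONAL
    # Middle of the continuum: MT output, post-edited by a professional.
    if item.important:
        return Workflow.MT_POST_EDIT
    # High-volume, low-risk content: plain MT.
    return Workflow.MT_ONLY


if __name__ == "__main__":
    items = [
        ContentItem("Terms of service", high_risk=True),
        ContentItem("Product landing page", important=True),
        ContentItem("Knowledge base article"),
    ]
    for item in items:
        print(f"{item.name}: {choose_workflow(item).value}")
```

The order of the checks is the design choice that matters here: risk and ambiguity are tested first, so content with legal or safety implications can never fall through to plain MT just because it is also high volume.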
Let Technology Help
Once you have your content plan, you will need to execute on it. Fortunately, a number of content platforms have connectors into Lingotek’s Translation Management System (TMS) and use it to drive quality throughout the localization process. The TMS can automate the process of getting your content to the right MT engine, professional translator, reviewer, or mix of all three. The Lingotek TMS already has direct support for many of the best-known MT engines. For more specialized MT scenarios, we now integrate with Intento Hub so you can centralize the choice of MT engine and get the best translation from more than 30 different MT engines.
There’s one more thing to consider. Most MT engines provide a choice of usage models – one that is free with basic features, or one that is tied to a paid subscription. The paid versions will have additional features and, for some engines, will let you train an instance of the engine with your domain-specific content. This should lead to better translation quality than you get from the standard offering. That is the case with Google’s Translate and AutoML products, Microsoft Translator, DeepL Pro, Amazon Translate, and pretty much any other MT engine provider with an actual business model.
No existing machine translation engine is a complete replacement for professional translators. Language is infinitely malleable and constantly changing. Usage patterns and cultural norms evolve, new words are born, and new meanings attach to existing words. That matters when translated content needs to have the same meaning and personal impact as the original text. You can still get useful results from a machine translation engine, though. Plan carefully, leverage any customization features you can, and keep your expectations reasonable. Use professional translators for the content that matters most. And, of course, take advantage of a good translation management system to automate content processing and quality assurance and to take control of the entire localization process.