The perfect translation system, be it human or machine, does not exist. But the dream of something like the Babel fish from the Hitchhiker's series or the universal translator on Star Trek haunts us, and it might go something like this.
Your personal computer will have a translation module, maintained from some central database created by the publisher of the system. When email comes in, it will automatically and almost instantly be translated into whatever language you desire (presumably your native tongue). When you send email, it will be translated into whatever language you choose. You will be able to configure it so that when email goes out to Japan, it is translated into Japanese, when it goes to France, it is translated into French, and so on (or you can configure it on a person-by-person basis, giving consideration to the linguistic skills of individuals).
Similar systems will exist for businesses, but they will be faster and more comprehensive. A book will be scanned into a computer and rendered into another language in a matter of minutes. The computer might even attend to the graphics and desktop-publishing tasks, assuming you want it to. The finished translation will need the same amount of editing and proofreading that any piece of writing does, which is to say, a lot.
Interpretation will work the same way. Your phone company will provide, for virtually nothing, a system that lets you talk to anyone in any language. You call Japan and speak to Mr. Tashima. You say what you want in English and he hears it in Japanese. He says what he wants in Japanese and you hear it in English. Court, medical, and conference interpreting will work in basically the same way. People will have small devices, like hearing aids, which will pick up the incoming language and convert it into the listener's native tongue. These devices will also use noise-cancellation technology to remove any interfering sounds so that you hear only the interpretation.
A box on your television, or perhaps inside it, will provide instant interpretation or subtitles of foreign films and television broadcasts. You will flip to one of the more than 500 channels you have and see a program which looks interesting, and the system will provide instant interpretation of the dialog.
Furthermore, small devices the size of a pocket calculator will read things for you. Point one at a menu, a street sign, or a newspaper and it will scan the page, translate it, and then either display the translation on a small screen or read it to you.
Such technology would make communication with anyone anywhere possible. You could travel in remote parts of Tibet and speak with the locals and read their language. You could walk into a conference and listen to an interpretation of the speaker given by a machine which never tires or loses interest in the task. You could go to a doctor or hotel or restaurant anywhere and communicate everything you need to, be it verbally or in writing.
Can It Be Done?
This is really two questions. One: Is machine translation possible in theory? Two: What will machine translation be like in practice within the next ten to twenty years? The former question seems not to be asked much, if at all, except in certain research laboratories. The latter question seems very much on the minds of translators and others in the translation industry, if only because of the profound financial impact the answer to the question will have.
The first question, whether or not machine translation is possible in principle, might seem impossible to answer. Or perhaps you think that the answer has to be assumed to be negative until proven otherwise; in other words, it ain't possible until someone does it. But given that machine translation, unlike breaking the four-minute-mile barrier, will involve hundreds or thousands of people working for years or perhaps even decades and spending billions, possibly trillions, of dollars in the effort, a little theory seems like a good idea.
The arguments against machine translation being possible seem to run something like this. Language is too subtle and complex for a computer to understand and translate. There are just too many variables to consider in any given sentence. Linguistic communication relies too heavily on context and intonation, on body language and cultural underpinnings, to be handled by a computer. Computers will never be fast enough or powerful enough to deal with the immense requirements of language translation. Computers will have to understand what they read in order to translate, and therefore will have to be sentient themselves, in some fashion similar to what we humans experience as self-awareness. And perhaps the most fundamental argument against machine translation lies in the question of whether or not the human brain is capable of actions and behaviors that cannot be reduced to algorithms.
Fair enough, all good arguments. But the argument for machine translation being possible in theory is sufficiently powerful and compelling to overcome all the above arguments against it. In simple terms, the argument for machine translation goes like this: "If that three-pound piece of meat in your head can do it, why not a hunk of technology?" In essence, the proof that machine translation is possible in principle is sitting in every translator's head. That three-pound pulpy grayish mass that we call the brain allows a translator to translate. A brain is an organic machine consisting of roughly one hundred billion cells, neurons and glial cells, each with a multitude of connections to other neurons, communicating chemically with one another through synapses whose activities are modulated by neurotransmitters. Regardless of how little is actually understood about the brain, and regardless of the obvious deficiencies of my description of it above, the brain remains a finite object capable of only a finite number of actions. As such, the brain can be considered a machine, or if you prefer a less mechanistic metaphor, a piece of organic technology, which can in principle be understood and reproduced. And so a computer that translates as well as a human translator is in principle possible.
But So What?
What does the argument above really imply for the future? In other words, just because something is possible in principle doesn’t mean we’ll be able to do it in practice, at least not in the near term. Or maybe we will.
First I want to dispense with a few preconceptions and protests that are probably percolating in your mind. One, computers are plenty fast nowadays. I don't mean the little box sitting on your desk or lap, which is in and of itself powerful in many ways but equally limited. I mean the chips that are currently on the drawing boards for the next generation of supercomputers. If Moore's Law holds for even fifteen more years (Moore's Law refers to the trend of doubling the computational capacity of chips every eighteen months), and as a technical translator who does a lot of work in computer science and electrical engineering I can say with some confidence that the research community believes it will, then we will have a computer chip whose speed and capacity are functionally equivalent to the human brain's by 2025 at the latest. Similarly, the cost and capacity of various types of memory are improving far faster than most home users can find uses for, though web servers rapidly eat up even terabytes of storage. Finally, the kind of parallel processing that gives supercomputers much of their power is becoming more and more common at the consumer level, so even if Moore's Law places an upper limit on the performance of an individual chip, a group of chips tied together, making full use of terabytes of RAM and other high-speed memory arrays, should easily equal the raw power of the human brain within fifteen years.
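The arithmetic behind that projection is easy to check. Here is a minimal sketch, where both the circa-2000 chip speed and the brain-capacity figure are assumed values for illustration (real estimates of the brain's raw throughput vary by orders of magnitude, and the answer is very sensitive to the starting figures):

```python
import math

DOUBLING_YEARS = 1.5    # Moore's Law: capacity doubles every 18 months
CHIP_OPS_2000 = 1e9     # assumed: ~a billion operations/sec for a top chip in 2000
BRAIN_OPS = 1e14        # assumed: one common estimate of the brain's raw throughput

# How many doublings bridge the gap, and how long they take at 18 months each.
doublings = math.log2(BRAIN_OPS / CHIP_OPS_2000)
year_reached = 2000 + doublings * DOUBLING_YEARS
print(f"{doublings:.1f} doublings, reached around {year_reached:.0f}")
```

Under these assumptions the gap closes in roughly seventeen doublings, landing right around 2025.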
Enough of the technical stuff. That’s not, you might say, where the problems really lie. They reside instead in the nature of language, in the intricacies and subtleties of written and oral communication, in the nuances of a person’s voice or the subtext in a well-written paragraph. Accurate enough, to varying degrees, but rarely relevant to the vast majority of what is being translated in the world these days.
Most of what is translated in our industry is not high literature destined to be awarded Nobel or Pulitzer prizes. Rather, the majority of material that translators work on is information, ideas, or beliefs on a particular subject, and most often the material is nothing more than instructions, directions, or explanations, with a minimum of style or literary content. The material is generally bland and dry: software or hardware manuals, engineering specifications, scientific or other technical research material, financial or corporate reports, fiscal analyses, clinical trial reports, patents, and so forth. Accurately rendering the subtle style of a source text is rarely an issue that translators struggle with, or even discuss much amongst themselves. So if current human translators don't have to deal with the subtleties and nuances of well-written literary prose, then neither will the machine systems.
As an aside, let us keep in mind that literary translation is an area of endless debate among literary translators; the sheer number of versions of literary classics amply demonstrates this. That machines may not in the foreseeable future tackle such material is not relevant to this discussion; instead it should be remembered that even humans have difficulty ferreting out the intended meaning in a sentence written by a literary master. What's more, that meaning will change with both the reader and the times. Literary theory and literary analysis are dedicated to such issues; the fact that these are fertile fields for endless exploration suggests that people aren't quite sure what to make of fiction like James Joyce's Ulysses, to pick a particularly intractable text. I am certain that computers will eventually try their electronic hand at rendering the Mahabharata or the writings of Chuang-tzu into English, and I look forward to studying the results.
Back to the topic at hand though. What MT systems will work on represents a fairly particular subset of the world’s written output. Not only does written language spare the MT system from having to deal with intonation or body language, but the kind of writing commonly translated in the translation industry at present is generally more carefully structured and reasoned, freer from grammatical and syntactic errors, less liable to contain slang, neologisms, or spur-of-the-moment coinages, and more precise in terminology usage than spoken language, even on the same subject, would be.
Finally, the MT system may not even have to understand what it is translating. I say this for two reasons. First, translators occasionally, and almost exclusively amongst themselves, talk about how little they understand of some of the material they work on. They of course can follow the gist and usually much more, but they also know, at least deep down, that they probably do not have the same in-depth understanding that the specialist or expert who wrote the material has. This can occur with material as simple as a business letter, in which the topic of the letter is understood between both parties but not known to the translator, or material as abstruse as an ethical commentary on organ transplantation and brain death.
Second, and most important, computers nowadays are performing on par with humans in more and more complex tasks. The canonical example is chess. You are doubtless aware that Deep Blue defeated the world chess champion Garry Kasparov in a recent match. Kasparov felt it would never happen, until it did, that is. He even commented after the match that at times there seemed to be an intelligence behind Deep Blue's decisions, that the computer became more cautious at one point in one game. Of course he, and all observers, know that no such thing happened. And despite the considerable accomplishment that Deep Blue represents in combining dedicated hardware with expert-system-style programming, Deep Blue is neither conscious nor intelligent in the human sense of those words. To put it another way, after the match, Kasparov made many insightful and thoughtful remarks when asked about his experience. In contrast, if anyone bothered to ask Deep Blue a question, I'm certain the reply was silence. And it is more than doubtful that Deep Blue has any particular plans for its prize money, or any desire one way or another to play chess again.
The point is that tasks which require considerable intellectual achievement for humans can be performed using different methods by computers. Whether or not translation is one such task remains to be seen. In other words, do we need to create a sentient, intelligent computer, then teach it to translate and hope after its training it wants to translate, or can we build a sophisticated expert system, a Blue Linguist if you will, that translates as well as a human does, despite using completely different internal methods? This question will be answered in part in the various R&D labs around the world working on MT. And it will be answered in part by the market.
In other words, if the translation is good enough, translation consumers will not care who or what translated it using which method. So the real question for MT in essence becomes: what is good enough?
Good enough means acceptable to those who want the translation. Consider this: a company wants all the specifications for an automobile translated from English into French, Spanish, German, Italian, Dutch, Portuguese, Chinese, and Japanese. The specifications total over 5,000 pages, approximately 1 million words. Assume that a translator can do 5,000 words per day (I realize this is high, but assume it anyway). It will therefore take 200 days of work to produce the translation. A team of ten translators will still take 20 days, plus the time to unify the text after the translators are finished. At $0.25 per word (what the agency might charge the automobile company), the total cost would be $250,000, and these numbers apply to each language involved. If instead a machine system can translate the information at 20,000 words per hour, the job might be done in about 50 hours, a little over two days, plus clean-up time. And the computer plus software will cost considerably less, maybe $3,000 for the computer and $4,000 for the software for each language pair.
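The arithmetic in that scenario can be laid out explicitly; this sketch simply restates the assumptions from the paragraph above, figures which are of course illustrative rather than quoted from any real project:

```python
WORDS = 1_000_000        # roughly 5,000 pages of specifications
HUMAN_RATE = 5_000       # words per translator per day (optimistic, as noted)
TEAM_SIZE = 10
PRICE_PER_WORD = 0.25    # USD per word, the agency's assumed rate
MT_RATE = 20_000         # words per hour for the hypothetical machine system

days_one_translator = WORDS / HUMAN_RATE      # 200 days for a single translator
days_team = days_one_translator / TEAM_SIZE   # 20 days, before unifying the text
cost_per_language = WORDS * PRICE_PER_WORD    # $250,000, per language
machine_hours = WORDS / MT_RATE               # 50 hours, a little over two days
```

Multiply the human cost by the eight target languages and the gap the machine has to close on quality, rather than price, becomes obvious.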
But, you say, the translation won't be as good. I agree, at least based on current software and technology. However, let us recall that quality is only one of many factors in a market economy, and the most important factor is embodied in that old epigram: time is money. Recall that this statement really means that speed is money. The faster the better. The sooner the product hits the market, the sooner the company recoups its investment. The lower the investment, the better.
So we have a case of the classic cost-benefit ratio. The real question, therefore, is: at what point does the quality of a translation become more important than the cost or time involved? If the machines are 200 times faster, 1,000 times cheaper, and produce reasonably accurate and intelligible translations, they will get most of the work. And although they have not reached this state yet, it seems clear, given current technology and progress, that the time is not far off when they may well be there, at least for certain categories of translation.
For an excellent study of the cost/benefit ratio of current MT and MAT systems, I strongly recommend Lynn Webb’s Master’s Thesis on the subject, available at www.webbsnet.com. I hope Lynn will be able to keep her research current as the technologies she evaluated develop.
Some people claim, rather strangely, that machine translation is possible but machine interpretation is not. I disagree. Interpretation deals with the spoken language, a fundamentally simpler form of language than the written language. There are three issues that will tax MI systems: the non-verbal communication that accompanies speech, voice processing and synthesis, and the general sloppiness of spoken language.
(Please note that although speech-to-speech MT is a common way to refer to machine-driven interpretation systems, I prefer MI not only because it is a more compact term, but also because it serves to remind us of the important linguistic distinctions between translation and interpretation.)
The first issue will not be as important as many people might think. A speaker at a large conference, for instance, does not rely much on body language to communicate, simply because most viewers are not close enough to benefit from it. In fact, many speakers at conferences are really just reading prepared speeches, changing the issue from machine interpretation to machine translation (of course, the machine has to be aware of deviations from the prepared text, just as a human interpreter does). Witnesses in court are trained by lawyers to avoid body language, so that the jurors will pay attention to the words only. And when body language is important, humans have a great deal of trouble, given how varied and complex each person’s use of such non-verbal communication is. So the computers will have the same problems the humans do.
The second challenge is being met as I write this. We've all seen and heard about voice-input software such as Dragon Systems' NaturallySpeaking or IBM's ViaVoice. Both work reasonably well without taxing a mainstream home or business system. It is not difficult to imagine such software becoming virtually 100% accurate (or at least as accurate as a human listener, perhaps more so) within a few more generations. The same holds for speech synthesis. I've been listening to my Macintosh for years now, having it read back material I have written so that I can edit by listening to a disinterested reader (and trust me, the computer is completely neutral). The available voices are admittedly synthetic-sounding and frequently tinny or disturbingly flat, but they are improving. An acceptable synthesized voice seems likely within a few years. If you want a sample of the improvements in this area, listen to the Web newscaster Ananova (www.ananova.com). This virtual woman reads the day's news headlines in a generally acceptable voice, though at times her pronunciation does sound decidedly computer-like.
The third problem, the general sloppiness and imprecision of human speech, will be a challenge only insofar as the computers are not as accurate as people are. When queried about the meaning of an ambiguous or obscure statement, most people will admit that they hadn’t thought much about it, but now that they do, they realize they can’t be certain as to the intended meaning. How exactly MI systems will address such challenges, perhaps by reproducing the ambiguities, querying the speaker (if possible, and note that when querying is possible, that is what human interpreters do), or simply paraphrasing the statement based on a best-effort guess, remains to be seen. I suspect though that MI systems will in time become sufficiently accurate to be practical.
There is a final problem, one not often discussed when MT, particularly MI, is mentioned. This is the psychological element. Even if we have a lab-tested, government-approved, U.N.-certified MI system, it may still not be adopted for quite some time. People may simply not accept it. I’ve seen Japanese people struggle with the idea that I can speak the language fluently, and some I knew during my years in Japan never quite accepted it. Given that kind of attitude, and it is prevalent among many languages and cultures in the world, machine interpretation systems may not be warmly greeted, at least not initially. So their first appearances may be in situations in which we the users will not realize machines are doing the work instead of humans, such as in telephone communications when making airline or hotel reservations or getting technical support for software, or perhaps for international operator assistance. Eventually such systems will be accepted, I think, if only because people ultimately accept anything that makes life easier.
The State of the Art
So, you say, this is all well and good, but none of it is going to happen for a long time. Perhaps not even for centuries. We’ll all be long dead, or at least retired, before a computer can do anything useful with language or in translation. Maybe, but a review of where the MT/MAT industry is now seems in order.
The pace of change in computing is enough to give a seasoned funambulist vertigo. The original PCs, including the TRS-80 (with 4K of memory, no hard drive or floppy drive, and no operating system per se), the Commodore 64, the Apple II, and so on, were less powerful than the average Casio BOSS or Sharp Wizard today, to say nothing of the current 3Com PalmPilots, which effectively represent more computing power than Apollo 11 had at its disposal. The first PCs based on the 8086 and then the 286, introduced in the early 1980s, were brain-dead machines even back then. For the past eight years, we've seen CPU processing speed double every 18 months as per Moore's Law, hard disk storage space double every two years, and the arrival of peripherals such as CD-ROM drives, DVD drives, scanners, and laser printers, which only ten years or so ago were either dreams or ghastly expensive technologies.
The processing power and storage capacity to handle incredibly large and complex tasks is available, or will be soon. This means that brute force becomes more and more viable as an approach to problems that at present resist elegant computational solutions. Brute force more than anything else let Deep Blue defeat Kasparov, and though chess is hardly as complex as language, the result suggests that what seemed for centuries to represent a pinnacle of human intellectual achievement can be performed without an iota of thought as we know it, merely with virtually inconceivable amounts of raw processing power.
In addition, I think we forget the extent to which human-like computing has already started to enter our lives. We now have voice-driven phone systems in which you state your preferred selection aloud and the system processes it. Admittedly these systems are crude and nowhere remotely near providing real-time online translation, but they indicate that what once seemed an insurmountable problem, that of voice recognition and synthesis, is falling by the wayside.
Similarly, optical character recognition, the solution to getting texts into computers, is now extremely fast and accurate. What’s more, you can buy a little pen dictionary that has a built-in scanning head at its tip. Run it over a word you need to look up, and the dictionary will then display the definition on a small LCD screen built into the shaft of the pen. Again, very limited compared to the demands of true MT, but suggestive nonetheless.
Current MT products, including PowerTranslator, Transcend, Logos, and others, have a limited capacity to provide useful translations. Some translators disparage these products' output as nothing more than word salad, and for polished work that judgment is often fair; for informational purposes, however, the results are frequently usable, if inelegant. Moreover, if the text to be translated is limited in terms of style, usage, and terminology, and is put through a preparatory editing process, then the results may be sufficiently good that with some, or arguably considerable, post-editing, the final translation could be printed and distributed with no fear of rejection.
Regardless of the limited scope of application for current MT software, such technology is slowly improving and will eventually, I think, be capable of providing usable translations for general consumption. Long before that happens, though, machine-assisted translation (MAT) technology will revolutionize the translation industry.
Currently MAT is in its early childhood. The most sophisticated systems are still little more than elaborate databases with version control features for preparing and monitoring document translation, terminology and glossary management functions, and some fuzzy logic for finding good matches for text that has not actually been translated yet.
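The "fuzzy" matching at the heart of these systems can be sketched in a few lines: score a new sentence against previously translated segments and surface the stored translation of the closest match above some threshold. The memory contents, the 0.75 threshold, and the use of a plain character-level similarity measure here are all illustrative assumptions, not a description of how any particular product works:

```python
from difflib import SequenceMatcher

# A toy translation memory: source segments paired with their past translations.
memory = {
    "Press the power button to start the device.":
        "Appuyez sur le bouton d'alimentation pour démarrer l'appareil.",
    "Remove the battery before cleaning.":
        "Retirez la batterie avant le nettoyage.",
}

def fuzzy_match(sentence, threshold=0.75):
    """Return (score, stored translation or None) for the closest stored segment."""
    best = max(memory, key=lambda seg: SequenceMatcher(None, sentence, seg).ratio())
    score = SequenceMatcher(None, sentence, best).ratio()
    return score, (memory[best] if score >= threshold else None)

# A near-match: only "device" has changed to "printer".
score, suggestion = fuzzy_match("Press the power button to start the printer.")
```

A real product matches whole segments accumulated over past projects, manages terminology separately, and routes anything below the threshold to a human translator, but the principle is the same.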
Future systems, as described in recent magazines such as Language International and Multiling, will offer far more. Not only will they come with vast pools of sample translations mined from the terabytes of such material already available, along with extensive terminology and glossary listings, but they will also offer intelligent matching of untranslated text that far outperforms today's best "fuzzy" guesses, real-time collaboration among distributed sites via the Internet, constant and automatic updating of sample translations and word lists via bot searches of the Web, and so forth.
The future translator will not sit at a desk with a printed copy of a text on one side of the keyboard and dictionaries or other resources on the other; indeed, many translators already work primarily if not exclusively with electronic source material and use at least some Web-based resources for terminology research. Instead, future translators will likely have a live link to their client's web site, working in real time with the other translators and the project manager involved in the project. They will prepare the source material for "translation" by the MAT system, then monitor the output and work on the parts the system cannot handle. They will also perform considerable editing, proofreading, and QA work, along with developing and maintaining glossaries, sample-translation databases, and other resources the MAT system needs.
This paradigm shift is already underway, with products like Trados’ Workbench, IBM’s Translation Manager II, Corel Catalyst, and Atril Software’s Déjà Vu leading the way. Other products are more focused on localization, while still others, such as Logos, offer a hybrid system that exists somewhere between true MT and MAT, depending, perhaps, on who you ask and what you want to do with it. The point is that this paradigm shift to MAT is not in the hazy future but is happening now. Languages that use the Roman alphabet and routinely use source material in electronic format are the most amenable to use with this software; languages such as Japanese and Chinese are still largely not available in electronic format, and even when they are, the systems do not handle such two-byte languages particularly reliably, at least not yet.
In other words, if you are a Spanish-English or German-English translator, you are probably already using MAT software, or you will be soon enough. If you are a translator working from Japanese to English, you have a couple of years yet before you have to make the move, though doing so earlier would be wise.
There is, however, a problem. Actually, there are a few problems. The first and most obvious is the cost associated with MAT. Not only is the software itself quite expensive for freelance translators to add to their office arsenal, but it also requires more RAM, more hard disk space, and a large monitor to be used efficiently. In addition, a scanner with good OCR software would be extremely useful. This whole bundle could run as much as $4,000, depending on which combination of hardware and software one opts for. Obviously $4,000 is a lot for a freelance translator to invest, particularly since many translation vendors prefer to pay translators who use MAT or MT software less than they otherwise would. In fact, some translators who use MAT go as far as not telling their clients about it so as to avoid the issue of reduced rates. In sum, there are considerable costs for a freelancer who uses MAT, and how the market will treat such freelancers remains unsettled.
Second, and perhaps less obvious, is the question of ownership of material. Translators are independent contractors who translate on a work-for-hire basis. They do not own what they produce. If a translator creates a glossary or terminology list in an MAT package while doing a translation for a client, who owns that list? If the translator cannot recycle or reuse such lists, much of the value of MAT will be lost. The same can be said for the organizations that want the translations done. Moreover, how would a translation vendor know if I were reusing a terminology list that I created while working for them? And should they care? Such problems are common with Internet and computer technologies. Just consider the issues surrounding MP3 if you are uncertain as to the arguments on both sides. I would like to see a cooperative arrangement exist, one in which translators can continue to build and extend their libraries of terminology and translation samples, and perhaps even, when not legally inappropriate, share material with each other. The same, I believe, should hold for translation vendors. The more good resources we all have, the better our translations will become, and the more quickly we can do them. That is, after all, the point of MAT.
The third and final problem is translators themselves. Many translators seem resistant to MAT because of the paradigm outlined above. They see translation as a highly intellectual process, one which involves careful analysis of the source text, meticulous research in "quaint and curious volumes of forgotten lore", and then creative writing to formulate a target text that balances form and function. MAT takes much of this away, they believe. It is too automated, too computerized, too, well, you get the idea. I don't consider these translators to be Luddites, resisting to the last a change that is inevitable and beneficial. What I think they are resisting, and I share in their resistance, is a tendency in the translation industry, and in localization in particular, to put speed above everything else. Translators thrive on the challenge of creating a high-quality translation; MAT is perceived by many as a way to crank out, in very short times, a translation of at best marginal quality. "Good enough so that we don't get sued" is how one localization manager put it to me one day. Whether these attitudes are justified or reasonable is a matter of endless debate; but the fact remains that many translators are not rushing to embrace these technologies, use them only grudgingly, and in some cases are leaving the translation profession. I hope that translators will give the technology a chance to mature, to be better understood and appreciated, and to be more widely used in the industry before they reject it. MAT is here to stay; it has its place; it has the potential to let translators do what they do best. Conversely though, employers of translators, localization firms in particular, should take the time to train translators in the use of these systems, to transition not overnight but more gradually to this new paradigm, and to let translators actually translate.
Unhappy translators rapidly become ex-translators, and the supply of good translators is small enough that no one should do anything to reduce it.
In 1992 I bet a friend that within 15 years, computer translation systems would take over the industry, leaving very little work for humans, who would instead maintain and operate the systems and edit their translations. As of this writing (spring, 2000), I am prepared to say that I have lost this bet. My earlier estimations about when and how machine translation would evolve were clearly incorrect, so I concede.
But let's take a look at what has happened in the past five years, the time from when I first wrote about that bet until now. The first desktop supercomputer, the Apple Macintosh G4, has arrived, with Intel's chip line only slightly behind. Voice synthesis is now available as part of the Mac OS, and though the voices are lackluster, they are usable. Voice-input systems, such as IBM's ViaVoice and Dragon Systems' NaturallySpeaking series, are now available for a couple of hundred dollars or less and offer accuracy rates approaching 98%. And machine-assisted translation (MAT) software and terminology-management software are becoming more prevalent and useful.
Ultimately I believe true MT is inevitable, though how or when it will arise I no longer care to predict. As Niels Bohr said: prediction is difficult, especially about the future.
For me the real question is how will a machine translation system be created. There are two major avenues of research: One, create a conscious computer which can understand and manipulate language essentially as a human would, but do so much more quickly and accurately. This seems extremely difficult for the near term, as there is as yet no good definition of consciousness itself, and what relationship language and consciousness have remains to be clarified. There are also obvious logistical and ethical issues involved, such as what to do if the sentient computer isn’t in the mood to translate (can you threaten to pull its plug?), or how to educate such a computer to be a good translator (how to accomplish that with humans is still a subject of some debate).
The other major avenue is to create a system which produces a good translation using different methods from how the human brain does it (however that may be). This is the approach used by all current machine translation systems. Progress thus far is better measured not by how far the systems have come, but by how far they still have to go. Perhaps IBM is working on a successor to Deep Blue. IBM might name it the Blue Linguist and have teams of researchers creating specially-designed language chips, circuit boards, databases, and so forth. And perhaps there will be a contest every year in which the Blue Linguist and five expert human translators all work on the same documents, with a panel of judges trying to identify the Blue Linguist’s work from among the group of six translations.
The point is that the results of the MT system, or for that matter the MAT system, are what matter, not the method used to produce them. The translation industry is always ready to adopt any technology or methodology that improves translation quality and speed while reducing costs. So translators, whether they like it or not, will have to use MAT software. And true MT is coming; translators should keep track of the progress in this area.