Computer-assisted translation, computer-aided translation or CAT is a form of language translation in which a human translator uses computer software to support and facilitate the translation process.
Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation (not to be confused with machine translation).
Overview
The automatic machine translation systems available today are not able to produce high-quality translations unaided: their output must be edited by a human to correct errors and improve the quality of translation. Computer-assisted translation (CAT) incorporates that manual editing stage into the software, making translation an interactive process between human and computer.
Some advanced computer-assisted translation solutions include controlled machine translation (MT). Higher-priced MT modules generally provide the translator with a more complex set of tools, which may include terminology management features and various other linguistic tools and utilities. Carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and, as a result, increase the efficiency of the entire translation process.
Range of tools
Computer-assisted translation is a broad and imprecise term covering a range of tools, from the fairly simple to the complicated. These can include:
- Translation memory tools (TM tools), consisting of a database of text segments in a source language and their translations in one or more target languages.
- Spell checkers, either built into word processing software, or add-on programs
- Grammar checkers, again either built into word processing software, or add-on programs
- Terminology managers, which allow translators to manage their own terminology bank in an electronic form. These can range from a simple table created in the translator's word processing software or spreadsheet, to a database created in a program such as FileMaker Pro, to more robust (and more expensive) specialized software packages such as SDL MultiTerm, LogiTerm, Termex, TermWeb, etc.
- Electronic dictionaries, either unilingual or bilingual
- Terminology databases, either on the host computer or accessible through the Internet, such as TERMIUM Plus or Grand dictionnaire terminologique from the Office québécois de la langue française
- Full-text search tools (or indexers), which allow the user to query already translated texts or reference documents of various kinds. Some such indexers are ISYS Search Software, dtSearch Desktop and Naturel
- Concordancers, which are programs that retrieve instances of a word or an expression and their respective context in a monolingual, bilingual or multilingual corpus, such as a bitext or a translation memory
- Bitext aligners: tools that align a source text and its translation, which can then be analyzed using a full-text search tool or a concordancer
- Project management software that allows linguists to structure complex translation projects as a chain of tasks (often called a "workflow"), assign the various tasks to different people, and track the progress of each of these tasks
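To make the concordancer entry in the list above more concrete, here is a minimal keyword-in-context sketch. The function name `concordance`, the `width` parameter and the sample corpus are illustrative assumptions, not taken from any particular tool; real concordancers typically work over indexed bilingual corpora rather than a short list of strings.

```python
import re

def concordance(corpus, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    `corpus` is a list of sentences (hypothetical input format);
    `width` is the number of context characters kept on each side.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in corpus:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

corpus = [
    "The translation memory stores segments.",
    "A memory leak is unrelated.",
    "No match here.",
]
for line in concordance(corpus, "memory"):
    print(line)
```

The same idea extends to a bitext by returning the aligned target segment alongside each hit.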
Concepts
Translation memory software
Translation memory programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts.
Such programs split the source text into manageable units known as "segments". A source-text sentence or sentence-like unit (a heading, a title or an element in a list) may be considered a segment, or texts may be segmented into larger units such as paragraphs or smaller ones such as clauses. As the translator works through a document, the software displays each source segment in turn and, if it finds a matching source segment in its database, offers the previous translation for re-use. If it does not, the program allows the translator to enter a translation for the new segment. After the translation for a segment is completed, the program stores the new translation and moves on to the next segment. In the dominant paradigm, the translation memory is, in principle, a simple database of fields containing the source language segment, the translation of the segment, and other information such as segment creation date, last access, translator name, and so on. Another translation memory approach does not involve the creation of a database, relying on aligned reference documents instead.
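The segment-by-segment loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the names `TMEntry`, `TranslationMemory`, `split_segments` and `translate_document` are hypothetical, segmentation is a naive split on sentence-final punctuation, only exact matches are retrieved, and the "database" is an in-memory dictionary.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TMEntry:
    # Fields mirror the record described in the text; names are illustrative.
    source: str
    target: str
    translator: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class TranslationMemory:
    def __init__(self):
        self._entries = {}  # source segment -> TMEntry

    def lookup(self, segment):
        """Return the stored entry for an exactly matching segment, or None."""
        return self._entries.get(segment)

    def store(self, source, target, translator):
        self._entries[source] = TMEntry(source, target, translator)

def split_segments(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def translate_document(text, tm, ask_translator, translator="anon"):
    """Walk the segments in order, reusing stored translations where possible."""
    output = []
    for segment in split_segments(text):
        entry = tm.lookup(segment)
        if entry is None:
            # No match: the human supplies a translation, which is then stored.
            target = ask_translator(segment)
            tm.store(segment, target, translator)
        else:
            target = entry.target
        output.append(target)
    return output
```

Here `ask_translator` stands in for the interactive editing step; in a test it can be any function that maps a source segment to a target string. A repeated sentence in the same document is translated once and reused from the memory on its second occurrence.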
Some translation memory programs function as standalone environments, while others function as an add-on or macro to commercially available word-processing or other business software programs. Add-on programs allow source documents from other formats, such as desktop publishing files, spreadsheets, or HTML code, to be handled using the TM program.
Language search-engine software
A more recent addition to the translation industry, language search-engine software is typically an Internet-based system that works similarly to Internet search engines. Rather than searching the Internet, however, a language search engine searches a large repository of translation memories to find previously translated sentence fragments, phrases, whole sentences, and even complete paragraphs that match source document segments.
Language search engines are designed to leverage modern search technology to conduct searches based on the source words in context, so that the search results match the meaning of the source segments. As with traditional TM tools, the value of a language search engine depends heavily on the translation memory repository it searches.
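As a rough illustration of searching a repository for similar rather than identical segments, the sketch below ranks stored source segments by word-level similarity to a query. It uses Python's standard `difflib.SequenceMatcher` as a stand-in; this is a swapped-in, simple technique, not how any actual language search engine is implemented, and the function name `search_repository`, the thresholds and the sample data are assumptions.

```python
from difflib import SequenceMatcher

def search_repository(query, repository, min_score=0.6, limit=3):
    """Return up to `limit` (score, source, target) hits, best first.

    `repository` is a list of (source, target) pairs (hypothetical format).
    Similarity is computed over lower-cased word sequences.
    """
    query_tokens = query.lower().split()
    hits = []
    for source, target in repository:
        score = SequenceMatcher(None, query_tokens, source.lower().split()).ratio()
        if score >= min_score:
            hits.append((score, source, target))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]

repository = [
    ("Click the Save button", "Cliquez sur le bouton Enregistrer"),
    ("Click the Cancel button", "Cliquez sur le bouton Annuler"),
    ("Restart the computer", "Redémarrez l'ordinateur"),
]
for score, source, target in search_repository("Click the Save button now", repository):
    print(f"{score:.2f}  {source}  ->  {target}")
```

The quality of the results depends entirely on what the repository contains, which is the point made in the paragraph above.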
Terminology management software
Terminology management software provides the translator with a means of automatically searching a given terminology database for terms appearing in a document, either by automatically displaying terms in the translation memory software interface window or through the use of hot keys to view the entry in the terminology database. Some programs have other hotkey combinations allowing the translator to add new terminology pairs to the terminology database on the fly during translation. Some of the more advanced systems enable translators to check, either interactively or in batch mode, whether the correct source/target term combination has been used within and across the translation memory segments in a given project. Independent terminology management systems also exist; these can provide workflow functionality and visual taxonomy, act as a type of term checker (similar to a spell checker, flagging terms that have not been used correctly), and support other types of multilingual term facet classifications such as pictures, videos, or sound.
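The term lookup and term-checking behaviour described above might look something like this in miniature. The termbase is assumed to be a plain dictionary from source term to approved target term, and the function names `find_terms` and `check_terms` are hypothetical; real systems also handle inflected forms, multiple approved targets and forbidden terms, which this sketch ignores.

```python
import re

def find_terms(segment, termbase):
    """Return the termbase entries whose source term occurs in `segment`."""
    found = {}
    for source_term, target_term in termbase.items():
        # Whole-word, case-insensitive match on the source term.
        if re.search(r"\b" + re.escape(source_term) + r"\b", segment, re.IGNORECASE):
            found[source_term] = target_term
    return found

def check_terms(source_segment, target_segment, termbase):
    """Flag source terms whose approved target term is absent from the translation."""
    issues = []
    for source_term, target_term in find_terms(source_segment, termbase).items():
        if target_term.lower() not in target_segment.lower():
            issues.append((source_term, target_term))
    return issues

termbase = {
    "translation memory": "mémoire de traduction",
    "segment": "segment",
}
source = "The translation memory stores each segment."
target = "La base de traduction stocke chaque segment."
for source_term, expected in check_terms(source, target, termbase):
    print(f"'{source_term}' should be translated as '{expected}'")
```

Running `check_terms` over every source/target pair in a project corresponds to the batch mode mentioned in the text; calling it on the segment being edited corresponds to the interactive mode.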
Alignment software
Alignment programs take completed translations, divide both source and target texts into segments, and attempt to determine which segments belong together in order to build a translation memory or other reference resource with the content. Many alignment programs allow translators to manually realign mismatched segments. The resulting bitext (also known as parallel text) alignment can then be imported into a translation memory program for future translations or used as a reference document.
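One simple way to decide which segments belong together is to compare segment lengths, on the assumption that a long source segment tends to have a long translation. The sketch below is a deliberately crude, length-only illustration using dynamic programming over 1:1, 1:2 and 2:1 groupings; the function name `align`, the cost function and the merge penalty of 5 are all assumptions chosen for the example, and production aligners use considerably more refined models.

```python
def align(source_segments, target_segments, merge_penalty=5):
    """Pair up segments, allowing 1:1, 1:2 and 2:1 groupings.

    The cost of a grouping is the absolute difference in character length,
    plus `merge_penalty` when two segments are merged on either side.
    """
    n, m = len(source_segments), len(target_segments)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1, 0), (1, 2, merge_penalty), (2, 1, merge_penalty)]
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue  # this state cannot be reached
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in source_segments[i:ni])
                tgt_len = sum(len(t) for t in target_segments[j:nj])
                new_cost = cost[i][j] + abs(src_len - tgt_len) + penalty
                if new_cost < cost[ni][nj]:
                    cost[ni][nj] = new_cost
                    back[ni][nj] = (i, j)
    if cost[n][m] == INF:
        raise ValueError("segments cannot be aligned with 1:1, 1:2 and 2:1 groupings")
    # Walk the back-pointers from the end to recover the chosen groupings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(source_segments[pi:i]), " ".join(target_segments[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

The returned pairs could then be loaded into a translation memory. Because length alone is a weak signal, the manual realignment step mentioned above remains necessary in practice.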
Interactive machine translation
Interactive machine translation is a paradigm in which the automatic system attempts to predict the translation the human translator is going to produce by suggesting translation hypotheses. These hypotheses may either be the complete sentence, or the part of the sentence that is yet to be translated.
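The prediction step can be illustrated with a toy example: given a scored list of candidate translations and the text the translator has typed so far, suggest the remainder of the best candidate that is still consistent with that prefix. The hard-coded hypothesis list stands in for the output of a real MT system, and the function name `suggest_completion` and the scores are assumptions made for this sketch.

```python
def suggest_completion(prefix, hypotheses):
    """Suggest the rest of the best-scoring hypothesis that starts with `prefix`.

    `hypotheses` is a list of (text, score) pairs (hypothetical format).
    Returns None when no hypothesis is compatible with what has been typed.
    """
    compatible = [(score, text) for text, score in hypotheses if text.startswith(prefix)]
    if not compatible:
        return None
    _, best = max(compatible)
    return best[len(prefix):]

hypotheses = [
    ("The house is red", 0.7),
    ("The home is red", 0.2),
    ("The house is crimson", 0.1),
]
print(suggest_completion("", hypotheses))         # complete sentence suggested
print(suggest_completion("The hom", hypotheses))  # only the untranslated remainder
```

An empty prefix corresponds to suggesting the complete sentence, while a non-empty prefix corresponds to suggesting only the part that is yet to be translated. Each keystroke that contradicts the current suggestion narrows the set of compatible hypotheses.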
See also
- Comparison of computer-assisted translation tools
- Computational linguistics
- Computer-assisted reviewing
- Fuzzy matching
- Translation
- Computer-assisted interpreting
External links
- Computer Aided Translation at DMOZ
- Machine Translation and Computer-Assisted Translation: a New Way of Translating?
- Computer Aided Human Translation
- Medical Computer Assisted Coding
- CAT tool glossary with over 150 concepts explained