
Machine Translation: An overview

Machine translation is one of the oldest applications of computing. It is defined as the automatic translation of a sentence in a source language into a sentence in a target language by a computer program. The source and target languages can be any natural languages, such as English, Hindi, Bengali, Bodo, or French.

In computer science, the field that studies all work related to natural language is called natural language processing (NLP). Speech recognition (speech to text) and text-to-speech generation are applications of NLP. Machine Translation (MT) is another application of NLP.

In MT, to generate a translation equivalent of a particular source sentence, the translation system needs to know the rules of both the source and target languages and the mappings between them. The rules and the mappings of words or phrases can be learned from sentence pairs (parallel data) collected manually or automatically. This learning is accomplished during training. To make the machine learn the mappings or rules from the parallel data, the translation engine may take various approaches. Some of these approaches are listed below.

1. Direct Machine Translation (DMT)
2. Rule Based Machine Translation (RBMT)
3. Example Based Machine Translation (EBMT)
4. Statistical Machine Translation (SMT)
5. Neural Machine Translation (NMT)
6. Hybrid Machine Translation (HBMT)

Once a translation system has been built using any one of the methods above, we can use the engine to translate a source sentence into a target sentence.

For example, to translate the Bodo source sentence आं ओंखाम जायो। into the English sentence "I eat rice.", we first need to learn the possible mappings of the individual tokens of the source sentence (आं-I, ओंखाम-rice, जायो-eat) from a large parallel corpus or a bilingual dictionary. Once these mappings have been learned for a particular language pair, we can translate any source sentence into a target sentence for that language pair.
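The example above can be sketched as a tiny dictionary-lookup translator in Python. This is only an illustration, not a real MT engine: the names `LEXICON`, `tokenize`, and `translate` are hypothetical, the lexicon holds just the three mappings from the example, and the reordering rule is an assumption that covers only three-word sentences in subject-object-verb order, like the one shown.

```python
# Toy Bodo -> English lexicon containing only the mappings from the example.
LEXICON = {"आं": "I", "ओंखाम": "rice", "जायो": "eat"}


def tokenize(sentence):
    """Drop the danda (।) sentence terminator and split on whitespace."""
    return sentence.replace("।", " ").split()


def translate(sentence):
    """Translate word by word, then reorder for English."""
    tokens = tokenize(sentence)
    # Look up each token; unknown tokens are passed through unchanged.
    glosses = [LEXICON.get(token, token) for token in tokens]
    # Assumed rule: a three-word subject-object-verb sentence becomes
    # subject-verb-object in English. Other inputs are left in source order.
    if len(glosses) == 3:
        subject, obj, verb = glosses
        glosses = [subject, verb, obj]
    return " ".join(glosses) + "."


print(translate("आं ओंखाम जायो।"))  # prints: I eat rice.
```

The lookup alone would give "I rice eat", so even this small case needs a reordering step on top of the word mappings. Handling word order, ambiguity, and unseen words in general is what the approaches listed earlier address in different ways.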