Survey on Machine Translation Systems for Ancient Indian Languages

Abstract: Sanskrit is a comparatively unambiguous language and is therefore well suited to natural language processing. Most ancient Indian books were written in Sanskrit. This paper surveys machine translation systems that involve Sanskrit.


I. INTRODUCTION
Sanskrit is a comparatively unambiguous language, which makes it well suited to natural language processing [1]. It is a free word order language. Often regarded as the mother of many Indian languages, Sanskrit possesses a rich grammar that was codified by Panini roughly 2,500 years ago in 3,959 rules. It is widely reported that researchers at NASA have described Sanskrit as one of the least ambiguous natural languages, and it is often said to be the language best suited to computers. Because of this low ambiguity, Sanskrit is considered a strong candidate for Artificial Intelligence and natural language processing applications.
Machine translation (MT) is the process of converting text from one natural language to another using application software. There are three main rule-based machine translation techniques: the direct approach, the transfer-based approach, and the interlingua-based approach. Most of the translators developed so far focus on word-level translation with bilingual dictionaries, that is, on direct translation.

II. MACHINE TRANSLATION SYSTEMS
Machine translation (MT) is the process of converting sentences in one natural language, called the source language, into another, called the destination (or target) language. The major classes of machine translation approach are rule-based, statistical, example-based, hybrid, and neural machine translation. In the rule-based approach, a large set of rules is developed manually, and these rules are applied to map structures from the source language to the target language. Direct translation, transfer-based translation, and interlingua-based translation are the major rule-based techniques.

A. Direct Translation
Direct translation is the simplest form of translation, in which the words of the source sentence are converted directly into the destination language. Translation is performed word by word with the help of a bilingual dictionary. Anusaaraka is a well-known example of a machine translation system based on direct translation.
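The word-by-word lookup described above can be illustrated with a minimal Python sketch. The dictionary entries (romanized Sanskrit to English) and the function name direct_translate are assumptions made for illustration; they are not taken from Anusaaraka or any other system.

```python
# Toy bilingual dictionary: romanized Sanskrit -> English.
# The entries are illustrative assumptions, not data from a real system.
BILINGUAL_DICT = {
    "ramah": "Rama",
    "vanam": "forest",
    "gacchati": "goes",
}


def direct_translate(sentence, dictionary):
    """Translate word by word; unknown words are passed through unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(direct_translate("ramah vanam gacchati", BILINGUAL_DICT))
# prints "Rama forest goes"
```

The output keeps the source word order and drops the case information carried by the Sanskrit endings, which is the main limitation of purely direct translation.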

B. Transfer based Translation
A database of translation rules is used to translate text from the source language to the target language. Whenever a sentence matches one of the rules in the database, it is translated directly using a dictionary. The dictionaries used are a source language (SL) dictionary, a target language (TL) dictionary, and a bilingual dictionary. The approach has two main steps: syntactic transfer and semantic transfer.
In syntactic transfer, the SL sentence is analysed to generate a syntactic structure called a parse tree, and this SL parse tree is then transferred to a TL parse tree. In semantic transfer, the SL input is analysed into a language-specific semantic representation, which is then transferred to a TL semantic representation. Case frames and logical forms are the two constructs used for semantic representation. Finally, these representations are used to generate the syntactic structure and then the surface sentence in the TL.
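The syntactic transfer step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the tree encoding, the single reordering rule, the dictionary entries, and the names transfer and generate are all hypothetical, and the goal marking of the object is folded into the dictionary entries for brevity.

```python
# A parse tree node is (label, children); a leaf is (label, word).
# Hypothetical SL parse of "ramah vanam gacchati" in subject-object-verb order.
SL_TREE = ("S", [("SUBJ", "ramah"), ("OBJ", "vanam"), ("VERB", "gacchati")])

# Assumed structural transfer rule: TL child order for a given node label.
TRANSFER_RULES = {"S": ["SUBJ", "VERB", "OBJ"]}

# Toy bilingual dictionary (assumed entries).
BILINGUAL_DICT = {"ramah": "Rama", "vanam": "the forest", "gacchati": "goes to"}


def transfer(tree):
    """Map an SL parse tree to a TL parse tree."""
    label, body = tree
    if isinstance(body, str):
        # Leaf: lexical transfer through the bilingual dictionary.
        return (label, BILINGUAL_DICT.get(body, body))
    children = [transfer(child) for child in body]
    order = TRANSFER_RULES.get(label)
    if order is not None:
        # Structural transfer: reorder children; unlisted labels go last.
        children.sort(
            key=lambda c: order.index(c[0]) if c[0] in order else len(order)
        )
    return (label, children)


def generate(tree):
    """Read the surface sentence off the TL tree, left to right."""
    _, body = tree
    if isinstance(body, str):
        return body
    return " ".join(generate(child) for child in body)


tl_tree = transfer(SL_TREE)
print(generate(tl_tree))
# prints "Rama goes to the forest"
```

Unlike the direct approach, the reordering happens on the tree, so the rule applies to any sentence with the same structure rather than to particular words.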

C. Interlingua based Translation
In the interlingua-based approach, a language-independent framework is developed for translating the source language into the destination language [2]. The interlingua approach has a number of advantages. It requires fewer components to relate each source language to each target language, and adding a new language needs only one new analyzer and one new generator. It allows both the analyzers and the generators to be written by monolingual system developers. It can also handle languages that differ greatly from each other.
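To make the idea of a language-independent representation concrete, the following minimal Python sketch models a meaning frame with a head concept, feature-value pairs, and semantic slots that may hold nested frames. The class name Frame, the concept symbols such as "*GO", the slot names, and the toy English generator are assumptions for illustration only and do not reproduce the format of any actual system.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    head: str                                      # head concept
    features: dict = field(default_factory=dict)   # feature-value pairs
    slots: dict = field(default_factory=dict)      # slot name -> nested Frame


# Hypothetical frame for the meaning "Rama goes to the forest".
frame = Frame(
    head="*GO",
    features={"tense": "present"},
    slots={
        "agent": Frame(head="*RAMA", features={"number": "singular"}),
        "goal": Frame(head="*FOREST", features={"number": "singular"}),
    },
)


def to_list(f):
    """Render a frame as a recursive list-based structure."""
    items = [f.head]
    items += [[name, value] for name, value in f.features.items()]
    items += [[name, to_list(sub)] for name, sub in f.slots.items()]
    return items


# A target-language generator needs only the frame and its own lexicon.
EN_LEXICON = {"*GO": "goes", "*RAMA": "Rama", "*FOREST": "forest"}


def generate_english(f):
    """Toy English generator for frames with an agent and a goal slot."""
    agent = EN_LEXICON[f.slots["agent"].head]
    goal = EN_LEXICON[f.slots["goal"].head]
    return f"{agent} {EN_LEXICON[f.head]} to the {goal}"


print(to_list(frame))
print(generate_english(frame))
# second line prints "Rama goes to the forest"
```

Because the generator reads only the frame, a generator for another target language could be added without touching the source-language analyzer, which is the practical benefit claimed for the interlingua approach.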
Deryle W. Lonsdale, Alexander M. Franz, and John R. R. Leavitt presented the design and development of an interlingua for a large-scale MT project, 1SL-nTL. They also discussed how the resulting Knowledge-based, Accurate Natural-Language Translation (KANT) interlingua handles complexity and the development of its different stages efficiently. It was developed in a balanced fashion with maximal coverage. The approach uses a recursive, list-based structural representation of source sentences. An interlingua frame consists of a head concept, feature-value pairs, and semantic slots, and it may contain nested interlingua frames. Source language expressions and semantic units from the domain were considered when generating the concepts. The overall format is modeled using frame-based structures. The f-structure reflects deep semantic relationships between major constituents. [8]
The interlingua approach is based on the idea that MT must go beyond purely linguistic information, that is, syntax and semantics, and should understand the content of texts. Interlingua-based translation is divided into two monolingual components: analysis of the source language text into the interlingua, an abstract, universal, language-independent representation of meaning, and generation of this meaning using the lexical units and syntactic constructions of the target language.

[Figure: Vauquois triangle for machine translation]

III. RELATED WORKS
A detailed study of machine translation systems for Sanskrit, interlingua-based machine translation systems, and the Paninian framework for translation was carried out in developing the proposed system. Akshar Bharathi et al. provided details of the Paninian framework [1], parsing of free word order languages in the Paninian framework [2], and Karaka analysis [3]. They also explain the use of lexical functional grammar (LFG) in unification for specifying the mapping to grammatical relations [4]. The parsing of Sanskrit sentences using LFG is explained by Namrata Tapaswi et al. [5].
Paul Kiparsky gives a detailed description of the different levels of the Paninian framework, with examples and rules from the Ashtadhyayi, and of rule formation at each level [6]. Sudhir Kumar Mishra et al. [7] give a detailed study of a Karaka analysis system based on the rules of the Ashtadhyayi, with examples.
Sameh AlAnsary et al. briefly review three of the most renowned interlingua-based machine translation projects: Distributed Language Translation (DLT), UNIversal TRANslator (UNITRAN), and the KANT system. DLT, a research project developed in Utrecht, The Netherlands, is an interactive system designed to operate over computer networks. Translation is distributed between two independent terminals, one for analysis and the other for generation. UNITRAN is a translation system developed at the Massachusetts Institute of Technology. It operates bidirectionally between Spanish and English. The KANT system was developed at Carnegie Mellon University (CMU) in Pennsylvania, USA, in 1989. KANT is the only interlingua-based MT system to be operational commercially. It has been used to translate English technical documents into French, Spanish, and German [3].
Most of the systems were developed using the rule-based approach.

V. CONCLUSION
Linguistic studies on Sanskrit are fewer than those on other Indian natural languages. A rule-based translation scheme is used in most translation systems that involve Sanskrit. Most of these systems were developed using either the direct or the transfer-based approach, and they handle only simple sentences. Very few translation systems use Sanskrit as the source language. There is scope for an interesting and more efficient machine translation system based on the interlingua approach. As Sanskrit is considered the mother of many Indian languages, a translation system based on the interlingua approach seems likely to be more efficient and useful.