Survey on Machine Translation systems for Ancient Indian Languages

DOI : 10.17577/IJERTV9IS050488

Sreedeepa H. S

Assistant Professor Computer Science & Engineering

College of Engineering Kallooppara, Thiruvalla

Divya Madhu

Assistant Professor Computer Science & Engineering

Vidya Academy of Science & Technology Technical Campus Kilimanoor

Abstract: Sanskrit is a relatively unambiguous language, which makes it well suited to natural language processing. Most ancient Indian books were written in Sanskrit. This paper surveys machine translation systems that involve Sanskrit.

Keywords: Machine translation; interlingua; source language; direct translation; destination language.

  1. INTRODUCTION

    Sanskrit is a comparatively unambiguous language, which makes it well suited to natural language processing [1]. It is a free word order language. Sanskrit, often regarded as the mother of many languages, possesses a rich grammar that was codified by Panini roughly 2,500 years ago in 3,959 rules. Briggs, then a researcher at NASA, argued that Sanskrit is among the least ambiguous natural languages and can serve as a knowledge representation language [1]; this is the origin of the often-repeated saying that Sanskrit is the language best suited to computers. Because of this low ambiguity, Sanskrit is considered a strong candidate for artificial intelligence and natural language processing applications.

    Machine translation (MT) is the process of converting text in one natural language into another using software. There are three main rule based machine translation techniques: the direct approach, the transfer based approach and the interlingua based approach. Most of the translators developed so far are concerned with word-level translation using bilingual dictionaries, that is, direct translation.

  2. MACHINE TRANSLATION SYSTEMS

    Machine translation (MT) is the process of converting sentences in one natural language, called the source language, into another, called the destination language. The major classes of machine translation approaches are rule based, statistical, example based, hybrid and neural machine translation. In the rule based approach, a large set of rules is developed manually and applied to map structures from the source language to the target language. TABLE I [2] summarizes the advantages and disadvantages of the major machine translation approaches.

TABLE I. MACHINE TRANSLATION APPROACHES

| Approach | Advantages | Disadvantages |
|---|---|---|
| Rule based | 1. Easy to build an initial system. 2. Based on linguistic theories. 3. Effective for core phenomena. | 1. Rules are formulated by experts. 2. Difficult to maintain and extend. 3. Ineffective for marginal phenomena. |
| Knowledge based | 1. Based on taxonomy of knowledge. 2. Contains an inference engine. 3. Interlingual representation. | 1. Hard to build a knowledge hierarchy. 2. Hard to define the granularity of knowledge. 3. Hard to represent knowledge. |
| Example based | 1. Extracts knowledge from corpus. 2. Based on translation patterns in corpus. 3. Reduces the human cost. | 1. Similarity measure is sensitive to system. 2. Search cost is expensive. 3. Knowledge acquisition is still problematic. |
| Statistics based | 1. Numerical knowledge. 2. Extracts knowledge from corpus. 3. Reduces the human cost. 4. The model is mathematically grounded. | 1. No linguistic background. 2. Search cost is expensive. 3. Hard to capture long distance phenomena. |

Direct translation, transfer based and interlingua based approaches are the major rule based machine translation techniques.

  1. Direct Translation

    Direct translation is the simplest form of translation: words in the source sentence are converted directly into the destination language, word by word, with the help of a bilingual dictionary. Anusaaraka is a well-known machine translation system based on direct translation.
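The word-by-word lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of Anusaaraka or any surveyed system: the dictionary entries (transliterated Sanskrit to English) and the function name `direct_translate` are assumptions made for the example.

```python
# Toy bilingual dictionary (hypothetical entries, transliterated Sanskrit -> English).
BILINGUAL_DICT = {
    "ramah": "Rama",
    "vanam": "forest",
    "gacchati": "goes",
}


def direct_translate(sentence: str, dictionary: dict[str, str]) -> str:
    """Translate word by word; words missing from the dictionary pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(direct_translate("ramah vanam gacchati", BILINGUAL_DICT))
```

The output keeps the source word order ("Rama forest goes"), which illustrates why dictionary lookup alone cannot handle reordering between languages.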

  2. Transfer based Translation

    A database of translation rules is used to translate text from the source language to the target language. Whenever a sentence matches one of the rules in the database, it is translated directly using dictionaries: a source language (SL) dictionary, a target language (TL) dictionary and a bilingual dictionary. There are two main steps in this approach, syntactic transfer and semantic transfer.

    In syntactic transfer, the SL sentence is analysed to generate a syntactic structure called a parse tree, and this SL parse tree is then transferred to a TL parse tree. In semantic transfer, the SL input is analysed into a language-specific semantic representation, which is transferred to a TL semantic representation. Case frames and logical forms are the two constructs used for semantic representation. Finally, these representations are used to generate the syntactic structure and then the surface sentence in the TL.
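Syntactic transfer can be illustrated with a toy example, assuming the SL parse tree is already available. The tuple-based tree encoding, the single reordering rule in `TL_ORDER`, and the lexicon entries are all hypothetical simplifications; a real transfer system would use a parser and a much larger rule base.

```python
# A parse tree is a tuple: (label, child, child, ...); a leaf is (label, word).
SL_TREE = ("S", ("SUBJ", "ramah"), ("OBJ", "vanam"), ("VERB", "gacchati"))

# Hypothetical transfer rule: reorder the children of an "S" node from
# subject-object-verb to subject-verb-object.
TL_ORDER = {"S": ["SUBJ", "VERB", "OBJ"]}

# Toy bilingual dictionary used for lexical transfer (hypothetical entries).
LEXICON = {"ramah": "Rama", "vanam": "the forest", "gacchati": "goes to"}


def is_leaf(rest):
    """A leaf node carries exactly one string (the word) after its label."""
    return len(rest) == 1 and isinstance(rest[0], str)


def transfer(tree):
    """Map an SL parse tree to a TL parse tree (reordering plus lexical transfer)."""
    label, *rest = tree
    if is_leaf(rest):
        return (label, LEXICON.get(rest[0], rest[0]))
    children = [transfer(child) for child in rest]
    order = TL_ORDER.get(label)
    if order:
        # Assumes every child label of this node appears in the rule.
        children.sort(key=lambda child: order.index(child[0]))
    return (label, *children)


def generate(tree):
    """Flatten a TL parse tree into a surface sentence."""
    label, *rest = tree
    if is_leaf(rest):
        return rest[0]
    return " ".join(generate(child) for child in rest)


print(generate(transfer(SL_TREE)))
```

Unlike word-by-word lookup, the reordering rule operates on the tree structure, so the generated sentence follows the target language order ("Rama goes to the forest").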

  3. Interlingua based Translation

    In the interlingua based approach, a language-independent framework is developed for translating the source language into the destination language [2]. The interlingua approach has a number of advantages. It requires fewer components to translate the source language into each target language and to add a new language. It allows both the analyzers and the generators to be written by monolingual system developers. It can also handle languages that are very different from each other.
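The "fewer components" advantage can be made concrete with a small count. The sketch below assumes that a transfer system needs one transfer module per ordered language pair, while an interlingua system needs one analyzer and one generator per language; it ignores the per-language analysis and generation modules that a transfer system also requires. The function names are illustrative only.

```python
def transfer_components(n_languages: int) -> int:
    """Transfer modules needed when every ordered language pair gets its own module."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages: int) -> int:
    """One analyzer plus one generator per language."""
    return 2 * n_languages


for n in (2, 5, 10):
    print(n, transfer_components(n), interlingua_components(n))
```

Under these assumptions the interlingua design only pays off beyond three languages: for ten languages it needs 20 components against 90 transfer modules, and adding an eleventh language adds 2 components rather than 20.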

    Deryle W. Lonsdale, Alexander M. Franz, and John R. R. Leavitt presented the design and development of an interlingua for a large-scale MT project, 1SL-nTL. They also discussed how the resulting Knowledge-based, Accurate Natural-Language Translation (KANT) interlingua handles complexity and the different stages of development efficiently. It was developed in a balanced fashion with maximal coverage. The approach uses a recursive list-based structural representation of source sentences. An interlingua frame consists of a head concept, feature-value pairs, and semantic slots, and may contain nested interlingua frames. Source language expressions and semantic units from the domain were considered for concept generation. The overall format is modeled using frame-based structures. The f-structure reflects deep semantic relationships between major constituents. [8]
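The frame structure described above (head concept, feature-value pairs, semantic slots, nested frames) can be sketched as a small data structure. This is an illustrative assumption, not KANT's actual notation: the class name `Frame`, its field names, and the concept labels such as "GO" and "RAMA" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Frame:
    head: str                                                # head concept
    features: dict[str, str] = field(default_factory=dict)   # feature-value pairs
    slots: dict[str, Union["Frame", str]] = field(default_factory=dict)  # semantic slots

    def to_list(self) -> list:
        """Render the frame as a recursive list-based structure."""
        items = [self.head]
        items += [[name, value] for name, value in self.features.items()]
        for role, filler in self.slots.items():
            # A slot filler may itself be a frame, which gives the nesting.
            nested = filler.to_list() if isinstance(filler, Frame) else filler
            items.append([role, nested])
        return items


frame = Frame(
    head="GO",
    features={"tense": "present"},
    slots={
        "agent": Frame(head="RAMA", features={"number": "singular"}),
        "goal": Frame(head="FOREST"),
    },
)
print(frame.to_list())
```

Because nothing in the frame refers to source or target words, the same structure could in principle be produced by any analyzer and consumed by any generator.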

      The interlingua approach is based on the idea that MT must go beyond purely linguistic information (syntax and semantics) and should understand the content of texts. Interlingua based translation is divided into two monolingual components: analysis of the source language text into an abstract, language-independent representation of meaning (the interlingua), and generation of this meaning using the lexical units and syntactic constructions of the target language. [9]

      Fig. 1 shows the Vauquois triangle for machine translation approaches. It depicts the three types of rule based translation system. The main phases of translation are analysis, transfer and generation.

      Fig. 1. Vauquois triangle for machine translation

III RELATED WORKS

A detailed study of Sanskrit machine translation systems, interlingua based machine translation systems and the Paninian framework for translation was carried out for this work. Akshar Bharati et al. provided details of the Paninian framework [1], parsing of free word order languages in the Paninian framework [2], and Karaka analysis [3]. They also explain the use of lexical functional grammar (LFG) in unification for specifying the mapping to grammatical relations [4]. The parsing of Sanskrit sentences using LFG is explained by Namrata Tapaswi et al. [5]. Paul Kiparsky gives a detailed description of the different levels of the Paninian framework, with examples, the rules of the Ashtadhyayi and rule formation at each level [6]. Sudhir Kumar Mishra et al. [7] give a detailed study of a Karaka analysis system based on the rules of the Ashtadhyayi, with examples.

Sameh AlAnsary et al. briefly review three of the most renowned interlingua-based machine translation projects: Distributed Language Translation (DLT), the UNIversal TRANslator (UNITRAN) and the KANT system. DLT, a research project developed in Utrecht, The Netherlands, is an interactive system designed to operate over computer networks. Translation is distributed between two independent terminals, one for analysis and another for generation. UNITRAN is a translation system developed at the Massachusetts Institute of Technology. The system operates bidirectionally between Spanish and English. The KANT system was developed at Carnegie Mellon University (CMU) in Pennsylvania, USA, in 1989. KANT is the only interlingua-based MT system to be operational commercially. It has been used to translate English technical documents into French, Spanish and German.

The translation system developed at JNU uses a word sense disambiguation module and an anaphora resolution module, with Sanskrit as the SL and Hindi as the TL. The Sanskrit to English machine translation system developed by Subramanian focuses on sandhi vicheda and morphological analysis.

IV SANSKRIT INVOLVED MACHINE TRANSLATION SYSTEMS

Some of the machine translation systems involving Sanskrit are shown in TABLE II [3]. Most of these systems were developed using the rule based approach.

TABLE II MACHINE TRANSLATION SYSTEMS DEVELOPED FOR SANSKRIT

| Machine Translation System | Approach | Source-target Language Pair | Features |
|---|---|---|---|
| ETSTS | Rule and example based | English to Sanskrit | Converts target sentence to speech output; use of bilingual dictionary. |
| Sanskrit to English Translator by Subramaniam | Rule based | Sanskrit to English | Focus on Sandhi Vichheda, morphological analysis. |
| English to Sanskrit machine translation by Mishra and Mishra | Rule based | English to Sanskrit | POS tagger module; uses ANN for verb selection; GNP module. |
| English to Sanskrit machine translation by Mane D. T. et al. | Rule based | English to Sanskrit | Use of bilingual dictionary and grammar rules file. |
| Sanskrit to Hindi MT by JNU | Rule based | Sanskrit to Hindi | WSD module; anaphora resolution module. |
| Interlingua based Sanskrit to English machine translation | Knowledge based | Sanskrit to English | Based on Paninian grammar. |

V. CONCLUSION

Linguistic studies on Sanskrit are fewer than those on other Indian natural languages. A rule based translation scheme is used in most translation systems involving Sanskrit. Most of the systems were developed using either the direct or the transfer based approach, and for simple sentences. Very few translation systems use Sanskrit as the source language. One interesting and more efficient machine translation system has been developed based on the interlingua approach. As Sanskrit is considered the mother of many Indian languages, a translation system based on the interlingua approach seems to be more efficient and useful.

REFERENCES

  1. R. Briggs, "Knowledge representation in Sanskrit and artificial intelligence," AI Magazine, vol. 6, 1985, p. 32.

  2. H. S. Sreedeepa and S. M. Idicula, "Interlingua based Sanskrit-English machine translation," 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT), Kollam, 2017, pp. 1-5. doi: 10.1109/ICCPCT.2017.8074251

  3. Jaideepsinh K. Raulji, "Sanskrit Machine Translation Systems: A Comparative Analysis," International Journal of Computer Applications, 2016.

  4. Sameh AlAnsary, "Interlingua-based Machine Translation Systems: UNL versus Other Interlinguas," Department of Phonetics and Linguistics, Faculty of Arts, Alexandria University, ElShatby, Alexandria, Egypt, 2014.

  5. Akshar Bharati and Rajeev Sangal, "Parsing Free Word Order Languages in the Paninian Framework," ACL '93: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, June 22-26, 1993, pp. 105-111.

  6. Akshar Bharati, Medhavi Bhatia, Vineet Chaitanya and Rajeev Sangal, "Paninian Grammar Framework Applied to English," Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, February 1996.

  7. Akshar Bharati, Vineet Chaitanya and Rajeev Sangal, "Paninian framework and its application to Anusaraka," Springer, vol. 19, no. 1, February 1994, pp. 113-127.

  8. Akshar Bharati, NLP: A Paninian Perspective, PHI Learning, 1996.

  9. Namrata Tapaswi, Suresh Jain and Vaishali Chourey, "Parsing Sanskrit Sentences Using Lexical Functional Grammar," International Conference on Systems and Informatics (ICSAI), 19-20 May 2012, pp. 2636-2640.

  10. Paul Kiparsky, Stanford University, "On the Architecture of Paninian Grammar," UCLA, 2002.

  11. Sudhir Kumar Mishra, JNU, "Sanskrit Karaka Analyzer for MT," 2007.

  12. Deryle W. Lonsdale, A. M. Franz and J. R. R. Leavitt, "Large Scale Machine Translation: An Interlingua Approach," in Proceedings of the 7th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Austin, Texas, USA, 1994.

  13. P. Goyal, V. Arora and L. Behera, "Analysis of Sanskrit text: Parsing and semantic relations," in Sanskrit Computational Linguistics, Springer, 2009, pp. 200-218.

  14. Ved Kumar Gupta, Namrata Tapaswi and Suresh Jain, "Knowledge representation of Grammatical constructs of Sanskrit language using Rule based Sanskrit to English MT," 2016. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6524744

  15. Dinesh Kumar and Gurpreet Singh, "POS Tagger for Morphology rich Indian languages," International Journal of Computer Applications (0975-8887), vol. 6, no. 5, September 2010.
