Voice to Voice Language Translation System

DOI: 10.17577/IJERTV3IS100924




Akshay Suresh Deshpande, Keshav Shesharao Ambulgekar, Kedar Raghunath Joshi

Electronics and Telecommunication Engineering, Maharashtra Institute of Technology, Aurangabad, Maharashtra

Akshata H. Utgikar

Assistant Professor, Electronics and Telecommunication Dept.

Maharashtra Institute of Technology, Aurangabad, Maharashtra

Abstract: In the nascent stage of developing a personalized interpreter, we propose a prototype that uses speech-processing hardware to provide the user with real-time translation. The speech-processing hardware works on the principle of compare and forward: a database already stored in the unit is compared with the input speech, and the result is forwarded for further processing. The need arises from the inability of dictionaries and human translators to suit our needs for better communication. In this situation, the proposed prototype will serve the purpose reasonably well and minimize communication inefficiencies.

Keywords: Language Translator, Microcontroller, Speech Recognition (HM2007), Speech Synthesizer (APR6016).

  1. INTRODUCTION

    The global, borderless economy has made it critically important for speakers of different languages to be able to communicate. Speech-translation technology, being able to speak and have one's words translated automatically into the other person's language, has long been a dream of humankind, and speech translation has been selected as one of the ten technologies that will change the world. There are especially high hopes in Japan for a speech-translation system that can automatically translate one's everyday speech; as the use of Japanese becomes increasingly international, such technology would be a great boon to the nation.

    Automatic speech translation consists of three separate technologies: technology to recognize speech (speech recognition), technology to translate the recognized words (language translation), and technology to synthesize speech in the other person's language (speech synthesis). Recent technological advances have made automatic translation of conversational spoken Japanese, English, and Chinese practical for travelers, and consecutive translation of short, simple conversational sentences spoken one at a time has become possible.

    This paper starts by affirming the significance of speech-translation technology and providing an overview of the state of research and development to date and the history of automatic translation technology. It goes on to describe the architecture and current performance of speech-translation systems.

  2. SURVEY

    The following systems for speech-to-text and speech-to-speech translation are commercially available in the market.

    1. Text To Speech Translator:

      The SP0-512 Text to Speech IC is a pre-programmed microcontroller that accepts English text over a serial connection, converts the text to phoneme codes, and then generates audio. It is ideal for adding a robot voice to embedded designs.

    2. VOICE ACTIVATED PHRASE LOOKUP (Text to Speech System):

      Voice activated phrase lookup systems are not true speech translation systems by definition. A typical voice activated phrase lookup system is the Phraselator system. The Phraselator is a one-way device that can recognize a set of pre-defined phrases and play a recorded translation.

      This device can be ported easily to new languages, requiring only a hand translation of the phrases and a set of recorded sentences. However, such a system severely limits communication: the translation is one-way, reducing one party's responses to simple pointing and perhaps "yes" and "no".

    3. SIGMO (Speech to Speech System):

      SIGMO allows real-time translation of 25 languages and has two modes of voice translation. The user sets the native language and then the language to translate to. Pressing the first button and speaking a phrase makes SIGMO instantly translate and pronounce it in the selected language; pressing the second button makes it translate speech from the foreign language and instantly speak it in the selected native language.

    4. MASTOR (IBM)

      MASTOR (Multilingual Automatic Speech-To-Speech Translator) is IBM's highly trainable speech-to-speech translation system, targeting conversational spoken-language translation between English and Mandarin Chinese for limited domains. The speech input is processed and decoded by a large-vocabulary speech recognition system, and the transcribed text is analyzed by a statistical parser for semantic and syntactic features. A sentence-level natural language generator based on maximum entropy (ME) modeling is used to generate sentences in the target language from the parser output. The produced sentence in the target language is then synthesized into speech by a high-quality text-to-speech system.

    5. Matrix (ATR)

    The Spoken Language Translation Research Laboratories of the Advanced Telecommunications Research Institute International (ATR) comprise five departments, each focusing on a certain area of speech translation.

    This system can recognize natural Japanese utterances such as those used in daily life, translate them into English, and output synthesized speech. Running on a workstation or a high-end PC, it achieves nearly real-time processing. Unlike its predecessor ASURA, ATR-MATRIX is designed for spontaneous speech input and is much faster; the current implementation deals with a hotel room reservation task. ATR-MATRIX adopts a cooperative integrated language-translation model. Because of its small size, light weight, and available attachments, it is portable and easy to use.

  3. PROPOSED SYSTEM

    Fig. 1. Block Diagram of the system

    Fig. 1 shows the block diagram of the Voice to Voice Language Translation System. The input speech is given through the microphone and goes to the speech processing unit, which processes the input and recognizes the spoken word. Within this unit, the input speech first reaches the speech IC.

    1. SPEECH RECOGNITION SYSTEM

      Speech recognition is the process of converting an acoustic signal, captured by a microphone or telephone, into a set of words. In this system the HM2007 is used as the speech recognition unit. The HM2007 is a CMOS voice-recognition LSI (Large Scale Integration) circuit. The chip contains an analog front end and provides voice analysis, recognition, and system-control functions, and it may be used stand-alone or connected to a CPU.

      Speech recognition is divided into two broad processing categories: speaker dependent and speaker independent. Speaker-dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback of this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers. A speaker-independent system is trained to respond to a word regardless of who speaks; it must therefore respond to a large variety of speech patterns, inflections, and enunciations of the target word. The command-word count is usually lower than in speaker-dependent systems, but high accuracy can still be maintained within processing limits. Industrial applications more often require speaker-independent voice-recognition systems.

      In this system we use the speaker-independent mode, as the device can then be used by anyone.

      Some features of HM2007 are as follows:

      • Single chip voice recognition CMOS LSI

      • Speaker dependent

      • External RAM support

      • Maximum 40-word recognition (0.96 second per word)

      • Maximum word length 1.92 seconds (20 words)

      • Microphone support

      • Manual and CPU modes available

      • Response time less than 300 milliseconds

      • 5V power supply

      The speech recognition system is a completely assembled, easy-to-use, programmable speech-recognition circuit; programmable in the sense that we train it with the words (or vocal utterances) we want the circuit to recognize. The board allows experimenting with many facets of speech-recognition technology, and it provides an 8-bit data output that can be interfaced with any microcontroller for further development.

      The input to the system is fed by microphone to the speech-recognizer circuitry, which recognizes the words already stored in the system. The speech recognition and recording system requires an external memory, which is provided by an SRAM; together they form the fundamental block of the speech processing unit. The database is stored in the SRAM, and the speech processing unit operates in recognition mode, where the input is compared against the database and a particular eight-bit BCD address is produced as the result. This BCD address is fed to the digital data processing unit. The microcontroller in this system converts the input address from the HM2007 and processes it so that the address generated at the output specifies the address of the same word in the different language; this address is then fed to the APR6016 to retrieve the word stored in the synthesizer system.
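
      As a minimal sketch of this compare-and-forward step, the routine below (plain C, runnable on a PC; the pin-level I/O is omitted, and the SECTORS_PER_WORD layout constant is an assumption for illustration, not taken from the paper) decodes the HM2007's BCD result byte and computes the synthesizer sector of the equivalent target-language word:

      #include <stdio.h>
      #include <stdint.h>

      #define VOCAB_SIZE        5   /* valid word addresses 1..5 (see Fig. 4)   */
      #define SECTORS_PER_WORD  4   /* assumed layout of the pre-recorded words */

      /* Decode the HM2007's BCD result byte and compute the APR6016 sector
       * holding the same word pre-recorded in the target language.
       * Returns 0 on success, -1 on a recognition error or invalid address. */
      static int map_word_to_sector(uint8_t bcd_result, uint16_t *sector)
      {
          uint8_t addr = (bcd_result >> 4) * 10 + (bcd_result & 0x0F);

          if (addr == 55 || addr == 66 || addr == 77)  /* error codes per Fig. 4 */
              return -1;
          if (addr < 1 || addr > VOCAB_SIZE)           /* outside the vocabulary */
              return -1;

          *sector = (uint16_t)(addr * SECTORS_PER_WORD);
          return 0;
      }

      int main(void)
      {
          uint16_t sector;
          if (map_word_to_sector(0x03, &sector) == 0)  /* "word 3" recognized */
              printf("forward sector %u to the APR6016\n", (unsigned)sector);
          return 0;
      }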

    2. SPEECH SYNTHESIZER

    In this system the APR6016 is used as the audio playback and recorder part at the output of the system, as shown in the block diagram. The APR6016 offers non-volatile storage of voice and/or data in advanced multi-level flash memory, and up to 16 minutes of audio recording and playback can be accommodated. The memory array is organized to allow the greatest flexibility in message management and digital storage. The smallest addressable memory unit is called a sector; the APR6016 contains 1280 sectors, and sectors 0 through 1279 can be used for analog storage. During audio recording, one memory cell is used per sample-clock cycle.
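
    As a quick worked example of this organization: at the full 16-minute (960-second) capacity, the 1280 sectors average 960 / 1280 = 0.75 seconds of audio each, though the exact sector duration depends on the selected sample rate.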

    The APR6016 stores voice signals by sampling incoming voice data and storing the sampled signals directly into flash memory cells. Each flash cell can hold one of 256 discrete voltage levels, the equivalent of an eight-bit (2^8 = 256) binary-encoded value. During playback the stored signals are retrieved from memory, smoothed to form a continuous signal, and finally amplified before being fed to an external speaker amplifier. Device control is accomplished through an industry-standard SPI interface that allows a microcontroller to manage message recording and playback.

    The APR6016 is equipped with an internal squelch feature. The squelch circuit automatically attenuates the output signal by 6 dB during quiet passages in the playback material; muting the output during quiet passages helps eliminate background noise. Background noise may enter the system in a number of ways: it may be present in the original signal, arise naturally in some power-amplifier designs, or be induced through a poorly filtered power supply.

    The audio signal containing the content to be recorded is fed into the differential inputs ANAIN- and ANAIN+. After pre-amplification, the signal is routed into the anti-aliasing filter, which automatically adapts its response based on the sample rate in use, so no external anti-aliasing filter is required. After passing through the anti-aliasing filter, the signal is fed into the sample-and-hold circuit, which works in conjunction with the analog write circuit to store each analog sample in a flash memory cell.

    The APR6016 contains a 20-bit op-code register, of which 14 bits carry the sector address and 5 bits carry the op-code of the instruction. The instructions, their op-codes, and a summary of each are listed in Table 1 below:

    TABLE 1. OPERATIONAL CODES

    Instruction   Op-code [OP4-OP0]   Summary
    NOP           00000               No operation
    SID           00001               Causes the Silicon ID to be read
    STOP          00110               Stops the current operation
    STOP_PWDN     00111               Stops the current operation and puts the device into power-down mode
    SET_REC       01000               Starts a record operation from the sector address specified
    REC           01001               Starts a record operation from the current sector address
    SET_PLAY      01100               Starts a playback operation from the sector address specified
    PLAY          01101               Starts a playback operation from the current sector
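
    To make the register layout concrete, the helper below packs a Table 1 op-code and a sector address into one command word. The paper states only that 14 bits carry the sector address and 5 bits the op-code, so the exact bit positions used here are an assumption for illustration:

    #include <stdint.h>
    #include <stdio.h>

    /* Op-codes from Table 1 (OP4-OP0). */
    enum {
        OP_NOP       = 0x00,  /* 00000 */
        OP_SID       = 0x01,  /* 00001 */
        OP_STOP      = 0x06,  /* 00110 */
        OP_STOP_PWDN = 0x07,  /* 00111 */
        OP_SET_REC   = 0x08,  /* 01000 */
        OP_REC       = 0x09,  /* 01001 */
        OP_SET_PLAY  = 0x0C,  /* 01100 */
        OP_PLAY      = 0x0D   /* 01101 */
    };

    /* Assumed packing: op-code in the top five bits of the 20-bit word,
     * sector address in the low fourteen. */
    static uint32_t apr6016_command(uint8_t opcode, uint16_t sector)
    {
        return ((uint32_t)(opcode & 0x1F) << 15) | (uint32_t)(sector & 0x3FFF);
    }

    int main(void)
    {
        /* Start playback from sector 12. */
        printf("SET_PLAY command word: 0x%05X\n",
               (unsigned)apr6016_command(OP_SET_PLAY, 12));
        return 0;
    }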

    When a SET_REC or REC command is issued the device will begin sampling and storing the data present on ANAIN+ and ANAIN- to the specified sector. After half the sector is used the SAC pin will drop low to indicate that a new command can be accepted. The device will accept commands as long as the SAC pin remains low. Any command received after the SAC returns high will be queued up and executed during the next SAC cycle.

    The SET_REC command begins recording at the specified memory location after the delay Tarec has passed. Some time later, the low-going edge on the SAC pin alerts the host processor that the first sector is nearly full. The host processor responds by issuing a REC command before the SAC pin returns high. The REC command instructs the APR6016 to continue recording in the sector immediately following the current one. When the first sector is full, the device automatically jumps to the next sector and returns the SAC signal to a high state to indicate that the second sector is now in use. At this point the host processor decides to issue a STOP command during the next SAC cycle. The device follows the STOP command and terminates recording after TSarec. The /BUSY pin indicates when actual recording is taking place. Fig. 2 shows the typical recording sequence.

    Fig. 2. Typical Recording Sequence
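
    The handshake described above can be sketched as follows; the SPI helper and the SAC pin are replaced by printf/counter stubs so the sequence can be traced on a PC, and the command-word packing reuses the assumed layout from the previous sketch:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_STOP = 0x06, OP_SET_REC = 0x08, OP_REC = 0x09 };

    static uint32_t apr6016_command(uint8_t op, uint16_t sector)
    {
        return ((uint32_t)(op & 0x1F) << 15) | (uint32_t)(sector & 0x3FFF);
    }

    /* Stubs standing in for real SPI/GPIO access (assumptions, not a driver). */
    static void spi_send_command(uint32_t w) { printf("SPI -> 0x%05X\n", (unsigned)w); }
    static bool sac_is_low(void) { static int n; return (n++ % 2) == 1; } /* fake pin */

    int main(void)
    {
        /* Begin recording at sector 40. */
        spi_send_command(apr6016_command(OP_SET_REC, 40));

        /* SAC drops low when half the sector is used; issue REC before it
         * returns high so recording continues into the following sector. */
        while (!sac_is_low()) { }
        spi_send_command(apr6016_command(OP_REC, 0));

        /* During the next SAC cycle, terminate the recording. */
        while (sac_is_low()) { }
        while (!sac_is_low()) { }
        spi_send_command(apr6016_command(OP_STOP, 0));
        return 0;
    }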

    When a SET_PLAY or PLAY command is issued, the device begins sampling the data in the specified sector and produces the resulting output on the AUDOUT, ANAOUT-, and ANAOUT+ pins. After half the sector is used, the SAC pin drops low to indicate that a new command can be accepted. The device accepts commands as long as the SAC pin remains low; any command received after SAC returns high is queued up and executed during the next SAC cycle. Fig. 3 shows the typical playback sequence.

    Fig. 3. Typical Playback Sequence

  4. SYSTEM FLOW

    The flow chart in Fig. 4 summarizes the operation of the system. The input speech is compared against the stored database and the resulting address is forwarded. If the address lies in the valid vocabulary range (1-5), the system waits for a short delay to check whether a next input is recognized, adds the previous and present inputs, translates the received code, and passes the data to the speech synthesizer for voice output. Addresses 55 and 66 flag voice-length errors (voice too short or too long), address 77 indicates that no match was found, and any other value is treated as an invalid address.

    Fig. 4. System Flow chart
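
    A compact sketch of this flow is given below; hm2007_read_result() and apr6016_play() are hypothetical stub names, not a real driver API, and the wait-and-combine step for multi-word inputs is omitted for brevity:

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t hm2007_read_result(void) { return 0x03; }   /* stub: word 3 */

    static void apr6016_play(uint16_t sector)                  /* stub output  */
    {
        printf("voice output: playing sector %u\n", (unsigned)sector);
    }

    int main(void)
    {
        uint8_t bcd  = hm2007_read_result();
        uint8_t addr = (bcd >> 4) * 10 + (bcd & 0x0F);   /* BCD -> decimal */

        if (addr >= 1 && addr <= 5) {
            /* Translate the received code: same word, target language,
             * at an assumed fixed offset in the synthesizer's memory. */
            apr6016_play((uint16_t)(addr * 4));
        } else if (addr == 55 || addr == 66) {
            printf("voice-length error (too short or too long)\n");
        } else if (addr == 77) {
            printf("no match found\n");
        } else {
            printf("invalid address\n");
        }
        return 0;
    }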

  5. CONCLUSION

The Voice to Voice Language Translation System is a device designed to bridge the language gap between local individuals and foreigners traveling in our country. The need arises from the inability of dictionaries and human translators to suit our needs for better communication.

At present we need personalized interpreters that will reduce our dependence on dictionaries and human interpreters, and thus reduce the hindrance posed by the language barrier. In this situation, the proposed system serves the purpose reasonably well and minimizes communication inefficiencies.

The system can also help overcome the real-life difficulties of illiterate people and improve their lifestyle.

