Indian Language Recognition

Vineeta Singh; Deeptha Shree G.

doi:10.17577/IJERTV3IS10641

Volume 03, Issue 01 (January 2014)

Indian Language Recognition

DOI : 10.17577/IJERTV3IS10641

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 108
Total Downloads : 291
Authors : Vineeta Singh, Deeptha Shree G.
Paper ID : IJERTV3IS10641
Volume & Issue : Volume 03, Issue 01 (January 2014)
Published (First Online): 24-01-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Indian Language Recognition

Vineeta Singh

Amrita School of Engineering

Deeptha Shree G.

Amrita School of Engineering

Abstract

Spoken Language Recognition is an important application with a growing necessity. The motivation behind this paper arises from the fact that India is home to 415 living languages (according to the SIL Ethnologue), which highlights the necessity of an application which could identify each spoken language and classify it and further translate it.

In this paper we investigate Indian spoken Language recognition for 3 languages namely Kannada, Tamil and Malayalam with the usage of RASTA-PLP for extracting features from speech utterances and Multi- layer Perceptron for classification of these features.

Introduction

Spoken language recognition is the process of identifying a speech utterance and classifying it. With the advent of globalisation, the demand for communication across boundaries is increasing. This has given rise to new challenges for Automatic Speech Recognition (ASR): before the machine can understand the meaning of the utterance, it must identify which language is being spoken. The past few decades has seen lot of advances in this field.

Humans have an inborn ability to distinguish and characterise languages to some extent. Scientists and researchers have been in a quest to automate this part of the human intelligence [1]. Various ways of classifying a given language are acoustic phonetics, phonotactics, prosody and syntax.
- Acoustic phonetics: Phonemes are any of the minimal units of speech sound in a language
  
  that can distinguish one word from another. The number of phonemes in each language is between 15-50.Phonetic repertoire differs for different languages.
- Phonotactics: Each language has a particular set of phonotactics rules that determine the permissible phone sequences.
- Prosody: Prosody is the study of all the elements of language that contribute toward its acoustic and rhythmic effects.
- Syntax: The way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)
  
  Based on these several methods have been devised for Language Identification (LID). However the basic steps involved in any LID remain the same:
  
  Fig1. Steps in language identification
STATE OF THE ART

This section discusses the background work related to different spoken Language recognition systems. This includes discussion on the algorithms which are most commonly used and the ones that have achieved the best results.

A lot of research has been performed in this domain with a lot of effort being put to improve the accuracy of the LID systems. Language identification is a two-step process: feature extraction and classification.

Various methods have been implemented for feature extraction. A few of them are described below:
- Linear predictive coding (LPC): It is a popular technique used for speech coding. It has the ability to provide accurate estimates of acoustic features with less computation and storage compared to other approaches. The LPC derives a compact and precise representation of the spectral magnitude for signals with a brief duration. The fundamental idea of LPC is that a speech sample can be estimated as a linear combination of past samples. One main limitation of LPC features is the linear assumption that fails to take into account of the non-linear effects. The LPC features also lack the discriminant power for classification tasks and are highly sensitive to acoustic environment. Additive background noise or room reverberation can affect the accuracy of LPC analysis.
- Mel-Frequency Cepstral Coefficients: The MFCC features apply Mel-frequency warping onto the power spectra with the use of a triangular Mel-scale filter bank. Logarithmic compression is also applied to Mel-spectra to approximate human auditory processing [5].
- Perceptual Linear prediction: PLP is an acoustically derived features proposed by Hermansky.[2] The following three concepts from the psychophysics of hearing are applied to derive an auditory spectrum estimate:
  - Critical-band spectral-resolution [6],
  - The equal-loudness hearing curve [7], and
    - The intensity-loudness power law of hearing [8].
- RASTA-PLP: A major cause of problem in speech recognition systems is the mismatch
  
  between the conditions used to record the speech training data and the conditions under which the data to be recognized is recorded (ex: a change in headset). The term RASTA comes from the words RelAtive SpecTrA. The RASTA technique applies a bandpass filter to each spectral component in the critical band spectrum estimate. Human hearing seems relatively insensitive to slowly varying stimuli [3]. The basic idea of RASTA filtering is to exploit these phenomena by suppressing constant and slowly varying elements in each spectral component of the short term auditory- like spectrum prior to computation of the linear prediction coefficients.
  
  Feature vectors produced with these methods can be used directly in the training and testing of a vector- quantization-based, dynamic time- warping-based, hidden-Markov-model-based or neural network based language recognition system.
PROPOSED METHOD

The workflow of our proposed method is similar to figure1. The first step is to capture speech segments of Tamil, Malayalam and Kannada, the second step is pre- processing of the captured speech signal, the third step is feature extraction using RASTA-PLP algorithm and the fourth step is language identification with the help of MLP neural network.
We know that statistical properties of speech are not stable over a large period of time. Therefore, we opt for short time processing of speech signals. To perform this we used a hamming window because it has minimal side-lobes and smooth curve. This is implemented using the Enframe function in the Voicebox toolbox for Matlab. [4]
RASTA-PLP is implemented using the signal processing toolbox for Matlab Voicebox [4].

PLP parameters obtained on applying RASTA-PLP to speech segments are compressed and given as input to the MLP neural network
LANGUAGE CLASSIFICATION

Fig2. Two layer Feed forward neural network

The compact PLP feature set of each speech utterance is given as input to a 2 layer feed forward Multi- Layer Perceptron model (MLP). Data is randomly divided for training, validation and testing. The artificial neural network is trained using scaled conjugate gradient back propagation algorithm. The performance is evaluated using mean square error and confusion matrices. Matlab Neural pattern recognition Toolbox was used to implement MLP feed forward network.

Fig3. Confusion matrix and region of convergence
CONCLUSION

The languages- Kannada, Tamil and Malayalam were classified successfully with good accuracy. Therefore, we conclude by stating that satisfactory results can be produced when Indian languages are classified with MLP feed forward neural networks using RASTA-PLP for feature extraction.
REFERENCES

J. Zhao, H. Shu, L. Zhang, X. Wang, Q. Gong, and P. Li, Cortical competition during language discrimination, NeuroImage,

vol. 43, pp. 624633, 2008
H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech", J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738- 1752, Apr. 1990.
H. Hermansky and N. Morgan, "RASTA processing of speech", IEEE Trans. on Speech and Audio Proc., vol. 2, no. 4, pp. 578-589, Oct. 1994
{Ellis05-rastamat Author = {Daniel P. W. Ellis}, Year

= {2005}, Title = {{PLP} and {RASTA} (and {MFCC}, and

inversion) in {Matlab},Url =

{http://www.ee.columbia.edu/~dpwe/resources/matlab/rastam at/},Note = {online web resource}}
S. Davis and P. Mermelstein, Comparison of parametric representations for mono-

syllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no.4, pp. 357366, 1980.
Zwicker, E. Masking and psychological excitation as consequences of ears frequency analysis, in Frequency Analysis and Periodicity Detection

in Hearing edited by R. Plotag and G. Smoorenburg (Sijthoff, Leyden, The Netherlands).
Makhoul, J. and Cosell, L., LPCW: An LPC vocoder with linear predictive

spectral mapping, in Proceedings of the IEEE International Conference

on Acoustics, Speech, and Signal Processing, 466-469, Philadelphia, 1976
Stevens, S., On the psychophysical law, Psychol. rev. 64, 153-181

Indian Language Recognition

Leave a Reply