Analysis of Formant Frequency F1, F2 and F3 in Assamese Vowel Phonemes using LPC Model

DOI : 10.17577/IJERTV6IS050422


Dr. Bhargab Medhi Department of Applied Science Gauhati University

Guwahati, Assam, India

Abstract: Formant frequency plays an important role in speech as well as speaker recognition. Formants are the spectral peaks of a sound wave, i.e. the specific resonance frequencies of the vocal tract at which energy is most strongly concentrated during vowel utterances. A speech spectrum may contain any number of formants, but for speech the most informative are the first three, referred to as F1, F2 and F3. In this paper, the paths of these three formants are analyzed in Assamese vowel phonemes. An LPC model is used to identify the formant frequencies.

Keywords: Formant, LPC, Vowel Phoneme, Spectrum, Filter.

  1. INTRODUCTION

Assamese (IPA: /ɔxɔmija/) is the native language of Assam and a major language of north-eastern India. It belongs to the Indo-European family of languages. The Assamese script is derived from the Devanagari script and consists of thirty-nine consonants and eleven vowel symbols, arranged in a well-structured, systematic manner [6]. Though there are eleven vowel symbols in the Assamese script, the number of vowel phonemes is only eight.

A phoneme is the smallest unit of sound that carries meaning in a language. Vowels form the largest phoneme group; the source for a vowel is the quasi-periodic puffs of airflow through the vocal folds, which vibrate at a certain fundamental frequency. Each vowel phoneme corresponds to a different vocal tract configuration. Several studies report that the first three formant frequencies, measured in the steady-state part of a vowel, play an important role in its characterization. The formants of the same vowel uttered by different speakers, in different contexts, at different speaking rates and with different stress patterns show a lot of variability [5]. Over the last few decades, a number of well-established approaches have been developed for the analysis and synthesis of speech signals with a view to speaker and speech recognition. Among these approaches, formant estimation is considered one of the basic models for speech recognition and research.

In the first phase of this work, a small database of the eight Assamese vowel phonemes is created by recording each phoneme 10 times, uttered by ten Assamese native speakers, with an equal number of males and females. The recording is done in an acoustic studio in a noise-free environment, where the utterances are kept normal, stress free and with flat intonation.

The written symbols of the Assamese vowels and their corresponding vowel phonemes are presented in TABLE I below.

    TABLE I: Assamese vowel phonemes and their positions

  2. LPC MODEL AND FORMANT FREQUENCY

A speech signal is formed by the convolution of the excitation source and the time-varying vocal tract components. LPC is a method of separating the effects of the source and the filter in a speech signal. (Cepstral analysis performs a similar deconvolution of speech into source and system components by passing through the frequency domain.) LPC is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model [5].
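As a rough illustration of this source-filter view, the short Python sketch below estimates LP coefficients for one frame of a recorded vowel and recovers a source estimate by inverse filtering with A(z). It is only a sketch under stated assumptions: the file name is a placeholder, the LP order is a common choice rather than a value from this paper, and librosa's lpc routine (Burg's method) is used for convenience rather than the autocorrelation formulation derived below.

import numpy as np
import librosa
from scipy.signal import lfilter

# Load a recorded vowel utterance (hypothetical file name), resampled to 16 kHz mono.
y, sr = librosa.load("vowel_o_male.wav", sr=16000, mono=True)

order = 12                       # LP order: an assumed, commonly used value for 16 kHz speech
mid = len(y) // 2
frame = y[mid:mid + 256]         # one 256-sample frame, ideally from the steady-state part

# Estimate the LP polynomial A(z); librosa returns [1, a_1, ..., a_p] (Burg's method).
A = librosa.lpc(frame, order=order)

# Inverse filtering with A(z) removes the vocal tract contribution and leaves the residual,
# i.e. an estimate of the excitation source.
residual = lfilter(A, [1.0], frame)

# Filtering the residual through the all-pole filter H(z) = 1/A(z) recovers the frame.
reconstructed = lfilter([1.0], A, residual)
print(np.max(np.abs(frame - reconstructed)))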

The formant model used for determining the formant frequencies of the Assamese vowels is based on the model proposed by L. Welling. The basic idea behind the LPC model is that a given speech sample at time n can be approximated as a linear combination of the past speech samples [1, 5]. The formant frequencies are computed by picking the peaks of the LPC spectrum.

In LP analysis of speech, an all-pole model is assumed for the system producing the speech signal s(n). The predicted sample can be represented as in (1).

s(n) = \sum_{i=1}^{p} a_i \, s(n-i) + G \, u(n)    (1)

where a_i (i = 1, 2, 3, \ldots, p) are the coefficients, assumed to be constant over the speech analysis frame, u(n) is the normalized excitation and G is the gain of the excitation. If \hat{s}(n) is the estimated value of s(n), calculated from the linear combination of the past p samples, then we get (2).

\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k)    (2)
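As a minimal numerical illustration of (1) and (2), with made-up coefficients rather than values estimated from the Assamese recordings, the following lines compute the predicted sample \hat{s}(n) as a linear combination of the past p samples:

import numpy as np

# Hypothetical predictor coefficients a_1..a_p (p = 3) and a short signal segment;
# these are made-up numbers, not values estimated from the recorded data.
a = np.array([1.2, -0.5, 0.1])
s = np.array([0.00, 0.15, 0.32, 0.41, 0.38, 0.22])

p = len(a)
s_hat = np.zeros_like(s)
for n in range(p, len(s)):
    # Equation (2): s_hat(n) = sum_{k=1}^{p} a_k * s(n - k)
    s_hat[n] = np.dot(a, s[n - p:n][::-1])

e = s - s_hat   # prediction error used in the derivation that follows
print(s_hat[p:], e[p:])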

Now, the prediction error can be defined as in (3) below.

e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k)    (3)

For a speech frame of m samples, the mean square prediction error over the whole frame is given by (4).

E = \sum_{m} e^{2}(m) = \sum_{m} \Big[ s(m) - \sum_{k=1}^{p} a_k \, s(m-k) \Big]^{2}    (4)

The optimal predictor coefficients are those that minimize this mean square error. The minimum MSE criterion for E is given by (5).

\frac{\partial E}{\partial a_k} = 0, \quad k = 1, 2, \ldots, p    (5)

Differentiating Equation (4) and setting the derivatives to zero, we get (6).

R \, a = r    (6)

where a = [a_1 \; a_2 \; \ldots \; a_p]^{T}, r = [r(1) \; r(2) \; \ldots \; r(p)]^{T}, and R is a symmetric Toeplitz autocorrelation matrix given by (7).

R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}    (7)

The LP residual is the prediction error e(n), i.e. the difference between the current speech sample s(n) and the predicted sample \hat{s}(n), which is given by (8).

e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k)    (8)

In the frequency domain, Equation (8) can be represented as in (9) below.

E(z) = S(z) - \sum_{k=1}^{p} a_k \, S(z) \, z^{-k}    (9)

i.e.

A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k \, z^{-k}    (10)

So it is clear that the LP residual can be obtained by filtering the speech signal with A(z). Similarly, the all-pole synthesis filter can be defined as in (11).

H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k \, z^{-k}}    (11)

Since A(z) is the reciprocal of H(z), the LP residual is obtained by inverse filtering of the speech. The LP spectrum provides the vocal tract characteristics, from which the vocal tract resonances, i.e. the formants, can be estimated by picking the peaks of the LP spectrum [2, 4].

  3. EXPERIMENT AND RESULT

In the processing part, the first step is to capture the signal of the required vowel utterances. The Audacity software is used to record the vowel utterances at 16,000 Hz in mono format. The silence part is removed manually, which overwrites the original signal. The following parameters are considered in the formant analysis (a processing sketch based on these settings is given after the list):

    • Frame length = 256 samples

    • Frame overlap = 128 samples

    • Sampling frequency = 16,000 Hz

    • Window type = Hamming (256 samples)
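A compact Python sketch of this processing chain is given below, assuming numpy and scipy are available. The frame length, overlap, sampling frequency and Hamming window follow the parameters listed above; the LP order, the peak-picking rule and the file name are assumptions made for illustration. Each frame's autocorrelation sequence is formed, the normal equations R a = r of (6) and (7) are solved with a Toeplitz solver, and F1, F2 and F3 are read from the first peaks of the LP spectrum 1/|A(e^{jω})|.

import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz, find_peaks, get_window

FRAME_LEN = 256      # frame length in samples (16 ms at 16 kHz)
HOP = 128            # hop of 128 samples, i.e. 128-sample overlap between 256-sample frames
ORDER = 12           # LP order; not stated in the paper, assumed here

def lpc_autocorr(frame, p):
    # Autocorrelation method: build r(0)..r(p) and solve the Toeplitz system R a = r (Eqs. 6-7).
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
    a = solve_toeplitz(r[:p], r[1:p + 1])     # predictor coefficients a_1 .. a_p
    return np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^{-k}

def frame_formants(frame, sr, p=ORDER, n_formants=3):
    # LP spectrum is |H(e^jw)| = 1/|A(e^jw)|; its first peaks are taken as F1, F2, F3.
    windowed = frame * get_window("hamming", len(frame))
    A = lpc_autocorr(windowed, p)
    freqs, h = freqz([1.0], A, worN=512, fs=sr)
    spectrum_db = 20 * np.log10(np.abs(h) + 1e-12)
    peaks, _ = find_peaks(spectrum_db)
    return freqs[peaks][:n_formants]

sr, speech = wavfile.read("vowel_o_male.wav")   # hypothetical 16 kHz mono recording
speech = speech.astype(float)
speech /= np.max(np.abs(speech))
tracks = [frame_formants(speech[i:i + FRAME_LEN], sr)
          for i in range(0, len(speech) - FRAME_LEN, HOP)]
print(tracks[:3])    # (F1, F2, F3) estimates in Hz for the first three frames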

Fig.1. Formants of vowel (IPA: /o/) by male speaker.

Fig.2. Formants of vowel (IPA: /o/) by female speaker.

    Fig.3. Formants of vowel (IPA: /a/) by male speaker.

    Fig.4. Formants of vowel (IPA: /a/) by female speaker.


    Fig.5. Formants of vowel (IPA: //) by male speaker.

    Fig.6. Formants of vowel (IPA: //) by female speaker.
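Plots of the kind shown in Fig.1 to Fig.6, with one coloured line per formant track, can be reproduced from the per-frame estimates computed in the earlier pipeline sketch; a minimal plotting fragment, assuming matplotlib is available and reusing the hypothetical 'tracks' list from that sketch, is given below.

import numpy as np
import matplotlib.pyplot as plt

# 'tracks' is the per-frame list of formant estimates from the earlier pipeline sketch.
full = np.array([t for t in tracks if len(t) == 3])   # keep frames where three peaks were found
for i, label in enumerate(["F1", "F2", "F3"]):
    plt.plot(full[:, i], label=label)                  # one coloured line per formant
plt.xlabel("Frame index")
plt.ylabel("Frequency (Hz)")
plt.legend()
plt.title("Formant tracks of a vowel utterance")
plt.show()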

  4. CONCLUSION

From the analysis of the formant frequencies of the different Assamese vowel phonemes, we notice that the variation of F1 and F2 across vowels is quite distinct. In the figures, each coloured line represents one formant. In each case the formant values of the female speaker are comparatively higher than those of the male speaker. It is also seen that the third formant frequency F3 does not play a crucial role in the identification of a specific vowel spectrum.

REFERENCES

  1. B. Gold and N. Morgan, Speech and Audio Processing: Processing and Perception of Speech and Music, New York, 2000.

  2. B. Medhi and P. H. Talukdar, "Isolated Assamese Speech Recognition using Artificial Neural Network," 2015 International Symposium on Advanced Computing and Communication (ISACC), IEEE, 2015.

  3. B. Medhi and P. H. Talukdar, "Zero Crossing Rate Analysis of Assamese Vowel Phonemes," International Journal of Engineering Research and Technology, vol. 3, no. 3, March 2014.

  4. B. Medhi and P. H. Talukdar, "Different acoustic feature parameters ZCR, STE, LPC and MFCC analysis of Assamese vowel phonemes," ICFM, 2015.

  5. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Englewood Cliffs, NJ: Prentice-Hall, 1979.

  6. B. Kakati, Assamese, its Formation and Development, 5th ed., Guwahati, India: LBS Publications, 2007.

  7. F. Jelinek, Statistical Methods for Speech Recognition, Cambridge, MA: The MIT Press, 1998.
