Cepstral Analysis of Speech using Discrete Hartley Transform


Madhukar. B. N.

Senior Assistant Professor, ECE Dept, New Horizon College of Engineering, Bangalore, INDIA.

Dr. P. S. Satyanarayana

Senior Professor, ECE Dept, Don Bosco Institute of Technology, Bangalore, INDIA.

Abstract: This paper presents cepstral analysis of speech signals using the Discrete Hartley Transform (DHT). Hitherto, cepstral analysis of speech has been carried out either in the temporal domain or in the frequency domain using a Discrete Fourier Transform (DFT) based approach. A new approach that finds the cepstral coefficients in the frequency domain using the DHT rather than the DFT is proposed. The DFT, being a complex transform, takes more computation time to find the cepstral coefficients of a speech signal, whereas the DHT, being a real transform, takes less computation time and requires less memory. The relationship between the DFT and the DHT is used to find the cepstral coefficients instead of applying the DFT directly. The DHT method is found to be more efficient than the direct DFT approach, which substantially reduces implementation cost in the cepstral analysis of speech signals. The two approaches are compared by applying the DFT directly to the speech signal and then applying the DHT, with computation time as the performance metric. This work is implemented in MATLAB R2014a.

Keywords: DHT, DFT, Cepstrum.

1. INTRODUCTION

The Discrete Hartley Transform (DHT) is a variant of the Discrete Fourier Transform (DFT), which is one of the most widely used transforms in Communication Engineering and Signal Processing. The DHT was developed by Ronald N. Bracewell, a famous Australian physicist, engineer, and mathematician, in the 1980s. It is a discretized version of the Continuous Hartley Transform (CHT) invented by the U.S. electronics researcher Ralph Vinton Lyon Hartley (1888-1970) in 1942 [1]. Though the CHT existed in the technical literature, it remained in oblivion for a number of years until Bracewell published a discretized version of it in 1983. Hartley also invented the Hartley Oscillator and contributed to the development of Information Theory during its infancy. In this paper, the Hartley Transform and its discrete variant are reviewed, followed by their application to the cepstral analysis of speech signals [2].

2. THE CONTINUOUS HARTLEY TRANSFORM

The Continuous Hartley Transform (CHT) is an orthogonal integral transform. Originally, Hartley defined the CHT of a continuous time function $f(t)$ as

$$H(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\,\operatorname{cas}(\omega t)\,dt \tag{1}$$

Here, $\omega$ is the angular frequency in rad/sec. Note that

$$\operatorname{cas}(t) = \cos t + \sin t = \sqrt{2}\,\sin\!\left(t + \frac{\pi}{4}\right) = \sqrt{2}\,\cos\!\left(t - \frac{\pi}{4}\right)$$

is the cosine-and-sine or Hartley kernel. In signal processing terms, this transform takes a signal (function) from the time domain to the Hartley spectral domain (frequency domain). Eq. (1) is called the Analysis Equation of the CHT [3]. The Inverse Continuous Hartley Transform (ICHT) was defined by Hartley as

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} H(\omega)\,\operatorname{cas}(\omega t)\,d\omega \tag{2}$$

Eq. (2) is called the Synthesis Equation of the CHT [4]. The Hartley transform has the convenient property of being its own inverse. Hence, it is an involution integral transform [5].

The properties of the $\operatorname{cas}[\cdot]$ function follow directly from trigonometry and its definition as a phase-shifted trigonometric function, as given below.

$$\operatorname{cas}(\omega t) = \sqrt{2}\,\sin\!\left(\omega t + \frac{\pi}{4}\right) = \sqrt{2}\,\cos\!\left(\omega t - \frac{\pi}{4}\right)$$

It has the angle-addition identities given below.

$$\operatorname{cas}(\theta_1 + \theta_2) = \cos\theta_1\,\operatorname{cas}\theta_2 + \sin\theta_1\,\operatorname{cas}(-\theta_2)$$

$$\operatorname{cas}(\theta_1 + \theta_2) = \cos\theta_2\,\operatorname{cas}\theta_1 + \sin\theta_2\,\operatorname{cas}(-\theta_1)$$

The first-order derivative of $\operatorname{cas}\theta_1$ w.r.t. $\theta_1$ is given by the following expression [6],

$$\operatorname{cas}'(\theta_1) = \frac{d}{d\theta_1}\left[\operatorname{cas}\theta_1\right] = \cos\theta_1 - \sin\theta_1 = \operatorname{cas}(-\theta_1)$$

The CHT is closely related to the Continuous Time Fourier Transform (also called the Infinite Fourier Transform or the Complex Fourier Transform). The CHT differs from the classic Fourier transform in the choice of the kernel. It is well known that the Continuous Time Fourier Transform is a complex integral transform due to the use of the complex exponential, $e^{-j\omega t}$ or $e^{-j2\pi ft}$, as its kernel. The CHT, unlike the Continuous Time Fourier Transform, is a real transform because its kernel $\operatorname{cas}(\omega t) = \cos\omega t + \sin\omega t$ is real. Its backward or inverse transform is also real [7].
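The kernel properties stated above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the helper names `cas` and `cas_derivative` and the test angles are illustrative choices, not part of the original work.

```python
import math


def cas(t: float) -> float:
    """Hartley kernel: cas(t) = cos(t) + sin(t)."""
    return math.cos(t) + math.sin(t)


def cas_derivative(t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of d/dt cas(t)."""
    return (cas(t + h) - cas(t - h)) / (2.0 * h)


a, b = 0.7, -1.3  # arbitrary test angles in radians (assumed values)

# Phase-shifted sine and cosine forms of the kernel
shifted_sin = math.sqrt(2.0) * math.sin(a + math.pi / 4.0)
shifted_cos = math.sqrt(2.0) * math.cos(a - math.pi / 4.0)

# Angle-addition identity: cas(a + b) = cos(a) cas(b) + sin(a) cas(-b)
addition = math.cos(a) * cas(b) + math.sin(a) * cas(-b)

print(math.isclose(cas(a), shifted_sin))
print(math.isclose(cas(a), shifted_cos))
print(math.isclose(cas(a + b), addition))
print(math.isclose(cas_derivative(a), cas(-a), rel_tol=1e-6))
```

Each printed check compares one identity at a single pair of angles, so it illustrates the identities rather than proving them.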

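The Continuous Time Fourier Transform (CTFT) of a continuous time function $f(t)$ is defined as,

$$F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt \tag{3}$$

The Inverse Fourier Transform of $F(\omega)$ is defined by the following equation,

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\,e^{j\omega t}\,d\omega \tag{4}$$

The CTFT can be directly obtained from the CHT by the following expression,

$$F(\omega) = \frac{1}{2}\left[H(\omega) + H(-\omega)\right] - \frac{j}{2}\left[H(\omega) - H(-\omega)\right] \tag{5}$$

That is, the real part of the Fourier transform is the even part of the Hartley transform, and the imaginary part is the negative of its odd part. Conversely, for a real-valued function $f(t)$, the Hartley transform is given from the real and imaginary parts of the Continuous Time Fourier Transform [8] as

$$H(\omega) = \Re[F(\omega)] - \Im[F(\omega)] = \Re\left[\mathcal{F}\{f(t)\}\,(1 + j)\right] \tag{6}$$

Here, $\mathcal{F}$ is the Fourier Transform operator, and $\Re[\cdot]$ depicts the real part of the entity. It is easy to extend the definition of the Continuous Hartley Transform (CHT) and its inverse to discrete time signals.

3. THE DISCRETE HARTLEY TRANSFORM

The Discrete Hartley Transform (DHT) of a discrete time signal, $x(n)$, is defined as [9]

$$H(k) = \mathrm{DHT}[x(n)] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\,\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right), \quad 0 \le k \le N-1 \tag{7}$$

where $\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right) = \cos\!\left(\frac{2\pi nk}{N}\right) + \sin\!\left(\frac{2\pi nk}{N}\right)$. Also, note that

$$\operatorname{cas}\!\left(-\frac{2\pi nk}{N}\right) = \cos\!\left(\frac{2\pi nk}{N}\right) - \sin\!\left(\frac{2\pi nk}{N}\right) \tag{8}$$

The Inverse Discrete Hartley Transform (IDHT) of $H(k)$ is defined as,

$$x(n) = \mathrm{IDHT}[H(k)] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} H(k)\,\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right), \quad 0 \le n \le N-1 \tag{9}$$

Eqs. (7) and (9) are respectively called the Analysis and Synthesis Equations of the DHT. Note that $\operatorname{cas}\!\left(\frac{2\pi nk}{N}\right)$ is the kernel of the DHT [10]. It is a real term, due to which the DHT is a real transform. The same kernel is used in the computation of the IDHT. Thus, unlike the DFT, where the sign of the kernel exponent changes in the synthesis equation, the DHT has no such change, which means that a single algorithm can be used to compute both the forward and backward transforms. This is a major advantage over the conventional DFT, and it is equivalent to the case encountered in the Continuous Hartley Transform. Hence, the DHT is also its own inverse. The DHT involves only real operations, and hence the computational load and memory requirement are reduced by 50%, which is a clear merit over the conventional DFT. Efficient algorithms called Fast Hartley Transforms (FHT) have been developed for computing the DHT and are in use in many DSP applications [11].

4. RELATIONSHIP BETWEEN THE DFT AND THE DHT

A. Relationship between the Analysis Equations of DFT and DHT

The DFT of a discrete time sequence $x(n)$ is defined as

$$X(k) = \mathrm{DFT}[x(n)] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi nk/N}, \quad 0 \le k \le N-1 \tag{10}$$

Using Euler's Theorem, $e^{\pm j\theta} = \exp(\pm j\theta) = \cos(\theta) \pm j\sin(\theta)$, we get,

$$X(k) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\left[\cos\!\left(\frac{2\pi nk}{N}\right) - j\sin\!\left(\frac{2\pi nk}{N}\right)\right] \tag{11}$$

Due to the absence of the term $j$ in $H(k)$, the DHT is purely real. Next, consider the following expression,

$$y = \operatorname{cas}(\theta) = \cos(\theta) + \sin(\theta).$$

We know that $\cos\theta = \dfrac{e^{j\theta} + e^{-j\theta}}{2}$ and $\sin\theta = \dfrac{e^{j\theta} - e^{-j\theta}}{2j}$. Hence,

$$y = \cos\theta + \sin\theta = \frac{(1-j)}{2}\,e^{j\theta} + \frac{(1+j)}{2}\,e^{-j\theta} \tag{12}$$

Consider the following substitutions

$$y = \operatorname{cas}\theta, \quad z = e^{j\theta}, \quad a = \frac{(1-j)}{2}, \quad b = \frac{1+j}{2} \tag{13}$$

Then, we have the quadratic equation,

$$a z^2 - y z + b = 0 \tag{14}$$

Solving for $z$, and noting that $y^2 - 4ab = -(\cos\theta - \sin\theta)^2$, we get,

$$z = \frac{y \pm j\,(\cos\theta - \sin\theta)}{2a} \tag{15}$$

Considering only the negative sign and simplifying, we get,

$$z = e^{j\theta} = \frac{1+j}{2}\,\operatorname{cas}(\theta) + \frac{1-j}{2}\,\operatorname{cas}(-\theta) \tag{16}$$

Also,

$$z^{-1} = \frac{1}{z} = e^{-j\theta} = \frac{1+j}{2}\,\operatorname{cas}(-\theta) + \frac{1-j}{2}\,\operatorname{cas}(\theta) \tag{17}$$

Now, the DFT of $x(n)$ is given by the well-known equation

$$X(k) = \mathrm{DFT}[x(n)] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi nk/N}$$

Making use of Eq. (17) in the above equation with $\theta = 2\pi nk/N$, we get,

$$X(k) = \frac{1+j}{2}\,H(-k) + \frac{1-j}{2}\,H(k)$$

$$X(k) = \frac{1}{2}\left[H(k) + H(-k)\right] + \frac{j}{2}\left[H(-k) - H(k)\right] \tag{18}$$

where $H(-k)$ denotes $H(N-k)$, by the periodicity of the DHT in $k$. Since $X(k)$ is complex, we can write the above equation as,

$$X(k) = X_R(k) + jX_I(k) \tag{19}$$

Thus, the DFT coefficients in terms of the DHT coefficients are given by the following two equations,

$$X_R(k) = \frac{1}{2}\left[H(k) + H(-k)\right] \tag{20}$$

$$X_I(k) = \frac{1}{2}\left[H(-k) - H(k)\right] \tag{21}$$

Subtracting Eq. (21) from Eq. (20), we get the DHT coefficients expressed in terms of the DFT [12].

$$H(k) = X_R(k) - X_I(k) \tag{22}$$

B. Relationship between the Synthesis Equations (Inverses) of DFT and DHT

If $H(k) = \mathrm{DHT}[x(n)]$, then it is possible to express the IDHT of $H(k)$ in terms of the IDFT of $X(k)$.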

    () = () () (23)

    Here, () and () are the real and imaginary parts of

    () respectively. Note that () = [()] and

    () = [()] respectively. DHT finds applications in a variety of domains such as Speech Processing, Image Processing, Biomedical Signal Processing, Data Compression, etc [13].
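The relations between the DHT and DFT coefficients discussed above can be illustrated with a short numerical sketch. It is written in Python with the standard library only and assumes the symmetric $1/\sqrt{N}$ scaling for both transforms; the function names and the test sequence are hypothetical, and the direct $O(N^2)$ sums stand in for a fast algorithm purely for readability.

```python
import cmath
import math


def dht(x):
    """Direct DHT: H(k) = (1/sqrt(N)) * sum_n x(n) * cas(2*pi*n*k/N)."""
    n_pts = len(x)
    scale = 1.0 / math.sqrt(n_pts)
    out = []
    for k in range(n_pts):
        acc = 0.0
        for n, xn in enumerate(x):
            angle = 2.0 * math.pi * n * k / n_pts
            acc += xn * (math.cos(angle) + math.sin(angle))
        out.append(scale * acc)
    return out


def dft(x):
    """Direct DFT with the same scaling, used only as a reference."""
    n_pts = len(x)
    scale = 1.0 / math.sqrt(n_pts)
    return [scale * sum(xn * cmath.exp(-2j * math.pi * n * k / n_pts)
                        for n, xn in enumerate(x))
            for k in range(n_pts)]


def dft_from_dht(h):
    """Build X(k) = X_R(k) + j X_I(k), reading H(-k) as H((N - k) mod N)."""
    n_pts = len(h)
    spectrum = []
    for k in range(n_pts):
        h_neg = h[(n_pts - k) % n_pts]
        x_real = 0.5 * (h[k] + h_neg)   # even part of H
        x_imag = 0.5 * (h_neg - h[k])   # negative of the odd part of H
        spectrum.append(complex(x_real, x_imag))
    return spectrum


x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]  # arbitrary real test sequence
H = dht(x)
X_direct = dft(x)
X_via_dht = dft_from_dht(H)
x_back = dht(H)  # the same routine serves as the inverse transform

print(max(abs(p - q) for p, q in zip(X_direct, X_via_dht)))
print(max(abs(p - q) for p, q in zip(x, x_back)))
```

Both printed values should be at rounding-error level, which reflects two points made in the text: the DFT is recoverable from purely real DHT output, and one routine computes both the forward and the inverse DHT.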


The cepstrum was invented jointly by B. P. Bogert, M. J. R. Healy, and J. W. Tukey in 1963. The name "cepstrum" was derived by reversing the first four letters of "spectrum." Operations on cepstra are labelled quefrency analysis, or cepstral analysis. A cepstrum is the result of taking the Inverse Fourier Transform (IFT) of the logarithm of the estimated spectrum of a signal. There is a complex cepstrum, a real cepstrum, a power cepstrum, and a phase cepstrum. The power cepstrum in particular finds applications in the analysis of human speech. This was the one that was originally invented [12].

Speech is considered to be the output of a system, the vocal tract $h(n)$, to an input $e(n)$ which is either a periodic impulse train due to the vibration of the vocal cords or white noise due to the air flow. Based on the type of input, the speech signal is broadly classified as voiced for the periodic impulse train input and as unvoiced for the white noise input [13].

Speech is composed of excitation source and vocal tract system components. The main aim of cepstral analysis of a speech signal is to separate its excitation and vocal tract components without any prior knowledge about the source and/or the system. Voiced sounds are generated by exciting the time-varying system with periodic impulse signals. Unvoiced sounds are generated by exciting the time-varying system with a stochastic noise sequence. Hence, the resultant speech signal is the linear convolution of the corresponding excitation sequence and the vocal tract filter characteristics. Let $e(n)$ and $h(n)$ be the excitation sequence and the vocal tract filter sequence. Then, the speech signal $s(n)$ is given by the one-dimensional linear convolution of $e(n)$ with $h(n)$.

$$s(n) = e(n) * h(n) \tag{24}$$

where the symbol $*$ denotes discrete time linear convolution. Taking the Discrete Time Fourier Transform (DTFT) on both sides of Eq. (24) yields the frequency spectrum,

$$S(e^{j\omega}) = E(e^{j\omega})\,.\,H(e^{j\omega}) \tag{25}$$

Eq. (25) can also be written as

$$S(\omega) = E(\omega)\,.\,H(\omega) \tag{26}$$

Here $H(\omega)$ denotes the frequency response of the vocal tract filter and should not be confused with the Hartley transform $H(k)$. The speech signal is then deconvolved into the excitation and vocal tract components in the temporal domain. Taking magnitudes on both sides of Eq. (26), we get,

$$|S(\omega)| = |E(\omega)|\,.\,|H(\omega)| \tag{27}$$

Taking Napierian or natural logarithms on both sides of Eq. (27), we get,

$$\log|S(\omega)| = \log\{|E(\omega)|\,.\,|H(\omega)|\} \tag{28}$$

Or

$$\ln|S(\omega)| = \ln|E(\omega)| + \ln|H(\omega)| \tag{29}$$

The logarithmic operation transforms the magnitude speech spectrum, in which the excitation component and the vocal tract component are multiplied, into a linear combination of these components [14]. The separation is done by taking the Inverse Discrete Fourier Transform (IDFT) of $\ln|S(\omega)|$, which yields the cepstral coefficients in the temporal domain. In the cepstral domain, the vocal tract components are represented by the slowly varying components concentrated near the lower cepstral region, and the excitation components are represented by the fast varying components at the higher cepstral region [15].

$$c(n) = \mathrm{IDFT}\left[\ln|S(\omega)|\right]$$

$$c(n) = \mathrm{IDFT}\left[\ln|E(\omega)| + \ln|H(\omega)|\right] \tag{30}$$

The slow variations in the lower cepstral region are due to the vocal tract characteristics, while the rapidly varying part of the cepstrum towards the upper cepstral region represents the excitation characteristics of the short-time speech segment. The method used for extracting the vocal tract and excitation characteristics is the liftering operation, which is done in the temporal (quefrency) domain. Liftering is similar to filtering in the frequency domain: a desired cepstral region is selected for analysis by multiplying the whole cepstrum by a boxcar apodization function at the desired position [12]. Liftering is a useful and meaningful process with the real cepstrum for obtaining an estimate of the log spectrum of either of the separated components. That is, we can apply a useful linear operation to the real cepstrum. The output of this process in the quefrency domain is a real cepstrum. However, if the objective is to return to the original time domain with an estimate of the separated signal, the real cepstrum will fail, because its "linearizing" operation is not invertible. To complete this task, we would need a phase-preserving linearizing operation [14].

The DHT, since it is a real transform, is very useful in the cepstral analysis of speech. If $x(n)$ is the discrete speech signal, firstly, its N-point DHT, $H(k)$, is computed. Then, the DFT of $x(n)$ is calculated from the relationship between the DFT and the DHT, by using the following relations.

$$X_R(k) = \frac{H(k) + H(-k)}{2} \tag{31}$$

$$X_I(k) = \frac{H(-k) - H(k)}{2} \tag{32}$$

$$X(k) = X_R(k) + jX_I(k) \tag{33}$$

We next express $X(k)$ in Steinmetz form or polar form as,

$$X(k) = |X(k)|\,e^{j\theta(k)} \tag{34}$$

But from Digital Speech Processing theory, it is known that

$$c(n) = \mathrm{IDFT}\left[\ln\{X(k)\}\right]$$

$$c(n) = \mathrm{IDFT}\left[\ln\{|X(k)|\,e^{j\theta(k)}\}\right]$$

$$c(n) = \mathrm{IDFT}\left[\ln|X(k)| + j\theta(k)\right] = \mathrm{IDFT}\left[C(k)\right] \tag{35}$$

The DHT is then computed by using the relation $H_c(k) = C_R(k) - C_I(k)$, where $C_R(k) = \ln|X(k)|$ and $C_I(k) = \theta(k)$.

$$H_c(k) = \ln|X(k)| - \theta(k) \tag{36}$$

Finally, the cepstral coefficients $c(n)$ are computed by taking the IDHT of $H_c(k)$ [15], i.e.,

$$c(n) = \mathrm{IDHT}\left[H_c(k)\right] \tag{37}$$

Figure 1 Cepstral Computation using DHT Approach.
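The DHT route to the cepstrum, followed by a boxcar lifter, can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the authors' MATLAB implementation: it computes only the real cepstrum (the phase term $\theta(k)$ is dropped to avoid phase unwrapping), it uses the symmetric $1/\sqrt{N}$ scaling, and the function names, the synthetic decaying-exponential frame, and the lifter cutoff are all hypothetical.

```python
import cmath
import math


def dht(x):
    """Direct DHT with 1/sqrt(N) scaling; it is its own inverse."""
    n_pts = len(x)
    scale = 1.0 / math.sqrt(n_pts)
    return [scale * sum(xn * (math.cos(2.0 * math.pi * n * k / n_pts)
                              + math.sin(2.0 * math.pi * n * k / n_pts))
                        for n, xn in enumerate(x))
            for k in range(n_pts)]


def real_cepstrum_via_dht(x):
    """DHT -> DFT real/imaginary parts -> ln|X(k)| -> inverse DHT."""
    n_pts = len(x)
    h = dht(x)
    log_mag = []
    for k in range(n_pts):
        h_neg = h[(n_pts - k) % n_pts]
        x_real = 0.5 * (h[k] + h_neg)
        x_imag = 0.5 * (h_neg - h[k])
        log_mag.append(math.log(math.hypot(x_real, x_imag)))
    return dht(log_mag)  # forward routine reused as the inverse


def real_cepstrum_via_dft(x):
    """Reference path: IDFT of the log magnitude of the DFT."""
    n_pts = len(x)
    scale = 1.0 / math.sqrt(n_pts)
    spectrum = [scale * sum(xn * cmath.exp(-2j * math.pi * n * k / n_pts)
                            for n, xn in enumerate(x))
                for k in range(n_pts)]
    log_mag = [math.log(abs(v)) for v in spectrum]
    return [(scale * sum(lm * cmath.exp(2j * math.pi * n * k / n_pts)
                         for k, lm in enumerate(log_mag))).real
            for n in range(n_pts)]


def low_time_lifter(cepstrum, cutoff):
    """Boxcar lifter keeping n < cutoff and the mirrored bins n > N - cutoff."""
    n_pts = len(cepstrum)
    return [c if (n < cutoff or n > n_pts - cutoff) else 0.0
            for n, c in enumerate(cepstrum)]


frame = [0.8 ** n for n in range(32)]  # synthetic frame with a nonzero spectrum
c_dht = real_cepstrum_via_dht(frame)
c_dft = real_cepstrum_via_dft(frame)
c_low = low_time_lifter(c_dht, cutoff=4)

print(max(abs(p - q) for p, q in zip(c_dht, c_dft)))
```

The printed difference between the two paths should be at rounding-error level. The lifter retains the low-quefrency bins, which the text associates with the vocal tract, and zeroes the rest; a high-time lifter for the excitation would simply invert the selection.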


Figure 2 Cepstrum of voiced speech segment using DHT.

Figure 3 Cepstrum of unvoiced speech segment using DHT.

The cepstra of voiced and unvoiced speech segments obtained using the DHT approach are shown in Figures 2 and 3 respectively. The main aim of cepstral analysis of a speech signal is to separate its excitation and vocal tract components. The input speech signal is divided into short-term segments of duration 15 to 20 msec. The frame size is kept at 20 msec, and each frame is multiplied by a Hamming window. The cepstral representation of the short-term speech is then computed by finding the IDFT of the log magnitude spectrum. In this work, instead of computing the IDFT directly, the IDHT is used to obtain it. Figure 2 shows a 20 msec voiced frame and its cepstrum in the temporal domain. It can be clearly seen that the vocal tract components are concentrated in the lower cepstral region and the excitation components are concentrated in the higher cepstral region. Figure 2 also shows the voiced frame considered and the windowed frame, along with the cepstral coefficients. Note that the cepstrum is symmetrical in the cepstral domain.
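The framing and windowing step described above might look like the following Python sketch. The sampling rate of 8 kHz, the non-overlapping frames, and the synthetic test tone are assumptions made for illustration; the paper states only the 20 msec frame size and the use of a Hamming window.

```python
import math


def hamming(length):
    """Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (L - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(signal, sample_rate_hz, frame_ms=20.0):
    """Cut the signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    usable = len(signal) - len(signal) % frame_len  # drop the partial tail
    return [signal[i:i + frame_len] for i in range(0, usable, frame_len)]


def window_frames(frames):
    """Multiply every frame sample by sample with a Hamming window."""
    win = hamming(len(frames[0]))
    return [[s * w for s, w in zip(frame, win)] for frame in frames]


sample_rate_hz = 8000  # assumed sampling rate, not given in the paper
# 100 ms of a synthetic 100 Hz tone standing in for recorded speech
signal = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate_hz)
          for n in range(800)]

frames = split_into_frames(signal, sample_rate_hz)
windowed = window_frames(frames)

print(len(frames), len(frames[0]))
```

At 8 kHz a 20 msec frame holds 160 samples, so the 800-sample signal yields five frames. Each windowed frame would then be passed to the cepstrum computation.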


The DHT is two to three times faster than the DFT because no complex arithmetic is involved. All four quadrants of the Hartley domain data must be used. In the computation of the DHT and its inverse, only one quadrant of sines and cosines needs to be calculated, due to symmetry. Also, multiplication in the DHT is real, whereas it is complex in the DFT. The DHT butterfly loop requires less memory than that of the DFT because all of the data are stored in arrays of real numbers. The DFT butterfly loop, on the other hand, uses complex arrays, which require twice the memory of a real array. In the DFT computation, each iteration involves one multiplication and two additions of complex numbers, which adds up to four multiplications and six additions of floating-point numbers. The DHT also has four multiplications and six additions of floating-point numbers for each iteration. The butterfly loop of the DHT runs from 2 to $N/2$, while the butterfly loop of the DFT runs from 1 to $N$. Since the DHT loops half as many times as the DFT, the DHT algorithm performs two multiplications and three additions for every four multiplications and six additions of the DFT. Also, the DHT has no multiplications for the zero and Nyquist frequencies, which is a major advantage over the DFT. The DHT needs less memory than the DFT because it does not use complex numbers. The results of the DHT can be stored in the same memory space as the original data set, which eliminates the need to allocate more storage. Also, converting from the Hartley domain to the Fourier domain and vice versa is a direct and simple procedure.
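The operation counts quoted above follow from simple arithmetic, worked through in the Python sketch below. The transform length of 1024 is a hypothetical value, and the loop lengths take the statement that the DHT loops half as many times as the DFT at face value; this is bookkeeping, not a measurement of any particular implementation.

```python
def real_ops_complex_multiply():
    """(a + jb)(c + jd) = (ac - bd) + j(ad + bc): 4 real mults, 2 real adds."""
    return {"mul": 4, "add": 2}


def real_ops_complex_add():
    """(a + jb) + (c + jd): 2 real adds and no multiplications."""
    return {"mul": 0, "add": 2}


def butterfly_real_ops():
    """One complex multiplication plus two complex additions per iteration."""
    mult = real_ops_complex_multiply()
    add = real_ops_complex_add()
    return {"mul": mult["mul"] + 2 * add["mul"],
            "add": mult["add"] + 2 * add["add"]}


def loop_totals(iterations, per_iteration):
    """Scale the per-iteration counts by the number of loop iterations."""
    return {op: iterations * count for op, count in per_iteration.items()}


n_points = 1024  # hypothetical transform length
per_iter = butterfly_real_ops()
dft_total = loop_totals(n_points, per_iter)       # loop runs N times
dht_total = loop_totals(n_points // 2, per_iter)  # loop runs about N/2 times

print(per_iter)
print(dft_total, dht_total)
```

With equal per-iteration cost and half the iterations, the DHT totals come out at half the DFT totals, which matches the two-for-four and three-for-six ratios stated in the text.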

In this work, cepstral analysis of a voiced speech segment and an unvoiced speech segment was carried out using both the DFT and DHT approaches. The direct DFT approach required 1171.5085 complex additions and 585.75425 complex multiplications. The DHT approach required 292.9 multiplications and 585.823 additions, which indicates that the DHT reduced the computational load by 50%. Thus, by using the DHT approach, the computation time, storage, and memory requirement are reduced by 50% in the cepstral analysis of speech. This is a good improvement over the traditional DFT approach.


The present work is based on the basic Discrete Hartley Transform. Other variants of the Hartley Transform, and its combination with the Wavelet Transform, need to be investigated in view of the quasi-periodic and non-stationary nature of the speech signal. The behaviour of the speech signal spectrum under such robust algorithms also needs to be analysed.


M.B.N. would like to thank Dr. Mohan Manghnani, Chairman, New Horizon Education Institution, for providing the necessary infrastructure and encouraging this research work. M.B.N. is grateful to Dr. Manjunatha, Principal, New Horizon College of Engineering, and Dr. Sanjay Jain, Professor & HOD, ECE Department, New Horizon College of Engineering, for their constant support in this endeavour. Dr. P.S.S. would like to thank the management of Don Bosco Institute of Technology for their wholehearted support in carrying out this work.


  1. Alan V. Oppenheim and Ronald W. Schafer, Discrete Time Signal Processing, 3rd Edition, Pearson India Inc., New Delhi, India, 2014.

  2. John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, 4th Edition, Prentice-Hall of India Private Limited, New Delhi, India, 2007.

  3. Agostino Abbate, Casimer M. DeCusatis and Pankaj K. Das, Wavelets and Subbands: Fundamentals and Applications, Birkhauser, Paris, France, 2003.

  4. Martin Vetterli and Jelena Kovacevic, Wavelets and Subband Coding, 1st Edition, Prentice-Hall, New Jersey, U.S.A., 1995.

  5. Sanjit K. Mitra, Digital Signal Processing: A Computer Based Approach, 4th Edition, Tata McGraw Hill India Private Limited, New Delhi, India, 2014.

  6. A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, 1st Edition, Springer Verlag India Private Limited, New Delhi, India, 2003.

  7. John R. Deller, Jr., John H.L. Hansen, and John G. Proakis, Discrete-Time Processing of Speech Signals, 1st Edition, IEEE Press, Wiley India Private Limited, New Delhi, India, 2014.

  8. Lawrence R. Rabiner, and Ronald W. Schafer, Digital Processing of Speech Signals, 1st Edition, Pearson India Inc., New Delhi, India, 2005.

  9. Douglas O'Shaughnessy, Speech Communications: Human and Machine, 2nd Edition, Cambridge University Press India Private Limited, New Delhi, India, 2007.

  10. Lawrence R. Rabiner, and Biing-Hwang Juang, Fundamentals of Speech Recognition, 1st Edition, Pearson India Inc., New Delhi, 2005.

  11. Thomas F. Quatieri, Discrete-Time Speech Signal Processing, 1st Edition, Pearson India Inc., New Delhi, 2006.

  12. A.M. Kondoz, Digital Speech, 2nd Edition, Wiley India Private Limited, New Delhi, India, 2009.

  13. Ben Gold and Nelson Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 1st Edition, Wiley India Private Limited, New Delhi, India, 2009.

  14. Shaila D. Apte, Speech and Audio Processing, 1st Edition, Wiley India Private Limited, New Delhi, India, 2014.

  15. Chris Rowden (Editor), Speech Processing. 1st Edition, The Essex Series in Telecommunications & Information Systems, McGraw-Hill Publishing Company, Berkshire, England, 1992.
