Automatic Speech Recognition using Recurrent Neural Network

Sruthi Vandhana T; Srivibhushanaa S; Sidharth K; Sanoj C S

doi:10.17577/IJERTV9IS080343

Volume 09, Issue 08 (August 2020)


Open Access
Authors : Sruthi Vandhana T , Srivibhushanaa S , Sidharth K , Sanoj C S
Paper ID : IJERTV9IS080343
Volume & Issue : Volume 09, Issue 08 (August 2020)
Published (First Online): 03-09-2020
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Sruthi Vandhana T
Undergraduate, Department of CSE, SVCE
Chennai, Tamil Nadu, India

Srivibhushanaa S
Undergraduate, Department of CSE, SVCE
Chennai, Tamil Nadu, India

Sidharth K
Undergraduate, Department of CSE, SVCE
Chennai, Tamil Nadu, India

Sanoj C S
Assistant Professor, Department of CSE, SVCE
Chennai, Tamil Nadu, India

Abstract: Speech recognition is one of the major developing fields in computer science and is attracting growing attention. Automatic speech recognition, or speech-to-text conversion, converts a user's spoken input into text or into a query, depending on where the speech is used. Many people now use this technology instead of typing or pressing buttons to give a command. Speech-to-text conversion is an intriguing but difficult task, because different words can sound alike and the same word can sound different depending on the speaker. We propose a simple query processing system for the railways in which the input is speech and the output is displayed text. The system converts the input speech into a query for processing railway-related requests.

Keywords: Speech Recognition, Railways, Neural Network

INTRODUCTION

Speech is the most efficient way for humans to communicate. Communicating with computers and other systems remains difficult, and speech recognition is one way to address this problem, although it has always been a challenging task. Speech recognition is an interdisciplinary subfield of computer science and natural language processing that develops methodologies and technologies enabling computers to recognize spoken language and convert it into text. With speech recognition technology, people can control devices ranging from phones to cars and access applications by speaking. It can also be used to record a caller's ID, name and reason for calling, and it improves the experience of self-service systems.

Our project is based on an Interactive Voice Response system for railway ticket reservation and related queries. The system is developed in two phases. Phase 1 covers speech-to-spectrogram conversion, and Phase 2 covers converting the result into text and providing the response. In this project we implement Phase 1 of the Interactive Voice Response system. Converting speech to text is a challenging task. Although various technologies have been developed, the accuracy achieved remains low, which is why a better training model is needed.

Such a training model can be built with deep learning using a recurrent neural network, which is suited to sequential data analysis. Training and testing with this model should help us attain better accuracy.

1.1 Scope

Despite its complexity, speech recognition plays an essential role in many fields. It helps people easily access the content they want. Speech recognition is a thriving domain with many important applications, and it is easy to predict that research will continue and that further practical applications will be created. Accuracy remains the major issue in this field, and research and development are still under way to build the most accurate system possible. The issue is not about AI as such, because most speech recognition problems are caused not by a lack of understanding but by a lack of good algorithms. Noise, accents and similar factors are technical problems that will eventually be solved. Research has found that noisy environments are a major obstacle to building an application that works in practice. At the same time, our fundamental knowledge about speech improves from day to day, and the goals keep advancing.

LITERATURE SURVEY

In "Speaker based Language Independent Isolated Speech Recognition System", Therese S and Lingam C state that speech has evolved as a primary form of communication between humans. The advent of digital technology gave us highly versatile digital processors with high speed, low cost and high power, which enable researchers to transform analog speech signals into digital speech signals that can be studied scientifically. Achieving higher recognition accuracy and a low word error rate, and addressing the sources of variability, are the major considerations in developing an efficient Automatic Speech Recognition system. In speech recognition, feature extraction requires particular attention because recognition performance depends heavily on this phase.

After surveying these papers, we propose using Long Short Term Memory (LSTM) and Connectionist Temporal Classification (CTC), which overcomes the disadvantages of other existing models. The aim is to remember long sentences, keeping a prolonged memory for better prediction of content, and to use a shared encoder and decoder for mapping characters.
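To illustrate what a prolonged memory means for an LSTM, the following is a minimal sketch of one LSTM cell step in plain Python. It is not the network used in this work: the scalar state, the layout of `params` and the hand-picked `REMEMBER` weights are assumptions made only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input and state).

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # fraction of the old cell state to keep
    i = gate("input", sigmoid)         # fraction of the candidate to write
    g = gate("candidate", math.tanh)   # new candidate value
    o = gate("output", sigmoid)        # fraction of the cell state to expose

    c = f * c_prev + i * g             # updated long-term memory
    h = o * math.tanh(c)               # updated hidden state (the cell output)
    return h, c


def run_lstm(inputs, params, h0=0.0, c0=0.0):
    """Run the cell over a sequence and return the (h, c) pair after each step."""
    h, c = h0, c0
    states = []
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
        states.append((h, c))
    return states


# Hypothetical weights: forget gate held open and input gate held shut,
# so the cell state is carried forward almost unchanged across many steps.
REMEMBER = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
```

Calling `run_lstm([0.5] * 20, REMEMBER, c0=1.0)` keeps the cell state close to its initial value for all 20 steps, which is the behaviour that lets an LSTM retain context over long utterances. In a trained network the gate weights are learned rather than fixed by hand.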
PROPOSED WORK

In our proposed system, phoneme features are extracted from the speech data using Mel Frequency Cepstral Coefficients (MFCC) and fed into a training model that combines Long Short Term Memory (LSTM) and Connectionist Temporal Classification (CTC). This is followed by validation, in which the CTC loss is calculated. The output of the system is the recognized text for the given speech input, and we also aim to report the accuracy of the result.
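The validation step relies on the CTC loss. As a minimal sketch of what that quantity measures, the block below computes the CTC negative log-likelihood with the standard forward algorithm in plain Python. The function name, the choice of index 0 for the blank symbol and the input layout are assumptions for illustration, not the implementation used in this work.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    probs  : list of T frames, each a list of per-symbol probabilities
    target : list of label indices, without blanks
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s] is the total probability of all alignments of the frames seen
    # so far that end at position s of the extended sequence.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping the blank in between is allowed only when moving
            # between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

For two frames with uniform probabilities over a blank and one label, the target consisting of that single label has three valid alignments (label-label, label-blank, blank-label), so the likelihood is 0.75 and the loss is `-log(0.75)`. This version multiplies raw probabilities for readability; for long utterances the same recursion is normally carried out in log space to avoid underflow.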

Fig. 3.1: Architecture diagram of speech recognition

WORK SETUP

The major steps involved in our proposed work are below:

CONCLUSION AND FUTURE WORK

In this paper, we have briefly discussed converting speech to text. In the papers we reviewed, phoneme segmentation, preprocessing and feature extraction are performed, but the accuracy of the output is lower than expected. We have therefore adopted a hybrid methodology that combines Long Short Term Memory (LSTM) and Connectionist Temporal Classification (CTC) in order to achieve higher accuracy. We used Mel Frequency Cepstral Coefficients (MFCC) and Librosa to extract features from the speech signals. In the architecture of our proposed system, the features extracted from the speech data are fed into the LSTM and CTC training model. This is followed by validation, in which the CTC loss is calculated. The CTC loss also determines whether the input is trained on or validated again. The Label Error Rate (LER) and Word Error Rate (WER) are computed to measure the accuracy of the model. The trained model is then tested with input speech.
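As a minimal sketch of how the two error rates can be computed, the block below uses the Levenshtein edit distance between a reference transcript and a recognized transcript. The function names and the example sentences are hypothetical, and the sketch assumes that labels are characters, which may differ from the label set used in our model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def label_error_rate(reference, hypothesis):
    """Label-level edit distance divided by the reference length.

    Labels are assumed to be characters in this sketch.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `word_error_rate("book a ticket to chennai", "book ticket to chennai")` counts one deleted word out of five reference words, giving 0.2. Both rates can exceed 1.0 when the hypothesis contains many insertions, so they are error measures rather than accuracies bounded between 0 and 1.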

Our future work will concentrate on increasing the accuracy of the model and on completing the overall system for the railway domain from which our dataset was taken. Our model currently works only for trained users. Future work will extend it to n users and develop an interactive module. We would also like the system to authenticate users by their voice ID.

REFERENCES

[1] K. Geetha and Dr. R. Vadivel, "Phoneme Segmentation of Tamil Speech Signals Using Spectral Transition Measure", Oriental Journal of Computer Science and Technology, Vol. 10, March 2017, pp. 114-119, ISSN: 0974-6471
[2] A. Graves, A. Mohamed and G. Hinton, "Speech Recognition with Deep Recurrent Neural Networks", International Conference on Acoustics, Speech, and Signal Processing, May 2013, ISSN: 2379-190X
[3] D. Dhanashri, S.B. Dhonde, "Speech Recognition Using Neural Networks: A Review", International Journal of Multidisciplinary Research and Development, Volume 2, Issue 6, June 2015, pp. 226-229, ISSN: 2349-4182
[4] C. Ittichaichareon, S. Suksri and T. Yingthawornsuk, "Speech Recognition using MFCC", International Conference on Computer Graphics, Simulation and Modeling, July 28-29, 2012
[5] X. Liu, "Deep Convolutional and LSTM Neural Networks for Acoustic Modelling in Automatic Speech Recognition", Pearson Education Inc., Vol. 8 (6), 2011
[6] S. Kim, T. Hori, S. Watanabe, "Joint CTC-Attention based end-to-end speech recognition using multi-task learning", 31 Jan 2017, arXiv:1609.06773v2
[7] A. Halageri, A. Bidappa, C. Arjun, M. Sarathy, S. Sultana, "Speech Recognition using Deep Learning", International Journal of Computer Science and Information Technologies, Vol. 6 (3), 2015, pp. 3206-3209, ISSN: 0975-9646
[8] I. Majeed, H. Husain, S. Abdul Samad, T.F. Idbeaa, "Mel Frequency Cepstral Coefficients (MFCC) Feature Extraction Enhancement in the Application of Speech Recognition: A Comparison Study", Journal of Theoretical and Applied Information Technology, Vol. 79, No. 1, September 2015, pp. 38-56, ISSN: 1992-8645
[9] K. Lekshmi, Dr. E. Sherly, "Automatic Speech Recognition using different Neural Network Architectures: A Survey", International Journal of Computer Science and Information Technologies, Vol. 7 (6), 2016, pp. 2422-2427, ISSN: 0975-9646
[10] I. Medennikov and A. Bulusheva, "LSTM-based Language Models for Spontaneous Speech Recognition", International Conference on Speech and Computer, Vol. 9811, SPECOM 2016, pp. 469-475, ISBN: 978-3-319-43958-7
