Analysis of Flight Fare Detection using Machine Learning

Arya Karambelkar; Param Mamania; Vaibhav Chunekar

doi:10.17577/IJERTV11IS110165

Volume 11, Issue 11 (November 2022)


Open Access
Paper ID : IJERTV11IS110165
Published (First Online): 09-12-2022
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Arya Karambelkar

Information Technology

K. J. Somaiya College of Engineering

Param Mamania

Information Technology

K. J. Somaiya College of Engineering

Mr. Vaibhav Chunekar

Information Technology

K. J. Somaiya College of Engineering

Abstract: Recent years have seen a dramatic rise in air travel for a variety of reasons, including economic expansion and the emergence of low-cost airlines. Fares are highly variable because Indian airlines employ revenue management systems that respond to market conditions in real time. The cost of a flight ticket varies with the duration of the flight, destination, route, arrival time, and departure time, as well as with events such as holidays or vacations. The main aim is to predict flight fares accurately using Machine Learning techniques and Auto ML. Additionally, the properties of the dataset are examined using a variety of visualisation techniques, including scatter plots, distribution plots, catplots, and ggplots. The results show that the Random Forest Regressor combined with Randomized Search CV gives the highest accuracy on the fare prediction dataset.

Keywords: Regression, Machine Learning, Deep Learning, Auto ML, Auto SK Learn, Flight Fare Detection
1. INTRODUCTION
  
  In today's fast-paced world, people expect the fastest possible solution to any problem. This expectation extends to transportation, which underpins the development of enterprises in trade, finance, IT, tourism, and so on. It is therefore of the utmost importance to provide the quickest and safest means of transportation, and air travel meets this need. Airline transport is the backbone of the tourism industry. Air travel is the most popular mode of international transportation, and India receives a sizable influx of international visitors every year. This creates jobs, both in India and in the countries that tourists visit.
  
  It is in the best interest of airlines to maximise their profits, and they serve two main types of customers: leisure travellers, who are more price-conscious because they pay the cost themselves, and business travellers, who are less price-conscious because they do not pay the cost themselves and often make their travel decisions much closer to the time of travel. Airline seats are a highly perishable good; once a plane takes off, an unsold seat can no longer generate revenue for the airline. If an airline is not careful with its pricing, it may have to fly with empty seats, or it may fill the plane yet lose money because it could have charged more. As a result, airfares can vary greatly, since different customers are willing to pay different amounts to meet their diverse requirements. Airline companies use this reality to their advantage.
  
  Distance, flight time, peak season, number of stops, and destination are just a few of the variables that can drive the cost of an airline ticket up or down. The cost of a flight can be lowered to some extent by adjusting these variables. In this paper, we apply Machine Learning approaches, including Auto ML, to the problem of estimating the cost of airline tickets.
2. LITERATURE REVIEW
  
  Recent research on flight fare prediction has aimed at developing data-driven approaches for forecasting future flight prices and their trends.
  
  Ratnakanth G [1] utilised a Deep Neural Network, a model inspired by the way the human brain functions. The data was preprocessed, and Min-Max normalisation was used to rescale the values in the dataset in order to improve performance. The Randomised Search CV algorithm was used for hyperparameter tuning of the Deep Learning model. Finally, the dataset was visualised using univariate, bivariate, and correlation analysis for all of its features.
  
  Raja Subramanian et al. [2] collected data from MakeMyTrip, Data World and Kaggle to build Machine Learning models. The paper uses KNN Regression, Linear Regression, Lasso Regression, Ridge Regression, and Random Forest Regression. The models were implemented using the scikit-learn Python library. The research found that the Random Forest Regressor algorithm performed best, with high accuracy.
  
  Naveen Prasath et al. [4] investigated the factors that drive flight fare fluctuations. The paper demonstrates the K-Nearest Neighbours technique for estimating the price at a particular instant. After comparing the highest and lowest airfares for specific days, weekends, and times of the day (morning, evening, and night), regression analysis was carried out to predict the flight prices.
  
  Zhichao Zhao et al. [5] carried out flight fare prediction in China based on the multi-attribute dual-stage attention (MADA) mechanism. A Seq2Seq neural network was implemented to encode and decode the multi-dimensional fare-related input characteristics. In addition, effective information variables are extracted using dual-stage attention processes. The model is trained on real data with a mean squared error loss function in order to learn the pattern of fluctuating fares.
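  
  The Min-Max normalisation mentioned in connection with [1] rescales each feature to the range [0, 1] using (x - min) / (max - min). The following is a minimal sketch of that formula only; the function name and the sample flight durations are illustrative assumptions and are not taken from [1].

```python
def min_max_normalise(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical flight durations in minutes (illustrative values only).
durations = [60, 120, 300, 600]
scaled = min_max_normalise(durations)
print(scaled)
```

  The smallest value maps to 0 and the largest to 1, so features measured on very different scales (for example, duration in minutes and number of stops) become comparable before training.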
3. METHODOLOGY
  Automated Machine Learning (Auto ML) automates the model building and hyperparameter tuning steps. Auto SK Learn is used for this purpose in this project. Auto SK Learn automates only the model building part; the preprocessing must be done manually. If raw data is passed to the model without any preprocessing, the model will fail, because the dataset mixes different types of data: unstructured, numerical, object, alphabetical, etc. After running Auto ML, the best-fitting models are identified. In this case, gradient boosting and random forest were the top models. Figure 5 displays the scatter plot of the predicted values against the testing data. The clustering of points in the scatter plot indicates that the model has high accuracy and that the prediction was performed successfully.
  
  Figure 6. Scatter Plot
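  
  The manual preprocessing and tuning steps described above can be sketched with scikit-learn alone. This is an illustrative sketch and not the authors' pipeline: the toy data, the column names (airline, stops, duration_min, fare) and the search space are assumptions, and a plain Random Forest Regressor tuned with Randomized Search CV stands in for the Auto SK Learn run.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the fare dataset (not the paper's data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "airline": rng.choice(["A", "B", "C"], size=n),
    "stops": rng.integers(0, 3, size=n),
    "duration_min": rng.integers(60, 600, size=n),
})
df["fare"] = (3000 + 1500 * df["stops"] + 5 * df["duration_min"]
              + rng.normal(0, 200, size=n))

# Manual preprocessing: encode the object-typed column as integers,
# because the regressor cannot consume raw strings.
df["airline"] = LabelEncoder().fit_transform(df["airline"])

X = df.drop(columns="fare")
y = df["fare"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed search space; the paper does not list its hyperparameter grid.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X_train, y_train)

# Predictions on the held-out data; plotting y_test against y_pred
# gives a predicted-versus-actual scatter plot.
y_pred = search.predict(X_test)
print(search.best_params_)
```

  Randomized Search CV samples a fixed number of hyperparameter combinations (n_iter) rather than trying every one, which keeps tuning time bounded when the search space is large.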
4. RESULTS AND INFERENCES
  
  The Random Forest Regressor algorithm has a mean absolute error of 1531.75, a mean squared error of 6409856.80 and a root mean square error of 2531.77. The coefficient of determination, also called the R² score, is used to evaluate the performance of a regression model. It is the proportion of the variation in the dependent output attribute that is predictable from the independent input variable(s). A higher value of R² is desirable as it indicates a better fit. The model built gives an R² value of 0.61.
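  
  The four metrics reported above can be computed directly from their definitions. The sketch below is illustrative: the function name and the sample fares are assumptions, not values from the experiment. The last lines only check that the reported root mean square error is the square root of the reported mean squared error.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical actual and predicted fares (illustrative values only).
y_true = [4000.0, 5500.0, 7000.0, 10000.0]
y_pred = [4500.0, 5000.0, 7500.0, 9000.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported figures: RMSE = sqrt(MSE).
reported_mse = 6409856.80
rmse_from_reported_mse = math.sqrt(reported_mse)
print(round(rmse_from_reported_mse, 2))
```

  Because MAE and RMSE are expressed in the same units as the fare, they should be read relative to typical ticket prices in the dataset, whereas R² is unitless.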
  
  One paper utilises Deep Learning techniques such as a Deep Neural Network [1]. It suggests that the accuracy obtained using Deep Learning is better than that obtained using Machine Learning models. A key limitation of this project is data acquisition, because the data is acquired from websites that sell flight tickets.
  
  The paper titled "Airline Fare Prediction Using Machine Learning Algorithms" reported a root mean square error of 33.36 when the Random Forest Regressor algorithm is used [2]. The model used in that paper is hypertuned so that the error is reduced.
5. CONCLUSION

This paper shows that Automated Machine Learning saves time in model building, but that the data preprocessing must still be done manually and cannot be automated. The prediction of the flight fare was carried out successfully using one of the most widely used algorithms, the Random Forest Regressor. The distribution plot and scatter plot obtained from the training and testing data indicate that the accuracy achieved is high. Data visualisation techniques have been applied to illustrate the relationships among the attributes of the dataset. To obtain more reliable findings, more accurate data with a larger number of features could be employed.

REFERENCES

[1] G. Ratnakanth, "Prediction of Flight Fare using Deep Learning Techniques," 2022 International Conference on Computing, Communication and Power Technology (IC3P), 2022, pp. 308-313, doi: 10.1109/IC3P52835.2022.00071.

[2] R. R. Subramanian, M. S. Murali, B. Deepak, P. Deepak, H. N. Reddy and R. R. Sudharsan, "Airline Fare Prediction Using Machine Learning Algorithms," 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), 2022, pp. 877-884, doi: 10.1109/ICSSIT53264.2022.9716563.

[3] C. Chariton and Min-Hyung Choi, "Enhancing usability of flight and fare search functions for airline and travel Web sites," International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., 2004, pp. 320-325 Vol.1, doi: 10.1109/ITCC.2004.1286473.

[4] S. N. Prasath, S. Kumar M and S. Eliyas, "A Prediction of Flight Fare Using K-Nearest Neighbors," 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), 2022, pp. 1347-1351, doi: 10.1109/ICACITE53722.2022.9823876.

[5] Zhao, Z., You, J., Gan, G., Li, X. & Ding, J. 2022, "Civil airline fare prediction with a multi-attribute dual-stage attention mechanism", Applied Intelligence, vol. 52, no. 5, pp. 5047-5062.

[6] Malighetti, P., Paleari, S. & Redondi, R. 2009, "Pricing strategies of low-cost airlines: The Ryanair case study", Journal of Air Transport Management, vol. 15, no. 4, pp. 195-203.

[7] Tziridis, K., Kalampokas, T., Papakostas, G.A. & Diamantaras, K.I. 2017, "Airfare prices prediction using machine learning techniques", 25th European Signal Processing Conference, EUSIPCO 2017, pp. 1036.

[8] Groves, W. & Gini, M. 2013, "An agent for optimizing airline ticket purchasing", 12th International Conference on Autonomous Agents and Multiagent Systems 2013, AAMAS 2013, pp. 1341.

[9] Makridakis, S., Spiliotis, E. & Assimakopoulos, V. 2018, "Statistical and Machine Learning forecasting methods: Concerns and ways forward", PLoS ONE, vol. 13, no. 3.

[10] Lu, M., Zhang, Y. & Lu, C. 2021, "Approach for Dynamic Flight Pricing Based on Strategy Learning", Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, vol. 43, no. 4, pp. 1022-1028.

[11] Joshi, N., Singh, G., Kumar, S., Jain, R. & Nagrath, P. 2020, Airline Prices Analysis and Prediction Using Decision Tree Regressor.

[12] Boruah, A., Baruah, K., Das, B., Das, M.J., Gohain, N.B. (2019). A Bayesian Approach for Flight Fare Prediction Based on Kalman Filter. In: Panigrahi, C., Pujari, A., Misra, S., Pati, B., Li, KC. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 714. Springer, Singapore. https://doi.org/10.1007/978-981-13-0224-4_18

[13] https://www.python.org

[14] https://numpy.org

[15] https://pandas.pydata.org

[16] https://seaborn.pydata.org

[17] https://seaborn.pydata.org/generated/seaborn.catplot.html

[18] https://matplotlib.org

[19] https://en.wikipedia.org/wiki/Machine_learning

[20] https://scikit-learn.org/stable/

[21] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

[22] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

[23] https://colab.research.google.com

[24] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html

[25] https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html