DOI : 10.17577/IJERTCONV14IS010030
- Open Access

- Authors : Karthik, Mr. Sunith Kumar T
- Paper ID : IJERTCONV14IS010030
- Volume & Issue : Volume 14, Issue 01, Techprints 9.0
- Published (First Online) : 01-03-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Enhancing Tourist Personalization through Hybrid Clustering and Classification Techniques
Karthik
Department of Computer Application St Joseph Engineering College
Vamanjoor, Mangaluru, Karnataka 575028
Mr. Sunith Kumar T
Asst. Professor Department of Computer Application
St Joseph Engineering College
Vamanjoor, Mangaluru, Karnataka 575028
Abstract – The demand for personalized travel experiences is rising as the travel industry evolves. Traditional recommendation systems struggle to understand a wide variety of user preferences because they often rely on general filters. To provide tailored travel suggestions, this paper proposes a hybrid recommendation model that uses both supervised classification methods, like KNN and Decision Trees, and unsupervised clustering techniques, such as K-Means. This approach is improved by using TF-IDF and Count Vectorization methods. We collect real-world data through web scraping, which is then processed for machine learning purposes. Experiments show that this hybrid model effectively personalizes tourism, significantly enhancing the relevance and accuracy of travel recommendations compared to standalone models.
Index Terms: Tourism Recommendation, Machine Learning, Clustering, Classification, TF-IDF, Count Vectorizer, Personalization, AI, Tourist Segmentation.
INTRODUCTION
In the digital era, travelers are saturated with information from online travel sites, reviews, and social media platforms. While this gives travelers more data for decision-making, it has also created a paradox of choice: users have difficulty filtering for destinations that align with their own preferences. Earlier tourism recommendation systems have traditionally relied on collaborative filtering, rigid rule-based recommendations, or dense user-item interactions, but these systems do not deliver actionable, decision-ready suggestions. They fail to capture the intent that users express, and in many cases they cannot handle the unstructured textual data found in reviews, blogs, and comments. In response to these limitations, we present a hybrid intelligent recommendation framework that combines unsupervised learning with supervised classification. We apply K-Means clustering to group users into behavioral segments based on their travel interests and the text they write in reviews or blogs. Within each cluster, classification models such as K-Nearest Neighbors (KNN) and Decision Trees predict and rank destinations based on each user's preferences. To process unstructured textual data, we employ two vectorization methods: Count Vectorizer, which produces frequency-based representations, and TF-IDF (Term Frequency-Inverse Document Frequency), which weights terms by how distinctive they are across documents.
RELATED WORK
Tourism recommendation systems have evolved significantly over the years. Early systems were mainly based on collaborative filtering, content-based filtering, or rule-based approaches. These classical models use user information to deliver personalization, but they do not respond to the shifting tastes of contemporary travelers. They depend mainly on structured data and miss valuable insights from unstructured sources such as user reviews, travel blogs, and narratives. More recently, recommendation systems have become more context-aware. Semantic networks and knowledge graphs are used to connect entities such as destinations, travelers, and activities. This allows systems to go beyond keyword matching and to interpret the meaning and intent behind what people search for. Researchers are also experimenting with user grouping approaches, in particular unsupervised learning methods such as K-Means and hierarchical clustering. Segmenting users into groups such as adventure tourism fans and culture enthusiasts helps tourism systems make more specific suggestions. Methods such as principal component analysis (PCA) and silhouette analysis are used to evaluate the quality of these clusters and how well the groups are separated.
Online Tourism Information System
With the growth of online tourism systems, tourists have many ways to find travel information and decide how to use it. This trend has prompted researchers to study various aspects of how tourists seek information online. Previous research has raised concerns about information overload and pointed to the potential of intelligent filtering systems. New information is constantly created from user-generated content on online tourism platforms, such as social networks, reviews, and travel blogs. This sets the stage for recommendation systems that combine structured and unstructured data to assess tourist preferences and tailor recommendations to each user's data.
Semantic Analysis Travel Recommendation
Semantic analysis is becoming more closely tied to tourism recommendation systems. Research indicates that recommendations based on a semantic analysis of travel concepts tend to be better suited to tourists. Semantic network analysis of the relationships between tourism marketers and tourists can help us understand differences in mental models in specific tourism contexts. Additionally, it can guide the design of the information that tourists actually need.
PROBLEM STATEMENT
Despite the rapid growth in travel technology, many tourism recommendation systems still depend on traditional models that cannot provide unique, contextualized recommendations. These systems are typically rule-based or use collaborative filtering, and they assume that every user behaves in a similar way. However, travelers' preferences differ dramatically and can change in real time while they are travelling. Travelers are influenced by trip purpose, mood, time of year, budget, and individual or cultural background. A critical weakness of existing systems is that they do not recognize individual users' behaviors. Without effective segmentation, their recommendations lack granularity and contextual meaning, which leads to user dissatisfaction. Most travel-related data is also unstructured text, including user reviews, user feedback, travel blogs, and destination descriptions. This makes it hard for traditional algorithms to draw meaningful insights and introduces challenges in both data preparation and interpretation.
To tackle these issues, our research proposes a hybrid AI-driven approach that combines clustering and classification techniques to improve the personalization of travel recommendations. Specifically, the solution aims to:
- Group travelers into behavior-based clusters using unsupervised learning methods like K-Means, bringing together users with similar interests, travel styles, and decision-making patterns.
- Use machine learning classifiers such as K-Nearest Neighbors and Decision Trees to predict and recommend destinations that best match each cluster's preferences.
- Utilize rich textual data through text vectorization methods like TF-IDF and Count Vectorizer to better understand user intent and create contextually relevant suggestions.
METHODOLOGY
To develop a personalized travel recommendation system, we took a systematic approach that included data gathering, preprocessing, clustering, classification, and system design. Each phase is described below.
Data Collection and Preprocessing
We started by gathering travel-related text data from online platforms using web scraping tools. The dataset included user reviews, destination details, activity descriptions, and traveler feedback. This raw data was unstructured and needed several preprocessing steps to make it ready for machine learning:
- Data Cleaning: This step involved removing noise such as punctuation, special characters, and numbers. We also converted all text to lowercase to keep things consistent.
- Vectorization:
  - Count Vectorizer transformed the cleaned text into a numerical bag-of-words format, capturing word frequency across the dataset.
  - TF-IDF (Term Frequency-Inverse Document Frequency) assigned more weight to important but less frequent terms, which helped the system better capture context and meaning from the text.
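The cleaning and vectorization steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's `CountVectorizer` and `TfidfVectorizer` with default settings, and the review strings and the `clean_text` helper are hypothetical stand-ins for the scraped dataset and its cleaning routine.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer


def clean_text(text: str) -> str:
    """Lowercase the text and drop punctuation, special characters, and digits."""
    text = text.lower()
    # Keep only ASCII letters and whitespace (an assumption; this also drops
    # accented characters, which a real pipeline may want to keep).
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical scraped snippets standing in for the real dataset.
raw_reviews = [
    "Amazing trek!!! 3 days in the mountains, loved the views.",
    "Quiet beach, great seafood & sunsets; perfect for 2 people.",
    "Ancient temples and museums... a history lover's dream!",
]
cleaned = [clean_text(review) for review in raw_reviews]

# Bag-of-words: raw term counts per document.
count_vec = CountVectorizer()
bow = count_vec.fit_transform(cleaned)

# TF-IDF: counts re-weighted so terms that appear in fewer documents score higher.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(cleaned)

print(cleaned[0])
print("documents x vocabulary terms:", bow.shape)
```

Both vectorizers return sparse matrices with one row per document and one column per vocabulary term, so either output can be passed directly to the clustering and classification steps.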
Clustering
Next, we used K-Means clustering to group users with similar travel interests, without relying on predefined labels. Clustering allowed the system to discover hidden patterns among users. We chose the number of clusters using the Elbow Method. Each resulting cluster represented a user group with common preference patterns, e.g., adventure seekers, spiritual travelers, and history lovers. This step was critical for matching destination suggestions to the right audience types.
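The paper does not show how the cluster count was searched. A minimal sketch, assuming scikit-learn and a synthetic feature matrix from `make_blobs` standing in for the vectorized user data, records the K-Means inertia for each candidate k (the quantity plotted in an Elbow curve) together with the silhouette score. The range of k values and all parameters here are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the user feature matrix (assumption, not the paper's data).
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.6, random_state=42)

inertias = {}
silhouettes = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X)
    # Inertia: within-cluster sum of squared distances, plotted for the Elbow Method.
    inertias[k] = model.inertia_
    # Silhouette: mean cohesion-vs-separation score in [-1, 1], higher is better.
    silhouettes[k] = silhouette_score(X, labels)

for k in sorted(inertias):
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")

# One simple selection rule: take the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print("k with highest silhouette:", best_k)
```

The elbow itself is usually read off a plot of inertia against k, where the curve stops dropping sharply. The silhouette score gives a numeric check on the same choice.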
Classification
After clustering, we employed classification algorithms to suggest destinations appropriate for each user cluster. We trained K-Nearest Neighbors (KNN) and Decision Tree classifiers. The input features were TF-IDF values, location types (such as beach, mountain, city), and activity-related keywords. This enabled the system to predict which destinations would interest a user based on their cluster membership.
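How cluster membership enters the classifier is not specified in the text. One plausible reading, sketched below, appends a one-hot encoding of the K-Means cluster ID to the TF-IDF features before training KNN and Decision Tree models. The toy texts, the destination labels, and the `with_cluster` and `recommend` helpers are hypothetical. The sketch assumes scikit-learn and SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical user texts and destination labels (not the paper's data).
texts = [
    "trekking and camping in the mountains",
    "mountain hiking with river rafting",
    "rock climbing and trekking adventure",
    "temple visits and meditation retreat",
    "pilgrimage to ancient temple towns",
    "quiet meditation and temple rituals",
    "museum tours and old fort walks",
    "heritage fort and palace history tour",
    "ancient ruins museum and history walks",
]
destinations = ["hill_station"] * 3 + ["temple_town"] * 3 + ["heritage_city"] * 3
N_CLUSTERS = 3

tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(texts)

# Step 1: unsupervised segmentation of users.
kmeans = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X_text)


def with_cluster(X, cluster_ids):
    """Append a one-hot cluster indicator to the TF-IDF features (assumed design)."""
    one_hot = np.eye(N_CLUSTERS)[cluster_ids]
    return hstack([X, csr_matrix(one_hot)]).tocsr()


# Step 2: supervised classifiers trained on text features plus cluster membership.
X_hybrid = with_cluster(X_text, clusters)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_hybrid, destinations)
tree = DecisionTreeClassifier(random_state=0).fit(X_hybrid, destinations)


def recommend(text, model):
    """Vectorize a new user's text, assign a cluster, then predict a destination."""
    x = tfidf.transform([text])
    cluster_id = kmeans.predict(x)
    return model.predict(with_cluster(x, cluster_id))[0]


print(recommend("weekend trekking in the mountains", knn))
print(recommend("weekend trekking in the mountains", tree))
```

An alternative design would train a separate classifier per cluster. The one-hot variant is shown here only because it keeps the example short.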
System Architecture Overview
Our recommendation system is structured into three main modules:
- Input Module: Collects user preferences and travel history, if available.
- Processing Engine: Performs clustering to identify the user group and classification to determine suitable destinations.
- Output Module: Displays personalized travel recommendations that match the user's profile.
This modular design allows the system to be scalable, easy to update, and ready for future improvements like sentiment analysis or real-time filtering.
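The three modules can be pictured as separate, swappable components. The sketch below is illustrative only: every name in it (`UserRequest`, `input_module`, `ProcessingEngine`, `output_module`) is hypothetical, and the two `toy_*` functions are keyword-based placeholders for the trained K-Means and KNN or Decision Tree models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UserRequest:
    preferences: str
    travel_history: Optional[str] = None  # optional, as in the Input Module


def input_module(preferences: str, travel_history: Optional[str] = None) -> UserRequest:
    """Collect user preferences and, if available, travel history."""
    return UserRequest(preferences.strip(), travel_history)


@dataclass
class ProcessingEngine:
    # Pluggable components: swap in trained models without touching other modules.
    assign_cluster: Callable[[str], int]
    rank_destinations: Callable[[str, int], List[str]]

    def run(self, request: UserRequest) -> List[str]:
        text = " ".join(filter(None, [request.preferences, request.travel_history]))
        cluster = self.assign_cluster(text)           # clustering step
        return self.rank_destinations(text, cluster)  # classification step


def output_module(destinations: List[str], top_n: int = 3) -> str:
    """Format the top recommendations for display."""
    lines = [f"{i}. {name}" for i, name in enumerate(destinations[:top_n], start=1)]
    return "\n".join(lines)


# Placeholder components standing in for the trained models (assumptions).
def toy_cluster(text: str) -> int:
    return 0 if "trek" in text.lower() else 1


def toy_rank(text: str, cluster: int) -> List[str]:
    if cluster == 0:
        return ["Mountain trail", "River rafting camp"]
    return ["Old town walk", "Beach resort"]


engine = ProcessingEngine(assign_cluster=toy_cluster, rank_destinations=toy_rank)
request = input_module("I enjoy trek routes and camping")
print(output_module(engine.run(request)))
```

Because the engine only depends on two callables, a sentiment-analysis or real-time filtering step could be added as another component without changing the input or output modules.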
RESULTS
We evaluated the hybrid recommendation model using common classification metrics: accuracy and F1-score. The K-Nearest Neighbors (KNN) model alone achieved an accuracy of 78.2% and an F1-score of 76.4%. When KNN classification was combined with K-Means clustering in the hybrid model, accuracy increased to 85.6% and the F1-score rose to 83.3%, gains of 7.4 and 6.9 percentage points respectively. These results show that the hybrid approach noticeably improves recommendation quality compared with using classification alone.
Figure 1: Comparison of Model Performance (KNN vs Hybrid Model)
As seen in the graph:
- The KNN-only model achieved an accuracy of 78.2% and an F1-score of 76.4%.
- The Hybrid Model, which includes clustering before classification, improved accuracy to 85.6% and the F1-score to 83.3%.
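For reference, the two metrics reported above can be computed from predicted and true labels as in the following sketch. The paper does not state which F1 averaging was used, so macro-averaging is an assumption here, and the label lists are made-up toy values, not the study's predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["beach", "beach", "mountain", "mountain", "city", "city"]
y_pred = ["beach", "city", "mountain", "mountain", "city", "city"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 5 of 6 correct
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Running the same two functions on the predictions of the KNN-only model and of the hybrid model would give the kind of side-by-side comparison shown in Figure 1.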
Figure 2: Silhouette Score vs Number of Clusters
This line graph shows how the Silhouette Score changes with the number of clusters.
- The highest silhouette score (0.78) occurs at 5 clusters, indicating that this is the optimal number for user segmentation.
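For context, and assuming the standard definition of the silhouette coefficient (the paper does not state which variant was used), the score of a single point i is:

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Here a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance from i to the points of the nearest other cluster. The reported score is the mean of s(i) over all points, so a value of 0.78 would mean that most users lie much closer to their own segment than to any neighboring one.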
DISCUSSION
Using clustering and classification together in a single framework is useful for producing personalized travel recommendations. The model accounts for both general behavioral patterns across users and the preferences of individual users. The hybrid model is a more effective way to understand user intent than either method alone.
The Count Vectorizer is well suited to shorter user inputs and common terms, as found in quickly read reviews or keyword searches. TF-IDF vectorization works better for longer reviews because it emphasizes uncommon but informative terms, improving the model's ability to extract meaning from textual information.
The proposed system's ability to adapt to new information is a major advantage. Automated web scraping keeps the model up to date with new travel trends, user-generated content, and seasonal shifts in consumer preferences, providing more timely recommendations.
The evaluation showed a clear performance advantage for the hybrid model over a standalone classifier. The clustering module adds context to the classification step by grouping travelers who share interests such as adventure, luxury, or culture. These groups help the classification algorithms recognize which destinations to suggest to different types of travelers. Most significantly, personalization improved: by considering both the user's group membership and specific content features, the proposed system produced more personalized recommendations.
CONCLUSION
This study offers evidence that combining clustering and classification techniques within a single framework improves the personalization and effectiveness of tourism recommendation systems. By segmenting users based on their behaviors and preferences with clustering algorithms, and then predicting suitable destinations through classification models, the hybrid approach builds a better picture of individual traveler profiles. Unlike traditional one-size-fits-all recommendation engines, this model adjusts to each user by learning from both past data and real-time inputs. The two-layer approach enables the system to pick up on macro trends as well as individual details, so that suggestions can be more personalized and relevant. The design not only improves performance measures such as accuracy and F1-score, but also addresses the human element of travel decision-making by aligning recommendations with personal values, interests, and experiences. The system also incorporates text data represented with TF-IDF and Count Vectorizer, which adds more nuanced contextual information to each recommendation. This improves personalization especially when interpreting unstructured content such as user-generated reviews and destination stories. The model has a flexible and scalable structure that makes it appealing for deployment on modern digital tourism platforms. It can adapt to customer trends, use new sources of data, and extend to other formats such as voice interfaces or conversational chatbot assistants.
From a practical point of view, this study lays a foundation upon which we can build travel recommendation systems that are intelligent, adaptable, and centered on the user. In time, widespread deployment of these types of models could foster greater user engagement, satisfaction, and ease in planning a trip. In the end, this hybrid AI and human model could potentially alter how our society explores the world by providing not only destinations but experiences that are designed to
FUTURE SCOPE
- Conversational AI and Chatbot Integration: If the recommendation system were combined with an interactive chatbot interface, it could change the entire user experience. Travelers could receive personalized suggestions through natural language conversations, which would make the planning process feel more natural and engaging. Users could express and clarify their needs or preferences in real time, letting the system respond more flexibly.
- Real-Time and Context-Aware Recommendations: Future versions of the system can use real-time data inputs like the user's current location, weather conditions, time of day, and local events. This would allow the platform to adjust its suggestions dynamically, offering personalized and context-aware experiences that match real-world conditions.
- Dynamic Learning and Feedback Loops: By adding user feedback mechanisms, such as likes, rejections, or ratings, the system could continuously retrain and improve its prediction models. This would ensure that the recommendation engine evolves over time, adjusting to changing user behaviors, new travel trends, and emerging destinations.
- Cross-Platform and Multi-Modal Integration: Future versions could also work across mobile apps, web portals, and smart assistants, providing a smooth user experience on different devices. The system could also include visual and auditory cues, like image previews of destinations or voice recommendations, to enhance the user journey even further.
