
Smart Media Player: Face Recognition Based Personalized Video & Music Playback

DOI: 10.17577/IJERTCONV13IS05023


Ms. D. Kalaiabiram (AP/CSE, Kangeyam Institute of Technology), K. Sabithasri (B.E./CSE, Kangeyam Institute of Technology), K. Bhargavi (B.E./CSE, Kangeyam Institute of Technology), R. Swetha (B.E./CSE, Kangeyam Institute of Technology), R. Gokul (B.E./CSE, Kangeyam Institute of Technology)

ABSTRACT:

Emotional intelligence in human-computer interaction is increasingly important in today's technology-based society. This project demonstrates a real-time emotion-driven multimedia recommendation system that identifies a user's facial emotion through deep learning and responds with customized video or music content. The system leverages a pre-trained Convolutional Neural Network (CNN) model (emotion_model.p) to classify emotions such as Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral from webcam feeds or uploaded pictures. After the emotion is detected, the user is asked to choose a preferred language (English, Hindi, Telugu, or Tamil). Based on the pair of detected emotion and chosen language, the system uses the YouTube Data API to retrieve suitable media content that enhances or complements the user's emotional state. To prevent repetition, a local JSON-based history mechanism tracks previously played videos and filters them out.
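As an illustrative sketch of the pipeline the abstract describes, the snippet below turns a detected emotion and chosen language into a YouTube Data API v3 search and filters out previously played videos via a local JSON history. The query format, history file name, and helper names are assumptions made for illustration; a real API key is required.

    # Minimal sketch of the emotion-to-media pipeline; only the
    # YouTube Data API v3 search endpoint is standard, everything
    # else (query format, history file) is an illustrative assumption.
    import json, os, requests

    HISTORY_FILE = "play_history.json"   # hypothetical local history store
    API_KEY = "YOUR_YOUTUBE_API_KEY"     # placeholder

    def load_history():
        if os.path.exists(HISTORY_FILE):
            with open(HISTORY_FILE) as f:
                return set(json.load(f))
        return set()

    def save_history(history):
        with open(HISTORY_FILE, "w") as f:
            json.dump(sorted(history), f)

    def recommend(emotion, language, max_results=10):
        """Fetch media matching the detected emotion and chosen
        language, skipping videos that were already played."""
        history = load_history()
        resp = requests.get(
            "https://www.googleapis.com/youtube/v3/search",
            params={
                "part": "snippet",
                "q": f"{emotion} mood {language} songs",  # assumed query format
                "type": "video",
                "maxResults": max_results,
                "key": API_KEY,
            },
        )
        items = resp.json().get("items", [])
        fresh = [v for v in items if v["id"]["videoId"] not in history]
        if fresh:
            chosen = fresh[0]["id"]["videoId"]
            history.add(chosen)
            save_history(history)
            return chosen
        return None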

Introduction

A. Overview

The Smart Media Player is a novel face recognition based media player that delivers a personalized and dynamic experience by adapting to individual users' preferences, emotions, and settings.

Using facial recognition technology, it automatically recognizes users, tailors content recommendations, adjusts playback based on emotional signals, and enables hands-free control through facial gestures.

B. Objectives

The goals of the Smart Media Player are to incorporate precise face recognition technology for seamless user identification, offer individualized video and music suggestions in accordance with user preferences and emotional states, and facilitate hands-free operation via facial gestures. It is intended to facilitate multi-user setups by preserving separate profiles, allowing for customized experiences for every user. The player is also concerned with maximizing privacy and security by limiting access to personalized content, and with adapting the user interface for individual preferences. In addition, the system aims to continuously improve content suggestions based on user interaction, resulting in a more immersive and interactive media experience.

C. Existing Solutions

A few current systems share features with a smart media player offering face recognition based personalized video playback, though none combine all of them in a single commercial product. Popular streaming platforms such as YouTube and Netflix make highly personalized video recommendations, though these are not based on face recognition. Some smart TV models from companies such as Samsung and LG support multi-user profiles with personalized content but still require manual profile switching. In the academic and DIY space, several open-source projects and research prototypes apply facial recognition (typically built with Python, OpenCV, and a Raspberry Pi) to trigger actions, such as greeting a user or playing a user's favorite music playlist, but these are typically simple proofs of concept.

D. Proposed Solution

The solution to this problem is to create a smart media player that employs facial recognition to automatically recognize users and provide personalized video streaming without the need for manual intervention. From system boot, a camera perpetually scans for faces; when a user is identified via sophisticated face recognition algorithms (e.g., FaceNet), the system fetches their individual profile, which holds preferences, viewing history, and favorite genres. A custom playlist is subsequently created and played automatically using an embedded media player. The system also dynamically detects user presence, pausing playback if the user departs or switching profiles if a new face appears. This hands-free, smart interaction increases user convenience, accommodates multiple users, and provides a highly customized media experience, making it well suited for smart homes, public entertainment systems, and personal entertainment installations.

E. Logic

The intelligent media player starts by setting up a face recognition system and loading user profiles from a local or cloud database. A camera continuously scans for faces in the viewing area. When a face is found, the system extracts facial features (encodings) and matches them against stored profiles to determine the user. On successful recognition, the corresponding user's preferences (favorite genres, video platforms, and watch history) are loaded. Based on this information, the system creates an individualized playlist and initiates video playback automatically. During the session, the system keeps checking for the user's presence; when the known user departs or a different face appears, playback is paused or otherwise adjusted. This seamless interaction achieves a touch-less, personalized viewing experience, making media consumption easier and more user-focused.
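A minimal sketch of this loop follows, assuming OpenCV for capture and the open-source face_recognition library (a dlib wrapper) in place of the FaceNet model; the Player class and build_playlist() function are hypothetical stand-ins for the embedded media player and the recommender.

    # Sketch of the scan -> identify -> play -> monitor loop.
    import cv2
    import face_recognition

    class Player:
        """Hypothetical stand-in for an embedded media player."""
        def play(self, playlist): print("playing:", playlist)
        def pause(self): print("paused")

    def build_playlist(user):
        """Hypothetical recommender hook (see the Module Description section)."""
        return [f"{user}-favourites"]

    known_encodings = []   # stored embeddings, one per enrolled user
    known_users = []       # user IDs aligned with known_encodings

    def identify(frame, tolerance=0.6):
        """Return the ID of the first recognized face in the frame, or None."""
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        for enc in face_recognition.face_encodings(rgb):
            matches = face_recognition.compare_faces(
                known_encodings, enc, tolerance=tolerance)
            if True in matches:
                return known_users[matches.index(True)]
        return None

    player = Player()
    cap = cv2.VideoCapture(0)
    current_user = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        user = identify(frame)
        if user != current_user:        # user left, or a new face appeared
            player.pause()
            if user is not None:
                player.play(build_playlist(user))
            current_user = user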

The key enabling technologies for each stage of this pipeline are summarized below.

  1. Face Detection

    Haar Cascades (OpenCV): quick, lightweight, and suitable for real-time face detection.

    MTCNN (Multi-task Cascaded Convolutional Networks): more precise; handles multiple faces and occlusions.

  2. Face Recognition

    LBPH (Local Binary Patterns Histogram): fast and simple, appropriate for offline use.

    FaceNet: a deep neural network that maps face images to a 128-dimensional embedding space.

    Dlib ResNet: pretrained models with a good balance of accuracy and speed.

    DeepFace / VGG-Face: high-accuracy deep CNN models.

  3. User Identification & Matching: cosine similarity or Euclidean distance between face embeddings to determine the nearest match (see the sketch after this list).

  4. Media Playback Control: VLC Python bindings (python-vlc) for controlling video playback from a Python script.

  5. Recommendation Algorithms (for playlist creation)

    Content-Based Filtering: suggests videos akin to the user's viewing history.

    Collaborative Filtering: suggests content based on what similar users watched (can utilize libraries such as Surprise).

    Hybrid Systems: combine both methods.
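As referenced in item 3, here is a minimal sketch of the matching step: a probe embedding is compared against each enrolled embedding, and the nearest match is accepted only if it clears a similarity threshold. The 0.7 threshold and the dictionary layout are illustrative assumptions, not tuned values.

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def match_user(probe, enrolled, threshold=0.7):
        """enrolled: dict of user ID -> stored embedding (assumed layout)."""
        best_user, best_score = None, -1.0
        for user, embedding in enrolled.items():
            score = cosine_similarity(probe, embedding)
            if score > best_score:
                best_user, best_score = user, score
        return best_user if best_score >= threshold else None

The Euclidean variant replaces the score with np.linalg.norm(probe - embedding) and accepts the smallest distance below a maximum-distance threshold.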

F. System Architecture

The system architecture of the Smart Media Player with Face Recognition-Based Personalized Video Playback is composed of several integrated layers working together to provide a seamless user experience. At the core is the Input Layer, which uses a camera module to capture real-time video frames. These frames are passed to the Face Recognition Layer, where faces are first detected using methods like Haar cascades (via OpenCV) or MTCNN, then encoded into numerical vectors using models such as FaceNet. The system compares these vectors with stored data in the User Profile Management Layer, which maintains a database containing user face encodings, viewing preferences, and watch history.
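A minimal sketch of this input-to-encoding flow, assuming OpenCV's bundled Haar cascade for detection and the face_recognition library's dlib-based 128-dimensional encodings as a stand-in for FaceNet:

    import cv2
    import face_recognition

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def encode_faces(frame):
        """Detect faces in a BGR frame and return one 128-d vector each."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        # face_recognition expects RGB images and (top, right, bottom, left) boxes
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        locations = [(y, x + w, y + h, x) for (x, y, w, h) in boxes]
        return face_recognition.face_encodings(rgb, known_face_locations=locations)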

G. Architecture Design

The architecture design of the Smart Media Player system is organized into a number of core layers to maintain smooth functioning and integration of components. The foundation is the Input Layer, which uses a camera module to capture facial images in real time. These inputs are passed to the Face Detection and Recognition Layer, where machine learning models such as FaceNet or Dlib recognize the user by comparing detected faces with facial embeddings stored in the system database. Once the user is recognized, the User Profile Management Layer pulls linked user information such as watch history, preferences, and playlists.

H. Literature Survey

The incorporation of facial recognition within personalized systems has been an increasingly prominent area of focus in research and real-world applications. Schroff et al. (2015) proposed FaceNet, a deep convolutional neural network that maps facial images into a Euclidean embedding space for high-accuracy face recognition, and it has since been central to face-based identification systems. Likewise, Dlib's ResNet-based model and OpenCV's Haar cascades have been extensively used for real-time face detection and recognition, particularly on embedded devices like the Raspberry Pi. For personalized media delivery, services such as Netflix and YouTube use content-based and collaborative filtering algorithms, as explained by Adomavicius and Tuzhilin (2005), to recommend user-centric content. These systems, however, rely on manual logins and do not integrate facial recognition as an input. Some research and hobbyist systems have demonstrated the use of facial recognition in conjunction with home automation and intelligent interfaces, including the Smart Mirror, which employs face detection to provide personalized information such as news or weather. These systems, although useful, tend to be limited in interactivity and do not facilitate dynamic playback of content.

Emotion-aware computing, discussed by Ko (2018), also suggests that user facial expression and mood can contribute to personalization, making it usable in media playback systems that adapt content based on real-time emotional feedback. In spite of improvements in each of these individual technologies (facial recognition, user profiling, and recommendation algorithms), there is still no integrated system that identifies users automatically, creates personalized playlists, and manages media playback based on them. This unmet need is an opportunity to build a completely autonomous smart media player that integrates these technologies into one smooth, hands-free entertainment solution.

I. Module Description

The proposed Smart Media Player system consists of several interconnected modules working in harmony to deliver a personalized user experience. The Camera & Input Module captures real-time video frames, which are processed by the Face Detection & Recognition Module using algorithms like FaceNet or Dlib to identify the user based on stored facial data. Once the user is identified, the User Profile & Database Module retrieves personalized preferences, watch history, and user settings. The Playlist & Recommendation Module then generates a customized list of videos based on the user's interests, using content-based filtering or local media tagging (a minimal sketch follows). This playlist is managed by the Media Player Module, which handles playback functions such as play, pause, resume, and skip.
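As a minimal illustration of content-based filtering in this module (and of the build_playlist() hook used in the earlier loop sketch), the following ranks candidate videos by how many tags they share with the user's watch history; the profile and tag fields are assumed structures, not part of the original design.

    from collections import Counter

    def build_playlist(profile, candidates, size=10):
        """Rank candidate videos by overlap with the user's history tags.

        profile:    {"history_tags": ["comedy", "tamil", ...]}     (assumed shape)
        candidates: [{"id": "...", "tags": ["comedy", ...]}, ...]  (assumed shape)
        """
        liked = Counter(profile["history_tags"])   # tag frequency in watch history
        ranked = sorted(candidates,
                        key=lambda v: sum(liked[t] for t in v["tags"]),
                        reverse=True)
        return [v["id"] for v in ranked[:size]]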

Meanwhile, the Monitoring & Control Module ensures the system responds dynamically, pausing playback if the user leaves and switching profiles if a new face appears (sketched below). The Logging & History Module tracks viewing behavior to improve future recommendations and resume unfinished content. An optional Security & Privacy Module ensures user data and facial information are securely stored and processed. Together, these modules create a fully automated and intelligent media playback system driven by face recognition.
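Here is a sketch of the Monitoring & Control Module's presence logic: playback pauses only after the user has been absent for a grace period, so brief detection drop-outs do not interrupt viewing. The timing and the player interface are illustrative assumptions.

    import time

    GRACE_SECONDS = 5.0         # illustrative absence threshold
    last_seen = time.monotonic()
    paused = False

    def on_frame(user_present, player):
        """Call once per processed frame with the recognition result."""
        global last_seen, paused
        now = time.monotonic()
        if user_present:
            last_seen = now
            if paused:
                player.resume()   # hypothetical player API
                paused = False
        elif not paused and now - last_seen > GRACE_SECONDS:
            player.pause()
            paused = True

With the python-vlc bindings mentioned earlier, the pause and resume calls would map to methods such as set_pause(1) and set_pause(0) on a vlc.MediaPlayer instance.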

J. Conclusion

The Smart Media Player with Face Recognition-Based Personalized Video Playback provides a cutting-edge and intelligent solution for improving the user experience through automation and personalization. By combining sophisticated facial recognition technology with a media playback system, it obviates the necessity for manual user intervention, allowing effortless content delivery customized to individual preferences. The system identifies users dynamically, loads their profiles, and plays personalized video content continuously while monitoring user presence to control playback accordingly. This not only enhances convenience but also brings about a more interactive and user-friendly experience. With its potential applications in smart homes, educational environments, and public infotainment systems, the proposed system is a major leap forward in the development of personalized entertainment technology.

References

  1. F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 815-823, 2015.

  2. D. E. King, "Dlib-ml: A machine learning toolkit," J. Machine Learning Research, vol. 10, pp. 1755-1758, 2009.

  3. P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. IEEE Conf. CVPR, vol. 1, pp. I-511, 2001.

  4. B. C. Ko, "A brief review of facial emotion recognition based on visual information," Sensors, vol. 18, no. 2, p. 401, 2018.

  5. S. Turkle, Alone Together: Why We Expect More from Technology and Less from Each Other, Basic Books, 2011.

  6. G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, Jun. 2005.

  7. F. Ricci, L. Rokach, and B. Shapira, "Introduction to Recommender Systems Handbook," Springer, 2011.

  8. A. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 4-20, Jan. 2004.

  9. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

  10. A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.

  11. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.

  12. H. Wang and D. Zhang, "Face recognition using independent component analysis," Pattern Recognition, vol. 37, no. 8, pp. 1473-1475, 2004.

  13. A. K. Jain and S. Z. Li, Handbook of Face Recognition, Springer, 2011.

  14. Zhang and W. Xu, "Privacy-preserving face recognition in cloud computing," IEEE Cloud Computing, vol. 2, no. 2, pp. 32-38, Mar.-Apr. 2015.

  15. J. Kaur and A. Bansal, "Smart Mirror using Raspberry Pi for Personalized Information," Int. J. Computer Applications, vol. 179, no. 39, pp. 24-28, 2018.

  16. A. Nagrath, R. Arora, and P. Sethi, "Covid-Facemask Detector using TensorFlow, Keras and OpenCV," Preprint, arXiv:2008.03444, 2020.

  17. C. Szegedy et al., "Going deeper with convolutions," Proc. IEEE Conf. CVPR, pp. 1-9, 2015.

  18. S. Yadav and S. Dixit, "A Real-Time Smart Surveillance System using Face Recognition," Int. J. Sci. & Engineering Research, vol. 9, no. 5, pp. 85-90, 2018.

  19. T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, Dec. 2006.

  20. Y. Taigman et al., "DeepFace: Closing the gap to human-level performance in face verification," Proc. IEEE Conf. CVPR, pp. 1701-1708, 2014.

  21. C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Microsoft Research, Tech. Rep. MSR-TR-2010-66, 2010.

  22. J. L. Herlocker et al., "Evaluating collaborative filtering recommender systems," ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5-53, Jan. 2004.

  23. T. Mikolov et al., "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.

  24. J. S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering," Proc. 14th Conf. on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.

  25. M. Nilashi et al., "A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques," Expert Systems with Applications, vol. 92, pp. 507-520, Feb. 2018.