Multimedia Recommender System using Facial Expression Recognition

Biologically, facial expressions result from the relative position or motion of the muscles that lie beneath the skin of the face. According to certain contested theories, they also convey the emotional state of an individual at a given time; the theories are contested because expressions can easily be faked. Nevertheless, in a world where communication is among the most important of acts, facial expression remains a key means of non-verbal communication. A recommender system, as the name suggests, is a system that recommends items to a user on the basis of some information or criterion, such as the user's past feedback or the behaviour patterns of other users. This paper does not use past feedback or such patterns; instead, it uses the user's facial expression to recommend entities such as movies or songs. The result is a recommendation system that requires less user data yet should still work well, since a user's current requirement may relate not to their past but to the present, as signified by their expression.


1. INTRODUCTION
Facial expression recognition software uses algorithms, as in biometric systems, to extract information from a human face, such as the expression, from which the emotion can also be inferred. More precisely, this technology is an emotion analysis system able to detect a number of expressions that a human conveys, such as happiness, sadness, anger, and disgust.
Facial expressions and other gestures convey nonverbal communication cues that play an important role in interpersonal relations.
Because facial expression recognition extracts and analyses information from an image or video feed, it can deliver unfiltered, unbiased emotional responses as data.
Research by the psychologist Mehrabian suggests that only 7% of affective information is transmitted by the words themselves, while 38% is conveyed by paralanguage, such as the rhythm and speed of speech and the tone of voice. The proportion transmitted by facial expression reaches 55%.
In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy, or anything else depending on the industry). The proposed system does not require any of the aforementioned data and works without the continuous attention of the user. In this framework, we capture the user's eye gaze and facial expression while they explore websites, through an inexpensive, visible-light webcam. This paper presents a recommendation system using facial expression recognition for movies and music, since the current generation leans heavily towards multimedia for its entertainment, which is a key factor for the future prospects of the system.
Entities such as movies and songs have a genre associated with them, which helps in mapping them to the various expressions shown by a human face. The algorithm therefore involves the following steps: face detection, facial expression recognition, and taking the user's input on whether they want the system to return a movie or a song. The system then returns a random entity whose genre corresponds to the recognized expression, or alternatively a list of such entities together with a hyperlink to a web page where the user can find more information, using automation tools such as Beautiful Soup or Selenium WebDriver.
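The steps above can be sketched end to end as follows. This is a minimal illustration, not the system's implementation: each helper is a hypothetical stub, and a real version would use OpenCV for face detection, a trained classifier for expression recognition, and a web-automation tool for the final step.

```python
# Sketch of the end-to-end flow described above. Each step is a
# hypothetical stub standing in for the real component.

def detect_face(frame):
    """Stub: return the face region found in a webcam frame."""
    return frame  # a real version would crop the detected face

def recognize_expression(face) -> str:
    """Stub: classify the face into an expression label."""
    return "happy"  # a real version would run the trained classifier

def recommend_entity(expression: str, media_type: str) -> str:
    """Stub: map the expression to a genre and pick a matching title."""
    return f"some {media_type} for a {expression} user"

def pipeline(frame, media_type: str) -> str:
    """Run the three stages in sequence on one captured frame."""
    face = detect_face(frame)
    expression = recognize_expression(face)
    return recommend_entity(expression, media_type)
```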
With the advancement in technology for digital signal processing and other effective feature extraction algorithms, the automated emotion detection in multimedia entities like music or movies is growing rapidly and this recommendation system can play an important role in many potential applications like human-computer interaction system, music entertainment and movie recommenders for theatres.
2. OBJECTIVE
• To create an effective interface between the user and multimedia.
• To apply machine learning in a practical system.
• To provide a new-age platform for movie and music lovers.
• To automate certain procedures the user would otherwise perform manually.
• Most importantly, to provide sources of entertainment to the user.
• To use expression as an input for suggesting entertainment.
3. SYSTEM
The user's face, captured from a webcam, is treated as the input. The face is the usual source of expression, which is the key information the system uses to generate its output. It is reasonable to assume that a happy person would want to stay that way, and so would prefer comedy movies or pop songs. In the case of a sad or angry expression, however, the user would not want to jump instantly to something cheerful, so genres such as melodrama and war for movies, and soft and rock for music respectively, should suit them.
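The expression-to-genre mapping described above can be expressed as a simple lookup table; the entries below are exactly the pairings given in the text, and the table is easy to extend with further emotions.

```python
# Expression-to-genre mapping as described in the text.
EMOTION_GENRES = {
    "happy": {"movie": "comedy", "music": "pop"},
    "sad":   {"movie": "melodrama", "music": "soft"},
    "anger": {"movie": "war", "music": "rock"},
}

def genre_for(expression: str, media_type: str) -> str:
    """Return the genre mapped to a recognized expression."""
    return EMOTION_GENRES[expression][media_type]
```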
Then, using web automation tools such as Beautiful Soup, the system can take the user to a corresponding page containing more information on the multimedia entity, which could simply be a Google search result.

ARCHITECTURE DIAGRAM
The basic architecture will look like:

Face Detection
We will use classification here. Classification is the process of categorizing a given dataset into classes. In this system, it is used to segregate the data into the various types of emotions.
A large dataset is needed: the more data used to train the classifier, the more accurate it will be. These datasets can be downloaded online or created from scratch. The only condition is that the images be segregated into subfolders named after the emotion they are associated with.
Here we perform supervised learning, where each training example is associated with a corresponding label. The test data is then compared against the training data using an algorithm (classification in this scenario), which assigns it one of the labels as the result.
For face detection, the Haar feature-based cascade classifier for object detection, available in the OpenCV library, can be used. The Haar cascade algorithm is trained on two kinds of data: images that contain faces and images that do not. It then extracts features for detection, namely edge features, line features, and four-rectangle features. These features are applied to the training images, and on the basis of the darker and lighter regions, the algorithm classifies image regions as positive or negative.
The features with the minimum error rate are selected, as they best separate the face images from the non-face ones.

Expression Detection
We now have a labelled dataset of faces corresponding to the various expressions, such as happy, anger, and sad. The dataset is converted into feature vectors using VGG-16, a 16-layer Convolutional Neural Network (CNN) for image classification. A CNN is a type of neural network consisting of an input layer, an output layer, and multiple hidden layers; the hidden layers convolve their input with learned filters, the activation function usually being ReLU (Rectified Linear Unit).
Logistic regression is used as the classification model, since it gave the highest accuracy and the lowest error rate among the models evaluated. It classifies each test image into a category, i.e. an expression in this case.
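The classification stage can be sketched with scikit-learn as below. This assumes feature vectors have already been extracted; since running VGG-16 is out of scope for a short example, two well-separated synthetic clusters stand in for "happy" and "sad" embeddings.

```python
# Training a logistic-regression expression classifier on feature
# vectors. Real features would be VGG-16 embeddings; synthetic
# clusters stand in for them here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
happy = rng.normal(loc=2.0, scale=0.3, size=(50, 8))   # "happy" features
sad = rng.normal(loc=-2.0, scale=0.3, size=(50, 8))    # "sad" features
X = np.vstack([happy, sad])
y = ["happy"] * 50 + ["sad"] * 50

clf = LogisticRegression().fit(X, y)

def predict_expression(features):
    """Classify one feature vector into an expression label."""
    return clf.predict([features])[0]
```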

Web Automation
After the expression has been recognized, it is simply a matter of navigating to a web URL with information about the entity the user selected, i.e. either music or a movie.
Web automation, essentially browser automation, is the process of replicating human actions in a browser. Beautiful Soup, for example, is a Python library used for extracting data from web pages, while tools such as Selenium WebDriver provide functions for clicking buttons, filling forms, navigating, and searching in a browser.
Using these tools, the system takes the user to the required web page: for a movie, the user can be taken to its IMDb page, and for a song, to a music player playing that song.
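The extraction side of this step can be sketched with Beautiful Soup. A static HTML snippet stands in for a fetched search-results page (the `result` class and the URLs are illustrative, not from any real site); a real system would download the page with `requests` and open the extracted link with Selenium WebDriver or `webbrowser.open()`.

```python
# Extracting the first result link from a page with Beautiful Soup.
from bs4 import BeautifulSoup

# Illustrative HTML standing in for a fetched search-results page.
html = """
<html><body>
  <a class="result" href="https://www.imdb.com/title/tt0000001/">Movie A</a>
  <a class="result" href="https://www.imdb.com/title/tt0000002/">Movie B</a>
</body></html>
"""

def first_result_url(page_html: str) -> str:
    """Return the href of the first search result on the page."""
    soup = BeautifulSoup(page_html, "html.parser")
    link = soup.find("a", class_="result")
    return link["href"]
```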
6. FUTURE ASPECTS
The entertainment recommender system using facial expression recognition can be upgraded in the future to support additional emotions. The system can also embed a music player, so that web automation is no longer needed for music playback, since the music would reside within the system itself.
It could also be associated with streaming services such as Netflix, Amazon Prime, and Spotify, which would help enlarge the system's library.

7. CONCLUSION
In this paper, the recommender system combines two different recommenders, one for movies and one for music, driven by the human emotion conveyed through expression: face detection and a classification algorithm are used to obtain the emotion. The genre corresponding to the emotion is then fetched, and web automation takes place using a tool such as Beautiful Soup, which fetches the required information for the multimedia entity, whether music or movie.