Real Time Facial Expression Transformation

Ravikant Sharma; Krithika Suvarna; Buddhghosh Shirsat; Maryam Jawadwala

doi:10.17577/IJERTCONV9IS03095

NTASU - 2020 (Volume 09 - Issue 03)

Paper ID : IJERTCONV9IS03095
Published (First Online): 22-02-2021
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Ravikant Sharma

Dept of Information Technology V.C.E.T. Vasai, India

Buddhghosh Shirsat

Dept of Information Technology V.C.E.T. Vasai, India

Krithika Suvarna

Dept of Information Technology V.C.E.T. Vasai, India

Prof. Maryam Jawadwala

Asst. Prof. Dept of Information Technology

V.C.E.T. Vasai, India

Abstract—Facial expression capture is the process of electronically converting the movements of a person's face into a digital database using cameras or laser scanners. Our method transfers facial expressions from an actor in a source video to an actor in a target video in real time, thus enabling improvised control of the target actor's facial expression. The originality of our approach lies in the transfer and photorealistic re-rendering of facial deformation and detail into the target video in such a way that the newly synthesized expressions are as indistinguishable from the real video as possible. To achieve this, a training dataset of images will be created from the target video; then, using any webcam or other camera as the input feed, the source actor's facial traits will be captured and the target face will be manipulated accordingly. For each frame, the surrounding environment will be taken into account to ensure that only the facial expressions and motions change and everything else remains intact; otherwise the viewer might feel that something is out of place.

Keywords—Reenactment, expression, transfer, real-time, source video, target video.

INTRODUCTION

In recent years, real-time markerless facial performance capture based on commodity sensors has been demonstrated. Impressive results have been achieved based on both RGB and RGB-D data. These techniques have become increasingly popular for animating virtual CG avatars in video games and movies. It is now feasible to run such face capture and tracking algorithms at home, which is the foundation for many VR and AR applications, such as teleconferencing.

Facial motion capture is related to body motion capture but is more challenging, because of the higher resolution required to detect and track the subtle expressions produced by small movements of the eyes and lips. These movements are often less than a few millimeters, requiring even greater resolution and fidelity, and different filtering techniques, than are usually used in full-body capture. On the other hand, the additional constraints of the face also allow more opportunities for using models and rules.

Facial expression capture is similar to facial motion capture. It is the process of using visual or mechanical means to manipulate computer-generated characters with input from human faces, or to recognize a user's emotions. Once the facial expression has been transferred, it can be used in various fields such as gaming and movies, among many others. Facial expression transfer is a difficult task because the background has to be kept stable so that the result does not look weird to the viewer, and the database needs to be maintained properly so that no problems arise while transferring the expression from the source to the target actor. A major challenge is the convincing re-rendering of the synthesized target face into the corresponding video stream. This requires careful consideration of the lighting and shading design, both of which must correspond to the real-world environment.
  
Generative adversarial networks (GANs), for example, have been shown to successfully generate realistic images of fake faces. Conditional GANs (cGANs) have been used to transform [5] an image depicting real data from one domain to another, and have inspired multiple face reenactment schemes. These methods separate the identity component of the face from the remaining traits and encode identity as latent feature vectors, which results in significant information loss and limits the quality of the synthesized images.
PROBLEM STATEMENT

Real-time facial expression transformation aims to transfer the expression of a source face onto a target face using various techniques and algorithms. We aim to build a platform that implements a real-time source-to-target reenactment approach for complete human portrait videos, enabling the transfer of head motion, facial expression and eye gaze. Given a short video of the target actor, we will apply a real-time reenactment algorithm. Reenactment aims to transfer the motion of a source actor to an image or video of a target actor. Realistic facial expression creation and transformation has been a long-standing problem in computer graphics and computer vision. Popular approaches so far usually require a driving source, or a combination of several, such as capturing a subject's performance and then transferring it to virtual faces. The novelty of the approach lies in the transfer and photorealistic re-rendering of facial deformations and detail into the target video in such a way that the newly synthesized expressions are as indistinguishable from a real video as possible.

To achieve this, a dataset of expressions will be created from the target video; then, using a webcam or any other camera as the input feed, the source actor's facial traits will be captured and the target's face will be manipulated accordingly. For each frame, the surrounding environment is taken into account to ensure that only the facial expressions and motions change and everything else remains intact; otherwise the viewer might feel that something is out of place. A major challenge is the convincing re-rendering of the synthesized target face into the corresponding video stream. This requires careful consideration of the lighting and shading design, both of which must correspond to the real-world environment.
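The requirement that only the face changes while everything else in the frame stays intact can be illustrated with alpha compositing under a face mask. The sketch below is a minimal illustration, not necessarily how the authors blend their output: it assumes the synthesized face, the original target frame and a soft face mask (values in [0, 1]) are already available as equally sized grayscale images stored as nested lists, and the function name `composite_face` is hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def composite_face(generated: Image, target: Image, mask: Image) -> Image:
    """Blend a synthesized face into the target frame.

    Pixels where mask == 1 come from the generated face, pixels where
    mask == 0 are copied unchanged from the target frame, and values in
    between (e.g. a feathered mask border) mix the two linearly.
    """
    if not (len(generated) == len(target) == len(mask)):
        raise ValueError("images must have the same height")
    out: Image = []
    for g_row, t_row, m_row in zip(generated, target, mask):
        if not (len(g_row) == len(t_row) == len(m_row)):
            raise ValueError("images must have the same width")
        out.append([m * g + (1.0 - m) * t for g, t, m in zip(g_row, t_row, m_row)])
    return out


if __name__ == "__main__":
    target = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
    generated = [[200.0, 200.0, 200.0], [200.0, 200.0, 200.0]]
    # The face covers the middle column; 0.5 marks a feathered edge pixel.
    mask = [[0.0, 1.0, 0.5], [0.0, 1.0, 0.0]]
    print(composite_face(generated, target, mask))
    # -> [[10.0, 200.0, 105.0], [10.0, 200.0, 10.0]]
```

A soft mask edge is a typical choice because a hard 0/1 boundary tends to leave a visible seam. Blending alone does not fix the lighting and shading mismatches mentioned above; those still have to be handled when the face is synthesized.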
RELATED WORK

Much research has been done on methods for face recognition, detection and reenactment; some of the prominent works in this field are discussed here.
GANs [11] were shown to generate fake images with the same distribution as a target domain. Although successful in generating realistic appearances, training GANs can be unstable, which limits their application to low-resolution images. Subsequent methods, however, improved the stability of the training process, and later work trains GANs using a progressive multiscale scheme, from low to high image resolution. CycleGAN proposed a cycle consistency loss, allowing training of unsupervised generic transformations between different domains. A cGAN with an L1 loss was applied by Isola et al. [5] to derive the pix2pix method, which was shown to produce appealing synthesis results for applications such as transforming edges into faces.
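For reference, the pix2pix objective of Isola et al. [5] combines the conditional GAN loss with an L1 reconstruction term. Here x is the input image (for example, an edge map), y the corresponding real image, z a noise input, G the generator, D the discriminator and λ a weighting factor; this notation follows [5] and is not defined elsewhere in this paper.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
```

In [5], the L1 term keeps the output close to the ground truth at low frequencies, while the adversarial term is responsible for sharp, realistic detail.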
PROPOSED METHODOLOGY
The training and test data are generated by a probability distribution over datasets called the data-generating process. We typically make a set of assumptions known collectively as the i.i.d. assumptions. These assumptions are that the examples in each dataset are independent of one another, and that the training set and test set are identically distributed, drawn from the same probability distribution as each other. This assumption enables us to describe the data-generating process with a probability distribution over a single example. The same distribution is then used to generate every training example and every test example. We call that shared underlying distribution the data-generating distribution, denoted p_data. This probabilistic framework and the i.i.d. assumptions enable us to study mathematically the relationship between training error and test error. In short, the data being modeled are assumed to be independent and identically distributed (i.i.d.) samples from a probability distribution: the same underlying distribution holds for both the training and test datasets, and each example is independent of the other examples.
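To make the i.i.d. assumption concrete, the sketch below draws every example independently from a single data-generating distribution, shuffles the samples and splits them into training and test sets, fits a least-squares line on the training set only, and compares training and test error. The distribution (an assumed linear relation y = 2x + 1 plus Gaussian noise) and the helper names `sample_data`, `fit_line`, `mse` and `split_train_test` are hypothetical and chosen purely for illustration; none of this relates to the face data used in this work.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]


def sample_data(n: int, rng: random.Random) -> List[Sample]:
    """Draw n i.i.d. examples from one assumed data-generating distribution."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)  # assumed linear model + noise
        data.append((x, y))
    return data


def split_train_test(data: List[Sample], test_fraction: float, rng: random.Random):
    """Shuffle, then split, so both parts follow the same distribution."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    a = cov / var
    return a, mean_y - a * mean_x


def mse(model: Tuple[float, float], data: List[Sample]) -> float:
    """Mean squared error of the fitted line on a dataset."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = split_train_test(sample_data(1000, rng), 0.2, rng)
    model = fit_line(train)
    print(f"train MSE = {mse(model, train):.4f}, test MSE = {mse(model, test):.4f}")
```

Because both sets come from the same distribution, the training and test errors land close to each other (both near the noise variance of 0.01); if the test set were drawn from a different distribution, that link between the two errors would no longer be guaranteed.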

Fig. 11. Frames
RESULTS AND DISCUSSION

The image below shows our raw face reenactment results without background removal. We chose examples of varying ethnicity, pose and expression. A particularly interesting example can be seen in the rightmost column, which shows our method's ability to cope with extreme expressions. To show the importance of iterative reenactment, Fig 4.1 provides reenactments of the same subject for both small and large angle differences. As is evident from the last column, for large angle differences the identity and texture are better preserved when multiple iterations are used. We report quantitative results in line with how we defined the face-swapping problem: we validate how well the methods preserve the source subject's identity while retaining the same pose and expression as the target subject.
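One way to read the iterative reenactment described above is that a large pose change is broken into several small ones, each handled by a separate reenactment pass. The sketch below only computes such a schedule of intermediate head poses by linear interpolation of (yaw, pitch, roll) Euler angles in degrees. The 15-degree step limit, the function name `pose_schedule` and the use of linear interpolation are illustrative assumptions, not parameters taken from this paper, and the reenactment network that would consume each intermediate pose is not shown.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


def pose_schedule(source: Pose, target: Pose, max_step_deg: float = 15.0) -> List[Pose]:
    """Return intermediate poses from source to target (target included).

    The number of steps is chosen so that no single angle changes by more
    than max_step_deg between consecutive poses.
    """
    largest_change = max(abs(t - s) for s, t in zip(source, target))
    n_steps = max(1, math.ceil(largest_change / max_step_deg))
    schedule: List[Pose] = []
    for i in range(1, n_steps + 1):
        alpha = i / n_steps  # fraction of the way from source to target
        yaw, pitch, roll = (s + alpha * (t - s) for s, t in zip(source, target))
        schedule.append((yaw, pitch, roll))
    return schedule


if __name__ == "__main__":
    # A 60-degree yaw difference is split into four 15-degree passes.
    for pose in pose_schedule((0.0, 0.0, 0.0), (60.0, 10.0, 0.0)):
        print(pose)
    # prints (15.0, 2.5, 0.0), (30.0, 5.0, 0.0), (45.0, 7.5, 0.0), (60.0, 10.0, 0.0)
```

Small angle differences yield a single step, so the schedule degenerates to one direct reenactment, which matches the observation that iterations mainly matter for large angle differences.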

Fig. 12. Result

To this end, we first compare the face-swapping result, Fb, of each frame to its nearest neighbour in pose among the subject's face views. We use the dlib face verification method to compare identities and the structural similarity index measure (SSIM) to compare their quality. To measure pose accuracy, we calculate the Euclidean distance between the … that our method retains pose and expression much better than its baselines. Note that the human eye is very sensitive to artefacts on faces. This should be reflected in the quality score, but such artefacts usually cover only a small part of the image, so the SSIM score does not reflect them well.
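Two of the evaluation steps above can be sketched in a few lines. The code below is a simplified illustration rather than the exact protocol used here: `nearest_pose` picks the face view whose head pose is closest, in Euclidean distance over assumed (yaw, pitch, roll) angles, to a query pose, and `mean_ssim` averages SSIM over non-overlapping 4×4 windows with uniform weights. The standard SSIM definition uses an 11×11 Gaussian-weighted window; the window size, 8-bit range and helper names here are assumptions, and dlib-based identity verification is not covered.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Pose = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees

DYNAMIC_RANGE = 255.0              # 8-bit grayscale images assumed
C1 = (0.01 * DYNAMIC_RANGE) ** 2   # stabilising constants from the SSIM definition
C2 = (0.03 * DYNAMIC_RANGE) ** 2


def nearest_pose(query: Pose, views: Sequence[Pose]) -> int:
    """Index of the view whose pose is closest to query (Euclidean distance)."""
    return min(range(len(views)), key=lambda i: math.dist(query, views[i]))


def ssim_window(x: List[float], y: List[float]) -> float:
    """SSIM of two equally sized pixel windows with uniform weights."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))


def mean_ssim(img1: Image, img2: Image, win: int = 4) -> float:
    """Average SSIM over non-overlapping win x win windows."""
    scores = []
    for r in range(0, len(img1) - win + 1, win):
        for c in range(0, len(img1[0]) - win + 1, win):
            p1 = [img1[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            p2 = [img2[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            scores.append(ssim_window(p1, p2))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    views = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (-30.0, 5.0, 0.0)]
    print(nearest_pose((25.0, 2.0, 0.0), views))  # -> 1

    # Synthetic 32x32 texture and a copy with a black 4x4 artefact in one corner.
    clean = [[float((i * 7 + j * 13) % 256) for j in range(32)] for i in range(32)]
    damaged = [row[:] for row in clean]
    for i in range(4):
        for j in range(4):
            damaged[i][j] = 0.0
    print(round(mean_ssim(clean, clean), 3), round(mean_ssim(clean, damaged), 3))  # -> 1.0 0.984
```

The damaged window scores close to zero, yet the image-level mean only drops from 1.0 to about 0.984 because 63 of the 64 windows are untouched, which is the effect described above: a visually obvious local artefact barely moves the averaged SSIM score.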
CONCLUSION

Using the technologies described above, we have obtained our first result: a video-based reenactment in which the source video is recorded and stored, and the expressions in the target video are changed according to the source video character.

The algorithms and the overall flow may change as the project progresses and as we research them further. For the time being, we use a cGAN, which, according to the authors of one of the surveyed papers, currently provides the highest accuracy; this still has to be verified in our own setting, although we do not expect the accuracy to differ much.

We proposed a very flexible methodology for editing facial images according to a target motion defined by a set of facial landmarks. Our methodology can be used both for facial expression/motion transfer and for generating an image sequence from a single facial image and a sequence of landmarks. We propose a novel way of training such a model so that it is robust to error accumulation, and we demonstrate highly realistic video sequence creation driven by various poses and expressions.
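Since the target motion is defined by facial landmarks, one common way to drive it from a source actor is to transfer landmark displacements rather than absolute positions, so that the target keeps its own face shape. The sketch below shows this relative transfer under stated assumptions: 2D landmarks are already detected in a neutral source frame, the current source frame and a neutral target frame, and displacements are rescaled by the ratio of the two face sizes (measured here, for illustration, as the root-mean-square distance of the landmarks from their centroid). The function names are hypothetical, the resulting landmarks would still have to be rendered by an image generator such as the cGAN mentioned above, and this is not a description of the authors' exact procedure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def face_scale(landmarks: List[Point]) -> float:
    """RMS distance of the landmarks from their centroid (a face-size measure)."""
    cx = sum(x for x, _ in landmarks) / len(landmarks)
    cy = sum(y for _, y in landmarks) / len(landmarks)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks) / len(landmarks))


def transfer_landmarks(src_neutral: List[Point], src_current: List[Point],
                       tgt_neutral: List[Point]) -> List[Point]:
    """Apply the source actor's landmark displacements to the target face."""
    if not (len(src_neutral) == len(src_current) == len(tgt_neutral)):
        raise ValueError("all landmark sets must have the same length")
    ratio = face_scale(tgt_neutral) / face_scale(src_neutral)
    return [
        (tx + ratio * (cx - nx), ty + ratio * (cy - ny))
        for (nx, ny), (cx, cy), (tx, ty) in zip(src_neutral, src_current, tgt_neutral)
    ]


if __name__ == "__main__":
    # Toy 3-point "face": two mouth corners and a chin point; the source opens
    # its mouth (chin moves down by 1) and the target face is twice as large.
    src_neutral = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
    src_current = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
    tgt_neutral = [(10.0, 10.0), (14.0, 10.0), (12.0, 14.0)]
    print(transfer_landmarks(src_neutral, src_current, tgt_neutral))
```

Working with displacements means the target's identity-specific geometry (face width, eye spacing) is preserved and only the source's motion is carried over; in the toy example the target's chin point moves down by about 2 units, twice the source displacement, because the target face is twice as large.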

REFERENCES

[1] G. G. Chrysos, E. Antonakos, S. Zafeiriou, and P. Snape, "Offline deformable face tracking in arbitrary videos," in The IEEE International Conference on Computer Vision (ICCV) Workshops, December 2015.

[2] J. Thies, M. Zollhöfer, M. Nießner, L. Valgaerts, M. Stamminger, and C. Theobalt, "Real-time expression transfer for facial reenactment," ACM Transactions on Graphics (TOG), 34(6), 2015.

[3] N. T. Deshpande and S. Ravishankar, "Face Detection and Recognition using Viola-Jones algorithm and Fusion of PCA and ANN," Advances in Computational Sciences and Technology, ISSN 0973-6107, vol. 10, no. 5, pp. 1173-1189, 2017.

[4] R. Tiwari, C. P. Meena, D. Sharma, and A. Shukla, "Face Recognition using the morphological method," Indian Institute of Information Technology and Management, Gwalior, India, April 2009, DOI: 10.1109/IADCC.2009.4809067.

[5] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," arXiv, 2016.

[6] J. Thies, M. Zollhöfer, C. Theobalt, and M. Nießner, "Face2Face: Real-time Face Capture and Reenactment of RGB Videos," University of Erlangen-Nuremberg and Max-Planck-Institute for Informatics, 2015.

[7] Y. Nirkin, Y. Keller, and T. Hassner, "FSGAN: Subject Agnostic Face Swapping and Reenactment," arXiv:1908.05932v1, 16 Aug 2019.

[8] K. Songsri-in and S. Zafeiriou, "Face Video Generation from a Single Image and Landmarks," Imperial College London and University of Oulu, arXiv:1904.11521v1, 25 Apr 2019.

[9] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2Face: Real-time face capture and reenactment of RGB videos," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, June 2016.

[10] E. Sanchez and M. Valstar, "Triple consistency loss for pairing distributions in GAN-based face synthesis," arXiv preprint arXiv:1811.03492, 2018.

[11] R. Natsume, T. Yatagawa, and S. Morishima, "RSGAN: face swapping and editing using face and hair representation in latent spaces," arXiv preprint arXiv:1804.03447, 2018.

[12] Y. Nirkin, I. Masi, and A. Tran Tuan, "On face segmentation, face swapping, and face perception," in 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 98-105, IEEE, 2018.
