Digital Video Summarization Techniques: A Survey

Abstract: Video summarization, which gives a short and precise representation of an original video by showing its most representative synopsis, is gaining increasing attention. Its main objective is to give a clear account of the video by removing redundant content and extracting key frames. The architecture of video summarization shows how a long video is skimmed into short, story-preserving content. Much research has been done in the past and is still ongoing, and researchers have proposed many methods and techniques, from classical computer vision to recent deep learning approaches. The literature shows that most video generation and summarization approaches are shifting toward deep generative models and variational autoencoders. These techniques fall into supervised, unsupervised, and deep reinforcement learning approaches. Video representation is categorized into static and dynamic summarization. Video summarization nevertheless remains challenging; open problems include the computational resources required, model complexity, and the lack of datasets. Effective video summarization can be applied in real-world scenarios such as movie trailers in the film industry, highlights in football (soccer), and anomaly detection in video surveillance systems.


INTRODUCTION
Video summarization is a mechanism for creating a short version of an original video, drawn from a large video dataset, while keeping its main stories and content [1]. It supports a wide range of applications. For example, home surveillance video can be reduced to a few meaningful minutes that illustrate anomalous events for easier understanding. Similarly, a sports video can be summarized to show the most important events such as goals and penalty kicks. In video summarization, the input is the original video with its whole content. The aim is to choose a small set of keyframes from the input video to produce a summary that conveys the content without the viewer having to watch the whole video and without losing important content [11]. For long videos, viewers may not have enough time to watch everything; a viewer may be interested only in the particular topic they are searching for [28]. Recently, there has been much interest in extracting representative visual elements from a video for sharing on social media, with the aim of effectively expressing the semantics of the original lengthy video [24].
In today's digital world, a great many videos are created and released over different streaming media. Such videos are typically uploaded to the internet or the cloud, so browsing them needs a high-bandwidth network. Video summarization, which gives a short and precise representation of original video clips by showing the most representative synopsis, is therefore gaining more attention. It is good practice for saving resources such as time, storage, and network and multimedia infrastructure [2]. There are two types of video summaries: 1) the static video abstract, which is a sequence of keyframes, and 2) the dynamic video skim, which is a collection of dynamically composed audio-video sub-clips. In both cases, the aim is to collect the most interesting or important video segments that show the essence of the original clips. In the real world, even though plenty of video with large content is available, not all frames are equally important, and some of the content is redundant or irrelevant. Video summarization is nevertheless a difficult task: it must find the relevant information to create an interesting short video that neither repeats nor misses information from the whole input. Much research has been done on video summarization. The problem is under-constrained, since what counts as a good summary depends on subjective understanding [24]. This survey covers both classical computer vision techniques and recent deep learning approaches for extracting the relevant information from the whole video given as input. It also explores the methods and techniques used in video summarization together with their application areas in different scenarios. The literature is reviewed with respect to each work's main objective, gaps or limitations, method, and contribution.
The rest of this paper is organized as follows: the architecture of video summarization, video summarization methods, video representation, the literature review, GAN-based video summarization, application areas, and the conclusion.

ARCHITECTURE IN VIDEO SUMMARIZATION
Video summarization is a process that condenses large video content into short and concise information. The summarized video requires less computation and storage without losing important sections of the content. The mapping between the ground truth (original video) and the summarized one is also important. Figure 1 shows the basic architecture of video summarization in line with the mapping function between the input (a long sequence of frames) and the summary (a short sequence of selected frames).
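The mapping from a long frame sequence to a short selected sequence can be illustrated with a minimal sketch. It assumes that a per-frame importance score is already available (how the scores are produced is left open), and the function name `summarize` and the `ratio` budget parameter are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def summarize(scores: Sequence[float], ratio: float = 0.15) -> List[int]:
    """Map a long frame sequence to a short one.

    `scores[i]` is an assumed importance score for frame i. The function
    returns the indices of the selected frames in temporal order.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Summary length budget: a fraction of the input length, at least one frame.
    budget = max(1, int(len(scores) * ratio))
    # Rank frame indices by score, highest first, and keep the top `budget`.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore temporal order so the summary plays back chronologically.
    return sorted(ranked[:budget])


if __name__ == "__main__":
    print(summarize([0.2, 0.9, 0.1, 0.7], ratio=0.5))
```

Sorting the selected indices at the end is a design choice: it keeps the story order of the original video, which a pure top-k ranking would lose.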

VIDEO SUMMARIZATION
Much research has been done on video summarization, covering potential methods and application areas in keyframe scenarios such as football game highlights. The methods proposed so far have been both supervised and unsupervised. More recently, reinforcement learning has also been applied.

A. Supervised Methods
In the supervised learning approach, video summarization learns from labelled data consisting of videos along with their ground-truth summaries. Obtaining annotated data is expensive and difficult, and in some cases impossible [24] [11]. The approach requires human-annotated video-summary pairs or per-frame labels, and the training labels guide how the video is summarized.
Supervised methods treat summarization as a subset selection problem addressed under the supervision of human-annotated videos. The annotated training samples, together with the original source videos, teach the model how to select informative subsets [14]. The target labels are user-created summaries that act as a teacher, directing the algorithm to select the best video frames in line with user input. Much work has been proposed to measure shot importance through supervised learning.
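As a minimal sketch of learning frame importance from per-frame labels, the example below fits a logistic regression with plain gradient descent. Logistic regression is used here only as a simple stand-in and is not the model of any cited work; the function names, the feature vectors, and the hyperparameters are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_frame_scorer(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit weights so that sigmoid(w . x + b) approximates the 0/1 frame label."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        # Batch gradient descent step on the mean gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score_frames(
    features: Sequence[Sequence[float]], weights: Sequence[float], bias: float
) -> List[float]:
    """Predicted importance in (0, 1) for each frame."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in features
    ]


if __name__ == "__main__":
    # Toy one-dimensional features; label 1 marks frames in the human summary.
    feats = [[0.0], [0.1], [0.9], [1.0]]
    labs = [0, 0, 1, 1]
    w, b = train_frame_scorer(feats, labs)
    print(score_frames(feats, w, b))
```

The predicted scores can then be thresholded or ranked to choose the frames or shots that enter the summary.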

B. Unsupervised Methods
Unsupervised video summarization relies on spatio-temporal features and reduces them with clustering methods. Unlike supervised methods, it creates summaries without annotated video summaries. Egocentric video summarization methods have also used unsupervised learning to categorize sports actions [12]. However, egocentric video is challenging because the placement of the camera causes great variation in object vanishing points or angles, illumination conditions, and movement. The authors used AlexNet, a convolutional neural network, to filter the keyframes (frames where the camera wearer interacts closely with people) while finding a subset that abstracts the story from the whole content.
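Clustering-based keyframe selection can be sketched as follows. The example assumes that each frame is already described by a feature vector (for instance from a CNN, although any descriptor would do) and uses plain k-means with a deterministic initialization; the function name `kmeans_keyframes` is hypothetical and the code is an illustration rather than the procedure of any cited work.

```python
import math
from typing import List, Sequence


def kmeans_keyframes(
    features: Sequence[Sequence[float]], k: int, iters: int = 20
) -> List[int]:
    """Cluster frame features with k-means and return, for each cluster,
    the index of the frame closest to its centroid (sorted, deduplicated)."""
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    # Deterministic initialization: k evenly spaced frames as starting centroids.
    step = n / k
    centroids = [list(features[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach every frame to its nearest centroid.
        clusters: List[List[int]] = [[] for _ in range(k)]
        for idx, feat in enumerate(features):
            nearest = min(range(k), key=lambda c: math.dist(feat, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each non-empty centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                dim = len(centroids[c])
                centroids[c] = [
                    sum(features[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    # One representative frame per cluster: the frame nearest to the centroid.
    keyframes = {
        min(range(n), key=lambda i: math.dist(features[i], centroids[c]))
        for c in range(k)
    }
    return sorted(keyframes)


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(kmeans_keyframes(feats, k=2))
```

No labels are involved: redundancy is removed simply because visually similar frames fall into the same cluster and are represented by a single keyframe.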

C. Reinforcement Methods
Deep reinforcement learning operates without labels, as an unsupervised approach that treats video summarization as a sequential process [28]. In that work, the authors used a deep summarization network that predicts, for each frame in a given video sequence, the probability that the frame is selected. Training is end to end and therefore needs high computational resources.

VIDEO REPRESENTATION
Video representation is an important problem in video preprocessing. A good video representation should include the key points and the information useful for discrimination while discarding unnecessary information. In video processing, video frames are usually represented as matrices. One approach uses the luminance information to retain the data in every single frame. Video summarization mainly focuses on creating a summary that can be watched within a short period of time. The mechanism that generates the summary content may be static or dynamic. Static video summarization is also known as R-frame or still-image summarization. It has three types: sampling, shot segmentation, and scene-based classification; in sampling, keyframes are extracted either uniformly or at random. Dynamic video summarization, on the other hand, produces a summarized video of informative content. The authors of [30] proposed an improved video summarization method whose aim is to obtain a summary that is interesting to the viewer and represents the whole video; they report better results than the methods compared against.
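The frame-as-matrix representation and the sampling type of static summarization can be sketched in a few lines. The luminance weights below are the ITU-R BT.601 luma coefficients, which is an assumption, since the text does not state which luminance formula is used; the function names are likewise hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Frame = List[List[Pixel]]      # a frame as a matrix of pixels


def luminance(frame: Frame) -> List[List[float]]:
    """Convert an RGB frame to a luminance matrix (BT.601 weights, assumed)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in frame
    ]


def uniform_sample(n_frames: int, n_keys: int) -> List[int]:
    """Pick `n_keys` keyframe indices at evenly spaced positions."""
    if not 1 <= n_keys <= n_frames:
        raise ValueError("n_keys must be between 1 and n_frames")
    step = n_frames / n_keys
    return [int(i * step) for i in range(n_keys)]


def random_sample(n_frames: int, n_keys: int, seed: int = 0) -> List[int]:
    """Pick `n_keys` distinct keyframe indices at random, in temporal order."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return sorted(rng.sample(range(n_frames), n_keys))


if __name__ == "__main__":
    frame = [[(255, 255, 255), (0, 0, 0)]]
    print(luminance(frame))
    print(uniform_sample(10, 5))
    print(random_sample(10, 3))
```

Both sampling functions ignore the frame content entirely, which is why sampling is the cheapest of the static strategies and also the one most likely to miss short but important events.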

LITERATURE REVIEW
The authors of [29] (2020) proposed ILS-SUMM, an iterative local search for unsupervised video summarization. Its objective is to automatically create a short summary of the whole content. Moreover, to indicate the high scalability of ILS-SUMM, the authors introduce a new dataset consisting of videos of various lengths. Zhou et al., 2018 [28] developed a deep summarization network (DSN) that summarizes videos by predicting a selection probability for each video frame. It is trained end to end with reinforcement learning, and the reported results are better than those of supervised approaches. Sarmadi et al., 2017 [27] proposed a general video summarization method that is divided into static and dynamic; the static summary is produced through keyframes. Cai et al., 2018 [24] proposed a generative modeling framework that learns representations with a variational autoencoder and uses encoder-decoder attention to estimate the saliency of raw video for generating the summary. The results suggest that summaries generated by visual co-occurrence tend to match more closely with human-generated summaries. Agyeman et al., 2019 [7] presented a deep learning approach to summarizing long soccer videos that combines a three-dimensional convolutional neural network (3D-CNN) with a long short-term memory (LSTM) recurrent neural network (RNN). Fajtl et al., 2019 [6] proposed a novel supervised method in the setting of bidirectional recurrent networks such as BiLSTM combined with attention. Elfeki et al., 2019 [4] conducted extensive experiments on their compiled dataset in addition to three other standard benchmarks. Vasudevan et al., 2017 [3] introduced a new dataset annotated with diversity and query-specific relevance labels. Video summarization can be single-view or multi-view. Among the methods proposed for summarizing single-view videos, supervised approaches usually stand out with the best performance.
Multi-view video summarization methods, on the other hand, tend to rely on feature selection within an unsupervised optimization paradigm.

GAN BASED VIDEO SUMMARIZATION
A GAN-based training framework is built from two adversarial neural networks, called the generator and the discriminator. This framework combines the merits of unsupervised and supervised video summarization approaches [23]. The generator network is an attention-aware Ptr-Net that generates the cutting points of summarized fragments, whereas the discriminator is a 3D CNN classifier that judges whether a fragment comes from a ground-truth or a generated summarization. The GAN is therefore reported to perform better than other approaches on different metrics.
3) The temporal relationship between video frames and additional information such as video tags, captions, and comments needs to be investigated in the future [22].
4) The expense of training video, especially annotated datasets.

APPLICATION OF VIDEO SUMMARIZATION
In video summarization, keyframe extraction is an important part of many video applications, such as video indexing, browsing, and retrieval. Many professional and educational applications that involve generating or using large volumes of video and multimedia data are prime candidates for taking advantage of video content analysis techniques [33]:
• Movie trailers (film industry)
• Advert creation (advertisement)
• Football highlights (recreation)

CONCLUSION
This paper presented video summarization techniques, applications, and challenges. The architecture of video summarization focuses on how a long video is summarized into a short skim of the relevant information. Video summarization methods are shifting from classical computer vision techniques to deep learning, especially deep generative models, recurrent neural networks, and variational autoencoders. Video summarization can be handled with supervised (TVSUM, RNN, DPP, SQDPP, and BiLSTM), unsupervised (ILS-SUMM, GAN, and VAE), and deep reinforcement learning (DSN) approaches. GAN-based training frameworks are a powerful means of image and video generation. Video summarization is challenged by factors ranging from datasets to computational devices, especially for new deep learning models. It can be applied in different scenarios for different reasons, including recreation, the film industry, security, and reducing computation power. In general, deep generative models and variational autoencoders are relatively good techniques for video summarization in both static and dynamic summarization approaches.