 Open Access
 Total Downloads : 18
 Authors : Dr. K N Narasimha Murthy, Fardeen Pasha, Pramita P Mannari, Santhosh S Kashyap
 Paper ID : IJERTCONV1IS04001
 Volume & Issue : NCRTICE – 2013 (Volume 1 – Issue 04)
 Published (First Online): 30072018
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Robust Identification of Movie Character using FaceName Graph Matching
Robust Identification of Movie Character using FaceName Graph Matching
Dr. K N Narasimha Murthy, Fardeen Pasha, Pramita P Mannari, Santhosh S Kashyap, City Engineering College
Identificatiom of characters in movies using automatic face recognition has drawn significant research and has led to many applications. Due to the vast variation of appearance of each character it is a challenging problem. Although existing methods demonstrate promising results in clean environment, the performances are limited in complex movie scenes due to the noises generated during the face tracking and face clustering process. Here we present two schemes of global facename matching based framework for robust character identification. The contributions of this work include: 1) A noise insensitive character relationship representation is incorporated. 2) We introduce an edit operation based graph matching algorithm. 3) Complex character changes are handled by simultaneously graph partition and graph matching.
Index TermsCharacter identification, graph matching, graph partition, graph edit, sensitivity analysis.

INTRODUCTION

Objective and Motivation
The proliferation of movie and TV provides large amount of digital video data. This has led to the requirement of efficient and effective techniques for video content understanding and organization. Automatic video annotation is one of such key techniques. In this paper our focus is on annotating characters in the movie and TVs, which is called movie character identification [1]. The objective is to identify the faces of the characters in the video and label them with the corresponding names in the cast. The textual cues, like cast lists, scripts, subtitles and closed captions are usually exploited. Fig.1 shows an example in our experiments. In a movie, characters are the focus center of interests for the audience. Their occurrences provide lots of clues about the movie structure and content. Automatic character identification is essential for semantic movie index and retrieval [2], [3], scene segmentation [4],
summarization [5] and other applications [6].
Fig. 1. Examples of character identification from movie Notting Hill.
Character identification, though very intuitive to humans, is a tremendously challenging task in computer vision. The reason is fourfold: 1) Weakly supervised textual cues [7]. There are ambiguity problem in establishing the correspon dence between names and faces: ambiguity can arise from a reaction shot where the person speaking may not be shown in the frames 1; ambiguity can also arise in partially labeled frames when there are multiple speakers in the same scene 2.
2) Face identification in videos is more difficult than that in
images [8]. Low resolution, occlusion, nonrigid deformations, large motion, complex background and other uncontrolled conditions make the results of face detection and tracking unreliable. In movies, the situation is even worse. This brings inevitable noises to the character identification. 3) The same character appears quite differently during the movie [3]. There may be huge pose, expression and illumination variation, wear ing, clothing, even makeup and hairstyle changes. Moreover, characters in some movies go through different age stages, e.g., from youth to the old age. Sometimes, there will even be different actors playing different ages of the same character.

The determination for the number of identical faces is not trivial [2]. Due to the remarkable intraclass variance, the same character name will correspond to faces of huge variant appearances. It will be unreasonable to set the number of identical faces just according to the number of characters in the cast. Our study is motivated by these challenges and aims to find solutions for a robust framework for movie character identification.


RelatedWork
The crux of the character identification problem is to exploit the relations between videos and the associated texts in order to label the faces of characters with names. It has similarities to identifying faces in news videos [9], [10], [11]. However, in news videos, candidate names for the faces are available from the simultaneously appearing captions or local transcripts. While in TV and movies, the names of characters are seldom directly shown in the subtitle or closed caption, and scrip t/screenplay containing character names has no time stamps to align to the video.
International Journal Of Engineering Research and Technology(IJERT), NCRTICE – 2013 Conference Proceedings 94
Fig. 2. Framework of scheme 1: Facename graph matching with #cluster prespecified.
According to the utilized textual cues, we roughly divide the existing movie character identification methods into three categories.

Category 1: Cast list based: These methods only utilize the case list textual resource. In the cast list discovery problem [12], [13], faces are clustered by appearance and faces of a particular character are expected to be collected in a few pure clusters. Names for the clusters are then manually selected from the cast list. Ramanan et al. proposed to manually label an initial set of face clusters and further cluster the rest face instances based on clothing within scenes [14]. In [15], the authors have addressed the problem of finding particular characters by building a model/classifier of the characters appearance from userprovided training data. An interesting work combining character identification with web image retrieval is proposed in [17]. The character names in the cast are used as queries to search face images and constitute gallery set. The probe face tracks in the movie are then identified as one of the characters by multitask joint sparse representation and classification. Recently, metric learning is introduced into character identification in uncontrolled videos [16]. Castspecific metrics are adapted to the people appearing in a particular video in an unsupervised manner. The clustering as well as identification performance are demonstrated to be improved. These cast list based methods are easy for understanding and implementation. However, without other textual cues, they either need manual labeling or guarantee no robust clustering and classification performance due to the large intraclass variances.

Category 2: Subtitle or Closed caption, Local matching based: Subtitle and closed caption provide timestamped dialogues, which can be exploited for alignment to the video frames. Everingham et al. [18], [3] proposed to combine the film script with the subtitle for local facename match ing. Timestamped name annotation and face exemplars are generated. The rest of the faces were then classified into these exemplars for identification. They further extended their work in [19], by replacing the nearest neighbor classifier by multiple kernel learning for features combination.
In the new framework, nonfrontal faces are handled and the coverage is extended. Researchers from University of Pennsylvania utilized the readily available timestamped resource, the closed captions, which is demonstrated more reliable than OCRbased subtitles [20], [7]. They investigated on the ambiguity issues in the local alignment between video, screenplay and closed captions. A partiallysupervised multiclass classification prob lem is formulated. Recently, they attempted to address the character identification problem without the use of screenplay [21]. The reference cues in theclosed captions are employed as multiple instance constraints and face tracks grouping as well as facename association are solved in a convex formulation. The local matching based methods require the timestamped information, which is either extracted by OCR (i.e., subtitle) or unavailable for the majority of movies and TV series (i.e., closed caption). Besides, the ambiguous and partial annotation makes local matching based methods more sensitive to the face detection and tracking noises.

Category 3: Script/Screenplay, Global matching based: Global matching based methods open the possibility of character identification without OCRbased subtitle or closed caption. Since it is not easy to get local name cues, the task of character identification is formulated as a global matching problem in [2], [22], [4]. Our method belongs to this category and can be considered as an extension to Zhangs work [2]. In movies, the names of characters seldom directly appear in the subtitle, while the movie script which contains character names has no time information. Without the local time information, the task of character identification is formulated as a global matching problem between the faces detected from the video and the names extracted from the movie script. Compared with local matching, global statistics are used for nameface association, which enhances the robustness of the algorithms.
Our work differs from the existing research in three fold:

Regarding the fact that characters may show various appearances, the representation of character is often affected
International Journal Of Engineering Research and Technology(IJERT), NCRTICE – 2013 Conference Proceedings 95
Fig. 3. Framework of scheme 2: Facename graph matching without #cluster prespecified.
by the noise introduced by face tracking, face clustering and scene segmentation. Although extensive research efforts have been concentrated on character identification and many applications have been proposed, little work has focused on improving the robustness. We have observed in our investigations that some statistic properties are preserved in spite of these noises. Based on that, we propose a novel representation for character relationship and introduce a nameface matching method which can accommodate a certain noise.

Face track clustering serves as an important step in movie character identification. In most of the existing methods, some cues are utilized to determine the number of target clusters prior to face clustering, e.g., in [2], the number of clusters is the same as the number of distinct speakers appearing in the script. While this seems convinced at first glance, it is rigid and even deteriorating the clustering results sometimes. In this paper, we loose the restriction of one face cluster corresponding to one character name. Face track clustering and facename matching are jointly optimized and conducted in a unique framework.

Sensitivity analysis is common in financial applications, risk analysis, signal processing and any area where models are developed [23], [24]. Good modeling practice requires that the modeler provides an evaluation of the confidence in the model, for example, assessing the uncertainties associ ated with the modeling process and with the outcome of the model itself. For movie character identification, sensitivity analysis offers valid tools for characterizing the robustness to noises for a model. To the best of our knowledge, there have been no efforts directed at the sensitivity analysis for movie character identification. In this paper, we aim to fill this gap by introducing two types of simulated noises.
A preliminary version of this work was introduced by [1]. We provide additional algorithmic and computational details, and extend the framework considering no prespecification for the number of face clusters. Improved performance as well as robustness are demonstrated in movies with large character
appearance changes.



Overview of Our Approach
In this paper, we propose a global facename graph match ing based framework for robust movie character identification. Two schemes are considered. There are connections as well as differences between them. Regarding the connections, firstly, the proposed two schemes both belong to the global matching based category, where external script resources are utilized. Secondly, to improve the robustness, the ordinal graph is employed for face and name graph representation and a novel graph matching algorithm called Error Correcting Graph Matching (ECGM) is introduced. Regarding the differences, scheme 1 sets the number of clusters when performing face clustering (e.g., Kmeans, spectral clustering). The face graph is restricted to have identical number of vertexes with the name graph. While, in scheme 2, no cluster number is required and face tracks are clustered based on their intrinsic data structure (e.g., mean shift, affinity propagation). Moreover, as shown in Fig.2 and Fig.3, scheme 2 has an additional module of graph partition compared with scheme 1. From this perspective, scheme 2 can be seen as an extension to scheme 1.

Scheme 1: The proposed framework for scheme 1 is shown in Fig.2. It is similar to the framework of [2]. Face tracks are clustered using constrained Kmeans, where the number of clusters is set as the number of distinct speakers. Cooccurrence of names in script and face clusters in video constitutes the corresponding face graph and name graph. We modify the traditional global matching framework by using ordinal graphs for robust representation and introducing an ECGMbased graph matching method.
For face and name graph construction, we propose to represent the character cooccurrence in rank ordinal level [25], which scores the strength of the relationships in a rank order from the weakest to strongest. Rank order data carry no numerical meaning and thus are less sensitive to the noises. The affinity graph used in the traditional global matching is interval measures of the cooccurrence relationship between characters. While continuous measures of the strength of relationship holds complete information, it is highly sensitive to noises.
For nameface graph matching, we utilize the ECGM al gorithm. In ECGM, the difference between two graphs is
International Journal Of Engineering Research and Technology(IJERT), NCRTICE – 2013 Conference Proceedings 96
measured by edit distance which is a sequence of graph edit operations. The optimal match is achieved with the least edit distance. According to the noise analysis, we define appropri ate graph edit operations and adapt the distance functions to obtain improved nameface matching performance.

Scheme 2: The proposed framework for scheme 2 is shown in Fig.3. It has two differences from scheme 1 in Fig.2. First, no cluster number is required for the face tracks clustering step. Second, since the face graph and name graph may have different number of vertexes, a graph partition component is added before ordinal graph representation.
The basic premise behind the scheme 2 is that appearances of the same character vary significantly and it is difficult to group them in a unique cluster. Take the movie The Curious Case of Benjamin Button for example. The hero and heroine go through a long time period from their childhood, youth, middleage to the oldage. The intraclass variance is even larger than the interclass variance. In this case, simply enforcing the number of face clusters as the number of characters will disturb the clustering process. Instead of grouping face tracks of the same character into one cluster, face tracks from different characters may be grouped together. In scheme 2, we utilize affinity propagation for the face tracks clustering. With each sample as the potential center of clusters, the face tracks are recursively clustered through appearancebased similarity transmit and ropagation. High cluster purity with large number of clusters is expected. Since one character name may correspond to several face clusters, graph partition is introduced before graph matching. Which face clusters should be further grouped (i.e., divided into the same subgraph) is determined by whether the partitioned face graph achieves an optimal graph matching with the name graph. Actually, face clustering is divided into two steps: coarse clustering by appearance and further modification by script. Moreover, face clustering and graph matching are op timized simultaneously, which improve the robustness against
errors and noises.
In general, the scheme 2 has two advantages over the scheme 1. (a) For scheme 2, no cluster number is required in advance and face tracks are clustered based on their intrin sic data structure. Therefore, the scheme 2 provides certain robustness to the intraclass variance, which is very common in movies where characters change appearance significantly or go through a long time period. (b) Regarding that movie cast cannot include pedestrians whose face is detected and added into the face track, restricting the number of face tracks clusters the same as that of name from movie cast will deteriorate the clustering process. In addition, there is some chance that movie cast does not cover all the characters. In this case, prespecification for the face clusters is risky: face tracks from different characters will be mixed together and graph matching tends to fail.

Sensitivity Analysis: Sensitivity analysis plays an im portant role in characterizing the uncertainties associated with a model. To explicitly analyze the algorithms sensitivity to noises, two types of noises, coverage noise and intensity noise, are introduced. Based on that, we perform sensitivity analysis

respect to the simulated noises.
The rest of paper is organized as follows: We first intro duce the ordinal graph representation and ECGMbased graph matching of scheme 1 in Section II. Section III presents the scheme 2 and the graph partition algorithm. In Section IV, we introduced two simulated noises for sensitivity analysis. A set of experiments with comparisons are conducted and discussed in Section V, and we conclude the paper in Section VI.


SCHEME 1: FACENAME GRAPH MATCHING WITH
NUMBER OF CLUSTER SPECIFIED
In this section we first briefly review the framework of tra ditional global graph matching based character identification. Based on investigations of the noises generated during the affinity graph construction process, we construct the name and face affinity graph in rank ordinal level and employ ECGM with specially designed edit cost function for face name matching.

Review of Global Facename Matching Framework
In a movie, the interactions among characters resemble them into a relationship network. Cooccurrence of names in script and faces in videos can represent such interactions. Affinity graph is built according to the cooccurrence status among characters, which can be represented as a weighted graph G = {V, E} where vertex V denotes the characters and edge E denotes relationships among them. The more scenes where two characters appear together, the closer they are, and the larger the edge weights between them are. In this sense, a name affinity graph from script analysis and a face affinity graph from video analysis can be constructed. Fig.4 demonstrates the adjacency matrices corresponding to the name and face affinity graphs from the movie Noting Hill 3. All the affinity values are normalized into the interval [0, 1]. We can see that some of the face affinity values differ much from the corresponding
name affinity values (e.g. {WIL,SPI} and {Face1,Face2},
{WIL,BEL} and {Face1,Face5}) due to the introduced noises. Subsequently, character identification is formulated as the problem of finding optimal vertex to vertex matching between two graphs. A spectral graph matching algorithm is applied to find the optimal nameface correspondence. More technical details can be referred to [2].

Ordinal Graph Representation
The name affinity graph and face affinity graph are built based on the cooccurrence relationship. Due to the imperfect face detection and tracking results, the face affinity graph can be seen as a transform from the name affinity graph by affixing noises. We have observed in our investigations that, in the generated affinity matrix some statistic properties of the characters are relatively stable and insensitive to the noises, such as character A has more affinities with character B than C, character D has never cooccurred with character A, etc. Delighted from this, we assume that while the absolute
1 The groundtruth mapping is WILFace1, SPIFace2, ANNFace3,
by invIenstteirgnaattiinogntahleJopuerrnfoarlmOafnEcengoifnenearminegfRaceesemaracthchainndg TweicthhnoloMgAyX(IJFEaRceT4), ,BNELCRFaTcIeC5E – 2013 Conference Proceedings 97
(a)
(b)
Fig. 4. Example of affinity matrices from movie Notting Hill: (a) Name affinity matrix Rname (b) Face affinity matrix Rface
quantitative affinity values are changeable, the relative affinity relationships between characters (e.g. A is more closer to B than to C) and the qualitative affinity values (e.g. whether D has cooccurred with A) usually remain unchanged. In this paper, we utilize the preserved statistic properties and propose to represent the character cooccurrence in rank order.
We denote the original affinity matrix as R = {rij }NÃ—N , where N is the number of characters. First we look at the cells along the main diagonal (e.g. A cooccur with A, B cooccur with B). We rank the diagonal affinity values rii in ascending order, then the corresponding diagonal cells rii in the rank
ordinal affinity matrix R:
(a)
(b)
Fig. 5. Example of ordinal affinity matrices corresponding to figure 4: (a) Name ordinal affinity matrix Rname (b) Face ordinal affinity matrix Rface
matrices, the derived ordinal affinity matrices are basically the same. The differences are generated due to the changes of zerocell. A rough conclusion is that the ordinal affinity matrix is less sensitive to the noises than the original affinity matrix. We will further validate the advantage of ordinal graph representation in the experiment section.

ECGMbased Graph Matching
ECGM is a powerful tool for graph matching with distorted inputs. It has various applications in pattern recognition and computer vision [26]. In order to measure the similarity of two graphs, graph edit operations are defined, such as the
rii = Irii
(1)
deletion, insertion and substitution of vertexes and edges. Each of these operations is further assigned a certain cost. The costs
where Irii is the rank index of original diagonal affinity value rii. Zerocell represents that no cooccurrence relationship is specially considered, which is a qualitative measure. From the perspective of graph analysis, there is no edge between the vertexes of row and column for the zerocell. Therefore, change of zerocell involves with changing the graph structure or topology. To distinguish the zerocell change, for each row in the original affinity matrix, we remain the zerocell unchanged. The number of zerocells in the ith row is recorded as nulli. Other than the diagonal cell and zerocell, we sort the rest affinity values in ascending order, i.e., for the ith row, the corresponding cells rij in the ith row of ordinal affinity matrix:
rij = Irij + nulli (2)
where Irij denotes the order of rij . Note that the zerocells are not considered in sorting, but the number of zerocells will be set as the initial rank order 4. The ordinal matrix is not necessarily symmetric. The scales reflect variances in degree of intensity, but not necessarily equal differences. We illustrate in Fig.5 an example of ordinal affinity matrices correspondin to the affinity matrices in Fig. 4. It is shown that although there are major differences between original name and face affinity
2 It can be considered that all the zerocells rank first and the rest cells rank from nulli + 1.
are application dependent and usually reflect the likelihood of graph distortions. The more likely a certain distortion is to occur, the smaller is its cost. Through error correcting graph matching, we can define appropriate graph edit operations according to the noise investigation and design the edit cost function to improve the performance.
For explanation convenience, we provide some notations and definitions taken from [28]. Let L be a finite alphabet of labels for vertexes and edges.
Notation: A graph is a triple g = (V, , ), where V is the finite set of vertexes, : V L is vertex labeling function, and : E L is edge labeling function.
The set of edges E is implicitly given by assuming that graphs
are fully connected, i.e., E = V Ã— V. For the notational convenience, node and edge labels come from the same alphabet 5.
Definition 1. Let g1 = (V1, 1, 1) and g2 = (V2, 2, 2) be two graphs. An ECGM from g1 to g2 is a bijective function f : V1 V2, where V1 V1 and V2 V2.
We say that vertex x V1 is substituted by vertex y V2
if f (x) = y. If 1(x) = 2(f (x)), the substitution is called an identical substitution. The cost of identical vertex or edge substitution is usually assumed to be zero, while the cost of any other edit operation is greater than zero.
3 For weighted graphs, edge label is the weight of the edge.
International Journal Of Engineering Research and Technology(IJERT), NCRTICE – 2013 Conference Proceedings 98
Fig. 6. Three basic graph operators for editing graphs.
Definition 2. The cost of an ECGM f : V1 V2 from graph g1 = (V1, 1, 1) to g2 = (V2, 2, 2) is given by
on a training set with various value of 1 and 2 and select those which maximize the average matching accuracy.
.
(f, g1, g2) =
xV1V1
.
.
cvd(x) +
. xV2V2
cvi(x)
(3)
Recalling the example in Fig.5, apparently no vertex deletion
or insertion operation is involved. Also no vertex substitution or edge substitution operations happen. There involve two edge
+ cvs(x) +
ces(e) insertion operations (edge {Face3, Face5}, {Face5, Face3})
xV1
eE1
and one edge substitution operation. The cost of this ECGM
under our designed cost function C is: C(f, Rf ace, Rname) =
where cvd(x) is the cost of deleting a vertex x V1 V1 from g1, cvi(x) is the cost of inserting a vertex x V2 V2 in g2, cvs(x) is the cost of substituting a vertex x V1 by
22 + 1.
Consider N face clusters and character names, the number of possible states for the solution space is the permutation of
f(x) V2, and ces(e) is the cost of substituting an edge N , i.e., N ! = N Ã— (N 1) Ã— Â· Â· Â·Ã— 2 Ã— 1. A general algorithm
e = (x, y) V1 Ã— V1 by e = (f (x), f (y)) V2 Ã— V2.
Definition 3. Let f be an ECGM from g1 to g2, C a cost function. We call f an optimal ECGM under C if there is no other ECGM f from g1 to g2 with C (f, g1, g2) <
C(f, g1, g2)
In our cases, if we set the number of face track clusters as the same with the number of character names, the name and face affinity graph have the same number of vertexes. There fore, there is no need to search for subgraph isomorphisms
in scheme 1. We have V1 = V1 = V2 = V2. Also, as
no vertex deletion or insertion operation is involved, we can directly assign cvd = cvi = . According to the investigation on noises, we introduce cedc(e) for the cost of destroying an edge e V1 Ã— V1 or creating an edge e V2 Ã— V2. The edit
operation of destroying an edge means certain cell in the name ordinal affinity matrix is nonzero while the corresponding cell in the face ordinal affinity matrix is zero. The edit operation of creating an edge means the opposite. Fig.6 shows the three basic graph operations defined in this paper. We define the cost
to obtain the optimal ECGM is based on the A method [27]. By applying A, we are able to find the best matching by exploring only the most promising avenues, which guarantees a global optimal.


SCHEME 2: FACENAME GRAPH MATCHING WITHOUT
NUMBER OF CLUSTER SPECIFIED
Scheme 2 requires no specification for the face cluster number. Standard affinity propagation [29] is utilized for face tracks clustering. The similarity input s(i, k) is set as the Earth Movers Distance (EMD) [30] between face tracks, which is same as introduced in [2]. All face tracks are equally suitable as exemplars and the preferences s(k, k) are set as the median of the input similarities. There are two kind of messages, availability and responsibility, changed between face tracks. With availability a(i, k) initialized to be zero, the responsibilities r(i, k) are computed and updated using the rule
of an ECGM in our name/face ordinal affinity graph matching application as:
r(i, k) s(i, k) max
k,s.t.k=k
{a(i, k) + s(i, k)} (6)
.
(f, g , g ) =
.
c (x) +
c (s) While, a(i, k) is updated using the rule
1 2 vd vi
. xV1V1
xV2V2 .
.
+ 1(x) 2(x)cvs(x) +
cedc(e) a(i, k) min
0, r(k, k) +
max 0, r(i, k)
(7)
xV1
.
1 (e)Â·2 (e)=0 1 (e)=2 (e)
i ,s.t.i i,k
+ 1(e)2(e)ces(e)
1
eE
(4)
The messagepassing procedure converges when the local decisions remain constant for certain number of iterations. In our case, high cluster purity with large number of clusters is encouraged. Therefore, we set the number of iteration as 3 in
where 1(x) 2(x) and 1(e) 2(e) measure the degree
of vertex substitution and edge substitution, respectively.
According to the likelihood of graph distortions during the graph construction process, we assign different costs to the edit operation of vertex substitution, edge substitution and edge creation/destruction. The cost function C is designed as:
C = (cvd, cvi, cvs, ces, cedc) = (, , 1, 1, 2) (5) where 1 and 2 embody the likelihood of different graph
the experiments, to guarantee concise clusters with consistent appearances.
Since no restriction is set on the onetoone facename correspondence, the graph matching method is expected to cope with the situations where several face clusters correspond to the same character name. In view of this, a graph partition step is conducted before graph matching. Traditional graph partition aims at dividing a graph into disjoint subgraphs of the same size [31]. In this paper, graph partition is only used
distortIinotnesr.nWatiiothnoaul tJporuironrakl nOofwElendgginee, ewreinpgeRrfeosremarecxhpaenridmTeenctshnolotogyd(eIJnEoRteTt)h,eNpCroRcTeIsCs Eofd2i0v1id3inCgoonrfiegriennaclefaPcreocgereadpihnsg.sWe 9d9o
Fig. 7. Simultaneously graph partition and matching for scheme 2.
Fig. 8. The example affinity matrices from the movie The Curious Case of Benjamin Button.
not set separate metrics for an optimal graph partition. Instead, graph partition is jointly optimized with the graph matching. This simultaneous process is illustrated in Fig.7. Instead of separately performing graph partition and graph matching, and
demonstrates the partitioned face graph by p =
{(Face1, Face2, Face3), (Face4, Face5), (Face6), (Face7)} from the original face graph of Fig.8(a). The partitioned face affinity matrix is then transformed to the corresponding
using the partitioned face graph as input for graph matching, ordinal affinity matrix Rface(p) according to Section III.
graph partition and graph matching are optimized in a unique framework.
We first define the graph parttion p with respect to the original face graph Gface. Consider N character names and M face track clusters, it divides Gface into N disjoint subgraphs:
Combining with the ECGM based cost representation, the optimal solution for graph partition and graph matching = (p, f) under cost function C is obtained by:
= arg min C(f, Rface (p),Rname ) (11)
p,f
p = {gf ace face
f ace
Consider N character names and M face track clusters,
1 , g2 , Â· Â· Â· , gn
}. (8) there are P N N MN possible partitions, where P N = M Ã—
M M
k
Each subgraph gface is a sublayer of Gf ace with vertex set (M 1) Â· Â· Â·Ã—(M N + 1) denotes the N permutations of M .
Vkf ace, and
N f ace face
Therefore, the solution space for the joint graph partition and
graph matching has P N N MN Â· N ! possible states. A simple
k
Vf ace
k=0V
= V .
face face
M
(9) pthrerperoicseassveisryussmedalltochfailntceer tthaet cdaifnfdeirdeanttefapcaerstitoiof nths.e Ssainmce
i = , i. Vi Vj = , i = j.
k
where Vf ace denotes the vertex set of face graph Gf ace. In this way, the number of vertexes for each subgraph gface,
gkf ace  {1, 2, Â· Â· Â· , M N + 1}, k = 1, 2, Â· Â· Â· , N . The vertexes in the same subgraph are considered from the same
character and their cooccurrence statistics are integrated. The partitioned face graph has the same vertex number with the name graph. The partitioned face affinity matrix by p, Rf ace(p) is calculated as:
Rf ace
character appear at the same time, the face clusters having large affinities in the original face matrix are unlikely from the same character. Therefore, we add the following constraint to
graph partition:
ij
If rface > Tp, (vi, vj ) / Vk. i = j, k = 1, 2, Â· Â· Â· , N.
(12) where Tp is the threshold to allow certain noises. We set
Tp = 0.005 in the experiments. The filtering process will significantly reduce the solution space. For a typical case of
ii (p) =
Rifjace (p) =
.
i
mVface
.
face mm
r .
mn
rface,i =
(10)
20 face clusters and 10 character names, the original solution
space has a huge O(1018) possible states. After filtering, the solution space is reduced to about O(1012).
mVf ace
j.
,nV face
i m=n j
The affinity matrices in Fig.8 are from the movie
International Journal Of Engineering Research and Technology(IJERT), NCRTICE – 2013 Conference Proceedings 100
The Curious Case of Benjamin Button. Fig.8(b)

CONCLUSIONS
We have shown that the proposed two schemes are useful to improve results for clustering and identification of the face tracks extracted from uncontrolled movie videos. From the sensitivity analysis, we have also shown that to some degree, such schemes have better robustness to the noises in constructing affinity graphs than the traditional methods. A third conclusion is a principle for developing robust character identification method: intensity alike noises must be empha sized more than the coverage alike noises.
In the future, we will extend our work to investigate the optimal functions for different movie genres. Another goal of future work is to exploit more character relationships, e.g., the sequential statistics for the speakers, to build affinity graphs and improve the robustness.
REFERENCES

J. Sang, C. Liang, C. Xu, and J. Cheng, Robust movie character identification and the sensitivity analysis, in ICME, 2011, pp. 16.

Y. Zhang, C. Xu, H. Lu, and Y. Huang, Character identification in featurelength films using global facename matching, IEEE Trans. Multimedia, vol. 11, no. 7, pp. 12761288, November 2009.

M. Everingham, J. Sivic, and A. Zissserman, Taking the bite out of automated naming of characters in tv video, in Jounal of Image and Vision Computing, 2009, pp. 545559.

C. Liang, C. Xu, J. Cheng, and H. Lu, Tvparser: An automatic tv video parsing method, in CVPR, 2011, pp. 33773384

J. Sang and C. Xu, Characterbased movie summarization, in ACM MM, 2010.

R. Hong, M. Wang, M. Xu, S. Yan, and T.S. Chua, Dynamic captioning: video accessibility enhancement for hearing impairment, in ACM Multimedia, 2010, pp. 421430.

T. Cour, B. Sapp, C. Jordan, and B. Taskar, Learning from ambiguously labeled images, in CVPR, 2009, pp. 919926.

J. Stallkamp, H. K. Ekenel, and R. Stiefelhagen, Videobased face recognition on realworld data. in ICCV, 2007, pp. 18.

S. Satoh and T. Kanade, Nameit: Association of face and name in video, in Proceedings of CVPR, 1997, pp. 368373.

T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y. W. Teh, E. G. LearnedMiller, and D. A. Forsyth, Names and faces in the news, in CVPR, 2004, pp. 848854.

J. Yang and A. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, in ACM Int. Conf. Multimedia, 2005, pp. 3140.

A. W. Fitzgibbon and A. Zisserman, On affine invariant clustering and automatic cast listing in movies, in ECCV (3), 2002, pp. 304320.

O. Arandjelovic and R. Cipolla, Automatic cast listing in featurelength films with anisotropic manifold space, in CVPR (2), 2006, pp. 1513 1520.

D. Ramanan, S. Baker, and S. Kakade, Leveraging archival video for building face datasets, in ICCV, 2007, pp. 18.

M. Everingham and A. Zisserman, Identifying individuals in video by combining generative and discriminative head models, in ICCV, 2005, pp. 11031110.

R. G. Cinbis, J. Verbeek, and C. Schmid, Unsupervised Metric Learning for Face Identification in TV Video, in International Conference on Computer Vision, 2011, to appear.

M. Xu, X. Yuan, J. Shen, and S. Yan, Cast2face: character identification in movie with actorcharacter correspondence, in ACM Multimedia, 2010, pp. 831834.

M. Everingham, J. Sivic, and A. Zissserman, Hello! my name is… buffy automatic naming of characters in tv video, in Proceedings of BMVC, 2006, pp. 889908.

J. Sivic, M. Everingham, and A. Zissserman, Who are you? – learning person specific classifiers from video, in Proceedings of CVPR, 2009.

T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, Movie/script: Align ment and parsing of video and text transcription, in ECCV (4), 2008, pp. 158171.

T. Cour, B. Sapp, A. Nagle, and B. Taskar, Talking pictures: Temporal
10141021.

Y. Zhang, C. Xu, J. Cheng, and H. Lu, Naming faces in films using hypergraph matching, in ICME, 2009, pp. 278281.

A. Saltelli, M. Ratto, S. Tarantola, and F. Campolongo, Sensitivity analysis for chemical models, Chemical Reviews, vol. 105, no. 7, pp. 2811 2828, 2005.

E. Bini, M. D. Natale and G. Buttazzo, Sensitivity analysis for fixed priority realtime systems, Realtime Systems, vol. 39, no. 1, pp. 530, 2008.

R. E. Hanneman, Introduction to Social Network Methods. Riverside, CA: University of California: Online Textbook Supporting Sociology 157., 2000.

E. Bengoetxea, Inexact graph matching using estimation of distribution algorithms, PhD thesis, Ecole Nationale Superieure des Telecommuni cations, 2002.

A. Sanfeliu and K. Fu, A distance measure between attributed relational graphs for pattern recognition, IEEE Trams. on SMC, vol. 13, no. 3,
1983.

H. Bunke, On a relation between graph edit distance andmaximum common subgraph, Pattern Recognition Letters, vol. 18, pp. 689694, 1997.

B. J. Frey and D. Dueck, Clustering by passing messages between data points, Science, vol. 315, pp. 972977, 2007.

Y. Rubnr, C. Tomasi, and L. J. Guibas, A metric for distributions with applications to image databases, in ICCV, 1998, pp. 5966.

L. Lin, X. Liu, and S. C. Zhu, Layered graph matching with composite cluster sampling, IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 8, pp. 14261442, 2010.

H. Chui and A. Rangarajan, A new point matching algorithm for non rigid registration, Computer Vision and Image Understanding, vol. 89, no. 23, pp. 114141, 2003.

Y. Li, H. Z. Ai, C. Huang and S. H. Lao, Robust head tracking with particles based on multiple cues fusion, HCI/ECCV, 2006, pp. 2939.
groupiInngtaenrdndaitailoongaslupJeoruvirsneadlpOersfoEn nregcoingneietiroinn,g iRn CesVePaRr,c2h01a0n, dppT. echnology(IJERT), NCRTICE – 2013 Conference Proceedings
101