Self-Learning based Image Denoising

DOI : 10.17577/IJERTCONV3IS19079


Rojan Danie Jose
Dept. of Computer Science and Engineering, T.John Institute of Technology, Bangalore, India

Ms. Rajitha Nair, Asst. Professor
Dept. of Computer Science and Engineering, T.John Institute of Technology, Bangalore, India

Abstract: Decomposition of an image into multiple semantic components has been an active research topic for various image processing applications such as image denoising, enhancement, and inpainting. In this paper, we present a novel self-learning based image decomposition framework. Based on the recent success of sparse representation, the proposed framework first learns an over-complete dictionary from the high spatial frequency parts of the input image for reconstruction purposes. We then perform unsupervised clustering on the observed dictionary atoms (and their corresponding reconstructed image versions) via affinity propagation, which allows us to identify image-dependent components with similar context information. When applying the proposed method to image denoising, we automatically determine the undesirable patterns (e.g., rain streaks or Gaussian noise) from the image components derived directly from the input image, so that the task of single-image denoising can be addressed. Unlike prior image processing works based on sparse representation, the proposed method does not need to collect training image data in advance, nor does it assume image priors such as the relationship between input and output image dictionaries. We conduct experiments on two denoising problems: single-image denoising with Gaussian noise and rain removal. The empirical results confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art image denoising algorithms.

Index Terms: Denoising, image decomposition, rain removal, self-learning, sparse representation.


I. Introduction

Assuming an image is a linear mixture of multiple source components, image decomposition aims at determining such components and the associated weights [1], [2]. For example, how to properly divide an image into texture and non-texture parts has been investigated in applications of image compression [3], image inpainting [4], [5], and related image analysis and synthesis tasks. Consider the fundamental problem of decomposing an image of N pixels into C different N-dimensional components: one needs to solve a linear regression problem with N x C unknown variables. While this problem is ill-posed, the image sparsity prior has been exploited to address this task [1]. As a result, an input image can be morphologically decomposed into different patches based on such priors for a variety of image processing applications. Before providing the overview and highlighting the contributions of our proposed method, we first briefly review morphological component analysis (MCA), a sparse representation based image decomposition algorithm that has been successfully applied and extended to solve the problems of image denoising [6]-[8], image inpainting [5], [8], and image deraining (i.e., rain removal) [9], [10].

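A. MCA for Image Decomposition

MCA exploits the morphological diversity of the different features contained in the data to be decomposed and associates each morphological component with a dictionary of atoms [1], [5], [11]. Suppose an image $I$ of $N$ pixels is a superposition of $K$ components (called morphological components), denoted by $I = \sum_{k=1}^{K} I_k$, where $I_k$ denotes the $k$-th component, such as the geometric or textural component of the image $I$. To decompose $I$ into $\{I_k\}_{k=1}^{K}$, MCA iteratively minimizes the following energy function:

$$E\left(\{I_k\}_{k=1}^{K}, \{\theta_k\}_{k=1}^{K}\right) = \frac{1}{2}\left\| I - \sum_{k=1}^{K} I_k \right\|_2^2 + \tau \sum_{k=1}^{K} E_k(I_k, \theta_k) \quad (1)$$

where $\theta_k$ denotes the sparse coefficients corresponding to $I_k$ with respect to the dictionary $D_k$, $\tau$ is a regularization parameter, and $E_k$ is the energy function defined according to the type of $D_k$ (global or local dictionary). For a local dictionary, the coefficients are defined per patch: $\theta_k^p$ is the coefficient vector of $b_k^p$, the $p$-th patch extracted from $I_k$, and $P$ is the total number of extracted patches. MCA solves (1) by alternating two steps for each component: (i) update of the sparse coefficients: this step performs sparse coding to solve for $\theta_k$ (or $\{\theta_k^p\}_{p=1}^{P}$) to minimize $E_k(I_k, \theta_k)$ while fixing $I_k$; and (ii) update of the components: this step updates $I_k$ (or $\{b_k^p\}_{p=1}^{P}$) while fixing $\theta_k$ (or $\{\theta_k^p\}_{p=1}^{P}$). More details about MCA can be found in [1], [5], [11].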

B. Overview and Contributions of the Proposed Method

In this paper, we propose a self-learning based image decomposition framework. The proposed method identifies image components based on semantic similarity and thus can be readily applied to image denoising. Unlike prior learning-based image decomposition or denoising works which require the collection of training image data (e.g., raw/noisy inputs vs. denoised outputs, or low-resolution vs. high-resolution images), our proposed method advocates self-learning from the input (noisy) image directly. After observing dictionary atoms with high spatial frequency (i.e., potential noisy patterns), we apply the unsupervised clustering algorithm of affinity propagation without any prior knowledge of the number of clusters, which allows us to automatically identify the dictionary atoms which correspond to undesirable noise patterns. As a result, such noise can be removed from the input image by performing image reconstruction without using the associated dictionary atoms. It follows that our proposed method does not require any external training image data (e.g., noisy and ground truth image pairs), and no user interaction or prior knowledge is needed either. Therefore, our method can be considered an unsupervised approach. As verified by our experiments, our method can be directly applied to a single input image to solve the single-image problems of rain streak and Gaussian noise removal. The former type of noise can be considered a structured noise pattern, and the latter an unstructured one.

The contribution of this paper is threefold: (i) unlike prior MCA based approaches [1], [5], [11], our proposed method allows one to decompose an input image and to observe its representation without the need to learn from pre-collected training data. We do not assume any image priors such as the relationship between the input and desirable output images either. This makes single-image based applications feasible in real-world scenarios; (ii) we apply affinity propagation to identify key image components which exhibit similar context information, so that those associated with noise or undesirable patterns can be disregarded for automatic image denoising; (iii) while our proposed framework can be applied to the tasks of single-image denoising and rain removal, we further show that our method is not limited to any specific preprocessing technique when retrieving the high spatial frequency parts (e.g., bilateral filtering [12], K-SVD-based image denoising [7], and BM3D filtering [13]).

The effectiveness and robustness of our method will later be confirmed by our experiments. The rest of this paper is organized as follows. We briefly review sparse representation and dictionary learning techniques in Section II. Section III presents our proposed framework for image decomposition via self-learning. Sections IV and V explain how to apply our method to single-image rain removal and denoising, respectively. Experimental results on two types of denoising problems are presented in Section VI, followed by the conclusions of this paper.



II. Sparse Representation and Dictionary Learning

A. Sparse Representation

Sparse coding [14]-[16] is a technique for representing a signal as a compact linear combination of a set of basis signals (or atoms) from a dictionary. A pioneering work on image sparse representation [14] stated that the receptive fields of simple cells in the mammalian primary visual cortex can be characterized as being spatially localized, oriented, and bandpass. It was shown that a coding strategy that maximizes sparsity is sufficient to account for these properties, and that a learning algorithm attempting to determine sparse linear codes for natural scenes will develop a complete family of localized, oriented, and bandpass receptive fields. For each image patch $b^p$ extracted from an image $I$, we can find the corresponding sparse coefficient vector $\theta^p$ with respect to a given dictionary $D$ (described in Section II-B) by solving the following optimization problem:

$$\min_{\theta^p} \frac{1}{2}\left\| b^p - D\theta^p \right\|_2^2 + \lambda \left\| \theta^p \right\|_1 \quad (2)$$

where $\lambda$ is a regularization parameter. It has been shown that (2) can be efficiently solved using the orthogonal matching pursuit (OMP) algorithm [6], [15], [17].
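To make the sparse coding step concrete, the following is a minimal NumPy sketch of orthogonal matching pursuit. It is an illustration under stated assumptions, not the implementation used in this paper: the function name `omp`, the parameter `max_atoms`, and the tolerance `tol` are hypothetical, and the greedy routine targets a sparsity-constrained variant of the problem (at most `max_atoms` nonzero coefficients) rather than an explicit regularization weight.

```python
import numpy as np

def omp(D, b, max_atoms, tol=1e-10):
    """Greedy sparse coding: approximate b with at most max_atoms columns of D."""
    b = np.asarray(b, dtype=float)
    support = []                        # indices of the atoms selected so far
    coef = np.zeros(0)                  # coefficients on the selected atoms
    residual = b.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break                       # b is already represented
        corr = np.abs(D.T @ residual)   # correlation of each atom with the residual
        if support:
            corr[support] = 0.0         # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    if support:
        theta[support] = coef
    return theta

# Toy usage: a patch synthesized from two atoms of a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)
b = 2.0 * D[:, 3] - 1.5 * D[:, 20]
theta = omp(D, b, max_atoms=10)
print(np.count_nonzero(theta), np.linalg.norm(b - D @ theta))
```

The least-squares refit after each selection is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every atom already chosen.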

B. Dictionary Learning

To construct a dictionary $D$ that sparsely represents each patch extracted from an image, one can use a set of training image patches $\{b^p\}_{p=1}^{P}$ for learning purposes. To derive a dictionary which satisfies the above sparse coding scheme, we solve the following optimization problem [6], [17]:

$$\min_{D, \{\theta^p\}} \frac{1}{P} \sum_{p=1}^{P} \left( \frac{1}{2}\left\| b^p - D\theta^p \right\|_2^2 + \lambda \left\| \theta^p \right\|_1 \right) \quad (3)$$

where $\theta^p$ denotes the sparse coefficient vector of $b^p$ with respect to $D$ and $\lambda$ is a regularization parameter. Equation (3) can be efficiently solved by a dictionary learning algorithm, such as online dictionary learning [17] or K-SVD [6].
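The alternating structure of dictionary learning can be sketched in a few lines. The example below deliberately swaps in a simpler scheme, the method of optimal directions (a least-squares dictionary update), in place of the online dictionary learning or K-SVD algorithms cited above, and uses a greedy sparse coder with a fixed sparsity `k`. All names (`learn_dictionary`, `normalize_atoms`, `n_iter`, `seed`) are hypothetical.

```python
import numpy as np

def omp(D, b, k):
    """Greedy sparse code of b using at most k atoms (columns) of D."""
    support, coef, residual = [], np.zeros(0), b.copy()
    for _ in range(k):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    if support:
        theta[support] = coef
    return theta

def normalize_atoms(D, rng):
    """Scale atoms to unit norm; re-draw unused (all-zero) atoms at random."""
    dead = np.linalg.norm(D, axis=0) < 1e-10
    D[:, dead] = rng.standard_normal((D.shape[0], int(dead.sum())))
    return D / np.linalg.norm(D, axis=0)

def learn_dictionary(B, M, k, n_iter=10, seed=0):
    """B holds one patch per column. Returns D (n x M) and codes Theta (M x P)."""
    rng = np.random.default_rng(seed)
    D = normalize_atoms(rng.standard_normal((B.shape[0], M)), rng)
    for _ in range(n_iter):
        # Step 1: sparse coding with the dictionary fixed.
        Theta = np.column_stack([omp(D, b, k) for b in B.T])
        # Step 2: least-squares dictionary update with the codes fixed.
        D = normalize_atoms(B @ np.linalg.pinv(Theta), rng)
    Theta = np.column_stack([omp(D, b, k) for b in B.T])
    return D, Theta

# Toy usage on random "patches"; real input would be vectorized image patches.
B = np.random.default_rng(0).standard_normal((16, 120))
D, Theta = learn_dictionary(B, M=32, k=4, n_iter=3)
print(np.linalg.norm(B - D @ Theta) / np.linalg.norm(B))
```

An online method processes patches in small batches instead of all at once, which matters for large images; the batch variant here is only meant to show the two alternating steps.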


III. Proposed Framework for Image Decomposition via Self-Learning

Fig. 1 shows the proposed framework for image decomposition. As shown in this figure, we approach this problem by solving the task of dictionary learning for image sparse representation, followed by the learning of context-aware image components. While the former aims at reconstructing an input image using sparse representation techniques, the latter is used to identify key image components based on their context information (and thus can be applied for denoising purposes). The notations used in this paper are summarized in Table I. We now detail our proposed method.

A. Dictionary Learning for Image Sparse Representation

In this paper, we focus on addressing image denoising problems. In our proposed framework, we first separate the high spatial frequency part $I_H$ from the low spatial frequency part $I_L$ of an input image $I$. This is because most undesirable noise patterns, such as rain streaks or Gaussian noise, are of high spatial frequency. To achieve this, we consider three low-pass filtering (LPF) or denoising techniques as the preprocessing stage: bilateral filtering [12], K-SVD [7], and BM3D [13]. We note that these LPF or denoising techniques can be replaced by band-pass filtering if the noise of interest is known to be associated with a particular frequency band. In either case, we produce $I_H$ by subtracting the resulting smoothed/filtered version $I_L$ from $I$. However, since we do not have prior knowledge or assumptions on the type of noise to be removed, it is not clear how to identify the image components of $I_H$ which correspond to undesirable noise patterns. As discussed in Section I, MCA has been successfully applied to decompose an image into different components/atoms.

However, traditional MCA approaches usually use a fixed dictionary (e.g., a discrete cosine transform (DCT), wavelet, or curvelet basis) to sparsely represent an image component. In these cases, the selection of dictionaries and parameters becomes heavily empirical, and the results are sensitive to the choice of dictionaries. While some works use training image data to learn dictionaries for improved representation, how to select a proper image set in advance for training remains a challenging problem.

Moreover, the collection of training data might not be practical in many real-world applications such as single-image based processing tasks. Therefore, different from traditional MCA using fixed dictionaries, we advocate learning the dictionary directly from the input image. More precisely, we learn a dictionary based only on the high spatial frequency part of the input image (i.e., $I_H$). Once such a dictionary is observed, the remaining task is to automatically identify the undesirable components/patterns which correspond to noise, so that one can perform image reconstruction without using such components to achieve image denoising. To be more specific, we extract patches $b^p \in \mathbb{R}^{n}$, $p = 1, \ldots, P$, from $I_H$ for learning the dictionary $D_H$ via solving (3). We apply an online dictionary learning algorithm [17] to solve the following problem:

$$\min_{D_H, \{\theta^p\}} \frac{1}{P} \sum_{p=1}^{P} \left( \frac{1}{2}\left\| b^p - D_H\theta^p \right\|_2^2 + \lambda \left\| \theta^p \right\|_1 \right) \quad (4)$$

where $\theta^p$ denotes the sparse coefficient vector of $b^p$ with respect to $D_H$ and $\lambda$ is a regularization parameter. The learned dictionary $D_H$ contains $M$ atoms and thus is of size $n \times M$ (we have $n \leq M$).
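The preprocessing stage, splitting the input into a smoothed part and a high spatial frequency residual and then collecting patches from the residual, can be sketched as follows. A separable box (mean) filter is used here purely as a stand-in low-pass filter; it is not one of the three filters named above (bilateral, K-SVD, BM3D). The names `box_blur`, `split_frequencies`, and `extract_patches`, and the parameters `radius` and `stride`, are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Separable mean filter with edge padding (a simple low-pass stand-in)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def split_frequencies(img, radius=3):
    """Return (low, high) with low + high == img."""
    low = box_blur(img, radius)
    return low, img - low

def extract_patches(img, size, stride):
    """Stack vectorized size x size patches as columns of a (size*size, P) matrix."""
    H, W = img.shape
    cols = [img[r:r + size, c:c + size].ravel()
            for r in range(0, H - size + 1, stride)
            for c in range(0, W - size + 1, stride)]
    return np.stack(cols, axis=1)

# Toy usage on a random image; a real input would be a grayscale photo in [0, 1].
img = np.random.default_rng(0).random((32, 32))
I_L, I_H = split_frequencies(img, radius=3)
B = extract_patches(I_H, size=16, stride=8)   # patches used for dictionary learning
print(B.shape)
```

Because the high-frequency part is defined as a residual, adding the low-frequency part back after processing always restores the original intensity range, whichever low-pass filter is plugged in.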

B. Learning of Context-Aware Image Components

It is worth noting that the atoms of $D_H$ are not necessarily distinct from each other in such an over-complete dictionary for sparse image representation. Therefore, it is not easy to estimate the undesirable image patterns in $I_H$ using the observed dictionary atoms directly. Inspired by MCA, we propose to separate these atoms into disjoint groups (i.e., those within the same group are semantically similar to each other). It then becomes possible to determine the group (and its components) associated with the noise of interest, and image denoising can be achieved by performing image reconstruction without using those undesirable components. We approach this task as an unsupervised clustering problem. We group the aforementioned atoms $d_i$, $i = 1, \ldots, M$, into different clusters, so that the atoms within the same group share similar edge or texture information.

Since the number of clusters is not known, we apply affinity propagation [18] to this task, which maximizes the net similarity (NS) between atoms:

$$\mathrm{NS} = \sum_{i=1}^{M}\sum_{j=1}^{M} c_{ij}\, s(d_i, d_j) - \gamma \sum_{i=1}^{M} (1 - c_{ii}) \left( \sum_{j=1}^{M} c_{ij} \right) - \gamma \sum_{j=1}^{M} \left| \left( \sum_{i=1}^{M} c_{ij} \right) - 1 \right| \quad (5)$$

In (5), the function $s(d_i, d_j)$ measures the similarity between atoms $d_i$ and $d_j$. In order to group atoms that share similar edge or texture information, we define the similarity in terms of the distance between $\mathrm{HOG}(d_i)$ and $\mathrm{HOG}(d_j)$, where $\mathrm{HOG}(\cdot)$ extracts the Histogram of Oriented Gradients (HOG) features [19] describing the shape/texture information of an atom. The coefficient $c_{ij} = 1$ indicates that the atom $d_i$ is the exemplar (i.e., the cluster representative) of the atom $d_j$, and thus $d_j$ is categorized to cluster $i$ (and $c_{ii}$ equals 1 since $d_i$ is the exemplar of cluster $i$). The first term in (5) calculates the similarity between atoms within each cluster, while the second term penalizes the case when atoms are assigned to an empty cluster (i.e., $c_{ii} = 0$ but $\sum_{j} c_{ij} \geq 1$). The third term in (5), on the other hand, penalizes the condition when an atom belongs to more than one cluster, or when no cluster label is assigned. In practice, the parameter $\gamma$ is set sufficiently large to avoid the aforementioned problems.

We note that, in addition to HOG, other features describing shape or textural information can also be considered in our work. For example, the features of Histogram of Orientation of Streaks (HOS) have recently been utilized in [20] for representing the patterns of rain or snow in video frames. Since our work focuses on the self-learning and decomposition of an input image, we are particularly interested in identifying and removing a dominant undesirable noise pattern from the input. In our experiments, the use of HOG features is sufficient to achieve this goal.

After automatically grouping the extracted dictionary atoms into different clusters, we can derive the image component $I_H^k$ associated with each cluster. That is, the $p$-th patch of $I_H^k$ is computed as $b_k^p = D_H \theta_k^p$, where $\theta_k^p$ is a vector whose nonzero entries are only those associated with the atoms in the $k$-th cluster. Each image component can be considered as being associated with a particular type of context information, as depicted in Fig. 1. This completes the task of image decomposition. In the following sections, we discuss how we apply this proposed framework to two denoising tasks, in which rain streaks and Gaussian noise need to be automatically identified and removed.
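The role of the three terms in the net similarity can be checked numerically on a toy problem. The sketch below evaluates the objective for a given assignment; it does not run the message-passing updates of affinity propagation itself. Several choices are assumptions made for illustration only: the convention that `C[i, j] = 1` means atom `i` is the exemplar of atom `j`, a negative squared distance between scalar features as the similarity, the diagonal "preference" value of -1, and a large finite `gamma`.

```python
import numpy as np

def net_similarity(S, C, gamma=1e6):
    """Net similarity of an assignment C under similarities S (assumed convention:
    C[i, j] = 1 when atom i is the exemplar of atom j)."""
    S = np.asarray(S, dtype=float)
    C = np.asarray(C, dtype=float)
    fit = np.sum(C * S)                                   # similarity to chosen exemplars
    # Atoms assigned to i although i is not its own exemplar ("empty cluster").
    empty = np.sum((1.0 - np.diag(C)) * C.sum(axis=1))
    # Every atom must have exactly one exemplar.
    count = np.sum(np.abs(C.sum(axis=0) - 1.0))
    return float(fit - gamma * (empty + count))

def assignment_matrix(exemplar_of):
    """Build C from a list where exemplar_of[j] is the exemplar chosen by atom j."""
    M = len(exemplar_of)
    C = np.zeros((M, M))
    C[exemplar_of, np.arange(M)] = 1.0
    return C

# Four toy "atoms" described by one scalar feature each: two obvious groups.
features = np.array([0.0, 0.1, 5.0, 5.1])
S = -(features[:, None] - features[None, :]) ** 2
np.fill_diagonal(S, -1.0)        # assumed preference: cost of opening a cluster

two_clusters = assignment_matrix([0, 0, 2, 2])
one_cluster = assignment_matrix([0, 0, 0, 0])
invalid = assignment_matrix([0, 0, 3, 2])   # atoms 2 and 3 point at non-exemplars

for name, C in [("two", two_clusters), ("one", one_cluster), ("invalid", invalid)]:
    print(name, net_similarity(S, C))
```

The diagonal of `S` acts as the cost of opening a cluster, which is how the number of clusters emerges from the data rather than being fixed in advance.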


IV. Single-Image Rain Removal

A. Vision-Based Rain Removal

Since rain streaks present in images or videos cause complex visual effects in the spatial or temporal domains, which may significantly reduce user satisfaction or degrade the performance of surveillance-related applications, the removal of such patterns from images/videos has recently received much attention from researchers [9], [10], [20]-[22]. Prior vision-based approaches typically focus on detecting and removing rain streaks in a video, where both the spatial and temporal information in the video can be employed for rain removal. A pioneering work on rain removal from videos was proposed in [21], in which a correlation model capturing the dynamics of rain and a physics-based motion blur model characterizing the photometry of rain were developed. On the other hand, when only a single image is available, detecting and removing such noise patterns becomes a more challenging task.

In [9], we proposed a single-image rain removal framework, which approaches the rain removal task as an image decomposition problem based on MCA [1]. In that work, we first separated a rain image into low- and high-frequency parts via bilateral filtering [12]. The high-frequency part was then decomposed into the rain component and the non-rain component by learning the associated sparse representation based dictionaries for representing rain and non-rain components, respectively. To achieve this, K-means clustering (K = 2) was performed on the learned dictionary atoms to distinguish between rain and non-rain atoms. We note that, while satisfactory results were reported in [9], the use of K-means clustering for separating the rain streak patterns from non-rain ones, and the collection of training image data for dictionary learning, might limit the performance.

B. Our Proposed Method for Single Image Rain Removal

In this study, we formulate the problem of single-image rain removal as an image decomposition problem, as [9] did. As illustrated in Fig. 1 and discussed in Section III, we first decompose an input rain image $I$ into $I_L$ and $I_H$ using existing low-pass filtering techniques. Fig. 2 shows examples of producing $I_H$ using different filtering techniques. Once $I_H$ is obtained, we learn the dictionary $D_H$ for representation purposes, and the dictionary atoms are grouped into different clusters based on their HOG features via affinity propagation.

This clustering stage identifies dictionary atoms which are similar to each other in terms of their context information. Once this stage is complete, we obtain multiple subsets of dictionary atoms $D_H^k$, $k = 1, \ldots, K$, where $D_H = \bigcup_{k=1}^{K} D_H^k$. Recall that each $D_H^k$ contains dictionary atoms with similar HOG features. As illustrated in Fig. 1, we can reconstruct the image component $I_H^k$ using the corresponding dictionary set $D_H^k$.
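Reconstructing one component from a subset of atoms amounts to zeroing the sparse coefficients of every atom outside that subset before multiplying by the dictionary. A minimal sketch, assuming the codes are stored as a matrix with one column per patch and that `labels[i]` gives the cluster of atom `i` (the names `component_patches`, `Theta`, and `labels` are hypothetical):

```python
import numpy as np

def component_patches(D, Theta, labels, k):
    """Patches of the k-th component: use only the atoms assigned to cluster k."""
    mask = (np.asarray(labels) == k)
    Theta_k = Theta * mask[:, None]      # zero the rows of atoms outside cluster k
    return D @ Theta_k

# Toy usage: 6 atoms in 3 clusters, 5 patches of dimension 4.
rng = np.random.default_rng(0)
D = rng.standard_normal((4, 6))
Theta = rng.standard_normal((6, 5))
labels = np.array([0, 0, 1, 1, 2, 2])
parts = [component_patches(D, Theta, labels, k) for k in range(3)]
print(np.allclose(sum(parts), D @ Theta))
```

Since the clusters are disjoint and cover all atoms, the component patches sum back to the full sparse reconstruction, so dropping one cluster removes exactly its contribution.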

For the task of image rain removal, one of the image components from the observed groups would correspond to the high spatial frequency rain streak pattern. To identify such patterns, we consider the variance of gradients for the dictionary atoms associated with each group, i.e., we calculate the variance of the HOG features of the atoms $d_i$ in each $D_H^k$. If the noise patterns of interest are rain streaks, the edge directions of the rain streaks would be consistent throughout the patches in $I_H$ and thus dominate one of the resulting clusters. In this case, the variance of the atoms in that cluster would be the smallest among all clusters, and thus we can determine the cluster and its components corresponding to such noise patterns accordingly. Once the atoms/components associated with such noise are identified and removed, we can use the remaining atoms to reconstruct the high frequency part of the image. Adding the low spatial frequency part $I_L$ back to this recovered output produces the denoised version of $I$. Fig. 3 shows example rain removal results using different filtering techniques.

It is worth noting that, if the noise pattern or background texture of an input image is very similar to that of the rain streaks (even in a different orientation), we would also observe a low variance value for the associated HOG features. In this case, we do not expect our proposed method to differentiate between these two similar textural patterns. For this challenging case, one should consider temporal information for solving this particular rain removal task. Since this paper focuses on self-learning algorithms for single-image decomposition and its potential applications, tasks beyond single-image processing are outside the scope of this paper.
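The selection rule described above (pick the cluster whose atoms have the least varied orientation features, then reconstruct without it) can be sketched as follows. How the per-bin variances are aggregated into one number per cluster is not specified in the text; averaging over bins is an assumption made here, and the names `cluster_variances` and `reconstruct_without` are hypothetical. The feature matrix `F` is toy data standing in for HOG descriptors.

```python
import numpy as np

def cluster_variances(F, labels):
    """Map each cluster id to the mean per-bin variance of its atoms' features."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    return {int(k): float(F[labels == k].var(axis=0).mean())
            for k in np.unique(labels)}

def reconstruct_without(D, Theta, labels, drop):
    """Reconstruct patches using every atom except those in cluster `drop`."""
    keep = (np.asarray(labels) != drop)
    return D @ (Theta * keep[:, None])

# Toy features: cluster 0 has nearly identical histograms (streak-like),
# cluster 1 has diverse ones.
F = np.array([[0.9, 0.1, 0.0],
              [0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.1, 0.2, 0.7],
              [0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

variances = cluster_variances(F, labels)
rain_cluster = min(variances, key=variances.get)   # smallest variance wins
print(rain_cluster, variances)
```

The denoised image would then be the low-frequency part plus the image assembled from the patches returned by `reconstruct_without`.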


V. Single-Image Denoising

A. Image Denoising

The goal of image denoising is to remove unstructured or structured noise from an image acquired in the presence of additive noise [23]. Numerous approaches have been proposed to address this problem in the literature [7], [8], [13], [24]. Extended from image denoising, algorithms have also been proposed for addressing particular image processing tasks. An example is bilateral filtering [12], which performs image denoising via Gaussian blur while preserving edge information. Recently, sparse and redundant representations have been successfully applied to this task [6]-[8]. With a predetermined dictionary or one learned from the input image itself, one can effectively recover the denoised version. A representative sparse representation based denoising work is the K-SVD approach [6], [7]. Once the standard deviation of the Gaussian noise is given, very promising denoising results were reported in [6], [7]. Another popular method is BM3D (block-matching and 3D filtering) [13], which is also based on image sparse representation in the transformed domain. Similar to K-SVD, BM3D also requires prior knowledge of the standard deviation of the Gaussian noise.

B. Our Proposed Method for Removing Gaussian Noise

Besides rain removal, we further apply our proposed decomposition method to removing Gaussian noise from input images. It is worth noting that, unlike K-SVD or BM3D, we do not need the standard deviation of such noise patterns to be given in advance, which makes our method more practical for real-world applications. Similar to rain removal, we first decompose the input $I$ into $I_L$ and $I_H$. Fig. 4 shows examples of producing $I_H$ using different filtering techniques. Once $I_H$ is obtained, we learn the dictionary $D_H$ and extract the HOG features for each atom. We note that, while HOG is not expected to describe Gaussian noise, the presence of such noise results in HOG features in which no bin/attribute is distinguishable from the others. On the other hand, for noise-free dictionary atoms, we still observe dominant attributes in their HOG features. As a result, the use of HOG still allows us to perform clustering of dictionary atoms. In other words, even if the standard deviation of the Gaussian noise is not given, we are still able to identify the image component which corresponds to the presence of such noise using our decomposition and clustering framework. Once this noise component is identified and disregarded, we can reconstruct the image using the remaining HF components and $I_L$; example results are shown in Fig. 5.
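The contrast between a flat orientation histogram for a noise-like atom and a peaked one for an edge-like atom can be illustrated with a simplified descriptor. The sketch below computes a single-cell, magnitude-weighted histogram of unsigned gradient orientations; it is a stand-in for the full HOG descriptor of [19] (no cells, blocks, or block normalization), and the name `orientation_histogram` and the 9-bin setting are assumptions.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations, summing to 1."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# An edge-like atom versus a pure-noise atom (both 16 x 16).
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0                                      # a single vertical edge
noise = np.random.default_rng(0).standard_normal((16, 16))

h_edge = orientation_histogram(edge)
h_noise = orientation_histogram(noise)
print(h_edge.max(), h_noise.max())                     # peaked versus flat
```

A descriptor whose largest bin holds only a small share of the total mass is the kind of "indistinguishable bins" signature that separates noise atoms from structured ones during clustering.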


VI. Experimental Results

To evaluate the performance of our proposed method, we conduct experiments on two single-image denoising tasks: rain removal and denoising (with Gaussian noise). We set the patch size to 16 x 16 pixels and fix the number of dictionary atoms. As suggested in [17], the regularization parameter and the maximum sparsity value for the OMP algorithm are set to 0.15 and 10, respectively. For the LPF preprocessing techniques, we set the spatial-domain and intensity-domain standard deviations for bilateral filtering to 6 and 0.2, respectively. All images are of size 256 x 256 pixels in our experiments.

A. Performance Evaluation on Single Image Rain Removal

We collect several synthetic rain images from the Internet or from the photo-realistically rendered rain video frames provided in [21], and thus we have ground-truth images without rain streaks for PSNR calculation. To evaluate the performance of our proposed method for rain removal, we compare it with the bilateral filtering (denoted by Bilateral) [12], K-SVD [7], and BM3D [13] denoising algorithms. We set large noise standard deviation values for the K-SVD and BM3D algorithms (35 for BM3D). We note that, during the preprocessing stage of our framework, larger values allow us to remove high spatial frequency patterns, including possible rain streaks, from the low spatial frequency parts of the input image. We do not (and it is not possible to) fine-tune such parameters for removing the rain streaks only. In addition to the above methods, we consider two of our prior rain removal works: MCA-based rain removal (denoted by MCA-based) [9] and rain removal via common context pattern discovery (denoted by Context-based) [10]. These two methods can be considered bilateral-filtering based methods, since they require an LPF stage with a bilateral filter. Table II lists the PSNR values of the different bilateral-filtering based methods over three different rain images. From this table, we see that our proposed method achieved the highest or comparable PSNR values among the different approaches. To show that we are not limited to bilateral filtering as the LPF algorithm, we further apply K-SVD and BM3D in our preprocessing stage, and compare the rain removal results with those obtained by using these two denoising algorithms directly. As listed in Table III, our proposed method clearly improved the PSNR values over these two state-of-the-art denoising algorithms.
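For reference, the PSNR figure of merit used in these comparisons can be computed as below. This is the standard definition, 10 log10(peak^2 / MSE); the function name `psnr` and the default `peak=1.0` (images scaled to [0, 1]) are assumptions, and 8-bit images would use `peak=255`.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Toy usage: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
truth = np.zeros((8, 8))
print(psnr(truth, truth + 0.1))
```

Higher values indicate a reconstruction closer to the ground truth, which is why ground-truth images without rain streaks are needed for this evaluation.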


VII. Conclusion

In this paper, we presented a learning-based image decomposition framework for single-image denoising. The proposed framework first observes the dictionary atoms from the input image for image representation. Image components associated with different context information are automatically learned from the grouping of the derived dictionary atoms, which requires neither prior knowledge of the type of images nor the collection of training image data. To address the task of image denoising, our proposed method is able to identify image components which correspond to undesired noise patterns. Experiments on two types of single-image denoising tasks (with structured and unstructured noise) confirmed the effectiveness of our proposed method, which was shown to quantitatively and qualitatively outperform existing denoising approaches.
