 Open Access
 Authors : V. Balaji, R. Sakthi Kumar
 Paper ID : IJERTV3IS10994
 Volume & Issue : Volume 03, Issue 01 (January 2014)
 Published (First Online): 28-01-2014
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
FPGA Implementation of Image Processing Architecture for Various DIP Applications
V. Balaji, R. Sakthi Kumar
PG scholar
Abstract
Digital image processing is an ever-expanding and dynamic area with applications reaching into day-to-day life, such as medicine, security, space exploration, surveillance, identification and authentication, and automated industrial inspection. Such applications involve operations like image compression, image enhancement, object detection, and noise removal. Implementing image processing applications on a general-purpose computer is easy but not efficient, because of additional constraints on memory and peripheral devices; moreover, most general-purpose hardware is not suited to strict real-time constraints. This paper presents an FPGA implementation of median-filter image processing. The processor architecture combines a reconfigurable binary processing module, input and output image controller units, and peripheral circuits. The reconfigurable binary processing module performs the DCT and Sobel filter operations on a 256×256 image. The peripheral circuits control the overall image processing and the dynamic reconfiguration process. Simulation and experimental results demonstrate that the processor is suitable for real-time binary image processing applications.

Introduction
Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques treat the image as a two-dimensional signal and apply standard signal-processing techniques to it. Digital image processing is the use of computer algorithms to perform image processing on digital images, and it has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and signal distortion during processing. Since images are defined over two dimensions, digital image processing may be modeled in the form of multidimensional systems. General-purpose chips have the architecture of a digital processor array, in which each digital processor handles one pixel. When large images are processed, the chip size becomes extremely large. Further work is therefore needed to design a chip with high performance, small size, and a wide application range for real-time binary image processing.
This paper presents a binary image processor that consists of a reconfigurable binary processing module (including reconfigurable binary compute units and output control logic), input and output image controller units, and peripheral circuits. The reconfigurable binary compute units have a mixed-grained architecture, which offers flexibility, efficiency, and high speed. Processor performance is further enhanced by dynamic reconfiguration. The processor is implemented to perform real-time binary image processing applications. It can process pixel-level images and extract image features, and both basic mathematical median operations and more complicated algorithms can easily be implemented on it. The processor has the advantages of small size, high speed, simple structure, and a wide range of applications.
Canonical signed digit (CSD) representation is a simple and hardware-efficient way to implement multiplication by constants, in particular by the trigonometric (cosine) coefficients of the DCT. Instead of full multipliers, it uses only shift, add, and subtract operations. The discrete cosine transform (DCT) is the most widely used transform for compression. The DCT, first proposed by Ahmed et al. in 1974 [9], has gained importance in recent years, especially in image and video compression. This paper focuses on efficient hardware implementation of the DCT by decreasing the number of computations, enhancing the accuracy of reconstruction of the original data, and decreasing chip area, which in turn reduces power consumption. A fast DCT also improves the speed of standard compression schemes that are built on it, such as JPEG.
A programmable single-instruction multiple-data (SIMD) real-time vision chip was presented to achieve high-speed target tracking [10]. In [24], a programmable binary morphology coprocessor was introduced into the visual content analysis engine of a chip used for visual surveillance. A reconfigurable image processing accelerator incorporating eight macro processing elements was designed to support real-time change detection and background registration based on a video object segmentation algorithm. Recently, a vision chip with a massively parallel cellular array of processing elements was presented for image processing using asynchronous or synchronous processing techniques. Other general-purpose chips have the architecture of a digital processor array, in which each digital processor handles one pixel; when large images are processed, such chips become extremely large. Further studies are therefore needed to design a chip with high performance, small size, and a wide application range for real-time binary image processing and DCT applications.

Reconfigurable Image Processor
Although field-programmable gate arrays (FPGAs) were introduced a decade ago, they have only recently become very popular. This is not only because programmable logic saves development cost and time compared with increasingly complex ASIC designs, but also because the gate count per FPGA chip has reached numbers that allow the implementation of more complex applications [11]. Many present-day applications use a processor and other logic on two or more separate chips. However, with the anticipated ability to build chips with over ten million transistors, it will become possible to implement a processor within a sea of programmable logic, all on one chip.
Such a design approach would allow a great degree of programmability, both in hardware and in software: EDA tools could decide which parts of a source program are executed in software and which parts are accelerated in hardware. The hardware implementation may be needed for application interfacing reasons or may simply represent a coprocessor used to improve execution time. Programmable logic need not be used only for application speed-up; it can also be employed as intelligent glue logic for custom interfacing purposes, such as in embedded controller applications. Current single-chip embedded processors attempt to provide very flexible interfaces that can be used in a large number of applications.

Implementation of On-Chip Processor
Fig. 1. Reconfigurable Image Processor
However, such flexible interfaces are often less efficient than intended. Furthermore, it might be desirable to perform some bit-level data computations between the main processor and the actual I/O interface. This paper also investigates the requirements for providing a general-purpose field-configurable interface for embedded processor applications. The reconfigurable image processor is shown in Fig. 1. The processor architecture combines a reconfigurable binary processing module, input and output image controller units, peripheral circuits, an on-chip memory unit, and a Nios II processor. The reconfigurable binary processing module performs image compression and edge detection. The input image is passed to the preprocessing controller unit, and after preprocessing it is loaded into the on-chip memory. Initially, the analog image is converted to digital form and impulse noise is added using MATLAB. The image is then resized to 180 × 180 pixels, and a total of 3600 blocks are stored in a text file. The text file is read by ModelSim, which calculates the median values and removes the salt-and-pepper noise. The Nios II processor is used as the controller circuit. A gated clock disables idle blocks to reduce unnecessary transitions, and FIFO synchronization is used to synchronize all the units.
Discrete cosine transform: to compress the image
Sobel filter: to detect edges
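The median step used to remove salt-and-pepper noise can be illustrated in software. The following is only a minimal sketch, assuming a 3×3 window and edge replication at the borders; the window size, the border handling, and the function name `median_filter_3x3` are illustrative assumptions and are not taken from the hardware design.

```python
def median_filter_3x3(img):
    """Return a median-filtered copy of img (a list of rows of integers).

    Assumption: 3x3 window, border pixels replicated (clamped indices).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)  # clamp row index
                    cc = min(max(c + dc, 0), w - 1)  # clamp column index
                    window.append(img[rr][cc])
            window.sort()
            out[r][c] = window[4]  # middle of 9 sorted values
    return out


# Small usage example: a flat gray image with one "salt" and one "pepper" pixel.
flat = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in flat]
noisy[2][2] = 255  # salt
noisy[1][3] = 0    # pepper
restored = median_filter_3x3(noisy)
```

Because each 3×3 window here contains at most two outliers, the median of the nine samples is always the background value, so isolated impulses are removed while flat regions are left unchanged.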

Image Processing Applications
The reconfigurable binary compute units have a mixed-grained architecture, which offers high flexibility, efficiency, and performance. The performance of the processor is enhanced by the dynamic reconfiguration approach. The processor is implemented to perform real-time binary image processing. It can process pixel-level images and extract image features, such as boundary and motion images. Basic mathematical median operations and complicated algorithms can easily be implemented on it. The processor has the merits of high speed, simple structure, and a wide application range. Although field-programmable gate arrays (FPGAs) were introduced a decade ago, they have only recently become more popular. This is due not only to the fact that programmable logic saves development cost and time compared with increasingly complex ASIC designs, but also to the fact that the gate count per FPGA chip has reached numbers that allow the implementation of more complex applications.
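Boundary extraction with the Sobel filter, the edge detection operation named for this processor, can be sketched as follows. This is an illustrative software model only: the gradient magnitude is approximated by |Gx| + |Gy| (a common hardware-friendly choice), and the threshold value and the name `sobel_edges` are assumptions rather than details from the paper.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img, threshold):
    """Return a binary edge map (1 = edge) for img, a list of rows of integers.

    Assumptions: magnitude approximated as |gx| + |gy|; border pixels left at 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += GX[i][j] * p
                    gy += GY[i][j] * p
            if abs(gx) + abs(gy) >= threshold:
                edges[r][c] = 1
    return edges


# Usage example: a vertical step from black (0) to white (255).
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(step, threshold=255)
```

The output is a binary image that marks the two columns straddling the step, which matches the kind of boundary image a binary image processor operates on.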


Discrete Cosine Transform
Multimedia data processing, which touches almost every aspect of daily life (communication, broadcasting, data search, advertisement, video games, and so on), has become an integral part of our lifestyle. The most significant multimedia applications involve images or video, which require computationally intensive data processing. Moreover, as the use of mobile devices grows rapidly, there is an increasing demand for multimedia applications to run on these portable devices. Compression techniques are widely used to reduce the volume of multimedia data sent over wireless channels. The discrete cosine transform (DCT) is one of the major compression schemes owing to its near-optimal performance. Its energy compaction efficiency is also higher than that of most other transforms.

Low-Complexity 2D DCT Using 1D DCT Decomposed Matrix
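The 1D 8-point DCT can be expressed as follows:
\[ Z_n = \frac{1}{2} K_n \sum_{m=0}^{7} x_m \cos\frac{(2m+1)\,n\pi}{16}, \quad n = 0, 1, \ldots, 7 \qquad (2) \]
where $x_m$ denotes the input data, $Z_n$ denotes the transform output, and $K_n = \sqrt{1/2}$ for $n = 0$ and $K_n = 1$ otherwise.
By neglecting the scaling factor 1/2, the 1D 8-point DCT in (2) can be divided into even and odd parts: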
Fig.2 Decomposed DCT
In the 8-point DCT, 8 input values are multiplied by the 8 × 8 DCT matrix, so computing all 8 outputs requires 64 multipliers. In the decomposed DCT architecture, adding one preprocessing unit reduces multiplier usage by 50% (only 32 multipliers are used). The preprocessing unit uses only adders, so the overall hardware complexity is reduced.
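The saving from 64 to 32 multiplications can be checked numerically. The sketch below compares a direct 8-point DCT (without the 1/2 scale factor) against an even/odd decomposition in which a preprocessing stage of adders and subtractors feeds two 4 × 4 matrix products. The matrix layout is a standard form derived from the cosine symmetries, with c_k = cos(kπ/16); it is an assumption that the paper's decomposition uses the same arrangement, and the function names are illustrative.

```python
import math

# Cosine basis: a = c1, b = c2, ..., g = c7 with ck = cos(k*pi/16).
a, b, c, d, e, f, g = (math.cos(k * math.pi / 16) for k in range(1, 8))


def dct8_direct(x):
    """Direct 8-point DCT without the 1/2 scale: 8 x 8 = 64 multiplications."""
    out = []
    for n in range(8):
        kn = math.sqrt(0.5) if n == 0 else 1.0
        out.append(kn * sum(x[m] * math.cos((2 * m + 1) * n * math.pi / 16)
                            for m in range(8)))
    return out


def dct8_even_odd(x):
    """Even/odd decomposition: adders first, then 2 x (4 x 4) = 32 multiplications."""
    s = [x[i] + x[7 - i] for i in range(4)]  # preprocessing: sums
    t = [x[i] - x[7 - i] for i in range(4)]  # preprocessing: differences
    even = [[d, d, d, d], [b, f, -f, -b], [d, -d, -d, d], [f, -b, b, -f]]
    odd = [[a, c, e, g], [c, -g, -a, -e], [e, -a, g, c], [g, -e, c, -a]]
    z = [0.0] * 8
    for i in range(4):
        z[2 * i] = sum(even[i][j] * s[j] for j in range(4))     # Z0, Z2, Z4, Z6
        z[2 * i + 1] = sum(odd[i][j] * t[j] for j in range(4))  # Z1, Z3, Z5, Z7
    return z


sample = [52, 55, 61, 66, 70, 61, 64, 73]
direct = dct8_direct(sample)
decomposed = dct8_even_odd(sample)
```

Both functions return the same eight coefficients up to floating-point rounding, while the decomposed version performs half as many multiplications.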

Binary Conversion
Many techniques have been used to efficiently convert floating-point values into a binary representation for digital implementation; only then can the DCT be implemented in VLSI.
The two ways of converting floating-point values to binary are:
(1) The integral and fractional parts are converted separately. The fractional part is repeatedly multiplied by 2, and the bit that appears to the left of the decimal point at each step is taken as the next binary digit.
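The repeated-multiplication method for the fractional part can be sketched as follows. This is a minimal illustration that truncates after a fixed number of bits; the function name `fraction_to_bits` and the choice of truncation rather than rounding are assumptions.

```python
import math


def fraction_to_bits(value, nbits):
    """Convert a fraction 0 <= value < 1 into nbits binary digits (truncated).

    Each step doubles the value; the digit left of the point is the next bit.
    """
    bits = []
    for _ in range(nbits):
        value *= 2
        bit = int(value)  # 0 or 1: the digit that moved left of the point
        bits.append(bit)
        value -= bit      # keep only the remaining fraction
    return bits


exact = fraction_to_bits(0.8125, 4)                       # 0.8125 = 0.1101 in binary
cosine = fraction_to_bits(math.cos(3 * math.pi / 16), 6)  # cos(3*pi/16) ~ 0.8315
```

With six fractional bits, cos(3π/16) truncates to 0.110101 in binary, that is 53/64.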

DCT coefficients
The 1D DCT given by equation (2) can be split into two matrices, the odd and the even.
The odd 1D DCT can be expressed as
The even 1D DCT can be expressed as
where $c_k = \cos(k\pi/16)$, and $a = c_1$, $b = c_2$, $c = c_3$, $d = c_4$, $e = c_5$, $f = c_6$, $g = c_7$ are the cosine basis.
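For reference, a standard even/odd factorization that is consistent with equation (2) (with the factor 1/2 dropped) and with the coefficient names a to g can be written as below. It is a reconstruction from the cosine symmetries, not necessarily the authors' exact layout or equation numbering.

```latex
\[
\begin{bmatrix} Z_1 \\ Z_3 \\ Z_5 \\ Z_7 \end{bmatrix}
=
\begin{bmatrix}
a & c & e & g \\
c & -g & -a & -e \\
e & -a & g & c \\
g & -e & c & -a
\end{bmatrix}
\begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix},
\qquad
\begin{bmatrix} Z_0 \\ Z_2 \\ Z_4 \\ Z_6 \end{bmatrix}
=
\begin{bmatrix}
d & d & d & d \\
b & f & -f & -b \\
d & -d & -d & d \\
f & -b & b & -f
\end{bmatrix}
\begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}
\]
```

Each 4 × 4 product needs 16 multiplications, which gives the total of 32, and the sums and differences on the right are produced by adders only.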
From equations (3) and (4), it can be seen that the DCT operation involves multiplying a fixed input sequence by various cosine coefficients. A substructure sharing technique is therefore used to reduce the number of operators [6]. The cosine basis is quantized to 8 bits for energy efficiency. The cosine coefficients are represented as CSD numbers, which have fewer nonzero digits than the plain binary representation. The cosine basis is taken to four decimal places, and each value is represented as a 7-bit binary number. The number of bits affects the quality of the system. The values of the cosine basis are shown in Table 1. Multiplication, the more expensive operator, is transformed into simple shift and add operations by applying Horner's rule, which reduces power consumption. For example, consider the cosine coefficients c and g: c·X = (2^5 + 2^4 + 2^2 + 1)X = (2^4·3 + 5)X and g·X = (2^3 + 2^2)X = 2^2·3X, so the common term they share is 3X. The common terms among the cosine basis are 1X, 3X, 5X, and 1X, and they are shared to compute the partial outputs.
Table 1. Cosine Basis Set
These blocks are termed precomputing units, and one such unit is shown in the figure. The intermediate results from the precomputing blocks are added in the final stage, yielding the DCT coefficients. The term 3A is constructed by expressing it as 3A = 1A + 2A = {1A + (1A<<1)}. Similarly, 5A can be expressed as {1A + (1A<<2)}.
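The sharing of 3X and 5X between coefficients can be sketched with integer arithmetic. The example assumes the integer constants implied by the text, 53 for c (2^5 + 2^4 + 2^2 + 1) and 12 for g (2^3 + 2^2); reading these as cosine values scaled by 2^6 is an inference, and the function name `scaled_products` is hypothetical.

```python
def scaled_products(x):
    """Compute 53*x and 12*x using only shifts and adds with shared terms.

    Assumption: 53 and 12 are the quantized coefficients c and g from the text.
    """
    x3 = x + (x << 1)       # precomputed 3X = 1X + 2X
    x5 = x + (x << 2)       # precomputed 5X = 1X + 4X
    c_x = (x3 << 4) + x5    # (2^4 * 3 + 5) * X = 53X
    g_x = x3 << 2           # 2^2 * 3 * X = 12X
    return c_x, g_x


c_times_10, g_times_10 = scaled_products(10)
```

Only three additions are needed for both products, because 3X is computed once and reused.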

Multiplication is expensive in hardware.
Constant multiplications can be decomposed into shifts and additions:
13*X = (1101)2*X = X + (X<<2) + (X<<3)
Signed digits can reduce the number of additions/subtractions.
Canonical signed digits (CSD):
(57)10 = (0111001)2 = (1 0 0 -1 0 0 1)CSD = 2^6 - 2^3 + 2^0
CSD can reduce the number of additions/subtractions by up to 50%.
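A CSD conversion can be sketched in a few lines. This is an illustrative software routine for non-negative integers, not the hardware encoder used in the design; the names `to_csd` and `csd_multiply` are assumptions.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, most significant first.

    Digits are in {-1, 0, 1} and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)  # n % 4 == 1 gives +1, n % 4 == 3 gives -1
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits[::-1] or [0]


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits using shifts, adds, subtracts."""
    acc = 0
    for digit in digits:
        acc <<= 1
        if digit == 1:
            acc += x
        elif digit == -1:
            acc -= x
    return acc


csd_57 = to_csd(57)
product = csd_multiply(5, csd_57)
```

For 57 the binary form 0111001 has four nonzero digits, while the CSD form has three, so one fewer adder or subtractor is needed.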



Performance Results
The image is converted into pixel values using MATLAB, and the values are stored in a text file. The text file is read by ModelSim-Altera, and the corresponding 2D DCT coefficients are calculated. These values are then fed to the IDCT module, which returns the spatial data sequence. This data is written to a text file, from which the image can be reconstructed using MATLAB.
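The DCT and IDCT round trip described here can be modeled in software for a single 8 × 8 block. The sketch below is a floating-point reference only, assuming the row-column method (1D DCT on rows, then on columns) and the scaling of equation (2); it does not model the fixed-point hardware, the text-file interface, or the MATLAB scripts, and all function names are illustrative.

```python
import math


def dct8(x):
    """1D 8-point DCT with the 1/2 * K_n scaling (orthonormal)."""
    return [0.5 * (math.sqrt(0.5) if n == 0 else 1.0) *
            sum(x[m] * math.cos((2 * m + 1) * n * math.pi / 16) for m in range(8))
            for n in range(8)]


def idct8(z):
    """Inverse of dct8."""
    return [0.5 * sum((math.sqrt(0.5) if n == 0 else 1.0) * z[n] *
                      math.cos((2 * m + 1) * n * math.pi / 16) for n in range(8))
            for m in range(8)]


def transpose(block):
    return [list(row) for row in zip(*block)]


def dct2(block):
    """2D DCT by the row-column method: rows first, then columns."""
    rows = [dct8(row) for row in block]
    return transpose([dct8(col) for col in transpose(rows)])


def idct2(coeffs):
    """2D inverse DCT by the row-column method."""
    rows = [idct8(row) for row in coeffs]
    return transpose([idct8(col) for col in transpose(rows)])


# Usage example: transform an 8x8 block of pixel values and reconstruct it.
block = [[(17 * r + 31 * c) % 256 for c in range(8)] for r in range(8)]
reconstructed = idct2(dct2(block))
max_error = max(abs(reconstructed[r][c] - block[r][c])
                for r in range(8) for c in range(8))
```

In floating point the reconstruction error is at the level of rounding noise; in a fixed-point implementation it would instead depend on the coefficient word length.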
Fig 3. Simulated output
Table 2. Area comparison table
Fig 4. Input and reconstructed image
Conclusion
In this paper, a reconfigurable binary image processor was proposed to perform real-time binary image processing applications. The processor combines a reconfigurable binary processing module, input and output image controller units, and peripheral circuits. The reconfigurable binary processing module has a mixed-grained architecture, which provides high efficiency and increases processor performance. The basic DCT application and mathematical morphology operations can be easily implemented on its simple structure. With its simple structure, high speed, and wide range of applications, the processor is suitable for binary image processing, and it increases the efficiency of the system. The filter can remove noise even at higher noise densities while preserving edges and fine details, and its performance is better than that of other filters of this type. The developed filters were tested on 180 × 180 images with 8 bits per pixel at different noise levels, and the results were compared with a MATLAB implementation.
References

[1] Y. Liu and C. Pomalaza-Raez, "A low complexity algorithm for the on-chip moment computation of binary images," in Proc. Int. Conf. Mechatron. Autom., 2009, pp. 1871-1876.

[2] E. C. Pedrino, O. Morandin, Jr., and V. O. Roda, "Intelligent FPGA based system for shape recognition," in Proc. 7th Southern Conf. Programmable Logic, 2011, pp. 197-202.

[3] M. F. Talu and I. Turkoglu, "A novel object recognition method based on improved edge tracing for binary images," in Proc. Int. Conf. Appl. Inform. Commun. Technol., 2009, pp. 1-5.

[4] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," in Proc. Workshop Appl. Comput. Vision, 1998, pp. 8-14.

[5] J. Kim, J. Park, K. Lee et al., "A portable surveillance camera architecture using one-bit motion detection," IEEE Trans. Consumer Electron., vol. 53, no. 4, pp. 1254-1259, Nov. 2007.

[6] D. J. Dailey, F. W. Cathey, and S. Pumrin, "An algorithm to estimate mean traffic speed using uncalibrated cameras," IEEE Trans. Intell. Transportation Syst., vol. 1, no. 2, pp. 98-107, Jun. 2000.

[7] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536-544, Apr. 2000.

[8] E. C. Pedrino, J. H. Saito, and V. O. Roda, "Architecture for binary mathematical morphology reconfigurable by genetic programming," in Proc. 6th Southern Programmable Logic Conf., 2010, pp. 93-98.

[9] M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. Circuit Syst. Video Technol., vol. 15, no. 2, pp. 243-255, Feb. 2005.

[10] W. Miao, Q. Lin, W. Zhang et al., "A programmable SIMD vision chip for real-time vision applications," IEEE J. Solid-State Circuits, vol. 43, no. 6, pp. 1470-1479, Jun. 2008.

[11] B. Zhang, K. Mei, and N. Zheng, "Reconfigurable processor for binary image processing," IEEE Trans. Circuits Syst. Video Technol., May 2013.

[12] K. Fujii, M. Nakanishi, S. Shigematsu et al., "A 500-dpi cellular-logic processing array for fingerprint-image enhancement and verification," in Proc. IEEE Custom Integr. Circuits Conf., May 2002, pp. 261-264.

[13] H. J. Park, K. B. Kim, J. H. Kim et al., "A novel motion detection pointing device using a binary CMOS image sensor," in Proc. IEEE Int. Symp. Circuits Syst., May 2007, pp. 837-840.

[14] M. Laiho, J. Poikonen, and A. Paasio, "Space dependent binary image processing within a 64×64 mixed-mode array processor," in Proc. Eur. Conf. Circuit Theory Design, 2009, pp. 189-192.

[15] E. N. Malamas, A. G. Malamos, and T. A. Varvarigou, "Fast implementation of binary morphological operations on hardware-efficient systolic architectures," J. VLSI Signal Process., vol. 25, no. 1, pp. 79-93, 2000.

[16] J. Velten and A. Kummert, "Implementation of a high-performance hardware architecture for binary morphological image processing operations," in Proc. 47th IEEE Int. Midwest Symp. Circuits Syst., Jul. 2004, pp. 241-244.

[17] R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez et al., "A 0.8-µm CMOS 2D programmable mixed-signal focal-plane array processor with on-chip binary imaging and instructions storage," IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1013-1026, Jul. 1997.
BIOGRAPHIES
V. Balaji received the B.E. degree in Electronics and Communication Engineering from Sri Ramakrishna Engineering College, Coimbatore, in 2011. He is currently pursuing the M.E. degree in VLSI Design at Kalaignar Karunanidhi Institute of Technology, Coimbatore. His areas of interest are image processing and very-large-scale integration (VLSI) architecture design for embedded vision systems.
R. Sakthikumar received the B.E. degree in Electronics and Communication Engineering from Sri Subramanya College of Engineering and Technology, Palani, in 2011. He is currently pursuing the M.E. degree in VLSI Design at Sengunthar Engineering College, Tiruchengode. His area of interest is image processing.