Application of Moving Window Algorithm for the Receiver FPGA in Ultrasound Transient Elastography

DOI : 10.17577/IJERTV3IS110257


  Jean Rossario Raj, Research Scholar
  Centre for Bio-Medical Engineering, Indian Institute of Technology Delhi

  M. K Rahman, Assistant Professor
  Centre for Bio-Medical Engineering, Indian Institute of Technology Delhi

  Sneh Anand, Professor
  Centre for Bio-Medical Engineering, Indian Institute of Technology Delhi

      Abstract–In medical sonography, tumor imaging is a widely applied noninvasive technique. A tumor can be benign or malignant, and diagnosing its nature from its stiffness is a challenge. In Ultrasound Transient Elastography, the shear wave velocity is computed by observing the shear wave propagation in order to determine the stiffness of a tissue. This requires an ultra-fast scanner which works at frame rates above 1000 fps. The major difficulty lies in collecting the huge amount of scanner data and processing it in the processing engine, so the designs are complex and costly. The Moving Window Algorithm is an approach for extracting, from successive frames, only the information required for measuring the shear wave velocity. In this approach, one image frame is divided into multiple windows, and in a multi-frame period only one window is sent and the remaining windows are discarded. The window is moved from one multi-frame to the next. This data is superimposed on a complete frame of data, and the shear wave speed is calculated window by window. The algorithm reduces the amount of data sent to the processing engine, which enables the data from the scanner to be ported to laptops for processing through standard interfaces such as USB or Ethernet in DICOM format. This makes transient elastography technology viable for telemedicine applications.

      Keywords: Transient Elastography, Shear modulus, Shear wave velocity, Ultra-fast Scanner, Moving Window Algorithm, Ethernet, DICOM, Telemedicine


      1. INTRODUCTION

        ELASTICITY measurement is a method used for the computation of tissue stiffness. Elasticity is the natural characteristic of a solid substance to come back to its original contour once the stress caused by the external forces that distorted it is removed. The strain is the relative amount of deformation. The application of ultrasound elastography for clinical applications is given in [1].

        The shear modulus is computed from the shear wave velocity using the equation

        $E = \rho c^{2}$ (1)

        where $\rho$ is the density of the tissue and $c$ is the shear wave velocity. If the shear wave velocity is measured, the elasticity can be determined.
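As a rough numerical illustration of equation (1), the sketch below evaluates the modulus for a shear wave velocity of 5 m/s, a value quoted elsewhere in this paper. The density of 1000 kg/m³ is an assumed, water-like figure for soft tissue and does not come from the experimental setup; the function name is hypothetical.

```python
def shear_modulus_pa(density_kg_m3: float, shear_velocity_m_s: float) -> float:
    """Equation (1): E = rho * c^2, returned in pascals."""
    return density_kg_m3 * shear_velocity_m_s ** 2


# Assumed density (water-like soft tissue); 5 m/s is a velocity quoted in the text.
modulus = shear_modulus_pa(1000.0, 5.0)
print(f"E = {modulus / 1000:.1f} kPa")  # prints "E = 25.0 kPa"
```

Because the velocity enters as a square, a small error in the measured velocity roughly doubles as a relative error in the computed modulus.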

        In transient elastography, low-frequency shear waves are induced. The shear wave velocity is measured by cross-correlation of the shear wave propagation between adjacent frames. Such a method is able to diagnose tumors in a qualitative manner. However, in such a qualitative measurement, the precision of the instrument depends upon the distance between adjacent frames. Although low-frequency shear waves propagate at a low speed of a few m/s in soft tissues, the frame rate of the detection system must be higher than 1000 frames/s to be able to follow their propagation on a mm scale [2]. With a frame rate of 800 fps and a shear wave velocity of 5 m/s, a precision of the order of 1 mm can be achieved.

        The Frame Rate (FR) is calculated using the equation

        $FR = \dfrac{C}{2 \, D \, N}$ (2)

        where C is the ultrasound speed (1540 m/s for normal tissues), N is the number of scan lines per frame and D is the depth of penetration of the ultrasound waves. The maximum frame rate is achieved by making N = 1, i.e. all crystals are excited simultaneously.
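Equation (2) can be checked numerically with a small sketch. The function name and the example depth of 9.625 cm are illustrative assumptions (the depth is chosen only because it yields a round number); the speed of 1540 m/s comes from the text.

```python
def frame_rate_hz(sound_speed_m_s: float, depth_m: float, scan_lines: int) -> float:
    """Equation (2): FR = C / (2 * D * N)."""
    if depth_m <= 0 or scan_lines < 1:
        raise ValueError("depth must be positive and scan_lines at least 1")
    return sound_speed_m_s / (2.0 * depth_m * scan_lines)


C = 1540.0        # ultrasound speed in normal tissue, m/s (from the text)
DEPTH = 0.09625   # assumed penetration depth in metres, illustrative only

plane_wave = frame_rate_hz(C, DEPTH, 1)      # N = 1: all crystals fired together
line_by_line = frame_rate_hz(C, DEPTH, 64)   # N = 64: one scan line per firing
print(round(plane_wave), round(line_by_line))
```

Under these assumptions, N = 1 gives about 8000 frames/s, whereas scanning 64 lines one at a time gives about 125 frames/s, well below the 1000 frames/s needed to follow the shear wave.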

        In an ultrasound machine operating at an ultrasound frequency of 8 MHz, with a receiver sampling rate of, say, 24 MSPS (Mega Samples Per Second) and an ADC resolution of 8 bits, the per-channel data rate would be of the order of 192 Mbps. The machine has to transfer all the data from all the frames of all the channels. An ultrasound machine with a 64-crystal probe would require a data rate of around 12 Gbps from the ultrasound scanner to the processing engine. Moreover, receiving and processing such a huge amount of data is also a challenge for portable low-cost ultrasound machines.
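The figures in this paragraph follow directly from sampling rate times resolution times channel count. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def channel_rate_bps(sample_rate_sps: int, bits_per_sample: int) -> int:
    """Raw ADC output rate of one receive channel, in bits per second."""
    return sample_rate_sps * bits_per_sample


def aggregate_rate_bps(sample_rate_sps: int, bits_per_sample: int, channels: int) -> int:
    """Raw output rate of all receive channels together, in bits per second."""
    return channel_rate_bps(sample_rate_sps, bits_per_sample) * channels


per_channel = channel_rate_bps(24_000_000, 8)
total_64 = aggregate_rate_bps(24_000_000, 8, 64)
print(per_channel / 1e6, "Mbps per channel")    # 192.0 Mbps per channel
print(total_64 / 1e9, "Gbps for 64 channels")   # 12.288 Gbps for 64 channels
```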

        The important considerations for a portable transient elastography ultrasound machine are as follows. The data/image processing is done on a laptop. The machine is interfaced with the laptop using standard interfaces such as USB or Ethernet. The standard DICOM interface is used. The scanned data sent from the ultrasound scanner to the laptop should be of the order of 50 Mbps for reasonable processing and display on the laptop.

        Telemedicine applications require portable, low-cost machines which can be taken to remote village locations. The raw data, with the selected data size and format, or the retrieved video can be sent to a remote location for telemedicine applications. Peak throughput and TCP window sizes need to be determined for optimum use of the resources [3] in a telemedicine application.

        To resolve the above limitations, the output data rates are reduced. A Gigabit Ethernet interface makes it easy to transfer the images and view them immediately on a laptop computer that can be some distance away [4]. However, the maximum throughput of a Gigabit Ethernet interface is only of the order of 650 Mbps. Moreover, the Ethernet interface of the laptop is also required to process the complete data. The moving window algorithm presented in this paper reduces the data rates without affecting the transient elastography requirement of measuring inter-frame movements.


        A. Data Transfer Requirements

        In the experimental setup, 32 channels of transmitter and receiver are used. The data received from the ADCs are temporarily stored in the RAM available in the FPGA. For the proposed ultrasound scanner working at 8 MHz, with a sampling frequency of 24 MSPS, an ADC resolution of 8 bits per sample and a PRF of 8 kHz, 3000 bytes of data are to be written per channel in each pulse repetition period. For 32 channels this corresponds to a data rate of 6144 Mbps. In order to reduce the data rates, a data compression method based on peak detection over 8 consecutive samples is carried out first. This achieves a compression factor of 8, i.e. 375 bytes of data are stored per channel. The data compression reduces the per-channel data rate to 24 Mbps. Thus for 32 channels, 768 Mbps of data is to be transferred. With the additional Ethernet, IP and UDP overheads, the data rate would become around 1 Gbps. The Moving Window Algorithm proposed in this paper further reduces the data rate to 45 Mbps. Storing the compressed data requires 32 blocks of 375 bytes of RAM in the FPGA. With a bus width of 16 bits, one Block RAM can simultaneously store two channels. Thus 16 Block RAMs of 375 bytes length would be required for storing one frame of data.
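The compression step can be sketched in software as follows. This is an illustrative model rather than the FPGA logic: it assumes that "peak detection" means keeping the largest unsigned 8-bit sample of each group of 8, and the function name and synthetic input are hypothetical.

```python
def peak_compress(samples: bytes, group: int = 8) -> bytes:
    """Keep only the largest sample of each consecutive group of `group` samples."""
    if len(samples) % group != 0:
        raise ValueError("sample count must be a multiple of the group size")
    return bytes(max(samples[i:i + group]) for i in range(0, len(samples), group))


# One pulse repetition period of one channel: 24 MSPS / 8 kHz = 3000 one-byte samples.
SAMPLES_PER_LINE = 24_000_000 // 8_000
line = bytes((i * 7) % 256 for i in range(SAMPLES_PER_LINE))  # synthetic test data

compressed = peak_compress(line)
print(len(line), "->", len(compressed), "bytes per channel")

# Resulting rates at a PRF of 8 kHz.
per_channel_bps = len(compressed) * 8 * 8_000
print(per_channel_bps / 1e6, "Mbps per channel,",
      32 * per_channel_bps / 1e6, "Mbps for 32 channels")
```

The sketch reproduces the 375 bytes, 24 Mbps and 768 Mbps figures given above; the step up to about 1 Gbps depends on the header overhead per packet and is not modelled here.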

        Two separate RAM areas are proposed for the read and write operations, so that while one frame is written into one RAM area, the previous frame is read and transferred via Ethernet from the second RAM area. The Ethernet frame is created by the FPGA by multiplexing the data channels with the header bytes of the Ethernet frame and the IP/UDP packets. UDP source/destination port 104, the port associated with Digital Imaging and Communications in Medicine (DICOM), is used as the standard interface protocol.

        B. Choosing the FPGA

        For logic emulation systems, the Field Programmable Gate Array (FPGA) provides faster computation than software simulation. The logic designs are customized for high performance in different types of applications. In multimode systems, FPGAs yield significant hardware savings and provide generic hardware [7]. In order to meet such requirements, a Xilinx FPGA with the following specifications is chosen: 172 input/output (I/O) pins, 216K of Block RAM, an LVDS (Low Voltage Differential Signaling) interface used for interfacing with the high voltage pulser and the receiver chips, an I/O bus speed of 622 Mbps, and EEPROM/Master-Slave/JTAG programming headers.

        1. Moving Window Algorithm

          Even though the transient elastography ultrasound scanner has no limitation in sending data at these high rates, the available laptops do not have enough processing capacity for wire-rate reception of data at these rates. The main motivation for developing this technique is to reduce the data throughput so that the processing can be done on standard hardware such as laptops. With an ultrasound frequency of f, a sampling rate of S, b bits per sample and n channels, the output data rate is typically S*b*n. This data rate is of the order of 1 Gbps in the developed system.

          Hence a data/image frame is divided into multiple windows, say w, using the algorithm proposed in this paper as the moving window algorithm. In this case a w of 23 is taken, i.e. the data frame matrix is divided into 23 windows and only selected windows are transmitted. In practical implementations these values and requirements can vary, and the methodology is flexible enough to allow the right combination to be chosen. Thus the effective data rate is reduced to around 45 Mbps. The Moving Window Algorithm is implemented in the FPGA.

          Initially, for a few frames, each window is sent one after the other at the low frame rate so that the MATLAB video reconstruction algorithm can synchronize and reconstruct the full image. Subsequently, the FPGA repeats the first window m times and discards the remaining windows before moving to the second window, and so on. Repeating the same window m times at the requisite high frame rate ensures that the frame-to-frame displacement and the shear wave velocity can be measured without difficulty. The bottom line is that the techniques used should not hamper the frame-to-frame displacement and velocity measurements. In the experimental setup the value of m is taken as 256.

        2. Moving Window Algorithm Flow Chart

        The flow chart of the receive FPGA for the implementation of the Moving Window Algorithm is shown in Figure-1 below.

        [Flow chart: Start; read data from Microcontroller; write FPGA register; configure Gigabit MAC (read configuration and write to MAC); set Packet No n = 1, Sub Frame m = 1, Multi Frame MF = 1; send data to Ethernet by outputting the Ethernet header, IP header, UDP header and application header; process channels 4n to 4n+3 of Sub Frame m; output RAM data to MAC; increment Packet No; if Packet No > 9, increment MF and set Packet No = 1; if MF > 256, increment Sub Frame m and set MF = 1; if Sub Frame m > 16, set Sub Frame m = 1.]

        Figure-1: Receive FPGA software algorithm flow chart
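The window scheduling described in this section can be sketched in a few lines. This is a software model only, not the FPGA logic: the function name is hypothetical, the initial synchronisation pass is assumed to send each window exactly once, and the values w = 23 and m = 256 are taken from the text (the flow chart itself counts 16 sub frames).

```python
def window_for_frame(frame_index: int, windows: int = 23, repeats: int = 256,
                     sync_frames: int = 23) -> int:
    """Return the index of the window transmitted for a given acquired frame.

    The first `sync_frames` frames cycle through the windows once so that the
    receiver can build a full image; afterwards each window is held for
    `repeats` consecutive frames before the selection moves to the next one.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    if frame_index < sync_frames:
        return frame_index % windows
    return ((frame_index - sync_frames) // repeats) % windows


# Synchronisation pass, then window 0 held for 256 frames, then window 1, ...
print([window_for_frame(i) for i in (0, 1, 22, 23, 278, 279)])  # [0, 1, 22, 0, 0, 1]
```

Holding one window for m consecutive frames is what keeps the frame-to-frame spacing at the full acquisition rate inside that window, even though only a fraction of each frame leaves the FPGA.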

        In the experimental setup described in this paper, 368 bytes of data per channel are read out of the FPGA through the Ethernet port. The 368 bytes are divided into 23 windows, i.e. each window comprises 16 bytes per channel, or 16 x 32 bytes per ultrasound frame. During one write cycle to the FPGA, only one window of 16 x 32 bytes is transferred to the laptop and the remaining data is discarded. A multi frame consists of 256 such sub frames, in each of which only the selected window is read. Once one multi frame has been read, the algorithm moves on to read the next window, and so on. This algorithm reduces the effective output data rate by a factor of 23. It also ensures that the high frame rate is retained, so that the measurement of frame-to-frame displacement, and hence of the transient wave velocity, is not affected.
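The window extraction and the resulting reduction factor can be illustrated with a short sketch. The data layout (one byte string per channel) and the function name are assumptions made for illustration; the sizes are those quoted in the text.

```python
CHANNELS = 32
BYTES_PER_CHANNEL = 368
WINDOWS = 23
WINDOW_BYTES = BYTES_PER_CHANNEL // WINDOWS  # 16 bytes per channel per window


def extract_window(frame: list[bytes], window_id: int) -> list[bytes]:
    """Return the `window_id`-th 16-byte slice of every channel in one frame."""
    if not 0 <= window_id < WINDOWS:
        raise ValueError("window_id out of range")
    start = window_id * WINDOW_BYTES
    return [channel[start:start + WINDOW_BYTES] for channel in frame]


# Synthetic frame: each byte holds the index of the window it belongs to.
frame = [bytes(i // WINDOW_BYTES for i in range(BYTES_PER_CHANNEL))
         for _ in range(CHANNELS)]

window = extract_window(frame, 5)
sent = sum(len(part) for part in window)
total = CHANNELS * BYTES_PER_CHANNEL
print(sent, "of", total, "bytes sent, reduction factor", total // sent)
```

At a PRF of 8 kHz the 512-byte window corresponds to a payload of 512 x 8 x 8000 = 32.768 Mbps; header overheads are not included in this figure.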

        The video reconstruction algorithm in MATLAB reassembles the image as follows. Based on the initial consecutive windows, the first image is reconstructed. Subsequent frames are superimposed on the initial frame. To enable this arrangement of windows and frames, the window id and frame id are sent along with the packet in the UDP payload. The window matrix is superimposed on the complete frame matrix and the image is displayed. Since motion detection calculates the difference between frames, shear wave motion is detected in the moving sector.
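The superimposition step might look like the following sketch. The reconstruction in the paper is implemented in MATLAB; this Python version only illustrates the idea, and the frame representation (one `bytearray` per channel) and function name are assumptions.

```python
CHANNELS = 32
WINDOWS = 23
WINDOW_BYTES = 16


def superimpose(frame: list[bytearray], window_id: int, window: list[bytes]) -> None:
    """Overwrite one window of the full frame matrix in place."""
    start = window_id * WINDOW_BYTES
    for channel, part in zip(frame, window):
        channel[start:start + WINDOW_BYTES] = part


# Full frame matrix, initially all zeros: one row per channel.
full_frame = [bytearray(WINDOWS * WINDOW_BYTES) for _ in range(CHANNELS)]

# Initial pass: every window arrives once, tagged with its window id.
for wid in range(WINDOWS):
    superimpose(full_frame, wid, [bytes([wid + 1]) * WINDOW_BYTES] * CHANNELS)

# Later update: only window 3 is refreshed, the rest of the image is kept.
superimpose(full_frame, 3, [bytes([200]) * WINDOW_BYTES] * CHANNELS)
print(full_frame[0][3 * WINDOW_BYTES], full_frame[0][4 * WINDOW_BYTES])  # 200 5
```

Because each update touches only one window, differencing consecutive reconstructed frames shows motion only in the window that is currently being refreshed.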


        A. General Working Procedure

        The Tx FPGA generates the transmit pulses at 8 MHz and at a PRF of 8 kHz for all the channels. The 8-channel high voltage pulser provides the logic interfaces and amplifies the digital pulses generated by the FPGA to excite the piezoelectric crystals located in the ultrasound transducer probe. The 8-channel receiver has an LNA to amplify the low-level signals received from the piezoelectric crystals, a TGC stage for Time Gain Compensation, an AAF (Anti-Aliasing Filter) and an ADC which performs the analog-to-digital conversion. For TGC implementation in ultrasound, see [10].




        The Receive FPGA has sufficient I/O buses for interfacing with the ADCs, the Ethernet MAC and the Microcontroller, as shown in Figure-2 below. For Serial Peripheral Interface (SPI) programming of an FPGA, see [5].

        [Block schematic labels: Laptop MATLAB GUI, Serial Programming Interface, Serial Peripheral Interface, Receive FPGA, ADC data bus, Gigabit Ethernet interfacing.]

        Figure-2: Block schematic of the Receive section of Ultrasound Scanner

        B. Storing the receive data in the FPGA RAM using two data banks

        The LNA supplies two clocks, FCO and DCO, for synchronizing and reading the data by the FPGA LVDS receiver. The various clocks required for receiving and processing the data are generated in the FPGA. The internal RAM of the FPGA acts as temporary storage for the scanned data. The receive data is converted from a serial to a parallel stream and stored in the FPGA Block RAM. Two Block RAMs of the FPGA are used for writing alternate frames of data. Thus the FPGA requires two data banks, which are switched between the write and read operations. The interface logic is embedded in the Field Programmable Gate Array, and therefore the FPGA includes both user logic and interface logic [6].

        Likewise, all 32 channels of receive data are written into the data banks. 375 bytes per channel are stored in the FPGA data bank.

        The pipelined architecture of the Field Programmable Gate Array and the distributed Random Access Memory for high I/O resources of an image classifier implementing object classification stages in an object detection system are discussed in [8].

        1. Storing Overhead data in FPGA Registers

          The overhead data for the Ethernet frame, IP packet and UDP datagram are stored in the FPGA registers. Some of these values are fixed, whereas others, such as the source and destination IP addresses, are assigned by the Microcontroller. The Microcontroller in turn is programmed from the MATLAB GUI through the USB interface, as shown in Figure-2.

        2. FPGA Receive Packet Formation System Architecture

        The FPGA receive packet formation system architecture uses the Moving Window Algorithm. The data header generated by the FPGA contains the MAC data write start bytes, the Ethernet header data, the IP header data and the UDP header data. After sending the data headers, the data from one of the FPGA RAM data banks is read using the Moving Window Algorithm. On completion of the data read, the MAC data write stop bytes are sent, which enables the MAC to send the complete packet to the Ethernet interface. This is shown in Figure-3 below.

        [Logical block schematic labels: Block RAM-1 and Block RAM-2, each 16 bits wide with 1024 locations, of which 750 are used; 25 MHz active clock, reading n locations in n*0.04 µs; packet formation logic emitting, in order, MAC data write start, Ethernet header data, IP header data, UDP header data, data from RAM (Moving Window Algorithm) and MAC data write stop.]

        Figure-3: Receive FPGA logical block schematic

        Some of the header bytes, such as the checksum, are written into the Ethernet frame by the Gigabit Ethernet MAC chip. All other headers are written through the microcontroller into the FPGA registers. The Gigabit MAC chip also requires the start and stop bytes from the FPGA. A counter is used for sending the data sequentially in the order of start bits, Ethernet header, IP header, UDP header, application header, data from the Block RAM, Ethernet end of frame and stop bits. The data is transferred at very high speeds to the Gigabit Ethernet MAC chip.
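The ordering of UDP header, application header and window data can be illustrated with a small sketch. Several details are assumptions made for illustration only: the application header layout (a 16-bit window id followed by a 32-bit frame id), the function names, and the simplification of placing a whole 32-channel window in one datagram, whereas the flow chart processes four channels per packet. The Ethernet and IP headers and the MAC start/stop bytes are omitted.

```python
import struct

DICOM_PORT = 104          # UDP source/destination port named in the text
CHANNELS = 32
WINDOW_BYTES = 16


def build_udp_datagram(window_id: int, frame_id: int, window: list[bytes]) -> bytes:
    """Assemble UDP header + application header + window data.

    The application header layout is hypothetical. The checksum is left at
    zero because the text states that checksums are inserted by the Gigabit
    Ethernet MAC chip.
    """
    app_header = struct.pack("!HI", window_id, frame_id)
    payload = app_header + b"".join(window)
    udp_length = 8 + len(payload)              # 8-byte UDP header plus payload
    udp_header = struct.pack("!HHHH", DICOM_PORT, DICOM_PORT, udp_length, 0)
    return udp_header + payload


def parse_udp_datagram(datagram: bytes) -> tuple[int, int, bytes]:
    """Receiver side: recover window id, frame id and raw window data."""
    src, dst, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    if src != DICOM_PORT or dst != DICOM_PORT or length != len(datagram):
        raise ValueError("unexpected UDP header")
    window_id, frame_id = struct.unpack("!HI", datagram[8:14])
    return window_id, frame_id, datagram[14:]


window = [bytes([ch]) * WINDOW_BYTES for ch in range(CHANNELS)]
datagram = build_udp_datagram(window_id=5, frame_id=1234, window=window)
print(len(datagram), parse_udp_datagram(datagram)[:2])  # 526 (5, 1234)
```

Carrying the window id and frame id in front of the data is what lets the receiver place each window in the correct position of the full frame without relying on packet arrival order.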

        The Gigabit Ethernet controller supports full-duplex operation at a 1000 Mbps data rate and provides a high-performance non-PCI local bus, an EEPROM interface and a 16/32-bit SRAM-like host interface. It performs the Ethernet framing of the data and inserts the IP and UDP header checksums. The Physical Layer (PHY) device supports the 1000BASE-T standard in full-duplex mode and the RGMII interface operating at 125 MHz towards the Gigabit Ethernet controller. It carries out the physical-layer translations and conversions to Gigabit Ethernet speeds over the copper interface.

        The data processing and image processing is carried out in the MATLAB based GUI. The device configurations are controlled from the GUI through a microcontroller in the Ultrasound board.

        1. Simulation Results

          The simulated waveforms of the host clock, the RAM enable clocks and other signals can be seen in Figure-4 below. The various clocks generated by the FPGA, including the RAM read clocks for the different data banks, are visible in the figure.

          Figure-4: Simulation results during the design phase using FPGA

      4. RESULTS

        The image reconstruction using the moving window in a MATLAB GUI is shown in Figure-5. The image is progressively reconstructed in this method. The final image can be seen in Figure-6. Further, the displacement of the propagating shear wave is measured as a function of time and space in [10] using MATLAB based algorithms. Transient elastography measurements require cross-correlation between successive frames, which are acquired at frame rates of the order of 1000 fps. In this method, one window is continuously transmitted for, say, 256 frames.

        Figure-5: Moving window Algorithm display in MATLAB GUI with (a) 4 and (b) 10 windows respectively

        Hence the velocity of propagation of the shear waves can be measured within the window using the existing methods. This is repeated for successive windows and the resulting velocity graphs are combined to get the expected results.
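One simple existing approach is a time-of-flight estimate: cross-correlate the displacement traces recorded at two points a known distance apart and convert the best-matching lag into a speed. The sketch below is an illustration under stated assumptions, not the method used in the experimental setup: the traces are synthetic Gaussian pulses, the 5 mm spacing is hypothetical, and the 8000 frames/s rate is taken from the PRF quoted in the text.

```python
import math


def best_lag(reference: list[float], delayed: list[float], max_lag: int) -> int:
    """Return the lag (in frames) that maximises the cross-correlation."""
    def correlation(lag: int) -> float:
        return sum(reference[i] * delayed[i + lag]
                   for i in range(len(reference) - lag))
    return max(range(max_lag + 1), key=correlation)


def shear_speed_m_s(reference: list[float], delayed: list[float],
                    spacing_m: float, frame_rate_hz: float, max_lag: int = 50) -> float:
    """Speed = spacing / delay, with the delay estimated by cross-correlation."""
    lag = best_lag(reference, delayed, max_lag)
    if lag == 0:
        raise ValueError("no measurable delay between the two traces")
    return spacing_m * frame_rate_hz / lag


def pulse(centre: int, n: int = 128) -> list[float]:
    """Synthetic displacement trace: a Gaussian pulse centred on frame `centre`."""
    return [math.exp(-((i - centre) / 4.0) ** 2) for i in range(n)]


# The pulse reaches the second point 8 frames after the first.
speed = shear_speed_m_s(pulse(40), pulse(48), spacing_m=0.005, frame_rate_hz=8000.0)
print(f"estimated shear wave speed: {speed:.2f} m/s")
```

The lag is only resolved to whole frames, so the velocity resolution improves with frame rate, which is why each window has to be sampled at the full acquisition rate.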

        The arrival time envelope satisfies the Eikonal equation. The distance method is used to solve the inverse Eikonal equation given the arrival times of a propagating wave, to find the wave speed [9].

        Figure-6: Final acquired image after the moving window algorithm on a homogeneous medium used as phantom


      5. CONCLUSION

        For observing shear wave propagation and computing the shear modulus, an ultrafast scanner working at frame rates above 1000 fps is required. Such ultrasound machines have to collect a huge amount of scanner data and process it in the processing engine, which makes their design complex and costly. This paper introduces the Moving Window Algorithm, an approach that extracts from successive frames only the information required for measuring the shear wave velocity.

        In this approach, one image frame is divided into multiple windows, say 16, and in a multi-frame period only one window is sent and the remaining windows are discarded. The window is moved from one multi-frame to the next. This data is superimposed on a complete frame of data. The shear wave speed is calculated window by window. The algorithm reduces the amount of data sent to the processing engine, which enables the data from the scanner to be ported to laptops for processing through standard interfaces such as USB or Ethernet. This makes transient elastography technology viable for telemedicine applications.

      6. ACKNOWLEDGEMENTS

        Department of Science and Technology


      REFERENCES

    1. Elisa E. Konofagou, Jonathan Ophir, Thomas A. Krouskop and Brian S. Garra, "Elastography: from theory to clinical applications," Focused Ultrasound Laboratory, Department of Radiology, Brigham and Women's Hospital – Harvard Medical School, Boston, MA, 2003 Summer Bioengineering Conference, June 25-29, Sonesta Beach Resort in Key Biscayne, Florida.

    2. J. Bercoff,* S. Chaffai,* M. Tanter,* L. Sandrin,* S. Catheline,* M. Fink,* J. L. Gennisson* and M. Meunier, "In Vivo Breast Tumor Detection Using Transient Elastography," *Laboratoire Ondes et Acoustique, E.S.P.C.I., Université Paris VII, U.M.R. 7587 C.N.R.S 1503, Paris, France; and Institut Curie, Service de Radio diagnostique, Paris, France, Ultrasound in Med. & Biol., Vol. 29, No. 10, pp. 1387-1396, 2003.

    3. McDermott, W.R. (Maya Found., USA); Tri, J.L.; Mitchell, M.P.; Levens, S.P., "Optimization of wide-area ATM and local-area Ethernet/FDDI network configurations for high-speed telemedicine communications employing NASA's ACTS," IEEE Network, vol. 13, no. 4, doi: 10.1109/65.777439.

    4. Zentai, G.; Partain, L., "Development of a high resolution, portable x-ray imager for security applications," Imaging Systems and Techniques, 2007. IST '07. IEEE International Workshop on, pp. 1-5, 5 May 2007, doi: 10.1109/IST.2007.379590.

    5. Trupti D. Shingare, R. T. Patil, "SPI Implementation on FPGA," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-2, Issue-2, January 2013.

    6. Li Mingwei (Electron. Eng. Dept., Dalian Univ. of Technol., Dalian, China); Li Yanxia; Hu Yanguo, "A design of embedded Gigabit Ethernet interface," IEEE International Conference on Mechanic Automation and Control Engineering (MACE), 2010, doi: 10.1109/MACE.2010.5535339.

    7. Hauck, S., "The roles of FPGAs in reprogrammable systems," Proceedings of the IEEE, vol. 86, no. 4, pp. 615-638, Apr 1998, doi: 10.1109/5.663540.

    8. McCurry, P.; Morgan, F.; Kilmartin, L., "Xilinx FPGA implementation of an image classifier for object detection applications," Image Processing, 2001. Proceedings. 2001 International Conference on, vol. 3, pp. 346-349, 2001, doi: 10.1109/ICIP.2001.958122.

    9. Joyce McLaughlin and Daniel Renzi, "Shear wave speed recovery in transient Elastography and supersonic imaging using propagating fronts," Institute of Physics Publishing, published online 27 March 2006.

    10. Mingwang Tang; Fei Luo; Dong Liu, "Automatic Time Gain Compensation in Ultrasound Imaging System," Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on, pp. 1-4, 11-13 June 2009, doi: 10.1109/ICBBE.2009.5162432.
