Autonomous Vehicle System Using Lidar

DOI : 10.17577/IJERTCONV11IS03033


  • Open Access
  • Authors : Miss.D.Ragavi, D.Emilin Pearl Sharal, S.Gayathri, N.Sharmila, G.Manisha
  • Paper ID : IJERTCONV11IS03033
  • Volume & Issue : Volume 11, Issue 03
  • Published (First Online): 22-06-2023
  • ISSN (Online) : 2278-0181
  • Publisher Name : IJERT
  • License: This work is licensed under a Creative Commons Attribution 4.0 International License


Miss.D.Ragavi Assistant Professor

D.Emilin Pearl Sharal Final ECE

S.Gayathri Final ECE

N.Sharmila Final ECE

G.Manisha Final ECE

Abstract – Today, the use of self-driving vehicles is increasing rapidly, and they are beginning to appear on the roads of emerging nations. For self-driving cars to work, the capacity to sense their surroundings is one of the fundamental abilities that must be developed. Raw data is gathered for this purpose by integrating devices such as cameras, LiDAR, or radar. This study aims to assess a hybrid approach of cameras and LiDARs (4 and 64 beams) for 3D object recognition in foggy weather. Following the fusion of the data from the two input sensors, a study of the individual contributions of each sensor is carried out. Using the well-known KITTI dataset and various fog intensities, we compute average accuracy in our study. The major findings are as follows. High performance (90.15%, 89.26%) is achieved with stereo cameras and 4- or 64-beam LiDAR. In hazy weather, the performance of the 4-beam LiDAR drops off significantly (13.43%). Accuracy is still very good (89.36%) when only using a camera-based model. In conclusion, stereo cameras on their own can accurately identify 3D objects in hazy conditions, and when combined with LiDAR sensors, their performance slightly increases.

Keywords : LiDAR, KITTI, 3D object detection, Hazy weather.


    In wealthy nations today, self-driving automobiles are gaining popularity. These vehicles must make precise predictions in order to recognise obstacles, traffic signs, and lanes. Several self-driving car startups have formed as a result of the importance and allure of autonomous cars, and nearly every traditional carmaker has made some investment in this market. Self-driving automobiles are a very popular study subject since a lot of money has been spent on developing the hardware and software for them. Self-driving cars include sensors like cameras, radars, and LiDARs to better understand their surroundings. These sensors are used to gather environmental data. The information required for vehicle navigation is then extracted from the data by feeding it into predictive models for tasks including segmentation, object detection, and semantic classification. This allows the car to recognise lanes, obstacles, and traffic signals and make decisions about steering wheel angle and wheel speed. Because high safety requirements must be upheld, these models must operate with extremely high precision. As a result, the performance of self-driving cars depends significantly on the accuracy of prediction models.

    Earlier research on environment perception produced camera-based or LiDAR-based detection techniques. The data was gathered in optimal conditions (daylight, clear skies). These experiments showed that models based on LiDAR outperformed those based on cameras. This is because LiDAR, an active sensor, can estimate a self-driving car's distance from an object extremely precisely, but a camera cannot, especially at long range. Meanwhile, approaches that combine camera and LiDAR have not yet delivered the expected gains from the advantages of both technologies. Because 3D object detection technologies already offer solid results on data collected in normal weather, further research must be done to determine how these models can be employed in harsh weather conditions like rain, snow, or fog. We are also unsure which model (camera-based, LiDAR-based, or fusion-based) performs best in these conditions. Many studies have shown that data gathered by cameras and LiDAR is severely corrupted in severe weather. There is, however, little research that demonstrates how this corrupted data affects the effectiveness of the predictive model for 3D object recognition. The primary goal of this study is to evaluate how the two sensors, a camera and a LiDAR, affect 3D object recognition in foggy weather. This study offers three contributions:

    • The SLS-Fusion neural network is divided and modified to take the stereo camera or the LiDAR into consideration individually. This results in two separate sub-networks.

    • We examine how each sub-network performs in foggy weather in comparison to SLS-Fusion.

    • Using the KITTI dataset and six levels of fog, we examine how well the 3D object detection models (for stereo cameras and LiDAR) perform.


    The part on lidar point clouds begins with a study of the KITTI dataset, which has developed into the de facto industry benchmark for self-driving perception tasks.

    The lidar coordinate frame is then defined; it is used to represent both the coordinates of returned lidar points and the predicted oriented 3D boxes at the output of detection networks. The data format of the returned lidar points is then described. We begin the section on the background of 3D object detection by explicitly defining the task of 3D object detection and discussing the 6 degrees of freedom used to represent each predicted oriented 3D box. Then, we present focal loss and hard negative mining as the standard approaches to handle the background class imbalance in the classification loss of 3D object detection networks, as well as smooth L1 as a regression loss that is robust against outliers. We also contrast the regression targets of 3D object detection networks with and without anchor boxes. Additionally, data augmentation is discussed as a crucial component of training pipelines for 3D object detection networks to ensure better generalization. Two search-based methods that use reinforcement learning (RL) and evolutionary algorithms to find the best data augmentation policies are described in more detail.
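
    As an illustration of the two loss functions named above, the following is a minimal sketch of smooth L1 and the binary focal loss for a single prediction. It uses plain Python to stay dependency-free; the parameter names and the defaults (beta = 1, alpha = 0.25, gamma = 2) are commonly used values assumed here, not values taken from this paper.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one regression target.

    Quadratic for small errors, linear for large ones, so outliers
    contribute less than they would under a squared loss.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p is the predicted foreground probability, y is the label (0 or 1).
    The (1 - p_t) ** gamma factor down-weights easy examples, which are
    mostly background in dense detectors.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background cell contributes far less than a hard one.
easy_background = focal_loss(0.01, 0)   # confidently correct background
hard_background = focal_loss(0.90, 0)   # background mistaken for an object
```

    With gamma set to 0 the focal term disappears and the expression reduces to an alpha-weighted cross-entropy, which is one way to sanity-check an implementation.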

    The section on 3D object detection neural networks first discusses why lidar points are difficult for neural networks to process: point clouds are unordered collections of points, so a network must be invariant to permutations of its input. Then, we categorize 3D object detection networks into two groups: networks with input-wise permutation invariance and networks that use an ordered grid representation of the point cloud. For the first group, we describe PointNet, a permutation-invariant architecture for lidar-based classification and segmentation. In addition, we examine Frustum PointNets, which extends PointNet to 3D object detection tasks by using image-based 2D object detection networks.

    Later, we shift gears and concentrate on 3D object detection networks using ordered grid representations of point clouds. As these networks fall within the category of CNN object detection networks, we first discuss the two primary forms of CNN object detectors: single-shot and region-proposal-based. The fundamental components of CNN object detection networks, including backbone networks and feature pyramid networks like FPN and PANet, are also described. Non-maximum suppression is also presented as a post-processing step that filters the predictions generated by dense object detection networks. The advantages and disadvantages of the range image representation, which views point clouds as 360-degree images of the 3D surroundings recorded by lidar sensors, are then discussed. 3D voxelization representations are presented as quantizations of the 3D cuboid subspace around the lidar sensor, and their inefficiency in terms of memory and computation is explored. Moreover, 3D convolutions are presented as the natural convolutional layers for processing 3D voxelization tensors.
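
    To make the non-maximum suppression step mentioned above concrete, here is a minimal sketch of greedy NMS on axis-aligned 2D boxes. The box format (x1, y1, x2, y2), the function names, and the 0.5 IoU threshold are assumptions for illustration; detectors for oriented 3D boxes need a rotated-box IoU instead.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections of one object plus a separate one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the lower-scoring duplicate (index 1) is dropped
```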

    Last but not least, we concentrate on 2d voxelization representation as a substitute for 3d voxelization representation, which exhibits better computational efficiency by using 2d convolutional layers rather than 3d convolutional layers. Moreover, we contrast feature encoders created by hand with those learnt automatically as a component of 3d object identification networks that use 2d voxelization representation as their inputs.

    A. KITTI dataset:

    The KITTI dataset has emerged as the de facto benchmark dataset for image-based monocular and stereo depth estimation, optical flow, semantic and instance segmentation, and 2D and 3D object detection for self-driving vehicles. This dataset was produced using a multi-sensor recording platform (a vehicle with two forward-facing cameras and a lidar sensor, shown in Fig. 1) by collecting data on the streets of Karlsruhe, Germany.

    Figure 1. KITTI multi-modal sensor suite

    Each training sample in the KITTI dataset is a labeled 3D scene captured as two camera images from the two forward-facing cameras and a point cloud from the Velodyne HDL-64E lidar sensor mounted on the car's roof. The KITTI dataset contains 7481 training scenes and 7518 test scenes. The lidar sensor's 360-degree scan takes 100 milliseconds, so this multi-modal sensor suite samples the surrounding 3D environment at 10 Hz.
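
    As a rough illustration of how such a point cloud can be loaded, the sketch below reads a KITTI Velodyne scan, which is stored as a flat binary file of 32-bit floats in the order x, y, z, reflectance. The function name is hypothetical and the file path is supplied by the caller; the last two lines reproduce the 10 Hz figure from the 100 ms scan duration.

```python
import struct

def read_kitti_velodyne(path):
    """Read a KITTI Velodyne scan into a list of (x, y, z, reflectance).

    Each point is four consecutive little-endian float32 values (16 bytes).
    """
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % 16  # ignore a trailing partial record, if any
    return list(struct.iter_unpack("<4f", data[:usable]))

SCAN_PERIOD_S = 0.1                  # one 360-degree sweep takes 100 ms
scan_rate_hz = 1.0 / SCAN_PERIOD_S   # sampling frequency of the sensor suite
```

    In practice the same file is usually loaded into a NumPy array of shape (N, 4); the standard-library version is used here only to keep the example self-contained.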

    B. Need for 3D object detection:

    Modern computer vision systems can accurately and quickly identify objects in 2D data like photos and video (a sequence of image frames). Nevertheless, using the camera sensor for tasks like localization, determining the distance between objects, and computing depth information is computationally expensive and may not be efficient. An example point cloud is shown in Fig. 2.

    Figure 2. KITTI point cloud viewer

    LiDAR is one of the best-known sensors for providing 3D information about an object in the form of a point cloud, which can be used to locate objects and characterize their shapes. Modern 3D object detectors including VeloFCN, 3DOP, 3D YOLO, PointNet, PointNet++, and a number of others have recently been proposed for 3D object detection.


    A. Light Detection and Ranging:

    Light Detection and Ranging (LiDAR) is a remote sensing method with the potential to aid in mapping and monitoring procedures. It employs light in the form of a pulsed laser to measure ranges to the Earth. Compared to monocular cameras, LiDAR offers various benefits for AV safety, including mapping the surrounding area. When it is raining or dark, monocular cameras have a hard time mapping the area. A LiDAR sensor measures the distance to an object by scanning with pulsed laser light and timing how long each pulse takes to return to the sensor.
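
    The time-of-flight principle described above reduces to a one-line formula: the pulse travels to the object and back, so the range is half the round-trip time multiplied by the speed of light. The sketch below shows this arithmetic; the function names are illustrative only.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum

def range_from_time_of_flight(round_trip_s):
    """Distance to the target given the pulse's round-trip time in seconds."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

def time_of_flight_for_range(range_m):
    """Round-trip time in seconds for a target at the given range in metres."""
    return 2.0 * range_m / SPEED_OF_LIGHT_M_S

# A pulse that returns after one microsecond came from roughly 150 m away.
distance_m = range_from_time_of_flight(1e-6)
```

    The same relation shows why timing precision matters: an object 10 m away returns the pulse after only about 67 nanoseconds.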

    To build a representation of everything close to the AV using LiDAR, the AV's environment must first be mapped. The map shows a density measure created by computing the 3D and 2D histograms of the point cloud for the 3D and 2D models, respectively. Using this information, the system can track and predict an object's motion. The environment map may also label each object using bounding boxes or different colors.

    While scanning with LiDAR, two views are available: the bird's-eye view and the front view, as seen in Fig. 3. The bird's-eye view (a) is an elevated perspective from above, and it may be encoded in three different ways: height, intensity, and density. The front view (b) complements the bird's-eye view, since it shows the scene ahead from the standpoint of the human eye. The RGB view (c) replicates the details and colors seen by the human eye.

    Figure 3 LiDAR view vs Human view
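
    The bird's-eye view encoding described above (height, intensity, and density per grid cell) can be sketched as a 2D histogram over the x, y plane. The cell size and the choice to store the maximum height and the intensity of the highest point in each cell are assumptions made for this illustration; the paper does not specify them.

```python
import math

def bev_maps(points, cell_size=0.5):
    """Project (x, y, z, intensity) points onto a top-down grid.

    Returns a dict mapping a cell index (ix, iy) to:
      height    - highest z seen in the cell (assumed encoding)
      intensity - intensity of that highest point (assumed encoding)
      density   - number of points that fell into the cell
    """
    cells = {}
    for x, y, z, intensity in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cell = cells.get(key)
        if cell is None:
            cells[key] = {"height": z, "intensity": intensity, "density": 1}
        else:
            cell["density"] += 1
            if z > cell["height"]:
                cell["height"] = z
                cell["intensity"] = intensity
    return cells

# Two points share a cell; the third lands two cells further along x.
cloud = [(0.1, 0.1, 0.2, 0.5), (0.2, 0.3, 1.0, 0.9), (1.2, 0.1, 0.4, 0.3)]
grid = bev_maps(cloud)
```

    Each of the three per-cell values can be rendered as one image channel, which is what allows 2D convolutional detectors to consume the point cloud.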

    B. You Only Look Once Algorithm (YOLO):

    YOLO uses a CNN to reduce the input to a 7x7x1024 feature map, and then performs a linear regression using two fully connected layers to produce bounding box predictions of size 7x7x2.

    It is hence much quicker than R-CNN. YOLO divides the picture into regions and predicts bounding boxes and probabilities for each region. Because the whole image is processed at test time, predictions are informed by the global context, and the bounding boxes are weighted by the predicted probabilities.

    Figure 4 Architecture of YOLOv2

    1. Object Detection :

      Because pedestrians might move in a variety of surprising ways, including jumping, sprinting, and crouching, detecting them can be challenging. Based on the characteristics of the objects, the results give a fair description of their texture and contours. Another approach uses 3D LiDAR for autonomous vehicles to improve the recognition of pedestrians who are partially occluded by other objects. Despite the high cost of the research, the findings were encouraging since they might enhance the identification of people who are partially occluded.

    2. Noise Removal :

      After the noise is removed, processing the cleaned data is more effective since stray points in the cloud can no longer interfere with detection. The data's size is also reduced.

    3. Down Sample :

      The point cloud in the downsampled data is smaller than in the original data. The data is simplified so that it can be viewed more clearly and compactly.

    4. Transform :

      The input point cloud is transformed by rotation (a rigid transformation) and shearing (a non-rigid transformation). The rotation is defined by an affine transformation object that specifies a 45-degree rotation about the z-axis. The shearing is defined by an affine transformation object that specifies a shear along the x-axis.
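
      The downsampling and transformation steps above can be sketched in a few lines. This is only an illustration under assumptions: voxel-grid averaging is one common downsampling method and may not be the one used by the authors, and the shear is taken to shift x in proportion to y. All function names are hypothetical.

```python
import math

def voxel_downsample(points, voxel_size):
    """Replace all (x, y, z) points that fall in the same voxel by their centroid."""
    sums = {}
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        sx, sy, sz, n = sums.get(key, (0.0, 0.0, 0.0, 0))
        sums[key] = (sx + x, sy + y, sz + z, n + 1)
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]

def rotate_z(points, angle_deg):
    """Rigid transformation: rotate points about the z-axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def shear_x(points, factor):
    """Non-rigid transformation: shift x in proportion to y (assumed shear form)."""
    return [(x + factor * y, y, z) for x, y, z in points]

cloud = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 0.0, 0.0)]
smaller = voxel_downsample(cloud, 0.5)   # the first two points merge into one
rotated = rotate_z(cloud, 45.0)          # distances between points are preserved
sheared = shear_x(cloud, 0.5)            # distances between points may change
```

      The rotation preserves every pairwise distance, which is what makes it rigid, whereas the shear distorts shapes and is therefore non-rigid.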


    To achieve precise localization, a high-precision map must be constructed from a variety of laser features. We want to shorten the time it takes to create the map by extracting reliable, distinctive features that describe the environment as precisely as possible. The proposed technique specifically makes use of curb, reflection intensity, and height features. The curb feature is used to compute the heading angle and remove the lateral localization error. The reflection intensity map captures the road surface for precise matching. The height map captures the 3D environment on each side of the road.

    1. Curb map :

      The curb points were located using the Velodyne HDL-32E laser sensor. Curb points are typically distributed along the road edges on both sides of the road, within a particular height range. The laser beams become sparser as the detection distance increases, leading to measurement inaccuracies. To increase the detection accuracy, we only use 20 of the 32 LiDAR beams, with a coverage range of around 30 m. To avoid falsely detecting obstacles as curbs, we only use laser points within a height of 10-15 cm from the ground plane. As the curb point is a particular kind of edge point, there are clear height changes around it, as seen in Fig. 5. As a result, a height gradient filter is applied to the measurements Si of the i-th laser scan to identify the curb points.

      We scan each laser beam using a sliding window approach to make curb detection more robust.

      Each sliding window Wi,j covers one search step and contains the new points from the laser scan measurement Si. To extract the curb points within each sliding window, we apply the following criteria:

      1. Height features: A sliding window proceeds to the next steps only if it satisfies a requirement on (zmax, zmin), the highest and lowest height values of the data points inside the window Wi,j.

      2. Height gradient features: The height gradient value gi,k of each point in window Wi,j is determined from zi,k, the height of point pi,k, and dist(pi,k, pi,k+1), the Euclidean distance between pi,k and pi,k+1 in the x, y coordinate system, where k is the index of the laser scan measurement point. The point pi,k with the highest gradient value is chosen.

      3. Road surface smoothness: The height variations before and after point pi,k, denoted vbi,k and vfi,k respectively, must also satisfy a requirement.

      Figure 5. Curb description. (a) LiDAR point cloud containing the curb points; (b) zoomed-in view of the red portion of (a).

      Using these three criteria, we can obtain the curb points for a single frame in the laser coordinate system. As the vehicle moves, curb points from successive frames are identified and converted into the global GPS coordinate system to create a global curb map. The curb map after manual correction of nearby points is displayed in Fig. 6.

      Figure 6. Sample of curb map
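
      The first two curb criteria (height range and height gradient within a sliding window) can be sketched as follows. The exact formulas and thresholds are not available in this text, so the gradient is assumed to be the absolute height difference between consecutive points divided by their distance in the x, y plane, and the height-range bounds are hypothetical values. The road-smoothness criterion is omitted.

```python
import math

def height_gradients(window):
    """Gradient between consecutive (x, y, z) points of one sliding window.

    ASSUMPTION: g_k = |z_{k+1} - z_k| / planar distance between the points.
    """
    grads = []
    for (x0, y0, z0), (x1, y1, z1) in zip(window, window[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        grads.append(abs(z1 - z0) / d if d > 0 else 0.0)
    return grads

def curb_candidate(window, min_height_range=0.05, max_height_range=0.30):
    """Index of the steepest point in the window, or None if the window's
    height range (z_max - z_min) is outside the hypothetical bounds."""
    heights = [p[2] for p in window]
    if not (min_height_range <= max(heights) - min(heights) <= max_height_range):
        return None
    grads = height_gradients(window)
    return max(range(len(grads)), key=grads.__getitem__)

# Points along one beam: flat road, then a 12 cm step up onto the curb.
road_then_curb = [(0.0, 0.0, 0.00), (0.1, 0.0, 0.00), (0.2, 0.0, 0.01),
                  (0.3, 0.0, 0.12), (0.4, 0.0, 0.12)]
flat_road = [(0.0, 0.0, 0.000), (0.1, 0.0, 0.001)]

curb_index = curb_candidate(road_then_curb)  # point just before the step
no_curb = curb_candidate(flat_road)          # window rejected as too flat
```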

    2. Intensity and Height Maps:

    Grid cells are used to hold the height and intensity features. We use a Gaussian model of the laser reflection height and intensity inside each cell to describe the surroundings. Moreover, the intensity and height maps are each constructed from specific subsets of the laser data.

    The ground points are used to build the intensity map because road markings carry distinctive texture information. The measured laser reflection intensity must be calibrated before the reflection intensity map is compiled in order to provide precise data with little variance. The calibration process used here only needs intensity data from an arbitrary environment. Next, the relevant intensity information is selected using the intensity value imin.

    Lighting has no effect on the observed intensity; however, the trajectories of moving objects will change the intensity map's average. To capture this effect, the intensity variance inside the cell is kept. The average and variance of the intensity are kept for each cell (ci,j), as seen in Fig. 7.

    Figure 7. LiDAR reflection intensity map. (a) LiDAR reflection intensity map; (b) zoomed-in view of the red box of (a).

    The height map is built only from points that satisfy a height requirement.

    Dynamic obstacles are more likely to be present among obstacle points close to the ground. So, we use points with heights larger than hmax in order to record as many static high points as feasible, such as buildings and trees. As seen in Fig. 8, we keep the average height for each cell (ci,j). Note that the variance of the height is not recorded, since the average is a stable representation of the 3D structure of the surrounding environment but the variance is not.

    We store cell values in hash tables to increase storage efficiency. The position of a cell relative to the map's origin serves as its hash key, and the associated hash value holds the statistics of the height and reflection intensity data. The map's resolution was set to 0.1 m to balance sharpness and storage efficiency.

    Figure 8. LiDAR height map. (a) LiDAR height map of Tongji campus; (b) zoomed-in view of the red box of (a).
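
    The grid storage described above can be sketched with a Python dictionary acting as the hash table: the key is the cell index relative to the map origin at 0.1 m resolution, and the value holds the running mean and variance for that cell. The class and function names are hypothetical, and the running statistics use Welford's method, which is an implementation choice made here rather than something stated in the paper.

```python
import math

RESOLUTION_M = 0.1  # cell size stated in the text

class CellStats:
    """Running mean and variance of the values observed in one grid cell."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def add(self, value):
        # Welford's update: numerically stable, needs no stored samples.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

def cell_key(x, y, origin=(0.0, 0.0)):
    """Hash key: integer cell index of (x, y) relative to the map origin."""
    return (math.floor((x - origin[0]) / RESOLUTION_M),
            math.floor((y - origin[1]) / RESOLUTION_M))

def build_intensity_map(ground_points):
    """ground_points: iterable of (x, y, intensity). Returns {key: CellStats}."""
    grid = {}
    for x, y, intensity in ground_points:
        grid.setdefault(cell_key(x, y), CellStats()).add(intensity)
    return grid

samples = [(0.01, 0.01, 10.0), (0.02, 0.03, 20.0), (0.55, 0.00, 5.0)]
intensity_map = build_intensity_map(samples)
```

    The same structure can serve the height map by feeding it only points above the height threshold and reading back the mean, since the text keeps no height variance. Only occupied cells consume memory, which is the storage advantage of a hash table over a dense array.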


    Our project flow, shown in Fig. 9, aims to improve the perception of autonomous vehicles in hazy weather conditions. The original image is the input to the detection and classification mechanism. After classification, the objects are identified, and an alerting system in the LiDAR alerts the vehicle if an object is closer than 10 m.

    Figure 9. Project flow
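
    The alerting rule in the project flow can be sketched as a simple distance check on the detected objects. The detection format (a label plus an x, y, z position in the sensor frame, in metres) and the function name are assumptions for illustration; only the 10 m threshold comes from the text.

```python
import math

ALERT_DISTANCE_M = 10.0  # threshold stated in the text

def objects_to_alert(detections, threshold_m=ALERT_DISTANCE_M):
    """Return (label, distance) for every detection closer than the threshold,
    nearest first. detections is a list of (label, (x, y, z))."""
    alerts = []
    for label, (x, y, z) in detections:
        distance = math.sqrt(x * x + y * y + z * z)
        if distance < threshold_m:
            alerts.append((label, distance))
    return sorted(alerts, key=lambda item: item[1])

detections = [("car", (3.0, 4.0, 0.0)),
              ("pedestrian", (20.0, 0.0, 0.0)),
              ("cyclist", (0.0, 6.0, 0.0))]
alerts = objects_to_alert(detections)  # the pedestrian at 20 m is not reported
```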

    Fig. 10 briefly illustrates the detection and classification mechanism.

    Figure 10. Detection and Classification mechanism


    In conclusion, autonomous vehicle detection using LiDAR was simulated successfully. The maximum number of epochs can be calculated to analyze the loss percentage. Algorithms such as CNN for 2D and YOLO for 3D object detection can be used.


    Our future plan is to implement object detection with physical LiDAR sensors, to use 360-degree stereo cameras instead of 180-degree cameras for more accurate detection, and to store the data in very small files.

