Virtual Expo 2024

A Deep Learning Based Framework For Robust Traffic Sign Detection Under Challenging Weather

Envision
Diode
Piston

Objective:

The primary objective is to design and implement a robust traffic sign detection system capable of accurately identifying road signs in challenging weather conditions.

The academic goal is to deepen understanding in Deep Learning, computer vision and image processing techniques.

Introduction:

Robust detection and recognition of traffic signs (TSDR) are crucial for the successful implementation of autonomous vehicle technology. Numerous research efforts have been dedicated to this task, resulting in the proposal of various promising methods in the literature.

However, most of these methods have been tested on clean datasets, neglecting the performance decline caused by the challenging conditions that can obscure traffic sign images in real-world scenarios. This project addresses detection under such conditions, emphasising the performance deterioration linked to them.

Compared to the objects that frequently appear in existing object detection datasets, traffic sign regions are very small, and thus have a very small region of interest (ROI) to background ratio. As a solution, we introduce a framework that enhances performance through a Convolutional Neural Network (CNN) based approach. The whole task can be subdivided into two independent subtasks: Traffic Sign Detection and Traffic Sign Recognition. A major advantage of DL-based methods is that they are completely data-driven and do not require any manual feature engineering. We use the German Traffic Sign Detection Benchmark (GTSDB) dataset.

We compare our approach with different CNN-based methods.

Methodology:

The project aims to develop a Convolutional Neural Network (CNN) architecture specialised in detecting traffic signs. The dataset will encompass images captured under diverse weather conditions to ensure the model's robustness. Transfer learning strategies, leveraging pre-trained models like ResNet or VGG, will be explored to enhance performance, capitalising on learned features.
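As an illustration, the sketch below loads a pre-trained VGG16 backbone in Keras and attaches a fresh classifier head. This is a minimal transfer-learning sketch, not the project's exact code: the input size, class count, and head layers are assumptions for the example.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 43  # assumed number of sign classes; adjust to the dataset

# Reuse ImageNet features (transfer learning) and drop the original head.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(64, 64, 3))
base.trainable = False  # freeze the pre-trained feature extractor

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])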

Moreover, fine-tuning the model will involve augmentation techniques to replicate various weather scenarios during training, enhancing the model's ability to generalise across different environmental conditions. Post-processing methods, such as non-maximum suppression, will be deployed to refine detection results, ensuring accurate identification of traffic signs.
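A rough augmentation sketch using Keras preprocessing layers is given below. Generic brightness, contrast, and noise perturbations stand in for weather effects here; a faithful rain or fog simulation would need more specialised transforms. Inputs are assumed to be scaled to [0, 1].

import tensorflow as tf
from tensorflow.keras import layers

# Photometric perturbations as rough stand-ins for weather degradation.
augment = tf.keras.Sequential([
    layers.RandomBrightness(0.3, value_range=(0, 1)),  # exposure and glare
    layers.RandomContrast(0.3),                        # haze-like contrast loss
    layers.GaussianNoise(0.05),                        # sensor or rain noise
])

# Applied on the fly inside a tf.data pipeline during training:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))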

Python, alongside TensorFlow and Keras libraries, will serve as the primary development tools, facilitating the implementation and training of the CNN architecture. The chosen datasets will include traffic sign images with diverse weather conditions, such as the German Traffic Sign Detection Benchmark (GTSDB) dataset from Kaggle. These datasets will provide a rich variety of real-world scenarios for training and evaluation.

GPU acceleration will be utilised for efficient model training, reducing computation time and enabling rapid experimentation. Image preprocessing tools will be employed to enhance data quality and aid in feature extraction. Evaluation metrics like precision, recall, and F1 score will be utilised to assess the model's performance accurately, ensuring its effectiveness in traffic sign detection tasks.
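For instance, these metrics can be computed with scikit-learn, assuming per-image predicted class labels (full detection evaluation would first match predicted boxes to ground truth via IoU):

from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels purely to make the snippet runnable.
y_true = [0, 1, 2, 2, 1, 0]  # ground-truth class indices
y_pred = [0, 1, 2, 1, 1, 0]  # model predictions

print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall:   ", recall_score(y_true, y_pred, average="macro"))
print("f1:       ", f1_score(y_true, y_pred, average="macro"))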

Development and experimentation will be conducted using Jupyter notebooks, facilitating iterative refinement of the CNN architecture and training process. By iteratively refining the model based on experimental results and dataset insights, the project aims to achieve state-of-the-art performance in traffic sign detection under varying weather conditions.

References to datasets used, such as the GTSDB dataset, will provide transparency and reproducibility, enabling others to validate and build upon the project's findings. Overall, the project combines advanced machine learning techniques, robust evaluation methodologies, and real-world datasets to develop an effective and reliable traffic sign detection system capable of operating in diverse weather conditions.

Training:

Popular object detection algorithms include Region-Based Convolutional Neural Networks (R-CNN) and its successors Fast R-CNN and Faster R-CNN, as well as YOLO (You Only Look Once). The R-CNN variants belong to the two-stage, region-proposal family, while YOLO is part of the single-shot detector family.

1. With R-CNN + ResNet backbone:

Residual Neural Network (ResNet) uses skip connections together with batch normalisation. ResNet solves the vanishing gradient problem that affects very deep plain architectures such as VGG.

When the network is too deep, the gradients from where the loss function is calculated easily shrink to zero after several applications of the chain rule. As a result, the weights never update their values and no learning is performed.

With ResNets, the gradients can flow directly through the skip connections backwards from later layers to initial filters.
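A minimal Keras sketch of such a residual block is shown below; it assumes the input already has the same number of channels as the block, so the addition is shape-compatible.

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two conv layers plus a skip connection that adds the input back."""
    shortcut = x                                      # identity skip path
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                   # gradients bypass the convs here
    return layers.ReLU()(y)

# Usage: the input must already have `filters` channels for Add to match.
inputs = tf.keras.Input((32, 32, 64))
outputs = residual_block(inputs, 64)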

2. With YOLO:

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. It has become a standard approach to object detection in computer vision.

In terms of inference speed, it surpasses approaches such as sliding-window object detection, R-CNN, Fast R-CNN, and Faster R-CNN.

Prior detection systems repurpose classifiers or localisers to perform detection. They apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections. YOLO takes a different approach: it applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and class probabilities for each region. These bounding boxes are weighted by the predicted probabilities.

The algorithm is built on residual blocks, bounding box regression, Intersection over Union (IoU), and non-maximum suppression.
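The last two ingredients are easy to sketch in plain Python. Boxes are assumed to be (x1, y1, x2, y2) corner coordinates, and this greedy NMS is a simplified stand-in for the versions used inside real detectors.

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the best box, drop overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143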


3. With VGG:

VGG16 is a deep convolutional neural network model used for image classification tasks. The network is composed of 16 weight layers, each processing image information incrementally to improve the accuracy of the model's predictions.

Instead of a large number of hyper-parameters, VGG16 uses convolution layers with 3x3 filters and stride 1 with 'same' padding, each block followed by a max-pooling layer with a 2x2 filter and stride 2. It follows this arrangement of convolution and max-pooling layers consistently throughout the whole architecture. At the end it has two fully connected layers, followed by a softmax for output.

In VGG16, ‘VGG’ refers to the Visual Geometry Group at the University of Oxford, while the ‘16’ refers to the network’s 16 layers that have weights. It is quite a large network, with about 138 million parameters.
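The stock model can be loaded directly from Keras to confirm that size:

import tensorflow as tf

# Build the full VGG16 classifier (weights=None skips the download) and
# count its parameters; the total is about 138 million.
vgg16 = tf.keras.applications.VGG16(weights=None, include_top=True)
print(f"parameters: {vgg16.count_params():,}")  # 138,357,544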

The VGG-based model reached an accuracy of 0.95 during training.

[Figure: graph of accuracy and loss during training]

Training configuration (a minimal compile sketch follows this list):

  • Optimizer: stochastic gradient descent (SGD)
  • Activation function: sigmoid
  • Loss function: categorical_crossentropy
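The sketch below compiles a model with these settings; the stand-in network, input size, class count, and learning rate are assumptions for illustration. Note that categorical cross-entropy is usually paired with a softmax output, so the sigmoid listed above is most natural when targets are treated as per-class binary labels.

import tensorflow as tf
from tensorflow.keras import layers, models

# Tiny stand-in classifier so the snippet runs on its own; the real model
# is the VGG-based network described above.
model = models.Sequential([
    layers.Input((64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(43, activation="sigmoid"),  # sigmoid, per the settings above
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # assumed LR
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# history = model.fit(train_ds, validation_data=val_ds, epochs=20)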

[Figure: predictions using the VGG model]

Results:  

Faster R-CNN tends to offer higher object detection accuracy than YOLO, but when real-time processing speed must be balanced with accuracy, YOLO may be the better choice.

References: 

https://ieeexplore.ieee.org/document/9345465

Mentors:

  • SADDALA REDDY RAHUL(221ME344)
  • KANDULA GNANESHWAR(221IT035)

Mentees:

  • ADITHYA A(231EC102)
  • DHANYA D(231EC214)
  • KANDAGADLU NIKHIL GUPTHA(231DS015)

Meet Link:  https://meet.google.com/qyv-cvyr-wdb


METADATA

Report prepared on May 9, 2024, 5:19 p.m. by:

  • SADDALA REDDY RAHUL [Piston]
  • Gnaneshwar Kandula [Diode]

Report reviewed and approved by Aditya Pandia [CompSoc] on May 9, 2024, 5:20 p.m.
