The GitHub Repository: link
Google Meet link for the Expo: https://meet.google.com/icf-ahya-nzx
To use ARLP, click on this link and run this Jupyter Book:
In today's digital era, the demand for intelligent systems capable of understanding and processing visual information is ever-increasing. One critical application domain is the recognition and interpretation of vehicle number plates, which are essential for various purposes ranging from law enforcement to traffic management. As part of Envision, IEEE NITK (2024), this project (I3) focuses on developing a robust Vehicle Number Plate Recognition (NPR) system tailored specifically for Indian vehicles.
We designed a custom CNN model, drawing inspiration from several well-known CNN architectures. We also applied transfer learning with the Inception ResNet V2 model, trained it alongside the custom model, and then evaluated and compared the results of the two approaches. We used Tesseract OCR and EasyOCR to extract text from the number plates.
The primary objective of this project is to design and deploy a Number Plate Recognition system specialized for Indian vehicles. To this end, we built a custom CNN model informed by a study of various CNN architectures, as well as a transfer learning model based on Inception ResNet v2, with the goal of identifying number plates efficiently and accurately while accounting for the characteristics and variations specific to Indian license plates. The NPR system is intended as a practical solution for automated toll collection, traffic monitoring, and law enforcement applications.
To get started with ARLP, follow these steps:
You can download the books and run them on your local machine. Alternatively, the results book can also be run on Google Colab.
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. It supports multiple programming paradigms, including structured, object-oriented, and functional programming.
Machine learning is a branch of artificial intelligence (AI) that focuses on developing algorithms and techniques that enable computers to learn from data and make predictions or decisions without being explicitly programmed. It encompasses various approaches, such as supervised, unsupervised, and reinforcement learning.
Neural networks are computational models inspired by the structure and function of biological neural networks in the human brain. They consist of interconnected nodes, called neurons, organized in layers. Each neuron computes a weighted sum of its inputs (plus a bias) and passes the result through an activation function to produce its output. Neural networks can learn complex patterns and relationships from data, making them suitable for various tasks, including image recognition, natural language processing, and reinforcement learning.
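To make the neuron description concrete, here is a minimal NumPy sketch of a single neuron and of a layer of neurons. The weights, bias, and the choice of ReLU as the activation are arbitrary illustrative values, not taken from the project's models.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """One neuron: weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(w, x) + b        # pre-activation (weighted sum)
    return activation(z)

def dense_layer(x, W, b, activation):
    """A layer of neurons: row i of W holds the weights of neuron i, b[i] its bias."""
    return activation(W @ x + b)

x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, 0.25, -1.0])
y = neuron(x, w, 0.1, relu)     # 0.5 + 0.5 + 1.0 + 0.1 = 2.1
```

A layer is simply many neurons evaluated at once, which is why dense_layer reduces to one matrix-vector product.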
TensorFlow is an open-source machine learning framework developed by Google Brain for building and training deep learning models. It provides a comprehensive ecosystem of tools, libraries, and resources for efficiently developing and deploying machine learning applications. TensorFlow supports high-level APIs for easy model development and low-level APIs for flexibility and customization.
Convolutional Neural Networks (CNNs) are a class of deep neural networks designed for processing structured grid-like data, such as images. They consist of multiple layers, including convolutional layers, pooling layers, and fully connected layers. CNNs are highly effective for tasks such as image classification, object detection, and image segmentation due to their ability to learn hierarchical representations of features from input images automatically.
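The two core CNN operations, convolution and pooling, can be sketched in a few lines of NumPy. This is a single-channel, framework-free illustration with a hand-picked edge-detection kernel; in a real CNN (including those built with TensorFlow) the kernels are learned and each layer operates on many channels at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D cross-correlation (what deep-learning 'convolution' layers compute), no padding."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a whole window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# An image with a dark-to-bright vertical boundary, and a kernel that
# responds to left-to-right increases in intensity.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 1]] * 2, dtype=float)
fmap = conv2d_valid(image, kernel)   # every row is [0, 2, 0, 0]: the edge is detected once
pooled = max_pool2d(fmap)            # [[2, 0], [2, 0]]: a smaller map that keeps the strong response
```

Stacking such layers is what lets a CNN build the hierarchical features described above: early kernels detect edges, and later layers combine them into larger patterns.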
Spatial attention mechanisms focus on relevant spatial regions within the input data. The SpatialAttention layer in the project code computes attention maps from the spatial information of the input feature maps. These maps highlight important regions, allowing the network to prioritize relevant areas and improve performance in tasks such as image recognition.
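The NumPy sketch below shows the kind of computation a spatial attention layer performs, assuming the CBAM formulation: average and max pooling across channels, a 7x7 convolution over the two resulting maps, then a sigmoid. It is an illustration, not the project's SpatialAttention layer; the function names are hypothetical and the random kernel stands in for learned weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """'Same'-padded 2-D cross-correlation: x (H, W, Cin), kernel (k, k, Cin), k odd -> (H, W)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))    # zero padding keeps H and W unchanged
    h, w = x.shape[:2]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + k, j:j + k, :] * kernel)
    return out

def spatial_attention(features, kernel):
    """CBAM-style spatial attention on an (H, W, C) map; returns (refined, attention_map)."""
    avg_map = features.mean(axis=-1, keepdims=True)       # (H, W, 1): mean over channels
    max_map = features.max(axis=-1, keepdims=True)        # (H, W, 1): max over channels
    pooled = np.concatenate([avg_map, max_map], axis=-1)  # (H, W, 2)
    attention = sigmoid(conv2d_same(pooled, kernel))      # (H, W), each weight in (0, 1)
    return features * attention[..., None], attention     # same weight for all channels at a pixel

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))
kernel = rng.standard_normal((7, 7, 2)) * 0.1             # stands in for learned 7x7 conv weights
refined, attn = spatial_attention(features, kernel)
```

Because the attention map has one value per pixel, it scales whole locations up or down, which is how the network learns to concentrate on regions such as the plate.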
Channel attention mechanisms emphasize or suppress specific channels within feature maps. The cbam_block function computes an attention weight for each channel using global pooling and dense layers. These weights rescale the feature maps, highlighting informative channels. By adaptively recalibrating feature representations, channel attention helps CNNs capture fine-grained details.
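A minimal NumPy sketch of channel attention in the CBAM style follows: global average and max pooling produce two channel descriptors, a shared two-layer MLP processes both, and a sigmoid turns their sum into per-channel weights. The reduction ratio r = 4, the random weights, and the function name channel_attention are illustrative assumptions; the exact layer sizes inside the project's cbam_block are not stated here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(features, w1, b1, w2, b2):
    """CBAM-style channel attention on an (H, W, C) feature map.

    w1: (C, C // r) and w2: (C // r, C) form a shared MLP with reduction ratio r.
    """
    avg_desc = features.mean(axis=(0, 1))   # (C,) global average pooling
    max_desc = features.max(axis=(0, 1))    # (C,) global max pooling

    def shared_mlp(v):
        hidden = np.maximum(0.0, v @ w1 + b1)   # dense + ReLU bottleneck
        return hidden @ w2 + b2                # dense back to C channels

    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))  # (C,) in (0, 1)
    return features * weights, weights       # broadcast over H and W

rng = np.random.default_rng(0)
C, r = 16, 4
features = rng.standard_normal((8, 8, C))
w1, b1 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C)
refined, weights = channel_attention(features, w1, b1, w2, b2)
```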
The Convolutional Block Attention Module (CBAM) integrates channel and spatial attention, applying channel attention first and spatial attention second. By combining them, CBAM enables networks to focus on relevant spatial regions and informative channels within feature maps. This adaptive recalibration enhances feature representations, improving performance in image classification and object detection tasks, and it is used for this purpose in the project code.
Dual Path Networks (DPNs) combine two paths within each building block. The dpn_block function constructs blocks with a dense path and a residual path: the residual path reuses existing features through addition, while the dense path concatenates newly learned features. By combining information from both paths, DPNs capture richer features and promote effective information exchange. This approach improves pattern learning and performance in image classification and semantic segmentation tasks.
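The dual-path idea can be sketched in NumPy as follows. The shared transform here is a single 1x1 convolution, which is a simplification (DPNs use a bottleneck of 1x1, grouped 3x3, and 1x1 convolutions), and the function name dual_path_block and all channel counts are hypothetical; this is not the project's dpn_block.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel matrix multiply of (H, W, Cin) by w of shape (Cin, Cout)."""
    return x @ w

def dual_path_block(residual, dense, w, res_channels):
    """Simplified dual-path block.

    residual: (H, W, R) path, updated by element-wise addition (ResNet-style feature reuse).
    dense:    (H, W, D) path, grown by concatenation (DenseNet-style new features).
    w:        (R + D, R + K) weights of the shared transform.
    """
    x = np.concatenate([residual, dense], axis=-1)        # the block sees both paths
    out = np.maximum(0.0, conv1x1(x, w))                   # shared transform + ReLU
    new_residual = residual + out[..., :res_channels]      # first R output channels: added
    new_dense = np.concatenate([dense, out[..., res_channels:]], axis=-1)  # last K: appended
    return new_residual, new_dense

rng = np.random.default_rng(0)
R, D, K = 8, 4, 2          # K = new channels appended to the dense path per block
residual = rng.standard_normal((6, 6, R))
dense = rng.standard_normal((6, 6, D))
w = rng.standard_normal((R + D, R + K)) * 0.1
res_out, dense_out = dual_path_block(residual, dense, w, R)
# res_out keeps R = 8 channels; dense_out grows from 4 to 6 channels.
```

Stacking such blocks keeps the residual path at a fixed width while the dense path grows by K channels per block.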
The Squeeze-and-Excitation (SE) block models channel-wise dependencies within neural networks. Combining squeeze (global pooling) and excitation (fully connected layers) operations, SE blocks adaptively recalibrate feature maps. This enhances feature discrimination and generalization by emphasizing informative channels while suppressing less relevant ones. Integrated into CNNs, SE blocks enhance representational power and performance in image-related tasks.
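A minimal NumPy sketch of an SE block is shown below. Unlike the CBAM channel attention, SE squeezes each channel with global average pooling only. The reduction ratio, random weights, and function name se_block are illustrative assumptions.

```python
import numpy as np

def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration of an (H, W, C) feature map.

    w1: (C, C // r) and w2: (C // r, C), where r is the reduction ratio.
    """
    squeeze = features.mean(axis=(0, 1))                  # squeeze: global average pooling -> (C,)
    hidden = np.maximum(0.0, squeeze @ w1 + b1)           # excitation: FC + ReLU (bottleneck)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # excitation: FC + sigmoid -> (C,)
    return features * scale, scale                        # rescale every channel

rng = np.random.default_rng(1)
C, r = 32, 16                                             # illustrative sizes
features = rng.standard_normal((7, 7, C))
w1, b1 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C)
out, scale = se_block(features, w1, b1, w2, b2)
```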
Inception ResNet v2 is a deep convolutional neural network (CNN) architecture that combines Inception modules with ResNet-style residual connections. Introduced by Google in 2016, it is known for its strong performance in image recognition tasks. The architecture uses residual connections to mitigate the vanishing gradient problem and Inception modules to capture multi-scale features efficiently.
Inception ResNet v2's architecture
Inception-ResNet-v2 is a convolutional neural network trained on more than a million images from the ImageNet database. The network is 164 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, it has learned rich feature representations for a wide range of images. The pre-trained network was built for this classification task; in this project, it serves as the backbone for locating number plates. Inception ResNet v2 consists of multiple Inception-ResNet blocks, each combining Inception and ResNet components. These blocks allow the network to learn complex hierarchical features from input images, making it suitable for image classification, object detection, and feature extraction tasks.
The InceptionResNetV2 class provided by TensorFlow's Keras API makes it easy to load the Inception ResNet v2 model pre-trained on the ImageNet dataset. This model can be fine-tuned or used as a feature extractor for various computer vision tasks, including number plate recognition.
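A minimal sketch of using the class as a frozen feature extractor with a small bounding-box regression head is shown below. The 224x224 input, global average pooling, the 256-unit hidden layer, the sigmoid output for four normalized coordinates, and the name build_plate_localizer are all assumptions for illustration; the project text only states that two layers were added on top of the pre-trained model.

```python
import tensorflow as tf

def build_plate_localizer(input_shape=(224, 224, 3), weights="imagenet", hidden_units=256):
    """Sketch: Inception ResNet v2 backbone + a small head that regresses one plate box."""
    backbone = tf.keras.applications.InceptionResNetV2(
        include_top=False,        # drop the 1000-class ImageNet classifier
        weights=weights,          # "imagenet" downloads pre-trained weights; None = random init
        input_shape=input_shape,
    )
    backbone.trainable = False    # frozen: used purely as a feature extractor

    inputs = tf.keras.Input(shape=input_shape)                        # raw pixels in [0, 255]
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)   # map pixels to [-1, 1]
    x = backbone(x, training=False)                                   # BatchNorm in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)     # added layer 1 (size assumed)
    outputs = tf.keras.layers.Dense(4, activation="sigmoid")(x)       # added layer 2: normalized box
    return tf.keras.Model(inputs, outputs)
```

With pre-trained weights, one would call build_plate_localizer(), compile it with a regression loss such as mean squared error (an assumption), and pass tf.keras.callbacks.TensorBoard(log_dir="logs") in the callbacks list of model.fit to record training metrics. Unfreezing the backbone once the head has converged is the usual way to fine-tune the pre-trained layers themselves.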
First, we collected the dataset from Kaggle. It consists of vehicle images along with XML files containing the bounding-box coordinates of the license plates. We used Python's glob and os modules to process the files and extract the required information efficiently. Each image was converted into a NumPy array for further processing.
The datasets we used are
There are 567 + 453 + 433 = 1453 images in total.
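A minimal sketch of the annotation-parsing step uses glob, os, and the standard-library XML parser. It assumes Pascal VOC-style files (a filename, a size element with width and height, and an object/bndbox element with xmin, ymin, xmax, ymax) and one plate per image; these tag names, the (xmin, xmax, ymin, ymax) ordering, and the helper names are assumptions, so adjust them if the dataset differs. Loading the images themselves into NumPy arrays is omitted.

```python
import glob
import os
import xml.etree.ElementTree as ET

def parse_annotation(root):
    """Return (filename, (xmin, xmax, ymin, ymax), (width, height)) from a VOC-style element."""
    box = root.find("object/bndbox")   # assumes one plate per image: takes the first <object>
    size = root.find("size")
    if box is None or size is None:
        return None
    coords = tuple(int(float(box.findtext(t))) for t in ("xmin", "xmax", "ymin", "ymax"))
    dims = (int(size.findtext("width")), int(size.findtext("height")))
    return root.findtext("filename"), coords, dims

def load_annotations(annotation_dir):
    """Parse every *.xml file in annotation_dir, skipping files without a box or size."""
    records = []
    for xml_path in sorted(glob.glob(os.path.join(annotation_dir, "*.xml"))):
        record = parse_annotation(ET.parse(xml_path).getroot())
        if record is not None:
            records.append(record)
    return records

def normalize_box(coords, dims):
    """Scale pixel coordinates to [0, 1], e.g. as targets for a sigmoid-output regression head."""
    (xmin, xmax, ymin, ymax), (width, height) = coords, dims
    return xmin / width, xmax / width, ymin / height, ymax / height
```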
We created a custom model with about 96 million parameters (96,127,521) and saved it as a .keras file with a total size of 366 MB. The model uses Inception-ResNet blocks, attention mechanisms (Spatial Attention, Channel Attention, and CBAM blocks), and Dual Path Networks (DPNs). It uses the Swish activation function instead of more common activation functions such as ReLU. The model's architecture graph is in the ARLP Resources folder of this repository: link.
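As a quick consistency check on these figures, assuming the weights are stored as 32-bit floats (4 bytes per parameter) and ignoring optimizer state and file metadata, the parameter count alone accounts for the reported file size:

$$
96{,}127{,}521 \times 4\,\text{B} = 384{,}510{,}084\,\text{B}, \qquad \frac{384{,}510{,}084\,\text{B}}{2^{20}\,\text{B/MiB}} \approx 366.7\,\text{MiB}
$$

This agrees with the reported 366 MB when MB is read as mebibytes (2^20 bytes).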
Our custom model reached an accuracy of 64.4897%. There is still room for improvement through hyperparameter tuning, and we plan to optimize the model further in future work to improve its performance.
We modeled these blocks on the Inception-ResNet architecture and tried to improve on it. The blocks are pivotal components of the architecture and are responsible for feature extraction and refinement. They combine elements of the Inception and ResNet architectures, pairing convolutional layers with residual connections to support efficient feature propagation and learning.
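The structure of such a block can be sketched in NumPy: parallel convolution branches (a 1x1 branch and a 1x1 followed by 3x3 branch) are concatenated, projected back to the input width by a linear 1x1 convolution, scaled, and added to the input. The branch widths, the scale of 0.2 (the Inception-ResNet paper suggests scaling residuals by roughly 0.1 to 0.3), and the function names are illustrative assumptions; this is not the project's inception_resnet_block.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv2d_same(x, w):
    """'Same'-padded convolution: x (H, W, Cin), w (k, k, Cin, Cout) with k odd -> (H, W, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for di in range(k):
        for dj in range(k):
            out += xp[di:di + h, dj:dj + wd, :] @ w[di, dj]   # (H, W, Cin) @ (Cin, Cout)
    return out

def inception_resnet_block_sketch(x, w_b0, w_b1a, w_b1b, w_proj, scale=0.2):
    """Simplified two-branch Inception-ResNet block with a scaled residual connection."""
    branch0 = relu(conv2d_same(x, w_b0))                              # 1x1 conv
    branch1 = relu(conv2d_same(relu(conv2d_same(x, w_b1a)), w_b1b))   # 1x1 -> 3x3 conv
    mixed = np.concatenate([branch0, branch1], axis=-1)               # multi-scale features
    residual = conv2d_same(mixed, w_proj)                             # linear 1x1 back to Cin
    return relu(x + scale * residual)                                 # scaled residual add

rng = np.random.default_rng(0)
C = 8
x = rng.standard_normal((6, 6, C))
w_b0 = rng.standard_normal((1, 1, C, 4)) * 0.1
w_b1a = rng.standard_normal((1, 1, C, 4)) * 0.1
w_b1b = rng.standard_normal((3, 3, 4, 4)) * 0.1
w_proj = rng.standard_normal((1, 1, 8, C)) * 0.1
y = inception_resnet_block_sketch(x, w_b0, w_b1a, w_b1b, w_proj)
```

The projection back to the input width is what makes the residual addition possible, and the small scale keeps each block close to an identity mapping early in training.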
The inception_resnet_block function defines the structure of each Inception-ResNet block. The architecture also integrates attention mechanisms, including Spatial Attention, Channel Attention, and the Convolutional Block Attention Module (CBAM), to dynamically adjust feature representations and focus on relevant information.
The SpatialAttention layer computes attention maps based on the spatial information of the input feature maps, and the cbam_block function implements channel attention by computing an attention weight for each channel using global pooling and dense layers. Dual Path Networks (DPNs) are incorporated to promote effective information exchange and feature learning within the architecture; they help the network capture richer feature representations, which is intended to improve recognition accuracy.
The dpn_block function constructs building blocks consisting of dense and residual paths. Throughout the architecture, the Swish activation function provides the non-linearity and is intended to help training converge faster.
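Swish is defined as x times sigmoid(beta * x). A minimal NumPy version is shown next to ReLU for comparison; the beta parameter (fixed at 1 here) is the standard generalization, and the function names are illustrative.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x). With beta = 1 it is also known as SiLU."""
    return x / (1.0 + np.exp(-beta * x))    # same as x * sigmoid(beta * x)

def relu(x):
    return np.maximum(0.0, x)

xs = np.linspace(-4.0, 4.0, 9)
comparison = np.stack([xs, relu(xs), swish(xs)])   # rows: input, ReLU, Swish
```

Unlike ReLU, Swish is smooth and slightly non-monotonic: it lets small negative values through and dips to a minimum of about -0.28 near x = -1.28 before approaching 0. In recent TensorFlow versions it is available as tf.nn.swish or by passing activation="swish" to a Keras layer.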
For the Inception ResNet v2 approach, we used transfer learning by importing the pre-trained model from TensorFlow's Keras API and adding two layers on top of it to fine-tune it for our task. The model was then trained on the collected dataset, and TensorBoard was used to monitor various metrics during training. Once trained, the model predicts the location of the license plate, i.e., its bounding-box coordinates. We then cropped the license plate region from the image and applied EasyOCR and Tesseract for text recognition. The architecture is available here: link.
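The cropping step between box prediction and OCR can be sketched as follows, assuming the model outputs normalized (xmin, xmax, ymin, ymax) values in [0, 1]. The coordinate order, the clipping policy, and the helper names are assumptions for illustration.

```python
import numpy as np

def denormalize_box(box, width, height):
    """Convert a normalized (xmin, xmax, ymin, ymax) prediction into integer pixel bounds."""
    xmin, xmax, ymin, ymax = box
    x0, x1 = int(round(xmin * width)), int(round(xmax * width))
    y0, y1 = int(round(ymin * height)), int(round(ymax * height))
    # Clip to the image and keep the crop at least one pixel wide and high.
    x0, y0 = max(0, min(x0, width - 1)), max(0, min(y0, height - 1))
    x1, y1 = max(x0 + 1, min(x1, width)), max(y0 + 1, min(y1, height))
    return x0, x1, y0, y1

def crop_plate(image, box):
    """Crop the plate region from an (H, W, 3) image array given a normalized box."""
    height, width = image.shape[:2]
    x0, x1, y0, y1 = denormalize_box(box, width, height)
    return image[y0:y1, x0:x1]

image = np.zeros((100, 200, 3), dtype=np.uint8)
plate = crop_plate(image, (0.25, 0.75, 0.1, 0.5))   # rows 10:50, cols 50:150
```

The resulting array can then be passed to an OCR engine, for example pytesseract.image_to_string(plate) or easyocr.Reader(["en"]).readtext(plate); the exact OCR settings used in the project are not stated here.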
Comparison between the two models
Comparison of the cropped plate images from both models
Accuracy vs. epoch graph of the custom model
Accuracy vs. epoch graph of the Inception model
Some reference papers we used are:
Inception ResNet V2:
Attention Mechanisms:
Dual Path Networks (DPNs):
Swish Activation Function:
Squeeze-and-Excitation (SE) Block:
Report prepared on May 6, 2024, 11:46 p.m. by:
Report reviewed and approved by Nikesh Shetty [Piston] on May 10, 2024, 7:10 a.m.