Virtual Expo 2024

Bird Call Classification

Envision
Diode

Techniques used

Mel-frequency spectrogram 

The mel-frequency spectrogram is a representation of the spectrum of a signal as it varies over time. It is derived from the traditional spectrogram, which displays the frequency content of a signal over time. However, instead of linearly spaced frequency bins, the mel spectrogram uses frequency bins spaced according to the mel scale, a perceptual scale of pitches based on human hearing. This scaling is designed to better represent how humans perceive differences in pitch.

Steps to get the Mel spectrogram:

  1. Compute the Short-Time Fourier Transform (STFT) and convert the amplitudes to decibels.

  2. Convert the frequencies to the mel scale: choose the number of mel bands, construct the mel filter banks, and apply them to the spectrogram.
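The filter-bank construction in step 2 can be sketched directly in NumPy. The sampling rate, FFT size, and number of mel bands below are illustrative assumptions; in practice a library routine such as librosa's mel filter builder does this:

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy formula, widely used by audio libraries
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(sr=22050, n_fft=2048, n_mels=64):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced in mel, converted back to Hz
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        up = (fft_freqs - lo) / (ctr - lo)    # rising slope of the triangle
        down = (hi - fft_freqs) / (hi - ctr)  # falling slope
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb

fb = mel_filter_bank()
# The mel spectrogram is then fb @ power_spectrogram for each STFT frame.
```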

Convolutional Neural Network

CNNs, or Convolutional Neural Networks, are deep learning architectures particularly effective for image processing tasks. They consist of layers that apply convolution operations to capture features like edges and textures, pooling layers to reduce spatial dimensions, activation functions for non-linearity, and fully connected layers for classification or regression. CNNs excel at automatically learning hierarchical representations from raw data, making them invaluable for tasks such as image classification, object detection, and segmentation, where they have achieved state-of-the-art performance.
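To make the convolution and pooling operations concrete, here is a toy NumPy sketch of a valid-mode 2D convolution and a 2×2 max-pool. This is for illustration only, not the project's implementation:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2D cross-correlation (what a CNN layer computes)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool2d(x, size=2):
    """Non-overlapping max pooling; halves each spatial dimension."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                # horizontal edge detector
feat = conv2d(x, edge)                        # feature map, shape (6, 5)
pooled = maxpool2d(feat)                      # reduced map, shape (3, 2)
```

Stacking such convolutions with pooling is what lets a CNN build hierarchical features while shrinking the spatial dimensions.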

Dataset

The dataset we are working with consists of Indian bird calls, taken from a larger dataset with a large number of classes. Each class is a separate genus, so the bird calls are more differentiable. We work with 5 classes (Liocichla phoenicea, Dicrurus andamanensis, Cyornis poliogenys, Arborophila torqueola, Alcippe cinerea), each having between 20 and 30 .wav audio files. Overall, the dataset is balanced and evenly distributed, but there were some problems with the data, including noise and the uneven length of samples, which made it difficult to extract mel spectrograms of the same shape. Each sample was truncated to 15 seconds to solve the issue of uneven length, but the noise was not addressed.

Data preprocessing 

Audio files are loaded at a specified sampling rate with a duration of 15 seconds. The loaded audio is then split into smaller chunks of 5 seconds each, ensuring a consistent signal length for further processing. Next, each 5-second section is converted into a mel spectrogram, which can be considered a visual representation of the audio's frequency spectrum. Finally, the spectrograms are converted to the decibel scale and normalized.
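A minimal sketch of this preprocessing, using a synthetic NumPy signal in place of a loaded file. The sampling rate and the min-max normalization scheme are assumptions; the actual pipeline loads audio and extracts mel spectrograms with librosa:

```python
import numpy as np

SR = 22050        # assumed sampling rate
CHUNK_SEC = 5     # a 15 s clip yields three 5 s chunks

def split_chunks(signal, sr=SR, chunk_sec=CHUNK_SEC):
    """Split a 1-D signal into equal, non-overlapping chunks."""
    n = sr * chunk_sec
    n_chunks = len(signal) // n
    return signal[:n_chunks * n].reshape(n_chunks, n)

def to_db(power, ref=1.0, amin=1e-10):
    """Convert a power value/spectrogram to decibels."""
    return 10.0 * np.log10(np.maximum(amin, power) / ref)

def normalize(x):
    """Min-max normalize to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

audio = np.random.randn(SR * 15)   # stand-in for a loaded 15 s clip
chunks = split_chunks(audio)       # shape (3, 110250)
db = to_db(chunks ** 2)            # decibel scale
norm = normalize(db)               # values in [0, 1]
```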

Architecture

The noisy nature of the data in audio classification tasks is often a cause of overfitting. As a result, a simple but efficient feature extractor, in this case a CNN, is employed. Its structure consists of 3 convolutional layers, each followed by a max-pooling layer. Finally, a dense layer with 50% dropout is used to help deal with overfitting, followed by a softmax output layer.
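A possible Keras realization of this architecture. The filter counts, kernel sizes, input shape, and dense-layer width below are illustrative assumptions; the report specifies only the 3 conv + max-pool blocks, the dense layer with 50% dropout, and the 5-way softmax output:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 216, 1), n_classes=5):
    """3 conv + max-pool blocks, dense layer with 50% dropout, softmax head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),   # the 50% dropout that combats overfitting
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
```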

 

Results

On the left is the detailed classification report for the task with random initialization. We obtain a test accuracy of 83.33% and a test loss of 0.4696. The graphs show the accuracy and loss across epochs. With a training accuracy of 80.87% and a test accuracy of 83.33%, we can rule out overfitting, which is, as previously mentioned, prevalent in audio classification.

Conclusion

The deep learning model developed in this project successfully classified different species of birds based on their vocalizations. Using a dataset from Kaggle containing audio recordings of five bird species, we processed the audio data with the Python library librosa, converting the recordings into log mel-spectrogram images to capture the time-frequency characteristics of the bird calls. Finally, we achieved a test accuracy of approximately 83.33%, indicating the effectiveness of using deep learning with audio spectrograms for bird species classification.

 

Mentors:

Vaibhav Santhosh

Aryan Herur

Mentees:

Rudra Gandhi

Guhan Balaji

Yash Kedia

Prakhyath Sai V

 

Meet link: https://meet.google.com/otz-monc-prq

METADATA

Report prepared on May 9, 2024, 4:16 p.m. by:

  • Vaibhav Santhosh [Diode]
  • Aryan N Herur [Diode]

Report reviewed and approved by Aditya Pandia [CompSoc].
