Virtual Expo 2024

Leveraging Frequency Analysis for Deepfake Image Classification

Year Long Project Diode

AIM:
Deepfake technology has gained significant attention in recent years for its ability to generate highly realistic counterfeit images and videos, raising concerns about the integrity of visual media. Detecting deepfake images is therefore crucial for maintaining trust in domains such as journalism, forensics, and social media. This project explores the effectiveness of frequency analysis techniques combined with machine learning for deepfake image recognition, offering a study aimed at improving the reliability of detection methods.

INTRODUCTION:

  • The project uses the discrete cosine transform (DCT) as a frequency analysis technique to extract spectral features that capture characteristics unique to deepfake images. These features are then used to train classification models, improving their ability to accurately identify deepfake images.

  • We also use StyleGAN, a generative model, to synthesize fake images. Starting from the FFHQ dataset, we built our own dataset of real and fake images by combining the original (real) images with the generated fakes.

  • We have also built a web app that lets a user upload an image and classifies it as real or fake. The frontend is built with Gradio; a minimal sketch of this interface follows this list.

  • By empowering users to detect deepfake images and promoting transparency, the project helps combat the harmful effects of deepfakes and fosters a more trustworthy digital media environment.
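
As a rough illustration of this web interface, the following is a minimal Gradio sketch. The model file name, preprocessing steps, and label mapping are assumptions made for illustration; the actual app wires in the classifier trained as described in the Methodology section.

```python
# Minimal Gradio sketch of the upload-and-classify web app described above.
# The model file name, preprocessing, and label mapping are illustrative assumptions.
import gradio as gr
import joblib
import numpy as np
from PIL import Image
from scipy.fft import dctn   # 2D DCT used to move the image into the frequency domain

# Hypothetical path to the trained ridge classifier described in the Methodology section.
clf = joblib.load("ridge_dct_classifier.pkl")

def classify_image(image: np.ndarray) -> str:
    """Resize the upload to the training resolution, take its DCT, and classify it."""
    gray = np.asarray(Image.fromarray(image).convert("L").resize((128, 128)), dtype=np.float64)
    features = dctn(gray, norm="ortho").reshape(1, -1)   # flattened DCT spectrum
    return "Fake" if clf.predict(features)[0] == 1 else "Real"

demo = gr.Interface(fn=classify_image,
                    inputs=gr.Image(),
                    outputs="text",
                    title="Deepfake Image Classifier")
demo.launch()
```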

METHODOLOGY:

We use the StyleGAN model to generate the fake images in our dataset. Key characteristics of the StyleGAN architecture are as follows:

  • The AdaIN (adaptive instance normalization) layer normalizes the statistics of the feature maps entering the convolution layer and then re-scales and shifts them using statistics derived from the style input. This preserves the content of x while shifting the distribution of the AdaIN output towards that of the style input (a minimal sketch of this operation appears below).

  • The synthesis network uses a progressive GAN structure as its backbone: the network grows from low resolution to high resolution as training progresses. Noise is fed into each synthesis block to capture fine stochastic detail, making the generated images look more realistic.

  • The input latent vector characterizes high-level attributes such as gender, ethnicity, and hairstyle, giving more control over the generated style. Style mixing in StyleGAN takes the latent codes (Z vectors) of two inputs, maps them to style vectors (W vectors), and feeds one style to some layers of the network and the other style to the remaining layers. This synthesizes new images that combine visual attributes from both sources, with the switch point controlling which hierarchical level of appearance each source influences.

Figure: Architecture of StyleGAN
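
The sketch below is a minimal PyTorch illustration of the AdaIN operation and of per-layer style mixing described above. It is not the official StyleGAN implementation; module names, tensor shapes, and the crossover-based mixing helper are assumptions chosen for clarity.

```python
# Illustrative PyTorch sketch of AdaIN and per-layer style mixing
# (not the official StyleGAN code; shapes and names are assumptions).
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Normalize the feature maps x, then re-scale and shift them using the style vector w."""
    def __init__(self, channels: int, w_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)       # removes x's own per-channel mean/std
        self.affine = nn.Linear(w_dim, 2 * channels)  # maps style w -> per-channel (scale, bias)

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        scale, bias = self.affine(w).chunk(2, dim=1)  # each of shape (batch, channels)
        scale = scale[:, :, None, None]               # broadcast over spatial dimensions
        bias = bias[:, :, None, None]
        return scale * self.norm(x) + bias            # output statistics now follow the style

# Style mixing: feed one style to the coarse (early) layers and another to the fine (late) layers.
def mixed_styles(w1: torch.Tensor, w2: torch.Tensor, num_layers: int, crossover: int) -> list:
    """Return one style vector per synthesis layer, switching from w1 to w2 at `crossover`."""
    return [w1 if layer < crossover else w2 for layer in range(num_layers)]
```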

Next we create our own dataset of real and fake images (1000 of each). For the real images we use the thumbnails of the FFHQ dataset, which have a resolution of 128 x 128; the fake images are generated with StyleGAN. We then follow the steps below to build a deepfake image classifier:

  • We transform images into the frequency domain using the discrete cosine transform (DCT). Much like the discrete Fourier transform (DFT), the DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at different frequencies. In practice, we compute the 2D-DCT as a product of two 1D-DCTs: for images, we first compute a DCT along the columns and then a DCT along the rows (see the first sketch after this list).

  • We plot the DCT spectrum by depicting the DCT coefficients as a heatmap, which helps visualize how real and generated images differ in the frequency domain.

  • To classify the images from their frequency-domain representations, we use a simple linear classifier: we fit a ridge regression on real and generated images after applying the DCT. For comparison, we also fit the same model on the original images without any transformation.

  • The ridge classifier used for classification is a variant of linear regression in which L2 regularization is applied to the coefficients (see the second sketch after this list).
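
As a sketch of the frequency-domain step, the following computes the 2D-DCT of a grayscale image as two 1D-DCTs (first along the columns, then along the rows) and plots the log-scaled magnitude of the coefficients as a heatmap. The function names and the use of scipy/matplotlib here are illustrative assumptions.

```python
# Sketch of the frequency-domain step: a 2D-DCT built from two 1D type-II DCTs,
# plus a log-scaled heatmap of the coefficients. Names are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import dct

def dct2(image: np.ndarray) -> np.ndarray:
    """2D-DCT as a product of two 1D-DCTs: first along the columns, then along the rows."""
    return dct(dct(image, axis=0, norm="ortho"), axis=1, norm="ortho")

def plot_spectrum(image: np.ndarray) -> None:
    """Depict the DCT coefficients as a heatmap; log scaling keeps small coefficients visible."""
    coeffs = dct2(image.astype(np.float64))
    plt.imshow(np.log(np.abs(coeffs) + 1e-12), cmap="viridis")
    plt.colorbar(label="log |DCT coefficient|")
    plt.title("DCT spectrum")
    plt.show()
```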
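
And as a sketch of the classification step, the following compares a ridge classifier trained on flattened DCT spectra with one trained on raw pixels, reusing the dct2 helper from the previous sketch. The directory layout, file format, and train/test split are assumptions based on the dataset described above.

```python
# Sketch of the classification step: a ridge classifier on flattened DCT spectra
# versus raw pixels. Directory names and the 80/20 split are illustrative assumptions.
import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

def load_images(folder: str, label: int, size=(128, 128)):
    """Load grayscale images from a folder and attach a class label (0 = real, 1 = fake)."""
    xs, ys = [], []
    for path in sorted(Path(folder).glob("*.png")):
        img = Image.open(path).convert("L").resize(size)
        xs.append(np.asarray(img, dtype=np.float64))
        ys.append(label)
    return xs, ys

real_x, real_y = load_images("data/real", label=0)   # FFHQ thumbnails
fake_x, fake_y = load_images("data/fake", label=1)   # StyleGAN samples
images = np.stack(real_x + fake_x)
labels = np.array(real_y + fake_y)

# Two feature sets: raw pixels and DCT spectra (dct2 from the previous sketch).
raw_features = images.reshape(len(images), -1)
dct_features = np.stack([dct2(img) for img in images]).reshape(len(images), -1)

for name, X in [("raw pixels", raw_features), ("DCT", dct_features)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = RidgeClassifier(alpha=1.0).fit(X_tr, y_tr)  # linear model with L2 regularization
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.4f}")
```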

RESULTS:

Figure: Fake images generated by StyleGAN

Detection of deepfake images improves markedly when the ridge classifier is given the DCT of the images rather than the raw images: the DCT picks up on artifacts present in the deepfake images generated by StyleGAN.

The classification accuracy is 100% when using the DCT of the images, compared to 54.75% on the original images.

Figure: A screenshot of the webpage

Our implementation can be found here: GitHub Repository

CONCLUSION:

In conclusion, this project represents a significant step in the ongoing effort to combat the proliferation of deepfake technology and its potential negative impact on society. By combining DCT-based frequency analysis with a learning-based classifier, we have demonstrated a promising approach for accurately classifying deepfake images.

REFERENCES:

1. Leveraging Frequency Analysis for Deep Fake Image Classification

2. Implementation of the above-mentioned paper
