Virtual Expo 2024

Stegosaurus - Hiding in Plain Sight

Envision
CompSoc

Stegosaurus - Hiding in Plain Sight

Meeting Link
 

Mentees: Aditi Pandey, Calisto Mathias, Jovita Paul, Padmavati, Prabhav S Korwar, Rudransh, Sanstuti Mishra, Arjun Rijesh
Mentors: Prabhanjan Prabhu, Singaraju B V Sreedakshinya

 

INTRODUCTION

Steganography is the art of hiding messages so that no person can find out about the presence of a message. Compared to cryptography, the practice of protecting the contents of a message alone, steganography is concerned with concealing the fact that a secret message is being sent and its contents. The project aimed to make a command line steganography tool for Linux-based operating systems that is capable of encrypting and hiding messages/files within other files (images, audio files) using C++.

 

LITERATURE SURVEY / CONCEPTUAL REVIEW

The main components of the project include LZW Compression and Decompression, Steganography in WAV files and PNG files, and the AES Encryption Scheme, which have been discussed in this section.

  • LZW (Lempel–Ziv–Welch) Compression / Decompression:

LZW compression is widely used in formats like GIF, PDF, and TIFF for its ability to save space by identifying and encoding repeated patterns. It works by grouping symbols into strings and converting them into codes, which take up less space than the original strings. A code table is used, typically with 4096 entries. Codes 0-255 represent single bytes, while codes 256-4095 represent sequences of bytes. As encoding progresses, LZW adds repeated sequences to the code table. Decoding involves translating each code to find the corresponding character or characters. This method extends the library beyond the typical 8-bit ASCII characters, allowing for up to 12 bits per character. While it may not compress short, diverse strings efficiently, it excels at compressing redundant data. Additionally, LZW can compress and decompress data without saving the dictionary separately.

During LZW decompression, the decompressor reconstructs the same string table that was used during compression. It initialises the first 256 table entries to single characters. As it processes each character in the input stream, except the first one, it updates the string table accordingly. Decoding occurs by reading codes from the compressed data and translating them through the evolving code table. This process ensures that the original data is faithfully reconstructed during decompression, allowing for lossless recovery of the compressed information.

  • Advanced Encryption Standard- 128

AES-128, also known as the Rijndael Cipher, was introduced as a replacement for the earlier-used Data Encryption Standard (DES), which became vulnerable to brute force attacks due to advancements in computational power. Recognising that 56-bit keys were increasingly insecure, NIST organised a competition in which the Rijndael cipher was selected as the winner. To date, AES remains one of the most secure encryption algorithms available, even in the face of potential quantum computer threats. This is due to its symmetric architecture. According to NIST, given certain mitigating factors, it is highly probable that Grover's algorithm will offer minimal or no benefit in compromising AES. Consequently, AES-128 is expected to maintain its security for many years ahead.

Steps Involved in AES-128 Encryption:

  • Preparing the plaintext: The AES encryption algorithm works by encrypting 16-byte blocks at a time. Therefore, if the plaintext is longer than 16 bytes, the plaintext is padded using a suitable padding scheme such as PKCS#7 and then broken into blocks of 16 bytes each. Each of these blocks is then encrypted separately.
  • Key Expansion: Key expansion is a key initial step in the AES-128 encryption procedure. This particular step involves the generation of 10 expanded 16-byte keys from the initial 16-byte key that was given as input to the encryption algorithm in the case of the 128-bit AES algorithm. The initial key is labelled K0, while the 10 generated keys are labelled K1 to K10.
  • Addition of Round Key: The data block is XORed with the key corresponding to the round of encryption.
  • Shifting of Rows: In the AES-128 encryption process, each data block, organised as a 4 * 4 grid, undergoes a specific shifting sequence. This involves cyclically shifting each nth row to the left (n-1) times.
  • Substituting Bytes: In this step, each byte of the data block is replaced with a value from a pre-determined substitution lookup table. The state of each byte gives the corresponding index for the lookup table, and the value at that index in the lookup table is then used to replace the said byte in the data block.
  • Mix Columns: The MixColumns operation in AES encryption is a critical step for enhancing data diffusion. In binary contexts, these additions simplify to straightforward XOR operations. This procedure can be streamlined to referencing values in certain lookup tables thanks to pre-calculated steps. This step effectively shuffles the data within each column, mixing the influence of each byte on the final output and improving the overall confusion of the encryption process.

Steps Involved in AES-128 Decryption

  • Key Expansion: Key expansion is a key step in the AES-128 Decryption procedure. It is the process of generating an expanded set of 128-bit keys from the initial 128-bit key that is given as input to the decryption algorithm. In the case of the 128-bit variant of AES, we generate 10 additional keys from the initially provided key. We label these keys K0 to K10, K1 to K10 being the generated keys.
  • Addition of Round Key: In AES-128 decryption, data blocks undergo multiple rounds where specific operations are executed. One key step in this process is the addition of a round key to the data blocks. Since 11 keys are used in the AES-128 decryption algorithm, each round involves incorporating one of these round keys into the data.
  • Inverse Shifting of Rows: Conversely, during the AES-128 decryption process, each block of encrypted data is similarly adjusted but in the opposite direction. Here, each nth row is cyclically shifted to the right (n-1) times.
  • Inverse Substitute of Bytes: During this phase of the decryption process, each byte within the data is replaced with a corresponding byte from pre-determined inverse S-box lookup tables. These tables, which have been standardised, provide the specific substitution bytes needed for the decryption.
  • Inverse Mix Columns: The inverse mix columns step is arguably the most complex part of the encryption/decryption process. It involves multiplying the data grid, treated as a matrix, by a specific inverse matrix that corresponds to the one used in the Mix Column step of the encryption process. This operation includes both multiplication and addition within binary Galois fields. In binary contexts, these additions simplify to straightforward XOR operations. This procedure can be streamlined to referencing values in certain lookup tables thanks to pre-calculated steps.
  • Steganography in WAV files

WAV files consist of multiple segments of data, each with a specific function. The primary segment for steganography within WAV files is the audio data segment, which contains the actual sound waveform. Modifying this segment enables the concealment of information within the audio data without perceptibly altering the audible characteristics of the sound.

Steps followed for extracting hidden information from the WAV file:

  • Segment Iteration: The process begins by iterating through the segments of the WAV file until it reaches the audio data segment, which represents the waveform of the sound.
  • Analysis of Encoding Parameters: Within the audio data segment, the algorithm examines the encoding parameters, such as sample rate and bit depth. These parameters may indicate the presence of encoding techniques or additional layers of security applied to the hidden message.
  • Message Size Extraction: Following the analysis of encoding parameters, the algorithm extracts the size of the hidden message encoded within the audio data segment. This size information is critical for accurately extracting the hidden message from the audio data.
  • Hidden Message Extraction: Utilising the extracted size information, the algorithm decodes the hidden message from the audio data segment. The hidden message is typically encoded within the least significant bits of the audio samples, ensuring minimal perceptual impact on the sound itself.
  • Message Processing: Once the hidden message is successfully extracted, it can be further processed or stored as required by the application. This may involve decryption or additional analysis, depending on the nature of the hidden data.

Hiding information in WAV files involves altering the least significant bits of the audio samples to encode the hidden message.

The process involves:

  • Segment Iteration: The process starts by parsing the WAV file and iterating through its segments. Each segment serves a specific function, with the audio data segment being the primary focus for hiding information.
  • Identification of Audio Data Segment: The algorithm locates the audio data segment, which contains the sound waveform. This segment is where the bits alteration is done to hide the information.
  • Analysis of Encoding Parameters: Within the audio data segment, the algorithm examines the encoding parameters such as sample rate, bit depth, and compression techniques. 
  • Message Embedding: After analyzing the encoding parameters, the algorithm determines the number of least significant bits available for embedding without causing noticeable distortion in the audio. The least significant bits of each audio sample are replaced with the bits of the hidden message.
  • Hidden Message Size: Before embedding the hidden message, the algorithm ensures that the message size does not exceed the capacity of the available LSBs in the audio data segment. If necessary, the message may undergo compression to fit within the available space.
  • Embedded Message Encoding: The hidden message is encoded into the least significant bits of the audio samples.
  • Optional Encryption: To enhance security, the hidden message may be encrypted before embedding it into the audio data segment. This prevents unauthorized access to the concealed information.
  • Message Processing: Once the hidden message is successfully embedded, the modified audio data segment is reconstructed to form the modified WAV file
  • Steganography in PNG files

PNG files comprise multiple chunks of data, each serving a specific purpose. The critical chunk for steganography within PNG files is the IDAT (Image Data) chunk, which contains the actual pixel information that forms the image. Manipulating the IDAT chunk allows for the concealment of information within the image data without noticeably altering the visual appearance of the image.

Steps followed for extracting hidden information from the PNG file:

  • Chunk Iteration: The code begins by iterating through the chunks of the PNG file until it encounters the IDAT chunk, which signifies the start of the image data.
  • Reading Encryption and Compression Bits: Within the IDAT chunk, the implementation reads the encryption and compression bits. These bits may be used to determine the presence of encryption or compression applied to the hidden message, providing additional layers of security.
  • Message Size Extraction: Following the encryption and compression bits, the implementation extracts the size of the hidden message encoded within the IDAT chunk. This size information is crucial for properly extracting the hidden message from the image data.
  • Hidden Message Extraction: Using the extracted size information, the algorithm proceeds to decode the hidden message from the IDAT chunk. The hidden message is encoded within the least significant bits of each byte in the image data, ensuring minimal visual impact on the image itself.
  • Message Processing: Once the hidden message is successfully extracted, it can be further processed or stored as required by the application. This may involve decryption or additional analysis, depending on the nature of the hidden data.

Steps followed for hiding information in the PNG file:

  • Chunk Iteration: The code begins by iterating through the chunks of the PNG file until it encounters the IDAT chunk, which signifies the start of the image data.
  • Updating the Metadata: The IDAT chunk which consists of bits indicating whether the message is encrypted or not, whether the message is compressed or not and the size of the hidden message are updated accordingly.
  • Hiding / Insertion of the Message: The message is hidden by overwriting some specific bits of the file with the bits of the message.
  • CRC Updation: The CRC bytes must be updated in order to match the new contents of the file. This ensures that the file does not get corrupted.

 

IMPLEMENTATION

The tool was implemented as an integration of 8 main subsections, i.e., Compression and Decompression of the message, Encryption and Decryption of the messages, Hiding and Extraction of the message from both wav files and png files, each of which are explained below.

 

  • LZW Compression / Decompression:

Lzw Compression:

The data from messageBuffer is obtained by reading fixed 12 bits. The idea of the compression algorithm is as follows: as the input data is being processed, a dictionary keeps a correspondence between the longest words encountered and a list of code values. The words are replaced by their corresponding codes and so the input file is compressed. Therefore, the algorithm's efficiency increases as the number of long, repetitive words in the input data increases.

LZW Decompression:

The code takes encoded data from messageBuffer and turns it back into the original message using a dictionary named dictionary. It starts by reading numbers that represent parts of the message and storing them in a vector named codes_of_dictionary. These codes are read from messageBuffer by reading 12 fixed bits. These numbers are like codes that stand for different parts of the message. Then, it looks up each code in the dictionary to find the corresponding part of the message. As it goes through the data, it keeps adding new parts to the dictionary to understand the message better. After decoding, it puts the original message back into messageBuffer, giving us the uncompressed message.

 

  • AES Encryption / Decryption

AES-128 Encryption:

The AES encryption segment was implemented by writing functions for each step involved, which were essentially helper functions to the main AES_encrypt function. Within the main AES_encrypt function, each of these functions was called sequentially as per the algorithm.

The plaintext was first obtained from the message buffer and then padded using the PKCS#7 padding scheme. The resulting padded plaintext was broken into blocks of 16 bytes each and stored in a 2D array. Each member of this array is a message block of 16 bytes, and each of these blocks is made to undergo encryption.

The key expansion step is also an elaborate process. First, a 44 * 4 array was created, and the key given as input was filled into this array, i.e. the first four rows were filled with the given key.

This key is made to undergo operations that create the remaining 10 keys. Finally, once all 44 rows are filled, the array is flattened to a 1D array of 176 elements and an 11 * 16 array is created and filled with the elements from the 1D array, thereby giving an array of 11 keys.

After this step, the remaining operations are carried out as per the encryption algorithm, and the resulting ciphertext is pushed back into the message buffer.

AES-128 Decryption:

The AES-128 Decryption algorithm has been implemented with a Decryptor class containing all the functions and instance variables meant for storing data. Each instance of this Decryptor encapsulates all the information and functions required for decryption. Every function implemented is a utility function called in the specific order dictated by the AES-128 Decryption algorithm.

Initially, the message is read through the message_buffer and divided into blocks of 16 bytes. These blocks are then decrypted individually using the Cipher Block Chaining method. The final block is depadded of the PKCS#7 padding applied during encryption. Subsequently, the decrypted text is fed back into the message_buffer for further use by other programs.

Many steps of the decryption process closely resemble those of encryption. The Key expansion step remains identical to the encryption process. The remaining functions effectively achieve the exact inverse of their encryption counterparts. During key expansion, an additional 10 keys (each of 4 words where 1 word is equivalent to 4 bytes) are generated. These words are stored in a 2D array and recalled using a utility function. These keys are crucial during decryption and are fed into the algorithm in reverse order (Key 10 to Key 0).

The remaining functions are called sequentially according to the decryption algorithm, with each decrypted block added to the message buffer as well.

 

  • Extracting from and Hiding in WAV files

The algorithm for reading a message from a WAV file was implemented by iterating through the input buffer until the data subchunk was reached. Initially, the 12 bytes representing the WAV header were skipped. Then, 8 bytes were read to identify the type and size of each subchunk. If the type was not "data," the algorithm skipped ahead by the size of the subchunk. Upon reaching the data subchunk, the first data byte was read, and encryption and compression statuses were determined based on the least significant bits. The size of the message was then extracted from the subsequent 16 bytes. For each set of 4 input bytes, the algorithm isolated the last two bits from each byte and constructed the message byte accordingly, shifting it left by 2 bits after each extraction. This process was repeated sz times to obtain the complete message. Finally, each message byte was stored in the message buffer.

The code for hiding a message in WAV files is implemented by initializing a pointer variable to keep track of the current position within the input data. It appends the first 12 characters of the input buffer to the output buffer. It then enters a while loop that continues until the pointer reaches the end of the input buffer. Within each iteration, it reads the subchunk_id and subchunk_size from the input buffer. If the subchunk_id is not "data", it copies it along with its size and content directly to the output buffer. If the subchunk_id is "data" (indicating the data subchunk), it proceeds with embedding the message into this subchunk. It first writes "data" to the output buffer followed by the modified subchunk_size. It then sets the first_byte based on encryption and compression flags. Next, it calculates the size of the message to be embedded and writes it to the output buffer. Then it iterates over each byte of the message, extracting two bits at a time and embedding them into the least significant bits of each byte of the audio file. After processing each subchunk, the pointer is updated to move to the next subchunk.

 

  • Extracting from and Hiding in PNG files

A separate section of the code is designed to extract a hidden message from a PNG file. The first encryption bit and compression bit are initialised to 0 to store information about the encryption and compression of the extracted message. Once that is done, the function iterates through the PNG file, which is represented as a string (input_str) starting from index 8, which typically indicates the beginning of the PNG data chunks. Then, each chunk's size is extracted from the PNG file. If the chunk is of type "IDAT" (image data), the function proceeds to extract the hidden message. Once that is done, the function extracts the size of the hidden message encoded within the PNG file. This size is stored in messageSize. Then, using nested loops, the function decodes each byte of the hidden message from the image data. Within the inner loop, it extracts 2 bits from each byte of the image data and assembles them into a message byte. Each decoded message byte is stored in a buffer (message_buffer) for later retrieval. Once the entire hidden message is extracted, the loop breaks and the function exits.

The code for hiding a message in a given PNG file is implemented by maintaining a pointer in the input buffer to keep track of the data read till that point. The first 8 bytes (PNG header) are written directly to the output buffer. The following 8 bytes, which signify the size and type of the chunk, are read. If the type is not of the "IDAT" type, the entire chunk is written to the output buffer without any modifications. If the chunk is of type "IDAT", then the data is hidden in the data portion of the chunk. The 2 least significant bits of the first byte are modified to indicate whether the message has been encrypted or compressed respectively. The following 16 bytes of the data section are hidden with the size of the message which is being hidden. Then, using a loop to iterate over the entire message or the data size, 1 byte of the message buffer is hidden in 4 input bytes. Once this is done, the CRC of the file is updated to match the new contents of the file, and the function exits once this is completed. 

 

GITHUB REPOSITORY

The source codes for the Command-Line Interface tool can be found here.

 

CONCLUSION AND FUTURE WORK

This project provided a great opportunity for the mentees to learn about steganography, AES (a popular encryption scheme), compression and decompression of files and hiding and extracting contents from files. The CLI tool made during the project encrypts and hides the user-given message in the audio/image file provided by the user. The tool can also extract and decrypt the message hidden in any given audio/image file.

Possible future work in this area can be applying the same techniques to file types other than .wav and .png files, and trying different encryption schemes for protecting the data.

METADATA

Report prepared on May 9, 2024, 11:24 p.m. by:

  • Singaraju B V Sreedakshinya [CompSoc]
  • Prabhanjan Bolanthur Prabhu [CompSoc]

Report reviewed and approved by Aditya Pandia [CompSoc] on None.

Check out more projects!