Hand Gesture Controlled Virtual Mouse using ESP32-CAM
In this project we aim to design a virtual mouse controlled by hand gestures. The project uses an ESP32-CAM microcontroller board fitted with an OV2640 camera module. The microcontroller captures the hand gestures, which are then processed and identified through image processing with OpenCV. The mouse can be controlled wirelessly and performs tracking and clicking operations.
Human-computer interaction (HCI) plays a pivotal role in modern computing systems, shaping the way users interact with technology. Gone are the days of cumbersome peripherals: our system offers an alternative that empowers users to interact with technology in a natural and intuitive manner. With hands-free control, users can seamlessly navigate interfaces, manipulate objects, and execute commands with ease. Accessibility is a must, and such a system provides inclusivity for users with mobility impairments. With ongoing advancements in hardware and software, we envision a world where gesture-based interaction becomes the norm. This project hopes to capture and demonstrate those futuristic ideas and scenarios. The stepping stone to implementing and experiencing such technology is hands-free control of the machines closest to us, our computers. The objective is to develop hands-free control of the mouse cursor on a PC using an ESP32-CAM to capture hand gestures, broadcasting its camera feed over Wi-Fi to be processed by an external computer running Python scripts and libraries, resulting in contactless control over the mouse cursor using only your hand gestures.
The project consists of two main parts: the ESP32-CAM embedded system and image processing using Python.
The ESP32-CAM is a microcontroller board that we use to record continuous video and stream it to a web server created using the board's Wi-Fi capabilities.
The streamed video is received on a PC, where the image-processing part of the project starts.
The image processing consists of writing Python code (using OpenCV libraries) to identify hands and fingers in the streamed video and recognize hand gestures, which wirelessly control the mouse on the screen of the laptop to which the video is streamed.
The functionalities of the virtual mouse include movement and clicking.
ESP32-CAM is a low-cost, compact ESP32-based development board with an onboard camera. The board integrates Wi-Fi, classic Bluetooth, and low-power BLE, with two high-performance 32-bit LX6 CPUs. It adopts a 7-stage pipeline architecture and includes on-chip sensors such as a Hall sensor and a temperature sensor, and its main frequency is adjustable from 80 MHz to 240 MHz. Fully compliant with the Wi-Fi 802.11b/g/n/e/i and Bluetooth 4.2 standards, it can operate in master mode as an independent network controller, or as a slave to another host MCU to add networking capability to an existing device. The ESP32-CAM can be widely used in IoT applications such as smart home devices, industrial wireless control, wireless monitoring, wireless QR identification, and wireless positioning systems, making it an ideal solution for IoT applications, prototype construction, and DIY projects.
Some of its important features are:
Up to 160 MHz clock speed, with computing power up to 600 DMIPS
Built-in 520 KB SRAM, plus 4 MB of external PSRAM
Supports UART/SPI/I2C/PWM/ADC/DAC
Supports OV2640 and OV7670 cameras, with a built-in flash lamp
Supports image upload over Wi-Fi
The FT232RL is a USB-to-serial UART interface converter chip manufactured by FTDI (Future Technology Devices International). It allows serial communication between a computer's USB port and other devices using UART communication protocol.
Its key features include:
USB Interface: Connects to a computer's USB port.
UART Interface: Provides serial communication to connected devices.
Integrated EEPROM: Stores vendor-specific information such as product ID, serial number, etc.
FT232RL is commonly used in projects involving microcontroller programming, serial communication with external devices, and USB-to-serial conversion.
The micro USB cable serves as the physical interface between the FT232RL FTDI module and the computer. When connected to a computer's USB port, the micro USB cable provides power to the FT232RL FTDI module and establishes a serial communication link between the module and the computer allowing it to function and communicate.
Both serve as a semi-permanent platform for implementing the hardware, allowing for ease of testing and debugging.
To connect the ESP32-CAM to the FT232RL FTDI, follow these steps:
Identify Pinout:
ESP32-CAM: Locate the GPIO pins used for UART communication. These are labeled U0R (UART receive) and U0T (UART transmit).
FT232RL FTDI: Identify the TX (transmit), RX (receive), and GND (ground) pins.
Connect TX to RX and RX to TX as per UART protocol:
Connect the U0T pin of the ESP32-CAM to the RX pin of the FT232RL FTDI.
Connect the U0R pin of the ESP32-CAM to the TX pin of the FT232RL FTDI.
Connect Ground:
Connect the GND pin of the ESP32-CAM to the GND pin of the FT232RL FTDI.
Provide Power:
Ensure both devices are powered adequately.
Connect the 5V pin of the ESP32-CAM to the VCC (5V) pin of the FT232RL FTDI, and short the GPIO 0 pin to the GND pin on the ESP32-CAM to put the board into flashing mode.
Connect USB to Computer:
Connect the USB port of the FT232RL FTDI to a USB port on your computer.
The micro USB cable powers the FT232RL FTDI module and establishes communication with the computer.
The ESP32CAM is programmed using the Arduino IDE.
Using the "esp32cam" and "WiFi" Arduino libraries, a web server is set up on the local network (given the network's SSID and password), and an IP address is assigned to the server.
The web server serves HTML pages and is programmed to capture and stream the image and video output in different file formats and resolutions.
The IP address of the web server is displayed onto the serial monitor of the Arduino IDE and is used to integrate the web server into the python script.
Install the necessary Python libraries, including OpenCV (cv2), NumPy (numpy), keyboard, autopy, and possibly others depending on your specific requirements (time is part of the standard library). You can use pip to install these libraries, e.g.:
pip install opencv-python numpy keyboard autopy
Import the required libraries at the beginning of your Python script:
import cv2
import numpy as np
The gesture recognition works by continuously capturing frames from the video stream provided by the ESP32-CAM, either with OpenCV's cv2.VideoCapture() function or by fetching stills from the camera web server in quick succession via the urllib request module, which together form an effective video feed.
The image is passed through MediaPipe's pretrained hand-detection model to detect a single hand, along with the digits and joints (landmarks) present on it.
Once the hand is detected, the hand region is analyzed to recognize specific gestures. This can involve techniques such as contour analysis, convex hull calculation, or template matching to identify predefined gestures (e.g., gestures for cursor movement, clicking, or scrolling).
Recognized gestures are mapped to cursor movements on the screen by setting conditions on the distances between the landmarks detected by the MediaPipe model. The position of the hand within the frame determines the movement direction and speed of the cursor.
The implementation is as follows:
When only the index finger is raised: the mouse cursor follows the position of the hand.
When the index and middle fingers are raised together: the system waits for the distance between the two fingertips to drop below a threshold before sending the left-click signal.
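The mapping and click logic above can be sketched as pure functions. The frame size, margin, smoothing-free mapping, and click threshold are all assumptions for illustration; in the real script the screen size would come from autopy.screen.size() and the fingertip positions from the MediaPipe landmarks (index 8 = index fingertip, 12 = middle fingertip).

```python
import math

import numpy as np

FRAME_W, FRAME_H = 640, 480      # camera frame size (assumed)
SCREEN_W, SCREEN_H = 1920, 1080  # autopy.screen.size() in the real script
MARGIN = 100                     # dead border so the hand stays in frame
CLICK_THRESHOLD = 40             # fingertip distance (px) that triggers a click

def to_screen(x, y):
    """Map a fingertip position inside the frame to screen coordinates."""
    sx = np.interp(x, (MARGIN, FRAME_W - MARGIN), (0, SCREEN_W - 1))
    sy = np.interp(y, (MARGIN, FRAME_H - MARGIN), (0, SCREEN_H - 1))
    return sx, sy

def is_click(index_tip, middle_tip):
    """Left click when index and middle fingertips pinch together."""
    return math.dist(index_tip, middle_tip) < CLICK_THRESHOLD

# In the main loop the result would drive autopy, e.g.:
#   autopy.mouse.move(sx, sy)   when only the index finger is up
#   autopy.mouse.click()        when index + middle are up and is_click(...) holds
```

The margin keeps the usable region inside the frame so the cursor can reach the screen edges without the hand leaving the camera's view.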
In summary, the video captured by the ESP32-CAM is streamed over Wi-Fi to your Python environment, where it's processed using OpenCV and mediapipe for hand gesture recognition and cursor control. This integration allows you to leverage the capabilities of both the ESP32-CAM and Python/OpenCV to create a simple gesture-based interface.
Frame Rate: The virtual mouse system achieved a good frame rate during testing, indicating smooth and responsive video streaming and gesture recognition.
Latency: The system demonstrated low latency, with an average gesture-to-cursor delay on the order of milliseconds, ensuring real-time interaction with minimal delay.
Accuracy: Gesture recognition algorithms exhibited high accuracy, correctly identifying and recognizing predefined hand gestures.
Precision: Cursor movements closely mirrored hand movements, demonstrating precise control and alignment with user intentions.
Responsiveness: Cursor movements were highly responsive to hand gestures, with minimal lag or delay observed during testing.
Enhancements: Future enhancements for the virtual mouse system may include incorporating more gestures, multi-hand recognition, and better detection models, as well as expanding compatibility with other devices and platforms.
Throughout the project, we successfully integrated hardware components, software libraries, and algorithms to create a versatile and intuitive interface for cursor control without the need for a physical mouse.
The implementation of the ESP32-CAM module provided a reliable platform for capturing video streams and streaming them over Wi-Fi, enabling seamless communication with the Python environment. By leveraging OpenCV's powerful image processing capabilities, we were able to detect and recognize hand gestures in real-time, translating them into precise cursor movements and actions.
The integration of hand gesture recognition with cursor control opens up new possibilities for interactive applications, accessibility features, and user interfaces. This project not only demonstrates the potential of combining hardware and software technologies but also highlights the importance of human-centric design principles in creating intuitive and user-friendly systems.
Moving forward, further optimization and refinement of the virtual mouse system can enhance its performance, accuracy, and usability. This includes fine-tuning gesture recognition algorithms, optimizing network communication, and implementing additional features such as multi-gesture support and customizable gestures.
Report prepared on May 5, 2024, 9:32 p.m. by:
Report reviewed and approved by Aditya Pandia [CompSoc] on May 9, 2024, 10:49 p.m.