Virtual Expo 2024

Bipedal Robot Simulation using MATLAB

Envision
Diode
Piston

Objectives

  • Prime objective: Design and simulate a walking bipedal robot using MATLAB
  • Academic goal: Develop skills in using MATLAB, Simulink, and Simscape Multibody
  • Outcome: Ability to model complex systems using the knowledge gained from this project

Mentees

  • Tejhuram Ravichandran
  • Aryan
  • Rishadd Ranjith
  • Karthikeswar Nadh

Introduction

Bipedal robots designed to mimic the locomotion of humans are making significant strides in various sectors of society. They’re used in disaster recovery where human access is risky, and in healthcare for rehabilitation or assisting the elderly. In entertainment, they’re used for realistic animations or theme park attractions. Simulating these robots using tools like MATLAB is beneficial as it allows for testing and optimization in a virtual environment before physical prototyping. This can save time, reduce costs, and improve the robot’s performance and safety. Thus, the advancement of bipedal robots, coupled with simulation tools, holds promising potential for societal benefits.

Software Used

  • MATLAB
  • Simulink
  • Simscape Multibody

Literature Review

MATLAB

MATLAB is a high-level programming language and environment designed for numerical computation. It combines computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Simulink, a companion application to MATLAB, is an interactive system for modeling, simulating, and analyzing dynamic systems. It provides an interactive graphical environment and a customizable set of block libraries that let you design, simulate, implement, and test a variety of time-varying systems. MATLAB and Simulink are widely used in various fields such as engineering, physics, and finance for tasks like data analysis, algorithm development, modeling and simulation, and even for creating graphical user interfaces. They offer a powerful and flexible environment for solving complex problems and developing sophisticated applications.

Simscape Multibody

Simscape Multibody is a specialized MathWorks product used for simulating 3D mechanical systems such as robots, vehicle suspensions, and aircraft landing gear. It lets you model multibody systems using blocks representing bodies, joints, constraints, force elements, and sensors, and you can import complete CAD assemblies into your model. It formulates and solves the equations of motion for the complete mechanical system, and an automatically generated 3D animation lets you visualize the system dynamics. It is a powerful tool for developing control systems and testing system-level performance.

The 3D Linear Inverted Pendulum Mode: A simple modeling for a biped walking pattern generation - Shuuji Kajita et al.

This paper presents a new model for 3D walking control of a biped robot. The authors analyze the dynamics of a three-dimensional inverted pendulum, where motion is constrained to move along an arbitrarily defined plane. This analysis leads to a simple linear dynamics model, named the Three-Dimensional Linear Inverted Pendulum Mode (3D-LIPM). The paper discusses the geometric nature of trajectories under the 3D-LIPM and a method for walking pattern generation. It also includes a simulation result of a walking control using a 12 d.o.f. biped robot model. The 3D-LIPM simplifies the walking pattern generation significantly.

Three-Dimensional Bipedal Walking Control Based on Divergent Component of Motion - Johannes Englsberger et al.

This paper presents a method for real-time planning and control of Divergent Component of Motion (DCM) trajectories in 3D. The authors introduce the "Enhanced Centroidal Moment Pivot point" (eCMP) and the "Virtual Repellent Point" (VRP), which allow for the encoding of both direction and magnitude of the external forces and the total force acting on the robot. They address the problem of underactuation and propose methods to guarantee feasibility of the finally commanded forces. The capabilities of the proposed control framework are verified in simulations.

Methodology

Modelling the mechanics of the legs

Each individual leg is modelled using a combination of joints, rigid transforms and solid blocks. The bot starts by dropping onto the world plane, which is modelled as a long solid. The knee joints bend to a predetermined angle, lowering the torso (main body) of the bot. Each joint is actuated by feeding in an input from a matrix called jAngs through a Simulink-PS Converter block, which converts the Simulink signal into a physical signal. The calculation of the joint angles is elaborated in the Linear Inverted Pendulum section below. The bot then falls forward, and just then the swing foot is placed forward to stop the bot from falling onto the ground. This foot now acts as the anchor foot, allowing the other foot to swing forward. This sequence of steps is repeated to move the bot.

Friction between the world plane and the bot's feet is modelled using Spatial Contact Force blocks. This block models forces between solids using a penalty force: the force behaves like a stiff spring while the two bodies are in contact, and no force acts on the solids otherwise. Spatial Contact Force blocks are attached to four spheres placed on the frames at the corners of each foot to simulate friction between the feet and the world plane.
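
To make the penalty idea concrete, here is a minimal MATLAB sketch of a spring-damper contact law of the kind the block implements; the stiffness and damping values are illustrative assumptions, not the block's defaults.

% Penalty contact law: a stiff spring-damper acts only during penetration.
% k and c are illustrative assumptions, not the block's default values.
function Fn = contactNormalForce(penetration, penetrationRate)
    k = 1e4;    % contact stiffness [N/m], assumed
    c = 1e2;    % contact damping [N*s/m], assumed
    if penetration > 0
        Fn = k*penetration + c*penetrationRate;   % bodies in contact
    else
        Fn = 0;                                   % no force when separated
    end
end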

The torso is connected to the legs via two 6-DOF joints and contains no moving parts. It acts as the bob of the Linear Inverted Pendulum.

Linear Inverted Pendulum (LIPM) and joint angles calculation

The legs of the bot form a slightly more complicated version of a Linear Inverted Pendulum. The Linear Inverted Pendulum is a simple model used in dynamics and control theory. It represents a pendulum with its center of mass above its pivot point, making it inherently unstable. The LIPM has linear dynamics, which are useful for designing control algorithms and biped gaits in robotics. Stabilizing the LIPM requires active balance, often achieved by moving the pivot point horizontally as part of a feedback system. This model is a benchmark for testing control strategies.
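
Concretely, with the center of mass kept at a constant height z_c, the pendulum dynamics become linear; this is the equation (and its closed-form solution) that the 3D-LIPM of Kajita et al. builds on:

$$\ddot{x} = \frac{g}{z_c}\,x, \qquad x(t) = x(0)\cosh\frac{t}{T_c} + T_c\,\dot{x}(0)\sinh\frac{t}{T_c}, \qquad T_c = \sqrt{\frac{z_c}{g}}$$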

The LIPM's simulation can be observed in the LIPM app. Using this app, we determine the conditions that result in the most symmetric trajectory. We then find the initial conditions of the LIPM using the findInitialConditions function. This produces the trajectories of the center of mass (torso) and the foothold of the bot, along with the trajectories of the individual feet. We observe that the first step is slightly shorter than the later steps, as the bot must first fall forward to initiate motion.
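
The LIPM app and findInitialConditions are part of the project files, so their exact interfaces are not reproduced here; as a hedged stand-in, the following sketch integrates the LIPM from assumed initial conditions and plots the resulting center-of-mass trajectory for one step.

% Integrate the LIPM over one step from assumed initial conditions.
% zc, x0, dx0 and the step duration are illustrative, not project values.
g   = 9.81;               % gravity [m/s^2]
zc  = 0.6;                % constant CoM height [m], assumed
x0  = -0.1;  dx0 = 0.4;   % initial CoM position [m] and velocity [m/s], assumed

lipm = @(t, s) [s(2); (g/zc)*s(1)];     % state s = [x; xdot]
[t, s] = ode45(lipm, [0 0.5], [x0; dx0]);

plot(t, s(:,1)); xlabel('t [s]'); ylabel('x_{CoM} [m]');
title('LIPM center-of-mass trajectory for one step');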

Inverse kinematics allows us to calculate the joint angles from the feet trajectories relative to the torso. We use an analytic inverse kinematics solution for the leg based on Closed-Form Inverse Kinematic Joint Solution for Humanoid Robots by Ali et al. The function for inverse kinematics is called invKinBody2Foot. Its input is the feet transform matrices, which contain position and orientation information, and its output is the walking pattern: the resulting joint angles form the jAngs matrices for the left and right legs.
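
invKinBody2Foot itself is part of the project files; to illustrate the analytic, law-of-cosines flavour of such a solution, here is a sketch for a planar hip-knee leg (link lengths and the foot target are assumed values).

% Analytic planar 2-link leg IK (hip + knee) via the law of cosines.
% L1, L2 and the foot target (px, pz) are illustrative assumptions.
L1 = 0.4;  L2 = 0.4;         % thigh and shank lengths [m]
px = 0.1;  pz = -0.7;        % foot position relative to the hip [m]

r2    = px^2 + pz^2;                             % squared hip-to-foot distance
cKnee = (r2 - L1^2 - L2^2) / (2*L1*L2);
qKnee = acos(max(min(cKnee, 1), -1));            % knee flexion (0 = straight leg)
qHip  = atan2(px, -pz) - atan2(L2*sin(qKnee), L1 + L2*cos(qKnee));

fprintf('hip = %.3f rad, knee = %.3f rad\n', qHip, qKnee);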

Simulating Actuation

So far, we have been directly providing motion profiles to the joints in the form of Simulink signals. To model the closed-loop system of a robot's sensors and actuators, we used actuator profiles from MATLAB's File Exchange. These provide three methods that use different physical sensors and actuators to simulate what a real robot would do.

The first method does not involve any controllers and actuators, while the second method involves creating torque profiles and supplying them to the joints to simulate a servo. The third method uses Simscape Electrical models to simulate the robot using mechatronic models.
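
As a hedged sketch of the second method's idea, a PD law can turn a reference angle from jAngs into a joint torque, mimicking a servo; the gains below are assumptions, not the File Exchange values.

% PD "servo" law: convert a reference joint angle into a torque command.
% Kp and Kd are illustrative gains, not the File Exchange profile values.
Kp = 50;    % proportional gain [N*m/rad], assumed
Kd = 2;     % derivative gain [N*m*s/rad], assumed

servoTorque = @(qRef, q, qdot) Kp*(qRef - q) - Kd*qdot;

% Example: joint lagging 0.1 rad behind the reference while moving at 0.5 rad/s
tau = servoTorque(0.3, 0.2, 0.5);    % 50*0.1 - 2*0.5 = 4 N*m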

Control (Reinforcement Learning):

The goal is to make the robot walk in a straight line using minimal control effort. Reinforcement learning, using the MATLAB Reinforcement Learning Toolbox, offers an easier way to perform this task and achieve the goal.

Agent - Environment Interaction Map

1) Robot Model:

In the 3D model:

  • In the neutral 0 rad position, both of the legs are straight and the ankles are flat.

  • The foot contact is modeled using the Spatial Contact Force (Simscape Multibody) block.

  • The agent can control 3 individual joints (ankle, knee, and hip) on both legs of the robot by applying torque signals from -3 to 3 N·m. The actual computed action signals are normalized between -1 and 1.
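
A one-line illustration of this scaling (the action vector is an assumed example):

% Map normalized agent actions in [-1, 1] to joint torques in [-3, 3] N*m.
torqueLimit = 3;                          % [N*m]
action  = [0.5 -1 0.2 0 0.8 -0.3];        % example normalized actions, assumed
torques = torqueLimit * action;           % torques applied to the six joints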

The environment provides the following 29 observations to the agent.

  • Y (lateral) and Z (vertical) translations of the torso center of mass. The translation in the Z direction is normalized to a range similar to the other observations.

  • X (forward), Y (lateral), and Z (vertical) translation velocities.

  • Yaw, pitch, and roll angles of the torso.

  • Yaw, pitch, and roll angular velocities of the torso.

  • Angular positions and velocities of the three joints (ankle, knee, hip) on both legs.

  • Action values from the previous time step.

The episode terminates if either of the following conditions occurs.

  • The Z position of the robot's torso center of mass falls below 0.1 m (the robot falls), or the torso moves more than 1 m in either Y direction (the robot drifts too far to the side).

  • The absolute value of the roll, pitch, or yaw of the torso exceeds 0.7854 rad.

The following reward function rt, which is provided at every time step, is inspired by [2].

 

$$r_t = v_x - 3y^2 - 50\hat{z}^2 + \frac{25\,T_s}{T_f} - 0.02\sum_i u_{i,t-1}^2$$

 

Here:

  • v_x is the robot's translation velocity in the X direction (forward, toward the goal).

  • y is the lateral translation displacement of the robot from the target straight-line trajectory.

  • ẑ is the normalized vertical translation displacement of the robot's center of mass.

  • u_{i,t-1} is the torque on joint i from the previous time step.

  • T_s is the sample time of the environment.

  • T_f is the final simulation time of the environment.

This reward function encourages the agent to move forward by providing a positive reward for positive forward velocity. It also encourages the agent to avoid early episode termination by delivering a constant reward (25 T_s/T_f) at every time step. The remaining terms penalize substantial lateral and vertical displacements and the use of excess control effort.
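
A hedged MATLAB transcription of this reward, term by term (the variable names are illustrative placeholders, not the environment's actual signal names):

% Reward for one time step; all input names are illustrative placeholders.
% vx: forward velocity, y: lateral displacement, zhat: normalized vertical
% displacement, uPrev: previous-step action vector, Ts/Tf: sample/final time.
rewardFcn = @(vx, y, zhat, uPrev, Ts, Tf) ...
    vx - 3*y^2 - 50*zhat^2 + 25*Ts/Tf - 0.02*sum(uPrev.^2);

r = rewardFcn(0.5, 0.02, 0.01, [0.3 -0.2 0.1 0 0.4 -0.1], 0.025, 20);  % example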

2) Agent:

The agent is chosen by comparing the performance of the DDPG and TD3 algorithms against the task requirements. Since we are working in simulation, we test both agents (DDPG and TD3) and weigh their pros and cons. Each agent has been trained for 3000 episodes, with each episode lasting at most maxSteps time steps.

- DDPG: DDPG agents use a parametrized deterministic policy over continuous action spaces, learned by a continuous deterministic actor, together with a parametrized Q-value function approximator that estimates the value of the policy. Neural networks model both the policy and the Q-value function.

- TD3: The critic of a DDPG agent can overestimate the Q value. Since the agent uses the Q value to update its policy (actor), the resulting policy can be suboptimal, and training errors can accumulate and lead to divergent behavior. The TD3 algorithm is an extension of DDPG with improvements that make it more robust by preventing overestimation of Q values.

  • Two critic networks — TD3 agents learn two critic networks independently and use the minimum value function estimate to update the actor (policy). Doing so prevents accumulation of error in subsequent steps and overestimation of Q values.

  • Addition of target policy noise — Adding clipped noise to the target actions smooths the Q-value estimates over similar actions. Doing so prevents the policy from exploiting an incorrectly sharp peak in a noisy value estimate.

  • Delayed policy and target updates — For a TD3 agent, delaying the actor network update allows more time for the Q function to reduce its error (get closer to the required target) before the policy is updated. Doing so reduces variance in the value estimates and results in a higher-quality policy update. This example sets the policy and target update frequency options to 1 so as to better compare the impact of the two critic networks and target policy noise between the DDPG and TD3 agents.
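
A minimal sketch of configuring such a TD3 agent with the Reinforcement Learning Toolbox, using default networks built from the observation and action specifications described above (the discount factor is an assumed value, and Ts is the environment sample time defined elsewhere):

% Default TD3 agent from the observation/action specs described above.
obsInfo = rlNumericSpec([29 1]);                 % 29 observations
actInfo = rlNumericSpec([6 1], ...
    LowerLimit=-1, UpperLimit=1);                % 6 normalized joint torques

agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts, ...                 % environment sample time
    DiscountFactor=0.99, ...           % assumed value
    PolicyUpdateFrequency=1, ...       % policy/target frequencies set to 1,
    TargetUpdateFrequency=1);          % as noted above

agent = rlTD3Agent(obsInfo, actInfo, agentOpts); % default actor/critic networks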

Training Code Snippet:

maxEpisodes = 3000;          % total training episodes
maxSteps = floor(Tf/Ts);     % steps per episode (simulation time / sample time)
trainOpts = rlTrainingOptions(...
    MaxEpisodes=maxEpisodes,...
    MaxStepsPerEpisode=maxSteps,...
    ScoreAveragingWindowLength=250,...   % window for the averaged reward curve
    Verbose=false,...
    Plots="training-progress",...
    StopTrainingCriteria="EpisodeCount",...
    StopTrainingValue=maxEpisodes,...    % train for the full episode budget
    SimulationStorageType="file");
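
Training is then launched with the toolbox's train function, where agent and env are the agent and environment objects created earlier (if parallel training is desired, set the options below first):

% Launch training; returns per-episode statistics for the learning curves.
trainingStats = train(agent, env, trainOpts);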

For Parallel Training:

trainOpts.UseParallel = true;                       % run episodes on parallel workers
trainOpts.ParallelizationOptions.Mode = "async";    % asynchronous updates

Training Performance of DDPG Agent

Training Performance of TD3 Agent

3) Simulation of Trained Agents:

simOptions = rlSimulationOptions(MaxSteps=maxSteps);   % cap the rollout length
experience = sim(env, agent, simOptions);              % roll out the trained policy

Bot Simulation
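
The returned experience structure can then be inspected; for example, assuming the default logged signals of sim:

% Total reward collected over the rollout, from the logged reward signal.
totalReward = sum(experience.Reward.Data);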

 

Learning Curve Comparison Plot
Q0 Episode Comparison Plot
 

Based on the Learning curve comparison plot:

  • The DDPG agent can pick up learning faster than the TD3 agent in some cases; however, it tends to show inconsistent long-term training performance.

  • The TD3 agent consistently learns satisfactory walking policies for the environment and tends to show better convergence properties compared to DDPG.

Based on the Episode Q0 comparison plot:

  • The TD3 agent estimates the discounted long-term reward more conservatively than DDPG, because TD3 takes the minimum of the value estimates from its two critics.

Results

The bipedal robot simulation was successfully conducted using MATLAB and Simscape Multibody. The joint trajectories were smooth and continuous, reflecting the effectiveness of the control algorithm. The robot was also able to recover from minor disturbances, demonstrating the robustness of the control system. These results validate the effectiveness of the proposed design and control strategy for bipedal locomotion. Future work will enhance the robot’s ability to adapt to uneven terrains and external disturbances.

 

Future Scope

  • Terrain Adaptability: Enhance the robot's ability to adapt to varying terrains, including slopes and uneven surfaces.
  • Disturbance Recovery: Improve the control algorithm to handle larger external disturbances and maintain balance.
  • Energy Efficiency: Optimize the gait pattern and control strategy to reduce energy consumption during locomotion.
  • Advanced Maneuvers: Extend the capabilities to include running, jumping, and other complex maneuvers for increased versatility.

References

GitHub Repository

All the simulation files and source code can be found here.

Acknowledgement

As executive members of IEEE NITK, we are incredibly grateful for the opportunity to learn and work on this project under the prestigious name of the IEEE NITK Student Chapter. We want to extend our heartfelt thanks to IEEE for providing us with the funds to complete this project successfully.


METADATA

Report prepared on May 6, 2024, 10:53 p.m. by:

  • Adithya Kasal [Diode]
  • Nishchay Pallav [Diode]
  • Chirag Ingle [Piston]

Report reviewed and approved by Aditya Pandia [CompSoc] on May 9, 2024, 10:49 p.m.
