Optimization of a Learned Dynamic Model for Inverted Pendulum

Skills Used: Deep Learning, Optimization, Ray Tune, PyTorch Lightning, Optuna

GitHub Repo

Context

For my final project in the Optimization class offered at BYU, I wanted to test a proof of concept for my research, which looks at learned dynamic models for complex systems where it may not be feasible to derive the exact dynamic equations. For this project, I created a learned model of an underactuated inverted pendulum (state: [theta, theta_dot], input: [torque]). Underactuated here means that there is not enough available torque to swing the pendulum directly from the bottom position to the top position, so the controller has to build up energy by swinging back and forth. The goal of this project was to create a learned model that could be used in the control task seen on the left (shown with the analytical model). This task requires forecasting with the model, meaning the model must forward propagate the dynamics of the inverted pendulum over many time steps.


Approach 

To frame this problem as an optimization, I picked a simple feedforward network architecture with the following design variables:

I used PyTorch Lightning to perform the actual model training. Each training sample starts from a random (state, input) pair, and the state at the next time step is computed with the analytical model of the inverted pendulum. The final format of the training data is:

network input: [theta, theta_dot, torque]

network output: [theta, theta_dot] at the next time step
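The data generation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the physical constants, the sampling ranges, the damping term, the semi-implicit Euler integrator, and the names pendulum_step and make_dataset are all assumptions made for the example.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the project.
G = 9.81          # gravity, m/s^2
MASS = 1.0        # point mass at the end of the rod, kg
LENGTH = 1.0      # rod length, m
DAMPING = 0.1     # viscous friction coefficient, N*m*s
DT = 0.01         # integration time step, s
MAX_TORQUE = 1.0  # torque limit, N*m (well below MASS * G * LENGTH, so underactuated)


def pendulum_step(theta, theta_dot, torque):
    """Advance the analytical pendulum one time step (semi-implicit Euler).

    theta is measured from the hanging (bottom) position.
    """
    inertia = MASS * LENGTH ** 2
    theta_ddot = (torque - DAMPING * theta_dot
                  - MASS * G * LENGTH * math.sin(theta)) / inertia
    next_theta_dot = theta_dot + theta_ddot * DT
    next_theta = theta + next_theta_dot * DT
    return next_theta, next_theta_dot


def make_dataset(n_samples, seed=0):
    """Build (network input, network output) pairs from random state/input samples."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for _ in range(n_samples):
        theta = rng.uniform(-math.pi, math.pi)         # assumed angle range
        theta_dot = rng.uniform(-8.0, 8.0)             # assumed velocity range
        torque = rng.uniform(-MAX_TORQUE, MAX_TORQUE)  # assumed torque range
        inputs.append([theta, theta_dot, torque])
        targets.append(list(pendulum_step(theta, theta_dot, torque)))
    return inputs, targets


inputs, targets = make_dataset(1000)
```

Each row of inputs matches the network input format and each row of targets matches the network output format, so the two lists could be converted to tensors and wrapped in a dataset for training.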


To handle the optimization, I used the Ray Tune package with Optuna's Tree-Structured Parzen Estimator (TPE) algorithm. TPE is a Bayesian method, which means that it constructs a model that can estimate the performance of a set of hyperparameters. Each time a new network is trained, its result updates the TPE model, which can then propose more promising hyperparameters, resulting in a better learned model of the dynamics.

The optimization objective is to maximize theta accuracy. A prediction is considered accurate if it is within 0.01 radians (about 0.57 degrees) of the ground truth. I did not include theta_dot in the objective because the final position of the pendulum was more important.
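As a rough sketch of how such an accuracy metric could be computed, the function below counts the fraction of theta predictions that land within the tolerance. The name theta_accuracy is hypothetical, and the sketch assumes a plain absolute difference with no angle wrapping, which may differ from what the project actually did.

```python
import math

THETA_TOLERANCE = 0.01  # radians, the threshold stated in the text


def theta_accuracy(predicted, actual, tolerance=THETA_TOLERANCE):
    """Fraction of theta predictions within `tolerance` radians of the truth.

    Assumption: angles are compared directly, without wrapping to [-pi, pi].
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Three of the four predictions are within 0.01 rad of the ground truth.
example = theta_accuracy([0.0, 0.105, 1.0, -0.5], [0.005, 0.1, 1.02, -0.5])
print(example, round(math.degrees(THETA_TOLERANCE), 2))
```

A value like this, computed on held-out data after each training run, is the kind of scalar that could be reported back to the optimizer as the quantity to maximize.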

For the control of the inverted pendulum, I used NEMPC, a nonlinear model predictive control (MPC) method developed at BYU. Essentially, NEMPC forecasts a number of candidate trajectories using evolutionary methods and then selects the best one. For more information, see the linked paper.
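To make the "forecast many trajectories, keep the best" idea concrete, here is a deliberately simplified sketch. It is not NEMPC: the evolutionary search is replaced with plain random shooting, the cost function and all constants are assumptions, and model_step is a stand-in where a trained network would be called instead.

```python
import math
import random

DT = 0.01         # assumed time step, s
MAX_TORQUE = 1.0  # assumed torque limit, N*m


def model_step(theta, theta_dot, torque):
    """Stand-in one-step model; a trained network would be called here instead."""
    theta_ddot = torque - 0.1 * theta_dot - 9.81 * math.sin(theta)
    theta_dot = theta_dot + theta_ddot * DT
    return theta + theta_dot * DT, theta_dot


def rollout_cost(state, torques, goal_theta=math.pi):
    """Forward propagate the model over a torque sequence and sum an assumed cost."""
    theta, theta_dot = state
    cost = 0.0
    for torque in torques:
        theta, theta_dot = model_step(theta, theta_dot, torque)
        cost += (theta - goal_theta) ** 2 + 0.01 * theta_dot ** 2
    return cost


def best_first_torque(state, horizon=50, n_candidates=200, seed=0):
    """Sample random torque sequences and return the first torque of the cheapest one."""
    rng = random.Random(seed)
    best_cost, best_torques = math.inf, None
    for _ in range(n_candidates):
        torques = [rng.uniform(-MAX_TORQUE, MAX_TORQUE) for _ in range(horizon)]
        cost = rollout_cost(state, torques)
        if cost < best_cost:
            best_cost, best_torques = cost, torques
    return best_torques[0], best_cost


torque, cost = best_first_torque((0.0, 0.0))
```

In a receding-horizon loop, only the returned first torque would be applied before replanning from the new state. Because every candidate is scored by repeatedly calling the one-step model, small one-step errors compound over the horizon, which is why multi-step forecasting quality matters for this task.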

Results/Discussion

On the left is an animation of the best learned model in the same control task as seen above. The performance is nearly identical to that of the analytical model. For more figures and information, see the linked Google Slides presentation on this project.

These results are especially exciting for my research, which will focus on creating learned dynamic models for systems that have either an inaccurate analytical model or none at all. Such models will help with controlling very complex systems, such as soft robots.

My video presentation is embedded below.  

Cheney_optimization_project.mp4