Quad rocket balancing Jaime.
A reinforcement learning experiment. In this video it’s trying to find its balance with an initial tilt.
Some time later, it has stabilized fairly well.
The next problem is much harder: finding its balance and moving to (0, 0, 0) from various starting locations in space.
It’s basically a classifier with 16 output classes, one per action. The flipping between actions 5 and 15 that you see means it’s switching between two diagonal rockets (action 5) and all four rockets (action 15), going down and up respectively.
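One encoding consistent with those numbers is a 4-bit mask, where each bit switches one rocket on or off: 0 is no rockets, 15 (binary 1111) is all four, and 5 (binary 0101) is every other rocket. The post does not spell out the encoding, so the sketch below is an assumption, and the helper name `decode_action` and the rocket numbering (going around the frame, so that rockets 0 and 2 sit on a diagonal) are hypothetical.

```python
from typing import List


def decode_action(action: int, n_rockets: int = 4) -> List[bool]:
    """Turn an action index into per-rocket on/off flags.

    Assumption: bit i of the action index controls rocket i.
    """
    if not 0 <= action < 2 ** n_rockets:
        raise ValueError(f"action must be in [0, {2 ** n_rockets - 1}]")
    # Shift the bit for rocket i into the lowest position and test it.
    return [bool((action >> i) & 1) for i in range(n_rockets)]


if __name__ == "__main__":
    for a in (0, 5, 15):
        print(a, decode_action(a))
```

Under this reading, four on/off rockets give 2^4 = 16 possible actions, which matches the 16 classes the network chooses between.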
Using action 5 rather than action 0 (no rockets) on the way down is not efficient, but it is understandable, since it wants to minimize deltaV. Adding a penalty for using rockets might make it save fuel.
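A fuel penalty like that could be sketched as a small reward-shaping step. This is only an illustration: the function names, the penalty size, and the idea that the number of firing rockets equals the number of set bits in the action index are all assumptions, not something taken from the actual simulator.

```python
def count_active_rockets(action: int) -> int:
    """Number of rockets firing, assuming one bit per rocket in the action index."""
    return bin(action).count("1")


def fuel_penalized_reward(base_reward: float, action: int,
                          penalty_per_rocket: float = 0.05) -> float:
    """Subtract a fixed cost for every rocket that fires during this step.

    penalty_per_rocket is a hypothetical tuning knob; too large a value
    could teach the agent to fall rather than spend fuel.
    """
    return base_reward - penalty_per_rocket * count_active_rockets(action)


if __name__ == "__main__":
    for a in (0, 5, 15):
        print(a, fuel_penalized_reward(1.0, a))
```

With this shaping, action 0 costs nothing, action 5 costs two units of penalty, and action 15 costs four, so coasting downward becomes the cheaper choice whenever it does not hurt the rest of the reward.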
This all comes from me hanging out in the OpenAI Gym. After solving the CartPole problems I wanted to do the lunar lander one, but Box2D wouldn’t install, so I made my own simulator.