
Yet Another Tutorial on PPO

I will explain PPO (Proximal Policy Optimization) by walking through its surrogate objective, its relationship to the policy gradient, importance-ratio clipping, generalized advantage estimation (GAE), and the training of the critic (value network).
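As a rough preview of two of these pieces, here is a minimal sketch of GAE and the clipped surrogate objective in plain Python. The function names (`gae`, `clipped_surrogate`), the parameter names, and the default values `gamma=0.99`, `lam=0.95`, `clip_eps=0.2` are my own assumptions for illustration. The sketch assumes a single trajectory with no early termination and is not a complete PPO implementation.

```python
import math


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory.

    rewards[t] and values[t] are the reward and critic estimate at step t;
    last_value is the critic estimate for the state after the final step.
    Assumes the trajectory does not terminate early (no done flags).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future residuals
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), from log-probabilities
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With `gamma=1` and `lam=1`, `gae` reduces to the reward-to-go minus the value baseline. When the new and old log-probabilities are equal, the ratio is 1 and `clipped_surrogate` returns the mean advantage.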