6.9 Further Reading
“Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems,” Barto et al., 1983 [11].
“High-Dimensional Continuous Control Using Generalized Advantage Estimation,” Schulman et al., 2015 [123].
“Trust Region Policy Optimization,” Schulman et al., 2015 [122].