6.7 Experimental Results
In this section, we run two Actor-Critic experiments using SLM Lab. The first studies the effect of the step size n when using the n-step returns advantage estimate. The second studies the effect of λ when using GAE. Both experiments use the Atari Breakout game, a more challenging environment than Pong.
6.7.1 Experiment: The Effect of n-Step Returns
The step size n controls the bias-variance trade-off in the n-step returns advantage estimate: the larger n, the larger the variance and the smaller the bias. The step size n is a tunable hyperparameter.
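To make the role of n concrete, here is a minimal sketch of the n-step returns advantage estimate. It assumes the common convention in which n actual rewards are followed by a bootstrapped value estimate, so that n = 1 gives the one-step TD error. The function name `nstep_advantage` and the sample numbers are illustrative only and are not taken from SLM Lab.

```python
import numpy as np


def nstep_advantage(rewards, values, gamma, n):
    """Estimate A(s_0, a_0) from n actual rewards plus a bootstrapped value.

    rewards: r_0, r_1, ... (at least n entries)
    values:  critic estimates V(s_0), V(s_1), ... (at least n + 1 entries)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    if not 1 <= n <= len(rewards) or len(values) < n + 1:
        raise ValueError("need n rewards and n + 1 value estimates")
    discounts = gamma ** np.arange(n)  # 1, gamma, ..., gamma^(n-1)
    # Discounted sum of n actual rewards, then bootstrap from V(s_n).
    nstep_return = discounts @ rewards[:n] + gamma ** n * values[n]
    return nstep_return - values[0]


# Illustrative data (assumed, not from the experiment).
rewards = [1.0, 0.0, 1.0]
values = [0.5, 0.4, 0.3, 0.2]
for n in (1, 2, 3):
    print(n, nstep_advantage(rewards, values, gamma=0.9, n=n))
```

As n grows, more of the estimate comes from sampled rewards (higher variance) and less from the critic's bootstrapped value (lower bias).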
In this experiment, we study the effect of using different values of n for the n-step returns in Actor-Critic by performing a grid search. The experiment spec file is extended from Code 6.7 by adding a search spec for num_step_returns, as shown in Code 6.13.
Note that we now use a different environment, Breakout, which is slightly more challenging than Pong. Lines 4 and 7 specify the change in the environment. Line 19 specifies a grid search over a list of values for n, given by num_step_returns. The full spec file is available in SLM Lab at slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json.
Code 6.13 A2C with n-step returns spec file with search spec for different values of n, num_step_returns.
 1  # slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json
 2
 3  {
 4    "a2c_nstep_breakout": {
 5      ...
 6      "env": [{
 7        "name": "BreakoutNoFrameskip-v4",
 8        "frame_op": "concat",
 9        "frame_op_len": 4,
10        "reward_scale": "sign",
11        "num_envs": 16,
12        "max_t": null,
13        "max_frame": 1e7
14      }],
15      ...
16      "search": {
17        "agent": [{
18          "algorithm": {
19            "num_step_returns__grid_search": [1, 3, 5, 7, 9, 11]
20          }
21        }]
22      }
23    }
24  }
To run the experiment in SLM Lab, use the commands shown in Code 6.14.
Code 6.14 Run an experiment to search over different step sizes n in n-step returns, as defined in the spec file.
conda activate lab
python run_lab.py slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json a2c_nstep_breakout search
This will run an Experiment which spawns six Trials, each with a different value of num_step_returns substituted in the original Actor-Critic spec. Each Trial runs four Sessions to obtain an average. The multitrial graphs are shown in Figure 6.5.
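The way a single grid-search key turns into six Trial specs can be sketched as follows. This is a simplified illustration of the idea only, not SLM Lab's actual search implementation. The helper `expand_grid` and the base algorithm values are hypothetical placeholders.

```python
import copy
import itertools

GRID_SUFFIX = "__grid_search"  # key suffix used in the search spec


def expand_grid(base, search):
    """Return one copy of `base` per combination of grid-searched values."""
    keys = [k[: -len(GRID_SUFFIX)] for k in search if k.endswith(GRID_SUFFIX)]
    value_lists = [search[k + GRID_SUFFIX] for k in keys]
    trials = []
    for combo in itertools.product(*value_lists):
        spec = copy.deepcopy(base)  # leave the base spec untouched
        spec.update(dict(zip(keys, combo)))
        trials.append(spec)
    return trials


# Hypothetical base algorithm spec; real values live in the spec file.
base_algorithm = {"gamma": 0.99, "num_step_returns": 11}
search = {"num_step_returns__grid_search": [1, 3, 5, 7, 9, 11]}

trial_specs = expand_grid(base_algorithm, search)
for i, spec in enumerate(trial_specs):
    print(f"trial {i}: num_step_returns = {spec['num_step_returns']}")
```

Each resulting spec differs from the base only in the searched key, which is what makes the Trials directly comparable.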
FIGURE 6.5 The effect of different n-step returns step sizes of Actor-Critic on the Breakout environment. Larger step sizes n perform better.
Figure 6.5 shows the effect of different n-step returns step sizes on Actor-Critic in the Breakout environment. Larger step sizes n perform better, and we can also see that n is not a very sensitive hyperparameter. With n = 1, the n-step returns reduce to the Temporal Difference estimate of returns, which yields the worst performance in this experiment.
6.7.2 Experiment: The Effect of λ of GAE
Recall that GAE is an exponentially weighted average of all the n-step returns advantages, and the higher the decay factor λ, the higher the variance of the estimate. The optimal value of λ is a hyperparameter that is tuned for a specific problem.
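The effect of λ can be seen in a short sketch of the GAE computation, written in the usual backward-recursive form over TD errors. It assumes a single trajectory segment with no episode termination inside it. The function name `gae_advantages` and the sample data are illustrative and not taken from SLM Lab.

```python
import numpy as np


def gae_advantages(rewards, values, gamma, lam):
    """Compute GAE for each step of one trajectory segment.

    rewards: r_0 ... r_{T-1}
    values:  critic estimates V(s_0) ... V(s_T), one more entry than rewards
    Assumes no terminal state occurs inside the segment.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Accumulate backward: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


# Illustrative data (assumed, not from the experiment).
rewards = [1.0, 0.0, 1.0]
values = [0.5, 0.4, 0.3, 0.2]
for lam in (0.0, 0.5, 0.97, 1.0):
    print(lam, gae_advantages(rewards, values, gamma=0.9, lam=lam))
```

With λ = 0 only the one-step TD error survives, while with λ = 1 the estimate for the first step equals the longest available n-step advantage. Intermediate values blend the two, trading bias for variance.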
In this experiment, we look at the effect of using different λ values on GAE in Actor-Critic by performing a grid search. The experiment spec file is extended from Code 6.9 by adding a search spec for lam, as shown in Code 6.15.
We again use Breakout as the environment. Lines 4 and 7 specify the change in the environment. Line 19 specifies a grid search over a list of values for λ, given by lam. The full spec file is available in SLM Lab at slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json.
Code 6.15 Actor-Critic spec file with search spec for different values of GAE λ, lam.
 1  # slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json
 2
 3  {
 4    "a2c_gae_breakout": {
 5      ...
 6      "env": [{
 7        "name": "BreakoutNoFrameskip-v4",
 8        "frame_op": "concat",
 9        "frame_op_len": 4,
10        "reward_scale": "sign",
11        "num_envs": 16,
12        "max_t": null,
13        "max_frame": 1e7
14      }],
15      ...
16      "search": {
17        "agent": [{
18          "algorithm": {
19            "lam__grid_search": [0.50, 0.70, 0.90, 0.95, 0.97, 0.99]
20          }
21        }]
22      }
23    }
24  }
To run the experiment in SLM Lab, use the commands shown in Code 6.16.
Code 6.16 Run an experiment to search over different values of GAE λ as defined in the spec file.
conda activate lab
python run_lab.py slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json a2c_gae_breakout search
This will run an Experiment which spawns six Trials, each with a different value of lam substituted in the original Actor-Critic spec. Each Trial runs four Sessions to obtain an average. The multitrial graphs are shown in Figure 6.6.
FIGURE 6.6 The effect of different values of GAE λ on Actor-Critic in the Breakout environment. λ = 0.97 performs best, followed closely by 0.90 and 0.99.
As we can see in Figure 6.6, λ = 0.97 performs best with an episodic score near 400, followed closely by λ = 0.90 and 0.99. This experiment also demonstrates that λ is not a very sensitive hyperparameter: performance is affected only slightly when λ deviates from the optimal value. For instance, λ = 0.70 still produces good results, but λ = 0.50 yields poor performance.