Reinforcement Learning - The Actor-Critic Algorithm

By Laura Graesser and Wah Loon Keng
Dec 5, 2019


This chapter is from the book 

Foundations of Deep Reinforcement Learning: Theory and Practice in Python

Learn More Buy

6.7 Experimental Results

In this section we will run two Actor-Critic experiments using SLM Lab. The first experiment will study the effect of the step size when using the n-step returns advantage estimate. The second experiment studies the effect of λ when using GAE. We will use the Atari Breakout game as a more challenging environment.

6.7.1 Experiment: The Effect of n-Step Returns

The step size n controls the bias-variance trade-off in the n-step returns advantage estimate: the larger n, the larger the variance and the smaller the bias. The step size n is a tunable hyperparameter.
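To make the role of n concrete, here is a minimal sketch of an n-step returns advantage computation. It is not SLM Lab's implementation. The function name and arguments are illustrative, and it assumes the convention in which n rewards are summed before bootstrapping from the critic's value estimate, on a single trajectory segment with no terminal states.

```python
def nstep_advantages(rewards, values, next_value, gamma=0.99, n=5):
    """Illustrative n-step returns advantages for one trajectory segment.

    rewards:    r_0 .. r_{T-1}
    values:     critic estimates V(s_0) .. V(s_{T-1})
    next_value: critic estimate V(s_T), used for bootstrapping
    Assumes no episode terminates inside the segment.
    """
    v = list(values) + [next_value]  # V(s_0) .. V(s_T)
    T = len(rewards)
    advs = []
    for t in range(T):
        end = min(t + n, T)  # truncate the lookahead near the segment end
        # Discounted sum of up to n actual rewards...
        ret = sum(gamma ** k * rewards[t + k] for k in range(end - t))
        # ...plus the discounted value estimate of the state reached.
        ret += gamma ** (end - t) * v[end]
        advs.append(ret - v[t])
    return advs
```

With n = 1 each entry is the one-step TD error r_t + γV(s_{t+1}) − V(s_t). A larger n replaces more of the critic's estimate with sampled rewards, which is where the extra variance comes from.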

In this experiment, we study the effect of using different values of n for the n-step returns in Actor-Critic by performing a grid search. The experiment spec file is extended from Code 6.7 by adding a search spec for num_step_returns, as shown in Code 6.13.

Note that we now use a different environment, Breakout, which is slightly more challenging than Pong. Lines 4 and 7 specify the change in the environment. Line 19 specifies a grid search over a list of values for num_step_returns, the step size n. The full spec file is available in SLM Lab at slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json.

Code 6.13 A2C with n-step returns spec file with search spec for different values of n, `num_step_returns`.

 1  # slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json
 2
 3   {
 4     "a2c_nstep_breakout": {
 5       ...
 6       "env": [{
 7         "name": "BreakoutNoFrameskip-v4",
 8         "frame_op": "concat",
 9         "frame_op_len": 4,
10         "reward_scale": "sign",
11         "num_envs": 16,
12         "max_t": null,
13         "max_frame": 1e7
14       }],
15       ...
16       "search": {
17         "agent": [{
18           "algorithm": {
19             "num_step_returns__grid_search": [1, 3, 5, 7, 9, 11]
20           }
21         }]
22       }
23     }
24   }

To run the experiment in SLM Lab, use the commands shown in Code 6.14.

Code 6.14 Run an experiment to search over different step sizes n in n-step returns, as defined in the spec file.

conda activate lab
python run_lab.py slm_lab/spec/experimental/a2c/a2c_nstep_n_search.json a2c_nstep_breakout search

This will run an Experiment which spawns six Trials, each with a different value of num_step_returns substituted in the original Actor-Critic spec. Each Trial runs four Sessions to obtain an average. The multitrial graphs are shown in Figure 6.5.
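As a rough illustration of how a grid search spec turns into one Trial per value, the sketch below expands keys ending in `__grid_search` into separate copies of a base spec. This is not how SLM Lab implements search internally. The helper names `find_grid_keys` and `expand_trials` are hypothetical, and the base spec here is a toy placeholder.

```python
import copy
import itertools

SUFFIX = "__grid_search"

def find_grid_keys(node, path=()):
    """Yield (path, values) for every key ending in the grid-search suffix."""
    if isinstance(node, dict):
        for key, val in node.items():
            if key.endswith(SUFFIX):
                yield path + (key[:-len(SUFFIX)],), val
            else:
                yield from find_grid_keys(val, path + (key,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_grid_keys(item, path + (i,))

def expand_trials(base_spec, search_spec):
    """Return one deep-copied spec per combination of grid values."""
    grid = list(find_grid_keys(search_spec))
    paths = [p for p, _ in grid]
    trials = []
    for combo in itertools.product(*(vals for _, vals in grid)):
        spec = copy.deepcopy(base_spec)
        for path, value in zip(paths, combo):
            node = spec
            for key in path[:-1]:  # walk down to the parent container
                node = node[key]
            node[path[-1]] = value  # substitute the searched value
        trials.append(spec)
    return trials

# Toy base spec (placeholder values) and the search block from the spec file.
base = {"agent": [{"algorithm": {"gamma": 0.99, "num_step_returns": 11}}]}
search = {"agent": [{"algorithm": {
    "num_step_returns__grid_search": [1, 3, 5, 7, 9, 11]}}]}
trials = expand_trials(base, search)
```

Here `trials` holds six specs that differ only in `num_step_returns`, mirroring the six Trials of the Experiment.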

FIGURE 6.5 The effect of different n-step returns step sizes of Actor-Critic on the Breakout environment. Larger step sizes n perform better.

Figure 6.5 shows the effect of different n-step returns step sizes on Actor-Critic in the Breakout environment. Larger step sizes n perform better, and we can also see that n is not a very sensitive hyperparameter. With n = 1, the n-step returns reduce to the Temporal Difference estimate of returns, which yields the worst performance in this experiment.

6.7.2 Experiment: The Effect of λ of GAE

Recall that GAE is an exponentially weighted average of all the n-step returns advantages, and the higher the decay factor λ, the higher the variance of the estimate. λ is a hyperparameter whose best value is tuned for a specific problem.
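The exponentially weighted average can be computed with a single backward pass over one-step TD errors. The sketch below shows that recursion in plain Python. It is not SLM Lab's implementation. The function name and arguments are illustrative, and it assumes a single trajectory segment with no terminal states.

```python
def gae_advantages(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Illustrative GAE computation for one trajectory segment.

    rewards:    r_0 .. r_{T-1}
    values:     critic estimates V(s_0) .. V(s_{T-1})
    next_value: critic estimate V(s_T), used for bootstrapping
    Assumes no episode terminates inside the segment.
    """
    v = list(values) + [next_value]  # V(s_0) .. V(s_T)
    T = len(rewards)
    advs = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * v[t + 1] - v[t]
        # Later TD errors are decayed by (gamma * lam) per step.
        running = delta + gamma * lam * running
        advs[t] = running
    return advs
```

Setting `lam=0` leaves only the one-step TD error (low variance, high bias), while `lam=1` sums all discounted TD errors to the end of the segment (high variance, low bias). Values in between interpolate.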

In this experiment, we look at the effect of using different λ values on GAE in Actor-Critic by performing a grid search. The experiment spec file is extended from Code 6.9 by adding a search spec for lam, as shown in Code 6.15.

We again use Breakout as the environment. Lines 4 and 7 specify the change in the environment. Line 19 specifies a grid search over a list of values for lam, the GAE decay factor λ. The full spec file is available in SLM Lab at slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json.

Code 6.15 Actor-Critic spec file with search spec for different values of GAE λ, `lam`.

 1  # slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json
 2
 3   {
 4     "a2c_gae_breakout": {
 5       ...
 6       "env": [{
 7         "name": "BreakoutNoFrameskip-v4",
 8         "frame_op": "concat",
 9         "frame_op_len": 4,
10         "reward_scale": "sign",
11         "num_envs": 16,
12         "max_t": null,
13         "max_frame": 1e7
14       }],
15       ...
16       "search": {
17         "agent": [{
18           "algorithm": {
19             "lam__grid_search": [0.50, 0.70, 0.90, 0.95, 0.97, 0.99]
20           }
21         }]
22       }
23     }
24   }

To run the experiment in SLM Lab, use the commands shown in Code 6.16.

Code 6.16 Run an experiment to search over different values of GAE λ as defined in the spec file.

conda activate lab
python run_lab.py slm_lab/spec/experimental/a2c/a2c_gae_lam_search.json a2c_gae_breakout search

This will run an Experiment which spawns six Trials, each with a different value of lam substituted in the original Actor-Critic spec. Each Trial runs four Sessions to obtain an average. The multitrial graphs are shown in Figure 6.6.

FIGURE 6.6 The effect of different values of GAE λ on Actor-Critic in the Breakout environment. λ = 0.97 performs best, followed closely by 0.90 and 0.99.

As Figure 6.6 shows, λ = 0.97 performs best with an episodic score near 400, followed closely by λ = 0.90 and 0.99. This experiment also demonstrates that λ is not a very sensitive hyperparameter: performance is impacted only slightly when λ deviates from the optimal value. For instance, λ = 0.70 still produces good results, but λ = 0.50 yields poor performance.
