Answer by gwtw14 for Why does REINFORCE perform badly at first in Sutton and...
I'm actually working on this example too, implemented the REINFORCE algorithm, and got the same result as you. My only guess is that the authors chose a different initial $\theta$ value to show the...
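To make the initial-$\theta$ guess concrete, here is a minimal sketch, not the authors' code. It assumes the setup of Example 13.1: a softmax policy over two action preferences with features $x(s,\text{right}) = [1, 0]^\top$ and $x(s,\text{left}) = [0, 1]^\top$, reward $-1$ per step, and no discounting. Under those assumptions the start-state value has the closed form $v(p) = (2p - 4) / (p(1 - p))$, where $p$ is the probability of choosing right. The function names and the three $\theta$ values below are hypothetical, chosen only for illustration.

```python
import math


def p_right(theta):
    """Softmax probability of 'right' for preferences theta = (h_right, h_left)."""
    h_right, h_left = theta
    m = max(h_right, h_left)  # subtract the max for numerical stability
    e_right = math.exp(h_right - m)
    e_left = math.exp(h_left - m)
    return e_right / (e_right + e_left)


def start_value(p):
    """Expected undiscounted return from the start state of the short corridor
    when 'right' is chosen with probability p in every state (0 < p < 1)."""
    return (2 * p - 4) / (p * (1 - p))


# Hypothetical initial parameter vectors, for illustration only.
for theta in [(0.0, 0.0), (-1.47, 1.47), (1.47, -1.47)]:
    p = p_right(theta)
    print(f"theta={theta}  p(right)={p:.3f}  v(start)={start_value(p):.1f}")
```

With $\theta = (0, 0)$ the policy is uniform and the start-state value is $-12$, already close to the optimum, so a run started there cannot look bad at first. A $\theta$ that strongly prefers left gives a much lower starting value, which would be consistent with the guess above.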
Why does REINFORCE perform badly at first in Sutton and Barto Figure 13.1?
In Sutton and Barto, 2nd edition (PDF, page 265), Figure 13.1 applies REINFORCE to the "short corridor with switched actions" environment from Example 13.1. The figure looks like this: My question is,...