Author Message
rabbott
Posts: 1649
Posted 22:00 Sep 11, 2018 |

As originally written, the assignment asked you to jump into the CS 188 code and complete some of the methods. I now think that was too much to ask given the rest of the material this week. Instead, I'd like you to write code that generates the numbers in the figure on the right-hand side of Example 3.5 in Sutton and Barto (p. 60). This is an extended version of Exercise 3.14. The online assignment (4 MDPs The Value Function) has been modified to reflect this change.

Your code should generate an array as shown and then use value iteration to compute the values. It will take a number of iteration steps, perhaps about 30. So your code should loop through the cells in the array and repeatedly recompute each cell's value from the values of the cells to which it transitions.
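One way to write the per-cell update is sketched below. It assumes the equiprobable random policy that the figure in Example 3.5 is drawn for, so each of the four actions gets weight 1/4. The symbols v_k (the value estimate after k sweeps), r(s, a) (the reward for taking action a in cell s), and s'(s, a) (the cell that action leads to) are notation chosen for this sketch, not taken from the assignment.

```latex
\[
  v_{k+1}(s) \;=\; \sum_{a \in \{\text{N},\,\text{S},\,\text{E},\,\text{W}\}}
      \frac{1}{4}\,\bigl[\, r(s,a) + \gamma\, v_k\bigl(s'(s,a)\bigr) \bigr],
  \qquad \gamma = 0.9 .
\]
```

Because every move in this gridworld is deterministic, there is no inner sum over next states: each action contributes exactly one reward and one successor cell.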

As stated in Example 3.5, 

Actions that would take the agent off the grid leave its location unchanged, but also result in a reward of −1. Other actions result in a reward of 0, except those that move the agent out of the special states A and B. From state A, all four actions yield a reward of +10 and take the agent to A'. From state B, all actions yield a reward of +5 and take the agent to B'.

Don't forget about the discount factor of 0.9.
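The setup described above can be sketched in Python as follows. This is only one possible approach, not the required solution. It repeatedly averages over the four actions (iterative policy evaluation under the equiprobable random policy, which is the policy the figure in Example 3.5 shows). The cell coordinates for A, A', B, and B' follow the grid drawn in the book; the names `step`, `evaluate_random_policy`, `tol`, and `max_sweeps` are made up for this illustration.

```python
GAMMA = 0.9          # discount factor from Example 3.5
SIZE = 5             # the gridworld is 5 x 5

# (row, col) positions, row 0 at the top, as drawn in the book's figure.
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)

# North, south, west, east as (row delta, col delta).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    row, col = state[0] + action[0], state[1] + action[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col), 0.0
    # Moving off the grid leaves the agent in place and costs -1.
    return state, -1.0


def evaluate_random_policy(tol=1e-6, max_sweeps=1000):
    """Sweep the grid until no cell changes by more than tol."""
    values = [[0.0] * SIZE for _ in range(SIZE)]
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        new_values = [[0.0] * SIZE for _ in range(SIZE)]
        delta = 0.0
        for r in range(SIZE):
            for c in range(SIZE):
                total = 0.0
                for action in ACTIONS:
                    (nr, nc), reward = step((r, c), action)
                    # Each action is chosen with probability 1/4.
                    total += 0.25 * (reward + GAMMA * values[nr][nc])
                new_values[r][c] = total
                delta = max(delta, abs(total - values[r][c]))
        values = new_values
        if delta < tol:
            break
    return values, sweeps


if __name__ == "__main__":
    grid, n_sweeps = evaluate_random_policy()
    for row in grid:
        print(" ".join(f"{v:5.1f}" for v in row))
    print("sweeps:", n_sweeps)
```

The number of sweeps depends on the stopping tolerance: a looser `tol` stops sooner, while the tight default here runs longer than is needed to reproduce the figure to one decimal place.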

Writing this code and getting the numbers in Example 3.5 demonstrates at least an implicit understanding of this week's mathematical discussion. So focus on writing the code.

Last edited by rabbott at 22:14 Sep 11, 2018.
Rolf Castro
Posts: 2
Posted 22:01 Sep 14, 2018 |

Is there any way to extend the due date of the programming part?

The Bellman equation is a lot to digest, and programming it from scratch so that it works (much like the Cart-Pole problem, I assume) is plenty difficult.

rabbott
Posts: 1649
Posted 19:42 Sep 15, 2018 |

Most assignments are intended as preparation for upcoming class meetings. We talked about value iteration today in the Saturday class and will talk about it next week in the MW class. I'd like people to be prepared for that discussion -- and to show their code if they are willing.