Author Message
rabbott
Posts: 1649
Posted 14:35 Nov 17, 2018 |

A number of students asked about the role of states in feature-based learning. Apolinar Sanchez pointed to the top equation on this slide. Since s is a state, why do we say that we aren't dealing with states in feature-based learning?

That's a good question.

The way to think about it is that s represents the state of the world. The world has more states than we can, or want to, keep track of. But the state of the world still matters. In Q-learning we want to know the value of taking each of the possible actions in any state, and that is what the equation gives us: an approximation of the value of taking action a in state s.
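For concreteness, here is a minimal sketch of the kind of equation the slide shows: Q(s, a) approximated as a weighted sum of feature values computed from the (state, action) pair. The toy state representation and feature names below are made up for illustration; the real features come from the domain (e.g., Pacman's distances to food and ghosts).

    # Minimal sketch of a linear, feature-based Q approximation.
    # The toy state (a grid position) and the features are hypothetical.

    def features(s, a):
        """Extract a dict of feature values from a (state, action) pair."""
        x, y = s          # toy state: a grid position
        dx, dy = a        # toy action: a move direction
        return {"bias": 1.0, "new-x": x + dx, "new-y": y + dy}

    def q_value(weights, s, a):
        """Q(s, a) ~= sum over i of w_i * f_i(s, a)."""
        return sum(weights.get(name, 0.0) * f
                   for name, f in features(s, a).items())

    weights = {"bias": 0.5, "new-x": -0.1, "new-y": 0.2}
    print(q_value(weights, (3, 4), (1, 0)))   # value of moving right from (3, 4)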

When we learn Q(s, a) we look to see what state we reach on action a. That's s'. Then we update Q(s, a) by comparing its current (stored) value to the reward for taking action a plus the discounted best Q-value of s', i.e., the discounted max (over all a') Q(s', a').
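In the plain, table-based version of Q-learning, that comparison is the familiar update sketched below (alpha is the learning rate and gamma the discount; the values here are arbitrary):

    from collections import defaultdict

    alpha, gamma = 0.1, 0.9
    Q = defaultdict(float)        # Q[(s, a)] -> stored value

    def tabular_update(s, a, reward, s_prime, actions_in_s_prime):
        """Move the stored Q(s, a) toward reward + gamma * max over a' of Q(s', a')."""
        target = reward + gamma * max(Q[(s_prime, ap)] for ap in actions_in_s_prime)
        Q[(s, a)] += alpha * (target - Q[(s, a)])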

But we are not storing and updating a value for each Q(s, a). Instead we are storing weights, which we use to approximate the Q-values. So, instead of updating a stored value for Q(s, a), we update the weights we use to approximate Q(s, a).
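Concretely, the standard approximate Q-learning rule nudges every weight in proportion to its feature value rather than overwriting a stored entry. The sketch below reuses the hypothetical features and q_value from above; it is not code from the assignment.

    def update_weights(weights, s, a, reward, s_prime, actions_in_s_prime,
                       alpha=0.1, gamma=0.9):
        """Adjust each w_i by alpha * (target - Q(s, a)) * f_i(s, a)."""
        target = reward + gamma * max(q_value(weights, s_prime, ap)
                                      for ap in actions_in_s_prime)
        difference = target - q_value(weights, s, a)
        for name, f in features(s, a).items():
            weights[name] = weights.get(name, 0.0) + alpha * difference * f

    # One hypothetical step: reward 1.0, landing in state (4, 4).
    update_weights(weights, s=(3, 4), a=(1, 0), reward=1.0,
                   s_prime=(4, 4), actions_in_s_prime=[(1, 0), (0, 1)])
    print(weights)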

In the original Pacman assignment, as well as in Taxi and Capture-the-flag, we know s', the state we reach (more or less; the ghosts may have moved a little) by taking action a in the current state. So we can compute max (over all a') Q(s', a'). It's a bit trickier when you can't predict s'; Cart-pole and Pong must face that issue. That's a separate discussion.

 

Last edited by rabbott at 14:37 Nov 17, 2018.