Author | Message |
---|---|
margent
Posts: 4
|
Posted 14:34 Oct 06, 2018 |
Can somebody please explain their code for #4? Also this is the code that I have (it works) but I didn't know how to explain it to the standard of the quiz. So if you can also explain what I did, please do so.

    # get the most optimal action for the state
    policies = util.Counter()
    for action in self.mdp.getPossibleActions(state):
        # how optimal is an action
        # Where action = qvalue (which considers each outcome)
        policies[action] = self.getQValue(state, action)
    # return the most optimal action
    return policies.argMax()
|
jpatel77
Posts: 44
|
Posted 15:08 Oct 06, 2018 |
Although we (the Saturday class) haven't had this quiz yet, I'll try to explain it as far as I understand it. I suspect this def returns the best action for the given state 'state', chosen by Q-value. After initializing the policies Counter, you populate it with every possible action, each mapped to its Q-value for 'state'. So your dict might look something like {'a1': 3, 'a2': 4, 'a3': 8}. At the end comes "return the most optimal action". That return statement actually returns a key: argMax() returns the argument from the collection that produces the maximum value, which here is one of the keys 'a1', 'a2', 'a3'. In our case that is 'a3'. So I don't find anything controversial except for one comment, i.e. # how optimal is an action # Where action = qvalue (which considers each outcome). I didn't quite understand what you mean by "where action = qvalue". Again, we haven't gone through this quiz yet, so if I misunderstood your question, just ignore this :D |
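As a minimal sketch of the argMax step described above, here is a plain-Python stand-in. It uses an ordinary dict instead of the course's util.Counter, and `arg_max` is a hypothetical helper written for illustration, not the course library's method. The action names and numbers are the illustrative ones from the post.

```python
# Stand-in for util.Counter: a plain dict mapping action -> Q-value.
# The keys and values are the illustrative ones from the post above.
q_values = {'a1': 3, 'a2': 4, 'a3': 8}


def arg_max(scores):
    """Return the key whose value is largest (hypothetical helper).

    Returns None for an empty mapping; that choice is an assumption
    made for this sketch.
    """
    if not scores:
        return None
    # max() iterates over the keys; key=scores.get ranks them by value.
    return max(scores, key=scores.get)


best_action = arg_max(q_values)
print(best_action)  # prints a3
```

The point to notice is that the function returns a key (an action), not the maximum value 8 itself.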
margent
Posts: 4
|
Posted 19:30 Oct 06, 2018 |
I guess what I meant by those comments is this: the policies list holds actions, which are really the result of the value function (the Q-value). Therefore action = Q-value. The action/Q-value pair would then be inserted into the list of policies.
I am not sure if that reasoning is correct or not, but that is what I meant.
Last edited by margent at 19:30 Oct 06, 2018.
|
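On the "action = qvalue" wording, one way to picture it is that each action is a key and its Q-value is the number stored under that key. The sketch below assumes a tiny made-up MDP to show what "considers each outcome" could mean. The `transitions` table, the `values` estimates, the discount, and the `q_value` helper are all hypothetical and are not the quiz's getQValue.

```python
# Hypothetical toy MDP: transitions[(state, action)] lists every outcome
# as (next_state, probability, reward). All numbers are made up.
transitions = {
    ('s0', 'left'): [('s1', 1.0, 0.0)],
    ('s0', 'right'): [('s1', 0.5, 0.0), ('s2', 0.5, 10.0)],
}
values = {'s1': 2.0, 's2': 0.0}  # assumed current estimates of V(s')
discount = 0.9                   # assumed discount factor


def q_value(state, action):
    """Expected reward plus discounted next-state value, summed over outcomes."""
    return sum(
        prob * (reward + discount * values[next_state])
        for next_state, prob, reward in transitions[(state, action)]
    )


# The action is the key; its Q-value is the number stored under that key.
q_by_action = {action: q_value('s0', action) for action in ('left', 'right')}

# The policy for s0 is the key with the largest Q-value.
best = max(q_by_action, key=q_by_action.get)
print(best)  # prints right
```

Here 'left' scores about 1.8 and 'right' about 5.9, so the lookup returns the action 'right' and not the number 5.9.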
rabbott
Posts: 1649
|
Posted 20:37 Oct 06, 2018 |
Your code and explanation look fine. I'm not sure why I marked you down on the quiz. (Calling the Counter "policies" was confusing and misleading, but that's not so terrible.) I'll revise your quiz score. |