Author Message
rabbott
Posts: 1649
Posted 13:10 Oct 22, 2018 |

I ran the code again and got this result after 300 steps.

After 300 steps:
(0, 0) -> (1, 0): 28.4; (0, 1): 8.3; 
(0, 1) -> (0, 0): 70.0; (0, 2): 40.3; (1, 1): 32.3; 
(0, 2) -> (0, 1): 75.0; (1, 2): 47.5; 
(1, 0) -> (1, 1): 38.3; (0, 0): 17.9; (2, 0): 0.0
(1, 1) -> (1, 2): 48.0; 
(1, 2) -> (0, 2): 60.0; (1, 1): 37.8; (2, 2): 37.7; 
(2, 0) -> (1, 0): 18.4; (2, 1): 0.0
(2, 1) -> (1, 1): 30.3; (2, 0): 0.0; (2, 2): 0.0; 
(2, 2) -> (1, 2): 47.8; (2, 1): 0.0

 

Note all the 0.0 qValues. To get the value of the state at the left of each of those rows, we take the max qValue over all the possible transitions from that state. But a transition that has never been taken still has its default qValue of 0.0.
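Here is a rough sketch of how the 0.0 default shows up in the table. It assumes a standard Q-learning update; the names q_values, max_q, update, alpha and gamma are made up for illustration and are not taken from the homework code, which may compute things differently.

```python
from collections import defaultdict

# Hypothetical Q-table keyed by (state, next_state).
# Any transition that has never been updated reads as 0.0.
q_values = defaultdict(float)

def max_q(state, transitions):
    """Largest stored qValue over the transitions leaving `state`."""
    return max(q_values[(state, nxt)] for nxt in transitions[state])

def update(state, nxt, reward, transitions, alpha=0.5, gamma=0.9):
    """Assumed standard Q-learning update for the transition state -> nxt."""
    target = reward + gamma * max_q(nxt, transitions)
    q_values[(state, nxt)] += alpha * (target - q_values[(state, nxt)])

# Two rows from the 300-step table, transitions only.
transitions = {
    (2, 0): [(1, 0), (2, 1)],
    (2, 1): [(1, 1), (2, 0), (2, 2)],
}

# Only (2, 0) -> (1, 0) has been learned; (2, 0) -> (2, 1) was never taken.
q_values[((2, 0), (1, 0))] = 18.4
print(q_values[((2, 0), (2, 1))])  # still the default
```

Until update is called for a transition, nothing ever moves its entry away from the default, no matter how many steps are run elsewhere.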

In addition, when doing actual transitions from those states, since epsilon is 0.5, there is only a 50% chance that we will take a less optimal choice. And even when we do, the chance of taking any particular choice is 1/(number of possible transitions from that state).
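To put a number on that, here is a small sketch of an epsilon-greedy choice. It assumes the exploring branch picks uniformly among all transitions from the state; choose and explore_probability are hypothetical names, and the homework code may select transitions differently.

```python
import random

def choose(state, transitions, q_values, epsilon=0.5, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take the best."""
    options = transitions[state]
    if rng.random() < epsilon:
        # Exploring: every transition from this state is equally likely.
        return rng.choice(options)
    # Exploiting: take the transition with the largest known qValue,
    # treating transitions never seen before as 0.0.
    return max(options, key=lambda nxt: q_values.get((state, nxt), 0.0))

def explore_probability(n_options, epsilon=0.5):
    """Chance that a single exploring step picks one particular transition."""
    return epsilon / n_options

# State (1, 2) has three possible transitions in the tables above.
print(explore_probability(3))
```

Under these assumptions, each visit to a state with three transitions has roughly a one-in-six chance of exploring any one specific transition, so a transition that is never the greedy choice can easily go untried for a long time.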

In addition, notice that there is only one possible transition from state (1, 1). That was done artificially on line 48 of crawlerController. We did that last Saturday when exploring one of the questions the homework asks. So the only way to get to a state with arm position 2 is when state (1, 2) goes to (2, 2). But that is a less optimal transition, so taking it depends on epsilon, etc. So the short answer is that the 0.0 qValues were a result of bad luck in exploring the world.

So, to get qValues for those transitions, comment out line 48 of crawlerController and increase epsilon after a run starts. When I did that, I got these values after only 200 steps, with no 0.0 qValues.

After 200 steps:
(0, 0) -> (1, 0): 34.5; (0, 1): -3.6; 
(0, 1) -> (0, 0): 72.5; (1, 1): 41.3; (0, 2): 36.4; 
(0, 2) -> (0, 1): 72.1; (1, 2): 45.4; 
(1, 0) -> (1, 1): 46.3; (0, 0): 29.5; (2, 0): 25.8; 
(1, 1) -> (0, 1): 58.0; (1, 2): 44.4; (2, 1): 36.7; (1, 0): 36.2; 
(1, 2) -> (0, 2): 56.8; (2, 2): 36.1; (1, 1): 33.3; 
(2, 0) -> (1, 0): 36.6; (2, 1): 34.1; 
(2, 1) -> (1, 1): 46.2; (2, 2): 33.1; (2, 0): 25.3; 
(2, 2) -> (1, 2): 45.4; (2, 1): 34.2; 

Last edited by rabbott at 13:27 Oct 22, 2018.