reset password
Author Message
rabbott
Posts: 1649
Posted 20:28 Nov 23, 2018 |

General. A few people had asked about normalizing feature values. I had said that it wasn't very important. It turns out that it is important: normalization keeps the weight values from blowing up. So I keep the absolute value of all my feature values below 1 by dividing each one by an appropriate constant.
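For concreteness, here is a minimal sketch of that normalization. The feature names and scale constants are made up; in practice the scale would be an upper bound on each feature, such as the board dimensions.

```python
# Minimal sketch of feature normalization (feature names and scale
# constants are invented): divide each raw value by a constant chosen
# so the result's absolute value stays below 1.

def normalized_features(raw_features, scales):
    """Scale each raw feature value so that |value| < 1."""
    return {name: value / scales[name] for name, value in raw_features.items()}

# Hypothetical raw features: a maze distance and a pellet count.
raw = {"dist_to_pellet": 7.0, "num_pellets": 42.0}
scales = {"dist_to_pellet": 100.0, "num_pellets": 100.0}  # e.g., upper bounds

feats = normalized_features(raw, scales)
assert all(abs(v) < 1 for v in feats.values())
```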

Capture-the-Flag. Earlier this week, one team told me that their defensive players were successfully approaching invading enemies. This was due to a feature that measured the distance to the nearest enemy invader. (Smaller is better.) But the players refused to eat the invaders. The reason: eating one would increase the distance to the nearest invader! (The eaten invader would no longer be the nearest one.)

The baseline team solved a similar problem to encourage offensive agents to eat food pellets. One feature measured the distance to the nearest (edible) food pellet. Another measured the number of remaining (edible) food pellets. When a pellet was eaten, the number remaining would decrease (good) while the distance to the nearest one increased (bad). With appropriate weights, eating a food pellet made that action more attractive than not eating it.
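To make the weight trade-off concrete, here is a toy calculation with invented weights and feature values (nothing here comes from the baseline team's actual numbers). Eating raises the distance feature but lowers the pellet count, and with the count weighted heavily enough, the eating action still wins.

```python
# Toy linear value function: value = sum of weight * feature.
# Weights and feature values below are hypothetical.

def value(features, weights):
    return sum(weights[name] * features[name] for name in features)

# Both features are penalties, so both weights are negative;
# the pellet count is weighted more heavily than the distance.
weights = {"dist_to_pellet": -1.0, "num_pellets": -5.0}

eat      = {"dist_to_pellet": 5 / 100, "num_pellets": 9 / 100}   # farther, but fewer pellets
dont_eat = {"dist_to_pellet": 1 / 100, "num_pellets": 10 / 100}  # closer, but pellet remains

assert value(eat, weights) > value(dont_eat, weights)  # eating wins
```

With the distance feature alone, `dont_eat` would score higher; it is the second feature that tips the balance.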

Taxi. In computing f(state, action), the value of taking action in state, the best approach is to include both the reward for that action and the features of the resulting state. Including the reward is important. Also, if taking the action results in done being True, don't include the features for the next state.
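A sketch of that computation, under my own naming (this is not the course's actual code): the value is the immediate reward plus the discounted, weighted next-state features, and the next state contributes nothing when the episode is done.

```python
# Sketch of an action-value computation for feature-based learning.
# GAMMA, the weights, and the feature names are hypothetical.

GAMMA = 0.95  # discount factor

def action_value(reward, next_features, done, weights):
    """Reward plus discounted weighted features of the resulting state.

    If the action ends the episode (done is True), the next state is
    terminal and its features are excluded.
    """
    if done:
        return reward
    future = sum(weights[name] * next_features[name] for name in next_features)
    return reward + GAMMA * future

# Usage: a step reward of -1 plus a discounted distance penalty.
w = {"dist_to_destination": -1.0}
v = action_value(-1.0, {"dist_to_destination": 0.3}, False, w)
```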

One of my bugs, which took me a long time to find, was in the table I built of possible actions for the shortest-path algorithm. The table listed the neighbors of all the border cells, including those next to internal walls. (You don't want a cell to count as a neighbor if reaching it would require going through a wall.) I did all that correctly, but I had one typo: the table listed (3, 4) as a neighbor of (4, 3), when it should have listed (4, 4). This error threw off the shortest-path calculation from (4, 3), making its computed shortest path too small by one. The manifestation was that the agent kept trying to go south from (4, 3), hitting the southern wall and staying at (4, 3). But that was fine as far as the computation was concerned, since (4, 3) had a secret short route to the destination.
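For context, a neighbor table like that typically feeds a breadth-first search over the grid. Here is a sketch with an invented 1x3 corridor that also shows how a single wrong table entry creates a "secret short route":

```python
# Sketch: BFS shortest distances over a hand-built neighbor table.
# The grid and table entries below are invented for illustration.
from collections import deque

def shortest_distances(neighbors, goal):
    """BFS outward from goal; returns a dict mapping cell -> distance."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nbr in neighbors.get(cell, ()):
            if nbr not in dist:
                dist[nbr] = dist[cell] + 1
                queue.append(nbr)
    return dist

# A tiny 1x3 corridor: (0,0) - (0,1) - (0,2).
neighbors = {
    (0, 0): [(0, 1)],
    (0, 1): [(0, 0), (0, 2)],
    (0, 2): [(0, 1)],
}
assert shortest_distances(neighbors, (0, 0))[(0, 2)] == 2

# One wrong entry, like the (4, 3) typo above, silently shortens a path
# by letting BFS tunnel through what should be a wall:
neighbors[(0, 0)].append((0, 2))
assert shortest_distances(neighbors, (0, 0))[(0, 2)] == 1
```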

My system solves Taxi using both states and features. It takes about 5 times as many episodes to solve it using states as using features. That's the power of well-designed features: they capture the important information about states! It's similar in some ways to the difference between writing code in assembly language (states) and in a higher-level language (features). Features, if done right, carry far more useful semantic information than raw states.

Last edited by rabbott at 21:21 Nov 23, 2018.