Taking a hypothetical step


rabbott (Posts: 1649), posted 22:43 Nov 20, 2018

In Q-learning (either state-based or feature-based) you often have to take a hypothetical step from some state and see where it takes you, without disturbing the environment you are actually running. The gym environments let you do that.
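As a reminder of why the hypothetical step matters, here is a minimal sketch of the standard tabular Q-learning update, which needs the reward and next state that a step produces. The function name q_update, the dictionary layout, and the alpha and gamma values are my own choices for illustration, not anything from gym.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, n_actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available from the state the step led to.
    best_next = max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    # Move the current estimate part of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table keyed by (state, action); unseen entries default to 0.0.
q = defaultdict(float)
q[(1, 0)] = 2.0  # pretend we already value action 0 in state 1
new_value = q_update(q, state=0, action=1, reward=-1.0, next_state=1, n_actions=2)
print(new_value)
```

The reward and next_state arguments are exactly what you get back from taking a step, real or simulated.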

Suppose you are working with taxi. To simulate possible actions, make a separate local instance of the environment and take the steps there. For example,

import gym

# This is a local copy, not the one we are using to run your actual taxi.
# (Newer gym releases register this environment as 'Taxi-v3'.)
env = gym.make('Taxi-v2')
# gym.make() wraps some environments in a time-limit wrapper. Unwrap it so
# that we can reach the underlying environment and its state variable.
env = env.env
# Before you can take any steps, you must reset(). But you only have to do it once.
env.reset()
# Keep this environment and use it repeatedly.

# Set the state to 328. (This was an example state in an online taxi tutorial.)
env.s = 328
# See what it looks like.
env.render()
# Go one step east. step() returns (next_state, reward, done, info).
env.step(2)
# See what it looks like now.
env.render()
# Set the environment to a different state.
env.s = 458
# See what it looks like.
env.render()
# Go one step north.
env.step(1)
# See what it looks like now.
env.render()
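The set-the-state-then-step pattern above can be wrapped in a small helper. This is only a sketch: peek is a name I made up, and LineWorld is a tiny hypothetical stand-in (not part of gym) so that the example runs without gym installed. It keeps its state in s and returns four values from step(), which is what the taxi environment did when this was written.

```python
class LineWorld:
    """Hypothetical stand-in for a discrete gym environment (NOT part of gym).

    States are 0..4 on a line; action 0 moves left, action 1 moves right.
    Like the taxi environment, the current state lives in the attribute s.
    """

    def __init__(self):
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        move = 1 if action == 1 else -1
        self.s = min(4, max(0, self.s + move))
        done = self.s == 4
        reward = 10 if done else -1
        return self.s, reward, done, {}


def peek(env, state, action):
    """Take one hypothetical step from state; return (next_state, reward, done)."""
    env.s = state
    next_state, reward, done, _info = env.step(action)
    return next_state, reward, done


sim = LineWorld()  # the local copy, used only for look-ahead
sim.reset()
# Try every action from state 3 and collect what each one would do.
outcomes = {action: peek(sim, 3, action) for action in (0, 1)}
print(outcomes)
```

With the unwrapped taxi environment, peek(env, 328, 2) should behave the same way, provided your gym version returns four values from step(). Later gym releases return five, so the unpacking line would need to change.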

You can do essentially the same thing with cart-pole.

You have to watch out for going out of bounds. When the cart or the pole leaves its allowed range, step() reports done and the environment goes into a "done" state. Calling reset() may help, or you may have to "make" the environment again.

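import gym

env = gym.make('CartPole-v0')
# Unwrap the time-limit wrapper, as with taxi.
env = env.env
env.reset()
# In cart-pole the internal state variable is state instead of s.
# It holds (cart position, cart velocity, pole angle, pole angular velocity).
print(env.state)
# Set the state by hand, then push right (1) and push left (0).
env.state = (0, 0, 0, 0)
env.step(1)
print(env.state)
env.step(0)
print(env.state)
# Set a different state and push right twice.
env.state = (-0.1, 0.2, 0.1, -0.03)
print(env.state)
env.step(1)
print(env.state)
env.step(1)
print(env.state)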
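If you want to know ahead of time whether a state you are about to set is already out of bounds, a check like the following may help. It is a sketch that assumes the usual cart-pole limits of 2.4 for cart position and 12 degrees for pole angle; out_of_bounds is my own helper name, and on a real environment you should read env.x_threshold and env.theta_threshold_radians rather than trusting the constants hard-coded here.

```python
import math

# Assumed limits; on a real environment, read env.x_threshold and
# env.theta_threshold_radians instead of hard-coding them.
X_THRESHOLD = 2.4
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # 12 degrees in radians


def out_of_bounds(state):
    """Return True if the cart position or pole angle is outside the limits."""
    x, _x_dot, theta, _theta_dot = state
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


print(out_of_bounds((-0.1, 0.2, 0.1, -0.03)))  # the example state used above
print(out_of_bounds((2.5, 0.0, 0.0, 0.0)))     # cart too far right
print(out_of_bounds((0.0, 0.0, 0.3, 0.0)))     # pole tilted too far
```

Only the position and the angle are tested; the two velocities do not end an episode by themselves.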

Last edited by rabbott at 08:03 Nov 21, 2018.