Author | Message |
---|---|
rabbott
Posts: 1649
Posted 22:43 Nov 20, 2018 |
In Q-learning (either state-based or feature-based) you often have to take a step and see where it takes you. The Gym environments let you do that. Suppose you are working with Taxi. To simulate possible actions without disturbing your actual run, make a local copy of the environment and take the steps there. For example,
# This is a local copy, not the one we are using to run your actual taxi.
env = gym.make('Taxi-v2')
# Some environments are "wrapped" in a time limit. env.env unwraps it.
env = env.env
# Before you can take any steps, you must reset(). But you only need to do it once.
env.reset()
# Keep this environment and use it repeatedly.
# Set the state to 328. (This was an example state in an online taxi tutorial.)
env.s = 328
# See what it looks like.
env.render()
# Go one step east. (Taxi actions: 0 = south, 1 = north, 2 = east, 3 = west,
# 4 = pickup, 5 = dropoff. step returns (next_state, reward, done, info).)
env.step(2)
# See what it looks like now.
env.render()
# Set the environment to a different state.
env.s = 458
# See what it looks like.
env.render()
# Go one step north.
env.step(1)
# See what it looks like now.
env.render()
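The point of the scratch copy is one-step lookahead: set the state, try an action, and see where it leads without touching the real run. Here is a gym-free sketch of the same pattern; ToyGrid is a hypothetical stand-in for Taxi (not part of gym), included only so the example runs on its own.

```python
import copy

# Gym-free sketch of the scratch-copy lookahead pattern described above.
# ToyGrid is a hypothetical stand-in for gym's Taxi, not part of gym.
class ToyGrid:
    """A 5x5 grid; state s is an integer 0..24; actions move the agent."""
    def __init__(self):
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        row, col = divmod(self.s, 5)
        if action == 0 and row < 4: row += 1      # south
        elif action == 1 and row > 0: row -= 1    # north
        elif action == 2 and col < 4: col += 1    # east
        elif action == 3 and col > 0: col -= 1    # west
        self.s = row * 5 + col
        reward = -1
        done = (self.s == 24)
        return self.s, reward, done, {}

real_env = ToyGrid()
real_env.reset()

scratch = copy.deepcopy(real_env)     # local copy, like the second gym.make above
scratch.s = 12                        # set the state directly, like env.s = 328
next_s, r, done, _ = scratch.step(2)  # peek one step east
print(next_s)       # 13
print(real_env.s)   # 0: the real environment is untouched
```

With gym's Taxi you would do the same thing with the local `env` above: set `env.s`, call `env.step(a)` for each action `a`, and read the resulting (next_state, reward, done, info) tuples.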
You can do essentially the same thing with CartPole. Watch out for going out of bounds, though: the environment then goes into a "done" state, after which further steps are meaningless. reset() may help, or you may have to make() it again.
env = gym.make('CartPole-v0')
env = env.env
env.reset()
# In CartPole the internal state variable is called state instead of s.
print(env.state)
env.state = (0, 0, 0, 0)
env.step(1)
print(env.state)
env.step(0)
print(env.state)
env.state = (-0.1, 0.2, 0.1, -0.03)
print(env.state)
env.step(1)
print(env.state)
env.step(1)
print(env.state)
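The out-of-bounds behavior can be sketched the same way: step until the environment reports done, then reset before using it again. ToyCart below is a hypothetical one-dimensional stand-in for CartPole (not part of gym), included only so the example is self-contained.

```python
# Gym-free sketch of the out-of-bounds / "done" pattern described above.
# ToyCart is a hypothetical stand-in for gym's CartPole, not part of gym.
class ToyCart:
    def __init__(self):
        self.state = (0.0, 0.0)   # (position, velocity)

    def reset(self):
        self.state = (0.0, 0.0)
        return self.state

    def step(self, action):
        x, v = self.state
        v += 0.1 if action == 1 else -0.1
        x += v
        self.state = (x, v)
        done = abs(x) > 0.5       # out of bounds -> done, as with CartPole
        return self.state, 1.0, done, {}

env = ToyCart()
env.reset()
done = False
steps = 0
while not done:
    state, reward, done, _ = env.step(1)  # push right until out of bounds
    steps += 1

# Once done is True, the episode is over; reset before continuing.
env.reset()
print(steps, env.state)
```

The same loop shape works with the real CartPole: check the `done` flag that `env.step` returns, and call `env.reset()` (or re-make the environment) once it comes back True.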
Last edited by rabbott at 08:03 Nov 21, 2018.