Author Message
rabbott
Posts: 1649
Posted 22:43 Nov 20, 2018

In Q-learning (either state-based or feature-based) you often have to take a step and see where it takes you. The gym environments let you do that.
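To see why the next state matters, here is a minimal sketch of the standard tabular Q-learning update. The function name q_update, the dictionary-based table q, and the default values of alpha and gamma are my own illustrative choices, not something gym provides.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, done, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated values; missing entries count as 0.
    """
    # Value of the best action available from the state the step led to.
    # A finished episode has no future value.
    if done:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
q[(5, 0)] = 2.0  # pretend state 5 already has some estimated value
# Taking action 1 in state 3 gave reward -1 and led to state 5.
new_value = q_update(q, state=3, action=1, reward=-1, next_state=5,
                     done=False, n_actions=2)
# new_value is about 0.08: 0 + 0.1 * ((-1 + 0.9 * 2.0) - 0)
```

The update needs next_state, and the only way to get it is to take the step and see where you land.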

Suppose you are working with taxi. To simulate possible actions, make a separate local copy of the environment and take the steps there. For example:

# This is a local copy, not the one we are using to run your actual taxi.
env = gym.make('Taxi-v2')
# Some environments are "wrapped" in a timer (gym's TimeLimit wrapper). Take the environment out of the wrapper.
env = env.env
# Before you can take any steps, you must reset(). But you only have to do it once.
env.reset()
# Keep this environment and use it repeatedly.

# Set the state to 328. (This was an example state in an online taxi tutorial.)
env.s = 328
# See what it looks like.
env.render()
# Go one step east. (Taxi actions: 0 south, 1 north, 2 east, 3 west, 4 pickup, 5 dropoff.)
env.step(2)
# See what it looks like now. 
env.render()
# Set the environment to a different state.
env.s = 458
# See what it looks like. 
env.render()
# Go one step north. 
env.step(1)
# See what it looks like now. 
env.render()
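If you do this a lot, it may help to wrap the set-state-then-step pattern in a small helper. This is only a sketch: peek and lookahead are hypothetical names of my own, and the code assumes the environment exposes its state as s and that step() returns the observation, reward, and done flag as its first three values, as the taxi environment above does.

```python
def peek(env, state, action):
    """Return (next_state, reward, done) for taking `action` from `state`.

    `env` should be the local copy, so overwriting its state is harmless.
    """
    env.s = state                # jump the local copy to the state of interest
    result = env.step(action)    # assumed to start with (obs, reward, done, ...)
    next_state, reward, done = result[0], result[1], result[2]
    return next_state, reward, done


def lookahead(env, state, n_actions):
    """Map each action to its (next_state, reward, done) outcome from `state`."""
    return {action: peek(env, state, action) for action in range(n_actions)}
```

With the local taxi environment above, lookahead(env, 328, env.action_space.n) would show where each of the six actions leads from state 328.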

You can do essentially the same thing with cart-pole.

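In cart-pole you have to watch out for going out of bounds (the cart running off the track or the pole tipping too far). When that happens the environment goes into a "done" state. Calling reset() may help, or you may have to "make" the environment again.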

env = gym.make('CartPole-v0')
env = env.env
env.reset()
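# In cart-pole the internal state variable is state instead of s.
print(env.state)
# The state is (cart position, cart velocity, pole angle, pole angular velocity).
env.state = (0, 0, 0, 0)
# Action 1 pushes the cart to the right; action 0 pushes it to the left.
env.step(1)
print(env.state)
env.step(0)
print(env.state)
# Set the environment to a different state.
env.state = (-0.1, 0.2, 0.1, -0.03)
print(env.state)
env.step(1)
print(env.state)
env.step(1)
print(env.state)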
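One way to avoid getting stuck in a "done" state is to reset the local copy before every simulated step and only then overwrite the state. The helper below is a sketch under that assumption. The name peek_cartpole is hypothetical, and the code assumes reset() clears the done status and that step() returns the observation, reward, and done flag as its first three values.

```python
def peek_cartpole(env, state, action):
    """Return (next_state, reward, done) for taking `action` from `state`."""
    env.reset()                  # clear any leftover "done" status
    env.state = tuple(state)     # then overwrite with the state to explore
    result = env.step(action)    # assumed to start with (obs, reward, done, ...)
    return tuple(result[0]), result[1], result[2]
```

If done comes back True, the step took the cart or the pole out of bounds. Because the helper resets first, the next call still works.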

 

Last edited by rabbott at 08:03 Nov 21, 2018.