Reinforcement Learning: Deep Q-Network (DQN) with OpenAI Taxi
In the previous blog post, I learnt to implement the Q-learning algorithm using a Q-table. However, Q-tables are only practical when the number of states and actions is small. In this post, we will implement a Deep Q-Network (DQN), which replaces the Q-table with a neural network that approximates the Q-values.
In [1]:
import numpy as np
import gym
import random
The Taxi Problem¶
There are four designated locations in the grid world, indicated by R(ed), B(lue), G(reen), and Y(ellow). When the episode starts, the taxi starts at a random square and the passenger is at a random location. The taxi drives to the passenger's location, picks up the passenger, drives to the passenger's destination (another one of the four designated locations), and then drops off the passenger. Once the passenger is dropped off, the episode ends. There are 500 discrete states, since there are 25 taxi positions, 5 possible locations of the passenger (including the case where the passenger is in the taxi), and 4 destination locations.
There are 6 discrete deterministic actions:
- 0: move south
- 1: move north
- 2: move east
- 3: move west
- 4: pickup passenger
- 5: dropoff passenger
In the rendered grid:
- blue: passenger
- magenta: destination
- yellow: empty taxi
- green: full taxi
- other letters: locations
In [2]:
ENV_NAME = "Taxi-v2"
env = gym.make(ENV_NAME)
env.render()
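As a quick sanity check of the 500-state count described above, the underlying Taxi environment exposes a decode() helper that unpacks an encoded state into its components (this snippet is my own addition and assumes gym's toy-text Taxi implementation; the attribute access may differ slightly between gym versions):
state = env.reset()
# decode() unpacks the integer state into taxi row, taxi column, passenger location and destination index
taxi_row, taxi_col, passenger_loc, destination = env.unwrapped.decode(state)
print(taxi_row, taxi_col, passenger_loc, destination)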
In [3]:
print("Number of actions: %d" % env.action_space.n)
print("Number of states: %d" % env.observation_space.n)
In [4]:
action_size = env.action_space.n
state_size = env.observation_space.n
In [5]:
np.random.seed(123)
env.seed(123)
Out[5]:
Keras-RL and gym's discrete environments¶
The Keras-RL examples do not cover gym's discrete environments. As a beginner to both Keras-RL and gym, I had to find another source to refer to for discrete environments, so I used this example, which is based on gym's Frozen Lake (another discrete environment in gym), as a reference.
In [6]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Embedding, Reshape
from keras.optimizers import Adam
What does an Embedding layer do and what are the parameters?¶
Embedding(input_dim=500, output_dim=6, input_length=1)¶
In Deep Q-Learning, the input to the neural network is the state of the environment, and the output is a Q-value for each possible action; the agent then takes the action with the highest Q-value. The input_length for a discrete environment in OpenAI's gym (e.g. Taxi, Frozen Lake) is 1, because the state returned by env.step(env.action_space.sample())[0] is a single number.
In [8]:
env.reset()
env.step(env.action_space.sample())[0]
Out[8]:
In the Embedding layer, input_dim refers to the number of possible states and output_dim refers to the size of the vector each state is embedded into. In other words, we have 500 possible states and we want each of them to be represented by 6 values.
If you do not want to add any Dense layers (meaning that you only want a single-layer neural network consisting of the Embedding layer), you will have to set the output_dim of the Embedding layer to be the same as the size of the environment's action space. This means that output_dim must be 6 when you are using the Taxi environment, because there are only 6 actions: move south, move north, move east, move west, pick up the passenger, and drop off the passenger.
In [9]:
model_only_embedding = Sequential()
model_only_embedding.add(Embedding(500, 6, input_length=1))
model_only_embedding.add(Reshape((6,)))
print(model_only_embedding.summary())
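To confirm that this single-layer network already produces one value per action, we can push an encoded state through it (this quick check is my own addition; 328 is just an arbitrary state index used for illustration):
sample_state = np.array([[328]])  # a batch containing one encoded state, matching input_length=1
print(model_only_embedding.predict(sample_state).shape)  # (1, 6): one value per action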
If you want to add Dense layers after the Embedding layer, you can choose your own output_dim for the Embedding layer (it does not have to match the action space size), but the final Dense layer must have the same output size as the action space.
In [11]:
model = Sequential()
model.add(Embedding(500, 10, input_length=1))
model.add(Reshape((10,)))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(action_size, activation='linear'))
print(model.summary())
What does the Reshape layer do?¶
In the Reshape layer, we take the output from the previous layer and reshape each sample to a rank 1 tensor (a one-dimensional array). In this notebook, (6,) means a one-dimensional array with 6 values, for example [1, 2, 3, 4, 5, 6].
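A small standalone sketch (my own addition) makes the effect of the Reshape layer visible: the Embedding layer outputs a 3-D tensor of shape (batch, input_length, output_dim), and Reshape((6,)) flattens it to one vector of 6 values per sample, which is the shape the DQN agent expects:
demo = Sequential()
demo.add(Embedding(500, 6, input_length=1))
print(demo.output_shape)  # (None, 1, 6)
demo.add(Reshape((6,)))
print(demo.output_shape)  # (None, 6)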
Parameters when fitting the neural network¶
I tried to set nb_steps and nb_max_episode_steps to be the same as total_episodes and max_steps in the previous blog post, Q-learning with OpenAI Taxi.
I will be training both the neural network with only the Embedding layer (dqn_only_embedding) and the neural network with a few Dense layers (dqn)¶
In [13]:
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy()
dqn_only_embedding = DQNAgent(model=model_only_embedding, nb_actions=action_size, memory=memory, nb_steps_warmup=500, target_model_update=1e-2, policy=policy)
dqn_only_embedding.compile(Adam(lr=1e-3), metrics=['mae'])
dqn_only_embedding.fit(env, nb_steps=1000000, visualize=False, verbose=1, nb_max_episode_steps=99, log_interval=100000)
Out[13]:
In [14]:
dqn_only_embedding.test(env, nb_episodes=5, visualize=True, nb_max_episode_steps=99)
Out[14]:
In [15]:
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=action_size, memory=memory, nb_steps_warmup=500, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=1000000, visualize=False, verbose=1, nb_max_episode_steps=99, log_interval=100000)
Out[15]:
In [16]:
dqn.test(env, nb_episodes=5, visualize=True, nb_max_episode_steps=99)
Out[16]:
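As an extra sanity check outside of Keras-RL (this rollout is my own addition, not part of the original training code), we can run one greedy episode directly with the underlying Keras model; taking the argmax over the predicted Q-values reproduces the action the trained agent would pick:
state = env.reset()
for _ in range(99):  # same per-episode cap as nb_max_episode_steps
    q_values = model.predict(np.array([[state]]))[0]  # one Q-value per action
    action = np.argmax(q_values)
    state, reward, done, info = env.step(action)
    env.render()
    if done:
        break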
In [17]:
dqn.save_weights('dqn_{}_weights.h5f'.format("Taxi-v2"), overwrite=True)
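If you want to reuse the trained agent later without retraining (my own addition), Keras-RL agents can load the saved weights back with load_weights and then be tested again:
dqn.load_weights('dqn_Taxi-v2_weights.h5f')
dqn.test(env, nb_episodes=1, visualize=True, nb_max_episode_steps=99)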