How to convert infinite variables to a discrete space using tile coding and Python
This article is a continuation of the Reinforcement Learning series. To recap, visit these articles:
The last article about Q-Learning explored the concept of assigning a number to a state action pair:
The states used were states that can be listed and written into a table. For example, we indexed all the available positions in a maze that our agent can be in. Even in a huge maze (imagine a million times million grid) we can still assign a unique index to each state and straightforwardly use the states when filling out the Q-table.
Often in practice, the states that our agent is in cannot be uniquely indexed and fit into a table. For example, imagine that the state is the angle of the wheel which can be turned exactly one time and it can take ANY value in the range [-360, 360] degrees. The wheel can be turned at exactly 12.155…, 152.1568… and so on degrees. We cannot index all the unique degrees and create a table — the amount of possibilities is infinite.
Nevertheless, we would still like to use all the algorithms that RL has to offer. Thus, the first step is to create a discrete feature space from the features that have infinite possibilities.
One of the popular techniques to discretize a continuous feature space is the so-called tile coding algorithm.
The definition of tile coding is as follows¹:
Tile coding is a method for representing a continuous state space by dividing the state space into a number of overlapping regions, called tiles, and then representing the state by the set of tiles that it falls into.
We can represent a simple 1 feature discretization with the following code and graph:
# Creating an example 1D feature that goes from 0 to 1
x = np.linspace(0, 1, 100)# Defining the number of tilings
n_tilings = 4
# Defining the offset
offset = 0.05
# Defining the number of tiles in a tiling
n_tiles = 10
# Creating a list of tilings
tilings = []
cur_tiling = 0
for i in range(n_tilings):
# Creating a tiling by adding the offset to the feature
tiling = x + cur_tiling * offset
# Appending the tiling to the list
tilings.append(tiling)
# Incrementing the tiling
cur_tiling += 1
# Ploting the x feature and the tilings
# The x feature is plotted a horizontal line
# The tilings are plotted as horizontal lines, each moved up by 0.1
vertical_offset = 0.1
plt.figure(figsize=(10, 5))
plt.plot(x, np.zeros_like(x), color='black')
for i, tiling in enumerate(tilings):
plt.plot(tiling, np.zeros_like(x) + vertical_offset + vertical_offset * i, color='red')
# Adding vertical ticks on the tiling lines
for j in range(n_tiles):
plt.plot(
[j / n_tiles + offset * i, j / n_tiles + offset * i],
[vertical_offset + vertical_offset * i - 0.01, vertical_offset + vertical_offset * i + 0.01],
color='black'
)
plt.xlabel('Feature values')
plt.ylabel('Tilings')
# Drawing a vertical line at x = 0.46
plt.plot([0.46, 0.46], [0, vertical_offset * n_tilings + 0.1], color='blue', linestyle='dashed')
plt.show()
To understand tile coding, we need to perfectly understand what is going on in the above graph.
The bottom-most horizontal line is the feature x which can obtain any value in the range [0, 1].
Each red line is a tiling which is used in discretizing the feature x.
Each tiling is divided into tiles, which are evenly spaced out.
The blue dashed line is a random value taken from the x range. The question is, how do we use the 4 tilings and 8 tiles to create a discrete state out of the x feature value?
The algorithm is as follows:
Given a value s from a continues x feature:
For each tiling:
- Initialize a vector of size equal to the number of tiles. Fill it with 0.
- Calculate in what tile the s value falls. Save that index i.
- Fill in the vector coordinate i with value 1.
Finally, stack all the vectors into one vector.
Let us work out the example presented in the graph. For the first tiling, directly above the feature space x, the blue value falls into the 5th tiling space. Thus, the feature vector of the first tiling is:
[0, 0, 0, 0, 1, 0, 0, 0]
For the second tiling, we repeat the same process and end up with the vector:
[0, 0, 0, 0, 1, 0, 0, 0]
The third and fourth tilings vectors:
[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0]
The final discrete vector, representing the blue dashed “state” is
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
Let’s do one more example with an x value of 0.44 to understand the process fully.
Each tiling vector (starting from the bottom):
[0, 0, 0, 0, 1, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 1, 0, 0, 0, 0, 0]
Final state vector:
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
The final state representing the vector will be of length N tilings * N tiles.
The process of assigning a vector to a state that is presented by 2 features follows a very similar algorithm. The tilings now are not horizontal lines but rectangles.
Let us imagine that our state is comprised of continuous x and y variables, each ranging from 0 to 1.
We will divide the whole feature space with 2 tilings, each comprised of 4 tiles:
The grey zone in the above graph represents the original feature space. Each red tiling is divided into 4 tiles. We want to create a vector representing the state for the blue point (0.44, 0.44).
The algorithm is the same as in the 1D case, but now we assign the index for the point falling into a tile going from left to right, going from the top left:
Thus, for the first and second tiling, the blue point will fall into the 3rd tile and the resulting state vectors would be:
[0, 0, 1, 0]
[0, 0, 1, 0]
And the final vector would be:
[0, 0, 1, 0, 0, 0, 1, 0]
Taking another point:
The vectors would be:
[1, 0, 0, 0]
[0, 0, 1, 0]
With the final vector being:
[0, 0, 1, 0, 0, 0, 1, 0]