Diving into the Unshashable Type error
A common issue all Python developers face at some point (or in reality, many times) is a
TypeError with the magic words unhashable type. We can all agree that it does not sound good and can get close to scary without the proper context.
Python is a language heavily built on top of dictionaries. For example, that is how namespaces or classes store their functions and attributes. Understanding how dictionaries work beyond Python’s API can help us make sense of this data structure and avoid common errors.
This post will dive into mapping types, what they do internally, and how this knowledge helps us understand more about the unhashable exception.
Data access can be done in two ways:
- Full Scan, where the process needs to go through all the elements of the data structure. In Python, an example would be a
List. While we could access an item by index, there is no way to identify a specific element within the container directly, plus the indexes could change over time.
- Key Lookup, where the process knows how to identify and retrieve specific items directly. A mapping is a data structure that supports this data access strategy. A
dictwould be one of Python’s implementations of a mapping type.
The main difference between both approaches is that for mappings, the data stored in the container needs to be provided with a key, which will be used to identify an item, and a value, which is the element containing the data we want to retrieve.
For the Key Lookup to be consistent, there are a couple of properties that need to be satisfied (if we do not modify the container):
- A value should only be accessible by one key. Unique values avoid collisions and help ensure that no data gets overridden by a distinct element.
- A key should always return the same value for the whole lifetime of a key-value pair.
Mapping hashable keys to the values stored in the container ensures that the properties above hold. When we add a new pair to a dictionary, the key itself won’t be used to store the value but rather the result of applying a hash function to it.
Following the diagram, hash functions help us to uniquely obtain the required values based on a key for the whole lifetime of the key. Let’s imagine for a moment what would happen if hash functions did not hold these properties:
- We start with two keys
k2that produce the same hash
k1we could either retrieve
v2, ending up with data inconsistency.
- Inserting a new pair
k3is hashed as
H0, means that we lose the contents of
v0as we are overwriting it with
Without proper guarantees, we get unexpected results and can even lose data.
However, hash functions are not the only ones involved in the Key Lookups. The actual Key objects are part of the equation as well. Therefore, to ensure the two main lookup properties are satisfied, we need to introduce an important topic: mutability.
We say that an object is immutable if its state cannot be altered after it is created and mutable otherwise.
Even if hash functions are well-defined, mutable objects that hold multiple states during their lifetime will yield different results if we apply the hash function to them. Then, we end up in the mapping mess showcased above.
For the sake of discussion, let’s suppose that
length is a proper hash function. If we wanted to use a
list as a key, we’d be applying
len(list) to access the container. Updating the state of the list by adding or removing elements (mutating the list) would mean that we could not correctly store and get data from a mapping.
The example above showcases why Python only allows immutable objects to become mapping keys. Strings or integers are immutable, so developers can safely create
dicts with them:
But trying this with types such as
set, raises a
TypeError: unhashable type. Why? Because those two are mutable. We can add and remove elements as we wish, so the hash function results will vary.
The Unshashable Type error tells us that we are trying to create a dictionary with a value that is not fit to become a proper Key, as we cannot ensure that the hash function will always return the same output for the same object.
int are the typical dictionary keys; we might need other data types at some point. Luckily, Python brings some collections to the table that are immutable:
tuple: While Lists are a generic ordered collection, Tuples usually bring meaning and structure to their elements. In addition, tuples are immutable, and as they cannot be updated, each position holds a specific piece of information (recommended source).
frozenset: Both structures share most of the C implementation but
frozensets are immutable.
frozenmap: This has not yet been implemented, but there’s a PEP-603 open for consideration to add a persistent data structure for mappings.
Sometimes, it is both valuable and exciting to jump to the root of a topic to understand why Python works the way it does.
In this post, we have reviewed:
- How mapping data types work.
- How hash functions and immutability ensure consistency and safety when storing and accessing data.
- Different mutable and immutable Python types.