Benchmarking the new and impressive Python 3.11
Python is one of the most used scripting languages in data science (DS) and machine learning (ML). According to ‘PopularitY of Programming Languages’, Python is the most searched language on Google. Next to being a great glue language to connect various DS/ML solutions together, it has many libraries to do virtually anything with data.
In about a month we get a fresh new yearly release of Python: version 3.11. I am quite excited about this new version, as its main feature is a significant increase in speed.
On LinkedIn I have already seen a couple of posts from people testing the new version, and their results were stunning. But the best way to get a feeling for how fast Python 3.11 truly is, is to run the tests yourself.
In this post I will share my step-by-step analysis of Python 3.11. All code is available on my GitHub page.
Benchmarking a programming language is not trivial at all. When you read that x is faster than y, you should always take the result with a grain of salt. One implementation of an algorithm can be better on x while another is better on y. Our benchmark is a bit simpler as we are testing Python against Python, but we might still have selected elements of the language that are only marginally affected. With this in mind, I want to present the algorithm I used to benchmark: the estimation of Pi using a Monte Carlo method.
The idea of this algorithm is simple, but the first time I saw it during a mathematics course at university it blew my mind. We have a square of size 2r, and in this square we fit a circle of radius r. Now we take a random number generator that generates points on the plane <-r, r>, <-r, r>. The ratio between the points that fall inside the circle and all generated points approximates the ratio between the two areas, which we can use to approximate Pi. This is a bit clearer in the equation:
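The equation image did not survive in this export; a reconstruction from the area argument above (the symbols N_circle and N_total for the point counts are my own):

```latex
\frac{N_{\text{circle}}}{N_{\text{total}}}
\approx \frac{\text{area of circle}}{\text{area of square}}
= \frac{\pi r^2}{(2r)^2}
= \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{circle}}}{N_{\text{total}}}
```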
In Python, I have split up the actual estimation from a testing script such that I can repeat the test and take the average. Not shown here, but I have also parametrized the script using Argparse, a standard library to parse arguments from the command line interface (CLI). The Python code looks like this:
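The original code embed is missing here; below is a minimal sketch of what such a script could look like. The function name, the CLI flags, and the default sample counts are my own assumptions, not the author's exact code.

```python
import argparse
import random
import time


def estimate_pi(n_points: int) -> float:
    """Estimate Pi by sampling random points in a 2x2 square
    and counting how many land inside the unit circle (r = 1)."""
    inside = 0
    for _ in range(n_points):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Ratio of areas: (pi * r^2) / (2r)^2 = pi / 4
    return 4 * inside / n_points


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Estimate Pi with a Monte Carlo method."
    )
    parser.add_argument("--points", type=int, default=1_000_000,
                        help="number of random samples per run")
    parser.add_argument("--repeat", type=int, default=3,
                        help="number of runs to average over")
    args = parser.parse_args()

    times = []
    for _ in range(args.repeat):
        start = time.perf_counter()
        pi = estimate_pi(args.points)
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        print(f"Pi is approximately {pi:.5f} and took "
              f"{elapsed:.4f} seconds to calculate.")
    print(f"Each loop took on average {sum(times) / len(times):.4f} seconds.")


if __name__ == "__main__":
    main()
```

The pure-Python loop is deliberate: the point is to benchmark the interpreter itself, so no NumPy vectorization is used.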
This script is ready to run; however, we want to use it to test various versions of Python, not only the currently installed (or activated) version. The easiest way to test multiple versions of Python is to use Docker. Python maintains many Docker images: naturally all supported versions, but also some versions that are end-of-life (EOL) such as 2.7 or 3.2, and images for release candidates such as version 3.11. To use Docker, you need to have it installed. On Linux and Mac it is relatively easy; on Windows I am not so sure, but probably not difficult either. I would advise installing only the Docker CLI; the desktop application is too much bloat for me. To run a local script in a containerized Python environment, run:
docker run -it --rm
To automate the tests for the various versions, we will of course also use Python. This script simply starts a subprocess that launches a container with a particular Python version and collects the results afterwards. Nothing special:
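The automation script itself is also missing from this export; here is a sketch of how it could look. The image tags, the script filename `estimate_pi.py`, and the helper names are my own assumptions.

```python
import os
import subprocess

# Python versions to benchmark; "3.11-rc" was the release-candidate
# image tag available on Docker Hub at the time of writing.
VERSIONS = ["3.5", "3.6", "3.7", "3.8", "3.9", "3.10", "3.11-rc"]


def build_command(version: str, script: str = "estimate_pi.py") -> list[str]:
    """Assemble the docker invocation for one Python version."""
    return [
        "docker", "run", "--rm",
        # Mount the current directory (holding the benchmark script)
        # into the container; -v requires an absolute host path.
        "-v", f"{os.getcwd()}:/app",
        "-w", "/app",
        f"python:{version}",
        "python", script,
    ]


def run_benchmark(version: str) -> str:
    """Run the estimation script inside the official image and
    return its stdout (the timing lines)."""
    result = subprocess.run(
        build_command(version), capture_output=True, text=True, check=True
    )
    return result.stdout


def main() -> None:
    for version in VERSIONS:
        print(f"--- Python {version} ---")
        print(run_benchmark(version))


# main()  # uncomment to launch all benchmarks (requires Docker)
```

Parsing the average out of each run's output is then a matter of a simple string split on the last printed line.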
When running these tests, the absolute numbers differ from machine to machine, depending on the processor (the benchmark is CPU-heavy). Here are the results for the last 7 Python versions:
The new Python 3.11 took 6.4605 seconds per run.
Python 3.5 took 11.3014 seconds. (Python 3.11 is 74.9% faster)
Python 3.6 took 11.4332 seconds. (Python 3.11 is 77.0% faster)
Python 3.7 took 10.7465 seconds. (Python 3.11 is 66.3% faster)
Python 3.8 took 10.6904 seconds. (Python 3.11 is 65.5% faster)
Python 3.9 took 10.9537 seconds. (Python 3.11 is 69.5% faster)
Python 3.10 took 8.8467 seconds. (Python 3.11 is 36.9% faster)
The benchmark took on average 6.46 seconds per run on Python 3.11. Compared to the previous version (3.10), that is almost 37% faster. Pretty impressive! Compared to Python 3.9, which predates the recent speed work, 3.11 is almost 70% faster. I have plotted all times in figure 2.
When talking about speed, there is always that one guy saying: if you want speed, why not use C?
C is much faster than Python! — that one guy
While my C is a bit rusty, I thought I would give it a try anyway. I used GNU C++, as it comes with a nice time-measurement library (chrono). Find the code below:
As we all know, C++ is a compiled language, and therefore we need to compile the source before we can use it. When you have the typical build-essential packages installed, you can type:
g++ -o pi_estimate pi_estimate.cpp
After the compilation, simply run the built executable. The output should look like this:
Pi is approximately 3.14227 and took 0.25728 seconds to calculate.
Pi is approximately 3.14164 and took 0.25558 seconds to calculate.
Pi is approximately 3.1423 and took 0.25740 seconds to calculate.
Pi is approximately 3.14108 and took 0.25737 seconds to calculate.
Pi is approximately 3.14261 and took 0.25664 seconds to calculate.
Each loop took on average 0.25685 seconds to calculate.
And we have to agree with that one guy, as it is really (read: REALLY) fast. It took only 0.257 seconds to run the same loop we programmed in Python before. Let's add this as a line to our previous plot, shown in figure 3.
Now, after appreciating the previous figure for a bit longer, we clearly see the momentum Python has gained. Since version 3.9, Python has sped up considerably with each release (about 24% from 3.9 to 3.10, and another 37% to 3.11). The Python developers have mentioned that the next couple of versions will also bring significant speed-ups; therefore, we could assume that this pace will be kept (yup, super safe assumption).
Now the question is: with this momentum fixed, when will Python surpass C++? For this we can of course use extrapolation to predict the loop times of the Python versions to come. These can be seen in figure 4.
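For the curious, the extrapolation can be reproduced with an ordinary least-squares line. Assuming a linear fit through the last three measured versions (the exact fit the author used is not shown, but this choice lands on the same negative loop time for 3.14):

```python
def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept
    through a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


# Average loop times (seconds), keyed by minor version, from the benchmark above.
measured = [(9, 10.9537), (10, 8.8467), (11, 6.4605)]
slope, intercept = linear_fit(measured)

for minor in range(12, 15):
    predicted = slope * minor + intercept
    print(f"Python 3.{minor}: predicted loop time {predicted:.3f} s")
```

The prediction for 3.14 comes out just below zero seconds, which is exactly the absurdity the next paragraph runs with.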
The result is really stunning! Keeping up this pace, Python 3.14 will be faster than C++. To be exact, the loop time will be -0.232 seconds, so the calculation will be done just before you even want to run it. There appears to be a hole in the space-time continuum, but these calculations are rock solid. Therefore, I think we might have to question the work of Einstein and friends.
While the benchmarks for Python 3.5 through 3.11 are valid, the extrapolation is of course meant as a joke. The XKCD-style figures are an additional reminder of that ;-).
If you want to run these or your own tests on the various Python versions, download the code from my GitHub page.
Please let me know if you have any comments! Feel free to connect on LinkedIn.