Easily access MySQL databases and execute SQL queries in Python
SQL and Python are indispensable tools for data practitioners to work effectively with data.
A common use case would be the initial retrieval of data from relational databases using SQL queries, followed by subsequent manipulation and analysis of the data in Python with libraries such as pandas.
But did you know that these two seemingly isolated tasks can be combined into a single Python script to easily deliver the same outcome?
In this article, we discover the concepts and implementation of PyMySQL for connecting to and interacting with MySQL databases from Python.
PyMySQL is a pure-Python MySQL client library, which means it is a Python package that creates an API interface for us to access MySQL relational databases.
The documentation page states that PyMySQL was built based on PEP 249. It means PyMySQL was developed based on the Python Database API Specification, which was set to promote standardization of Python modules used for database access.
The key reason for using PyMySQL is that it serves as a handy interface to interact directly with MySQL databases by incorporating SQL statements within the confines of Python scripts.
It also means we do not require access to a separate RDBMS interface for running SQL queries needed for our downstream Python operations.
We can install PyMySQL using pip:
pip install PyMySQL
The following version requirements also need to be met:
(ii) Data Setup
For this demo, we generate a 1000-row mock dataset from Mockaroo to represent a customers dataset. Then, the data is imported into a local MySQL database via the Table Data Import Wizard function to form a customers table.
The data I generated can be found here in the project GitHub repo.
If you would like to know more free resources on generating mock data for experiments like this, do check out the following reference:
(iii) Import packages
We then import all the necessary packages for the project:
import pandas as pd
import matplotlib.pyplot as plt
The PyMySQL library is built upon two key Python objects: connection objects and cursor objects.
Before interacting with the MySQL database, we must connect to the server. The first step is to define the configuration parameters.
Here are the default configuration settings we can use to connect to a local database instance. Note that these values may vary depending on the specific settings you created for your schema and database.
These parameters should ideally be stored in a separate .py file and parsed into the main Python script using tools like configparser.
With the configuration parameters defined, we can parse them into a PyMySQL connect method call, which can be wrapped within a Python function.
We then execute the connection by calling the Python function and returning the connection object as a variable:
Let’s say we would like to extract a subset of the data (i.e., customers who drive BMW) from the customers table. We first define a function that does the following:
- Create a cursor object from the connection object set up earlier.
- Parse SQL query into the cursor object and execute the query. For this example, the query is
SELECT * FROM customers WHERE car_make = 'BMW';, which we store as a string inside a Python variable called
- Commit the execution to the connection object for the query to reach the database. It is because the connection object does not automatically commit by default.
- Fetch all the data records resulting from the SQL query that was executed.
- Convert the data records (which are returned as a list of dictionaries) into a pandas DataFrame.
The above steps are wrapped in the Python function (
get_records) shown below:
Running the function returns the following output:
Now that the records from the SQL query are stored as a Python pandas dataframe, we are free to perform a myriad of analysis and manipulation techniques on the data.
For example, we can use pandas and matplotlib to visualize the distribution of how often these customers visit the specific mobile application (based on the column
Besides reading data from MySQL databases, we can also use PyMySQL to perform other CRUD (Create, Read, Update, Delete) functions to manage the database.
For example, we can execute a query that inserts a data record into the customers table. Like before, we create a Python function for this specific operation of data insertion.
Once done, we can view the newly added record using the
get_record Python function we created earlier:
In this article, we learned how we could leverage the capabilities of PyMySQL to interact with MySQL databases using Python.
In particular, we saw the ease with which we can retrieve, manipulate, and analyze the data from MySQL databases within the confines of a single Python script or notebook.
If you would like to see how PyMySQL is used for cloud databases (rather than local databases as shown in this demo), do check out this TowardsDataScience writeup:
I welcome you to join me on a data science learning journey! Follow this Medium page and check out my GitHub to stay in the loop of more exciting practical data science content. Meanwhile, have fun working with PyMySQL!