How to create a local dbt project with dummy data for testing purposes with Docker
dbt (data build tool) is one of the hottest technologies in the data engineering and analytics space. Recently, I’ve been working on a task that performs some post-processing over dbt artefacts and wanted to write some tests for it. In order to do so, I had to create an example project that could run locally (or in a Docker container), so that I wouldn’t have to interact with the actual data warehouse.
In this article, we will go through a step-by-step process you can follow in order to create a dbt project and connect it to a containerized Postgres instance. You can use such projects either for testing purposes, or even as a playground for experimenting with dbt itself in order to try out features or practise your skills.
Step 1: Create a dbt project
We will be populating some data in a Postgres database; therefore, we first need to install the dbt Postgres adapter from PyPI:
pip install dbt-postgres==1.3.1
Note that the command will also install the dbt-core package, along with other dependencies that are required for running dbt.
Now let’s go ahead and create a dbt project. To do so, we can initialise a new dbt project by running the dbt init command in the terminal:
dbt init test_dbt_project
You will then be prompted to select which database you’d like to use (depending on the adapters you have installed locally, you may see different options):
16:43:08 Running with dbt=1.3.1
Which database would you like to use?
[1] postgres

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1
Make sure to enter the number that corresponds to the Postgres adapter, as shown in the output list. Now the init command should have created the following basic structure in the directory where you’ve executed it:
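If you are using dbt 1.3, the scaffold generated by dbt init will typically look like this (the exact contents may vary slightly between dbt versions):

```text
test_dbt_project/
├── analyses/
├── macros/
├── models/
│   └── example/
├── seeds/
├── snapshots/
├── tests/
├── .gitignore
├── README.md
└── dbt_project.yml
```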
Step 2: Create a Docker Compose file
Now let’s create a docker-compose.yml file (place the file at the same level as the test_dbt_project directory) in which we will specify two services: one corresponding to a ready-made Postgres image, and a second one for a dbt image that we will define in a Dockerfile in the next step.
Note that the Postgres service defines a healthcheck, test: ["CMD-SHELL", "pg_isready -U postgres"], so that the dbt container can wait until the database is ready to accept connections.
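Since the full file isn’t reproduced here, below is a minimal sketch of what such a docker-compose.yml can look like. The image name and healthcheck command come from the article; the service names, port mapping, credentials and mount path are assumptions:

```yaml
services:
  postgres:
    image: frantiseks/postgres-sakila
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=postgres  # assumption
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5

  dbt:
    build: .
    volumes:
      # Mount the project so that model changes don't require an image re-build
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy
```

With this setup, Docker Compose will only start the dbt service once the Postgres healthcheck passes.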
As you can tell, for the Postgres container we will be using an image called frantiseks/postgres-sakila, which is publicly available on Docker Hub. This image populates the Sakila database on the Postgres instance. The database models a DVD rental store and consists of multiple normalised tables corresponding to entities such as films, actors, customers and payments. In the following sections, we’ll make use of this data in order to build some example dbt data models.
The second service, called dbt, will be the one that creates an environment where we will build our data models. Note that we mount the current directory into the Docker container. This lets the container pick up any changes we make to the data models without having to re-build the image. Additionally, any metadata generated by dbt commands (such as manifest.json) will appear instantly on the host machine.
Step 3: Create a Dockerfile
Now let’s specify a Dockerfile that will be used to build an image, on top of which the running container will then build the models specified in our example dbt project.
# The base image was not preserved in the article; any slim Python 3 image should work
FROM python:3.10-slim-bullseye

# The original package list was truncated; git is needed so that dbt deps can pull packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends git \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/dbt

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.3.1

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir profiles && sleep infinity
Note that in the last CMD command, we intentionally added an extra && sleep infinity so that the container won’t exit after running the steps specified in the Dockerfile, and we can then access the container and run additional dbt commands (if needed).
Step 4: Create a dbt profile for the Postgres database
Now that we have created the required infrastructure to spin up a Postgres database, populate it with some dummy data, and build an image for our dbt environment, let’s focus on the dbt side.
We will first have to create a dbt profile that will be used when interacting with the target Postgres database. Within the test_dbt_project directory, create another directory called profiles, and then a file called profiles.yml with the following content:
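A minimal profiles.yml for this setup could look like the following sketch. The profile name, credentials, database and schema are assumptions; the host must match the name of the Postgres service in docker-compose.yml, and the profile name must match the profile field in dbt_project.yml:

```yaml
test_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres      # name of the Postgres service in docker-compose.yml
      user: postgres
      password: postgres  # assumption
      port: 5432
      dbname: postgres    # assumption
      schema: public
      threads: 1
```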
Step 5: Define some data models
The next step is to create some data models based on the Sakila data populated by the Postgres container. If you are planning to use this project for testing purposes, I would advise creating at least one seed, one model and one snapshot (with tests, if possible), so that you have full coverage of all dbt entities (excluding macros).
I have already created some data models, seeds and snapshots, which you can access in this repository.
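As an illustration, a staging model over the Sakila payment table could look like the sketch below. The column names come from the Sakila schema, while the file path, model name and source definition are assumptions:

```sql
-- models/staging/stg_payment.sql (hypothetical path)
select
    payment_id,
    customer_id,
    staff_id,
    rental_id,
    amount,
    payment_date
from {{ source('sakila', 'payment') }}
```

The corresponding source would also need to be declared in a .yml file under the models directory.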
Step 6: Run the Docker containers
We now have everything we need in order to spin up the two Docker containers we specified in the docker-compose.yml file earlier, and build the data models defined in our example dbt project.
First, let’s build the images:
docker-compose build
And now let’s spin up the running containers in detached mode:
docker-compose up -d
This command should have initialised a Postgres database containing the Sakila data, and created the dbt models specified in our project. For now, let’s make sure you have two running containers:
docker ps
This should give an output that includes one container for the dbt service and another one for the Postgres service.
Step 7: Query the models on Postgres database
In order to access the Postgres container, you’ll first need to infer the container id:
docker ps
And then enter the running container:
docker exec -it <container-id> /bin/bash
We will then need to use psql, a command-line interface that gives us access to the Postgres instance:
psql -U postgres
If you have used the data models I’ve shared in the previous sections, you can now query each of the models created on Postgres using the queries below.
-- Query seed tables
SELECT * FROM customer_base;
-- Query staging views
SELECT * FROM stg_payment;
-- Query intermediate views
SELECT * FROM int_customers_per_store;
SELECT * FROM int_revenue_by_date;
-- Query mart tables
SELECT * FROM cumulative_revenue;
-- Query snapshot tables
SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot;
Step 8: Create additional models or modify existing ones
As mentioned already, the docker-compose.yml file was written in such a way that the dbt container stays up and running. Therefore, whenever you modify or create data models, you can use that container to re-build your seeds, models, snapshots and/or tests.
To do so, first infer the container id of the dbt container:
docker ps
Then enter the running container:
docker exec -it <container-id> /bin/bash
And finally run any dbt command you wish, depending on the modifications you’ve made to the example dbt project. Here’s a quick reference of the most commonly used commands for these purposes:
# Install dbt deps
dbt deps
# Build seeds
dbt seed --profiles-dir profiles
# Build data models
dbt run --profiles-dir profiles
# Build snapshots
dbt snapshot --profiles-dir profiles
# Run tests
dbt test --profiles-dir profiles
How to get the full code of this tutorial
I’ve created a GitHub repository called dbt-dummy that contains all the pieces you need in order to quickly create a containerized dbt project that uses Postgres.
This project is also available in the example projects section of the official dbt documentation!
In today’s tutorial, we went through a step-by-step process for creating a dbt project on a local machine using Docker. We built two images: one for the Postgres database, which also populates the Sakila database, and another one for our dbt environment.
It’s important to be able to quickly build example projects with data build tool, which can then be used as a testing environment, or even as a playground for experimenting and learning.