Mastering the Art of Python Project Setup: A Step-by-Step Guide
Whether you’re a seasoned developer or just getting started with 🐍 Python, it’s important to know how to build robust and maintainable projects. This tutorial will guide you through the process of setting up a Python project using some of the most popular and effective tools in the industry. You will learn how to use GitHub and GitHub Actions for version control and continuous integration, as well as other tools for testing, documentation, packaging and distribution. The tutorial is inspired by resources such as Hypermodern Python and Best Practices for a new Python project. However, this is not the only way to do things and you might have different preferences or opinions. The tutorial is intended to be beginner-friendly but also cover some advanced topics. In each section, you will automate some tasks and add badges to your project to show your progress and achievements.
The repository for this series can be found at github.com/johschmidt42/python-project-johannes
- OS: Linux, Unix, macOS, Windows (WSL2 with e.g. Ubuntu 20.04 LTS)
- Tools: python3.10, bash, git, tree
- Version Control System (VCS) Host: GitHub
- Continuous Integration (CI) Tool: GitHub Actions
It is expected that you are familiar with the version control system (VCS) git. If not, here’s a refresher for you: Introduction to Git
Commits will be based on best practices for git commits & Conventional Commits. There is a conventional commit plugin for PyCharm and a VS Code extension that help you write commits in this format.
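A Conventional Commit message follows the pattern <type>(<scope>): <description>, where the scope is optional. For example (the scope and description here are purely illustrative):
feat(lint): add isort, black, flake8 and mypy as dev-dependencies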
Overview
- Part I (GitHub, IDE, Python environment, configuration, app)
- Part II (Formatting, Linting, Command management, CI)
- Part III (Testing, CI)
- Part IV (Documentation, CI/CD)
- Part V (Versioning & Releases, CI/CD)
- Part VI (Containerisation, Docker, CI/CD)
Structure
- Formatters & linters (isort, black, flake8, mypy)
- Configurations (isort, .flake8, .mypy.ini)
- Command management (Makefile)
- CI (lint.yml)
- Badge (Linting)
- Bonus (Automatic linting in PyCharm, Create requirements.txt with Poetry)
If you’ve ever worked in a team, you know that to achieve code and style consistency, you need to agree on formatters and linters. It will help you with onboarding new members to the codebase, create fewer merge conflicts and generally save time because developers don’t have to care about formatting and style while coding.
If you don’t know the difference between a formatter & linter and/or would like to see them in action, check out this tutorial!
One option for formatting and linting Python code is wemake-python-styleguide, which claims to be the “strictest and most opinionated Python linter ever”. However, I prefer the popular combination of isort and black as formatters, flake8 as linter and mypy as static type checker. mypy adds static typing to Python, which is one of the most exciting features in Python development right now.
We are going to add these tools to our project with Poetry. But since these tools are not part of the application, they should be added as dev-dependencies. With Poetry 1.2.0, we now can use dependency groups:
Poetry provides a way to organize your dependencies by groups. For instance, you might have dependencies that are only needed to test your project or to build the documentation.
When adding the dependencies, we can specify the group they should belong to with the --group option:
> poetry add --group lint isort black flake8 mypy
Structuring the dev-dependencies in groups will make more sense later. The main idea is that we can save time and resources in CI pipelines by installing only the dependencies that are required for a specific task, such as linting.
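For reference, this is roughly what the new dependency group looks like in the pyproject.toml afterwards (the version constraints below are placeholders; yours will be whatever Poetry resolves):
# pyproject.toml
[tool.poetry.group.lint.dependencies]
isort = "^5.10.1"
black = "^22.8.0"
flake8 = "^5.0.4"
mypy = "^0.971"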
Because isort and black disagree on a few points, we need to enforce that isort uses the black profile.
So we add the configuration to the pyproject.toml file:
# pyproject.toml
...
[tool.isort]
profile = "black"
...
flake8 also needs to “use the black profile”. However, flake8 has not (yet) adopted pyproject.toml as the central location for project configuration (see this heated discussion, or use the pyproject-plugin), which is why we add its configuration in a .flake8 file:
# .flake8
[flake8]
max-line-length = 88
extend-ignore = E203
For mypy, we can add the configuration of the tool according to the docs:
# pyproject.toml
...
[tool.mypy]
# 3rd party import
ignore_missing_imports = true
# dynamic typing
disallow_any_unimported = true
disallow_any_expr = false
disallow_any_decorated = false
disallow_any_explicit = true
disallow_any_generics = false
disallow_subclassing_any = true
# platform
python_version = "3.10"
# untyped
disallow_untyped_calls = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
disallow_untyped_decorators = true
# None and Optional
no_implicit_optional = true
# Warnings
warn_return_any = false
warn_unreachable = true
# Misc
pretty = true
...
Mypy has many settings that you can customize to suit your preferences. I won’t cover all of them here, but I encourage you to read the mypy documentation and learn how to configure the static type checker for your project!
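To get a feel for what mypy reports, here is a small made-up example: with disallow_untyped_defs = true, the untyped function below would be flagged, and the wrong argument type would be caught as well.
# hypothetical example, not part of the project
def add(a: int, b: int) -> int:
    return a + b

def greet(name):  # error: function is missing a type annotation
    return "Hello " + name

result: int = add(1, "2")  # error: argument 2 to "add" has incompatible type "str"; expected "int"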
Let’s see our new tools in action:
> isort . --check
Skipped 2 files

> black . --check
would reformat src/example_app/app.py
Oh no! 💥 💔 💥
1 file would be reformatted, 1 file would be left unchanged.

> flake8 .

> mypy .
Success: no issues found in 2 source files
Only one of the tools (black) reported an issue that we can fix. Omitting the --check flag will run the formatter black on our Python files for us:
> black .
At this point we could think of adding pre-commit hooks that run these linters every time we commit. But using mypy with pre-commit is a little fiddly, so I’ll leave it up to you if you want (and like) pre-commit hooks.
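If you do decide to use them, a minimal .pre-commit-config.yaml could look roughly like this (the rev values are placeholders; pin them to the versions you actually use):
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1
    hooks:
      - id: isort
  - repo: https://github.com/psf/black
    rev: 22.8.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8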
As we add new tools to our project, we also need to remember some commands to use them. These commands can get complicated and hard to remember over time. That’s why it’s useful to have a single file where we can store and name commands for our project. This is where the Makefile comes in. Many devs are unaware that you can use make in a Python project to automate different parts of developing a project. It is a common tool in the world of software development with languages such as C or C++. It can be used, for example, to run tests, linters, builds etc. It’s an underutilized tool, and by integrating it into your routine, you can save time and avoid errors.
GNU Make controls the generation of executables and other non-source files of a program from the program’s source files.
That way, we don’t need to remember all the commands and their arguments and options. It lets us specify a set of tasks via a common interface and allows us to run several commands sequentially.
# Makefile

format-black:
	@black .

format-isort:
	@isort .

lint-black:
	@black . --check

lint-isort:
	@isort . --check

lint-flake8:
	@flake8 .

lint-mypy:
	@mypy ./src

lint-mypy-report:
	@mypy ./src --html-report ./mypy_html

format: format-black format-isort

lint: lint-black lint-isort lint-flake8 lint-mypy
To do stuff with make, you type make in a directory that has a file called Makefile. You can also type make -f <file> to use a different filename. By default, make prints out each command before it runs it, so that you can see what it’s doing. But there is a UNIX dogma saying that “success should be silent”. So to silence the commands in a target, we can start them with a `@` character. Now we just need to run these two commands in a shell
> make format
> make lint
to run all our formatters and linters on our source code. If you want to know more about the format of a Makefile, how to set variables, add prerequisites and phonies, I highly recommend reading python-makefile by Aniket Bhattacharyea!
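Speaking of phonies: since targets like format and lint don’t produce files with those names, it’s good practice to declare them as phony. A sketch of what you could add to the Makefile above (this line is my addition, not from the referenced article):
.PHONY: format-black format-isort lint-black lint-isort lint-flake8 lint-mypy lint-mypy-report format lint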
If you want to have a well-documented Makefile, check out the bonus section at the bottom of this part!
Now that we have a few more config files and a new Makefile as a task runner, our project should resemble this:
.
├── .flake8
├── LICENSE
├── Makefile
├── README.md
├── poetry.lock
├── pyproject.toml
└── src
    └── example_app
        ├── __init__.py
        └── app.py

2 directories, 8 files
Working in a team of professional software developers brings a number of challenges. Making sure that nothing is broken and everyone is working on the same formatted code is one of them. For this we use continuous integration (CI), a software development practice that allows members of a team to integrate their work frequently. In our case, so far, new features (feature branches) that modified source files need to pass our linters to preserve style consistency. There are a lot of CI tools such as CircleCI, TravisCI, Jenkins etc., but in the scope of this tutorial we will use GitHub’s CI/CD workflow solution GitHub Actions.
Now that we can run our formatters and linters locally, let’s set up our first workflow that will run on a GitHub server. To do this, we will create a new feature branch called feat/lint-ci and add the file .github/workflows/lint.yml:
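Here is a sketch of what lint.yml can look like, based on the steps described below (the exact action versions and caching details may differ from the original file):
# .github/workflows/lint.yml (sketch)
name: Linting

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.head_ref }}

      - name: Install poetry
        run: pipx install poetry

      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
          cache: "poetry"

      - name: Install dependencies
        run: poetry install --only lint

      - name: Run linters
        run: poetry run make lint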
Let’s break it down to make sure we understand each part. GitHub Actions workflows must be created in the .github/workflows directory of the repository as .yaml or .yml files. If you’re seeing these for the first time, you can check them out here to better understand them. In the upper part of the file, we give the workflow a name (name: Linting) and define on which signals/events this workflow should be started (on: ...). Here, we want it to run when new commits come into a pull request targeting the main branch or when commits are pushed to the main branch directly. The job runs in an ubuntu-latest* (runs-on) environment and executes the following steps:
- Check out the repository using the branch name that is stored in the default variable ${{ github.head_ref }}. GitHub action: checkout@v3
- Install Poetry with pipx, because pipx is pre-installed on all GitHub runners. If you have a self-hosted runner in e.g. Azure, you’d need to install it yourself or use an existing GitHub action that does it for you.
- Set up the Python environment and cache the virtualenv based on the content of the poetry.lock file. GitHub action: setup-python@v4
- Install only the requirements that are needed to run the different linters with poetry install --only lint**
- Run the linters with the make command: poetry run make lint
Please note that running the tools is only possible in the virtualenv, which we can access through poetry run.
*We could also run this in a container (docker) but containerisation will be covered in Part VI
**We used poetry install --only lint to install just the dependencies in the group lint. You might wonder: how can we check locally whether these dependencies are enough to run the tools? Well, in Poetry 1.2.0, the environment depends on both the Python interpreter and the pyproject.toml file. So we would need to delete the existing environment with poetry env remove or poetry env remove --all, then create a new clean environment with poetry env use python3 and run poetry install --only lint. This seems like a hassle, right? I agree, but that’s how it works for now. You can read more about this issue in this Stack Overflow post.
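In shell form, reproducing the CI environment locally would roughly look like this:
> poetry env remove --all
> poetry env use python3
> poetry install --only lint
> poetry run make lint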
Now that we have our first workflow, how can we see it in action? Or better yet: How can we test it before pushing it to GitHub? There are two ways to do that:
- We can push our changes and see the results on GitHub
- We can use the tool act, which lets us run GitHub actions locally and avoid the trial-and-error approach.
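If you want to try the second option, simulating the pull request event locally with act (once it is installed and Docker is running) would roughly be:
> act pull_request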
Let’s try the first option and push our changes to our feature branch. When we open a pull request, we can see that the workflow has started running.
And we can also see that it actually failed:
The reason for this error

/home/runner/work/python-project-johannes/python-project-johannes/example_app does not contain any element

is that we never ran

> poetry install

before to check whether our app gets installed correctly into the site-packages directory, or whether the name or mapping was wrong. We can solve this by making sure that the name attribute in our pyproject.toml matches the name of our package directory under src and by removing the packages attribute for now:
# pyproject.toml
[tool.poetry]
name = "example_app"
...
Running the pipeline a second time, we see that … it fails again!
This time, our static type checker mypy reported errors because of unfollowed imports. We can reproduce this by running the same commands from the workflow locally (installing only the lint packages). It turns out that mypy tries to follow the imports in a file, but if it can’t (because the application dependencies were not installed by poetry install --only lint), they are treated as Any types! This is described in the mypy documentation. We can solve this by installing our application dependencies AND the lint dependencies with

> poetry install --with lint
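In the workflow, this just means adjusting the install step accordingly (sketch):
      - name: Install dependencies
        run: poetry install --with lint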
This time, we see that it succeeded, Hallelujah!
And to summarise, here’s what our repository tree looks like now:
.
├── .flake8
├── .github
│   └── workflows
│       └── lint.yml
├── LICENSE
├── Makefile
├── README.md
├── poetry.lock
├── pyproject.toml
└── src
    └── example_app
        ├── __init__.py
        └── app.py

4 directories, 9 files
When we merge our PR to the main branch, the workflow will run again. We can display the status of our CI pipeline on the homepage of our repository by adding a badge to the README.md file.
To get the badge, we need to click on a workflow run (main branch) and copy the lines
The badge markdown can be copied and added to the README.md:
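For this repository, the badge markdown looks roughly like this (the exact URL comes from the badge dialog of the workflow run):
[![Linting](https://github.com/johschmidt42/python-project-johannes/actions/workflows/lint.yml/badge.svg)](https://github.com/johschmidt42/python-project-johannes/actions/workflows/lint.yml)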
Our GitHub landing page now looks like this ❤:
If you want to know how this magically shows the current status of the last pipeline run on main, have a look at the commit statuses API on GitHub.