6. Python Package: Distribution

To illustrate the process we continue using the company_package that we developed in part 4.

Ideally, you develop the code of your package under version control with Git and push it to a hosting service such as GitHub, GitLab, or Bitbucket.

If you want to keep your code private, you can do so and still share it with selected colleagues by inviting them to your private repository.

Here, we will assume you are ready to share your code with the world and that you want it to be usable on macOS, Linux, and Windows machines.

6.1. Source and wheels

When we used

pip install -e .

inside our company_package folder, we created a sub-folder called company.egg-info, which contains metadata about the package.

Exercise: Look inside the company.egg-info folder and check out the contents of the files there.

More importantly, we also automatically created a new folder with the suffix .dist-info inside the site-packages folder of our environment, which contains the same sort of information.

This folder will typically be at

<venvdir>/<name-of-env>/lib/python3.X/site-packages/company-<version>.dist-info

Exercise: Look inside the dist-info folder of your site-packages folder and check out the contents of the files there.
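If you are unsure where site-packages lives for the active environment, Python can report it. The sketch below uses the standard-library sysconfig module. The helper name find_dist_info_dirs is ours, and "company" is assumed to be the name under which the package was installed.

```python
import sysconfig
from pathlib import Path


def find_dist_info_dirs(prefix, site_packages=None):
    """Return the *.dist-info folders whose name starts with `prefix`."""
    # "purelib" is where pure-Python packages are installed for this interpreter.
    root = Path(site_packages or sysconfig.get_paths()["purelib"])
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(f"{prefix}-*.dist-info") if p.is_dir())


if __name__ == "__main__":
    print("site-packages:", sysconfig.get_paths()["purelib"])
    for folder in find_dist_info_dirs("company"):
        print(folder)
```

Run it with the interpreter of the environment in which the package was installed, otherwise it reports the site-packages folder of a different environment.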

One of the files is direct_url.json. In our case it shows:

{"dir_info": {"editable": true}, "url": "file:///path/to/company_package"}

which means we have installed the package in editable mode and the location of the source code is given by the url field.
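The same check can be scripted. The following is a minimal sketch that parses the content of direct_url.json and reports whether the install is editable. The function name describe_direct_url is ours, and the sample string is the one shown above.

```python
import json


def describe_direct_url(text):
    """Summarize the content of a direct_url.json file given as a string."""
    data = json.loads(text)
    # "dir_info" is present when the package was installed from a local directory.
    editable = bool(data.get("dir_info", {}).get("editable", False))
    return {"url": data.get("url"), "editable": editable}


sample = '{"dir_info": {"editable": true}, "url": "file:///path/to/company_package"}'
print(describe_direct_url(sample))
# {'url': 'file:///path/to/company_package', 'editable': True}
```

For an installed package, the text can be obtained with importlib.metadata.distribution("company").read_text("direct_url.json"), which returns None when the file does not exist (assuming, again, that the distribution is named company).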

However, we have not built the package yet, i.e., we have not created the distribution files (a source archive and a wheel) that other users can install.

To build it, we run the following in a terminal:

python -m build

from inside the package folder. The command may fail with an error such as:

No module named build.__main__; 'build' is a package and cannot be directly executed

If so, downgrading build with the following command should resolve it:

pip install 'build<0.10.0'
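Before changing versions, it can help to check what Python actually finds under the name build. The sketch below is a diagnostic of our own, not part of the build tool: it reports where a module name resolves and whether it can be run with python -m. Our assumption is that this error typically appears when build resolves to something without a __main__ module, for instance a leftover build/ folder in the current directory while the build package itself is missing from the active environment.

```python
import importlib.util


def inspect_module(name):
    """Report where `name` resolves and whether `python -m name` can run it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "runnable": False}
    if spec.submodule_search_locations is None:
        # A plain module (a single .py file) can be run directly with -m.
        return {"found": True, "origin": spec.origin, "runnable": True}
    # A package needs a __main__ submodule to be runnable with -m.
    main_spec = importlib.util.find_spec(name + ".__main__")
    return {"found": True, "origin": spec.origin, "runnable": main_spec is not None}


if __name__ == "__main__":
    print(inspect_module("build"))
```

If the result shows "runnable": False with an origin of None, Python is most likely picking up a plain folder rather than the installed package.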

At the end of the process, a dist folder should be created and we should see something like:

Successfully built company-0.0.0b1.dev6+gfa37a84.d20241029.tar.gz and company-0.0.0b1.dev6+gfa37a84.d20241029-py3-none-any.whl

These two files, stored in the dist folder, are the source distribution and the wheel distribution of the package.

The wheel distribution is a binary distribution of the package. (In fact, for a pure Python package, it is simply a zip archive of the package, laid out so that it can be copied into site-packages without running a build step. For a package with compiled extensions, it also contains the compiled files.)

Installing from the wheel can be much faster than from the source. To do so, we can run:

pip install <package-name>-<version>-<py-version>-<platform>.whl

(Note, however, that a package installed from a wheel is not installed in editable mode.)

To see what’s inside the wheel, we can extract it using:

unzip <package-name>-<version>-<py-version>-<platform>.whl -d <where-to-extract>
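Since a wheel is a zip archive, we can also inspect it from Python, which is convenient on machines where unzip is not available. This is a minimal sketch using the standard-library zipfile module; the function name list_wheel_contents is ours.

```python
import zipfile


def list_wheel_contents(wheel_path):
    """Return the names of all files stored in a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return wheel.namelist()


# Usage, with the path of a wheel found in the dist folder:
# for name in list_wheel_contents("dist/<package-name>-<version>-<py-version>-<platform>.whl"):
#     print(name)
```

The same listing is available from the command line with python -m zipfile -l followed by the path of the wheel.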

Exercise: Create, install and extract the wheel distribution of the company package and inspect its content.

6.2. Docker Images and Containers

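Notice that the name of our wheel file ends with py3-none-any. These three tags give the Python version (py3: any Python 3 release), the ABI (none: no dependence on a particular Python binary interface), and the platform (any: any operating system such as macOS, Linux, or Windows, and any architecture such as x86 or arm).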

This is generally the case for pure Python packages.
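To make the naming convention concrete, here is a small sketch that splits a wheel filename into its components. The function split_wheel_filename is a simplified helper of our own; it assumes a well-formed name of the form name-version-python-abi-platform.whl, optionally with a build tag after the version.

```python
def split_wheel_filename(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


tags = split_wheel_filename("company-0.0.0b1.dev6+gfa37a84.d20241029-py3-none-any.whl")
print(tags["python_tag"], tags["abi_tag"], tags["platform_tag"])
# py3 none any
```

For real projects, the third-party packaging library offers packaging.utils.parse_wheel_filename, which validates the name more thoroughly.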

For more complex packages that involve compiled extensions, we will see how to build wheels for multiple platforms and architectures using cibuildwheel.

cibuildwheel is a tool that relies on Docker (in particular for its Linux builds).

With Docker we can test how our package installs and behaves on different platforms.

We will cover cibuildwheel in more detail later in the course and focus on Docker for now.

Docker can be used from the command line but also through a graphical interface known as Docker Desktop.

You are encouraged to install it.

To put things simply, with Docker we create a sort of virtual Linux machine on our machine. This virtual machine has its own operating system and can be seen as completely isolated from the rest of our local machine.

The key step in setting up Docker is to create a so-called Dockerfile. It is a script detailing the setup of the environment, including dependencies, for the package.

For our company_package, a valid Dockerfile could be:

# Use an official Python image as the base image
FROM python:3.12-slim

# Install Git
RUN apt-get update && apt-get install -y git


# Set the working directory in the container
WORKDIR /app

# Copy the project files to the working directory
COPY . /app

# Install required dependencies for building the package
RUN pip install --upgrade pip setuptools wheel setuptools_scm build

# Install the package together with the runtime dependencies listed in pyproject.toml
RUN pip install .

# Build the package
RUN python -m build

The next step is to build the Docker image.

First, we check that Docker is installed and active on our machine. To do so, we can run:

docker info

If Docker is not active, the easiest way to start it is to open Docker Desktop. The command above should then work (and print information about the Docker installation on your machine).
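This check can also be done from a Python script, for example at the start of a test routine. The sketch below is one possible way to do it. The function name docker_is_active is ours, and it assumes that docker info exits with a non-zero status when the Docker daemon cannot be reached, as recent Docker releases do.

```python
import shutil
import subprocess


def docker_is_active(timeout=20):
    """Return True if the docker CLI is installed and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False  # the docker command is not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a non-zero exit status means the daemon is not reachable.
    return result.returncode == 0


print("Docker is active:", docker_is_active())
```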

Exercise: Install Docker on your machine and try the docker info command.

With Docker active, we build the Docker image by running:

docker build -t <name-of-image> .

In Docker Desktop, we should be able to see the image being built.

To list the images on our machine we can run:

docker images

Finally, we can run the Docker image in a container by running:

docker run -it <name-of-image>

For our company_package, this creates a container with its own Python environment, in which we can test the package in isolation. The -it option combines -i (interactive) and -t (allocate a terminal): since the base image starts Python by default, the command opens a Python shell in the container, and the container stops when you exit that shell. The output looks like the following (the Python version in the banner depends on the base image used):

docker run -it company-image
Python 3.11.10 (main, Oct 19 2024, 03:39:30) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import company as cp
Company package version: 0.0.0b1.dev7+g74c5191.d20241029
>>>

Docker therefore offers a robust way to distribute your package: by providing a Dockerfile alongside the source code, you let other users build, test, and use the package on their own machines by running:

docker build -t <name-of-image> .
docker run -it <name-of-image>
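These two steps can be wrapped in a short Python script, for instance to automate a smoke test of the package. The sketch below only assembles and runs the docker commands shown above. The function names are ours, and company-image is the image name used in the example output earlier.

```python
import subprocess


def docker_build_command(image, context="."):
    """Command that builds `image` from the Dockerfile found in `context`."""
    return ["docker", "build", "-t", image, context]


def docker_run_command(image, *args, interactive=True):
    """Command that runs `image`, optionally followed by a command for the container."""
    command = ["docker", "run"]
    if interactive:
        command.append("-it")
    command.append(image)
    command.extend(args)
    return command


def build_and_run(image, context="."):
    """Build the image, then start an interactive container from it."""
    # check=True raises CalledProcessError if a docker command fails.
    subprocess.run(docker_build_command(image, context), check=True)
    subprocess.run(docker_run_command(image), check=True)


print(" ".join(docker_build_command("company-image")))
# docker build -t company-image .
```

Passing extra arguments, e.g. docker_run_command("company-image", "python", "-c", "import company", interactive=False), gives a command that only checks that the package imports and then exits.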

With this method, users do not need to worry about the dependencies of your package, the Python version, etc. Everything is specified in the Dockerfile.

This covers the essential aspects of Docker and hopefully conveys the idea that it is a very powerful tool for software development. It is worth noting that the Docker and Dev Containers (formerly Remote - Containers) extensions of VS Code allow you to develop and debug inside the container.

Exercise: Use the VS Code extensions to set up and run a container based on the company_package Docker image.