8. Automatic documentation

This course website is itself built with Sphinx, using a theme provided by Read the Docs.

8.1. Docstrings

Docstrings are special strings used to document classes, functions, and modules. They are enclosed in triple quotes (""") and provide details on what the code does, its parameters, and its return values.

Here’s how to write docstrings for a class and a function, using the Fourier Transform as an example.

8.1.1. Docstring for a Function

Use a docstring to explain:

  • What the function does.

  • Parameters: Their types and meanings.

  • Returns: What the function outputs.

[1]:
import numpy as np

def compute_fourier_transform(signal, sample_rate):
    """
    Computes the Fourier Transform of a signal.

    Parameters:
        signal (numpy.ndarray): The input signal in the time domain.
        sample_rate (float): The sampling rate of the signal in Hz.

    Returns:
        tuple: A tuple containing:
            - freqs (numpy.ndarray): Frequencies corresponding to the FFT components.
            - fft_values (numpy.ndarray): The Fourier Transform (complex values).
    """
    n = len(signal)
    fft_values = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1/sample_rate)
    return freqs, fft_values

# Example usage
signal = np.array([1, 0, -1, 0])  # A simple waveform
sample_rate = 1.0
freqs, fft_vals = compute_fourier_transform(signal, sample_rate)

8.1.2. Docstring for a Class

For a class, document:

  • Purpose of the class.

  • Attributes: Variables it maintains.

  • Methods: Brief descriptions of key methods.

[2]:
class FourierTransformer:
    """
    A class to perform Fourier Transform operations on signals.

    Attributes:
        signal (numpy.ndarray): The input signal in the time domain.
        sample_rate (float): The sampling rate of the signal in Hz.
    """

    def __init__(self, signal, sample_rate):
        """
        Initializes the FourierTransformer with a signal and its sample rate.

        Parameters:
            signal (numpy.ndarray): The input signal.
            sample_rate (float): The sampling rate in Hz.
        """
        self.signal = signal
        self.sample_rate = sample_rate

    def compute_transform(self):
        """
        Computes the Fourier Transform of the stored signal.

        Returns:
            tuple: A tuple containing:
                - freqs (numpy.ndarray): Frequencies corresponding to the FFT components.
                - fft_values (numpy.ndarray): The Fourier Transform (complex values).
        """
        n = len(self.signal)
        fft_values = np.fft.fft(self.signal)
        freqs = np.fft.fftfreq(n, d=1/self.sample_rate)
        return freqs, fft_values

# Example usage
signal = np.array([1, 0, -1, 0])
sample_rate = 1.0
transformer = FourierTransformer(signal, sample_rate)
freqs, fft_vals = transformer.compute_transform()

8.1.3. Best Practices for Docstrings

  1. Use triple quotes (""") for docstrings.

  2. Clearly state the purpose of the function or class.

  3. Describe:

    • Parameters (name, type, and purpose).

    • Return values (type and description).

  4. Use concise, yet descriptive language.

  5. For larger projects, consider adding examples inside the docstring. See below.

[4]:
import numpy as np

def compute_fourier_transform(signal, sample_rate):
    """
    Computes the Fourier Transform of a signal.

    Parameters:
        signal (numpy.ndarray): The input signal in the time domain.
        sample_rate (float): The sampling rate of the signal in Hz.

    Returns:
        tuple: A tuple containing:
            - freqs (numpy.ndarray): Frequencies corresponding to the FFT components.
            - fft_values (numpy.ndarray): The Fourier Transform (complex values).

    Example:
        >>> import numpy as np
        >>> signal = np.array([1, 0, -1, 0])  # A simple waveform
        >>> sample_rate = 1.0
        >>> freqs, fft_values = compute_fourier_transform(signal, sample_rate)
        >>> print(freqs)
        [ 0.    0.25 -0.5  -0.25]
        >>> print(fft_values)
        [0.+0.j 2.+0.j 0.+0.j 2.+0.j]
    """
    n = len(signal)
    fft_values = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1/sample_rate)
    return freqs, fft_values
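Examples written in this `>>>` style can also be checked automatically with Python's standard doctest module, which runs each prompt line and compares the result with the text written below it. The sketch below assumes a small hypothetical helper, nyquist_frequency, chosen only so that the block runs without NumPy; it is not part of the course code.

```python
import doctest


def nyquist_frequency(sample_rate):
    """
    Returns the Nyquist frequency (half the sampling rate).

    Parameters:
        sample_rate (float): The sampling rate in Hz.

    Returns:
        float: The Nyquist frequency in Hz.

    Example:
        >>> nyquist_frequency(8.0)
        4.0
    """
    return sample_rate / 2


def check_docstring_examples(func):
    """Runs the >>> examples in func's docstring and returns (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The examples run in a namespace that only contains the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = check_docstring_examples(nyquist_frequency)
print(f"{attempted} example(s) run, {failed} failure(s)")
```

For a whole module, the usual entry points are `python -m doctest my_module.py` on the command line or a call to `doctest.testmod()`.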

8.1.4. Why Use Docstrings?

  • They serve as in-line documentation, making code easier to understand.

  • Tools like help() or IDEs display docstrings, making them highly useful for users of your code:

[5]:
help(compute_fourier_transform)
help(FourierTransformer)

Help on function compute_fourier_transform in module __main__:

compute_fourier_transform(signal, sample_rate)
    Computes the Fourier Transform of a signal.

    Parameters:
        signal (numpy.ndarray): The input signal in the time domain.
        sample_rate (float): The sampling rate of the signal in Hz.

    Returns:
        tuple: A tuple containing:
            - freqs (numpy.ndarray): Frequencies corresponding to the FFT components.
            - fft_values (numpy.ndarray): The Fourier Transform (complex values).

    Example:
        >>> import numpy as np
        >>> signal = np.array([1, 0, -1, 0])  # A simple waveform
        >>> sample_rate = 1.0
        >>> freqs, fft_values = compute_fourier_transform(signal, sample_rate)
        >>> print(freqs)
        [ 0.    0.25 -0.5  -0.25]
        >>> print(fft_values)
        [0.+0.j 2.+0.j 0.+0.j 2.+0.j]

Help on class FourierTransformer in module __main__:

class FourierTransformer(builtins.object)
 |  FourierTransformer(signal, sample_rate)
 |
 |  A class to perform Fourier Transform operations on signals.
 |
 |  Attributes:
 |      signal (numpy.ndarray): The input signal in the time domain.
 |      sample_rate (float): The sampling rate of the signal in Hz.
 |
 |  Methods defined here:
 |
 |  __init__(self, signal, sample_rate)
 |      Initializes the FourierTransformer with a signal and its sample rate.
 |
 |      Parameters:
 |          signal (numpy.ndarray): The input signal.
 |          sample_rate (float): The sampling rate in Hz.
 |
 |  compute_transform(self)
 |      Computes the Fourier Transform of the stored signal.
 |
 |      Returns:
 |          tuple: A tuple containing:
 |              - freqs (numpy.ndarray): Frequencies corresponding to the FFT components.
 |              - fft_values (numpy.ndarray): The Fourier Transform (complex values).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

8.2. What is automatic documentation?

It is documentation generated automatically from the docstrings in your package code, together with other metadata and documentation in your project.

Docstrings are string literals placed as the first statement of a module, function, or class body, directly under the definition line.

For instance in the base_company.py file (see here):

def summarize_activity(self, *args, **kwargs):
    """
    Summarizes company activities and additional information.

    Parameters:
    - *args: A list of activities related to the company.
    - **kwargs: Additional information, like location or date.
    """

Every function and class in your package should have a docstring. Some docstrings can go into more detail, for instance by including an example of how to use the function.

For instance in the medical.py file (see here):

def drug_approval_summary(self, dataset_path=None):
    """
    Prints a summary of drug approval attempts for the company's drugs.

    Parameters:
    - dataset_path (str): Path to the dataset with drug approval data.
      If not provided, uses the default package data file.

    Example:
        >>> medical_company = MedicalCompany("Pfizer", "Pharmaceuticals", drug_manufacturer=True)
        >>> medical_company.drug_approval_summary()

        Drug Approval Summary for Pfizer:
         - Drug A: 2 failed attempt(s) before approval
         - Drug B: 0 failed attempt(s) before approval
         - Drug C: 1 failed attempt(s) before approval
    """

The automatic documentation is then generated from these docstrings.
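To make the idea concrete, here is a minimal sketch of how a tool can turn docstrings into a reference page using only the standard inspect module. The Company class and the render_reference helper are hypothetical, and this is not how Sphinx itself is implemented; its autodoc extension handles far more cases.

```python
import inspect


class Company:
    """A hypothetical company with a name and a sector."""

    def __init__(self, name, sector):
        """Stores the company name and sector."""
        self.name = name
        self.sector = sector

    def describe(self):
        """Returns a one-line description of the company."""
        return f"{self.name} ({self.sector})"


def render_reference(cls):
    """Builds a plain-text reference page from the docstrings of cls."""
    title = cls.__name__
    lines = [title, "=" * len(title), "", inspect.getdoc(cls) or "", ""]
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and special methods in this sketch
        lines.append(f"{name}{inspect.signature(member)}")
        lines.append(f"    {inspect.getdoc(member) or 'No docstring.'}")
        lines.append("")
    return "\n".join(lines)


page = render_reference(Company)
print(page)
```

The better the docstrings, the better the generated page, which is why every public function and class should have one.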

8.3. Sphinx

Sphinx is a tool primarily used to create comprehensive and structured documentation for Python projects. It takes reStructuredText (reST) files, or Markdown files through an extension such as MyST-Parser, and generates clean, organized documentation in HTML, PDF, and other formats.

It is simple to set up. You can create a template automatically by running sphinx-quickstart, or do so manually by adding a docs folder in the root folder of your project with a couple of configuration files.

The main files are:

  • conf.py: it contains the configuration of the documentation and some metadata about the project, like the title, author, etc.

  • index.rst: it is the main page of the documentation. It can be written in reStructuredText or Markdown. We prefer reStructuredText (.rst) as it is more flexible.

  • Makefile: it contains the commands to build the documentation. You probably won’t need to edit it.
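Since conf.py is an ordinary Python file, a minimal version can be sketched as below. All values are placeholders to adapt, and the assumption that the package sits one level above the docs folder may not match your layout. The two extensions listed ship with Sphinx; the theme is installed separately as the sphinx-rtd-theme package.

```python
# docs/conf.py: a minimal sketch; every value below is a placeholder.
import os
import sys

# Make the package importable so that autodoc can read its docstrings.
# Assumes the package lives one level above the docs folder.
sys.path.insert(0, os.path.abspath(".."))

# Project metadata (hypothetical values).
project = "my_package"
author = "Your Name"
release = "0.1.0"

# Extensions shipped with Sphinx:
# - autodoc pulls documentation from docstrings,
# - napoleon parses Google- and NumPy-style docstring sections.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]

# The Read the Docs theme (requires the sphinx-rtd-theme package).
html_theme = "sphinx_rtd_theme"

# Files and folders that Sphinx should ignore when looking for sources.
exclude_patterns = ["_build"]
```

With autodoc enabled, a directive such as `.. automodule:: my_package` in an .rst file asks Sphinx to insert the docstrings of that module into the page.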

The best way for you to learn how this works is to look at examples. A simple and minimal example is provided in our company_package repository here.

A nice feature is that we can add notebooks and make them part of the documentation. This is how the course website you are reading now is generated. See the docs folder and in particular the index.rst file for more details on how these notebooks are linked.

The course website also shows some optional customizations, for instance a specific style for exercise boxes defined in the file style.css.

As you can see, the possibilities are more or less limitless: you can design your documentation just as you would design a website.

Starting from templates that you like is probably a good idea rather than starting from scratch.

For instance, my starting point for the template of this course website was Phillip Lippe’s tutorials here.

8.4. Building your documentation

Building the documentation can be done locally, in your terminal with:

cd docs
make clean
make html

The build creates a _build/html folder under docs. Open the index.html file in that folder with your favorite browser to view the documentation and check its layout.
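If make is not available (for instance on Windows), the same HTML build can be launched from Python. The sketch below assumes the Sphinx sources live in a folder named docs and that Sphinx is installed in the current environment; build_command and build_html are hypothetical helper names, and the sketch does not replicate make clean.

```python
import subprocess
import sys
from pathlib import Path


def build_command(docs_dir="docs"):
    """Returns the command that builds HTML pages into docs_dir/_build/html."""
    docs = Path(docs_dir)
    output = docs / "_build" / "html"
    # "python -m sphinx" is the module form of the sphinx-build command.
    return [sys.executable, "-m", "sphinx", "-b", "html", str(docs), str(output)]


def build_html(docs_dir="docs"):
    """Runs the build and returns the path of the generated index.html."""
    subprocess.run(build_command(docs_dir), check=True)
    return Path(docs_dir) / "_build" / "html" / "index.html"


# Show the command without running it.
print(" ".join(build_command()[1:]))
```

Calling `build_html()` from the root of the project runs the build; `webbrowser.open(build_html().resolve().as_uri())` would then open the result in the default browser.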

8.5. Publishing your documentation

To publish your documentation, we recommend using Read the Docs.

To set it up, you need to create an account on Read the Docs and then link your repository. It is easy.

You need to add a .readthedocs.yml file in the root of your repository. This hidden file contains the configuration for Read the Docs to know how to build your documentation. You can look at the .readthedocs.yml file of this course.

The nice thing about Read the Docs is that it automatically rebuilds your documentation on every commit pushed to your repository. This is the first example of continuous integration we see in this course. To achieve this, we have set up a webhook in our GitHub repository. You can read about how to do this here.

Note: for the continuous integration part, we needed to add a requirements.txt file to our repository (in the docs folder, as the path below shows). This file lists the dependencies that Read the Docs installs in a fresh environment before building the documentation, as specified in the .readthedocs.yml file, which contains the following:

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
  - requirements: docs/requirements.txt

Exercise: Set up automatic documentation for a simple package of your choice and publish it on Read the Docs using continuous integration. Of course, once you are done you can always delete it if you don’t want it to be public!

8.6. Alternatives

A popular alternative to Sphinx is MkDocs.

An alternative to the Read the Docs theme is Material for MkDocs, a theme for MkDocs.

It is also possible to interface documentation with LLM tools. A popular solution is DeepWiki.