4. Building your Python package¶
Building your own Python package is easy.
You need to:
make a folder where you will store your package
create the relevant configuration files
write the code you want your package to contain
pip install it
We go over these steps here, and then you should practice.
4.1. What is a Python package?¶
A Python package is a collection of functions and classes that serves a specific purpose.
The code is usually split across several files, each holding a coherent subset of the functionality. Each such file is referred to as a module.
An example of a well-maintained package is getdist.
Exercise: Click on getdist and look inside the repository. Identify the configuration files and modules.
4.2. The folder structure of a Python package¶
4.2.1. Minimal package structure¶
The structure is simple. A minimal package file structure would look like this:
.
├── pyproject.toml # Configuration
├── README.md # Instructions
├── package_name/ # Package folder with codes
│ ├── __init__.py
│ ├── module1.py
│ ├── module2.py
│ └── module3.py
├── dist/ # Distribution files
├── docs/ # Documentation files
└── tests/ # Test files
We will cover dist, tests and docs later. You can ignore them for now and focus on the rest.
Exercise: Create a folder called my_package and create the minimal file structure above with a simple module1.py file. The module1.py file should contain a simple function print_name that prints “Hello, <your name>!”.
So, you should be able to run:
pip install -e .
from inside the package folder to install it.
Then in a python session you should be able to run:
import package_name as mypkg
mypkg.print_name()
and it should print “Hello, <your name>!”.
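One possible layout for this exercise is sketched below. To keep the snippet self-contained, it writes the two files into a temporary folder and adds that folder to sys.path instead of running pip install -e . (both of these are assumptions made for the demo, and "Ada" is a placeholder name). The point to notice is that __init__.py re-exports print_name, which is what makes mypkg.print_name() available.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption: a temporary folder stands in for my_package/, and editing
# sys.path stands in for `pip install -e .`.
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "package_name"
package_dir.mkdir()

# module1.py holds the function itself.
(package_dir / "module1.py").write_text(
    'def print_name():\n'
    '    print("Hello, Ada!")\n'
)

# __init__.py re-exports it so that `package_name.print_name` exists.
(package_dir / "__init__.py").write_text("from .module1 import print_name\n")

sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

import package_name as mypkg

mypkg.print_name()
```

Without the import line in __init__.py, the function would only be reachable as package_name.module1.print_name after importing that module explicitly.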
Tip: Use Google and ChatGPT to help you.
4.2.2. Full package structure¶
As serious package developers, you need to be organized. Here is the full package structure you should follow from now on:
my_package/
├── .gitignore # Git ignore file for unnecessary files
├── .readthedocs.yaml # Configuration for Read the Docs
├── README.md # Project overview and instructions
├── pyproject.toml # Project configuration, dependencies, and build settings
├── package_name/ # Source code directory
│ ├── __init__.py # Init file
│ ├── base.py # Base classes and functions
│ ├── version.py # Dynamic version handling for the package
│ ├── sub_module1/ # Sub module 1
│ │ ├── __init__.py # Init file
│ │ ├── module1.py # Sub module 1 main module
│ │ └── module1_functions.py # Sub module 1 functions
│ └── sub_module2/ # Sub module 2
│ ├── __init__.py # Init file
│ └── module2.py # Sub module 2 main module
├── dist/ # Distribution files
├── docs/ # Sphinx documentation directory
└── tests/ # Test suite directory
Note the two hidden files:
.gitignore to tell git which files to ignore
.readthedocs.yaml to tell Read the Docs how to build the documentation
Importantly, submodules have their own __init__.py file and are stored in their own folder.
Exercise: Create an account on Read the Docs, you will need it.
4.3. Working example¶
We now go through a working example of a package that deals with companies. It shows you the crucial steps of building a package following good practices.
The package is available here on GitHub.
4.3.1. README.md¶
README.md is a Markdown file that tells users what the package is about and how to install and use it.
# Company Package
<description of the package>
## Features
<description of the features>
## Installation
<description of the installation>
## Usage
<baseline example of usage>
## Documentation
Link to the [documentation page](https://your-readthedocs-url-here).
## Contributing
Contributions via pull requests are welcome!
## License
<description of the license>
4.3.2. Core module¶
The core module contains the base class and functions.
In our example, it is:
company_package/
└── company/
│ ├── __init__.py # Init file for the 'company' package
│ ├── base_company.py # Base module containing the main `Company` class
│ └── version.py # Dynamic version handling for the package
Look at the files here.
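Note that __init__.py must always have exactly this name. In this setup version.py is needed as well, and its name matches the write_to setting in pyproject.toml.
__init__.py is what turns your folder of code into a package.
version.py stores the version number; here it is written by setuptools_scm.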
Most importantly, the base_company.py file contains the core code.
It usually starts with some imports of external packages (the dependencies in the pyproject.toml file) that are needed for the package to work.
import yfinance as yf
import pandas as pd
Only import what you need in each file (i.e., if a package is only used in one file, only import it in that file).
4.3.3. pyproject.toml¶
pyproject.toml is the configuration file for the package.
Here is what it should look like:
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm"] # Build requirements
build-backend = "setuptools.build_meta"
[project]
name = "company" # name of the package must match the core folder name
dynamic = ["version"]
description = "A Python package for modeling companies across various sectors."
readme = "README.md"
requires-python = ">=3.9"
license = { file = "LICENSE" }
authors = [
{ name = "Your Name", email = "your.email@example.com" },
{ name = "Boris", email = "boris.bolliet@gmail.com" }
]
keywords = ["companies", "finance", "healthcare", "technology"]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Information Technology",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Topic :: Software Development :: Libraries"
]
# Runtime dependencies
dependencies = [
"numpy",
"pandas",
"yfinance",
]
[project.urls]
"Documentation" = "https://your-readthedocs-url-here"
"Source" = "https://github.com/yourusername/companies_package"
"Issues" = "https://github.com/yourusername/companies_package/issues"
[tool.setuptools_scm]
write_to = "company/version.py" # Where to write the dynamic version
[tool.setuptools.packages.find]
where = ["."]
For further details, see here.
The list of approved classifiers is available here. It tells you what to put in the classifiers field of the pyproject.toml file.
4.3.4. Package initialisation¶
We now set up dynamic versioning and the development workflow.
From inside the package folder, in a terminal, we start with:
git init
git add .
git commit -m "Initial commit"
git tag 0.0.0beta0
This step is very important for the versioning to work with setuptools_scm.
We can now install the package in development mode with:
pip install -e .
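Once the package is installed, the version derived from the git tag can be read back from the installed metadata using the standard library. The following is a minimal sketch: get_installed_version is a hypothetical helper name, and "company" is the distribution name taken from the pyproject.toml above.

```python
from importlib.metadata import PackageNotFoundError, version


def get_installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Prints the version if `pip install -e .` has been run, otherwise None.
print(get_installed_version("company"))
```

If this prints None, the package is not installed in the current environment.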
We can now import the package in a python session, or notebook:
[2]:
import company as cp
We can then try the package out, testing some of the methods of the base class.
First, we create a company instance:
[3]:
my_company = cp.Company(name="Nvidia", ticker="NVDA")
Then we can test the display method:
[4]:
my_company.display_info()
Company Name: Nvidia
Ticker Symbol is: NVDA
Or the availability of the stock data:
[5]:
my_company.get_yfinance_status()
[5]:
'Available on yfinance'
And if it is available, we can get its stock history:
[7]:
stock_history = my_company.get_stock_info(period="1mo")
stock_history.head()
[7]:
| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2024-09-26 00:00:00-04:00 | 126.800003 | 127.669998 | 121.800003 | 124.040001 | 302582900 | 0.0 | 0.0 |
| 2024-09-27 00:00:00-04:00 | 123.970001 | 124.029999 | 119.260002 | 121.400002 | 271009200 | 0.0 | 0.0 |
| 2024-09-30 00:00:00-04:00 | 118.309998 | 121.500000 | 118.150002 | 121.440002 | 226553700 | 0.0 | 0.0 |
| 2024-10-01 00:00:00-04:00 | 121.769997 | 122.440002 | 115.790001 | 117.000000 | 302094500 | 0.0 | 0.0 |
| 2024-10-02 00:00:00-04:00 | 116.440002 | 119.379997 | 115.139999 | 118.849998 | 221845900 | 0.0 | 0.0 |
4.3.5. Package development¶
With this good starting point, we can now start developing the package.
Let us do an example and implement a new class MedicalCompany that inherits from the base Company class.
We create a submodule medical, as a folder medical, with an __init__.py file and a medical.py file.
So, our tree structure now looks like this:
company_package/
├── README.md
├── company
│ ├── __init__.py
│ ├── base_company.py
│ ├── version.py
│ ├── medical
│ │ ├── __init__.py
│ │ └── medical.py
4.3.5.1. Relative imports¶
The core code of our package defines some class in the base_company.py file.
In the submodule, we define derived classes that inherit from the base class.
First, at the top of the medical.py file, we import the base class, and everything else we need:
import pandas as pd
from ..base_company import Company
Here, with .. we go up one level in the package hierarchy (from the medical subpackage to the company package) and import the Company class that is defined in the base_company.py file. This is an example of a relative import.
We also import pandas, which is needed in this submodule.
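To see the relative import resolve, the sketch below rebuilds a stripped-down version of this layout in a temporary folder and imports from it. The package name demo_company and the empty class bodies are assumptions made for the demo; only the folder nesting and the from ..base_company import line mirror the real package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the nesting: demo_company/base_company.py and
# demo_company/medical/medical.py (demo_company is a stand-in name).
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_company" / "medical"
sub_dir.mkdir(parents=True)

(root / "demo_company" / "__init__.py").write_text("")
(root / "demo_company" / "base_company.py").write_text(
    "class Company:\n"
    "    pass\n"
)
(sub_dir / "__init__.py").write_text("")
(sub_dir / "medical.py").write_text(
    "from ..base_company import Company\n"
    "\n"
    "class MedicalCompany(Company):\n"
    "    pass\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_company.base_company import Company
from demo_company.medical.medical import MedicalCompany

print(issubclass(MedicalCompany, Company))  # True
```

A single dot would refer to the medical subpackage itself, so from .base_company import Company would fail here because base_company.py lives one level up.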
4.3.5.2. Class constructor¶
Python classes generally have an initialisation method __init__ that sets up the object (also called an instance).
This method is called the initializer or, more loosely, the constructor.
It is a method, meaning a function defined inside a class that receives the instance as its first argument.
For instance, in our base_company.py file, we defined the Company class constructor as follows:
class Company:
    def __init__(self, name, ticker=None):
        """
        Initialize a Company instance.
        Parameters:
        - name (str): Name of the company.
        - ticker (str): Stock ticker symbol if the company is publicly traded.
        """
        self.name = name
        self.ticker = ticker
4.3.5.3. Parameters passing¶
In the example above, the self parameter is a reference to the instance of the class. It must be there.
The name and ticker parameters are used to set the attributes of the instance, as can be seen in the constructor body.
Since ticker is presented with an equal sign and a default value, it is called an optional parameter.
However, name is not presented in such a way, so it is a required parameter.
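The difference between required and optional parameters can be checked directly. The sketch below repeats the constructor from above; the company names are placeholders.

```python
class Company:
    def __init__(self, name, ticker=None):
        self.name = name
        self.ticker = ticker


private_firm = Company("Local Bakery")          # ticker falls back to None
listed_firm = Company("Nvidia", ticker="NVDA")  # ticker given explicitly

try:
    Company()  # name is required, so this call fails
except TypeError as error:
    missing_name_error = error
    print("TypeError:", error)
```

Omitting the optional ticker is fine, while omitting the required name raises a TypeError.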
It is common to use two additional parameters: *args and **kwargs.
*args is used to pass a variable number of positional arguments to a function or method.
**kwargs is used to pass a variable number of keyword arguments to a function or method.
What does this mean?
Let us see an example to see how this works and why it can be useful.
We add a method to the Company class that takes *args and **kwargs as arguments, whose purpose is to summarize activities of the company based on info provided in *args and **kwargs.
class Company:
    ...
    def summarize_activity(self, *args, **kwargs):
        """
        Summarizes company activities and additional information.
        Parameters:
        - *args: A list of activities related to the company.
        - **kwargs: Additional information, like location or date.
        """
        print(f"\nActivity Summary for {self.name}:")
        if args:
            print("Activities:")
            for activity in args:
                print(f" - {activity}")
        if kwargs:
            print("Additional Information:")
            for key, value in kwargs.items():
                print(f" - {key.capitalize()}: {value}")
Let us see it at work:
[1]:
# Import the Company class from the company package
import company as cp
# Creating a Company instance
company = cp.Company(name="PharmaCorp")
# Example 1: Using *args to pass activities
company.summarize_activity("Researching new drugs", "Launching a public health campaign")
Company package version: 0.0.0b1.dev2+g7e4569a.d20241027
Activity Summary for PharmaCorp:
Activities:
- Researching new drugs
- Launching a public health campaign
In this example, we pass two activities as positional string arguments.
They are collected in the tuple args and printed in the method body.
Note that other than inside the method, no other part of the code knows about them. In this sense, these are local variables.
Let us do a second example with both *args and **kwargs.
[5]:
# Example 2: Using both *args and **kwargs to provide activities and additional information
company.summarize_activity(
    "Researching new drugs", "Launching a public health campaign",
    location="New York", date="2024-10-27"
)
Activity Summary for PharmaCorp:
Activities:
- Researching new drugs
- Launching a public health campaign
Additional Information:
- Location: New York
- Date: 2024-10-27
Now, we pass two additional pieces of information as keyword arguments, via **kwargs.
These are stored in kwargs as a dictionary and printed in the method body. They are also local variables.
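The container types can be inspected with a tiny standalone function. In this sketch, describe_arguments is a hypothetical helper that simply returns what it received; the last lines show the reverse direction, where an existing list and dictionary are unpacked into a call with * and **.

```python
def describe_arguments(*args, **kwargs):
    """Return the collected arguments so their types can be inspected."""
    return args, kwargs


positional, keyword = describe_arguments(
    "Researching new drugs", location="New York", date="2024-10-27"
)
print(type(positional).__name__, positional)
print(type(keyword).__name__, keyword)

# The reverse direction: unpack existing containers into a call.
activities = ["Researching new drugs", "Launching a public health campaign"]
details = {"location": "New York"}
positional, keyword = describe_arguments(*activities, **details)
```

Positional arguments always arrive as a tuple and keyword arguments as a dictionary, regardless of how the caller supplied them.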
If I wanted to access these variables outside the method, I would need to store them as attributes of the instance.
For instance, this can be done by adding:
# Initialize activities if it hasn't been set yet
if not hasattr(self, 'activities'):
    self.activities = []
# Store activities in the instance
self.activities.extend(args)
# Set each key-value pair in kwargs as an attribute
for key, value in kwargs.items():
    setattr(self, key, value) # Dynamically create an attribute
to the summarize_activity method.
Of course, those attributes would only be set for this specific instance of the Company class and after the method has been called.
We can now do:
[6]:
# Import the Company class from the company package
import company as cp
# Creating a Company instance
company = cp.Company(name="PharmaCorp")
# Passing activities and additional information in one call
company.summarize_activity("Researching new drugs", "Launching a public health campaign",
                           location="New York", date="2024-10-27")
# Accessing dynamically set attributes to understand these are stored
print("\nDynamically set attributes:")
print("Activities:", company.activities) # Output: list of activities stored in the instance
print("Location:", company.location) # Output: New York
print("Date:", company.date) # Output: 2024-10-27
Activity Summary for PharmaCorp:
Activities:
- Researching new drugs
- Launching a public health campaign
Additional Information:
- Location: New York
- Date: 2024-10-27
Dynamically set attributes:
Activities: ['Researching new drugs', 'Launching a public health campaign']
Location: New York
Date: 2024-10-27
Switching order of the arguments:
[7]:
# Import the Company class from the company package
import company as cp
# Creating a Company instance
company = cp.Company(name="PharmaCorp")
# Keyword arguments placed before positional ones (this is invalid)
company.summarize_activity(location="New York", date="2024-10-27",
                           "Researching new drugs", "Launching a public health campaign")
# Accessing dynamically set attributes to understand these are stored
print("\nDynamically set attributes:")
print("Activities:", company.activities) # Output: list of activities stored in the instance
print("Location:", company.location) # Output: New York
print("Date:", company.date) # Output: 2024-10-27
Cell In [7], line 9
"Researching new drugs", "Launching a public health campaign")
^
SyntaxError: positional argument follows keyword argument
It does not work. The keyword arguments must come after the positional arguments.
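This error is raised when Python parses the code, before any line of the cell is executed. A small sketch makes this visible by compiling such a call without running it; summarize is a placeholder name that does not need to exist, because nothing is evaluated.

```python
call_source = 'summarize(location="New York", "Researching new drugs")'

try:
    # compile() only parses the text; it never calls the function.
    compile(call_source, "<example>", "eval")
    compiled_ok = True
except SyntaxError as error:
    compiled_ok = False
    print("SyntaxError:", error.msg)
```

Because the problem is in the syntax of the call itself, it cannot be caught by a try/except placed inside the same cell around the call.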
Now providing only keyword arguments:
[8]:
# Import the Company class from the company package
import company as cp
# Creating a Company instance
company = cp.Company(name="PharmaCorp")
# Passing only keyword arguments
company.summarize_activity(location="New York", date="2024-10-27")
# Accessing dynamically set attributes to understand these are stored
print("\nDynamically set attributes:")
print("Activities:", company.activities) # Output: list of activities stored in the instance
print("Location:", company.location) # Output: New York
print("Date:", company.date) # Output: 2024-10-27
Company package version: 0.0.0b1.dev8+g5c0d18a.d20241030
Activity Summary for PharmaCorp:
Additional Information:
- Location: New York
- Date: 2024-10-27
Dynamically set attributes:
Activities: []
Location: New York
Date: 2024-10-27
It works.
4.3.5.4. Class inheritance¶
Still in the medical.py file, we define the MedicalCompany child (or derived) class that inherits from the parent (or base) Company class:
class MedicalCompany(Company):
    ...
If you want the child class to be exactly the same as the base class, you can use pass:
class MedicalCompany(Company):
    pass
With this, all methods of the base class are inherited by the child class.
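A quick standalone check of what this inheritance means is sketched below. The Company class here is a simplified stand-in, and the body of display_info is an assumption based on the output shown earlier.

```python
class Company:
    def __init__(self, name, ticker=None):
        self.name = name
        self.ticker = ticker

    def display_info(self):
        # Stand-in for the real method; assumption: it prints these two lines.
        print(f"Company Name: {self.name}")
        print(f"Ticker Symbol is: {self.ticker}")


class MedicalCompany(Company):
    pass


med_comp = MedicalCompany(name="HealthCare Inc.", ticker="HCI")
print(isinstance(med_comp, Company))                         # True
print(MedicalCompany.display_info is Company.display_info)   # True
```

The second check shows that the child class does not get a copy of the method: it reuses the very same function defined on the base class.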
Exercise: Create the MedicalCompany class which inherits from the Company class and does nothing else.
For example, we get the following behaviour:
[1]:
import company as cp
med_comp = cp.MedicalCompany(name="HealthCare Inc.",ticker="HCI")
med_comp.display_info()
Company package version: 0.0.0b1.dev2+g7e4569a.d20241027
Company Name: HealthCare Inc.
Ticker Symbol is: HCI
In general, the child class has additional methods and attributes.
In this case, we don’t use pass. Instead, we override methods of the base class, use super() to call the base-class versions, and add new attributes and methods.
For example,
class MedicalCompany(Company):
    def __init__(self, name, specialty, drug_manufacturer=False, ticker=None):
        super().__init__(name, ticker)
        self.specialty = specialty
        self.drug_manufacturer = drug_manufacturer

    def display_info(self):
        """Displays basic information about the medical company."""
        super().display_info()
        print(f"Medical Specialty: {self.specialty}")
        print(f"Drug Manufacturer: {'Yes' if self.drug_manufacturer else 'No'}")
We get the following behaviour:
[2]:
import company as cp
med_comp = cp.MedicalCompany(name="HealthCare Inc.", specialty="Oncology", drug_manufacturer=True, ticker="HCI")
med_comp.display_info()
Company package version: 0.0.0b1.dev2+g7e4569a.d20241027
Company Name: HealthCare Inc.
Ticker Symbol is: HCI
Medical Specialty: Oncology
Drug Manufacturer: Yes
See further examples in medical.py.
4.3.5.5. Package data¶
It can be useful (sometimes necessary) to store data in your package.
We create a data folder in the core package folder, and put the data there.
Let us say we have a dataset with drug approval data drug_data.csv.
We put this file in company_package/company/data/drug_data.csv.
And we tell our configuration pyproject.toml about it:
[tool.setuptools.package-data]
"company" = ["data/*"]
This tells setuptools to include the data in the package distribution, and that the data is in the company folder.
See our pyproject.toml file for details.
Let us create an example and make up a dataset.
[4]:
import pandas as pd
import os
# Create the sample drug data as specified
data = {
    "company_name": ["PharmaCorp", "PharmaCorp", "HealthMed", "HealthMed", "PharmaCorp", "PharmaCorp", "BioLife", "BioLife"],
    "drug_name": ["DrugA", "DrugB", "DrugC", "DrugD", "DrugE", "DrugF", "DrugG", "DrugH"],
    "approval_attempts": [3, 2, 1, 4, 1, 5, 3, 2],
    "approval_status": ["approved", "approved", "approved", "approved", "approved", "rejected", "rejected", "approved"]
}
# Convert to a DataFrame
drug_data_df = pd.DataFrame(data)
# Save the DataFrame to a CSV file
file_path = f"{os.path.expanduser('~')}/drug_data.csv"
drug_data_df.to_csv(file_path, index=False)
And move the file to the data folder in the package.
Our tree structure now looks like this:
company_package/
├── README.md
├── company
│ ├── __init__.py
│ ├── base_company.py
│ ├── version.py
│ ├── medical
│ │ ├── __init__.py
│ │ └── medical.py
│ └── data
│ └── drug_data.csv
...
To see how this is used, look at the drug_approval_summary method in the medical.py file.
Note that we use the files function from the importlib.resources package to get the path to the data file automatically. At the top of the medical.py file, we add:
from importlib.resources import files
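The actual drug_approval_summary method is in medical.py and is not reproduced here. As a rough sketch of the lookup step only, the snippet below builds a throwaway package with a data folder and reads the CSV through files. The package name demo_data_pkg, the helper read_package_csv and the use of the csv module instead of pandas are all assumptions made to keep the example self-contained.

```python
import csv
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package with a data folder (demo_data_pkg is a stand-in).
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_data_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_data_pkg" / "__init__.py").write_text("")
(data_dir / "drug_data.csv").write_text(
    "company_name,drug_name,approval_attempts,approval_status\n"
    "PharmaCorp,DrugA,3,approved\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def read_package_csv(package, relative_path):
    """Locate a data file inside an importable package and read its rows."""
    resource = files(package).joinpath(relative_path)
    with resource.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


rows = read_package_csv("demo_data_pkg", "data/drug_data.csv")
print(rows[0]["drug_name"], rows[0]["approval_attempts"])
```

The advantage over a hard-coded path is that the file is found relative to wherever the package is installed, so the same code works on any machine.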
Let us try.
[1]:
import company as cp
med_comp = cp.MedicalCompany(name="PharmaCorp", specialty="Oncology", drug_manufacturer=True)
med_comp.drug_approval_summary()
Company package version: 0.0.0b1.dev2+g7e4569a.d20241027
Drug Approval Summary for PharmaCorp:
- DrugA: 2 failed attempt(s) before approval
- DrugB: 1 failed attempt(s) before approval
- DrugE: 0 failed attempt(s) before approval
- DrugF: 4 failed attempt(s) before approval
4.3.6. Turning methods into commands¶
Once you have reached a certain level of maturity in your package, you might want to turn some methods into commands that can be run from the command line.
To do so, create a cli.py file in the core package folder, i.e., the tree structure is now:
company_package/
├── README.md
├── company
│ ├── __init__.py
│ ├── base_company.py
│ ├── version.py
│ ├── medical
│ │ ├── __init__.py
│ │ └── medical.py
│ └── cli.py
...
Let us implement two commands. One that simply uses the display_info method of the Company class (call it display_info), and one that actually performs some calculations (call it get_stock_price_difference).
See the cli.py for their implementation.
Then declare the console command, so that it is created when the package is installed, and point it to the main function in the cli.py file.
Do it in the pyproject.toml file by adding:
[project.scripts]
company = "company.cli:main"
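The entry company.cli:main means "call the function main in the module company.cli". The real cli.py is in the repository; the sketch below only illustrates the general shape of such a module using argparse subcommands. The --name option and the body of the display_info branch are hypothetical, and only one of the two commands is shown.

```python
import argparse


def build_parser():
    """Create a parser with one subcommand per CLI command."""
    parser = argparse.ArgumentParser(prog="company", description="Company CLI Tool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    display = subparsers.add_parser("display_info", help="Display company information")
    display.add_argument("--ticker", required=True, help="Stock ticker symbol (e.g., AAPL).")
    # Hypothetical extra option, not part of the original example.
    display.add_argument("--name", default="N/A", help="Company name.")
    return parser


def main(argv=None):
    """Entry point; argv=None makes argparse read the real command line."""
    args = build_parser().parse_args(argv)
    if args.command == "display_info":
        print(f"Company Name: {args.name}")
        print(f"Ticker Symbol is: {args.ticker}")
    return 0


# Simulate `company display_info --ticker AAPL` without installing anything.
exit_code = main(["display_info", "--ticker", "AAPL"])
```

Accepting an optional argv list is a design choice that makes the command easy to test from Python, while the installed console command calls main() with no arguments.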
Now, in bash, you can run, for instance:
company display_info --ticker=AAPL
Let’s do it in the notebook.
[2]:
!company display_info --ticker AAPL
Company package version: 0.0.0b1.dev3+g73e64cc.d20241027
Company Name: N/A
Ticker Symbol is: AAPL
Let us now use the second command.
[3]:
!company get_stock_price_difference --ticker NVDA --interval 5mo --stop_date 2024-09-25
Company package version: 0.0.0b1.dev3+g73e64cc.d20241027
Stock price difference for NVDA over 5mo ending 2024-09-25: 9.628097534179688
You can ask for help on the commands by running:
[4]:
!company --help
Company package version: 0.0.0b1.dev3+g73e64cc.d20241027
usage: company [-h] {display_info,get_stock_price_difference} ...
Company CLI Tool
positional arguments:
{display_info,get_stock_price_difference}
display_info Display company information
get_stock_price_difference
Get stock price difference
optional arguments:
-h, --help show this help message and exit
And further details on a specific command by running:
[5]:
!company get_stock_price_difference --help
Company package version: 0.0.0b1.dev3+g73e64cc.d20241027
usage: company get_stock_price_difference [-h] --ticker TICKER
[--interval INTERVAL] --stop_date
STOP_DATE
optional arguments:
-h, --help show this help message and exit
--ticker TICKER Stock ticker symbol (e.g., AAPL).
--interval INTERVAL Time period (e.g., '1y', '6mo', '2y').
--stop_date STOP_DATE
End date in YYYY-MM-DD format.
Exercise: Create your own command that does something interesting.
Where are the commands stored?
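One way to investigate this question from Python is sketched below. It assumes a standard pip installation into the active environment, in which case console commands are typically written to the environment's scripts directory (bin on Unix-like systems, Scripts on Windows); other installation modes may place them elsewhere.

```python
import shutil
import sysconfig

# Directory where this Python environment keeps executable scripts.
scripts_dir = sysconfig.get_path("scripts")
print("Scripts directory:", scripts_dir)

# Full path of the `company` command if it is on PATH, otherwise None.
command_path = shutil.which("company")
print("company command:", command_path)
```

If the second line prints None, the package is either not installed or the scripts directory is not on your PATH.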
4.4. Naming conventions¶
The style guide for Python code is called PEP 8.
PEP means Python Enhancement Proposal. There are many PEPs; each has a number, and they are all listed on this page.
Here’s a guide to naming conventions in Python following the PEP 8 style guide and best practices:
4.4.1. Package and File Names¶
Convention: Use lowercase letters. You can use underscores (_) when necessary.
Reason: Keeps names concise and readable, and avoids naming conflicts.
Examples:
Package: my_package, data_tools
File: process_data.py, utils.py
4.4.2. Modules¶
Convention: Same as file names (lowercase, underscores if needed).
Reason: Module names are usually file names.
Examples:
data_analysis, file_handler
4.4.3. Classes¶
Convention: Use PascalCase (aka CapitalizedWords).
Reason: Easily distinguish classes from variables or functions/methods.
Examples:
DataProcessor, MyCustomException
4.4.4. Methods¶
Convention: Use snake_case (all lowercase with underscores between words).
Reason: Matches function naming convention in Python.
Examples:
process_data, get_user_input
4.4.5. Variables¶
Convention: Use snake_case names.
Reason: Matches Python’s style for variables.
Examples:
user_name, max_value
4.4.6. Additional Notes¶
Constants should use UPPERCASE_WITH_UNDERSCORES.
Example: DEFAULT_TIMEOUT, MAX_RETRIES
Private or “internal use only” variables/methods should begin with a single underscore.
Example: _private_method, _internal_cache
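The conventions from this section can be seen together in one short sketch. The class and its behaviour are made up for illustration; only the naming patterns matter.

```python
MAX_RETRIES = 3  # constant: UPPERCASE_WITH_UNDERSCORES


class DataProcessor:  # class: PascalCase
    def __init__(self, user_name):
        self.user_name = user_name  # variable/attribute: snake_case
        self._internal_cache = {}   # internal use only: leading underscore

    def process_data(self, raw_value):  # method: snake_case
        """Return the cleaned value, remembering it in the internal cache."""
        cleaned_value = raw_value.strip().lower()
        self._internal_cache[raw_value] = cleaned_value
        return cleaned_value


processor = DataProcessor(user_name="ada")
print(processor.process_data("  Hello  "))  # hello
```

In a package, this class would live in a lowercase file such as data_processor.py.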
Here is an example of what a private method and variable look like:
[4]:
class Calculator:
    def __init__(self):
        self._factor = 2 # Private attribute for internal use

    def multiply(self, number):
        """Public method to multiply a number by the private factor."""
        return self._private_multiply(number, self._factor)

    def _private_multiply(self, num1, num2):
        """Private method to perform multiplication."""
        return num1 * num2

# Usage example
calc = Calculator()
# Using the public method (preferred)
result = calc.multiply(5)
print(result) # Output: 10
# Accessing the private method directly (discouraged but possible)
direct_result = calc._private_multiply(5, 3)
print(direct_result) # Output: 15
10
15
In this example, _private_multiply is a private method because it starts with a single underscore; by convention, it should not be used outside the class. Similarly, _factor is a private attribute because it starts with a single underscore, and it should not be accessed from outside the class. Python does not enforce this, as the direct call above shows.
What is the point of this?
To hide what is under the kitchen sink, i.e., the internal details that your users do not need to know about.