Creating a Python package


In the previous step of this case study, we explored our data in an ad hoc fashion using Jupyter notebooks. Now it is time to get to work on the code that will be used to prepare our data and train our model. Before we start coding, we will set up a Python package so that we can easily import our code across different environments. 

This article was created as part of a larger case study on developing data science models. That being said, it is also a great standalone resource if you are looking for a gentle introduction to creating a Python package. 

A schematic that shows the basic structure of a directory for making a Python package.

Why create a Python package

Why should you organize your code into a Python package rather than just maintaining a collection of scripts? Here are some advantages to using a Python package. 
 
  • Import code in different environments. The biggest advantage of organizing your code into a Python package is that you can install it once and then import it from anywhere in that environment, whether that is another script, a notebook, a different machine, or a cluster of machines.
  • Test code with ease. It is much easier to write tests for your code once it is a Python package, partly because your test files can import the code directly.
  • Track different versions of your code. A Python package carries a version number, which makes it much easier to track different versions of your code and switch back and forth between them.

How to create a Python package

What do you need to do to create a Python package? Here are the minimum changes that need to be made to your code so that it can be imported as a Python package and installed using pip. 

  • Add __init__.py files. The __init__.py file is one of the most fundamental building blocks of a Python package. For a directory to be imported as a regular Python package, it must contain an __init__.py file, and so must every subdirectory within the main package directory. In many cases, your __init__.py files will be empty and that is fine. If you want to learn more about __init__.py files, check out our post on Python init files.
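To make the role of the __init__.py file concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and imports a module from it. The names demo_pkg, greet, and hello are hypothetical and exist only for this illustration; they are not part of the case study.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory.
# "demo_pkg", "greet", and "hello" are hypothetical names used only here.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").touch()  # an empty __init__.py is enough
    (pkg_dir / "greet.py").write_text("def hello():\n    return 'hello'\n")

    sys.path.insert(0, tmp)  # make the parent directory importable
    importlib.invalidate_caches()
    try:
        greet = importlib.import_module("demo_pkg.greet")
        result = greet.hello()
    finally:
        sys.path.remove(tmp)

print(result)
```

Editing sys.path by hand is only a shortcut for this demonstration. Installing the package with pip, as we do later in this article, removes the need for it.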

Other common files in Python packages

  • setup.cfg. A setup.cfg file is a configuration file that contains static information about your package that does not change often, such as the name and author of the package. This information is used by setuptools, which is the most widely used tool for packaging and distributing Python code. Technically you can build a Python package without a setup.cfg file and include all of the relevant information in a setup.py file, but it is best practice to keep static information in a setup.cfg file.
  • setup.py. A setup.py file is similar to a setup.cfg file, but rather than being a static configuration file, it is a Python file that allows you to dynamically read in information and modify it. The setup.py file is also used by setuptools and can be used to pass the same information as the setup.cfg file. Technically, a directory does not need to include a setup.py file in order to be considered a Python package. However, older versions of pip and setuptools require a setup.py file in order to install an editable version of your package, which is what we want to do, so we include one.
  • README. A readme file is a file that contains human-readable documentation about the Python package. If you are following along with our case study, you should already be familiar with the concept of a README file because we created one when we initialized our GitHub repository in a previous step
  • LICENSE. A LICENSE is a file that tells other developers how they are allowed to use your code. It generally provides information such as whether the code is allowed to be modified or what use cases it is allowed to be used for. If you are following along with our case study, you should be familiar with the concept of a LICENSE because we already initialized our GitHub repository with one in a previous step
  • pyproject.toml. A pyproject.toml file is used to specify the resources that are needed in order to build the package. For example, if you need to specify a specific version of setuptools that should be used to build your Python package, then you can specify that in your pyproject.toml file. If you want to read more about pyproject.toml files then we recommend checking out this resource

Steps for creating a Python package

For this step of our case study, we will create a minimalistic Python package and import that package into another Python script. You should work through all of these steps if you are following along with our case study.

In the previous few steps of our case study, we walked you through the process of creating a new branch in your git repository and merging your changes into your main branch. From this point forward, we will assume that you know how to complete these steps on your own. We will call the branch for this step of our case study python-package.  

Before we get started, here is an example of what your project directory should look like when we are done with these steps. You can refer back to this diagram if you get confused about where to place any of the files you create. 

A schematic of what your project directory should look like after adding your python packaging files.

1. Create a package directory with a file

As a first step, create a directory to hold the code for your Python packages. Create a directory called pkgs in the main project directory, at the same level as your LICENSE and README files.

Within the pkgs directory, create a directory called bank_deposit_classifier to hold all the code related to training your classifier to predict whether someone will submit a bank deposit. Your Python package will be imported using the name bank_deposit_classifier.

Within this directory, you should add a file to hold the code that will be imported when you import your package. Create a file called sample.py and add a simple function to it so that you can test out whether your package is importing correctly later on. We recommend adding this code to your sample.py file.

def print_hi():
    # Print a short message so we can confirm the package imports correctly.
    print('hi')

2. Add init files

After you create your package directory, you should add an __init__.py file to every directory that contains Python code you want to be able to import. In this case, only one directory contains Python code: the bank_deposit_classifier directory. There are no subdirectories within that directory, so we only need to add one __init__.py file.
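As your package grows, it is easy to forget an __init__.py file in a new subdirectory. The following is a minimal sketch of a check for that, run against a temporary copy of the layout from steps 1 and 2. The helper name find_missing_init is our own invention for this illustration and is not part of any library.

```python
import tempfile
from pathlib import Path


def find_missing_init(package_root):
    """Return directories under package_root that hold .py files but no __init__.py."""
    root = Path(package_root)
    missing = []
    # Check the root itself and every directory nested below it.
    for directory in [root, *(p for p in root.rglob("*") if p.is_dir())]:
        has_python = any(f.suffix == ".py" for f in directory.iterdir() if f.is_file())
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory)
    return sorted(missing)


# Demonstration on a temporary copy of the layout from steps 1 and 2.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkgs" / "bank_deposit_classifier"
    pkg.mkdir(parents=True)
    (pkg / "sample.py").write_text("def print_hi():\n    print('hi')\n")

    before = [p.name for p in find_missing_init(Path(tmp) / "pkgs")]
    (pkg / "__init__.py").touch()  # step 2: add the init file
    after = [p.name for p in find_missing_init(Path(tmp) / "pkgs")]

print(before)  # ['bank_deposit_classifier']
print(after)   # []
```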

3. Add a setup.cfg

After you add your __init__.py files, it is time to add a basic setup.cfg file. This file should be added in the main project directory where your README and LICENSE files are found rather than in the pkgs directory that contains your source code. This file should contain static information that does not change often such as the name of your package and the author. 

Here is an example of a fairly minimal setup.cfg. This is more than enough information for our example package. If you want to read more about information that can be stored in a setup.cfg file, see this article on packaging and distributing Python code.

[metadata]
name = bank-deposit-classifier
version = 0.0.1
author = Crunching the data
author_email = crunchingthedata@gmail.com
description = Bank deposit classifier.
url = https://github.com/crunchingthedata/case-study-one

[options]
python_requires = >=3.9
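A setup.cfg file uses the INI format, so you can sanity-check it with the configparser module from the Python standard library. Here is a minimal sketch that parses a shortened copy of the configuration above from a string. In your own project you would more likely call parser.read("setup.cfg") on the real file.

```python
import configparser

# A shortened copy of the configuration above, held in a string for this sketch.
SETUP_CFG = """
[metadata]
name = bank-deposit-classifier
version = 0.0.1

[options]
python_requires = >=3.9
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Each [section] behaves like a dictionary of string values.
name = parser["metadata"]["name"]
version = parser.get("metadata", "version")
python_requires = parser.get("options", "python_requires")

print(name, version, python_requires)
```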

4. Add a setup.py

After you add a setup.cfg file, you should add a setup.py file. In this file you should import the setup function from the setuptools package and pass it information about your package. The same information that is stored in the setup.cfg file can also be passed to the setup function, but it is best practice to keep static information that does not change often in the setup.cfg file.

The setup.py file is useful for cases where you want to read in information from other files and pass it to the setup function. For example, we will open our README file and pass its contents as the long description. We will also use this file to specify where the source code for our package is in relation to the setup.py file. This information can also be specified in the setup.cfg file, but we prefer to specify it in our setup.py file as we find it easier to read.

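import setuptools

# Read the README so its contents can be used as the long description.
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setuptools.setup(
    # The package source code lives in the pkgs directory next to this file.
    package_dir={"": "pkgs"},
    packages=setuptools.find_packages(where="pkgs"),
    long_description=long_description,
)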

If there is no dynamic information that needs to be passed, you can call the setup function with no arguments. If you want to read more about information that can be stored in a setup.py file, see this article on packaging and distributing Python code.

5. Add a pyproject.toml

Finally, you should add a pyproject.toml file to the same directory as your setup.py and setup.cfg files. This file is used to specify the resources that are used to build your package, such as the version of setuptools that you want to use. Here is an example of a minimal pyproject.toml file. You should use this configuration unless you have a reason to add different specifications.

[build-system]
requires = [
    "setuptools>=42",
    "wheel"
]
build-backend = "setuptools.build_meta"

6. Install the package

Now you have created all the files necessary to build your Python package. We will use pip to install an editable version of the package, which points to your source code rather than a copy of it. Changes you make to the source code take effect the next time you import the package, so you do not have to reinstall the package every time you update it during development.

Navigate to the main package directory that contains your setup.py and setup.cfg files in your terminal and run the following command. The -e option that is used tells pip that you want to install an editable version of the package. 

pip install -e .
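To confirm that the install worked, you can ask Python which version of the package it sees. This is a minimal sketch using importlib.metadata from the standard library. The helper name installed_version is our own, and the distribution name assumes the name value from the setup.cfg file in step 3.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "bank-deposit-classifier" is the name set in setup.cfg in this case study.
version = installed_version("bank-deposit-classifier")
print(version if version else "package is not installed in this environment")
```

If the message says the package is not installed, check that you ran pip from the directory that contains setup.py and that you are using the same Python environment.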

7. Import your package

Next we will import our package in another Python script to make sure that it is working correctly. First, we will create another folder called scripts in our main project directory to hold our Python scripts. This directory should be in the same location as your notebooks and pkgs directories.

Create a Python script called import_package.py in your scripts folder and make sure that you can import your Python package. Import the print_hi function that you added to the sample file in your package and make sure that it works using the following code.

import bank_deposit_classifier.sample as s
s.print_hi()
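Once the import works, the same import is what makes the package easy to test. Here is a minimal sketch of an automated check on what print_hi prints. To keep the snippet self-contained, it defines a stand-in copy of print_hi; in your project you would import it from bank_deposit_classifier.sample instead. The helper name captured_output is our own.

```python
import contextlib
import io


def print_hi():
    # Stand-in copy of the function from sample.py. In the real project, use:
    # from bank_deposit_classifier.sample import print_hi
    print("hi")


def captured_output(func):
    """Call func and return everything it printed to standard output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


output = captured_output(print_hi)
assert output == "hi\n"  # print adds a trailing newline
```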

Other resources

If you have any other questions about creating Python packages, then you should check out this website on Python packaging.

