
Unit testing with pytest

In this article we discuss unit testing for data science projects. Specifically, we cover using the pytest package in Python to implement unit tests: what unit tests are, how they are useful, and how to set them up with pytest. 

This article was created as part of a larger case study on developing data science models. That being said, it is also a great standalone resource if you are looking for a gentle introduction to unit testing in Python. 

An infographic on the arrange-act-assert pattern for unit testing. Unit tests arrange data and parameters needed to test code, act by applying the code to the data, and assert that the result of applying the code to the data is as expected.

What is a unit test?

A unit test is a simple, self-contained test that is applied to a specific piece of code to ensure that the code behaves as expected. One easy way to think of what a unit test does is to look towards the arrange-act-assert pattern. A unit test arranges the environment and sets up the data that is necessary to test a piece of code, acts by applying the code to the data, then asserts that the output from applying the code to the data is as expected. 

Generally, an individual unit test tests that a piece of code behaves correctly in a specific situation. This means that you may have multiple unit tests that test the same underlying piece of code. However, the converse is not true. Each unit test should only test one piece of code. 

Why are unit tests useful for data science?

Why should you use unit tests for your data science projects? Here are some of the main reasons. 

  • Find bugs faster. The most obvious reason to use unit tests is that they help you find bugs in your code faster. This increases the chances that you catch bugs before they get released into the wild and cause issues. Even if you are building backend models that serve internal users rather than models that are incorporated into a user-facing product, reducing the number of errors in your analyses will increase your clients' trust in your data science work. As an added bonus, having unit tests also makes it easier to zero in on exactly where your bugs are coming from, saving you valuable time and effort.
  • Ensure edge cases are handled correctly. Unit tests are particularly useful if you are writing code that has a lot of edge cases where your code might have undesirable behavior. These types of edge cases are common in many data science projects. You can easily write up a simple test, or a suite of tests that run through your edge cases and ensure that your code performs correctly. 
  • Enable developing code using toy data. Unit tests make it easier to develop your code using toy data. Rather than using a collection of ad-hoc scripts that may or may not be tracked to test your code, you have defined test files with toy examples ready so that the data you used is still ready and available in a logical place next time you need to make changes to your package. 
  • Iterate quickly with confidence. While unit tests are helpful to have around when you are initially developing your code, they are potentially even more useful when you are making changes to your code. Unit tests give you confidence that the changes that you are making to your code are not breaking anything. This enables you to iterate more quickly and take big swings with confidence. 
  • Help others get up to speed faster. It is often said that good tests serve as documentation, and that is for good reason! Oftentimes the easiest way to get up to speed with a new codebase is to look at the tests to see exactly what behavior is expected from each function on a set of toy datasets. 

Unit testing in Python

There are multiple packages available in Python that enable easy testing of Python code. Two of the most popular testing packages are pytest and unittest. Unittest is a built-in package that ships with Python, while pytest is a third-party package that offers a more concise testing style and can also run unittest-based tests.

For the purpose of this case study, we will be using pytest to build out our unit tests. Pytest enables you to write smaller, more compact tests with less boilerplate code. This makes it great for beginners who have not written many unit tests before. Pytest also makes it easy to filter to subsets of tests that you want to run and re-run the same test with different sets of parameters. 
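To illustrate the difference in boilerplate, here is a sketch of the same check written both ways, using a hypothetical add_one function (not part of the case study):

```python
import unittest

# A tiny (hypothetical) function to test
def add_one(n):
    return n + 1

# unittest style: tests live in a class and use special assert methods
class TestAddOne(unittest.TestCase):
    def test_add_one(self):
        self.assertEqual(add_one(2), 3)

# pytest style: a plain function with a plain assert statement
def test_add_one():
    assert add_one(2) == 3
```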

Pytest basics

Setting up tests with pytest

Here are some of the basic concepts you need to understand before you create unit tests with pytest. This includes basic pytest conventions as well as information about how to run tests. We will go into more detail with implementation examples in the next section, but first we will go over the high-level concepts and patterns you need to know.  

  • Give test files names that start with test_ or end with _test. When you run pytest in a given directory, pytest looks for any files that have a test_ prefix or a _test suffix in the name and runs all of the tests in those files. If you run your tests from a directory with subdirectories, then pytest will also look for files within those subdirectories. We generally create a directory called tests in the main directory of our project that holds all of our test files.
  • Give test functions names that start with test. Just as pytest will scan through directories looking for files that start or end with test, pytest will also scan through those files looking for functions that begin with test to run. Any unit test function you write should begin with the test prefix. 
  • Use assert statements to express test conditions. Within your test functions, you should use assert statements to denote the expected results of your code. For example if you are testing a function called add_one that is supposed to add one to a number, you should use an assert statement to ensure that the function actually adds one to a number. Your assert statement might look something like this: assert add_one(2) == 3.
  • Run tests from the terminal. After you write and save all of your test files, you can run your tests from the terminal. We will provide an example in the next section, and the pytest documentation is a great resource if you are looking to learn more about running tests.

Creating toy data for unit tests

For most unit tests you write in pytest, you will need to create some type of toy data that you can feed to the function you are testing. There are a few different ways you can do this.

  • Define data for one test in the test function. The simplest way is to just define your data inside of your test function. This is a great option if you have a piece of data that you only need to use for one test. However, data that is defined inside a test function will not be available for other tests. If you need to use a piece of data for multiple tests, then you should not redefine it inside each test. Instead, you should define it in a location where it will be accessible to multiple tests.  
  • Define data for multiple tests in a fixture. If you want to define a piece of data that can be used by multiple tests, then you can use what is called a fixture. The simplest way to do this is to just define a fixture in the file where the tests that need to use the fixture live, however there are other solutions if you want to define a fixture that can be used across multiple tests files. We will show you an example of how to create a pytest fixture later in this article, but in general you just need to define a function that returns the data that you want to use. The data you return can be defined within the function, or read in from an external path inside of your function. 
  • Define data for multiple test files in a conftest file. So we have covered what you should do if you want to define a piece of data that is available to multiple tests in a pytest file, but what if you want to define a piece of data that is available to multiple pytest files? This is when you would define a conftest.py file. You can think of a conftest file as just a file that will get run before your tests are run. Anything that is available in your conftest file will be available to your tests when they run. This includes fixtures that are defined in the conftest file. 

Upgrading your pytest tests

Here are a few other ways that you can simplify your testing workflow using pytest functionality. These are not concepts that you absolutely need to understand in order to be able to run unit tests with pytest. You can think of them more as the cherry on top. 

  • Parameterize tests to re-run the same test with different parameters. Pytest provides functionality that allows you to parameterize tests, which essentially allows you to re-run a test multiple times with different parameters. This is useful as it may allow you to test multiple edge cases with one piece of code rather than having to re-write the same test multiple times to test out different sets of parameters. We will provide an example of a parameterized test later in this article.
  • Use pytest mark to filter to tests you want to run. We mentioned before that pytest makes it easier to filter to the specific test that you want to run. Part of what allows this is the mark functionality that allows you to mark individual tests with any label you want. When you run your tests, you can specify that you only want to run tests that do or do not contain a label. 

Creating your first unit test with pytest

If you are following along with our case study, now is the time to break out your code and create a new branch in your git repository. Even if you are not following along with our case study, most of this code is self-contained and can be implemented in standalone files. 

For the sake of this part of the case study, we will be writing a function to upsample the minority class in our data then writing tests for that function. Normally we would use a sampling package such as imblearn to do this, but for now we will use our own function because it makes for a nice self-contained example. 

1. Create a test directory and a test file

Before you create any of your test files, you should create a directory to store your tests. We recommend creating a directory called tests in your main project directory.

After you create your test directory, you can create your first file. This file should either begin with test_ or end with _test to indicate to pytest that this is a file that contains tests. For the purpose of our case study, we will create a file called test_sample.py to create some basic tests for our data sampler.

2. Create a function to test

After you create your test file, you should write the function that you want to test. This is the function that will be used to upsample the minority class in your data. If you are following along with our case study, this should be created in the sample.py file within the package that we created before. If you want, you can just copy the function that we used. 

Our upsampling function takes three arguments that we will reference over the course of this tutorial. The first is a DataFrame that will be resampled, the second is the name of the column that contains the outcome variable, and the third is the proportion of the rows that we want the minority class to make up after we are done resampling. 

from collections import Counter
import math

import pandas as pd

def upsample_minority_class(data, outcome, p_minority):
  outcome_counts = Counter(data[outcome])
  majority_class, majority_count = outcome_counts.most_common()[0]
  minority_class, minority_count = outcome_counts.most_common()[-1]
  # Total rows needed so the minority class makes up p_minority of the data
  desired_total_count = math.ceil(majority_count/(1-p_minority))
  n_samples = desired_total_count - majority_count - minority_count
  # Draw additional minority rows with replacement and append them
  samples = data \
    .loc[data[outcome] == minority_class] \
    .sample(n_samples, replace=True)
  upsampled_data = pd.concat([data, samples])
  return upsampled_data
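If you are curious how the arithmetic works out, here is a quick sanity check on a small frame with seven majority rows and three minority rows (the function is re-declared so the snippet is self-contained):

```python
from collections import Counter
import math

import pandas as pd

def upsample_minority_class(data, outcome, p_minority):
  outcome_counts = Counter(data[outcome])
  majority_class, majority_count = outcome_counts.most_common()[0]
  minority_class, minority_count = outcome_counts.most_common()[-1]
  desired_total_count = math.ceil(majority_count/(1-p_minority))
  n_samples = desired_total_count - majority_count - minority_count
  samples = data \
    .loc[data[outcome] == minority_class] \
    .sample(n_samples, replace=True)
  return pd.concat([data, samples])

# 7 majority rows, 3 minority rows; ask for a 50/50 split
data = pd.DataFrame({'y': [0] * 7 + [1] * 3})
upsampled = upsample_minority_class(data, 'y', 0.5)

# desired_total_count = ceil(7 / 0.5) = 14, so 4 minority rows are added
print(sorted(Counter(upsampled['y']).items()))  # [(0, 7), (1, 7)]
```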

3. Create a fixture to test your function

After you create the function that you want to test, you can create the fixture that you want to use to test your function. As a reminder, a fixture is a function that defines a piece of data then returns that piece of data so that it can be used in tests. You will need to use the @pytest.fixture decorator to let pytest know that the function you defined is a fixture.

For the sake of this example, we will create a simple pandas data frame with two columns. We will call the outcome variable y and ensure that one class has fewer instances than the other so that we will have a minority class to upsample. For now, we will put this fixture in our test_sample.py test file. 

import pandas as pd
import pytest

@pytest.fixture
def data():
  data = pd.DataFrame({
    'y': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    'x1': [1, 2, 1, 2, 1, 2, 1, 5, 6, 5],
  })
  return data

4. Create a basic test

Now we will create a basic unit test to test our upsampling function using the data frame that we created as a fixture in the previous step. In order to tell pytest that a test is going to use a specific fixture, you should pass the name of the fixture as an argument of the test function. You can pass as many fixtures as you want as long as they are all defined and available at the time the test is run.

As a reminder, pytest test functions should start with the word test and contain assert statements that assert whether certain conditions are true. For our first unit test, we will simply use the upsampling function to upsample the minority class in our data and check whether the resulting object is a pandas DataFrame. 

import bank_deposit_classifier.sample as sample

def test_upsample_minority_class(data):
  data_ = sample.upsample_minority_class(data, 'y', 0.5)
  assert isinstance(data_, pd.DataFrame)

5. Run tests with pytest

After you set up your test files, all that is left is to run your tests using the pytest command in the terminal. You should navigate to the main project directory, then use the pytest command. The pytest documentation has more information on running tests.

pytest

Advanced pytest concepts

Now that we have created our first unit test using pytest, we will iterate on this test and add some other tests to demonstrate additional concepts and functionality. 

1. Test that errors are raised

Now we will create some tests that check that errors are raised properly. In order to do this, we first need to modify our function to raise some errors that we can check for. We added two checks to our upsampling function. First, we added a check to ensure that the proportion passed to the upsampling function is between 0 and 1. 

def check_p_minority_bounds(p_minority):
  if (p_minority > 1) or (p_minority < 0):
    msg = 'Proportion out of bounds! p_minority must be between ' \
      f'0 and 1, but value passed was {p_minority}.'
    raise ValueError(msg)

After that, we added another check to ensure that the outcome passed to our upsampling function is binary. If the outcome does not have exactly two categories, the function will raise an error saying that it expected a binary outcome. 

def check_outcome_binary(data, outcome):
  outcome_counts = Counter(data[outcome])
  n_outcomes = len(outcome_counts.keys())
  if n_outcomes != 2:
    msg = 'Binary outcome expected but specified outcome ' \
      f'has {n_outcomes} classes'
    raise ValueError(msg)

When we were done adding these checks to our upsampling function, the function looked something like this. Now there are two errors that we can check for – one that is raised when a proportion that is not between 0 and 1 is passed, and another that is raised when the column that is labeled as the outcome does not contain a binary variable. 

from collections import Counter
import math

import pandas as pd

def upsample_minority_class(data, outcome, p_minority):
  def check_p_minority_bounds(p_minority):
    if (p_minority > 1) or (p_minority < 0):
      msg = 'Proportion out of bounds! p_minority must be between ' \
        f'0 and 1, but value passed was {p_minority}.'
      raise ValueError(msg)

  def check_outcome_binary(data, outcome):
    outcome_counts = Counter(data[outcome])
    n_outcomes = len(outcome_counts.keys())
    if n_outcomes != 2:
      msg = 'Binary outcome expected but specified outcome ' \
        f'has {n_outcomes} classes'
      raise ValueError(msg)

  check_p_minority_bounds(p_minority)
  check_outcome_binary(data, outcome)

  outcome_counts = Counter(data[outcome])
  majority_class, majority_count = outcome_counts.most_common()[0]
  minority_class, minority_count = outcome_counts.most_common()[-1]
  desired_total_count = math.ceil(majority_count/(1-p_minority))
  n_samples = desired_total_count - majority_count - minority_count
  samples = data \
    .loc[data[outcome] == minority_class] \
    .sample(n_samples, replace=True)
  upsampled_data = pd.concat([data, samples])

  return upsampled_data

Now that our function raises a few different errors, we can create tests that pass values we expect to trigger these errors and check that the errors are actually raised. In order to do this, we will need to use the pytest.raises context manager. 

Using pytest.raises, you can specify the type of error that you expect to be raised in a situation, then search through the error message to ensure that the correct error was raised. For example, we know that if we pass a proportion that is not between 0 and 1, then we expect our upsampling function to raise an error with a message that contains the text “Proportion out of bounds”.

We added these tests to our test_sample.py file to ensure that the checks that we added to our upsampling function raised the appropriate errors.  

def test_upsample_minority_class_high_p(data):
    with pytest.raises(ValueError) as e:
        sample.upsample_minority_class(data, 'y', 1.5)
    assert "Proportion out of bounds" in str(e.value)

def test_upsample_minority_class_binary(data):
    with pytest.raises(ValueError) as e:
        data_ = data.loc[data['y'] == 1]
        sample.upsample_minority_class(data_, 'y', 0.5)
    assert "Binary outcome expected" in str(e.value)
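As an aside, pytest.raises also accepts a match argument that is treated as a regular expression and searched against the string form of the exception, which lets you fold the message check into the context manager itself. Here is a sketch using a hypothetical stand-in for our bounds check:

```python
import pytest

# Hypothetical stand-in for the bounds check in upsample_minority_class
def check_proportion(p):
    if (p > 1) or (p < 0):
        raise ValueError(f'Proportion out of bounds! Value passed was {p}.')

def test_check_proportion_high():
    # match is a regular expression searched against str(exception)
    with pytest.raises(ValueError, match='Proportion out of bounds'):
        check_proportion(1.5)
```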

2. Parameterize tests

Now we will use the pytest parametrize functionality to re-run the same test using different parameters. Specifically, we will update the test_upsample_minority_class test in our test_sample.py file so that we can pass a few different values for the proportion argument. 

In order to parameterize our test, we will use the @pytest.mark.parametrize decorator and pass the decorator two arguments. First, we will pass the name of the argument(s) that we want to supply multiple values for. After that, we will pass the parameter values that we want to use.

For each parameterized run of the test, we will check that the final minority class proportion observed in the dataset is approximately equal to the desired proportion that was passed to the upsampling function. Since there are only a few rows in our toy dataset, we know that the proportions might not match exactly. As such, we will use the pytest approx function to assert that the proportions are approximately equal. 

@pytest.mark.parametrize("p", [0.5, 0.75, 0.9])
def test_upsample_minority_class(data, p):
    outcome = 'y'
    outcome_counts = Counter(data[outcome])
    minority_class, _ = outcome_counts.most_common()[-1]

    data_ = sample.upsample_minority_class(data, outcome, p)
    n_minority = data_.loc[data_[outcome] == minority_class].shape[0]
    n_total = data_.shape[0]
    p_minority = n_minority/n_total

    assert p_minority == pytest.approx(p, rel=0.05)

Good thing we added this parameterized test! Experimenting with different parameter values showed us that, as the function was coded, we got unexpected results if we entered a proportion lower than the observed proportion (the proportion at which the minority class appears in the actual data): the computed number of rows to sample becomes negative.  

To fix this issue, we will modify the function that checks whether the proportion falls within the acceptable bounds to ensure that the proportion that is passed to the upsampling function is larger than the observed proportion at which the minority class is present. 

def upsample_minority_class(data, outcome, p_minority):
  def check_p_minority_bounds(p_minority, p_minority_pre):
    if (p_minority > 1) or (p_minority < p_minority_pre):
      msg = 'Proportion out of bounds! p_minority must be between ' \
        f'{p_minority_pre} and 1, but value passed was {p_minority}.'
      raise ValueError(msg)

  def check_outcome_binary(data, outcome):
    outcome_counts = Counter(data[outcome])
    n_outcomes = len(outcome_counts.keys())
    if n_outcomes != 2:
      msg = 'Binary outcome expected but specified outcome ' \
        f'has {n_outcomes} classes'
      raise ValueError(msg)

  check_outcome_binary(data, outcome)
  outcome_counts = Counter(data[outcome])
  majority_class, majority_count = outcome_counts.most_common()[0]
  minority_class, minority_count = outcome_counts.most_common()[-1]
  p_minority_pre = minority_count / data.shape[0]
  check_p_minority_bounds(p_minority, p_minority_pre)
  desired_total_count = math.ceil(majority_count/(1-p_minority))
  n_samples = desired_total_count - majority_count - minority_count
  samples = data \
    .loc[data[outcome] == minority_class] \
    .sample(n_samples, replace=True)
  upsampled_data = pd.concat([data, samples])

  return upsampled_data
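To confirm the new behavior, we can exercise the updated bounds check in isolation. With three minority rows out of ten, the observed proportion is 0.3, so requesting 0.2 should now raise the out-of-bounds error instead of producing a negative sample count:

```python
def check_p_minority_bounds(p_minority, p_minority_pre):
  if (p_minority > 1) or (p_minority < p_minority_pre):
    msg = 'Proportion out of bounds! p_minority must be between ' \
      f'{p_minority_pre} and 1, but value passed was {p_minority}.'
    raise ValueError(msg)

# 0.2 is below the observed minority proportion of 0.3
try:
  check_p_minority_bounds(0.2, 0.3)
except ValueError as e:
  print(e)
```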

3. Add a conftest file for shared resources

Now we will add a conftest file and move our fixture to this file so that it is accessible across different test files. As a reminder, a conftest file is run before your tests, and anything defined in it is accessible to all of your tests. This includes fixtures that are defined in the conftest file. This might be overkill at this point since we only have one test file, but it is good practice to become comfortable with using a conftest file.

The conftest file will be called conftest.py and it will be located in our tests directory alongside our test_sample.py file. We will move the fixture that we created in our test_sample file to our conftest file then run our tests again to ensure that everything still runs smoothly. 
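Assuming the directory layout described above, our tests/conftest.py would contain little more than the fixture itself. In this sketch the frame is built in a plain helper function (an addition of ours, not part of the case study code) so it can also be exercised outside of pytest:

```python
# tests/conftest.py
import pandas as pd
import pytest

def make_toy_frame():
    # Same toy frame as before: seven majority rows and three minority rows
    return pd.DataFrame({
        'y': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
        'x1': [1, 2, 1, 2, 1, 2, 1, 5, 6, 5],
    })

@pytest.fixture
def data():
    return make_toy_frame()
```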

4. Mark tests

As we stated earlier, the pytest mark decorator makes it easy to run a subset of tests that share the same marker. There are some built-in markers that have predefined behaviors. You can use these markers without defining them or their behavior.

Custom markers that you define yourself should be registered in a pytest.ini file. All that you have to do is list the names of your markers along with a short description of each. We will create a pytest.ini file in the main project directory, then define a custom mark called errors. We will use this marker to mark tests that assert that errors are raised. 

An example of a minimal pytest.ini file for defining marks in pytest.
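Following the marker-registration format from the pytest documentation, a minimal pytest.ini for our errors mark could look like this:

```ini
[pytest]
markers =
    errors: tests that assert that errors are raised
```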

After we define our errors mark in our pytest.ini file, we will add mark decorators to the tests in our test_sample.py file that ensure errors are raised. All you have to do is add the @pytest.mark.errors decorator before each function. 

@pytest.mark.errors
def test_upsample_minority_class_high_p(data):
    with pytest.raises(ValueError) as e:
        sample.upsample_minority_class(data, 'y', 1.5)
    assert "Proportion out of bounds" in str(e.value)

@pytest.mark.errors
def test_upsample_minority_class_binary(data):
    with pytest.raises(ValueError) as e:
        data_ = data.loc[data['y'] == 1]
        sample.upsample_minority_class(data_, 'y', 0.5)
    assert "Binary outcome expected" in str(e.value)

After you mark your tests in your testing files, you can choose which marks should be included or excluded when you run your tests by using the -m option. For example, you can run only the tests that have the errors mark using the following command. 

pytest -m errors

You can also do the opposite and run only the tests that do not have the errors mark. This is also done using the -m flag and running a command like this. 

pytest -m "not errors"
