Are you wondering whether object oriented programming should be used for data science? Or maybe you are more interested in hearing about specific examples of situations where object oriented programming should be used for data science? Well either way, you are in the right place!
In this article, we tell you everything you need to know about object oriented programming for data science. First, we explain what object oriented programming is and how it differs from other programming paradigms. After that, we talk about some of the main advantages of object oriented programming. Next, we talk more generally about whether object oriented programming should be used for data science. Finally, we provide examples of situations where object oriented programming is particularly useful for data science work.
What is object oriented programming?
Before we talk about whether object oriented programming should be used for data science, we will first talk about what object oriented programming is and why object oriented programming is different from other programming paradigms. So what is object oriented programming? Object oriented programming is a coding paradigm that is designed around objects that hold data rather than standalone functions. These objects can store internal data or internal state that can be updated based on interactions that objects have with their environment.
What does it mean to say that a paradigm is designed around objects? In order to help clarify this, we will first lay out a simple example. Imagine you were creating a program to simulate actions that take place on a farm. Now imagine that you are in charge of implementing the code responsible for simulating animals on the farm. You need to simulate three different types of animals – cows, chickens, and horses. Each of these animals can interact with their environment in three different ways – by eating food, drinking water, and making noises.
How would you represent this situation using object oriented programming? The most straightforward way to do this would be to represent each animal in your simulation as an object that has an internal state that can be modified. For example, an animal’s internal state might contain a piece of data that represents how hungry the animal is. In addition to having an internal state, each animal would also have associated methods that can be called to represent interactions with their environment. In addition to having access to global information that is available in the environment, these methods would also have access to the internal data that is stored within the object. For example, an animal might have an eat_food function that decreases the amount of food available in the environment then modifies the internal state that represents how hungry the animal is.
What are the advantages of object oriented programming?
What are some of the main advantages of object oriented programming? To start us off, we will note that different languages implement the object oriented paradigm in different ways and therefore have different advantages. In this section, we will discuss advantages that are present in many common implementations of object oriented programming. We will focus in particular on advantages that are present in languages that are commonly used in data science such as Python.
- Track internal state. One advantage of object oriented programming is that it makes it easier to track internal state and enable functions to have different results based on the internal state of an object. This is a powerful property if you need functions to respond in a different manner based on the internal state of an object. This enables you to do things like create preprocessing pipelines that treat variables differently based on whether they are numeric or categorical.
- Enforce modularity. Another advantage of the object oriented paradigm is that it encourages the people who use it to write modular code. The objects that are introduced in this paradigm have clear boundaries that differentiate between properties that belong to one object and properties that belong to another. This strict enforcement of boundaries helps to enforce modularity.
- Share code via inheritance. Object oriented languages generally also implement a concept of inheritance that allows multiple different objects to inherit properties from a common ancestor. This makes it easy to share code between different objects that have similar properties, such as objects that have similar data that represents their internal state and objects that have similar methods that allow them to interact with their environment and update their internal state. This reduces the amount of duplicated code that exists across different objects. Returning back to the farm animal example that we laid out earlier in the article, if you were creating a program that included chickens, cows, and horses then you might create a single animal object that all of these different types of animals inherit from. These different types of animals would inherit any data or methods that were included in the general animal object.
- Represent the world in an intuitive manner. Another advantage of object oriented programming is that it is intuitive and can easily be used to represent the mental models that most people have of the world. This means that code that is written using an object oriented paradigm is often more intuitive and easier for people who are seeing the code for the first time to understand.
Should object oriented programming be used for data science?
Should object oriented programming be used for data science? The short answer is yes, object oriented programming is a powerful tool that can bestow powerful advantages to data scientists that make use of it. The longer answer to this question depends on what kind of data science team you are on and what types of tasks your team performs. While there are many cases where object oriented programming can benefit data scientists, there may be some data science roles where object oriented programming is not as beneficial. In the following sections, we will provide examples of situations where object oriented programming should and should not be used.
When to use object oriented programming for data science
When should you use object oriented programming for a data science project? Here are some examples of situations where it makes sense to reach for object oriented programming. Your code does not need to have all of these characteristics to be a good candidate for the object oriented paradigm. Rather, your code may be well suited for the object oriented paradigm if it has any of these characteristics.
- Codebase will continue to be used over time. If you are working on a codebase that will continue to be used and will continue to evolve over time, then you should consider using object oriented programming. This coding paradigm makes it easier to enforce structure on your code that will help to ensure that updates that are made to the codebase continue to follow the patterns that were laid out in the initial code.
- Codebase is complex or may reasonably be expected to gain complexity. If you are working with a codebase that is reasonably complex, or a codebase that you can reasonably expect to become more complex, then you are often best off using object oriented programming. The object oriented paradigm makes it easier to break complex code up into modular components that are in themselves less complex. This abstracts some of the complexity away from individuals who are making changes to the codebase and makes it more likely that those individuals can focus on the internal workings of the components they are modifying rather than understanding all of the complexity across the entire codebase.
- You work on a shared codebase. It is also a good idea to use an object oriented coding paradigm if you are working on a shared codebase with multiple other individuals. Code that is written using an object oriented programming paradigm is generally more modular and has firmer boundaries between components than code that is not. This makes it easier for different people to work on different components at the same time without needing to touch the same piece of code.
When not to use object oriented programming for data science
When does it make less sense to use object oriented programming in data science projects? Here are some examples of situations where you might not need to use object oriented programming.
- Simple, short term code that will not be reused. The main situation where it might not make sense to use object oriented programming for a data science project is if you are creating simple, short term code that will not be reused. You might write this type of throwaway code to create a prototype for a concept where you know you will have to reimplement the code if you decide to take the prototype further. You may also implement this type of code if you are completing a truly ad hoc analysis that will be used once then discarded.