Are you looking for recommendations on how to structure the lifecycle of a data science project? Or maybe you want to hear more about the deliverables you should produce at each stage of the project lifecycle? Well, either way, you are in the right place! In this article, we lay out a framework for structuring the lifecycle of a data science project. This framework applies to a variety of data science projects, ranging from projects that focus on building dashboards to those that focus on deploying machine learning models.
First, we will explain exactly what we mean by a data science project lifecycle. This will give you a better idea of what to expect later in this article. After that, we will discuss the main benefits of using the framework laid out here. Finally, we lay out the different stages that make up the data science project lifecycle and describe the deliverables that are produced at each stage.
What is the data science project lifecycle?
Before we provide recommendations on how to structure the lifecycle of a data science project, we will first describe exactly what we mean by the data science project lifecycle. When we talk about the data science project lifecycle, we are talking about the set of steps or stages that one progresses through in order to complete a data science project.
Each stage in the data science project lifecycle has a distinct deliverable that should be produced and shared with stakeholders. A stage should not be considered completed until that deliverable has been produced. Depending on what stage you are currently in, that deliverable might be a functional data product or a document that provides crucial context about your project.
Why should you follow this data science project lifecycle?
Why should you follow the data science project lifecycle we recommend in this article? Here are just a few of the many benefits of using this lifecycle structure.
- Achieve better alignment with stakeholders. The first reason that you should follow this data science project lifecycle is that it will help you ensure that you are aligned with your stakeholders. This project lifecycle is designed to ensure that you attain early alignment on the problem you are working on and the constraints that the solution needs to adhere to. There is nothing worse than putting weeks (or months) of effort into a project, just to learn that you have built the wrong thing and your solution is not usable. This project lifecycle will help you avoid that.
- Demonstrate progress through milestones. The next reason that you should follow this data science project lifecycle is that it enables you to show the progress you make as you work through a project. Data science projects tend to run long, and it can be difficult for stakeholders to understand why they take so long to complete. By following this project lifecycle, you can break your workflow into a distinct set of deliverables that you share with your stakeholders along the way, giving them confidence that you are making progress toward completion.
- Discover missteps before they happen. The next reason that you should follow this data science project lifecycle is that it enables you to discover roadblocks early and plan for them. This is partly because multiple deliverables are produced along the way, which lets you get early feedback on your ideas from other team members or other technical teams. If you discover roadblocks and risks early in the project lifecycle, you can plan for them and avoid rework when you actually encounter them.
Stages in the data science project lifecycle
What are the main stages in the data science project lifecycle? In the following section, we discuss each of the five stages: proposal, exploration, build, measure, and evangelize. For each stage, we first discuss its main purpose and the activities you might perform during it. After that, we describe the deliverable you should produce by the time you have completed the stage.
What is the purpose of the proposal stage?
The first stage in the data science project lifecycle is the proposal stage. In the proposal stage, you should align on what problem you will be solving and why it is important to solve that problem. It is important to set aside dedicated time to make sure you are aligned with your stakeholders before you start working on a project. This reduces the chance that you will spend valuable time on a project only to realize that you are not building the right solution.
What is the main deliverable in the proposal stage?
The main deliverable in the proposal stage is a written project proposal document. Specifically, the deliverable is a proposal document that has been shared with stakeholders and adjusted based on stakeholder feedback. The proposal document should contain information such as the problem you will solve, the reason it is important to solve that problem now, the constraints your solution needs to adhere to, and the business metrics your project aims to move.
Make sure to translate the technical impact you expect your project to have into business metrics. Check out this article for more tips on how to translate technical impact into business metrics.
What is the purpose of the exploration stage?
After you align on what problem you are going to solve, it is time to enter the exploration stage. The main goal of the exploration stage is to conduct any investigation that is required to determine what approach you will use to build your solution. If you are considering multiple different approaches to solve a problem, this is a great time to build some scrappy prototypes to understand which approach will work better.
Setting aside dedicated time to explore different approaches you can take to solve a problem helps to reduce the amount of rework that needs to be done after you start building your solution. If you do not take time to explore the feasibility of different solutions up front, you might start building one solution just to realize that your current approach is not feasible and you have to scrap that work and go down a different avenue.
Here are a few topics that you might need to look into during the exploration stage.
- Data availability. Do you have access to the data you need? Do you need to implement instrumentation to collect the data you need? Do you need to investigate external data sources you might use to supplement your internal data?
- Data quality. Is the data you have accurate? How much data cleaning will you need to do before you are able to use the data? Is there another data source with better data quality?
- Tooling. What tooling will you use for each component you need to build out? Is there a clear path or are there multiple tools you need to decide between? Are there any additional technical constraints imposed by your tooling?
- Dependencies. Does your project have key dependencies on other projects? Are there other projects that will have dependencies on your project? Are there any technical constraints imposed by these interdependencies?
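Some of the data availability and data quality questions above can be answered with a quick profiling pass over a sample of the data. The sketch below is illustrative only: it assumes your data can be loaded as a list of Python dictionaries, and the function name `profile_records` and the report keys are hypothetical, not part of any particular tool. It reports which required fields never appear at all (an availability gap) and what fraction of rows are missing a value for each field that does appear (a rough quality signal).

```python
from typing import Any


def profile_records(records: list[dict[str, Any]], required_fields: list[str]) -> dict[str, Any]:
    """Hypothetical helper: summarize field availability and null rates for a data sample."""
    total = len(records)
    report: dict[str, Any] = {"row_count": total, "missing_fields": [], "null_rate": {}}
    for field in required_fields:
        # A field that appears in no record at all is an availability problem.
        if not any(field in record for record in records):
            report["missing_fields"].append(field)
            continue
        # Rows where the field is absent or explicitly None both count as nulls.
        nulls = sum(1 for record in records if record.get(field) is None)
        report["null_rate"][field] = nulls / total if total else 0.0
    return report


sample = [
    {"user_id": 1, "revenue": 10.0},
    {"user_id": 2, "revenue": None},
    {"user_id": 3},
]
print(profile_records(sample, ["user_id", "revenue", "country"]))
```

Here `country` would be flagged as unavailable, and `revenue` would show a null rate of about 0.67, which might prompt you to look for a better source during exploration.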
What is the main deliverable of the exploration stage?
The main deliverable that should be produced at the end of the exploration stage is a technical design document that outlines how you will build your solution. This document should contain any technical context required to understand your solution, a system design diagram that outlines the components of your solution, a list of the design choices you made along with the alternatives you considered, and a depiction of what your outcome will look like.
If you are working on a large project that will be delivered in multiple milestones or iterations, then it might also make sense to create a roadmap document. A roadmap document should contain descriptions of the milestones you will deliver along the way, timelines for each milestone, and implementation details for the tasks that will be completed to achieve each milestone. If your project does not have multiple milestones with concrete deliverables, then you may not need a roadmap document.
What is the purpose of the build stage?
After you have determined the approach that you will use to build your solution, it is time to move on to the build stage. The main purpose of the build stage is to build the data product that you will deliver to your stakeholders. The type of data product you produce will vary from project to project.
If you are working on a large project with multiple milestones, then you should focus on delivering your first milestone in this stage. If you are working on a smaller project that does not have multiple milestones, then you might complete the full and final solution in this stage.
What is the main deliverable of the build stage?
The main deliverable of the build stage is a data product that can be used by your stakeholders. This might take the form of a raw dataset, a modeled dataset, an analytical dashboard, a simulation tool, a machine learning model, or a tool that can be used by other data professionals.
What is the purpose of the measure stage?
Once you have delivered a data product to your stakeholders, it is time to measure the impact that your project has on the business. It is important to measure the impact that your product has on the business in order to demonstrate the value your team contributes. It is also useful to measure the impact each project has in order to understand what types of projects have had the largest impact. This can help you to prioritize future projects.
What is the main deliverable of the measure stage?
The main deliverable that you should have at the end of the measure stage is a number (or a collection of numbers) that represents the impact that your project had on the business.
If you are building an internal tool or product that will primarily serve internal stakeholders, then we recommend tracking usage statistics. Solutions that are used by many people are generally more valuable than solutions that are not used by anyone. Usage statistics are useful for projects where the main deliverable is a dataset, dashboard, or internal tool.
Whenever possible, you should measure the impact that your data product has on key business metrics. There are multiple ways to do this, but one of the most common is to run an A/B test to see how business metrics differ between situations where your solution is applied and situations where it is not.
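As one illustration of tracking usage statistics, the sketch below counts weekly active users from a usage log. It assumes, hypothetically, that your dashboard or tool emits one `(user_id, event_date)` pair per interaction; the function name `weekly_active_users` is illustrative, and weeks are keyed by ISO year and ISO week number.

```python
from collections import defaultdict
from datetime import date


def weekly_active_users(events: list[tuple[str, date]]) -> dict[tuple[int, int], int]:
    """Hypothetical helper: count distinct users per (ISO year, ISO week)."""
    users_by_week: dict[tuple[int, int], set[str]] = defaultdict(set)
    for user_id, event_date in events:
        iso = event_date.isocalendar()  # (ISO year, ISO week, ISO weekday)
        users_by_week[(iso[0], iso[1])].add(user_id)
    # Return weeks in chronological order with the number of distinct users in each.
    return {week: len(users) for week, users in sorted(users_by_week.items())}


usage_log = [
    ("alice", date(2024, 1, 1)),
    ("bob", date(2024, 1, 2)),
    ("alice", date(2024, 1, 3)),
    ("alice", date(2024, 1, 8)),
]
print(weekly_active_users(usage_log))  # {(2024, 1): 2, (2024, 2): 1}
```

Counting distinct users rather than raw events keeps one heavy user from inflating the number, which is usually closer to the question of how many people the solution serves.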
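The article does not prescribe a particular statistical test, but for a conversion-style metric a two-proportion z-test is one common way to compare the control group (without your solution) against the treatment group (with it). The sketch below is a minimal version using only the Python standard library; the function name and the counts in the usage example are hypothetical, and a real analysis would also need to consider sample size planning and the metric's actual distribution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float, float]:
    """Compare conversion rates of control (A) and treatment (B).

    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: 100 of 1,000 control users convert vs. 150 of 1,000 treatment users.
lift, z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```

The absolute lift is the kind of number the measure stage calls for, while the p-value indicates whether that difference is likely to be more than noise.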
What is the purpose of the evangelize stage?
While it is important to measure the impact that your projects have on the business, it is equally important to share those measurements widely and make sure others in your organization are aware of the impact your project has had. The main purpose of the evangelize stage is to do exactly this: share details about the impact your project has had on the company.
What is the main deliverable of the evangelize stage?
The main deliverable of the evangelize stage is an artifact that conveys the results of your project and the impact the project had on the business. Ideally, this should be a long-lived artifact that people who hear about your project can refer back to in the future.
Here are some examples of what a deliverable might look like at the end of the evangelize stage.
- A recording of a talk you gave on your project
- A standalone slide deck that explains your results
- A text document that explains your results
If you are working on a project that has multiple milestones that deliver incremental value, then you are likely now at the point where you have built a deliverable for your first milestone and measured its impact. Now it is time to start working on the next milestone on your roadmap. Continue to iterate through the build, measure, and evangelize stages as each milestone is completed, delivering incremental impact to the business.