Are you wondering what a data scientist does? Or maybe you are wondering how a data scientist differs from a data analyst? Well, either way, we've got you covered! In this article, we tell you everything you need to know to understand what the average data scientist does and how their job differs from other data roles.
We start with a discussion of what a data scientist does. This includes topics such as a data scientist's responsibilities, the tools data scientists use, and how much time data scientists spend in meetings. After that, we discuss the differences between data scientist roles and other data roles, such as data analyst and machine learning engineer roles. Finally, we finish off with some examples of projects that a data scientist might work on.
What does a data scientist do?
So what does a data scientist do? Job titles and responsibilities vary from company to company, but in general, the data scientist is the person on the data team who specializes in statistical analysis and machine learning. They should have strong knowledge of at least a few areas of statistics and machine learning. They should also have enough software engineering skill to implement reliable jobs that use models to make predictions in batch.
A data scientist generally focuses on a few long-term projects. In these projects, they apply statistics and machine learning either to provide insights that drive business decisions or to produce data products that use model predictions. In most cases, the data scientist is involved for the full lifecycle of the project, or at least up until the point where the model needs to be put into production. This means that a data scientist needs to meet with their business stakeholders early and often to ensure that everyone is on the same page about the goals and deliverables of the project.
Responsibilities of a data scientist
What are the main responsibilities of a data scientist? Here are some common responsibilities that might fall under a data scientist’s jurisdiction.
- Work with business stakeholders to flesh out project requirements
- Clean data and generate reusable features
- Research machine learning algorithms and statistical models
- Train and evaluate machine learning models
- Productionalize offline batch-scoring models
- Design and analyze complex experiments (such as factorial experiments)
- Use causal inference to analyze observational data and more nuanced experiments
- Design simulations to answer key business questions
- Present the results of their work to both technical and nontechnical audiences
How do data scientists spend their time?
How much time do data scientists spend on ad hoc requests?
How much time do data scientists spend working on ad hoc requests? And how does this compare to the amount of time they spend working on long-term projects? In general, data scientists should spend most of their time working on long-term projects. They may spend some of their time on quick proof-of-concept projects to evaluate the impact of potential models, but the expectation should be that the more promising ideas will roll over into longer-term projects.
Data scientists may spend some of their time on ad hoc data requests, but these requests should take up only a small portion of their time. Data scientists who build models and pipelines that regularly transform data should also expect to spend some of their time on maintenance work to keep these pipelines up and running.
What tools do data scientists use?
What tools do data scientists use? Here are some of the most common tools that data scientists use.
- Python and R to clean, transform, and analyze data
- Spark to clean, transform, and analyze data (for particularly large datasets)
- SQL to pull data from a database and transform it
- Git to version control their code
How much time do data scientists spend in meetings?
How much time do data scientists spend in meetings? The amount of time that a data scientist spends in meetings varies from company to company. In general, a data scientist should expect to have at least a few meetings with other data scientists per week. They should also expect at least one weekly check-in with the business stakeholders for each of their projects. Check-ins with business stakeholders are generally more frequent at the beginning of a project, when its requirements are still being fleshed out.
The exact amount of time that a data scientist spends in meetings also varies with the seniority of the role. In general, a junior data scientist should expect to spend less time in meetings than a senior data scientist.
Who do data scientists work closely with?
Who do data scientists work closely with? Here are some examples of stakeholders a data scientist might work closely with.
- Other data scientists. Many data scientists spend the majority of their meeting time with other data scientists and data science managers. Teams of data scientists often have regular check-ins to provide updates on the status of their projects and to discuss technical concepts. Data scientists may also meet with other data scientists to collaborate on projects, mentor, and be mentored.
- Product managers and other business stakeholders. When they are not meeting with other data scientists, data scientists generally spend most of their meeting time with product managers and other business stakeholders. These meetings help them understand the context surrounding their projects. They also ensure that everyone is on the same page about project deliverables, timelines, and expectations.
- Data analysts and data engineers. Data scientists sometimes meet with data analysts and data engineers as well. At the beginning of a project, they might meet with these stakeholders to better understand the data they are working with. They may also collaborate with analysts on projects, especially in the early phases when there is a lot of exploratory work.
- Machine learning engineers. Data scientists also meet with machine learning engineers, who help productionalize the models they are working on. This is especially true if their models need to be served behind APIs.
Types of data scientist roles
What are some different types of data scientist roles? Here are a few common types of data scientist roles.
- The generalist. A generalist is a data scientist who is familiar with a wide variety of statistical and machine learning methodologies. A generalist may have one or two areas of particularly deep knowledge, but in general the breadth of their knowledge exceeds the depth. Generalists may be expected to pick up new methodologies and skills as the job requires. They typically use methods that have been developed by other data practitioners rather than developing their own.
- The specialist. A specialist is a data scientist who focuses on a specific type of data science problem. In some cases, a specialist focuses on a particular statistical or machine learning methodology, such as causal inference, natural language processing, or time series methods. In other cases, a specialist focuses on a particular application area, such as marketing, supply chain, or finance. Much like generalists, specialists generally use methodologies that have been developed by others rather than developing their own.
- The researcher. So who develops the novel methodologies that other data scientists use? That would be a researcher. Many such researchers are based at universities and other academic institutions, but some companies do employ data scientists who focus on research and the development of new methodologies.
How do data science roles compare to other data roles?
Differences between data scientist roles and data analyst roles
How do data scientist roles differ from data analyst roles? Data scientists tend to use more advanced statistical methods and machine learning algorithms than data analysts. For this reason, data science projects usually have longer timelines than the projects that data analysts work on. Data scientists work on a few long-term projects rather than many short-term ones, so they tend to spend less time in meetings than data analysts.
Data scientists are also expected to have stronger software engineering skills than data analysts. This is especially true if the data scientist is working on models or pipelines that produce data used in production.
Differences between data scientist and machine learning engineer roles
How do data scientist roles differ from machine learning engineer roles? Machine learning engineers tend to spend more time working on data platforms and infrastructure than on manipulating data. They often build the platforms that data scientists use to train their models and serve those models' predictions.
Some machine learning engineers do implement machine learning models. However, they generally do not have as strong a background in statistics as data scientists, so they tend to implement more straightforward models that solve simpler problems. While data scientists are expected to have stronger statistical backgrounds than machine learning engineers, machine learning engineers are expected to have stronger software engineering backgrounds than data scientists.
Examples of data science projects
- Lead scoring. A data scientist might work with the sales team to build a model that scores their leads to determine which leads are most likely to convert to paying customers. Such a model can be used to inform the sales team of which leads they should be going after and where they should be prioritizing their time.
- Flagging offensive language. Another example of a project a data scientist might work on is flagging offensive language in user-generated text. These flags can be used to remove content that contains offensive language or to surface it to content moderators.
- Forecasting. Data scientists are often involved in a wide variety of forecasting exercises, including revenue forecasting, demand forecasting, lead forecasting, traffic forecasting, and more! These forecasts are used to plan for the future and to check whether metrics are tracking against expectations.
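The following minimal Python sketch makes the lead-scoring example, and the offline batch-scoring jobs mentioned earlier, more concrete. It assumes that a logistic regression model has already been trained offline. The coefficients, the feature names (`pages_viewed`, `opened_email`, `company_size`), and the `Lead` class are hypothetical and are chosen only for illustration. The job scores a batch of leads and writes them out with the highest conversion probability first, so a sales team could work down the list.

```python
import csv
import io
import math
from dataclasses import dataclass

# Hypothetical coefficients from a logistic regression trained offline.
# A real job would load a saved model artifact instead of hard-coding these.
COEFFICIENTS = {
    "intercept": -3.0,
    "pages_viewed": 0.4,
    "opened_email": 1.2,
    "log_company_size": 0.3,
}


@dataclass
class Lead:
    """A hypothetical lead record with a few illustrative features."""
    lead_id: str
    pages_viewed: int
    opened_email: bool
    company_size: int


def score_lead(lead: Lead) -> float:
    """Return the modeled probability that the lead converts (logistic function)."""
    z = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["pages_viewed"] * lead.pages_viewed
        + COEFFICIENTS["opened_email"] * (1.0 if lead.opened_email else 0.0)
        + COEFFICIENTS["log_company_size"] * math.log1p(lead.company_size)
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(leads: list[Lead]) -> list[tuple[str, float]]:
    """Score every lead in the batch and sort from most to least likely to convert."""
    scored = [(lead.lead_id, score_lead(lead)) for lead in leads]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def write_scores(scored: list[tuple[str, float]], out) -> None:
    """Write the ranked scores as CSV to any file-like object."""
    writer = csv.writer(out)
    writer.writerow(["lead_id", "conversion_probability"])
    for lead_id, probability in scored:
        writer.writerow([lead_id, f"{probability:.3f}"])


if __name__ == "__main__":
    leads = [
        Lead("a", pages_viewed=2, opened_email=False, company_size=10),
        Lead("b", pages_viewed=8, opened_email=True, company_size=500),
        Lead("c", pages_viewed=0, opened_email=False, company_size=1),
    ]
    buffer = io.StringIO()
    write_scores(score_batch(leads), buffer)
    print(buffer.getvalue())  # ranked order: b, a, c
```

In practice, a scheduled job would write these scores to a table or file that the sales team's tools read. Writing CSV to an in-memory buffer here simply keeps the sketch self-contained.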