Are you wondering whether you should work on a data science side project to enhance your resume? Or maybe you have already decided that you want to work on a side project, but you are looking for advice on what type of project you should pursue? Either way, we have the answers that you are looking for! In this article, we discuss everything you need to know about data science side projects and the role they play in enhancing your resume.
We start off by explaining why data science projects are useful for resume building. After that, we walk through the steps you need to take to build out your projects and give pointers on where to focus your attention. Finally, we discuss what types of applicants benefit most from having data science side projects on their resumes. The advice in this article is broad enough to apply to all data professionals, from data analysts to machine learning engineers.
Why work on data science side projects
- Add new skills to your resume. The first reason that you should work on data science side projects and build out a data science portfolio is to learn new skills. Are you an analyst who primarily works in R but is looking to transition to Python? Are you a data scientist who wants to be able to put time series analysis on your resume? There is no better way to learn new skills than to dive in and get hands-on experience. Once you feel comfortable with the new tool, you can add it to the skills section of your resume.
- Demonstrate competencies with real examples. Beyond letting you add new skills to your resume, side projects are impactful because they let you provide actual code and documentation that prove you have the skills listed on your resume. Providing links to complex Python projects you have created with real code is much more persuasive than just saying that you would rate yourself as an advanced Python coder.
- Prove that you are an independent learner. Finally, having side projects on your resume demonstrates that you are able to learn independently and you are eager to learn new skills. These are qualities that hiring managers look for, particularly in more junior candidates and career changers.
Data science competencies for resumes
So what kind of competencies can you demonstrate on your resume using data science projects? Here are some examples.
- Data analysis & visualization. The first competency that data science projects and portfolios can help to demonstrate is general data analysis and data visualization skills. If you want to focus on this competency, you should focus on defining good metrics, checking data integrity, and creating beautiful plots that make complex concepts easy to digest.
- Machine learning & statistics. A second competency that you can demonstrate by including data science projects on your resume is machine learning and statistics. Whether you want to demonstrate your proficiency in hypothesis testing or learn more about deep learning, all you need to do is choose an appropriate dataset and code up an analysis. If you are looking for a little bit of a challenge, try working on a project that involves time series, network, text, or image data.
- Software engineering. A third competency you can demonstrate with data science projects is software engineering skills. If you want to show off your software engineering chops, you do not necessarily need to work on a project that involves complex machine learning models. Just focus on writing well-structured, modular code that is version controlled and well tested.
- Languages & tools. Finally, if you want to demonstrate your proficiency with a certain language or tool, you can do that with data science projects on your resume. Common examples include Python, R, Java, Spark, SQL, Git, MLflow, Docker, Flask, PyTorch, TensorFlow, AWS, and CI/CD tools.
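As a rough illustration of the data integrity checks mentioned under data analysis & visualization, here is a minimal Python sketch that uses only the standard library. The function name `integrity_report`, the column names, and the sample records are all hypothetical; in a real project you would likely run the same checks with a library such as pandas.

```python
from collections import Counter


def integrity_report(rows, required, key):
    """Summarize basic data-quality problems in a list of dict records.

    rows:     list of dicts, one per record
    required: column names that should never be empty
    key:      column that is supposed to identify each record uniquely
    """
    missing = Counter()
    for row in rows:
        for col in required:
            value = row.get(col)
            # Treat None and blank strings as missing values.
            if value is None or str(value).strip() == "":
                missing[col] += 1

    # Any key value that appears more than once is a duplicate.
    key_counts = Counter(row.get(key) for row in rows)
    duplicate_keys = sorted(
        (k for k, n in key_counts.items() if n > 1), key=str
    )

    return {
        "n_rows": len(rows),
        "missing": dict(missing),
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical records standing in for a messy real-world dataset.
rows = [
    {"id": 1, "city": "Austin", "price": "350000"},
    {"id": 2, "city": "", "price": "410000"},
    {"id": 2, "city": "Denver", "price": None},
]

report = integrity_report(rows, required=["city", "price"], key="id")
print(report)
```

Keeping a check like this in its own small function also makes it easy to unit test, which ties in with the software engineering competency.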
Building data science projects for resumes
What steps do you need to go through in order to create a data science project for your resume? Here are the steps.
- Decide what competencies to focus on. This is probably the most important step of the process. Before you work on a data science side project for your resume, you should decide what specific competencies you want to demonstrate with your project. Most people do not put much thought into this step of the process, but the competencies you choose should inform the dataset that you choose and the type of analysis you run, not the other way around.
- Find a dataset. After you decide what competencies you want to focus on, you should find a dataset to use for your data science project, choosing it based on the competencies you want to demonstrate. Here are some examples of characteristics you should look for in a dataset for each of the competencies we listed before.
- Data analysis & visualization. If you want to demonstrate your competency in data analysis and visualization, then you are better off picking a real-world dataset that is not perfectly clean. This way you can demonstrate your ability to identify issues with data quality and clean data. You should also think about what visualizations you might want to produce and choose your dataset accordingly. For example, if you want to create a heat map that shows geographical trends in data, then you should make sure to choose a dataset with geographical variables.
- Machine learning & statistics. If you want to demonstrate your capabilities with machine learning and statistics, then you should think about what kind of modeling you want to do. If you are new to the field, then we recommend choosing a tabular dataset that has simple numeric and categorical variables. If you do choose to work with a tabular dataset, we recommend choosing a real world dataset that needs some cleaning. If you have already done a project with tabular data and want to learn something new, you can look for unstructured data like text or image data.
- Software engineering. If you want to show off your software engineering skills, then it is not as important to find a messy dataset that needs a lot of cleaning. In fact, it may be better to use a clean dataset so that you can focus more of your effort on writing clean code and using model deployment tools.
- Languages & tools. If you want to show off your proficiency in a specific language or tool, the type of dataset you want will depend on the kind of tool you want to use. If you want to show off your proficiency using Python and pandas to manipulate data, then you should choose a messy real-world dataset. If you want to get practice using Flask for model deployment, then you are in the clear to use a clean, pre-sanitized dataset.
- Find a question to answer. After you choose the dataset you want to work with, you need to find a question to answer with your data. Again, the competencies that you are focusing on should inform the type of question you want to ask. If you want to demonstrate your competencies in software engineering or a process-related tool, then the question you ask is not as important. In this case, it is okay to use a dataset that has an obvious question associated with it and just answer that obvious question (e.g., the Titanic dataset, where the obvious question is whether a passenger lived or died). If you want to demonstrate your competency in data analysis or modeling tabular data, you should try choosing a unique question that you thought of yourself. This demonstrates that you have the data awareness to be able to look at a dataset and determine what interesting questions can be answered with that data. The question you choose should provide valuable and actionable insights to either yourself or a hypothetical company that might work with this kind of data.
- Analyze the data. After you choose a question to answer, it is time to analyze the data and answer your question. This step will look different for every project so we will not go into too much detail here.
- Document your process. After you have answered your question, you should document your process. This is a step that is sometimes overlooked, but it is very important. Hiring managers will not spend a long time looking at your personal projects, so it needs to be clear to them at a glance what each project is and what competencies you are trying to prove. At a bare minimum, you should write up a short introduction that clearly states what dataset you are using, what question you are answering, why the answer to that question provides value (if applicable), and what competencies you are demonstrating with this project. Do not just assume that hiring managers will browse through your project and see that you are trying to demonstrate your proficiency in a certain area. Specifically stating the competencies you are trying to demonstrate will help them determine what parts of your code and analysis to focus on.
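One possible way to keep that short introduction consistent across projects is to generate it from the four items listed in the last step. The sketch below is only an illustration: `render_intro` is a hypothetical helper, and the project details are placeholders rather than a recommended topic.

```python
def render_intro(title, dataset, question, value, competencies):
    """Build a short plain-text project introduction for a README."""
    lines = [
        title,
        "",
        f"Dataset: {dataset}",
        f"Question: {question}",
        f"Why it matters: {value}",
        "Competencies demonstrated:",
    ]
    # One bullet per competency so a reader can scan them at a glance.
    lines.extend(f"- {c}" for c in competencies)
    return "\n".join(lines) + "\n"


# Placeholder project details, for illustration only.
intro = render_intro(
    title="Titanic survival analysis",
    dataset="Titanic passenger records",
    question="Which passengers survived?",
    value="A familiar question, used here to showcase code quality",
    competencies=[
        "Modular, tested Python code",
        "Version control with Git",
    ],
)
print(intro)
```

Writing the introduction by hand works just as well. What matters is that the dataset, question, value, and competencies all appear near the top of the project.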
Who are data science projects most useful for?
Having data science projects on a resume will be more helpful for some types of candidates than others. So what groups of people can benefit most from having data science projects on their resume?
- Junior candidates. Data science projects on resumes are generally most helpful for junior to mid-level candidates, where there is more of an emphasis on technical skills and execution. As candidates become more senior, there is more emphasis on interpersonal skills that are not as easy to demonstrate with data science projects on resumes. Additionally, more senior candidates are likely to have more work-related projects on their resumes that they can talk about, so they do not benefit as much from having side projects on their resumes. This is not to say that data science projects are not useful for more senior candidates, especially candidates who are aiming to demonstrate highly specialized skills. Junior and entry-level candidates who do not have many work-related projects on their resumes will just get more bang for their buck.
- Career changers. Data science projects on resumes are also useful if you are in the process of changing careers or fields. Even if you are just trying to make a small jump from an analytics role where you mostly work on reporting and metric definition to a role that involves more machine learning and modeling, side projects can provide you with valuable hands-on experience with new tools that you may not have the opportunity to use at your day jobs.
Where to display data science projects
Where should you display your data science projects after you have completed them? Here is our advice.
- On your resume. Of course if you are working on data science projects with the intention of enhancing your resume, you should display your data science projects on your resume. In general, we recommend having a separate section for side projects called something like “personal projects” rather than lumping your projects into a general experience section. But how much room should you dedicate to personal projects? That depends on what previous experience you have and whether you have work-related projects that demonstrate your data science skills. If you do not have many work-related projects to show off, then you can include a few bullet points per project for the personal projects on your resume. If you have a few work-related projects and you are not changing fields then we recommend only including one high level bullet point per project to leave more room for your work projects.
- GitHub. Beyond listing your projects on your resume, you should also make your code available in a public repository. The easiest way to do this is to upload your code to GitHub. Along with your code, you should upload a README file that describes your project and what its goals were.
- Personal website. If you have a personal website, then you may choose to make your code and documentation available there rather than on GitHub.
Tips for data science projects on resumes
What other tips do we have for creating data science projects for resumes? Here are all of the points we haven’t touched on.
- It is okay to use school projects. If you are an entry level candidate, it is okay to use projects that you completed in school in your portfolio of data science projects. You already did the work, so you might as well reap some of the rewards.
- Navigation and documentation need to be clear. If you are including a link to a public GitHub profile that has a lot of repositories, make sure it is clear which repositories you want hiring managers to look at. Make sure to highlight those repositories and include README files that clearly describe the project and its importance.
- Quality over quantity. As with many things in life, you should aim for quality over quantity when you are working on data science projects for resumes. You are better off having one clean, completed, well documented project than a handful of half-completed projects with no documentation. Consider setting GitHub repositories containing half-completed projects to private when you are applying to jobs.
- Emphasize data over models. Even if you are working on projects to demonstrate your competency in machine learning and statistical modeling, you should spend more time focusing on your data than your models. For most jobs, you are better off using a simple, stable model that can be easily maintained than using a more complicated model that has 0.1% better accuracy. Let your projects reflect this type of thinking. And even if tiny increases in accuracy are to be desired, there is often more to gain from adding new data and features to your model than testing hundreds of parameter combinations.
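The preference for a simple, stable model over a slightly more accurate one can be written down as an explicit selection rule. The sketch below is one hypothetical way to do it: `pick_model`, the complexity ranks, the `min_gain` threshold, and the accuracy figures are all illustrative assumptions, not results from a real experiment.

```python
def pick_model(candidates, min_gain=0.005):
    """Return the simplest candidate whose accuracy is within min_gain of the best.

    candidates: list of (name, complexity_rank, accuracy) tuples,
                where a lower complexity_rank means a simpler model.
    min_gain:   accuracy improvement a more complex model must deliver
                before it is considered worth the extra maintenance.
    """
    best_accuracy = max(acc for _, _, acc in candidates)
    # Keep every model that is "close enough" to the best one.
    good_enough = [c for c in candidates if best_accuracy - c[2] <= min_gain]
    # Among those, prefer the simplest.
    return min(good_enough, key=lambda c: c[1])


# Illustrative numbers only: the complex model is 0.1% more accurate.
candidates = [
    ("logistic regression", 1, 0.842),
    ("gradient boosting", 2, 0.843),
]

chosen = pick_model(candidates)
print(chosen[0])
```

Stating the threshold in code, and explaining it in your documentation, shows hiring managers that you weighed maintainability against accuracy instead of chasing the highest score.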