Are you wondering what details you should include in a data science design document? Or maybe you are more interested in learning what the main purpose of a data science design document is? Either way, you are in the right place! In this article, we tell you everything you need to know about design documents for data science projects.
First, we take some time to explain what a data science design document is and why it is important to write a data science design document for a project. After that, we discuss when you should write a design document for a data science project. We follow this up with a more detailed explanation of what should and should not be in a data science design document. Finally, we provide tips on how to write a strong data science design document.
What is a data science design document?
What is a data science design document? A data science design document is a document that lays out the technical details of how you will implement a data science project. It might include details such as the components you will build out, the tools you will use to implement those components, and the decisions you will make when faced with technical tradeoffs.
A technical design document should serve as a blueprint that you can follow to build your project. Someone who is not familiar with your project should be able to look at the document and come away with a clear understanding of what they need to build and how they should build it.
Why should you write a data science design document?
Why should you write a data science design document? Here are some of the main benefits of writing a data science design document.
- Build faster. A data science design document serves as a blueprint you can follow when you build out your project. This blueprint enables you to move faster because you will not have to stop building to consider technical tradeoffs. Investing a little time up front to create a clear, easy-to-follow plan can often save you a lot of time down the line. If you have a clear plan to follow, you will not have to wade through ambiguity and context switch as you build.
- Spot roadblocks ahead of time. Another benefit of creating a technical design document is that it makes it easier to spot roadblocks ahead of time. If you are able to identify roadblocks before you start building, it will reduce the amount of rework you have to do once you start building. It is painful to put weeks of work into building, just to find that the solution you have been building is not viable. Creating a strong design document and getting broad feedback on it helps you to avoid this fate.
- Ensure alignment with stakeholders. Writing a technical design document is also a great way to ensure that you are aligned with technical stakeholders. This is particularly important if you are working on a project that other teams will have technical dependencies on. Document reviews give stakeholders the opportunity to provide feedback before you spend any time building a solution that does not meet their needs. This reduces the amount of rework that needs to be done.
- Document important decisions. Technical design documents also serve as great sources of information that others can refer back to in the future. If you write a clear, opinionated technical design document up front then you may not have to write much additional documentation when you are done with your project.
When should you write a data science design document?
At what stage in the project lifecycle should you write a technical design document? In general, it is best to wait until you have had time for exploration and discovery, including time to explore different approaches you might use to solve the problem and weigh those approaches against one another. You should write a technical design document once you have achieved clarity on what approach you will take to solve your problem and what tooling you will use to implement that approach.
If you follow our recommended data science project lifecycle framework, then you should wait to write the technical design document until the end of the exploration stage.
What should be in a data science design document?
What should be in a data science design document? Here are the main topics that should be covered in a data science design document.
- Brief summary of the problem you are solving. The document should start with a brief summary of the problem that you are aiming to solve. Aim to limit the length of this section to 2 or 3 sentences.
- Brief summary of the solution you will build. After stating the problem, you should include a brief summary of the solution you will build to solve the problem. Aim to limit the length of this section to 2 or 3 sentences.
- Technical background. Next, you should include any technical background that is necessary to understand the approach you will take and the reason you chose that approach. If you are using uncommon terminology then you should define this terminology up front. If there are technical constraints imposed by the tooling you are using or any other dependencies you have, this is a great place to call these out. If you are following patterns that were used in another project, explain this. The length of this section will vary depending on how much background is required to understand your solution.
- System design diagram. The next thing you should include in your technical design document is a system diagram that shows the components you will build and the way they will interact. This can be as simple as a few labeled boxes that have arrows pointing between them. If you are using different tools to build out different components, make sure to specify which tool will be used to build each component. This diagram should also include information about external dependencies your project has. Someone who is not familiar with your project should be able to look at the system design diagram and understand how data will flow through the system.
- Key decision points. You should also write out a list of the key technical decisions that you made when putting together your design document. For each decision point you list, you should include information about the alternatives that were considered and the benefits and drawbacks of each alternative. Your reader should be able to understand why you made each decision you made.
- Output. Finally, your technical design document should include a clear and detailed depiction of what the output of your project will be. If you are creating a dashboard, you might include a low-fidelity sketch of the dashboard and the charts that will be in it. If your end product is an API, you might include a contract that shows what your API output will look like.
- Additional resources (if applicable). If applicable, you should also include links to other documents that provide useful background or context on your project.
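To make the output item above more concrete, here is a minimal sketch of what an API output contract might look like when written in Python. The churn-scoring use case and every name in it (`ChurnPrediction`, `customer_id`, `churn_probability`, `model_version`) are hypothetical and chosen purely for illustration. Your own contract should reflect whatever your project actually returns.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ChurnPrediction:
    """Hypothetical response contract for a churn-scoring API."""
    customer_id: str
    churn_probability: float  # expected to lie in [0, 1]
    model_version: str

    def __post_init__(self):
        # Reject values that would violate the documented contract.
        if not 0.0 <= self.churn_probability <= 1.0:
            raise ValueError("churn_probability must be between 0 and 1")


def to_json(prediction: ChurnPrediction) -> str:
    """Serialize a prediction to the JSON shape a consumer would receive."""
    return json.dumps(asdict(prediction), sort_keys=True)


# Illustrative values only; not real data.
example = ChurnPrediction(
    customer_id="c-123", churn_probability=0.27, model_version="v1"
)
print(to_json(example))
```

Writing the contract down in this form lets reviewers comment on field names, types, and valid ranges before any code that produces the output exists.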
What should not be in a data science design document?
What information should you avoid including in a technical design document? Here are some examples of information that should not be included in your technical design document.
- Impact sizing and project justification. By the time you reach the stage where you are working on a technical design document, you should have already achieved alignment on what problem you are solving and why it is important. That means that you do not need to include a justification for why it is important to solve that problem. You may, however, link out to a project proposal document that contains more information about the impact you expect your project to have.
- Detailed timelines. You do not need to include detailed timeline information in a technical design document. Detailed timeline information is best reserved for a separate roadmap document. One reason for this is that your timelines will depend on the design decisions you make, so you will not be able to give detailed timeline information until you have aligned on a technical design.
- Tickets. A technical design document need not contain links to specific tickets that will be used to track the completion of individual tasks. As with detailed timelines, it is best to wait to create these tickets until you have achieved alignment on your technical design.
- Milestones. If you plan to deliver value iteratively by breaking your work up into multiple deliverables that can be delivered sequentially, we recommend reserving the breakdown of these milestones for a roadmap document as well. Information about the different milestones that you will deliver is most at home next to information about your timelines and tickets.
How to write a data science design document
What steps should you take to write a data science design document? Here are the steps we recommend taking to write a data science design document.
- Sketch out your output. We recommend starting out by sketching out what your output will look like. Once you have an idea of what you want your output to look like, it is easier to work back and list the steps you will need to take to create that output.
- List the tasks you need to complete. After you sketch out your output, you can start to put together a list of tasks you will need to complete in order to create that output. Feel free to go into as much detail as you want at this step. You can go back and group similar tasks into high level components later. As you work through your list of tasks, make sure to mark any tasks that still have open questions. What tool will you use to complete that task? What approach will you use to complete that task? Is that task actually necessary?
- Group tasks into components. After you write down the list of tasks you will need to complete, it is time to group these tasks into higher level components. For example, each component might represent one scheduled job you need to create or one module that you need to build out. If the tasks you wrote down were fairly high level, you might not need to group steps together at all. If the tasks you wrote down were more in the weeds, you will likely need to complete this step.
- Draw a system design diagram. Draw a system diagram with boxes that represent each of the high level components and arrows showing how data will flow from one component to another. For each component, you should note what technology will be used to build out that component. You may do this by color coding the boxes that represent the different components.
- Mark decision points. Go back through your system diagram or task list and highlight any key decisions you will need to make as you build out these components. Think about what technologies you will use and what approaches you will take to achieve different tasks. Make sure to look back at the notes you created as you were listing out the tasks.
- Evaluate alternatives. For each decision point you marked in the previous step, list out the different approaches you can take. Include information about the benefits and drawbacks of choosing each approach. After that, choose the approach you intend to use when you build out your project. Explain what approach you are going to use and why.
- Fill in additional details. At this point, you should have most of the meat of your technical design document filled out. Go back and fill in the remaining details to complete the document.
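If it helps to reason about your components before you draw them, the grouping and diagramming steps above can be sketched as a small data structure. The example below is a rough illustration and not a prescribed method. The component names, the tools assigned to them, and the `build_order` helper are all hypothetical placeholders. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to list components in an order that respects the direction of data flow.

```python
from graphlib import TopologicalSorter

# Hypothetical components, each mapped to the tool that would build it.
components = {
    "ingest_events": "Airflow",
    "build_features": "dbt",
    "train_model": "scikit-learn",
    "serve_predictions": "FastAPI",
}

# Arrows in the diagram: (upstream, downstream) means data flows
# from the upstream component into the downstream component.
data_flows = [
    ("ingest_events", "build_features"),
    ("build_features", "train_model"),
    ("train_model", "serve_predictions"),
]


def build_order(components, data_flows):
    """Return components ordered so each one follows everything upstream of it."""
    sorter = TopologicalSorter({name: set() for name in components})
    for upstream, downstream in data_flows:
        sorter.add(downstream, upstream)  # downstream depends on upstream
    # static_order() raises graphlib.CycleError if the flows form a cycle.
    return list(sorter.static_order())


for name in build_order(components, data_flows):
    print(f"{name} (built with {components[name]})")
```

A listing like this is not a substitute for the diagram itself, but it can surface missing arrows or circular dependencies before you start drawing boxes.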
Tips for creating a strong design document
Are you looking for some tips for creating a strong technical design document? Here are some tips for creating a technical design document that will be easy for your peers and stakeholders to ingest.
- Favor diagrams over text. Images and diagrams often convey technical thoughts much more clearly than words and paragraphs. When given the choice, you should favor diagrams over text.
- Favor bullet points over paragraphs. Similarly, you should break long blocks of text up into bulleted or numbered lists when possible. This makes it easier for your stakeholders to skim through your document and return to the specific sections they are interested in.
- Get feedback early. As with most things, technical design documents that have gone through multiple rounds of peer feedback are often stronger than documents that have not. We recommend asking for feedback on your technical design document early and planning to iterate on your document multiple times.