How to approach data science problems and deliver solutions customers want
My first internship in 2017 was as a Software Engineer for an investment bank. As a Maths student with limited coding experience, I didn’t warm to the role immediately. Initially, I worked in a machine learning engineering team, but I found I was much more interested in actually building the models that I was meant to be productionising, so I moved into a Data Scientist role.
Fast forward five years: I’ve come back to machine learning engineering for a variety of reasons, but I’ve brought with me the lessons I learned as a Data Scientist.
You’ll notice that none of my takeaways are about technical skills as such. I don’t talk about understanding how every model works or recommend reading 10 papers a week. Instead, I focus on how to approach data science problems and how best to engage with your customer. So, here are my top five takeaways from working as a Data Scientist.
Machine learning and AI are buzzwords. We hear every day how these technologies are “transforming businesses!”, “optimizing processes!”, “automating blah blah!”. More often than not, those in leadership positions either read about AI and think it’s the magic solution to all their problems, or they’re pressured into adopting it just so they can say that they use it.
In my experience, the end user doesn’t really care whether you’ve used a machine learning model or rules-based logic, as long as it solves their problem.
There are a vast number of use cases in which AI can and should be applied. But ML for its own sake will likely be more expensive, take longer to deliver, and be more complex to maintain than a simpler solution.
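One practical way to check whether a problem needs ML at all is to build a rules-based baseline first and measure it. The sketch below is purely illustrative: the ticket-triage task, the `URGENT_KEYWORDS` set and the `is_urgent` and `accuracy` helpers are hypothetical, not taken from a real project. If a few lines of rules already meet the customer’s needs, any model has to clearly beat this score to justify its extra cost.

```python
# Hypothetical rules-based baseline for triaging support tickets.
# The keyword list is an illustrative assumption, not a real rule set.
URGENT_KEYWORDS = {"outage", "down", "cannot log in", "data loss"}


def is_urgent(ticket_text: str) -> bool:
    """Flag a ticket as urgent if it mentions any of the keywords.

    Substring matching is deliberately crude: "down" also matches "download".
    """
    text = ticket_text.lower()
    return any(keyword in text for keyword in URGENT_KEYWORDS)


def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Fraction of predictions that match the true labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


if __name__ == "__main__":
    # Hypothetical labelled sample; on a real project this would come
    # from the customer's historical tickets.
    tickets = [
        "Site is down for all users",
        "Please update my billing address",
        "Cannot log in since this morning",
        "Feature request: dark mode",
    ]
    labels = [True, False, True, False]
    predictions = [is_urgent(t) for t in tickets]
    print(accuracy(predictions, labels))
```

On a real project you would evaluate the rules on a held-out sample of labelled data and keep the baseline score as the bar any ML model has to beat.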
Working as a Data Scientist in a consulting capacity, I heard customers say the same thing time and time again:
“we have this problem and we need AI to fix it”
Instead of taking the problem statement at face value and diving into the solution, you need to really understand the underlying issue. Sit down with the customer and ask as many questions as you can about the pain points they are experiencing. Sometimes the problem you were presented with at the start isn’t the root cause. By tackling the root cause, you avoid merely patching a problem that is likely to recur in another shape or form.
Expect to spend a lot of time on exploratory data analysis before you even think about implementing a solution. Use that time to get to know the customer’s data inside out: regularly present your findings to the customer, ask questions (particularly if the data comes from an unfamiliar domain), and confirm your hypotheses.
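As a minimal sketch of a first-pass EDA summary worth sharing with a customer, the function below reports, for each column, the number of missing values, the number of distinct values, the most common value and basic numeric statistics. It assumes the data has already been loaded as a list of dictionaries with numbers converted from strings (for example after reading with `csv.DictReader`); the `profile` function and the sample records are hypothetical. With pandas, `DataFrame.describe()` and `DataFrame.isna().sum()` give a similar overview.

```python
import statistics
from collections import Counter


def profile(records: list[dict]) -> dict:
    """Summarise each column in a list of dict records.

    For every column: the number of missing values (None or empty string),
    the number of distinct non-missing values and the most common value.
    Columns whose non-missing values are all numbers also get mean, min and max.
    """
    columns = sorted({key for record in records for key in record})
    summary = {}
    for column in columns:
        # A record that lacks the key entirely also counts as missing.
        values = [record.get(column) for record in records]
        present = [v for v in values if v is not None and v != ""]
        stats = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        # Only report numeric statistics when every present value is a number.
        if numeric and len(numeric) == len(present):
            stats["mean"] = statistics.mean(numeric)
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
        summary[column] = stats
    return summary


if __name__ == "__main__":
    # Hypothetical sample with a blank region and a missing sales figure.
    records = [
        {"region": "North", "sales": 120},
        {"region": "South", "sales": 80},
        {"region": "North", "sales": None},
        {"region": "", "sales": 100},
    ]
    for column, stats in profile(records).items():
        print(column, stats)
```

Gaps like the blank region or the missing sales figure are exactly the kind of finding to raise with the customer rather than silently fill in.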
Sure, neural networks in all their flavors are cool, but sometimes all you need is a simple regression model. More complicated models take more time to build, need a greater level of expertise to both build and maintain, and their results are often harder to explain.
Jupyter notebooks are perfect for exploring data and running quick experiments. However, it doesn’t take long for these notebooks to become ridiculously big and confusing. If you find yourself rewriting the same functions in separate notebooks, sending code snippets between team members, or thinking “hmm, was that logic in duplicate notebook 3 or 4?”, then it’s probably time to create a Python module and move those functions into it.
Helper functions that get used time and time again are often collected in a utils.py file. Keeping your code in Python modules is cleaner, and it makes changes easier to review: when you update a function and push to a repository, the diff shows exactly what changed, whereas notebook diffs are cluttered with cell outputs and metadata.
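As a minimal sketch, a utils.py might look like the module below. The helpers `clean_column_name` and `clean_column_names` are hypothetical examples of the kind of small, repeatedly copied logic that belongs in a shared module rather than in several notebooks.

```python
"""utils.py: helper functions shared across notebooks.

The functions below are hypothetical examples for illustration.
"""
import re


def clean_column_name(name: str) -> str:
    """Normalise a column name to lower snake_case, e.g. 'Order Date ' -> 'order_date'."""
    name = name.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


def clean_column_names(names: list[str]) -> list[str]:
    """Clean every column name, raising ValueError if two names collide."""
    cleaned = [clean_column_name(n) for n in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"Duplicate column names after cleaning: {cleaned}")
    return cleaned
```

A notebook in the same directory can then use `from utils import clean_column_names` instead of pasting the function again. If the notebook kernel is already running when you edit utils.py, IPython’s `%load_ext autoreload` followed by `%autoreload 2` picks up the changes without a restart.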
I’ve been guilty many times of getting excited about the underlying model and how it works and then wanting to share this with stakeholders. But it’s imperative that you consider your target audience. Most customers probably don’t need to know how that LSTM works under the hood, and you shouldn’t waste your time or theirs explaining it (unless they really want to know!).
When presenting insights and results to stakeholders, I like to think that I’m telling them a story. Talk about the wider business context, and remind them of the problem you are trying to solve and why. Emphasize the business value of solving this problem, and keep referring back to the problem statement as you walk them through the steps of your solution.
To summarise, these tips are meant to guide you through the whole data science journey: determine whether the problem needs AI, uncover the root cause, design a simple solution, keep your code organized as you build it, and finally, deliver it to your customer in a way that is meaningful to them.