Generally speaking, there are 2 kinds of data scientists in top technology companies, such as a FANG (Facebook, Amazon, Netflix, Google) in Silicon Valley. Here is how I would separate them into these 2 camps.
1. Data scientists: Analytics/Inference track
These are people whose daily jobs typically involve collecting data, running analyses and experiments, and sharing reports and scorecards on product health, customer churn, and marketing campaign performance.
After analyzing the data, they are often asked to develop ideas and proposals to improve product features, and to validate those ideas using techniques such as A/B testing.
In summary, those people use data to ‘tell a story’ and to drive business decisions.
Most of them come from statistics, mathematics, economics, psychology, physics, or other quantitative but non-computer-science backgrounds.
Salaries on this track are usually lower than those of software engineers or machine learning engineers, typically by 15%–20%, but the advantage of this track is that it’s much easier to get into and to find a job in.
It can also be a great stepping stone for those interested in moving to a machine learning track later on.
2. Data Scientists: Machine Learning Engineering/Algorithm Development track
People who have been very successful in this track are usually hardcore computer scientists or software engineers. They understand basic or even advanced machine learning theory, and they can implement ideas and make things happen.
The biggest advantage of machine learning engineers and algorithm developers is that, thanks to their computer engineering background, they can quickly turn an idea into a prototype and then write production-level code that efficiently deploys machine learning models to production or to external customer-facing environments.
Salaries on the machine learning engineer track are at least on par with the software engineer track, if not much higher. The bar to get in is high: it usually requires not only a good understanding of machine learning theory and practice but also solid software engineering skills.
Table 1: Skills comparison for the 2 data scientist tracks: Analytics/Inference vs. Machine Learning/Algorithm
This article will focus on the analytics/inference track and walk you through my 7 steps to prepare for a data scientist job interview.
1. SQL
SQL is a must-know programming language for any data analytics professional.
However, many college graduates and young professionals start their job search without a solid understanding of SQL, or struggle with coding questions, which ultimately costs them their dream jobs.
The SQL interview may go by other names, such as Technical Analysis. During an interview at a FAANG (*) company, you will be asked to perform a series of SQL operations to extract data and insights, and to answer follow-up questions about the company’s products.
(*) FAANG: Facebook, Amazon, Apple, Netflix, and Google
Given a table of user sign-up dates and registered countries, write a query that returns the daily number of newly joined users over the last 30 days for our top 2 countries.
1. user_id | BIGINT
2. joined_at | DATE
3. country | VARCHAR
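One possible answer to the sample question above is sketched below in Python, using the standard-library sqlite3 module so that it runs without a database server. The table name users, the helper function names, the demo rows, and the fixed "today" date are all assumptions made for illustration. "Top 2 countries" is interpreted here as the two countries with the most sign-ups inside the 30-day window; in a real interview, confirm that definition with the interviewer first.

```python
import sqlite3

# Assumption: the table is called `users`; the question only lists its columns.
TOP_COUNTRY_DAILY_SIGNUPS = """
WITH recent AS (
    -- Sign-ups in the 30 days ending on :today (inclusive)
    SELECT user_id, joined_at, country
    FROM users
    WHERE joined_at > DATE(:today, '-30 days')
      AND joined_at <= :today
),
top_countries AS (
    -- Assumption: "top" means most sign-ups within the same window
    SELECT country
    FROM recent
    GROUP BY country
    ORDER BY COUNT(*) DESC, country
    LIMIT 2
)
SELECT joined_at, country, COUNT(*) AS new_users
FROM recent
WHERE country IN (SELECT country FROM top_countries)
GROUP BY joined_at, country
ORDER BY joined_at, country;
"""


def build_demo_db():
    """Create an in-memory database with a few made-up sign-ups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id BIGINT, joined_at DATE, country VARCHAR)"
    )
    rows = [
        (1, "2024-03-30", "US"),
        (2, "2024-03-30", "US"),
        (3, "2024-03-29", "US"),
        (4, "2024-03-30", "IN"),
        (5, "2024-03-28", "IN"),
        (6, "2024-03-30", "BR"),
        (7, "2024-02-01", "US"),  # outside the 30-day window
        (8, "2024-02-01", "BR"),  # outside the 30-day window
        (9, "2024-02-02", "BR"),  # outside the 30-day window
    ]
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn


def daily_new_users(conn, today):
    """Return (joined_at, country, new_users) rows for the top 2 countries."""
    return conn.execute(TOP_COUNTRY_DAILY_SIGNUPS, {"today": today}).fetchall()


if __name__ == "__main__":
    for row in daily_new_users(build_demo_db(), "2024-03-31"):
        print(row)
```

On a production warehouse you would normally replace the :today parameter with the engine's own current-date function, and the date arithmetic syntax will differ between SQL dialects.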
How to prepare
a. If you are an absolute beginner:
Consider taking an online SQL course to understand the basics, and then jump into coding practice.
A resource to consider: Cracking the SQL Interview for Data Scientists, which takes you step by step from basic SELECT statements to advanced WINDOW functions, with many coding assignments to reinforce your learning.
b. If you are an experienced SQL user:
There is no better way to prepare for a SQL interview than practicing coding exercises.
A resource to consider: sqlpad.io, where you can practice and solve 80 SQL coding interview questions.
c. Pay special attention to WINDOW functions.
WINDOW functions are a family of SQL functions that often come up during a data scientist job interview.
Writing a bug-free WINDOW function query can be quite challenging for any candidate, especially one who is just getting started with SQL. It takes time and practice to master these functions.
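To give a feel for what a WINDOW function query looks like, here is a small sketch that computes a running total and a previous-row comparison per country. It runs through Python's standard-library sqlite3 module and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The table name users, the demo rows, and the function names are hypothetical.

```python
import sqlite3

RUNNING_SIGNUPS = """
WITH daily AS (
    SELECT country, joined_at, COUNT(*) AS new_users
    FROM users
    GROUP BY country, joined_at
)
SELECT
    country,
    joined_at,
    new_users,
    -- Cumulative sign-ups per country, ordered by day
    SUM(new_users) OVER (PARTITION BY country ORDER BY joined_at) AS running_total,
    -- Value from the previous row in the same country (NULL on the first row)
    LAG(new_users) OVER (PARTITION BY country ORDER BY joined_at) AS prev_new_users
FROM daily
ORDER BY country, joined_at;
"""


def build_demo_db():
    """Create an in-memory database with a few made-up sign-ups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id BIGINT, joined_at DATE, country VARCHAR)"
    )
    rows = [
        (1, "2024-03-28", "US"),
        (2, "2024-03-29", "US"),
        (3, "2024-03-29", "US"),
        (4, "2024-03-30", "US"),
        (5, "2024-03-29", "IN"),
        (6, "2024-03-30", "IN"),
        (7, "2024-03-30", "IN"),
    ]
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn


def running_signups(conn):
    """Return per-country daily sign-ups with running total and previous row."""
    return conn.execute(RUNNING_SIGNUPS).fetchall()


if __name__ == "__main__":
    for row in running_signups(build_demo_db()):
        print(row)
```

Note that LAG looks at the previous row, not the previous calendar day, so days with no sign-ups are silently skipped. Spotting that kind of detail is typical of what interviewers probe for.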
2. Product Sense
One of a data scientist’s main responsibilities is to extract insights from data and work with product managers and engineering teams to deliver actionable plans to improve the product. Think about how you would measure the success of different parts of the product. Why do you think the text box is placed at that specific location? What can you do to improve it?
- If revenue dropped in a given week, what metrics would you look at to understand the drop, and why?
- How would you measure the health of our product search functionality?
How to prepare
- I highly recommend the book Lean Analytics: Use Data to Build a Better Startup Faster (Lean Series), which gives you a very good sense of how startup companies use analytics to drive their product decisions. Top technology companies, especially those in Silicon Valley, tend to think of themselves as startups regardless of their size, or at least keep a startup mindset about growing the company.
- If you still have time, consider reading the book Cracking the PM Interview. If you are short on time, I would go through these 3 chapters: product, case studies, and behavioral questions.
3. Data processing with Python/R
The interviewer will evaluate your skills in basic operations in Python or R, the 2 most popular programming languages in most data science teams in Silicon Valley.
The bad news is that you will most likely not even get a phone interview if you are not familiar with either of the two languages.
The good news is that you don’t actually need to know both of them. Pick one and become very good at it. Build a project using either R or Python.
A side note: from my observation, it is highly likely that Python will become the dominant player because of its great ecosystem. It’s a general-purpose programming language, and it is much easier to productionize and serve a Python model on the internet than an R model.
If you are brand new to both R and Python and need to choose a language to start with, I would pick Python.
I used to be a heavy R user and have presented at useR!, but I completely switched to Python 5 years ago and never regretted it.
In addition to basic data processing, you will very likely be asked to perform a series of analytics, visualization, or modeling tasks on a data set, so that the interviewer can confirm you are hands-on with the tool and get a sense of your experience level.
Read a CSV file into Python/R, handle missing data, build and train a classification model, evaluate its performance, prepare a report, and share the Jupyter notebook with the interviewer.
a. For Python people
After you have familiarized yourself with basic data processing, you can move on to the scikit-learn library, which has some excellent tutorials covering data processing, feature selection, and modeling with real data.
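As a rough illustration of the sample task above (read a CSV, handle missing data, train and evaluate a classifier), here is a minimal sketch using pandas and scikit-learn. The data is synthetic, and the column names sessions_per_week, tenure_months, and churned are invented for this example. Median imputation and logistic regression are just one reasonable default, not the only acceptable answer.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_demo_csv(n_rows=400, seed=0):
    """Build synthetic CSV text (hypothetical columns) with missing values."""
    rng = np.random.default_rng(seed)
    sessions = rng.poisson(5, n_rows).astype(float)
    tenure = rng.uniform(0, 36, n_rows)
    # Made-up rule: fewer sessions and shorter tenure mean more churn
    logit = 1.5 - 0.4 * sessions - 0.05 * tenure
    churned = (rng.uniform(size=n_rows) < 1 / (1 + np.exp(-logit))).astype(int)
    df = pd.DataFrame(
        {"sessions_per_week": sessions, "tenure_months": tenure, "churned": churned}
    )
    # Blank out roughly 10% of one feature to simulate missing data
    df.loc[rng.uniform(size=n_rows) < 0.1, "sessions_per_week"] = np.nan
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    return buffer.getvalue()


def train_and_evaluate(csv_text):
    """Read the CSV, impute, fit a classifier, and score it on held-out rows."""
    df = pd.read_csv(io.StringIO(csv_text))
    X = df[["sessions_per_week", "tenure_months"]]
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    # Imputation lives inside the pipeline so it is fit on training data only
    model = make_pipeline(
        SimpleImputer(strategy="median"), StandardScaler(), LogisticRegression()
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return df, accuracy_score(y_test, predictions), classification_report(
        y_test, predictions
    )


if __name__ == "__main__":
    data, accuracy, report = train_and_evaluate(make_demo_csv())
    print(f"missing values: {data['sessions_per_week'].isna().sum()}")
    print(f"test accuracy: {accuracy:.3f}")
    print(report)
```

In an interview, be ready to explain why the imputer sits inside the pipeline (to avoid leaking test-set information) and why accuracy alone can mislead on unbalanced classes.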
4. A/B testing
A/B testing is a statistical framework that helps validate an idea or a theory through data.
For example, a product manager wants to know if changing the color of a buy button from green to blue can encourage more purchases. As a data scientist, it is your job to work with the product manager and, quite often, the engineering team (who can help implement the test settings) to develop a testing plan.
You need to decide at least how many people will see each color of the button (the sample size), how many days the test will run (usually a multiple of a week, i.e., 7 days), and where it should run (the US only, or some smaller countries, so that if the test variant fails it does not have a very negative impact on revenue).
The key assumption of A/B testing is that the control group and the testing group have to be independent. You will probably be asked several questions around this assumption.
You will also need to understand key concepts such as novelty effect, learning effect, A/A testing, Simpson’s paradox, etc.
The engineering team just invented a people-you-may-know widget. If it is implemented, a user will see their friends on the right-left corner of their homepage. How do you design an experiment to decide whether we should launch this feature or not?
How to prepare
Udacity has a free introductory class taught by practitioners from Google, which I highly recommend. As long as you get through this class, finish the homework assignments, and feel comfortable with the key concepts, you should be able to handle most A/B testing questions.
A side note: very often, you will be asked to make recommendations based on different scenarios, e.g., if the results are significant, what should the product marketing team do, and vice versa.
To answer this question, always use a framework, for example: if it is confirmed significantly positive, double down on this approach, expand this success story to other markets and repeat the test.
If the results turn out to be not significant, or significant but negative, come up with new theories and start testing new ideas.
It’s a never-ending cycle of new ideas/proposals => A/B testing => recommendations 😃.
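The recommendation framework above can be sketched in code. The example below uses a pooled two-proportion z-test, which is one common choice for comparing conversion rates with large samples. The function names, the 5% significance level, and the conversion counts are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_test, n_test):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    pooled = (conv_control + conv_test) / (n_control + n_test)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def recommend(conv_control, n_control, conv_test, n_test, alpha=0.05):
    """Map the test outcome onto the two branches of the framework."""
    z, p_value = two_proportion_z_test(conv_control, n_control, conv_test, n_test)
    if p_value < alpha and z > 0:
        return "significant and positive: double down and expand to other markets"
    return "not significant or negative: form new theories and test new ideas"


if __name__ == "__main__":
    # Hypothetical counts: 10,000 users per group
    print(recommend(1000, 10000, 1150, 10000))
    print(recommend(1000, 10000, 1010, 10000))
```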
5. Statistics/Statistical Inference
As a data scientist, you will most likely encounter many real-world situations, for example, dealing with missing data or unbalanced samples, deciding on a sample size, performing hypothesis tests, forming reasonable assumptions, and explaining to your business leaders what a confidence interval means. Statistics skills are therefore necessary to ace a data scientist interview.
- What are Type I and Type II errors? How do you explain a p-value to a non-technical person? What are the assumptions of a 2-sample t-test?
You can practice statistics questions on brilliant.org, which I found made it quite easy to quickly brush up my skills when preparing for statistics interview questions.
Side note: probability questions are not the same as statistics questions. You can think of probability questions as being more about math, while statistics questions are more about dealing with real data.
Given 2 fair dice with faces marked 1–6, how many times on average do we have to roll them until the sum of the two dice is greater than 10?
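One way to make the Type I and Type II error question concrete is a small simulation. The sketch below repeatedly compares two random samples with a large-sample z-test. When there is no true difference, the share of "significant" results estimates the Type I error rate. When there is a true difference, the share of misses estimates the Type II error rate. The sample size, effect size, and function names are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def z_test_p_value(sample_a, sample_b):
    """Two-sided p-value for a difference in means (large-sample z-test)."""
    std_err = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    z = (fmean(sample_b) - fmean(sample_a)) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_diff, n=100, trials=1000, alpha=0.05, seed=0):
    """Share of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(true_diff, 1) for _ in range(n)]
        if z_test_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # No real difference: every rejection is a false positive (Type I error)
    type_i = rejection_rate(true_diff=0.0)
    # Real difference of 0.5: every non-rejection is a miss (Type II error)
    type_ii = 1 - rejection_rate(true_diff=0.5)
    print(f"estimated Type I error rate:  {type_i:.3f}")
    print(f"estimated Type II error rate: {type_ii:.3f}")
```

The first number should land close to the chosen alpha, which is also a handy way to explain what a significance level means to a non-technical audience.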
brilliant.org is a good resource
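For the dice question above, a sum greater than 10 means 11 or 12, which happens in 3 of the 36 equally likely outcomes, so the chance per roll is 1/12 and the expected number of rolls is 12. This assumes the question asks how many rolls of the pair are needed until the first such sum. The sketch below checks the exact answer by enumeration and then by simulation; the function names are illustrative.

```python
import random
from fractions import Fraction
from itertools import product


def prob_sum_greater_than(threshold, sides=6):
    """Exact probability that two fair dice sum to more than `threshold`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    hits = sum(1 for first, second in outcomes if first + second > threshold)
    return Fraction(hits, len(outcomes))


def expected_rolls(threshold):
    """Mean of a geometric distribution: 1 / p rolls until the first success."""
    return 1 / prob_sum_greater_than(threshold)


def simulate_rolls(threshold, trials=20000, seed=0):
    """Average number of rolls until the sum exceeds `threshold`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rolls = 1
        while rng.randint(1, 6) + rng.randint(1, 6) <= threshold:
            rolls += 1
        total += rolls
    return total / trials


if __name__ == "__main__":
    print(f"P(sum > 10) = {prob_sum_greater_than(10)}")
    print(f"expected rolls = {expected_rolls(10)}")
    print(f"simulated average = {simulate_rolls(10):.2f}")
```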
7. Behavioral questions
Behavioral questions are probably the easiest part to prepare for and have the highest ROI (return on investment), but many people spend very little time on them and get caught off guard by questions like ‘tell me about a time when you disagreed with your boss’.
- Tell me about your biggest failure/success/favorite project.
- Describe an unpopular decision you made with the product team. How did you handle the situation and implement it?
List your past 5 projects with interesting stories using the SAR framework (situation, action, and results) to demonstrate your leadership, successes, failures/mistakes, and challenges (disagreements with your manager or coworkers). Find a partner, practice through a mock interview, and get their feedback.
The important thing is that your stories have to be ‘meaty’, and that you are prepared when an interviewer dives into the details.
Another resource to consider is Amazon’s leadership principles.
Those are the 7 areas I recommend focusing on when interviewing for analytics/inference track data scientist positions.
I hope they are useful, and if you have any questions, please feel free to reach out to me.
Tuğçe Daltekin from Berlin ^_^