Keep these simple pieces of advice handy to help you succeed as a data scientist
You made it.
You graduated/completed a bootcamp/finished a certificate/taught yourself, and now, finally, you’ve become a data scientist. After many exhausting months or years, you can officially call yourself a data scientist and begin applying for data science jobs.
Your hard work and effort have been worth it and you can now take those skills and begin making an impact in a company.
Now, though, it’s time to absorb just a little bit more information. Your brain is packed, but this community of data scientists with years of experience under their belt has something they want to share with you: single-sentence job advice for new data science graduates. Here, you will find 10 small pieces of advice to always keep in your back pocket that will help you succeed in your career as a data scientist.
When 500 new data scientists apply for the same entry-level position, what do you think sets one successful candidate apart from the rest?
Is it their ability to think through complex analyses? Perhaps.
Is it their ability to build a machine learning model faster than everyone else? Maybe.
Is it their ability to have the right combination of hard skills that a company is looking for as well as soft skills that will make them the right person for the team? In fact, yes. That’s exactly what sets a data scientist apart from the rest of the pack.
When competing against 500 other candidates for one data science position, it can be assumed that everyone will more or less have the right hard skills that a company is looking for, such as coding, mathematics, and the ability to construct machine learning and artificial intelligence models.
However, what not everyone has is soft skills. The ability to work as part of a team, communicate, read, write, comprehend, problem-solve, make critical decisions, and manage your time wisely are all soft skills that will get you the job. These skills are what set you apart from all the other candidates and what will make you a valuable asset to a company.
In college, you will always have an appropriate data set with which to solve the problem you’ve been tasked with.
In a bootcamp, nice clean data sets will help you develop your skills.
Through your own personal projects, you’ll find hundreds of data sets available online that contain thousands of complete entries which you can use to develop some pretty complex projects.
However, when you hit the real world, you’ll realize that, more often than not, the data sucks. Whether the data set is incomplete, too small, or has too many outliers to support a concrete decision, real-world data is often the weakest point of an analysis you’ve been tasked to complete.
And that’s okay. Bad data is a fact of life and you can only be expected to develop as accurate an analysis as you can with what you’ve been given to work with.
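If you want a quick sense of how bad a column is before you promise anything, a rough check like the one below can help. This is only a sketch: the function name `summarize_data_quality`, the sample values, and the choice of the 1.5 × IQR rule for flagging outliers are all assumptions made for illustration, not a prescribed method.

```python
import statistics


def summarize_data_quality(values):
    """Count missing entries and flag IQR-based outliers in one numeric column.

    `values` is a plain list where None marks a missing entry (an assumption
    made for this sketch; real data may encode missing values differently).
    """
    present = [v for v in values if v is not None]
    missing_count = len(values) - len(present)

    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    outliers = [v for v in present if v < lower_bound or v > upper_bound]
    return {"rows": len(values), "missing": missing_count, "outliers": outliers}


# Hypothetical column: two missing entries and one suspicious value.
daily_orders = [10, 12, None, 11, 13, 12, 250, None, 11]
report = summarize_data_quality(daily_orders)
print(report)
```

A summary like this won't fix the data, but it gives you something concrete to point to when you explain how far the analysis can be trusted.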
Solving a problem using data science in school or in a bootcamp is one thing. There, the people that you’re “working for” have a great understanding of data science and the types of problems that you can and cannot solve given a particular data set. They’ll be able to give you clear problems to work from and solve that actually make sense.
In the real world, it’s quite likely that some C-level executive will want your data science team to build a model or develop an analysis to solve a problem that no one really understands. There appears to be some kind of issue to those at the executive table, but no one can clearly describe what the actual problem is.
This aspect of working in data science will become just another part of the job for you as time goes on. Your job is to help those executives gain a clearer understanding of what they think the problem is and to explain to them how you think it could be solved. Plain and simple. Don’t reinvent the wheel or break your back trying to conjure something from nothing. Your job is to just help them understand using data.
A “brag sheet” can be as comprehensive as additions to your resume or as simple as a sticky note stuck to the bottom of your monitor.
In essence, this should be the place where you list the accomplishments that you accumulate over your career.
Maybe you developed a model that found inaccuracies in your company’s reporting tactics which helped them discover what their numbers really were. Or perhaps you found ways to retain a website user’s attention for longer and reduced the website bounce rate. Or maybe you developed an artificial intelligence model that learned what a specific trend looked like in skin cancer that was used to help doctors improve the accuracy of their diagnoses by 17%.
Whatever your accomplishment or the impact delivered to your company or organization, it’s important to keep track of what you’ve done over your career. When you’re first starting out, this will help keep you motivated when times get tough, and when you’ve got several years of experience under your belt, it will help you progress throughout the rest of your career.
The unique thing about data scientists is that they need to be able to explain brutally technical concepts in simple terms that anyone in a company can understand.
Your best trick for achieving this is to develop the ability to connect concepts using an analogy. Analogies are great ways to provide a clear representation of a complex idea in a way that is easily digestible and understandable right from the get-go. These analogies should be relevant to the company or industry that you’re working in and should give enough information that a C-level executive can be confident in making a decision based on their understanding.
Analogies should be free from data science jargon and short enough to be memorable.
In my experience, creating analogies that people can relate to is vital to your success in being able to explain what’s going on within a company using data. I’ve used sports analogies, cooking analogies, animal analogies, and more to help people understand the technical concept I’m trying to explain. These analogies will not only help ideas stick in a stakeholder’s mind, but they’ll also be tools the stakeholder can then use to spread your information.
The KISS acronym (“keep it simple, stupid”) will keep appearing, no matter how long ago you first learned it. Coming from the world of coding, it will probably help advance your career more than any other piece of advice could.
Why make something complicated when you can keep it simple? Sure, it would be nice to impress your boss with some elaborate model that cleans your data using two lines of code. But wouldn’t it impress them more if you developed a model that could be easily understood, updated, integrated, and used after you left the company?
You may begin your career thinking that complexity in your work is the one way to remain relevant to a company. However, your ability to complete your work on time, in a simple manner, that can be used by anyone, and that produces the exact results sought after is what will keep you relevant. Furthermore, your ability to keep learning and increasing your skill set is what will also keep you a valuable member of the team.
If complexity is your thing, just make sure you check with your boss first to see if this is the right project for you to go all out.
Otherwise, keep it simple, keep it clean, and keep it easy to understand.
To be fair, it’s not exactly hard to impress a C-level executive with no wide-ranging technical experience. It’s important to remember that these people are often appreciative of something as simple as getting their modem reset.
However, there does come a time when you may want to be extra impressive. This need to impress must also be balanced with the boss’s boss still being able to kind of understand what you did or what you solved.
Again, as mentioned above, this may involve developing a complex model to solve a difficult problem that can still be described using simple terms. You want to impress your boss’s boss but still have them be able to kind of understand the impact you brought to the company.
When in doubt, add color to a graph, try a histogram instead of a bar chart, and use words like “machine learning” and “artificial intelligence” to impress without giving away too many details. Remember, arguably 90% of data science is a simple linear regression; you may just need to dress it up a little.
Data scientists are often given impossible tasks — complete a data analysis with non-existent data, develop a machine learning model that solves the company’s problems, or integrate new models into a code base that hasn’t been touched since 2005.
Many see data scientists as jacks of all trades (which we are) who can solve any company problem using data (which we can’t). This means that the stakes are high when we’re given a business problem and told to solve it by the end of the week. Unfortunately, not all problems can be solved by slapping a linear regression bandage on the wound.
Therefore, it’s often best to err on the side of caution and be conservative with what you believe can be achieved with the tools you’re given and the conditions you’re under. Underpromising saves you when things go as planned and overdelivering endears you to your boss and your boss’s boss when things go unexpectedly well.
“Only you and God can read your code and you’re one flow session away from it being only God.” — TheDragonSpark, r/datascience
Writing excellent code isn’t about completing the most work in the fewest lines. It’s about being able to read and understand the code you wrote on a Friday after a long weekend away. It’s about being able to give your code to an intern five years down the road and have them be able to understand and optimize it. It’s about being able to leave a legacy at a company of easy-to-use code that doesn’t result in a disgruntled data scientist phoning you at odd hours for help.
Go further than naming your variables x, y, and z. Instead, use descriptive names that tell you exactly what a variable holds and what result it should produce when combined with another variable.
Name your functions to describe exactly what they do.
Use comments to clarify any confusing lines, logic, or places where you had to dive into the depths of StackOverflow.
Use the README file to give a clear glimpse into your thought process, the goal of your code, and insight into how and why it should work.
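To make the naming and commenting advice concrete, here is a small before-and-after sketch. The revenue calculation and every name in it (`f`, `calculate_discounted_revenue`, `unit_price`, `units_sold`, `discount_rate`) are hypothetical and chosen only for illustration; the point is the difference in readability, not the business logic.

```python
# Before: this works, but nobody can tell what x, y, and z hold.
def f(x, y, z):
    return x * y * (1 - z)


# After: the names say what each value holds and what comes back.
def calculate_discounted_revenue(unit_price, units_sold, discount_rate):
    """Return total revenue after applying a fractional discount (0.0 to 1.0)."""
    gross_revenue = unit_price * units_sold
    # The discount applies to the whole order, so it is taken off the gross total.
    return gross_revenue * (1 - discount_rate)


# Both versions compute the same number; only one explains itself.
cryptic_result = f(20.0, 5, 0.1)
readable_result = calculate_discounted_revenue(
    unit_price=20.0, units_sold=5, discount_rate=0.1
)
print(cryptic_result, readable_result)
```

Calling the second function with keyword arguments makes the call site read almost like a sentence, which is exactly what you want when you open the file again after that long weekend.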
Being diligent with accurately naming your variables and using comments will take you surprisingly far in your career as a data scientist.
Most data scientists suck at writing good code (compared to software engineers).
Most data scientists suck at math (compared to mathematicians).
Most data scientists suck at modeling (compared to machine learning developers).
What makes data scientists unique is that we hold the abilities of each of these specialties and can carry them out remarkably well, even though they all come from vastly different areas.
Imposter syndrome is very common in data scientists, where everyone expects you to be a software engineer, mathematician, machine learning developer, business analyst, graphic designer, project manager, and more. Therefore, it’s important to remind yourself that you’re good at all of these skills but that it’s unnecessary to be an expert in each of them. If the company wanted a statistician, they would have hired a statistician. Instead, they hired you because you possess a varied skill set that makes you the full package, capable of carrying out each of the tasks listed above, and then some.