Are you a beginner and aspiring data enthusiast trying to decide which career path is best for you? People often ask: between a data analyst, a data scientist and a data engineer, which is more valuable? Which pays better and offers the best long-term job security? The answer depends on who you are. In simple terms, the key differences between a data analyst, a data scientist and a data engineer lie in the everyday tools they use and the skill sets they need to turn data into actionable insight, which is the key goal for every role in the big data world. Let’s look at what this means for each role, how much impact each role has on the business and how well each is compensated for it. I’ve also made a summary video on the differences between a data analyst, a data engineer and a data scientist that you can watch here.
What is a Data Analyst and what tools and skills do they need?
A data analyst essentially curates the data for the data scientist. They use simpler tools such as Excel and sit in the reporting and visualization world. A more advanced data analyst is likely to be an advanced SQL user, able to write complex SQL queries and expert at accessing the database efficiently. The data science project pipeline in Fig. 1 and Fig. 2 shows where their focus lies. Much of their work is pulling together information that can be reported visually, so that managers can make business decisions based on what has already happened. This area is often referred to as Business Intelligence. Data analysis can get quite in-depth once you open up the world of data analytics. Analysts often mine data and use tools like Tableau (a visualization and reporting dashboard tool) to tell a story with it. That lets them perform all sorts of analytics, from predictive to prescriptive and more. A predictive question might be: based on what happened to sales in our stores when we released apricot jam, what will happen to sales if we introduce a bigger tub of apricot jam?
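Below is a minimal sketch of the reporting-style SQL a more advanced analyst might write. It aggregates sales per product using Python’s built-in `sqlite3` module and an in-memory database. Several pieces are hypothetical and made up purely for illustration: the `sales` table, its columns, the helper name `sales_by_product` and all the numbers.

```python
import sqlite3


def sales_by_product(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Total units and revenue per product, highest revenue first."""
    query = """
        SELECT product,
               SUM(units)         AS total_units,
               SUM(units * price) AS revenue
        FROM sales
        GROUP BY product
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical sales data, held in an in-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, units INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Store A", "apricot jam", 120, 3.0),
        ("Store B", "apricot jam", 80, 3.0),
        ("Store A", "strawberry jam", 150, 2.5),
    ],
)

for product, units, revenue in sales_by_product(conn):
    print(f"{product}: {units} units, ${revenue:.2f} revenue")
```

In practice, the result of a query like this would usually feed a chart or dashboard in a tool such as Tableau rather than being read as raw rows.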
So what is the data analyst really trying to achieve?
As you may have guessed, the data scientist’s role may overlap with some of these activities. If the data isn’t gathered, no further “scientific” work or research can be done on it; more on this later. That said, a data analyst role is a great pathway to becoming a data scientist in the future. You get your hands dirty with data by providing useful reports. Sometimes, though, this means doing a lot of clean-up duty as well. You will often hear that as much as 80% of a data scientist’s work is actually cleaning and munging data. In fact, one Quora user, Mike West, has given himself the title “Chief Data Janitor”. It’s funny, but the reality is that cleaning up data is a big part of the role.

Here is what clean-up means in practice. Suppose you extract data about the employees in a company and want to know who your “superstars” are. The main metric you want to use is how many “employee of the month” awards each person received over the year. Many machine learning algorithms expect every field to contain a value, often in a simple form like true or false, or 0 or 1. So whenever an employee has no awards, the computer is going to ask what it should do with this “blank field”. Your data analyst, or data janitor, then comes along and tidies the data set. They might decide that wherever there are no awards, the field should be set to “0” or “false” so the algorithm understands what is going on.
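The clean-up step in the employee-awards example can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production routine. The employee records, the `awards` field and the helper names are hypothetical, and a blank field is represented as `None`. The sketch fills blanks with 0 and then derives a true/false column that an algorithm could work with.

```python
# Hypothetical raw extract: None stands for a blank "employee of the month" field.
raw_records = [
    {"name": "Ana", "awards": 3},
    {"name": "Ben", "awards": None},
    {"name": "Chloe", "awards": 1},
]


def fill_missing_awards(records: list[dict], default: int = 0) -> list[dict]:
    """Return copies of the records with blank award counts replaced by `default`."""
    cleaned = []
    for record in records:
        fixed = dict(record)  # copy, so the raw extract stays untouched
        if fixed.get("awards") is None:
            fixed["awards"] = default
        cleaned.append(fixed)
    return cleaned


def add_award_flag(records: list[dict]) -> list[dict]:
    """Add a true/false column: has the employee won at least one award?"""
    return [{**r, "has_award": r["awards"] > 0} for r in records]


clean = add_award_flag(fill_missing_awards(raw_records))
for row in clean:
    print(row)
```

Treating a blank as 0 is itself a judgment call. It assumes “no value” means “no awards” rather than “not recorded”, and that is exactly the kind of decision the data janitor has to make and document.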
If you want to understand the basic workings of data science, and terms like algorithms and machine learning, through useful everyday examples, you can download “Data Science – The What, The How and The Why”.
Do data analysts talk to the business? Probably, in smaller companies and sometimes in larger, older companies. In smaller companies, this may be because the analyst is also the data scientist, or because the company is still early in its journey of using data for insight. In larger firms, you will actually find far more data analyst roles than data scientist roles. Think of a hotel business, for example, with tons of user and transaction data globally. Data analyst roles can also often be done remotely, depending on the industry and provided no highly sensitive data is involved. If you enjoy talking to people, this could be a bonus. More likely than not, though, you will be reporting to a chief analyst or lead. You really just have to be comfortable working with simple to complex data curation tools and have a curious, analytical nature. Salary-wise, I have found the widest range here. One reason is that location is one of the biggest factors in salary. Typically, I have seen anything from $60,000 to $120,000 in a place like New York. If you are switching careers into data science or artificial intelligence, data analyst is a great starter role, especially if you have basic coding skills or an interest in coding. If, however, you have advanced programming skills, another role could suit you better. It offers more compensation and is in even greater demand.
How about the Data Engineer – what tools and skills do they need?
A data engineer is more of a builder. They create the features and technical infrastructure that store the data and allow the analyst to extract it. We often refer to this infrastructure in two parts. Data pipelines serve up data from various sources. Data warehouses allow tons (and I mean a ton) of data to be stored and then queried and curated by the data analyst. Think of places like YouTube, Twitter or Facebook. They have a huge number of users performing lots of activities and actions, which create even more data round the clock. Add to that a global user base, and you can begin to understand why data engineers are so important in large and even very small companies. In fact, you may find that really small companies want the data engineer to be the data analyst and the data scientist all at once. Their role is so fundamental that without a data engineer you cannot do any meaningful data science or analytics work, because your data hasn’t been served up to be analysed further.

Another role that can overlap with the data engineer in the big data world is the machine learning engineer. Rather than focusing only on infrastructure, warehouse and pipeline readiness, they also monitor the models created by the data scientist once those models are deployed in real-world scenarios. They therefore need to be comfortable with math as well as coding, since they maintain and tweak these models using tools like Python and R. Check out a full breakdown of the differences in this video, and also an interview here with a machine learning engineer who moved from PepsiCo to the NBA.
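To make the idea of a data pipeline a little more concrete, here is a toy extract-transform-load (ETL) sketch using only Python’s standard library. It merges two sources into one consistent shape and loads the result into a store that can be queried. Several pieces are hypothetical: the two inline sources, the `events` table and the function names. An in-memory SQLite database stands in for the data warehouse. Real pipelines at scale would run on frameworks like Spark and write to a proper warehouse.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources: a CSV export and a JSON event feed.
CSV_SOURCE = "user_id,action\n1,like\n2,share\n"
JSON_SOURCE = '[{"user_id": 1, "action": "Comment"}, {"user_id": 3, "action": " like"}]'


def extract() -> list[dict]:
    """Read raw events from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))  # CSV values arrive as strings
    rows += json.loads(JSON_SOURCE)
    return rows


def transform(rows: list[dict]) -> list[tuple[int, str]]:
    """Normalize types, whitespace and casing so every event has the same shape."""
    return [(int(r["user_id"]), str(r["action"]).strip().lower()) for r in rows]


def load(events: list[tuple[int, str]], conn: sqlite3.Connection) -> None:
    """Write the cleaned events into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id INTEGER, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract()), warehouse)

# The analyst can now query one consistent table instead of two raw sources.
report = warehouse.execute(
    "SELECT action, COUNT(*) FROM events GROUP BY action ORDER BY action"
).fetchall()
print(report)
```

Keeping extract, transform and load as separate functions mirrors how larger pipelines are usually organized, so each stage can be tested or swapped out on its own.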
The biggest differentiator: data engineer vs data analyst and data scientist
The biggest differentiator between the data engineer and the data analyst and data scientist roles is coding depth. Besides building infrastructure, data engineers need deep technical coding skills, considerably more advanced than those required in the other two roles. Because of this, and the importance of their role, they are very well compensated and in greater demand than ever. A data engineer is likely to come from a computer science or programming background. They typically use more complex, performance-oriented languages such as Java and C++. Their focus is delivering data pipelines and access to the data warehouse efficiently, in a way that lots of people in the company can use at scale. As a result, they need infrastructure-oriented languages that help with system performance. They will also be familiar with data frameworks such as Hadoop and Spark, which provide phenomenal distributed storage and processing capabilities in the big data world. A data engineer is probably someone who doesn’t just love to build. They also have great programming skills and want to stay involved in data-driven technology and its use.

In a startup, you may find a “full stack data scientist” role. This really means the person has strong data engineering skills on the back end but can also do the front-end work of the analyst and scientist. This doesn’t mean data engineers are recluses who don’t talk to people. They still need decent people skills to relate to the scientists, for example. More importantly, in a smaller company they will be required to interface with the business as a “full stack data scientist”. I keep putting this in quotes because this skill set is pretty rare. In my experience, people are typically stronger scientists or have more of a flair for engineering, but rarely both. Salary-wise, expect a starting range of around $100,000 to $150,000 in cities like New York. Even outside New York, salaries realistically tend to fall within 10-15% of this range.
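Hadoop and Spark are far too large to show here, but the MapReduce idea that Hadoop popularized, and that Spark builds on, can be sketched in plain Python. The data is split into partitions, each partition is processed independently (map), and the partial results are then combined (reduce). This is a single-machine toy under stated assumptions. The partitions, log lines and function names are hypothetical, and none of this is the Hadoop or Spark API.

```python
from collections import Counter
from functools import reduce

# Hypothetical activity logs, pre-split into partitions as a cluster might store them.
partitions = [
    ["like share like", "comment"],
    ["like comment", "share share"],
]


def map_partition(lines: list[str]) -> Counter:
    """Map step: count actions within one partition, independently of the others."""
    return Counter(word for line in lines for word in line.split())


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge the partial counts from two partitions."""
    return a + b


# On a real cluster each map_partition call could run on a different machine.
partial_counts = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partial_counts, Counter())
print(totals)
```

Because each map step only sees its own partition, the work spreads naturally across many machines. That is what lets frameworks like these handle the data volumes described above.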
And the Data Scientist – what tools and skills do they use?
The data scientist, as the name suggests, is the scientist: they essentially conduct experiments with the data. They need the data in place (from the data engineer) and, in much larger companies, curated for them by the data analyst. They then test hypotheses and theories to figure out how to predict the future for impact. Impact is the word I like to use here because not everything is about monetary value. There is a place for data science for good, for sustainability or even for uncovering insights that advance medicine. The scientist typically already has analysis skills such as SQL, coupled with stronger coding skills in languages like Python or R. These languages allow them to build powerful models to test their predictions. Excel is great, but Python gives us far more libraries to play with.

A data scientist is also very client-facing. One often-overlooked point when explaining what distinguishes data scientists is that they bring domain expertise to interpret the results of their experiments in context. Even curating data so that it makes sense often requires some domain knowledge to do effectively. A data scientist without contextual understanding can interpret results inaccurately. That may lead to bad decisions that negatively impact a company’s outlook and activities. For this reason, they also need strong communication skills, often referred to as storytelling and “explainability”. A data scientist’s salary will typically start from $120,000 to $150,000 base in New York, with the national average within 10-15% of that. Be careful with role titles like “junior data scientist”, though, because they do not really exist. Such roles are probably data analyst roles. Realistically, you can’t just become a data scientist without experience. It takes time to build up understanding and value, and it is a career path you have to cultivate for the long term.
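At its very simplest, building a model to test a prediction can look like the ordinary least-squares line fit below, written in plain Python. It is a sketch, not how the author or any particular team works. The promotion-spend and jar-sales numbers are hypothetical, and the helper names are invented. In practice, a data scientist would usually reach for an established Python library or R rather than hand-rolling the math.

```python
from statistics import fmean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs: list[float], ys: list[float], intercept: float, slope: float) -> float:
    """Share of the variance in ys explained by the fitted line."""
    y_bar = fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: weekly promotion spend ($1,000s) vs. jars of jam sold (100s).
spend = [1.0, 2.0, 3.0, 4.0]
jars = [3.0, 5.0, 8.0, 8.0]

intercept, slope = fit_line(spend, jars)
r2 = r_squared(spend, jars, intercept, slope)
print(f"jars ~ {intercept:.2f} + {slope:.2f} * spend (R^2 = {r2:.2f})")
print(f"predicted jars at spend 5: {intercept + slope * 5:.2f}")
```

The R² value is only a rough check of how well the line explains the data. With four hypothetical points, it says nothing reliable about real jam sales, which is exactly where the domain expertise and careful interpretation discussed in this section come in.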
So how can this help me career wise?
Start by auditing what you are good at now and what you are prepared to do to get started in, or transition into, the data world. If you currently have zero coding skills and see yourself staying on the business side with less coding, the data analyst path is a great place to begin. If you are interested in moving into coding or already have strong coding skills, consider the data engineering path. The data scientist path is also worth considering if you want more business exposure.
Take the career roadmap quiz to find out quickly what works best for you.
Summary of various tools and skills required by Data Analyst/Data Engineer/Data Scientist
So at a glance, here are the main differences between a data analyst, data engineer and data scientist.
Resources for learning the data analyst/data engineer/data scientist career paths
And if you are looking for further resources to help with each path, here are some recommendations I have curated from speaking with various analysts, engineers and scientists within the industry.
Data Analyst Recommended Resources
Storytelling with Data
Python for Data Analysis
Data Scientist Recommended Resources
Data Science from Scratch
R for Data Science
The Hundred-Page Machine Learning Book
Data Engineer Recommended Resources
Python Feature Engineering Cookbook
Spark: The Definitive Guide
Foundations for Architecting Data Solutions