Data science is a broad field that combines domain knowledge, programming skills, mathematics and statistics in order to draw insight from data. As you may have guessed from the name, the work revolves around data, and lots of it.
I have to admit, when I first heard about Data Science, I thought all individuals who worked in the field were data scientists. You can imagine my surprise when I found out that Data Science is not just a job role, but a field encompassing different skills and competencies.
But they are all data scientists, right?
Well, yes and no. They all work in the data science field. While there are indeed data scientists, there are also other sought-after experts in the data world, such as data engineers, data analysts and machine learning engineers.
Why? Don't they all work with data?
Yes, they do in fact work with data, but there is a lot of work to be done with data. The point in the data workflow where each role interfaces with data differs, as does the form of data each works with.
While a single person, such as a data scientist, can carry out all these functions in a small company, a larger company with well-defined structures usually has much more data and infrastructure to handle. It then becomes burdensome for one person to do all the work.
Also, to improve efficiency in company operations, it is practical for these data tasks to be handled by professionals who have honed their skills in areas such as database management, predictive analysis, data visualization, machine learning and the like.
The data science field is expanding as more companies realize the importance of data. With this growth, data science job roles are becoming more defined, and certain skills and competencies are expected from data professionals. Although your duties may vary slightly from company to company, the basic job roles remain the same. It is also common for the tasks of these professionals to be intertwined, so don’t be surprised if you see the same function appear more than once.
We will go through the roles of the data analyst, data scientist, database administrator, machine learning engineer and data engineer, taking note of the job description, tools used and skills required for each role. It is also worth noting that some of the tools and skills are similar across roles.
Essential data science skills include:
Data Collection and Preparation
To work with data, we need data. Data collection covers all the processes and methods through which data is gathered for analysis. There are free datasets available on sites such as Kaggle or Google datasets, but there are many other ways to acquire datasets too.
A simple Google form or survey is a basic source of data, which can then be stored in a database. One can also acquire data through web scraping, though this should be done with caution.
When analyzing data, it is also common to access data in databases, making SQL an important skill for any data professional.
Data preparation is the process of making data suitable for use. We don’t always get structured data, and even when we do, we can have missing data, unnamed columns and the like. These issues are taken care of during data preparation.
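To make this concrete, here is a minimal sketch of data preparation using only Python's standard library. The sample data, the `column_N` placeholder naming and the choice to fill a missing age with the mean are all assumptions made for illustration; real projects often use a library such as pandas and pick a strategy that suits the data.

```python
import csv
import io

# Hypothetical raw survey export: one unnamed column and some missing values.
RAW_CSV = """name,,age
Ada,Lagos,36
Grace,,
Linus,Helsinki,54
"""


def prepare(raw_text):
    """Name unnamed columns and mark empty cells as None."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    # Give unnamed columns a placeholder name such as "column_2".
    header = [name.strip() or f"column_{i + 1}" for i, name in enumerate(header)]

    rows = []
    for values in reader:
        row = dict(zip(header, values))
        # Represent missing values explicitly as None instead of "".
        rows.append({key: (value.strip() or None) for key, value in row.items()})
    return rows


def fill_missing_age(rows):
    """Fill missing ages with the mean of the known ages (one simple strategy)."""
    known = [int(r["age"]) for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    for r in rows:
        r["age"] = int(r["age"]) if r["age"] is not None else mean_age
    return rows


cleaned = fill_missing_age(prepare(RAW_CSV))
for row in cleaned:
    print(row)
```

Whether to fill, flag or drop a missing value depends on the analysis, so treat the mean fill here as just one option.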
Data visualization is an important aspect of working with data. From exploring data to discover trends and patterns, to carrying out some basic statistics, to reporting the findings, data visualization plays a key role in the data science process.
Data visualization can be carried out using business intelligence tools such as Tableau, Power BI, Flourish and many more. These tools are more than just visualization tools, as some basic analysis and operations can be carried out on data without having to code. Yes, you read that right. Some also have plug-ins for popular languages such as R and Python.
These business intelligence (BI) tools, as they are commonly called, are also powerful. For example, you can connect Power BI to your database and edit queries if need be. Check your BI software to see which data sources you can connect to.
Some level of programming is also required depending on your job role. It can be an added competitive advantage that sets you apart from other candidates.
While in-depth knowledge of programming is not always required, a sound knowledge of SQL is a good way to get your foot in the door. SQL, which stands for Structured Query Language, is the standard language for working with relational databases. As data professionals, working with databases is a part of everyday life, and by extension so is creating, reading, updating and deleting the data in them. You should be careful with deleting, though.
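As a rough illustration of those four operations, the sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` table and its rows are made up for this example; other database systems use the same basic statements with small dialect differences.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table and insert rows.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Lagos"), ("Grace", "Abuja"), ("Linus", "Lagos")],
)

# Read: query the rows that match a condition.
cur.execute("SELECT name FROM customers WHERE city = ? ORDER BY name", ("Lagos",))
lagos_customers = [row[0] for row in cur.fetchall()]

# Update: change a value for one specific row.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Lagos", "Grace"))

# Delete: always include a WHERE clause, otherwise every row is removed.
cur.execute("DELETE FROM customers WHERE name = ?", ("Linus",))
conn.commit()

cur.execute("SELECT name, city FROM customers ORDER BY id")
remaining = cur.fetchall()
print(lagos_customers)
print(remaining)
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is a safer habit than building queries by joining strings.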
Data scientists are also well versed in R and/or Python, as well as other programming languages that allow you to work with data, such as C, C++ and Java.
Your Data Science Team
Data analysts perform a number of tasks, including collecting data, carrying out statistical analysis, building data models and reporting. It is the data analyst’s job to turn data into useful insight for the company.
They work closely with business managers to identify the company’s needs, which determine the data they work on and the analysis methods they use. The data analyst makes use of spreadsheets, databases, and visualization and reporting tools.
The skill set required for data analysts includes:
- Database systems
- Data visualization
- Mathematics and statistics
- Spreadsheet tools
- Presentation skills
- Communication tools
- Machine Learning
Probably the most common (and broadest) job role in data science, the data scientist is skilled in computing, statistics, analysis and mathematics.
The data scientist performs various tasks, including exploratory data analysis and predictive analysis. The data scientist applies machine learning and deep learning techniques to data in order to detect patterns in previous data and use these to predict future events.
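For a feel of what "learning from previous data to predict future events" means, here is a deliberately tiny sketch. It fits a straight line with ordinary least squares, which is a far simpler technique than the machine learning and deep learning methods mentioned above, and the ad spend and sales figures are invented purely for illustration.

```python
# Hypothetical history: monthly ad spend (x) and sales (y), made up for illustration.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to estimate y for an unseen x."""
    return slope * x + intercept


# Learn the pattern from past data, then predict an unseen case.
slope, intercept = fit_line(ad_spend, sales)
forecast = predict(6.0, slope, intercept)
print(forecast)
```

In practice a data scientist would typically reach for a library, hold out part of the data to check how well the model predicts, and compare several models before trusting any forecast.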
The data scientist uses tools such as MATLAB, Apache Spark and databases.
The skill set of data scientists includes:
Programming languages such as:
A data scientist should also be skilled at:
- Data visualization and storytelling
- Predictive modelling
- Machine learning
- Data wrangling
- Exploratory Data Analysis
- Ability to use Big Data tools
The database administrator is responsible for the management of a company’s database. The database administrator plays a key role in the selection of the software and hardware components of a database. They also ensure the integrity and security of the data. The database administrator essentially acts as the protector of the data.
The role of the database administrator cannot be overemphasized. They ensure the availability of data, prevent data loss by implementing data backup and recovery, and are in charge of the overall wellbeing of the database.
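The idea of backup and recovery can be sketched with SQLite's online backup feature, exposed in Python as `sqlite3.Connection.backup`. This is only a toy: the `orders` table is hypothetical, both databases live in memory, and production systems rely on the backup tooling of their own database server, usually on a schedule and with off-site copies.

```python
import sqlite3

# "Live" database (in memory here; a real one would sit on a server or disk).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("laptop",), ("phone",)])
live.commit()

# Backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate an accidental data loss on the live database.
live.execute("DELETE FROM orders")
live.commit()
rows_after_loss = live.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Recovery: copy the backup over the damaged live database.
backup.backup(live)

restored = live.execute("SELECT item FROM orders ORDER BY id").fetchall()
print(rows_after_loss)
print(restored)
```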
Common tools used by database administrators include phpMyAdmin, MySQL Workbench, SQL Server Management Studio and SQL Web Data Administrator.
The database administrator should be skilled in:
- Database queries
- Database design
- Knowledge of Relational Database Management Systems
- Data security
- Data modeling
- Data backup and recovery
- Storage technologies
- Operating systems
The insights you draw from data can only be as good as the data available. The data engineer creates a pipeline or data feed that automates the collection of data into on-site or cloud storage systems, making data easily accessible to end users such as data scientists and data analysts. The data engineer is a team player, working hand in hand with data scientists and analysts to ensure that the required data is available in the right format for easy analysis.
The data engineer is in charge of building and maintaining Extract, Transform, Load (ETL) pipelines:
- Data extraction from various sources
- Data transformation
- Loading data into a destination (database, data warehouse)
- Data synchronization
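The steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the CSV text stands in for a real source, an in-memory SQLite database stands in for the warehouse, and the `orders` schema is hypothetical. Real pipelines add scheduling, logging and error handling on top of this shape.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, an API or another database.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop rows with no amount, then fix types and casing."""
    cleaned = []
    for rec in records:
        if not rec["amount"]:
            continue
        cleaned.append(
            (int(rec["order_id"]), float(rec["amount"]), rec["currency"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    # INSERT OR REPLACE lets the pipeline be rerun without duplicating rows,
    # which is a very simple form of synchronization.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
# Run the pipeline a second time to show that reruns do not duplicate rows.
load(transform(extract(SOURCE_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping extract, transform and load as separate functions makes each stage easy to test and to swap out when a source or destination changes.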
The data engineer makes use of tools such as MS SQL, DBeaver, Azure, AWS, Spark and Hadoop.
The skill set of an aspiring data engineer includes:
- Statistical modeling and regression analysis
- Programming (Python, C++, Java)
- Database Systems
- Data warehousing solutions
- Machine Learning
- Data modeling
- Big Data tools such as Hadoop
- Data Visualization skills
You can also check out DataCamp’s infographic titled “The Data Science Industry: Who Does What”, where some of these roles are outlined.
We have looked at some key roles in data science. This list is not exhaustive, and I advise that you check the job description before applying for any role to ensure you are well suited to and competent for it.
Good luck with your applications!