Data Science
Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Data science practitioners apply machine learning algorithms to numbers, text, images, video, audio, and more to produce artificial intelligence (AI) systems that perform tasks which ordinarily require human intelligence.
In turn, these systems generate insights which analysts and business users can translate into tangible business value.
What do I need to know to be a Data Scientist?
While data scientists come from many different educational and professional backgrounds, most should be strong in, or ideally expert in, four fundamental areas. These are:
- Business/Domain
- Mathematics (includes statistics and probability)
- Computer science (e.g., software/data architecture and engineering)
- Communication (both written and verbal)
There are other skills and expertise that are highly desirable as well, but these are the primary four.
In reality, people are often strong in one or two of these, but rarely equally strong in all four. If you do happen to meet a data scientist who is truly an expert in all four, then you’ve essentially found yourself a unicorn.
Based on these, a data scientist could be defined as a person who can leverage existing data sources, and create new ones as needed, in order to extract meaningful information and actionable insights.
A data scientist does this through business domain expertise, effective communication and results interpretation, and utilization of any and all relevant statistical techniques, programming languages, software packages and libraries, and data infrastructure.
The insights that data scientists uncover should be used to drive business decisions and take actions intended to achieve business goals.
Data science goals
In order to understand the importance of data science, one must first understand the typical goals and deliverables associated with its initiatives, and also the process itself.
Let’s first discuss some common data science goals and deliverables.
- Prediction (predict a value based on inputs)
- Classification (e.g., spam or not spam)
- Recommendations (e.g., Amazon and Netflix recommendations)
- Pattern detection and grouping (e.g., clustering, which groups items without known classes)
- Anomaly detection (e.g., fraud detection)
- Recognition (image, text, audio, video, facial, …)
- Actionable insights (via dashboards, reports, visualizations, …)
- Automated processes and decision-making (e.g., credit card approval)
- Scoring and ranking (e.g., FICO score)
- Segmentation (e.g., demographic-based marketing)
- Optimization (e.g., risk management)
- Forecasts (e.g., sales and revenue)
Each of these is intended to address a specific goal and/or solve a specific problem.
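As a rough illustration of two of the goals listed above, prediction and anomaly detection, here is a minimal Python sketch. The function names `fit_line` and `find_anomalies`, the sample figures, and the threshold of 2.0 standard deviations are all assumptions made for illustration; they do not come from any particular library or data set.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean.

    Assumes the values are not all identical (otherwise the deviation is zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Prediction: made-up ad spend (inputs) and revenue (the value to predict).
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, revenue)
predicted_revenue = slope * 5.0 + intercept  # estimate for an unseen input

# Anomaly detection: made-up card transaction amounts with one outlier.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
suspicious = find_anomalies(amounts)
```

Both functions are deliberately simple. A production system would typically rely on an established statistics or machine learning library and on far more data.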
The real question is: which goal, and whose goal is it?
For example, a data scientist may think that her goal is to create a high performing prediction engine.
The business that plans to utilize the prediction engine, on the other hand, may have the goal of increasing revenue, which can be achieved by using this prediction engine.
Even if an executive is able to determine that a specific recommendation engine would help increase revenue, they may not realize that there are probably many other ways that the company’s data can be used to increase revenue as well.
It therefore cannot be emphasized enough that the ideal data scientist has a fairly comprehensive understanding of how businesses work in general, and of how a company’s data can be used to achieve top-level business goals.
With significant business domain expertise, a data scientist should be able to regularly discover and propose new data initiatives to help the business achieve its goals and maximize its key performance indicators (KPIs).