The five Wh’s of Data Science – What, Why, When, Who and Which.
While preparing the upcoming October workshop in Data Science, Innaxis has gathered wh- questions and simple answers about the “new reality” of data science. We also provide links to pages where more information about these important questions has been provided.
The basic answer to “What is Data Science?” could be “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data”. Definitions, especially of new terms, should remain simple despite the urge to make them complicated. Furthermore, the boundaries between Big Data, Data Science, Statistics and Data Mining are not easy to discern: these fields share common principles and tools and, importantly, the same aim, which is the extraction of valuable information.
What is the reason for extracting information from data? There is a brilliant quote by Jean Baudrillard: “Information can tell us everything. It has all the answers. But they are answers to questions we have not asked, and which doubtless don’t even arise.” In this context, proper data science is generally neither basic science nor long-term research; it is considered an extremely valuable resource for the creation of business. It involves mining large amounts of both structured and unstructured data to identify patterns that can directly help an organization control costs, create customer profiles, increase efficiency, recognize new market opportunities and enhance its competitive advantage.
Throughout history, an extensive list of names has been given to a well-known duality, information = power, from medieval censuses to Royal Navy strategies based on statistical analysis. As for the current understanding of Data Science, the name has moved away from being a synonym for Data Analysis in the early 20th century to being associated, from the 1990s onward, with Knowledge Discovery (KD). One of the very best compilations of data science history and publications over the last 60 years can be found in this Forbes article.
Throughout history, the methods and tools used have changed, developing as mathematical, data extraction, software and hardware capabilities have increased, particularly in recent years. The consequent “sudden” surge in Data Science jobs, which reflects the market’s real interest in the potential benefits of knowledge extraction, is illustrated in the following graph taken from LinkedIn analytics:
Courtesy LinkedIn Corp.
If you are a lawyer or a doctor, everybody knows more or less your level of university education and the nature of your daily tasks. What, then, is a “Data Scientist”? The paths that lead to a Data Science career are not well defined and are difficult to identify. The so-called “Sexiest Job of the 21st Century” (according to the Harvard Business Review) needs a common definition and even specific university degrees. The data jockeys who have always been employed on Wall Street are no longer alone. Meanwhile, the scope and variety of data now available keeps growing without pause, so operational, statistical and even hacking backgrounds are all welcome when it comes to extracting value from it. More information about data scientist careers and the main disciplines can be found in this excellent article from naturejobs.com.
To understand Data Science job titles, we recommend you also have a look at this article by Vincent Granville from DataScienceCentral. It is a living tongue twister: a data mining exercise carried out by a data scientist on data scientist job titles. In summary, it resembles the following recipe: take a mixer from the kitchen; add the words “Data”, “Analytics” and “Scientist”; switch it on; add an institutional label such as “Director”, “Junior” or “Manager”. An optional topping could be your university degree, such as “Engineer” or “Mathematician”. There you have one of the possible job titles of a current data scientist.
Which data is “datascience-able”? As we described in our previous post about Data Science, there is huge potential in almost every imaginable field that can provide data of sufficient quality for analysis. However, even where the data is available, there are challenges, generally connected with data storage and management capabilities. These challenges are covered in detail in the Innaxis blog post “The benefits and challenges of Big Data”. One of the remarkable and exciting things about Data Science is that additional knowledge can be extracted from data sets that, at first sight, are not expected to provide anything beyond the obvious potential of the so-called “direct” datasets. The reality is that it is hard to know which data sets will add value before testing them with Data Science. Once discovered, hidden patterns and unseen correlations can give an organization more valuable knowledge than direct cause-and-effect relationships. They represent being one step ahead, which is crucial in the highly competitive world in which we live.
By Héctor Ureta – Collaborative R&D Aerospace Engineer at Innaxis