WHAT IS DATA SCIENCE? THE SIMPLE ANSWER IS THAT IT DEPENDS ON WHO YOU ASK.
Although it is a relatively new term that only really emerged as commonplace in the early 2000s, its use and context have changed significantly. Anthony describes the many layers of complexity that contribute to successful data science.
In its original sense, data science was sometimes used to describe pure statistics, but more often it described the technical application of statistical methods and models, along with the corresponding computational work.
In recent years, the term data science has changed, and depending on who you speak to – which company they work for or where they studied – the definition will be different. I realise this is ironic, considering I am giving you a definition of data science that may be different from the next person you speak to (which I believe proves my point).
In many modern organisations, data science is becoming largely analogous to business intelligence – namely, it may involve a lot of data cleansing, querying, analytics functions, and building reports.
In other organisations, data scientists may focus solely on machine learning, artificial intelligence, or complex statistical model implementation and deployment. Or, they may be the team members architecting database solutions, or building and developing the applications used for deployment of analyses and models.
In many cases this is further complicated when an organisation has business analysts, data architects and engineers, and machine learning engineers – who to an extent cover most of the tasks I’ve mentioned – but then also employs a fleet of data scientists.
This lack of specificity is exemplified by looking up ‘data scientist’ in the Oxford English Dictionary: "A person employed to analyse and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making."
Just to add to the vagueness of the role, the example sentence is as follows: "Silicon Valley technology companies are hiring data scientists to help them glean insights from the terabytes of data that they collect everyday."
The reality is, there are many layers of complexity to straddle if someone wants to ‘analyse and interpret complex digital data’, which is what data science is in my opinion. I believe data scientists typically dabble in all the previously mentioned areas; they don’t necessarily focus on a specific task or part of the pipeline, but instead work across relevant spaces, collaborate with cross-functional teams, and contribute to all aspects of the data process.
Because of this, data scientists generally have a well-rounded knowledge of all parts of the data pipeline within an organisation. They usually understand and are comfortable with multiple parts of the typical data journey.
Let me give you an example. A business analyst needs to start reporting daily purchase forecasts for a certain product at different future time intervals, and the data is not currently available.
The analyst’s first stop might be a data scientist. The data scientist would work with the analyst to understand exactly what they are looking for and how it needs to be fed back – general information gathering. The data scientist can then think about what data sources are available, what models might be used for predictions, and whether it is a one-off task or something that can be automated. In many cases, the data scientist may handle the entire task from end to end.
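As a rough illustration of the modelling step in this example, here is a minimal Python sketch. It assumes a seasonal-naive approach (averaging recent sales from the same weekday), which is only one of many models a data scientist might choose; the function name `forecast_daily_purchases`, its parameters, and the sample figures are all made up for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def forecast_daily_purchases(history, horizons, weeks=4):
    """Seasonal-naive forecast for one product.

    history  -- dict mapping date -> units purchased (illustrative data)
    horizons -- days ahead of the last observed date, e.g. [1, 6, 30]
    weeks    -- how many past same-weekday observations to average
    Returns a dict mapping each target date to a forecast, or None
    when no same-weekday history exists.
    """
    first_day, last_day = min(history), max(history)
    forecasts = {}
    for h in horizons:
        target = last_day + timedelta(days=h)
        # Step back one week at a time from the target date, collecting
        # observed purchases that fall on the same weekday.
        same_weekday = []
        day = target - timedelta(days=7)
        while len(same_weekday) < weeks and day >= first_day:
            if day in history:
                same_weekday.append(history[day])
            day -= timedelta(days=7)
        forecasts[target] = mean(same_weekday) if same_weekday else None
    return forecasts


# Made-up history: four weeks starting Monday 1 January 2024,
# with 10 units on weekdays and 20 units at weekends.
start = date(2024, 1, 1)
history = {
    start + timedelta(days=i): (20 if (start + timedelta(days=i)).weekday() >= 5 else 10)
    for i in range(28)
}

for target, units in forecast_daily_purchases(history, horizons=[1, 6, 30]).items():
    print(target.isoformat(), units)
```

In practice the history would come from whichever data sources turn out to be available, and a scheduled job could rerun the forecast each day if the task needs to be automated rather than done once.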
What really makes data scientists valuable is that they have a deep understanding of the technical aspects of an analysis, so that if there is a complication – e.g. a complex launch configuration to schedule a machine learning job or an unforeseen issue in a dataset – they can comfortably work with the appropriate specialist technical team member to jointly identify and implement a solution.
Modern data science clearly doesn’t always have a concise, agreed definition, and even based on my preferred definition, the data scientist has wide-ranging skills. As a result, it has become common to think ‘hey – we need someone that can do data’, so a job posting is placed for a data scientist, with a required skill set that covers nearly everything.
If you find yourself in a situation where you have piles of data but no idea what to do, I wouldn’t recommend placing a generic, all-skills ‘data scientist’ job post. Although many data scientists truly do possess the complete set of skills, such a post reads more like ‘we don’t quite understand our data or what to do with it, and don’t know who we need on our team’.
Instead, consult an expert who knows all aspects of data to help you find your way. You might be right – maybe you do need a data scientist. But maybe what you need more urgently is a database architect or a business analyst. Speak with someone who can help you understand your needs. It’s worth the upfront investment: data scientists are highly skilled and valuable team members, but generically hiring one to do ‘data stuff’ under the assumption that they can solve all your problems can be costly, and is sometimes not the right fit for your needs.
So, back to the original question – what is data science? A little bit of everything.