The confusion around Data Science: roles and responsibilities

Over the past few years I’ve been speaking with many players in the tech industry.
It’s clear that interest in Data Science is rising fast.
From employers focused on shaping a company-wide data strategy to scientists fresh out of college, everyone is trying to understand their position in this new big data era.
Media outlets are raving about the data revolution and there is a growing collective perception that data is the “hot new thing”.

Unfortunately, there is as much misinformation as there are facts.
Around the hype, there is a ring of truth: this is something new. But at the same time, it’s a fragile, nascent idea at real risk of being rejected prematurely.
For one thing, it’s being paraded around as a magic bullet, raising unrealistic expectations that will surely be disappointed.

How we got to the term Data Science

One of the earliest documented proposals of Data Science as a field dates to 2001.
William Cleveland wrote a position paper called “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” as part of the April edition of the International Statistical Review.

The paper proposes a new field of study named data science. It then goes on to list and explain six technical focus areas for a university data science department:

  • Multidisciplinary Investigations
  • Models and Methods for Data
  • Computing with Data
  • Pedagogy
  • Tool Evaluation
  • Theory

It took roughly a decade before the term was adopted by the industry.

In 2011, DJ Patil described how he and Jeff Hammerbacher (then at LinkedIn and Facebook, respectively) coined the term “data scientist” in 2008. That is when “data scientist” emerged as a job title. You can read the original blog post here.

It is not clear whether that was the first time the term was used in a professional context, but it took another four years before the term reached Wikipedia: only in 2012 did Data Science first make its appearance on the popular online encyclopedia.

It is interesting to note that the term Data Science and the definition of the Data Scientist role have always been a matter of debate.
A good source of information is this Quora question. Answers from that thread are now included in university courses to illustrate the different perspectives and the overall ambiguity of the term.

To give you some perspective: on the left, Drew Conway’s Venn diagram explains his take on what Data Science is. On the right, a reinterpretation of the same diagram, with probably more realistic expectations. Everyone has their own opinion.

Real expectations for Data Science

What do data scientists look like? What do they do in their day-to-day work in a real company?

It depends on the level of seniority and whether you’re talking about the Internet/online industry in particular.
(The role of data scientist is not exclusive to the tech world, but that’s where the term originated. From now on, I’ll focus on this specific segment.)

A chief data scientist (CDS, elsewhere also referred to as CDO, chief data officer) should be setting the data strategy of the company, which involves a variety of things: setting up everything from the engineering and infrastructure for collecting data, to understanding privacy concerns, to deciding what data will be user-facing, how data is going to be used to make decisions, and how it’s going to be built back into the product.
She should manage a team of engineers, scientists, and analysts and should communicate with leadership across the company, including the CEO, CTO, product leadership, and often the COO too. She’ll also be concerned with patenting innovative solutions and setting research goals.
She is responsible for growing a supportive leadership structure that helps her propagate her vision and focus on localized strategic thinking, and she is ultimately accountable for the execution.

She identifies and improves communication to bring conflict within the team into the open and facilitate resolution. She openly shares credit for team accomplishments, monitors individual and team effectiveness, and recommends improvements to facilitate collaboration. She is considered a role model as a team player, and she demonstrates a high level of enthusiasm and commitment to team goals in difficult or adverse situations, encouraging others to respond similarly.
Expect this definition to propagate to all the different levels of leadership in data: Head of Data Engineering, Head of Data Science, VP, director, etc.

Moving away from organizational leadership positions and looking at the individual contributor career path: more generally, a data scientist is someone who knows how to extract meaning from and interpret data, which requires tools and methods from statistics and machine learning, as well as being human. She spends a lot of time collecting, cleaning, and munging data, because data is never clean.
This process requires persistence, statistics, and software engineering skills, which are also necessary for understanding biases in the data and for debugging logging output from code.

Once she gets the data into shape, a crucial part is exploratory data analysis, which combines visualization and data sense. She’ll find patterns and build models and algorithms, some with the intention of understanding product usage and the overall health of the product, and others to serve as prototypes that ultimately feed back into future product implementations. She designs experiments, and she is a critical part of a data-informed decision-making process. She’ll communicate with team members, engineers, and leadership in clear language and with data visualizations so that even if her colleagues are not immersed in the data themselves, they will understand the implications.

In the industry you can find this position under multiple names: Analyst, Machine Learning Engineer, the more generic Data Scientist, Product Data Scientist, etc.
So far, the most realistic definition, and the one that resonates with the way I’ve been working with data science organizations, can be found in this article from Robert Chang, describing the two main profiles for a Data Scientist at Twitter and Airbnb. It works in a fairly similar way at Google and SoundCloud as well.

Focusing on the outcomes instead of the job title, this is how data scientists add value to a company:

  • Performing offline analysis that informs mission-critical business decisions, e.g., identifying key user segments or activities.
  • Improving products: think of search and recommendations that, although engineering at the core, rely on the quality of data and derived data.
  • Creating data products: for example, user stats, content consumption, related content.

Last but not least, there is the role of Lead Data Scientist, also called Principal Data Scientist.
This position is often present in companies not big (or mature) enough to have a C-level executive focused on data.
It is also common in research labs, where the business edge is not as crucial as the domain knowledge.
In this role, the person consistently fosters collaboration and respect among team members by addressing elements of the group process that impede, or could impede, the group from reaching its goal. She engages the “right people” in the team, regardless of location or functional specialty, by matching individual capabilities and skills to the team’s goals. She works with a wide range of teams and readily shares lessons learned. She explains the context of multiple, complex, inter-related situations. She asks searching, probing questions, plays devil’s advocate, and solicits authoritative perspectives and advice prior to approving plans and recommendations.


At SoundCloud we are constantly expanding our Data Science organization, which today counts over 30 individual contributors.
Want to join us? Let me know!

Copyright © 2015– Nicola Bortignon