Eric Colson writes in Harvard Business Review:
But the goal of data science is not to execute. Rather, the goal is to learn and develop profound new business capabilities. Algorithmic products and services like recommendations systems, client engagement bandits, style preference classification, size matching, fashion design systems, logistics optimizers, seasonal trend detection, and more can’t be designed up-front. They need to be learned. There are no blueprints to follow; these are novel capabilities with inherent uncertainty. Coefficients, models, model types, hyper parameters, all the elements you’ll need must be learned through experimentation, trial and error, and iteration. With pins, the learning and design are done up-front, before you make it. With data science, you learn as you go, not before you go.
In order to encourage learning and iteration, data science roles need to be made more general, with broad responsibilities agnostic to technical function. That is, organize the data scientists such that they are optimized to learn. This means hiring “full stack data scientists”—generalists—that can perform diverse functions: from conception to modeling to implementation to measurement. It’s important to note that I am not suggesting that hiring full-stack data scientists results in fewer people overall. Rather, I am merely suggesting that when organized differently, their incentives are better aligned with learning vs. efficiency gains. For example, say you have a team of three creating three business capabilities. In the pin factory, each specialist will be one-third devoted to each capability, since no one else can do their job. In the full-stack, each generalist is completely devoted to a business capability, increasing scale and learning.
I completely agree. In fact, I think the term “data scientist” is vastly overused and badly specified. I have worked with a lot of data scientists. They have all been incredibly intelligent people who can dive deep into data. But they are not necessarily capable of retrieving, manipulating, and organizing data into optimal streams for processing.
I think a “data science generalist” is going to be hard to find, but pairing one or more data scientists with a generalist developer would make for an optimal team.