Data science is a team sport. This sentiment rings true not only with our experiences at IBM, but also with our enterprise clients, who often ask us for advice on how to structure data science teams within their own organizations.
Before structuring a team, however, it is important to remember that the various skills required to run a data science project are both rare and distinct. This means we need to make sure each member of the team can focus on what they do best.
Consider this breakdown of a data science project, along with the skills required for each role:
While each role is certainly distinct, each member of the team should have T-shaped skills, which means they will need to have depth in their own role, but also a shallow understanding of adjacent roles.
Let’s explore each role on the board in a little more depth.
Product owners are the subject matter experts, with a deep understanding of the particular industry and its concerns. In some cases, the product owner’s primary role will be on the business side, and they will periodically work with the data science team to solve a specific data science problem or set of problems before returning to that wider role.
In fact, returning to the normal role is a plus for the data science team. This means that the product owner acts as the end user of the models and can offer actionable feedback and requests. It also means that the product owner can champion data science within the business units themselves.
Product owners are most often responsible for:
- Defining the business problem and working with data scientists to define the working hypothesis
- Assisting in locating data and data handlers if needed
- Brokering and resolving data quality issues
Data engineers are the assistants who move all data to the center of gravity and connect that data through services and message queues. They also create APIs to make data generally available to the business, and they are responsible for engineering data into the platform that best suits the team’s needs. With data engineers, we look for these three main skills:
- Proficiency in at least three of the following: Python, Scala, Java, Ruby, SQL
- Proficiency in using and creating REST APIs
- Able to integrate predictive and prescriptive models into applications and processes
Data scientists tend to fill one of two distinct roles: machine learning engineers and decision optimization engineers. Since market conditions have made “data scientist” such a sought-after title, making this distinction can remove some of the confusion about what the role involves. (For our detailed thoughts on this, check out our recent article on VentureBeat.)
Machine learning engineers
Machine learning engineers build the machine learning models, which means identifying important data elements and features to use in each model. They determine the types of models to use and test the accuracy and precision of those models. They are also responsible for the long-term monitoring and maintenance of the models. They need these three main skills:
- Training and experience in the application of probability and statistics
- Experience in data modeling and evaluation and deep understanding of supervised and unsupervised machine learning
- Programming experience in at least two of the following: Python, R, Scala, Julia, or Java, with Python expertise preferred
Decision optimization engineers
Decision optimization engineering skills and backgrounds overlap with those of machine learning engineers, but the differences are significant. Decision optimization engineers need these three main skills:
- Experience applying mathematical modeling and/or constraint programming to a range of industry problems
- Strong Python programming skills and ability to apply predictive models as input to decision optimization problems
- Experience creating Monte Carlo simulation/optimization for what-if scenario analysis
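To make the last two skills concrete, here is a minimal sketch of a Monte Carlo what-if analysis in which a demand forecast feeds a simple ordering decision. Everything in it is an illustrative assumption rather than anything taken from an actual project: the function names `simulate_profit` and `best_order`, the price and unit cost, and the choice of a normal distribution to stand in for the output of a predictive model.

```python
import random


def simulate_profit(order_qty, demand_mean, demand_sd,
                    price=10.0, unit_cost=6.0, n_trials=10_000, seed=0):
    """Estimate average profit for one order quantity under uncertain demand.

    demand_mean and demand_sd are hypothetical stand-ins for the output
    of a predictive model (a forecast and its uncertainty).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    total = 0.0
    for _ in range(n_trials):
        # Draw one demand scenario; demand cannot be negative.
        demand = max(0.0, rng.gauss(demand_mean, demand_sd))
        sold = min(order_qty, demand)
        # Revenue on what sells, minus the cost of everything ordered.
        total += price * sold - unit_cost * order_qty
    return total / n_trials


def best_order(candidates, demand_mean, demand_sd):
    """Pick the candidate order quantity with the highest simulated profit."""
    return max(candidates,
               key=lambda q: simulate_profit(q, demand_mean, demand_sd))


if __name__ == "__main__":
    # What-if: compare a few order quantities against a forecast of
    # 100 units with a standard deviation of 20.
    for qty in (80, 100, 120):
        print(qty, round(simulate_profit(qty, 100, 20), 2))
    print("best:", best_order([80, 100, 120], 100, 20))
```

The predictive model supplies the inputs (the forecast and its spread), and the optimization step searches over possible decisions. A production system would more likely use a dedicated solver than a search over a handful of candidates.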
This brings us to data journalists, the team members who help represent the output of the model in the context of the data that drove it and who can clearly articulate the business problem at hand. With data journalists, we look for these three main skills:
- Coding skills in Python, Java or Scala
- Experience integrating data with the output of predictive and prescriptive models in the context of a business problem
- Proficiency in data analysis, scraping and wrangling
If you can assemble a team with these critical skills – and if you can ensure they collaborate well and maintain a meaningful understanding of each other’s work – you’ll be well on your way to uncovering the insights and understanding that can boost any organization you are leading.
Without them, you could be flying blind.
Seth Dobrin is vice president and chief data officer at IBM Analytics.