Data science teams should focus on analysing data and building models, not infrastructure management.
It is indeed a great tool! Looking at the pros, it seems to make a lot of sense to use it for machine learning: both Kubernetes and machine learning are buzzwords used to describe the future, so it makes sense to put them together. Seriously, though: autoscaling for time-consuming, compute-intensive processes like model training? It makes total sense to use it.
But should data science teams start working with it?
What are we missing here?
A great tool for data science is not necessarily a tool that data scientists should operate themselves. Kubernetes is an infrastructure tool, so it should be used by infrastructure specialists.
There’s a new movement around this topic, and it is called MLOps. I joined a community that is taking the first steps in this movement, MLOps.community, and surprisingly it grew from a handful of people (~40) trying to figure out how to make Machine Learning processes simpler into a strong community (~900) sharing knowledge and creating webinars and tools. The community agrees that the skills needed to scale ML using Kubernetes are very different from the ones needed to build models. However, there is not yet a common standard for what this role is called in a company: some are called Machine Learning Engineers, others kept their previous title of Infrastructure Engineer, DevOps or SRE, and some are innovating with the title “AI Infrastructure Engineer”.
To give you an idea of how complex it is to create an MLOps culture and its processes, there are companies specializing in it. Yes: if you are thinking of delegating this job to that one person, consider that there are entire companies, maybe bigger than yours, working on this. MLOps platforms, AI platforms, DataOps platforms and Data Science platforms are all similar and focus on solving scalability for data science, among other technicalities.
Think carefully before deciding whether to build internally or buy.
Data scientists are spending a lot of their time dealing with containerization and Kubernetes in general. This is not a task for data scientists; it belongs to a new, specialized, infrastructure-related role that is emerging. Let data scientists work in their own field and create value for the company, instead of having them deal with issues outside their scope.
Gonçalo Martins Ribeiro is CEO at YData.
Improved data for AI
YData provides a privacy-by-design DataOps platform for Data Scientists to work with synthetic and high-quality data.