These days every trend regarding how IT operations are handled gets an “Ops” moniker: DevOps, DevSecOps, AIOps, DataOps, MLOps and some other more exotic ones, such as GitOps and FinOps. This note presents a short point of view on whether MLOps is simply DataOps applied to ML models producing analytics, or whether there is a capability missing in DataOps-style data pipelines to fulfill MLOps goals.

Let’s start with the accepted definitions of these terms, according to Wikipedia:

  • DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics.
  • MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage the production ML lifecycle. MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements.

Notice the similarity of these definitions: both talk about practices for delivering project results faster and with better quality (which is exactly what the DevOps movement is after, thereby justifying the “Ops” moniker). They also target

  • The same personae: members of multi-disciplinary analytics teams (not just developers and operations people, as in DevOps),
  • The same stages in the software lifecycle, end-to-end from design to monitoring the value of your production system (not just the build-test-deploy phases, as in DevOps).

Most things you need to do to deliver faster and with better quality are basically the same in traditional data projects and in ML-based projects. To build an ML model, you need to go through a lot of mundane data management work (“data prep”): you have to clean and normalize the data for training; then, you transform it to deal with regulations (e.g., masking/anonymizing sensitive data), to make it easier to use (e.g., discretizing continuous variables into bins), and/or to come up with measurable quantities as columns describing the subjects of analysis (e.g., customers), stored in rows; this last step is called feature engineering.
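To make those data-prep steps concrete, here is a minimal sketch in pandas. The column names (customer_id, income, email, age, purchase_id), the masking scheme and the bin boundaries are illustrative assumptions, not a prescribed recipe:

```python
# Hypothetical "data prep" pipeline: clean/normalize, mask sensitive data,
# discretize a continuous variable, and engineer per-customer features.
import hashlib
import pandas as pd

def prepare_training_data(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()

    # Clean and normalize: drop incomplete rows, scale a numeric column to [0, 1].
    df = df.dropna(subset=["income", "age"])
    df["income_norm"] = (df["income"] - df["income"].min()) / (
        df["income"].max() - df["income"].min()
    )

    # Mask sensitive data to satisfy regulations (one-way hash of the email).
    df["email_masked"] = df["email"].apply(
        lambda e: hashlib.sha256(e.encode()).hexdigest()
    )

    # Discretize a continuous variable into bins to make it easier to use.
    df["age_bucket"] = pd.cut(df["age"], bins=[0, 25, 40, 60, 120],
                              labels=["<25", "25-40", "40-60", "60+"])

    # Feature engineering: measurable quantities per customer, one row each.
    features = df.groupby("customer_id").agg(
        avg_income=("income_norm", "mean"),
        n_purchases=("purchase_id", "nunique"),
    ).reset_index()
    return features
```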

Then, a data scientist creates a model. While model development differs from traditional programming, in the end, a model is nothing but code and must be managed as such: the artifacts used to produce the model are just source code that must be stored and versioned in a revision control tool. The model must be tested, so that accuracy is high before it is deployed. Tests in ML are somewhat elaborate, as they typically include tuning hyperparameters and cross-validation. Finally, it should be monitored, so that the quality of the predictions is not compromised by a lack of quality in the data. There is also monitoring the accuracy of the predictions themselves, a subject we will come back to later.
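As an illustration of what such a model test can look like, here is a sketch using scikit-learn: hyperparameter tuning with cross-validation, gated on a minimum accuracy before deployment. The dataset, the classifier and the 0.8 threshold are assumptions made for the example:

```python
# Hyperparameter tuning + cross-validation, treated as an automated test
# that blocks deployment if the model misses the accuracy gate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 200], "max_depth": [5, None]},
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Fail the build (and block deployment) if accuracy is below the agreed gate.
test_accuracy = search.best_estimator_.score(X_test, y_test)
assert test_accuracy >= 0.8, f"Model below accuracy gate: {test_accuracy:.3f}"
```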

These steps boil down to building the same kind of data pipelines as in traditional BI-style analytics (granted, some of the tools are different), and can be managed for faster delivery and better quality with the same DataOps methodology:

  • Orchestrating data pipelines built with different tools,
  • Managing several development environments using branch-and-merge revision control tools,
  • Deploying analytics to production using infrastructure-as-code techniques (as in DevOps), and
  • Automating tests that monitor quality both for the code (as in DevOps) and the data (a minimal sketch of such a data-quality test follows the list).
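As an illustration of that last point, here is a small data-quality test that can run in the same CI pipeline as the code tests. The table path and column names are hypothetical, and a real project might rely on a framework such as Great Expectations or Deequ rather than hand-written checks:

```python
# Automated data-quality checks expressed as an ordinary test function.
import pandas as pd

def test_customer_table_quality():
    df = pd.read_parquet("warehouse/customers.parquet")  # hypothetical path

    # Schema check: the columns downstream models rely on must be present.
    assert {"customer_id", "age", "income"} <= set(df.columns)

    # Integrity checks: no duplicate keys, no nulls in required columns.
    assert df["customer_id"].is_unique
    assert df["income"].notna().all()

    # Plausibility check: values must stay within an agreed business range.
    assert df["age"].between(0, 120).all()
```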

We hope we have made the point that whatever specific set of tools and frameworks you adopt for a traditional data management project, as long as it is consistent with this DataOps methodology, can also be applied to an ML-based analytics project.

Now, are there any MLOps goals that cannot be attained by judiciously applying DataOps? Does the specific nature of a supervised ML model, namely, that it is built by learning from the data it is trained on, imply any special capability not covered by well-managed DataOps-style pipelines?

The answer is in the monitoring step. The predictive performance of a model in production may degrade over time for several reasons. The most frequent include:

  1. The model may be sensitive to noise.
  2. The general distribution of the input data features may drift with time (data drift). For instance, car buyer preferences may change as oil prices change, so a car insurance model may be affected by a change in the distribution of car buyer preference features.
  3. The concept underlying the target variable, i.e., the statistical properties of the model output, may have changed (concept drift) due to unforeseen changes in the business conditions built into your model assumptions, even when the distribution of the input data has not. For an extreme example, the training data of a credit default model built before the global financial crisis would have been labeled quite differently than data labeled after it. And, frequently, you may no longer have the right features.

Noise and data drift can be handled with a DataOps-style test automation suite. The former, by including anomaly detection algorithms that discard genuine noise as part of your tests; the latter, by estimating and storing the distribution of the important features on the training data (detecting such important features is also key to enhancing the explainability of your model, a desirable property) and regularly comparing the distributions of the incoming features against the stored ones. An example of a framework that makes it easy to build pipelines that detect data drift by comparing distributions is AWS Deequ.
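To show the idea behind such a drift check (as a generic sketch, not Deequ itself), one can keep a sample of each important feature’s training distribution and regularly compare incoming batches against it with a two-sample Kolmogorov–Smirnov test. The feature names and the 0.05 significance threshold below are illustrative:

```python
# Generic data-drift check: compare stored training distributions with the
# distributions of incoming feature values.
import numpy as np
from scipy.stats import ks_2samp

def detect_data_drift(training_sample: dict[str, np.ndarray],
                      incoming_batch: dict[str, np.ndarray],
                      alpha: float = 0.05) -> dict[str, bool]:
    """Return, per important feature, whether its distribution has drifted."""
    drifted = {}
    for feature, train_values in training_sample.items():
        statistic, p_value = ks_2samp(train_values, incoming_batch[feature])
        drifted[feature] = p_value < alpha  # low p-value: distributions differ
    return drifted

# Toy example: 'income' shifts upward in the incoming data, 'age' does not.
rng = np.random.default_rng(0)
train = {"income": rng.normal(50, 10, 5_000), "age": rng.normal(40, 12, 5_000)}
new = {"income": rng.normal(65, 10, 1_000), "age": rng.normal(40, 12, 1_000)}
print(detect_data_drift(train, new))  # expect 'income' flagged, 'age' not
```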

Concept drift is different. Here, to monitor the accuracy of the model, in practice an SME is frequently involved to look at the predictions of the model on new incoming data and compare them against fresh labels for this data at this point in time. If the discrepancy is high, we might be in the presence of concept drift, and the model needs to be retrained. As labeling is expensive, active learning is a popular category of algorithms designed to deal with this problem: the learning algorithm in production is selective about which incoming data to ask the expert to label, gathering a small set that produces the maximum corrective effect. Bringing human experts into the loop via a data pipeline with no intelligence about which data should be relabeled is bound to fail in real cases. Other techniques for managing concept drift exist as well.
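To illustrate the active-learning idea, here is a sketch of uncertainty sampling, where the production model picks the few incoming records it is least sure about and only those are sent to the human expert for labeling. The labeling budget of 10, the classifier and the synthetic data are assumptions for the example:

```python
# Uncertainty sampling: select the incoming rows the expert should label first.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_for_labeling(model: LogisticRegression,
                        incoming: np.ndarray,
                        budget: int = 10) -> np.ndarray:
    """Return indices of the incoming rows with the least confident predictions."""
    proba = model.predict_proba(incoming)
    uncertainty = 1.0 - proba.max(axis=1)   # closer to guessing = more uncertain
    return np.argsort(uncertainty)[-budget:]

# Toy usage: a model trained on old data, applied to (possibly drifted) new data.
rng = np.random.default_rng(1)
X_old = rng.normal(size=(500, 3))
y_old = (X_old[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_old, y_old)

X_new = rng.normal(loc=0.5, size=(200, 3))      # drifted incoming data
to_label = select_for_labeling(model, X_new)    # ask the SME only about these
print(to_label)

# The loop then retrains on the newly labeled points and re-checks accuracy,
# which is how the pipeline reacts to suspected concept drift.
```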

Is this the whole story? What about topics such as governance and security of data and ML models? Are these topics covered by DataOps and MLOps? Should, for instance, protection of models against adversarial attacks be part of MLOps (clearly not something covered by DataOps)? It depends on how broadly we define both DataOps and MLOps. There are people who say that governance and security must be dealt with separately: this is probably the reason we hear about DevSecOps for integrating security into software development as an extension of DevOps, and DataGovOps for automating governance as an extension of DataOps. In any case, we will defer this topic to another blog.

In summary, my interpretation of MLOps is simply that it is the same as DataOps applied to ML models producing analytics, with an added capability to manage concept drift that makes sure the model remains accurate over time.
