
MLFlow vs. Kubeflow: What is the Difference?


Introduction

If you have been part of a data science team, or if you work on an engineering team, chances are that you have heard of both Kubeflow and MLFlow. Both are open source machine learning platforms built from collections of tools, but they differ in important ways.

Both of them aid in the development of the entire machine learning lifecycle. Kubeflow is more pipeline focused, while MLFlow is more experiment tracking focused. In this article we will explore the similarities and differences between MLFlow and Kubeflow.


What is MLFlow?

MLFlow is an open source framework that supports the machine learning lifecycle. It contains components that allow you to monitor your model during training. It also has the ability to store models, load models, and create pipelines.

The framework has three main features that we will touch upon: MLFlow Tracking, MLFlow Projects, and MLFlow Models. In addition, its Model Registry offers centralized lifecycle stage transitions. First we will start by looking at MLFlow Tracking.

MLFlow Tracking

MLFlow Tracking is probably the most used tool for a data scientist performing experiment tracking. Four items are tracked in MLFlow Tracking: parameters, metrics, artifacts, and source code. Parameters are logged as key-value pairs describing the inputs to a run. Metrics refers to evaluation metrics.

Artifacts are the output files, and they may be in any format, including images. Source code refers to the original code, which will often be linked to a repository host such as GitHub. These items are very easy to track in code, usually only needing one line each. In general, tracking allows you to create an extensive logging framework around your model. You can define custom metrics so that after a run you can compare the output to previous runs.
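As a minimal sketch of how lightweight this is (assuming a local MLFlow installation, with hypothetical parameter and metric names), a tracked run can look like this:

```python
import mlflow

# Everything logged inside this block is grouped under one run.
with mlflow.start_run():
    # Parameters: key-value pairs describing the run's inputs
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 100)

    # Metrics: numeric evaluation results (can be logged repeatedly per step)
    mlflow.log_metric("rmse", 0.27)

    # Artifacts: output files in any format; assumes this plot was
    # written to disk earlier in the run
    mlflow.log_artifact("confusion_matrix.png")
```

When a run is started from a script or project, MLFlow also records the source file and Git commit automatically as run tags.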

MLFlow Projects

An MLFlow Project is a way to package data science code, usually Python, in a reusable way. It can also be used to package machine learning models. Projects also provide an API and a command line interface for running them, which makes it easy to chain projects together into something we call a workflow. In general, MLFlow Projects allow you to organize your code and describe it to other data scientists.

The projects themselves may be either a directory of files or a Git repository, with descriptor files specifying dependencies and entry points. It is important to note that by default, MLFlow uses a new, temporary working directory for Git projects. This means that you should generally pass any file arguments to an MLFlow project using absolute, not relative, paths. MLFlow currently supports the following project environments: Virtualenv environment, conda environment, Docker container environment, and system environment.
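As an illustrative sketch, this is roughly what launching a project straight from a Git repository looks like through the Python API (the repository URI and the alpha parameter stand in for whatever your project's descriptor file defines):

```python
import mlflow

# MLFlow clones the Git project into a fresh temporary working
# directory, which is why file arguments should be absolute paths.
submitted = mlflow.projects.run(
    uri="https://github.com/mlflow/mlflow-example",
    parameters={"alpha": 0.5},
)
print(submitted.run_id)
```

The same project can also be launched from the command line with mlflow run, which is what makes chaining projects into a workflow straightforward.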

MLFlow Models

MLFlow Models are a standard format used to package machine learning models so they can be consumed by a variety of downstream tools, such as real-time serving through a REST API or batch inference on Apache Spark. Each MLFlow Model is a directory containing arbitrary files, together with a descriptor file in the root of the directory that can define multiple flavors the model can be viewed in.

What do we mean when we say flavors?

Well, flavors are the key concept that makes MLFlow Models powerful. They are a convention that deployment tools can use to understand the model, which makes it possible to write tools that work with models from any ML library without having to integrate each tool with each library. There are plenty of supported built-in flavors, such as the python_function flavor.
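To make the idea concrete, here is a minimal sketch (assuming scikit-learn is installed): the model is logged with the scikit-learn flavor, but it can be loaded back through the generic python_function flavor without the loading tool knowing anything about scikit-learn:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run() as run:
    # Saved with two flavors: "sklearn" and the generic "python_function"
    mlflow.sklearn.log_model(model, "model")

# A deployment tool can load it through the library-agnostic flavor
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```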

Model Registry

The MLFlow Model Registry helps manage the entire lifecycle of an MLFlow model. It does this through a set of APIs, a UI, and a centralized model store. It also facilitates stage transitions, model versioning, code versioning, and model lineage. When a new model is added to the Model Registry, it is added as version 1; each new model registered under the same model name increments the version number.

Each distinct model version can be assigned one stage at any given time. MLFlow provides predefined stages for common use cases, such as Staging, Production, and Archived, and you can transition a model version from one stage to another. Before you can add a model to the Model Registry, you must log it using the log_model method of the corresponding model flavor. Once a model has been logged, you can add, modify, update, transition, or delete it in the Model Registry through the UI or the API.
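As a hedged sketch of that flow (the model name is made up, and the registry requires a database-backed tracking server rather than the default local file store):

```python
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow import MlflowClient
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(np.arange(10).reshape(-1, 1), np.arange(10))

# Step 1: log the model using its flavor's log_model method
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")

# Step 2: register it; the first registration becomes version 1
mv = mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-regressor")

# Step 3: transition the new version between predefined stages
client = MlflowClient()
client.transition_model_version_stage(
    name="demo-regressor", version=mv.version, stage="Production"
)
```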

What is Kubeflow?

Similar to MLFlow, Kubeflow is also an open source tool. It allows models to be trained and deployed on Kubernetes, and it runs wherever Kubernetes runs, including the major cloud vendors' platforms. There are four main components of Kubeflow that we will discuss: notebooks, TensorFlow model training, Kubeflow Pipelines, and deployment.

Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. Kubeflow is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving. Kubeflow builds on Kubernetes as a system for deploying, scaling, and managing complex systems.

Why containerize your ML Applications

A big problem with machine learning models is that very few actually make it to production. In a survey conducted in November 2021, an overwhelming number of respondents reported that only 0–20% of their models were deployed to production. About 35% of respondents reported that technical hurdles were the main impediment. To help alleviate this issue, it is best to containerize machine learning applications.

What does containerizing them mean?

Well, containers make code reproducible, portable, and scalable. Think of a container as a system image containing your code, your model, and all their dependencies installed. Once you build a container image, you can spin up as many instances as you need using Amazon ECS, Kubernetes, or any similar framework. These frameworks make sure enough instances are running, old ones are cycled out, and crashed ones are restarted. Modern DevOps teams have already migrated to using containers for all engineering code. We can do the same for our models.

Benefits of Containers

Before containerization, we had virtualization: running multiple virtual machines in a cloud environment, managed through virtual command centers. Virtualization approaches had some challenges that made these environments inefficient, including OS dependencies, environment inconsistencies, and weak isolation. These issues and many more were solved through containerization, which is more efficient than virtualization and a natural evolution of it.

Whereas virtualization is vital for distributing several operating systems across a single physical server, containerization is more flexible and granular. It focuses on breaking operating systems down into chunks that can be used more efficiently. Additionally, an application container provides a way to package apps in a portable, software-defined environment. Containerization allows software developers to create and deploy apps faster and more securely.

Using traditional methods, you develop code in a specific computing environment, which often results in errors and bugs when you transfer it to a new location. Containerization eliminates this problem by allowing you to bundle the application code together with its related configuration files, dependencies, and libraries. You then abstract that single package of software away from the host operating system, allowing it to stand alone and become portable. Some benefits of containerization include:

Portability.

An application container creates an executable software package abstracted away from the host OS. Hence, it is not dependent upon or tied to the host OS, making it portable and allowing it to run consistently and uniformly across any platform or cloud.

Speed.

Developers refer to containers as lightweight because they share the host machine’s OS kernel and aren’t subject to extra overhead. This drives higher server efficiency and reduces server and licensing costs. It also speeds up start times, since there is no OS to boot.

Scalability.

Application container technology offers high scalability. An application container can handle increasing workloads because additional container instances can be spun up within the existing architecture, following a service-oriented app design.

Security.

Isolating applications as containers prevents malicious code from affecting other containerized apps or the host system. You can also define security permissions to automatically block access to unwanted components that seek to enter other containers or limit communications.

Why ML on Kubernetes

Now that we have talked about what containers are and their benefits, let us look at Kubernetes, a container orchestration system originally developed at Google (the company also behind Google Cloud). Kubernetes is an open source orchestration tool that helps schedule and automate the deployment, management, and scaling of containerized applications. The Kubernetes platform is all about optimization.

It automates many of the DevOps processes that were previously handled manually, simplifying the work of software developers. Recently, companies have started realizing that research and development plays a huge part in machine learning. But if not enough time, budget, and effort is spent properly designing and deploying machine learning systems in a reliable, available, discoverable, and efficient manner, it becomes challenging to reap all of their benefits.

By thinking of machine learning systems with an engineering mindset, it becomes clear that Kubernetes is a good match for the aforementioned reliability, availability, and time-to-market challenges. Kubernetes helps in adopting the main principles of machine learning operations, and by doing so we increase the chances that our machine learning systems will provide the expected value after all the investment made in the research, development, and design steps.

Kubeflow Components

Notebooks, TensorFlow model training, Kubeflow Pipelines, and model deployment are the four major components of Kubeflow that we will discuss.

Notebooks.

Kubeflow Notebooks provides a way to run web-based development environments inside your Kubernetes cluster by running them inside Pods. This built-in notebook service has native support for JupyterLab, RStudio, and Visual Studio Code (code-server). Users can create notebook containers and add compute resources directly in the cluster, rather than locally on their workstations. Administrators can provide standard notebook images for their organization with required packages pre-installed. Access control is managed by Kubeflow’s RBAC, enabling easier notebook sharing across the organization.

TensorFlow Model Training.

This is another built-in operator that makes it easy to run model training at scale on Kubernetes. In particular, Kubeflow’s job operator can handle distributed TensorFlow training jobs. The training controller can be configured to use CPUs or GPUs and to suit various cluster sizes.

Kubeflow Pipelines.

Kubeflow Pipelines is a comprehensive solution for deploying and managing end-to-end machine learning workflows, built on Docker containers. Use Kubeflow Pipelines for rapid and reliable experimentation: you can schedule and compare runs, and examine detailed reports on each, as shown in the sketch below.
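As a rough sketch of what a pipeline definition looks like with the kfp SDK (v2-style; the component and pipeline names here are made up), each step compiles down to its own Docker container:

```python
from kfp import dsl, compiler

@dsl.component
def train(learning_rate: float) -> float:
    # Placeholder training step; on the cluster it runs in its own container
    return 0.93  # pretend accuracy

@dsl.component
def report(accuracy: float):
    print(f"Model accuracy: {accuracy}")

@dsl.pipeline(name="demo-training-pipeline")
def pipeline(learning_rate: float = 0.01):
    train_task = train(learning_rate=learning_rate)
    report(accuracy=train_task.output)

# Compile to a YAML spec that can be uploaded to Kubeflow Pipelines
compiler.Compiler().compile(pipeline, "pipeline.yaml")
```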

Deployment.

Deploying machine learning models on Kubernetes is facilitated through external add-ons, such as the KServe (formerly KFServing) model serving component.

MLFlow vs Kubeflow Similarities

Now that we have gone through what MLFlow and Kubeflow are, let us compare the similarities between the two. First, they are both open source platforms, which means they are free for anyone to use. Despite being open source, both receive support from various organizations.

MLFlow and Kubeflow are both used to aid in the development of machine learning models. Both offer many customization options, and both are scalable and portable. Both also provide many options for model management, model tracking, and monitoring model performance, and both can manage and set resource quotas. Finally, both MLFlow and Kubeflow can be considered machine learning platforms in their own right.

MLFlow vs Kubeflow Differences

Now that we have examined some of the similarities between these two platforms, it is time to look at the differences. Some key differences between the two can be seen below:

MLFlow vs. Kubeflow – Differences

Methods.

Kubeflow is built around a container orchestration system and is a much more complex tool. When training a model, everything happens within Kubeflow, which ensures reproducibility. On the other hand, MLFlow is a Python library, and training depends on the code written by the developer.

Cooperative Environments.

In Kubeflow, tracking is done with Kubeflow Metadata. MLFlow is built around experiment tracking: you can develop locally and log runs to a remote tracking server.
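For example, as a small hedged sketch (the tracking server URL is hypothetical), pointing locally developed MLFlow code at a remote tracking server is a one-line change:

```python
import mlflow

# Log runs from a local development machine to a remote tracking server
mlflow.set_tracking_uri("http://mlflow.example.com:5000")
mlflow.set_experiment("local-dev-experiments")

with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.93)
```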

Deployment.

In Kubeflow, pipelines drive model deployment forward. In MLFlow, the Model Registry is responsible for pushing models toward deployment.



Conclusion

Kubeflow and MLFlow are both very useful tools for data scientists. There are certain situations where one is better than the other; for example, Kubeflow is better suited than MLFlow for large-scale projects with multi-step workflows, which usually need to deliver a single production machine learning solution. In other situations, such as cloud deployments, you may use both Kubeflow and MLFlow, as they are both category leaders when it comes to efficient workflows. In the end, the choice is up to you. Thank you for reading this article.
