
This document guides you through the installation of the MLSysOps framework and all components required to run the supported scenarios.

We assume a vanilla Ubuntu 22.04 environment, although the MLSysOps framework can run on a number of other distributions.

We will be installing and setting up each component individually:

Core Framework🔗

MLSysOps Framework Installation🔗

The main prerequisite is a running Karmada instance with at least one Kubernetes cluster registered. We assume that Karmada is installed in a standalone cluster. The Karmada instance should include the karmada-search plugin.
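
To verify this prerequisite, you can list the member clusters registered with Karmada (a minimal check; the kubeconfig path is a placeholder for your environment):

```bash
# List the clusters registered with Karmada; at least one should be reported as Ready.
kubectl --kubeconfig <path to karmada api kubeconfig> get clusters
```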

The MLSysOps framework consists of three main components, called MLSysOps agents. These components require the following services to be running before they start:

  • Ejabberd XMPP Server
  • Redis
  • Docker installed on the Karmada management VM

There are two services that provide additional functionality to the user:

  • Northbound API: This service is part of the MLSysOps agents. It provides endpoints for controlling the components and behaviors of the agents.
  • ML Connector: This service is responsible for managing and deploying Machine Learning models. It exposes its functionality through a separate API.

To ensure the correct bootstrap, the agents should start in the following order:

  1. Continuum agent
  2. Cluster agent
  3. Node agents

All deployments take place in a Kubernetes cluster, in a dedicated namespace named 'mlsysops-framework'. All third-party services, as well as the Continuum agent, are deployed in the management cluster, i.e. the cluster that hosts Karmada.

System descriptions preparation🔗

Before the installation process takes place, system descriptions for every layer must be prepared. A system description is a YAML file, implemented as a Kubernetes CRD. Examples can be found in the descriptions/ directory. The descriptions for each layer reside in the respectively named subdirectory: continuum, clusters, nodes. Each file MUST be named after the corresponding hostname, followed by the .yaml or .yml suffix. For example, a machine at the node level with hostname node-1 should have a description file named node-1.yaml under the nodes/ directory.

  • Continuum-level descriptions require a single file that declares the continuumID and the clusters that MLSysOps is allowed to manage.
  • Cluster-level descriptions require one file per cluster registered in Karmada. Each file contains the clusterID and the list of node hostnames that MLSysOps is allowed to manage.
  • Node-level descriptions contain the detailed information about the node's resources. Example here.
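
As an illustration, the sketch below lays out the descriptions/ directory for a hypothetical setup with a continuum host named karmada-host, one registered cluster whose management node is cluster-1, and two worker nodes node-1 and node-2 (all names are placeholders):

```bash
# Hypothetical hostnames; every file must be named after the hostname it describes
# and must, of course, be filled with the actual description content.
mkdir -p descriptions/{continuum,clusters,nodes}
touch descriptions/continuum/karmada-host.yaml   # continuumID + clusters MLSysOps may manage
touch descriptions/clusters/cluster-1.yaml       # clusterID + node hostnames MLSysOps may manage
touch descriptions/nodes/node-1.yaml             # detailed node resources
touch descriptions/nodes/node-2.yaml
```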

Option 1: Automated Deployment🔗

The MLSysOps CLI tool can be used to deploy all the necessary components automatically. It needs the kubeconfigs of the Karmada host cluster and the Karmada API server, as well as the Karmada host IP:

  • export KARMADA_HOST_KUBECONFIG=<path to karmada host kubeconfig>
  • export KARMADA_API_KUBECONFIG=<path to karmada api kubeconfig>
  • export KARMADA_HOST_IP=<karmada host ip>

Then, with the description files prepared, execute the CLI command inside the deployments directory:

  • python3 deploy.py
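
Putting the automated option together (a sketch; the paths and IP below are placeholders for your environment):

```bash
# Placeholder paths and IP; adjust to your environment.
export KARMADA_HOST_KUBECONFIG=/path/to/karmada-host.kubeconfig
export KARMADA_API_KUBECONFIG=/path/to/karmada-api.kubeconfig
export KARMADA_HOST_IP=192.0.2.10   # example IP

cd deployments    # the descriptions/ files must already be prepared
python3 deploy.py
```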

Option 2: Manual Deployment🔗

Continuum - Management Cluster🔗

In the Karmada host cluster: export KUBECONFIG=<karmada host kubeconfig>

  • Create namespace
  • kubectl apply -f namespace.yml

  • Install Required services

  • Change POD_IP to the Karmada host IP in xmpp/deployment.yaml.
  • kubectl apply -f xmpp/deployment.yaml
  • Change {{ KARMADA_HOST_IP }} to the Karmada host IP in api-service-deployment.yaml.
  • kubectl apply -f api-service-deployment.yaml
  • Change {{ KARMADA_HOST_IP }} to the Karmada host IP in redis-stack-deployment.yaml.
  • kubectl apply -f redis-stack-deployment.yaml
  • docker compose -f <mlconnector.docker-composer.yaml> up -d
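
The {{ KARMADA_HOST_IP }} substitutions above can also be scripted; a minimal sketch, assuming KARMADA_HOST_IP has been exported as in the automated option:

```bash
# Replace the {{ KARMADA_HOST_IP }} placeholder in the manifests that use it.
# The same substitution applies to cluster-agents-daemonset.yaml later in this guide.
sed -i "s/{{ KARMADA_HOST_IP }}/${KARMADA_HOST_IP}/g" \
  api-service-deployment.yaml \
  redis-stack-deployment.yaml
```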

  • Apply RBAC

  • kubectl apply -f mlsysops-rbac.yaml

  • Attach the Karmada API kubeconfig as a ConfigMap

  • kubectl create configmap continuum-karmadapi-config --from-file=<path/to/local/karmada-api.kubeconfig> --namespace=mlsysops-framework
  • Attach the Continuum system description as a ConfigMap
  • kubectl create configmap continuum-system-description --from-file=<descriptions/continuum/hostname.yaml> --namespace=mlsysops-framework
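
For example, with the hypothetical karmada-host hostname used earlier (paths are placeholders):

```bash
# Hypothetical paths and hostname; adjust to your environment.
kubectl create configmap continuum-karmadapi-config \
  --from-file=$HOME/.kube/karmada-api.kubeconfig \
  --namespace=mlsysops-framework

kubectl create configmap continuum-system-description \
  --from-file=descriptions/continuum/karmada-host.yaml \
  --namespace=mlsysops-framework
```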

  • Start continuum agent

  • kubectl apply -f continuum-agent-daemonset.yaml

In the Karmada API server: export KUBECONFIG=<karmada api server kubeconfig>

Cluster deployments🔗

  • Set up the Karmada propagation policies
  • kubectl apply -f cluster-propagation-policy.yaml
  • kubectl apply -f propagation-policy.yaml

  • Create the namespace

  • kubectl apply -f namespace.yaml

  • Apply RBAC

  • kubectl apply -f mlsysops-rbac.yaml

  • Create a ConfigMap from the cluster system descriptions; each file must be named after the hostname of the cluster's management node

  • kubectl create configmap cluster-system-description --from-file=descriptions/clusters --namespace=mlsysops-framework

  • Apply the daemonset YAML file

  • Change {{ KARMADA_HOST_IP }} to the Karmada host IP in cluster-agents-daemonset.yaml.
  • kubectl apply -f cluster-agents-daemonset.yaml

Node deployments🔗

  • Namespaces and RBAC were created with the cluster setup.
  • Prepare a system description for each node, naming each file after the node's hostname.
  • Create a ConfigMap from the node system descriptions.
  • kubectl create configmap node-system-descriptions --from-file=descriptions/nodes --namespace=mlsysops-framework
  • Set any required environment variables in node-agents-daemonset.yaml (e.g. the Karmada host IP, as for the cluster agent daemonset).
  • Apply the daemonset YAML file
  • kubectl apply -f node-agents-daemonset.yaml
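
Once all three layers are deployed, a quick sanity check (a sketch; run it against the Karmada host cluster and against each member cluster's kubeconfig) is to confirm that the agent DaemonSets and pods are up in the mlsysops-framework namespace:

```bash
# Verify that the MLSysOps agents and supporting services are running.
kubectl get daemonsets,pods -n mlsysops-framework
```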

Note: Be aware that some of these instructions might overwrite existing tools and services.