Quickstart
This document is a quickstart guide that showcases indicative features of the MLSysOps Framework. Please refer to the installation guide for more detailed installation instructions, or to the design document for more details regarding the MLSysOps architecture.
MLSysOps Framework Installation
The main prerequisite is a running Karmada instance with at least one Kubernetes cluster registered. We assume that Karmada is installed in a standalone cluster and that the instance includes the karmada-search plugin. You can follow the instructions in Testbed installation to create the appropriate environment.
The MLSysOps Framework consists of three main components, called MLSysOps Agents. These components require the following services to be running before they start:
- Ejabberd XMPP Server
- Redis
- Docker installed on the Karmada-Management VM
There are two services that provide additional functionalities to the user:
- Northbound API: This service is part of the MLSysOps agents. It provides endpoints for controlling the components and behaviors of the agents.
- ML Connector: This service is responsible for managing and deploying Machine Learning models. It exposes its functionality through a separate API.
To ensure a correct bootstrap, the agents must start in the following order:
- Continuum agent
- Cluster agent
- Node agents
All deployments take place in a Kubernetes cluster, in a separate namespace, mlsysops-framework. All third-party services, as well as the Continuum agent, are deployed in the management cluster, i.e., the one hosting Karmada.
Step 1: Clone the repo
git clone https://github.com/mlsysops-eu/mlsysops-framework
and enter the deployments directory:
cd deployments
Step 2: System description preparation
Before the installation takes place, system descriptions for every layer must be prepared. A system description is a YAML file, implemented as a Kubernetes CRD. Examples can be found in the descriptions/ directory. The descriptions for each layer reside in the correspondingly named directory: continuum, clusters, nodes. Each file MUST be named after the corresponding hostname, followed by the .yaml or .yml suffix. For example, a node-level machine with hostname node-1 should have a description file named node-1.yaml under the nodes/ directory.
- Continuum-level descriptions require a single file that declares the continuumID and the clusters that MLSysOps is allowed to manage.
- Cluster-level descriptions require one file per cluster registered in Karmada. Each file contains the clusterID and the list of node hostnames that MLSysOps is allowed to manage.
- Node-level descriptions contain detailed information about the node resources.
Before deploying, prepare the system descriptions as Kubernetes CRDs, stored in the descriptions/ directory.
📁 File structure:
descriptions/
├── continuum/
│   └── <continuum-hostname>.yaml
├── clusters/
│   └── <cluster-hostname>.yaml
└── nodes/
    └── <node-hostname>.yaml
Descriptions define IDs, managed components, and resource details. All files are required before installation.
Step 3: Deploy the Framework
There are two ways to deploy the framework:
Option 1: Automated using the MLSysOps CLI
You can install the CLI in two ways:
From TestPyPI:
From GitHub (includes deployments folder):
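Both installation paths might look like the following sketch. The TestPyPI package name (`mlsysops-cli`) and the in-repo install step are assumptions, so check the project's package index entry and repository README for the exact names:

```shell
# Option A: install from TestPyPI (package name assumed)
pip install --index-url https://test.pypi.org/simple/ mlsysops-cli

# Option B: install from the GitHub checkout, which also ships the
# deployments folder alongside the CLI
git clone https://github.com/mlsysops-eu/mlsysops-framework
cd mlsysops-framework
pip install .
```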
This exposes the mls command.
Set environment variables:
export KARMADA_HOST_KUBECONFIG=<path to host kubeconfig>
export KARMADA_API_KUBECONFIG=<path to api kubeconfig>
export KARMADA_HOST_IP=<host IP>
Run deployment:
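A minimal sketch of the deployment invocation; the exact subcommand name is an assumption, so consult `mls --help` for the real one:

```shell
# Deploy core services, register the system descriptions, and start
# all agents in the correct order (subcommand name assumed)
mls framework deploy-all
```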
This will:
- Deploy core services (ejabberd, redis, API service)
- Register system descriptions
- Deploy all agents in the correct order
Alternative: You can also run the CLI script directly:
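The script path `cli/mls.py` appears later in this guide; the deployment subcommand below is an assumption:

```shell
# Equivalent invocation without installing the CLI package
# (subcommand name assumed; run the script with --help to list commands)
python cli/mls.py framework deploy-all
```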
Wait for all pods to be created:
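One way to watch the framework pods come up, using the mlsysops-framework namespace mentioned earlier:

```shell
# Watch pod creation in the framework namespace until all agents
# and services reach the Running state
kubectl get pods -n mlsysops-framework --watch
```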
Option 2: Manual Deployment
Follow the order below to deploy manually if you prefer full control.
Management Cluster (Continuum)
- Create namespace:
- Install services:
- Start ML Connector:
- Apply RBAC:
- Add configuration and system descriptions:
- Start the Continuum Agent:
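The steps above could be sketched as follows. The manifest file names under services/, rbac.yaml, and agents/ are hypothetical placeholders; check the deployments directory for the actual paths:

```shell
# Run against the Karmada host (management) cluster
export KUBECONFIG=$KARMADA_HOST_KUBECONFIG

# 1. Create the framework namespace
kubectl create namespace mlsysops-framework

# 2. Install the third-party services (manifest paths are hypothetical)
kubectl apply -f services/ejabberd.yaml -n mlsysops-framework
kubectl apply -f services/redis.yaml -n mlsysops-framework
kubectl apply -f services/api-service.yaml -n mlsysops-framework

# 3. Start the ML Connector (hypothetical manifest name)
kubectl apply -f services/ml-connector.yaml -n mlsysops-framework

# 4. Apply RBAC (hypothetical manifest name)
kubectl apply -f rbac.yaml

# 5. Add configuration and the continuum system description
kubectl create configmap continuum-system-description \
  --from-file=descriptions/continuum --namespace=mlsysops-framework

# 6. Start the Continuum Agent (hypothetical manifest name)
kubectl apply -f agents/continuum-agent.yaml -n mlsysops-framework
```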
Karmada API Cluster (Cluster Agents)
- Apply policies and namespace:
- Add system descriptions:
- Start Cluster Agents:
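A sketch of the cluster-level steps, run against the Karmada API kubeconfig; the manifest names (policies.yaml, agents/cluster-agent.yaml) are hypothetical placeholders:

```shell
# Run against the Karmada API server
export KUBECONFIG=$KARMADA_API_KUBECONFIG

# Apply policies and create the namespace (manifest name hypothetical)
kubectl apply -f policies.yaml
kubectl create namespace mlsysops-framework

# Add the cluster system descriptions
kubectl create configmap cluster-system-descriptions \
  --from-file=descriptions/clusters --namespace=mlsysops-framework

# Start the Cluster Agents (manifest name hypothetical)
kubectl apply -f agents/cluster-agent.yaml -n mlsysops-framework
```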
Node Agents
- Ensure node descriptions are in place
- Add them via ConfigMap:
kubectl create configmap node-system-descriptions --from-file=descriptions/nodes --namespace=mlsysops-framework
- Start Node Agents:
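Node agents commonly run as a DaemonSet so that one instance lands on each managed node; the manifest name below is a hypothetical placeholder:

```shell
# Start the Node Agents (manifest name hypothetical)
kubectl apply -f agents/node-agent-daemonset.yaml -n mlsysops-framework
```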
Step 4: Deploy a test application
We use a simple TCP client-server application that sends messages periodically. The files are in tests/application of the repo.
Update test_CR and test_MLSysOps_description with the node names of the cluster and the clusterID.
Apply the CR:
kubectl apply -f tests/application/test_CR.yaml
or deploy the description via the MLS CLI:
cli/mls.py apps deploy-app --path tests/application/test_MLSysOps_description.yaml
You can watch the pods start and be managed by the MLSysOps Framework. The client pod is relocated every 30 seconds, in round-robin fashion, across the worker nodes.
kubectl get pods -n mlsysops --context <clusterID>