Koordinator v0.1.0 - QoS based scheduling system

March 31, 2022 · 5 min read

Joseph

Koordinator maintainer

Fangsong Zeng

Koordinator maintainer

We’re pleased to announce the release of Koordinator v0.1.0.

Overview

Koordinator is a QoS-based scheduling for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. It aims to improve the runtime efficiency and reliability of both latency sensitive workloads and batch jobs, simplify the complexity of resource-related configuration tuning, and increase pod deployment density to improve resource utilizations.

Key Features

Koordinator enhances the kubernetes user experiences in the workload management by providing the following:

Well-designed priority and QoS mechanism to co-locate different types of workloads in a cluster and run different types of pods on a single node. Allowing for resource overcommitments to achieve high resource utilizations but still satisfying the QoS guarantees by leveraging an application profiling mechanism.
Fine-grained resource orchestration and isolation mechanism to improve the efficiency of latency-sensitive workloads and batch jobs.
Flexible job scheduling mechanism to support workloads in specific areas, e.g., big data, AI, audio and video.
A set of tools for monitoring, troubleshooting and operations.

Node Metrics

Koordinator defines the NodeMetrics CRD, which is used to record the resource utilization of a single node and all Pods on the node. koordlet will regularly report and update NodeMetrics. You can view NodeMetrics with the following commands.

$ kubectl get nodemetrics node-1 -o yaml
apiVersion: slo.koordinator.sh/v1alpha1
kind: NodeMetric
metadata:
  creationTimestamp: "2022-03-30T11:50:17Z"
  generation: 1
  name: node-1
  resourceVersion: "2687986"
  uid: 1567bb4b-87a7-4273-a8fd-f44125c62b80
spec: {}
status:
  nodeMetric:
    nodeUsage:
      resources:
        cpu: 138m
        memory: "1815637738"
  podsMetric:
  - name: storage-service-6c7c59f868-k72r5
    namespace: default
    podUsage:
      resources:
        cpu: "300m"
        memory: 17828Ki

Colocation Resources

After the Koordinator is deployed in the K8s cluster, the Koordinator will calculate the CPU and Memory resources that have been allocated but not used according to the data of NodeMetrics. These resources are updated in Node in the form of extended resources.

koordinator.sh/batch-cpu represents the CPU resources for Best Effort workloads, koordinator.sh/batch-memory represents the Memory resources for Best Effort workloads.

You can view these resources with the following commands.

$ kubectl describe node node-1
Name:               node-1
....
Capacity:
  cpu:                          8
  ephemeral-storage:            103080204Ki
  koordinator.sh/batch-cpu:     4541
  koordinator.sh/batch-memory:  17236565027
  memory:                       32611012Ki
  pods:                         64
Allocatable:
  cpu:                          7800m
  ephemeral-storage:            94998715850
  koordinator.sh/batch-cpu:     4541
  koordinator.sh/batch-memory:  17236565027
  memory:                       28629700Ki
  pods:                         64

Cluster-level Colocation Profile

In order to make it easier for everyone to use Koordinator to co-locate different workloads, we defined ClusterColocationProfile to help gray workloads use co-location resources. A ClusterColocationProfile is CRD like the one below. Please do edit each parameter to fit your own use cases.

apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
  name: colocation-profile-example
spec:
  namespaceSelector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  selector:
    matchLabels:
      sparkoperator.k8s.io/launched-by-spark-operator: "true"
  qosClass: BE
  priorityClassName: koord-batch
  koordinatorPriority: 1000
  schedulerName: koord-scheduler
  labels:
    koordinator.sh/mutated: "true"
  annotations: 
    koordinator.sh/intercepted: "true"
  patch:
    spec:
      terminationGracePeriodSeconds: 30

Various Koordinator components ensure scheduling and runtime quality through labels koordinator.sh/qosClass, koordinator.sh/priority and kubernetes native priority.

With the webhook mutating mechanism provided by Kubernetes, koord-manager will modify Pod resource requirements to co-located resources, and inject the QoS and Priority defined by Koordinator into Pod.

Taking the above Profile as an example, when the Spark Operator creates a new Pod in the namespace with the koordinator.sh/enable-colocation=true label, the Koordinator QoS label koordinator.sh/qosClass will be injected into the Pod. According to the Profile definition PriorityClassName, modify the Pod's PriorityClassName and the corresponding Priority value. Users can also set the Koordinator Priority according to their needs to achieve more fine-grained priority management, so the Koordinator Priority label koordinator.sh/priority is also injected into the Pod. Koordinator provides an enhanced scheduler koord-scheduler, so you need to modify the Pod's scheduler name koord-scheduler through Profile.

If you expect to integrate Koordinator into your own system, please learn more about the core concepts.

CPU Suppress

In order to ensure the runtime quality of different workloads in co-located scenarios, Koordinator uses the CPU Suppress mechanism provided by koordlet on the node side to suppress workloads of the Best Effort type when the load increases. Or increase the resource quota for Best Effort type workloads when the load decreases.

When installing through the helm chart, the ConfigMap slo-controller-config will be created in the koordinator-system namespace, and the CPU Suppress mechanism is enabled by default. If it needs to be closed, refer to the configuration below, and modify the configuration of the resource-threshold-config section to take effect.

apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-controller-config
  namespace: {{ .Values.installation.namespace }}
data:
  ...
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": false
      }
    }

Colocation Resources Balance

Koordinator currently adopts a strategy for node co-location resource scheduling, which prioritizes scheduling to machines with more resources remaining in co-location to avoid Best Effort workloads crowding together. More rich scheduling capabilities are on the way.

Tutorial - Colocation of Spark Jobs

Apache Spark is an analysis engine for large-scale data processing, which is widely used in Big Data, SQL Analysis and Machine Learning scenarios. We provide a tutorial to help you how to quickly use Koordinator to run Spark Jobs in colocation mode with other latency sensitive applications. For more details, please refer to the tutorial.

Summary

Fore More details, please refer to the Documentation. Hope it helps!

Overview​

Key Features​

Node Metrics​

Colocation Resources​

Cluster-level Colocation Profile​

CPU Suppress​

Colocation Resources Balance​

Tutorial - Colocation of Spark Jobs​

Summary​