Skip to main content

· 4 min read
Joseph

We’re pleased to announce the release of Koordinator v0.2.0.

Overview​

Koordinator v0.1.0 implements basic co-location scheduling capabilities, and after the project was released, it has received attention and positive responses from the community. For some issues that everyone cares about, such as how to isolate resources for best-effort workloads, how to ensure the runtime stability of latency-sensitiv applications in co-location scenarios, etc., we have enhanced node-side scheduling capabilities in koordinator v0.2.0 to solve these problems.

Install or Upgrade to Koordinator v0.2.0​

Install with helms​

Koordinator can be simply installed by helm v3.5+, which is a simple command-line tool and you can get it from here.

# Firstly add koordinator charts repository if you haven't do this.
$ helm repo add koordinator-sh https://koordinator-sh.github.io/charts/

# [Optional]
$ helm repo update

# Install the latest version.
$ helm install koordinator koordinator-sh/koordinator --version 0.2.0

Upgrade with helm​

# Firstly add koordinator charts repository if you haven't do this.
$ helm repo add koordinator-sh https://koordinator-sh.github.io/charts/

# [Optional]
$ helm repo update

# Upgrade the latest version.
$ helm upgrade koordinator koordinator-sh/koordinator --version 0.2.0 [--force]

For more details, please refer to the installation manual.

Isolate resources for best-effort workloads​

In Koodinator v0.2.0, we refined the ability to isolate resources for best-effort worklods.

koordlet will set the cgroup parameters according to the resources described in the Pod Spec. Currently supports setting CPU Request/Limit, and Memory Limit.

For CPU resources, only the case of request == limit is supported, and the support for the scenario of request <= limit will be supported in the next version.

Active eviction mechanism based on memory safety thresholds​

When latency-sensitiv applications are serving, memory usage may increase due to bursty traffic. Similarly, there may be similar scenarios for best-effort workloads, for example, the current computing load exceeds the expected resource Request/Limit.

These scenarios will lead to an increase in the overall memory usage of the node, which will have an unpredictable impact on the runtime stability of the node side. For example, it can reduce the quality of service of latency-sensitiv applications or even become unavailable. Especially in a co-location environment, it is more challenging.

We implemented an active eviction mechanism based on memory safety thresholds in Koodinator.

koordlet will regularly check the recent memory usage of node and Pods to check whether the safty threshold is exceeded. If it exceeds, it will evict some best-effort Pods to release memory. This mechanism can better ensure the stability of node and latency-sensitiv applications.

koordlet currently only evicts best-effort Pods, sorted according to the Priority specified in the Pod Spec. The lower the priority, the higher the priority to be evicted, the same priority will be sorted according to the memory usage rate (RSS), the higher the memory usage, the higher the priority to be evicted. This eviction selection algorithm is not static. More dimensions will be considered in the future, and more refined implementations will be implemented for more scenarios to achieve more reasonable evictions.

The current memory utilization safety threshold default value is 70%. You can modify the memoryEvictThresholdPercent in ConfigMap slo-controller-config according to the actual situation,

apiVersion: v1
kind: ConfigMap
metadata:
name: slo-controller-config
namespace: koordinator-system
data:
colocation-config: |
{
"enable": true
}
resource-threshold-config: |
{
"clusterStrategy": {
"enable": true,
"memoryEvictThresholdPercent": 70
}
}

CPU Burst - Improve the performance of latency-sensitive applications​

CPU Burst is a service level objective (SLO)-aware resource scheduling feature. You can use CPU Burst to improve the performance of latency-sensitive applications. CPU scheduling for a container may be throttled by the kernel due to the CPU limit, which downgrades the performance of the application. Koordinator automatically detects CPU throttling events and automatically adjusts the CPU limit to a proper value. This greatly improves the performance of latency-sensitive applications.

The code of CPU Burst has been developed and is still under review and testing. It will be released in the next version. If you want to use this ability early, you are welcome to participate in Koordiantor and improve it together. For more details, please refer to the PR #73.

More​

For more details, please refer to the Documentation. Hope it helps!

· 5 min read
Joseph
Fangsong Zeng

We’re pleased to announce the release of Koordinator v0.1.0.

Overview​

Koordinator is a QoS-based scheduling for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. It aims to improve the runtime efficiency and reliability of both latency sensitive workloads and batch jobs, simplify the complexity of resource-related configuration tuning, and increase pod deployment density to improve resource utilizations.

Key Features​

Koordinator enhances the kubernetes user experiences in the workload management by providing the following:

  • Well-designed priority and QoS mechanism to co-locate different types of workloads in a cluster and run different types of pods on a single node. Allowing for resource overcommitments to achieve high resource utilizations but still satisfying the QoS guarantees by leveraging an application profiling mechanism.
  • Fine-grained resource orchestration and isolation mechanism to improve the efficiency of latency-sensitive workloads and batch jobs.
  • Flexible job scheduling mechanism to support workloads in specific areas, e.g., big data, AI, audio and video.
  • A set of tools for monitoring, troubleshooting and operations.

Node Metrics​

Koordinator defines the NodeMetrics CRD, which is used to record the resource utilization of a single node and all Pods on the node. koordlet will regularly report and update NodeMetrics. You can view NodeMetrics with the following commands.

$ kubectl get nodemetrics node-1 -o yaml
apiVersion: slo.koordinator.sh/v1alpha1
kind: NodeMetric
metadata:
creationTimestamp: "2022-03-30T11:50:17Z"
generation: 1
name: node-1
resourceVersion: "2687986"
uid: 1567bb4b-87a7-4273-a8fd-f44125c62b80
spec: {}
status:
nodeMetric:
nodeUsage:
resources:
cpu: 138m
memory: "1815637738"
podsMetric:
- name: storage-service-6c7c59f868-k72r5
namespace: default
podUsage:
resources:
cpu: "300m"
memory: 17828Ki

Colocation Resources​

After the Koordinator is deployed in the K8s cluster, the Koordinator will calculate the CPU and Memory resources that have been allocated but not used according to the data of NodeMetrics. These resources are updated in Node in the form of extended resources.

koordinator.sh/batch-cpu represents the CPU resources for Best Effort workloads, koordinator.sh/batch-memory represents the Memory resources for Best Effort workloads.

You can view these resources with the following commands.

$ kubectl describe node node-1
Name: node-1
....
Capacity:
cpu: 8
ephemeral-storage: 103080204Ki
koordinator.sh/batch-cpu: 4541
koordinator.sh/batch-memory: 17236565027
memory: 32611012Ki
pods: 64
Allocatable:
cpu: 7800m
ephemeral-storage: 94998715850
koordinator.sh/batch-cpu: 4541
koordinator.sh/batch-memory: 17236565027
memory: 28629700Ki
pods: 64

Cluster-level Colocation Profile​

In order to make it easier for everyone to use Koordinator to co-locate different workloads, we defined ClusterColocationProfile to help gray workloads use co-location resources. A ClusterColocationProfile is CRD like the one below. Please do edit each parameter to fit your own use cases.

apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
name: colocation-profile-example
spec:
namespaceSelector:
matchLabels:
koordinator.sh/enable-colocation: "true"
selector:
matchLabels:
sparkoperator.k8s.io/launched-by-spark-operator: "true"
qosClass: BE
priorityClassName: koord-batch
koordinatorPriority: 1000
schedulerName: koord-scheduler
labels:
koordinator.sh/mutated: "true"
annotations:
koordinator.sh/intercepted: "true"
patch:
spec:
terminationGracePeriodSeconds: 30

Various Koordinator components ensure scheduling and runtime quality through labels koordinator.sh/qosClass, koordinator.sh/priority and kubernetes native priority.

With the webhook mutating mechanism provided by Kubernetes, koord-manager will modify Pod resource requirements to co-located resources, and inject the QoS and Priority defined by Koordinator into Pod.

Taking the above Profile as an example, when the Spark Operator creates a new Pod in the namespace with the koordinator.sh/enable-colocation=true label, the Koordinator QoS label koordinator.sh/qosClass will be injected into the Pod. According to the Profile definition PriorityClassName, modify the Pod's PriorityClassName and the corresponding Priority value. Users can also set the Koordinator Priority according to their needs to achieve more fine-grained priority management, so the Koordinator Priority label koordinator.sh/priority is also injected into the Pod. Koordinator provides an enhanced scheduler koord-scheduler, so you need to modify the Pod's scheduler name koord-scheduler through Profile.

If you expect to integrate Koordinator into your own system, please learn more about the core concepts.

CPU Suppress​

In order to ensure the runtime quality of different workloads in co-located scenarios, Koordinator uses the CPU Suppress mechanism provided by koordlet on the node side to suppress workloads of the Best Effort type when the load increases. Or increase the resource quota for Best Effort type workloads when the load decreases.

When installing through the helm chart, the ConfigMap slo-controller-config will be created in the koordinator-system namespace, and the CPU Suppress mechanism is enabled by default. If it needs to be closed, refer to the configuration below, and modify the configuration of the resource-threshold-config section to take effect.

apiVersion: v1
kind: ConfigMap
metadata:
name: slo-controller-config
namespace: {{ .Values.installation.namespace }}
data:
...
resource-threshold-config: |
{
"clusterStrategy": {
"enable": false
}
}

Colocation Resources Balance​

Koordinator currently adopts a strategy for node co-location resource scheduling, which prioritizes scheduling to machines with more resources remaining in co-location to avoid Best Effort workloads crowding together. More rich scheduling capabilities are on the way.

Tutorial - Colocation of Spark Jobs​

Apache Spark is an analysis engine for large-scale data processing, which is widely used in Big Data, SQL Analysis and Machine Learning scenarios. We provide a tutorial to help you how to quickly use Koordinator to run Spark Jobs in colocation mode with other latency sensitive applications. For more details, please refer to the tutorial.

Summary​

Fore More details, please refer to the Documentation. Hope it helps!