Version: v1.4

Capacity Scheduling - Elastic Quota Management

Capacity Scheduling is a capability of koord-scheduler for managing the resource usage of different users in a shared cluster.

Introduction

When several users or teams share a cluster, fair resource allocation is very important. Koordinator provides a multi-hierarchy elastic quota management mechanism for the scheduler.

  • It supports configuring quota groups in a tree structure, which is similar to the organizational structure of most companies.
  • It supports borrowing and returning resources between quota groups for better resource utilization. Busy quota groups can automatically and temporarily borrow resources from idle quota groups, which improves cluster utilization. When an idle quota group becomes busy, it automatically takes back the resources it lent out.
  • It considers resource fairness between quota groups. When busy quota groups borrow resources from idle quota groups, the borrowed resources are allocated to them according to fairness rules.
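
The borrowing and returning behaviour described above can be illustrated with a small model. This is a minimal sketch and not Koordinator's actual algorithm: it assumes a single resource, gives each group min(request, min) first, and then hands out the remaining capacity in proportion to a per-group weight (assumed positive), never exceeding min(request, max). The names QuotaGroup and compute_runtime are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuotaGroup:
    name: str
    min: float      # guaranteed amount
    max: float      # upper bound
    request: float  # what the group's pods currently ask for
    weight: float   # share used when splitting borrowed resources (must be > 0)


def compute_runtime(groups, capacity):
    """Return {name: runtime} for one resource under the simplified model."""
    # Step 1: every group first receives its guaranteed part.
    runtime = {g.name: min(g.request, g.min) for g in groups}
    remaining = capacity - sum(runtime.values())

    # Step 2: distribute what is left among groups that still want more.
    limit = {g.name: min(g.request, g.max) for g in groups}
    active = [g for g in groups if runtime[g.name] < limit[g.name]]
    while remaining > 1e-9 and active:
        total_weight = sum(g.weight for g in active)
        share = {g.name: remaining * g.weight / total_weight for g in active}
        still_active = []
        for g in active:
            grant = min(share[g.name], limit[g.name] - runtime[g.name])
            runtime[g.name] += grant
            remaining -= grant
            if runtime[g.name] < limit[g.name] - 1e-9:
                still_active.append(g)
        if len(still_active) == len(active):
            break  # nobody hit its limit, so everything was handed out
        active = still_active
    return runtime


busy = QuotaGroup("a", min=10, max=40, request=30, weight=1)
idle = QuotaGroup("b", min=10, max=40, request=2, weight=1)
print(compute_runtime([busy, idle], capacity=20))

# Once "b" becomes busy as well, both groups fall back to their min.
idle.request = 30
print(compute_runtime([busy, idle], capacity=20))
```

With a capacity of 20, group a first borrows the 8 units that b leaves unused, and gives them back as soon as b requests more than its min.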

Setup

Prerequisite

  • Kubernetes >= 1.18
  • Koordinator >= 0.71

Installation

Please make sure Koordinator components are correctly installed in your cluster. If not, please refer to Installation.

Configurations

Capacity Scheduling is enabled by default. You can use it without any modification to the koord-scheduler config.

Use Capacity-Scheduling

Quick Start by Label

1. Create an ElasticQuota quota-example with the YAML file below.

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/parent: ""
    quota.scheduling.koordinator.sh/is-parent: "false"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

$ kubectl apply -f quota-example.yaml
elasticquota.scheduling.sigs.k8s.io/quota-example created

$ kubectl get eqs -n default
NAME            AGE
quota-example   2s

2. Create a pod pod-example with the YAML file below.

apiVersion: v1
kind: Pod
metadata:
  name: pod-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/name: "quota-example"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl apply -f pod-example.yaml
pod/pod-example created

3. Verify that quota-example has changed.

$ kubectl get eqs -n default quota-example -o yaml
kind: ElasticQuota
metadata:
  annotations:
    quota.scheduling.koordinator.sh/request: '{"cpu":"40m","memory":"40Mi"}'
    quota.scheduling.koordinator.sh/runtime: '{"cpu":"40m","memory":"40Mi"}'
    quota.scheduling.koordinator.sh/shared-weight: '{"cpu":"40","memory":"40Gi"}'
  creationTimestamp: "2022-10-08T09:26:38Z"
  generation: 2
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: root
    manager: koord-scheduler
    operation: Update
    time: "2022-10-08T09:26:50Z"
  name: quota-example
  namespace: default
  resourceVersion: "39012008"
spec:
  max:
    cpu: "40"
    memory: 40Gi
  min:
    cpu: "10"
    memory: 20Mi
status:
  used:
    cpu: 40m
    memory: 40Mi

Quick Start by Namespace

1. Create a namespace.

$ kubectl create ns quota-example
namespace/quota-example created

2. Create an ElasticQuota quota-example with the YAML file below.

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: quota-example
  labels:
    quota.scheduling.koordinator.sh/parent: ""
    quota.scheduling.koordinator.sh/is-parent: "false"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

$ kubectl apply -f quota-example.yaml
elasticquota.scheduling.sigs.k8s.io/quota-example created

$ kubectl get eqs -n quota-example
NAME            AGE
quota-example   2s

3. Create a pod pod-example with the YAML file below.

apiVersion: v1
kind: Pod
metadata:
  name: pod-example
  namespace: quota-example
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl apply -f pod-example.yaml
pod/pod-example created

4. Verify that quota-example has changed.

$ kubectl get eqs -n quota-example quota-example -o yaml
kind: ElasticQuota
metadata:
  annotations:
    quota.scheduling.koordinator.sh/request: '{"cpu":"40m","memory":"40Mi"}'
    quota.scheduling.koordinator.sh/runtime: '{"cpu":"40m","memory":"40Mi"}'
    quota.scheduling.koordinator.sh/shared-weight: '{"cpu":"40","memory":"40Gi"}'
  creationTimestamp: "2022-10-08T09:26:38Z"
  generation: 2
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: root
    manager: koord-scheduler
    operation: Update
    time: "2022-10-08T09:26:50Z"
  name: quota-example
  namespace: quota-example
  resourceVersion: "39012008"
spec:
  max:
    cpu: "40"
    memory: 40Gi
  min:
    cpu: "10"
    memory: 20Mi
status:
  used:
    cpu: 40m
    memory: 40Mi

Quota Debug API

$ kubectl -n koordinator-system get lease koord-scheduler --no-headers | awk '{print $2}' | cut -d'_' -f1 | xargs -I {} kubectl -n koordinator-system get pod {} -o wide --no-headers | awk '{print $6}'
10.244.0.64

$ curl 10.244.0.64:10251/apis/v1/plugins/ElasticQuota/quota/quota-example
{
  "allowLentResource": true,
  "autoScaleMin": {
    "cpu": "10",
    "memory": "20Mi"
  },
  "isParent": false,
  "max": {
    "cpu": "40",
    "memory": "40Gi"
  },
  "min": {
    "cpu": "10",
    "memory": "20Mi"
  },
  "name": "quota-example",
  "parentName": "root",
  "podCache": {
    "pod-example": {
      "isAssigned": true,
      "resource": {
        "cpu": "40m",
        "memory": "40Mi"
      }
    }
  },
  "request": {
    "cpu": "40m",
    "memory": "40Mi"
  },
  "runtime": {
    "cpu": "40m",
    "memory": "41943040"
  },
  "runtimeVersion": 39,
  "sharedWeight": {
    "cpu": "40",
    "memory": "40Gi"
  },
  "used": {
    "cpu": "40m",
    "memory": "40Mi"
  }
}

The main difference from the YAML output is that podCache lists every pod in the quota group together with its status.
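
The debug response mixes notations: runtime.memory is reported as "41943040" while used.memory is "40Mi". To compare such values in a script, they first have to be normalized. The following is a minimal sketch that assumes only a subset of the Kubernetes quantity suffixes; parse_quantity and over_runtime are hypothetical helpers, not part of Koordinator.

```python
import json
from fractions import Fraction

# Subset of Kubernetes quantity suffixes; the full grammar has more forms.
_SUFFIXES = {
    "Ki": Fraction(1024),
    "Mi": Fraction(1024) ** 2,
    "Gi": Fraction(1024) ** 3,
    "Ti": Fraction(1024) ** 4,
    "m": Fraction(1, 1000),
    "k": Fraction(1000),
    "M": Fraction(1000) ** 2,
    "G": Fraction(1000) ** 3,
}


def parse_quantity(text):
    """Convert strings such as "40m", "40Mi" or "41943040" to an exact Fraction."""
    # Try two-character suffixes before one-character ones ("Mi" before "M").
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


def over_runtime(quota):
    """Return the resource names whose used amount exceeds runtime."""
    used, runtime = quota["used"], quota["runtime"]
    return [
        name
        for name in used
        if parse_quantity(used[name]) > parse_quantity(runtime.get(name, "0"))
    ]


# Trimmed-down copy of the response shown above.
sample = json.loads(
    '{"runtime": {"cpu": "40m", "memory": "41943040"},'
    ' "used": {"cpu": "40m", "memory": "40Mi"},'
    ' "podCache": {"pod-example": {"isAssigned": true}}}'
)
print(over_runtime(sample))        # []
print(sorted(sample["podCache"]))  # ['pod-example']
```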

Advanced Configurations

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: default
  labels:
    # Label values must be strings, so booleans are quoted.
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: "parent"
    quota.scheduling.koordinator.sh/allow-lent-resource: "true"
  annotations:
    quota.scheduling.koordinator.sh/shared-weight: '{"cpu":"40","memory":"40Gi"}'
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

  • quota.scheduling.koordinator.sh/is-parent is set by the user. It indicates whether the quota group is a child or a parent. Default is child.
  • quota.scheduling.koordinator.sh/parent is set by the user. It gives the name of the parent quota. Default is root.
  • quota.scheduling.koordinator.sh/shared-weight is set by the user. It reflects the group's ability to compete for shared ("lent") resources. Defaults to the value of "max".
  • quota.scheduling.koordinator.sh/allow-lent-resource is set by the user. It indicates whether the quota group allows its unused "min" to be lent to others.
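
The defaults listed above can be summarized in a few lines. This is a minimal sketch, assuming the labels, annotations and spec.max of an ElasticQuota are available as plain dictionaries; effective_settings is a hypothetical helper. This section states no default for allow-lent-resource, so the sketch reports a missing label as None instead of guessing.

```python
import json

PREFIX = "quota.scheduling.koordinator.sh/"


def effective_settings(labels, annotations, spec_max):
    """Fill in the documented defaults for one ElasticQuota (illustrative only)."""
    raw_weight = annotations.get(PREFIX + "shared-weight")
    allow_lent = labels.get(PREFIX + "allow-lent-resource")
    return {
        # Default is "child", i.e. is-parent=false.
        "is_parent": labels.get(PREFIX + "is-parent", "false") == "true",
        # An empty or missing parent label means the quota hangs under root.
        "parent": labels.get(PREFIX + "parent") or "root",
        # shared-weight falls back to spec.max.
        "shared_weight": json.loads(raw_weight) if raw_weight else dict(spec_max),
        # No default is stated in this section, so absence is reported as None.
        "allow_lent_resource": None if allow_lent is None else allow_lent == "true",
    }


settings = effective_settings(
    labels={PREFIX + "parent": ""},
    annotations={},
    spec_max={"cpu": "40", "memory": "40Gi"},
)
print(settings["parent"])         # root
print(settings["shared_weight"])  # {'cpu': '40', 'memory': '40Gi'}
```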

WebHook Verify

1. Except for first-level quota groups, the sum of "min" of all child quota groups must be less than or equal to the "min" of the parent group.

First create the parent quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-parent-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "true"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

Then create the child quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: "quota-parent-example"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 20
    memory: 20Mi

$ kubectl apply -f quota-example.yaml
Error from server: error when creating "quota-example.yaml": admission webhook "vquota.kb.io" denied the request: checkMinQuotaSum allChildren SumMinQuota > parentMinQuota, parent: quota-parent-example
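
The rule behind this rejection can be sketched as follows. This is an illustration under simplifying assumptions, not the webhook's implementation: quantities are already converted to plain numbers (CPU cores, memory bytes), a resource missing from the parent counts as 0, and check_min_quota_sum is a hypothetical name.

```python
def check_min_quota_sum(parent_min, children_min):
    """Return the resources for which the children's min sum exceeds the parent's min."""
    totals = {}
    for child in children_min:
        for resource, amount in child.items():
            totals[resource] = totals.get(resource, 0) + amount
    return sorted(r for r, total in totals.items() if total > parent_min.get(r, 0))


MI = 1024 * 1024
parent = {"cpu": 10, "memory": 20 * MI}
child = {"cpu": 20, "memory": 20 * MI}
print(check_min_quota_sum(parent, [child]))  # ['cpu']
```

In the example above the child asks for a CPU min of 20 while the parent only guarantees 10, so cpu is reported as violating the rule.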

2. The parent and the child must use the same resource keys in "min" and "max".

First create the parent quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-parent-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "true"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

Then create the child quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: "quota-parent-example"
spec:
  max:
    cpu: 40
    memory: 40Gi
    test: 200
  min:
    cpu: 10
    memory: 20Mi

$ kubectl apply -f quota-example.yaml
Error from server: error when creating "quota-example.yaml": admission webhook "vquota.kb.io" denied the request: checkSubAndParentGroupMaxQuotaKeySame failed: quota-parent-example's key is not the same with quota-example
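
The key comparison can be sketched in a few lines. This is a minimal illustration, assuming each spec is a dictionary with "max" and "min" entries; key_mismatch is a hypothetical helper, and the real webhook may compare keys differently.

```python
def key_mismatch(parent_spec, child_spec):
    """Return {field: keys present on only one side} for "max" and "min"."""
    mismatch = {}
    for field in ("max", "min"):
        diff = set(parent_spec.get(field, {})) ^ set(child_spec.get(field, {}))
        if diff:
            mismatch[field] = sorted(diff)
    return mismatch


parent = {"max": {"cpu": 40, "memory": "40Gi"}, "min": {"cpu": 10, "memory": "20Mi"}}
child = {
    "max": {"cpu": 40, "memory": "40Gi", "test": 200},
    "min": {"cpu": 10, "memory": "20Mi"},
}
print(key_mismatch(parent, child))  # {'max': ['test']}
```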

3. A parent group cannot run pods.

First create the parent quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-parent-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "true"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

Then create the pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/name: "quota-parent-example"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl apply -f pod-example.yaml
Error from server: error when creating "pod-example.yaml": admission webhook "vpod.kb.io" denied the request: pod can not be linked to a parentQuotaGroup,quota:quota-parent-example, pod:pod-example

4. The parent of a quota group must itself be a parent group, not a child group.

First create the quota that will be referenced as parent, but mark it as a child:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-parent-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

Then create the child quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "false"
    quota.scheduling.koordinator.sh/parent: "quota-parent-example"
spec:
  max:
    cpu: 40
    memory: 40Gi
    test: 200
  min:
    cpu: 10
    memory: 20Mi

$ kubectl apply -f quota-example.yaml
Error from server: error when creating "quota-example.yaml": admission webhook "vquota.kb.io" denied the request: quota-example has parentName quota-parent-example but the parentQuotaInfo's IsParent is false

5. A quota group cannot be switched between parent group and child group after it has been created.

First create the parent quota:

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-parent-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/is-parent: "true"
spec:
  max:
    cpu: 40
    memory: 40Gi
  min:
    cpu: 10
    memory: 20Mi

Then change quota.scheduling.koordinator.sh/is-parent to "false" in the file and apply it again:

$ kubectl apply -f quota-parent-example.yaml
elastic-quota-example_xb_parent.yaml": admission webhook "vquota.kb.io" denied the request: IsParent is forbidden modify now, quotaName:quota-parent-example
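
Rules 3 to 5 above are simple structural checks on the quota tree. The following is a minimal sketch, assuming quotas are kept in a dictionary keyed by name with an is_parent flag. The function names, the quota-leaf entry and the returned messages are hypothetical and do not match the webhook's real output.

```python
def check_pod(quotas, quota_name):
    """Rule 3: a pod may not be attached to a parent quota group."""
    if quotas[quota_name]["is_parent"]:
        return "pod can not be linked to a parent quota group"
    return None


def check_parent(quotas, parent_name):
    """Rule 4: the quota named as parent must itself be a parent group."""
    if parent_name != "root" and not quotas[parent_name]["is_parent"]:
        return "parent quota is not a parent group"
    return None


def check_update(old, new):
    """Rule 5: is-parent is immutable once the quota exists."""
    if old["is_parent"] != new["is_parent"]:
        return "is-parent can not be modified"
    return None


quotas = {
    "quota-parent-example": {"is_parent": True},
    "quota-leaf": {"is_parent": False},  # hypothetical child quota
}
print(check_pod(quotas, "quota-parent-example"))  # rejected: parent group
print(check_parent(quotas, "quota-leaf"))         # rejected: not a parent group
print(check_update({"is_parent": True}, {"is_parent": False}))  # rejected
```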

used > runtime revoke

We offer a config option that controls what happens when a quota group's used exceeds its runtime: when enabled, the scheduler is allowed to delete pods of the over-using quota group, from low priority to high priority. To enable it, follow the configuration of koord-scheduler-config.yaml in Helm below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-scheduler-config
  namespace: {{ .Values.installation.namespace }}
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: koord-scheduler
      resourceNamespace: {{ .Values.installation.namespace }}
    profiles:
      - pluginConfig:
        - name: ElasticQuota
          args:
            apiVersion: kubescheduler.config.k8s.io/v1beta2
            kind: ElasticQuotaArgs
            quotaGroupNamespace: {{ .Values.installation.namespace }}
            enableCheckParentQuota: true
            monitorAllQuotas: true
            revokePodInterval: 60s
            delayEvictTime: 300s
        plugins:
          queueSort:
            disabled:
              - name: "*"
            enabled:
              - name: Coscheduling
          preFilter:
            enabled:
              - name: NodeNUMAResource
              - name: DeviceShare
              - name: Reservation
              - name: Coscheduling
              - name: ElasticQuota
          filter:
          ...

  • enableCheckParentQuota: check the used and runtime quota of parent quota groups. Default is false.
  • monitorAllQuotas: enable the "used > runtime revoke" logic. Default is false.
  • revokePodInterval: the interval of the check loop.
  • delayEvictTime: eviction is only triggered once "used > runtime" has lasted longer than delayEvictTime.
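
How these settings interact can be sketched with a small simulation. This is a simplified model and not the scheduler's code: it assumes a single resource, times given in seconds, and a hypothetical Pod record. pods_to_revoke stands for one pass of the check loop that would run every revokePodInterval.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int
    request: float
    preemptible: bool = True


def pods_to_revoke(pods, used, runtime, over_since, now, delay_evict_time=300.0):
    """Pick pods to evict once used > runtime has lasted longer than the delay.

    over_since is the time at which used first exceeded runtime, or None.
    """
    if used <= runtime or over_since is None:
        return []
    if now - over_since < delay_evict_time:
        return []  # still inside the grace period
    victims = []
    # Lowest priority first, stopping as soon as usage fits runtime again.
    for pod in sorted(pods, key=lambda p: p.priority):
        if used <= runtime:
            break
        if not pod.preemptible:
            continue
        victims.append(pod.name)
        used -= pod.request
    return victims


pods = [Pod("high", 100, 4), Pod("low", 10, 4), Pod("mid", 50, 4)]
print(pods_to_revoke(pods, used=12, runtime=6, over_since=0, now=60))   # []
print(pods_to_revoke(pods, used=12, runtime=6, over_since=0, now=400))  # ['low', 'mid']
```

The first call returns nothing because the overuse has only lasted 60 seconds. The second call is past the 300 second delay and selects the two lowest-priority pods, which is enough to bring used back under runtime.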

For the scheduler to be able to delete the pods, configure rbac/koord-scheduler.yaml in Helm as below.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: koord-scheduler-role
rules:
{{- if semverCompare "<= 1.20-0" .Capabilities.KubeVersion.Version }}
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
{{- end }}
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - patch
  - update
  - delete
- apiGroups:
  - ""
  resources:
  - pods/eviction
  verbs:
  - create
- apiGroups:
  ...

To prevent Pods from being revoked, you can add the label quota.scheduling.koordinator.sh/preemptible: "false" to the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-example
  namespace: default
  labels:
    quota.scheduling.koordinator.sh/name: "quota-example"
    quota.scheduling.koordinator.sh/preemptible: "false"
spec:
...

In this case, the Pod is not allowed to use resources exceeding the Min. Since the "Min" resources are the guaranteed resources, the Pod will not be evicted.
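
The constraint on non-preemptible Pods can be sketched as an admission check. This is an illustration only: it assumes that the amounts already used by non-preemptible Pods are tracked per resource as plain numbers, which is an assumption about the bookkeeping and is not stated in this document. fits_in_min is a hypothetical name.

```python
def fits_in_min(min_quota, non_preemptible_used, pod_request):
    """True when the pod still fits into the guaranteed min for every resource it requests."""
    return all(
        non_preemptible_used.get(resource, 0) + amount <= min_quota.get(resource, 0)
        for resource, amount in pod_request.items()
    )


min_quota = {"cpu": 10.0}
print(fits_in_min(min_quota, {"cpu": 9.5}, {"cpu": 0.25}))  # True
print(fits_in_min(min_quota, {"cpu": 9.5}, {"cpu": 1.0}))   # False
```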