Version: v1.5 🚧

GangScheduling

Introduction​

Koordinator provides a gang mechanism that lets the scheduler control when pods are bound. Users declare a minimum number of pods whose resources must be collected, and binding is triggered only once the assigned resources reach that minimum. A configuration option selects Strict or NonStrict mode to control how resources are accumulated. Koordinator also provides a two-level gang description to better match real scenarios, which differs from the community implementation.

Setup​

Prerequisite​

  • Kubernetes >= 1.18
  • Koordinator >= 0.7.0

Installation​

Please make sure Koordinator components are correctly installed in your cluster. If not, please refer to Installation.

Configurations​

GangScheduling is enabled by default. You can use it without modifying the koord-scheduler config.

Use GangScheduling​

Quick Start​

apply gang through gang crd​

1. Create the pod-group

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-example
  namespace: default
spec:
  scheduleTimeoutSeconds: 100
  minMember: 2

$ kubectl get pgs -n default
NAME           AGE
gang-example   13s

2. Create child pod1

apiVersion: v1
kind: Pod
metadata:
  name: pod-example1
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: gang-example
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl get pod -n default
NAME           READY   STATUS    RESTARTS   AGE
pod-example1   0/1     Pending   0          7s

3. Create child pod2

apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: gang-example
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl get pod -n default
NAME           READY   STATUS    RESTARTS   AGE
pod-example1   1/1     Running   0          53s
pod-example2   1/1     Running   0          5s

$ kubectl get pg gang-example -n default -o yaml
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2022-10-09T09:08:17Z"
  generation: 6
spec:
  minMember: 1
  scheduleTimeoutSeconds: 100
status:
  phase: Running
  running: 2
  scheduled: 2

apply gang through annotation​

1. Create child pod1

apiVersion: v1
kind: Pod
metadata:
  name: pod-example1
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example"
    gang.scheduling.koordinator.sh/min-available: "2"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl get pod -n default
NAME           READY   STATUS    RESTARTS   AGE
pod-example1   0/1     Pending   0          7s

2. Create child pod2

apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example"
    gang.scheduling.koordinator.sh/min-available: "2"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

$ kubectl get pod -n default
NAME           READY   STATUS    RESTARTS   AGE
pod-example1   1/1     Running   0          53s
pod-example2   1/1     Running   0          5s

$ kubectl get pg gang-example -n default -o yaml
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2022-10-09T09:08:17Z"
  generation: 6
spec:
  minMember: 1
  scheduleTimeoutSeconds: 100
status:
  phase: Running
  running: 2
  scheduled: 2

gang scheduling debug api:

$ kubectl -n koordinator-system get lease koord-scheduler --no-headers | awk '{print $2}' | cut -d'_' -f1 | xargs -I {} kubectl -n koordinator-system get pod {} -o wide --no-headers | awk '{print $6}'
10.244.0.64

$ curl 10.244.0.64:10251/apis/v1/plugins/Coscheduling/gang/default/gang-example
{
  "boundChildren": {
    "default/pod-example1": {},
    "default/pod-example2": {}
  },
  "children": {
    "default/pod-example1": {},
    "default/pod-example2": {}
  },
  "childrenScheduleRoundMap": {
    "default/pod-example1": 2,
    "default/pod-example2": 2
  },
  "createTime": "2022-10-09T07:31:53Z",
  "gangFrom": "GangFromPodAnnotation",
  "gangGroup": null,
  "hasGangInit": true,
  "minRequiredNumber": 2,
  "mode": "Strict",
  "name": "default/gang-example",
  "onceResourceSatisfied": true,
  "scheduleCycle": 2,
  "scheduleCycleValid": true,
  "totalChildrenNum": 2,
  "waitTime": 600000000000,
  "waitingForBindChildren": {}
}

advanced configuration for gang​

1. Apply through pod-group.

apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-example1
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/total-number: "3"
    gang.scheduling.koordinator.sh/mode: "NonStrict"
    gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
spec:
  scheduleTimeoutSeconds: 100
  minMember: 2

  • gang.scheduling.koordinator.sh/total-number specifies the total number of children in the gang. If not specified, it defaults to minMember.
  • gang.scheduling.koordinator.sh/mode defines how the gang behaves when scheduling fails. Supported values are Strict and NonStrict; the default is Strict.
  • gang.scheduling.koordinator.sh/groups defines which gangs are bundled as a group. A gang proceeds to binding only when all gangs in its group meet their conditions.
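
To illustrate the groups annotation, here is a minimal sketch of the second gang named in the list above. It assumes that every gang in a group carries the same gang.scheduling.koordinator.sh/groups value. The name gang-example2 comes from that annotation; the scheduleTimeoutSeconds and minMember values are placeholders chosen for illustration.

```yaml
# Hypothetical companion PodGroup for the gang group shown above.
# Assumption: each member gang declares the same groups list.
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-example2
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
spec:
  scheduleTimeoutSeconds: 100   # placeholder value
  minMember: 1                  # placeholder value
```

Under that assumption, pods of gang-example1 would be expected to bind only once gang-example2 has also reached its own minimum.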

2. Apply through pod annotations.

apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example1"
    gang.scheduling.koordinator.sh/min-available: "2"
    gang.scheduling.koordinator.sh/total-number: "3"
    gang.scheduling.koordinator.sh/mode: "NonStrict"  # "Strict" (default) or "NonStrict"
    gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
    gang.scheduling.koordinator.sh/waiting-time: "100s"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always

  • gang.scheduling.koordinator.sh/total-number specifies the total number of children in the gang. If not specified, it defaults to the value of gang.scheduling.koordinator.sh/min-available.
  • gang.scheduling.koordinator.sh/mode defines how the gang behaves when scheduling fails. Supported values are Strict and NonStrict; the default is Strict.
  • gang.scheduling.koordinator.sh/groups defines which gangs are bundled as a group. A gang proceeds to binding only when all gangs in its group meet their conditions.
  • gang.scheduling.koordinator.sh/waiting-time specifies the gang's maximum wait time in the Permit stage.
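
Because these annotations live on the pod, a workload controller can stamp them onto every replica through its pod template. The sketch below is illustrative only and is not taken from the Koordinator documentation. The name gang-deploy-example (used for the Deployment, its app label, and the gang) is hypothetical. It relies on standard Kubernetes behavior, where annotations under spec.template.metadata are copied to each created pod.

```yaml
# Hypothetical Deployment whose replicas all join one annotation-defined gang.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gang-deploy-example            # hypothetical name
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gang-deploy-example
  template:
    metadata:
      labels:
        app: gang-deploy-example
      annotations:
        # Every replica carries the same gang name, so they form one gang.
        gang.scheduling.koordinator.sh/name: "gang-deploy-example"
        gang.scheduling.koordinator.sh/min-available: "2"
        gang.scheduling.koordinator.sh/total-number: "3"
        gang.scheduling.koordinator.sh/mode: "NonStrict"
        gang.scheduling.koordinator.sh/waiting-time: "100s"
    spec:
      schedulerName: koord-scheduler
      containers:
      - name: curlimage
        image: busybox
        command: ["sleep", "365d"]
        resources:
          requests:
            cpu: 40m
            memory: 40Mi
          limits:
            cpu: 40m
            memory: 40Mi
```

With min-available set to "2" and three replicas, the expectation is that no replica binds until at least two of them can be placed.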

advanced configuration for scheduler​

You can modify koord-scheduler-config.yaml in the Helm chart to adjust the Coscheduling configuration, as shown below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-scheduler-config
  namespace: {{ .Values.installation.namespace }}
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: koord-scheduler
      resourceNamespace: {{ .Values.installation.namespace }}
    profiles:
      - pluginConfig:
        - name: Coscheduling
          args:
            apiVersion: kubescheduler.config.k8s.io/v1beta2
            kind: CoschedulingArgs
            defaultTimeout: 600s
            controllerWorkers: 1
        - name: ElasticQuota
          ...