GangScheduling
Introduction
We provide a Gang mechanism that lets the scheduler control when pods are bound. Users declare a minimum number of children for a gang,
and binding is triggered only after that many children have been assigned resources. Two modes, Strict and NonStrict, selected by
configuration, control the resource-accumulation process. We also provide a two-level Gang description that better matches
real scenarios; this differs from the community Coscheduling implementation.
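The binding trigger described above can be pictured with a minimal Python sketch. It is not Koordinator's implementation (the real logic lives in the Go Coscheduling plugin); the class GangSketch and its methods are hypothetical, and the failure behaviour of the two modes is an assumption for illustration: here Strict rejects every waiting child when one child fails, while NonStrict leaves the waiting children in place.

```python
from dataclasses import dataclass, field


@dataclass
class GangSketch:
    """Hypothetical gang model: children wait until min_required of them are assigned."""
    name: str
    min_required: int
    mode: str = "Strict"  # assumed values: "Strict" or "NonStrict"
    waiting: list = field(default_factory=list)
    bound: list = field(default_factory=list)

    def on_assigned(self, pod: str) -> list:
        """Record a child that found a node; return the pods released for binding."""
        self.waiting.append(pod)
        # Once the minimum has been reached, later children are released immediately.
        if len(self.bound) + len(self.waiting) >= self.min_required:
            released, self.waiting = self.waiting, []
            self.bound.extend(released)
            return released
        return []  # still accumulating: keep waiting

    def on_failed(self, pod: str) -> list:
        """Handle a child that failed scheduling; return the waiting pods that get rejected."""
        if self.mode == "Strict":
            # Assumption: Strict rejects all waiting children and restarts accumulation.
            rejected, self.waiting = self.waiting, []
            return rejected
        # Assumption: NonStrict keeps the already-waiting children.
        return []


if __name__ == "__main__":
    gang = GangSketch("default/gang-example", min_required=2)
    print(gang.on_assigned("default/pod-example1"))  # []
    print(gang.on_assigned("default/pod-example2"))  # ['default/pod-example1', 'default/pod-example2']
```

This mirrors the Quick Start below: the first child stays Pending until the second one arrives, after which both are bound together.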
Setup
Prerequisite
- Kubernetes >= 1.18
- Koordinator >= 0.7.0
Installation
Please make sure Koordinator components are correctly installed in your cluster. If not, please refer to Installation.
Configurations
GangScheduling is enabled by default. You can use it without any modification to the koord-scheduler config.
Use GangScheduling
Quick Start
Apply gang through the PodGroup CRD
1. Create a PodGroup
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-example
  namespace: default
spec:
  scheduleTimeoutSeconds: 100
  minMember: 2
$ kubectl get pgs -n default
  NAME           AGE
  gang-example   13s
2. Create child pod1
apiVersion: v1
kind: Pod
metadata:
  name: pod-example1
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: gang-example
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
$ kubectl get pod -n default
  NAME           READY   STATUS    RESTARTS   AGE
  pod-example1   0/1     Pending   0          7s
3. Create child pod2
apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: gang-example
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
$ kubectl get pod -n default
  NAME           READY   STATUS    RESTARTS   AGE
  pod-example1   1/1     Running   0          53s
  pod-example2   1/1     Running   0          5s
$ kubectl get pg gang-example -n default -o yaml
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2022-10-09T09:08:17Z"
  generation: 6
spec:
  minMember: 2
  scheduleTimeoutSeconds: 100
status:
  phase: Running
  running: 2
  scheduled: 2
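If you create gangs from a script rather than by hand, the two things that must line up are the PodGroup name and the pod-group.scheduling.sigs.k8s.io label on every child. The Python sketch below builds a trimmed version of the PodGroup and child pods from the steps above and prints them as a JSON List, which kubectl apply -f accepts because JSON is valid YAML. The helper names are hypothetical, and fields such as imagePullPolicy are omitted for brevity.

```python
import json


def pod_group(name: str, namespace: str, min_member: int, timeout_s: int = 100) -> dict:
    """PodGroup equivalent to step 1 above."""
    return {
        "apiVersion": "scheduling.sigs.k8s.io/v1alpha1",
        "kind": "PodGroup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"scheduleTimeoutSeconds": timeout_s, "minMember": min_member},
    }


def child_pod(pod_name: str, group: str, namespace: str) -> dict:
    """Child pod joined to the gang through the pod-group label (steps 2 and 3)."""
    resources = {"cpu": "40m", "memory": "40Mi"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "namespace": namespace,
            "labels": {"pod-group.scheduling.sigs.k8s.io": group},
        },
        "spec": {
            "schedulerName": "koord-scheduler",
            "containers": [{
                "name": "curlimage",
                "image": "busybox",
                "command": ["sleep", "365d"],
                "resources": {"limits": dict(resources), "requests": dict(resources)},
            }],
        },
    }


def gang_manifests(group: str, namespace: str, size: int) -> dict:
    """One PodGroup with minMember=size plus `size` labeled child pods."""
    items = [pod_group(group, namespace, min_member=size)]
    items += [child_pod(f"pod-example{i}", group, namespace) for i in range(1, size + 1)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Pipe the output into: kubectl apply -f -
    print(json.dumps(gang_manifests("gang-example", "default", 2), indent=2))
```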
Apply gang through annotations
1. Create child pod1
apiVersion: v1
kind: Pod
metadata:
  name: pod-example1
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example"
    gang.scheduling.koordinator.sh/min-available: "2"  
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
$ kubectl get pod -n default
  NAME           READY   STATUS    RESTARTS   AGE
  pod-example1   0/1     Pending   0          7s
2. Create child pod2
apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example"
    gang.scheduling.koordinator.sh/min-available: "2"  
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
$ kubectl get pod -n default
  NAME           READY   STATUS    RESTARTS   AGE
  pod-example1   1/1     Running   0          53s
  pod-example2   1/1     Running   0          5s
$ kubectl get pg gang-example -n default -o yaml
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  creationTimestamp: "2022-10-09T09:08:17Z"
  generation: 6
spec:
  minMember: 2
  scheduleTimeoutSeconds: 100
status:
  phase: Running
  running: 2
  scheduled: 2
Gang debug API: find the IP of the koord-scheduler leader pod, then query the Coscheduling debug endpoint for the gang:
$ kubectl -n koordinator-system get lease koord-scheduler --no-headers | awk '{print $2}' | cut -d'_' -f1 | xargs -I {} kubectl -n koordinator-system get pod {} -o wide --no-headers | awk '{print $6}'
  10.244.0.64
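The pipeline above reads the HOLDER column of the koord-scheduler lease and keeps the text before the first underscore. This relies on the holder identity having the form <hostname>_<uuid>, as kube-scheduler-style leader election sets it, and on the hostname being the pod name (the Kubernetes default). A Python sketch of that parsing step for use in tooling; the function name is hypothetical.

```python
def leader_pod_name(holder_identity: str) -> str:
    """Return the pod name from a leader-election holder identity like '<hostname>_<uuid>'.

    Assumes the hostname equals the pod name and contains no underscore,
    which is exactly what `cut -d'_' -f1` in the pipeline relies on.
    """
    if not holder_identity:
        raise ValueError("lease has no holder")
    return holder_identity.split("_", 1)[0]
```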
$ curl 10.244.0.64:10251/apis/v1/plugins/Coscheduling/gang/default/gang-example
{
    "boundChildren": {
        "default/pod-example1": {},
        "default/pod-example2": {}
    },
    "children": {
        "default/pod-example1": {},
        "default/pod-example2": {}
    },
    "childrenScheduleRoundMap": {
        "default/pod-example1": 2,
        "default/pod-example2": 2
    },
    "createTime": "2022-10-09T07:31:53Z",
    "gangFrom": "GangFromPodAnnotation",
    "gangGroup": null,
    "hasGangInit": true,
    "minRequiredNumber": 2,
    "mode": "Strict",
    "name": "default/gang-example",
    "onceResourceSatisfied": true,
    "scheduleCycle": 2,
    "scheduleCycleValid": true,
    "totalChildrenNum": 2,
    "waitTime": 600000000000,
    "waitingForBindChildren": {}
}
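The response is a JSON view of the scheduler's in-memory gang state. waitTime appears to be a Go time.Duration serialized as nanoseconds, so 600000000000 corresponds to 600 s, matching the defaultTimeout: 600s in the scheduler configuration at the end of this page. The Python sketch below fetches the endpoint and condenses the payload; the function names and the "satisfied" heuristic (bound plus waiting children reaching minRequiredNumber) are our own illustrative choices, not fields defined by Koordinator.

```python
import json
from urllib.request import urlopen

NS_PER_SECOND = 1_000_000_000


def fetch_gang(url: str) -> dict:
    """GET the Coscheduling gang debug endpoint and decode the JSON body."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize(gang: dict) -> dict:
    """Condense the debug payload into a few readiness fields."""
    bound = len(gang.get("boundChildren") or {})
    waiting = len(gang.get("waitingForBindChildren") or {})
    required = gang["minRequiredNumber"]
    return {
        "name": gang["name"],
        "mode": gang["mode"],
        "bound": bound,
        "waiting": waiting,
        "required": required,
        "satisfied": bound + waiting >= required,  # heuristic, see lead-in
        "wait_seconds": gang["waitTime"] / NS_PER_SECOND,  # nanoseconds -> seconds
    }


# Usage (against the leader IP found above):
# summarize(fetch_gang("http://10.244.0.64:10251/apis/v1/plugins/Coscheduling/gang/default/gang-example"))
```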
Advanced configuration for gang
1. Apply through PodGroup.
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: gang-example1
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/total-number: "3"
    gang.scheduling.koordinator.sh/mode: "NonStrict"
    gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
    
spec:
  scheduleTimeoutSeconds: 100
  minMember: 2
  
- gang.scheduling.koordinator.sh/total-number specifies the total number of children in the gang. If not specified, it defaults to minMember.
- gang.scheduling.koordinator.sh/mode defines how the gang behaves when a child fails scheduling. Supported values are Strict and NonStrict; the default is Strict.
- gang.scheduling.koordinator.sh/groups defines which gangs are bundled together as a group. A gang proceeds to binding only when all gangs in its group meet their conditions.
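The defaulting rules in the list above, and the all-or-nothing rule for groups, can be summarized in a short Python sketch. The function names are hypothetical, treating a missing groups annotation as "not bundled" is an assumption, and the sketch assumes a gang counts as ready once its assigned children reach minMember. It illustrates the documented rules and does not reproduce Koordinator's code.

```python
import json

PREFIX = "gang.scheduling.koordinator.sh/"


def gang_settings(pod_group: dict) -> dict:
    """Resolve the effective gang settings of a PodGroup, applying the documented defaults."""
    annotations = pod_group["metadata"].get("annotations", {})
    min_member = pod_group["spec"]["minMember"]
    mode = annotations.get(PREFIX + "mode", "Strict")  # default is Strict
    if mode not in ("Strict", "NonStrict"):
        raise ValueError(f"unsupported mode {mode!r}")
    return {
        "total": int(annotations.get(PREFIX + "total-number", min_member)),  # defaults to minMember
        "min": min_member,
        "mode": mode,
        # The groups annotation is a JSON array of "namespace/name" gang keys.
        # Assumption: an empty list means the gang is not bundled with others.
        "groups": json.loads(annotations.get(PREFIX + "groups", "[]")),
    }


def group_may_bind(assigned: dict, minimums: dict, group: list) -> bool:
    """A gang in a group may bind only when every gang in the group has reached its minimum."""
    return all(assigned.get(key, 0) >= minimums[key] for key in group)
```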
2. Apply through pod annotations.
apiVersion: v1
kind: Pod
metadata:
  name: pod-example2
  namespace: default
  annotations:
    gang.scheduling.koordinator.sh/name: "gang-example1"
    gang.scheduling.koordinator.sh/min-available: "2"  
    gang.scheduling.koordinator.sh/total-number: "3"
    gang.scheduling.koordinator.sh/mode: "NonStrict"  # Strict or NonStrict
    gang.scheduling.koordinator.sh/groups: "[\"default/gang-example1\", \"default/gang-example2\"]"
    gang.scheduling.koordinator.sh/waiting-time: "100s"
spec:
  schedulerName: koord-scheduler
  containers:
  - command:
    - sleep
    - 365d
    image: busybox
    imagePullPolicy: IfNotPresent
    name: curlimage
    resources:
      limits:
        cpu: 40m
        memory: 40Mi
      requests:
        cpu: 40m
        memory: 40Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  restartPolicy: Always
- gang.scheduling.koordinator.sh/total-number specifies the total number of children in the gang. If not specified, it defaults to gang.scheduling.koordinator.sh/min-available.
- gang.scheduling.koordinator.sh/mode defines how the gang behaves when a child fails scheduling. Supported values are Strict and NonStrict; the default is Strict.
- gang.scheduling.koordinator.sh/groups defines which gangs are bundled together as a group. A gang proceeds to binding only when all gangs in its group meet their conditions.
- gang.scheduling.koordinator.sh/waiting-time specifies the gang's maximum wait time in the Permit stage.
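For pods, the same settings come from annotations only, and waiting-time is a duration string such as 100s. The Python sketch below resolves them, defaulting total-number to min-available and mode to Strict as listed above. Several parts are assumptions for illustration: the function names, a duration parser that handles only single-unit values (ms, s, m, h), and the fallback to the scheduler's defaultTimeout (600s in the configuration below) when waiting-time is absent.

```python
import re

PREFIX = "gang.scheduling.koordinator.sh/"
_SECONDS_PER_UNIT = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> float:
    """Parse a single-unit duration such as '100s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text.strip())
    if not match:
        raise ValueError(f"unsupported duration {text!r}")
    return float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]


def pod_gang_settings(annotations: dict, default_timeout: str = "600s") -> dict:
    """Resolve a pod's gang settings from its annotations, applying the documented defaults."""
    min_available = int(annotations[PREFIX + "min-available"])
    return {
        "name": annotations[PREFIX + "name"],
        "min": min_available,
        "total": int(annotations.get(PREFIX + "total-number", min_available)),
        "mode": annotations.get(PREFIX + "mode", "Strict"),
        # Assumption: without waiting-time, the scheduler's defaultTimeout applies.
        "wait_seconds": parse_duration(annotations.get(PREFIX + "waiting-time", default_timeout)),
    }
```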
Advanced configuration for scheduler
You can modify koord-scheduler-config.yaml in the Helm chart to adjust the Coscheduling configuration, as shown below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-scheduler-config
  namespace: {{ .Values.installation.namespace }}
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: koord-scheduler
      resourceNamespace: {{ .Values.installation.namespace }}
    profiles:
      - pluginConfig:
        - name: Coscheduling
          args:
            apiVersion: kubescheduler.config.k8s.io/v1beta2
            kind: CoschedulingArgs
            defaultTimeout: 600s
            controllerWorkers: 1
        - name: ElasticQuota
        ...