Version: v1.6

Enhanced NodeResourceFit

Introduction

The NodeResourcesFit plugin of native k8s can only adopt a type of strategy for all resources, such as MostRequestedPriority and LeastRequestedPriority. However, in industrial practice, this design does not apply to some scenarios. For example:

In AI scenarios, pods that apply for GPUs prefer to occupy the entire GPU machine first to prevent GPU fragmentation.
Pods that apply for CPU and memory are prioritized and dispersed to non-GPU machines to prevent excessive consumption of CPU and memory on GPU machines, resulting in real tasks that require GPUs pending due to insufficient non-GPU resources.

It is therefore hoped that both strategies can be provided to address this business need. Koordinator provides the NodeResourcesFitPlus and ScarceResourceAvoidance plugins to support the requirements of these two scenarios, thereby:

Different types of resources can be configured with different strategies to prioritize them in the form of weights
Prevent pods that have not applied for scarce resources from being scheduled to nodes with scarce resources.

Plugin Logic Explanation

NodeResourcesFitPlus

The scheduler args for the NodeResourcesFitPlus plugin can be configured as follows:

resources: 
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

The two strategies have the following score calculation methods and effects:

The final node score can be calculated as follows:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN* resourceN)] /(weight1+weight2+ … +weightN)

ScarceResourceAvoidance

The scheduler args for the ScarceResourceAvoidance plugin can be configured as follows:

resources: 
- nvidia.com/gpu
- scarceResource1
- scarceResource2

After obtaining scarce resource types from the config, the plugin will score the node and Pod fitness as follows:

Obtain the set of resource types requested by the pod and the set of resource types that the node can allocate.
ResourceTypesUnused = ResourceTypesTotal - ResourceTypesRequested
ScarceResourceTypesUnused = intersection(ResourceTypesUnused, ScarceResourceTypes)

The more unused scarce resource types, the lower the score will be.

Example

SchedulerConfiguration Example

We can configure schedulerConfiguration as follows:

profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: NodeResourcesFitPlusArgs
      resources: 
        nvidia.com/gpu:
          type: MostAllocated
          weight: 2
        cpu:
          type: LeastAllocated
          weight: 1
        memory:
          type: LeastAllocated
          weight: 1
    name: NodeResourcesFitPlus
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: ScarceResourceAvoidanceArgs
      resources: 
      - nvidia.com/gpu
    name: ScarceResourceAvoidance
  plugins:
    score:
      enabled:
      - name: NodeResourcesFitPlus
        weight: 2
      - name: ScarceResourceAvoidance
        weight: 2
      disabled:
      - name: "*"
  schedulerName: koord-scheduler

GPU Applications And Node Score

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler1
  template:
    metadata:
      labels:
        app: test-scheduler1
    spec:
      schedulerName: koord-scheduler
      containers:
        - image: dockerpull.com/nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100"
              memory: "100"
              nvidia.com/gpu: 2
            limits:
              cpu: "100"
              memory: "100"
              nvidia.com/gpu: 2

Node allocation rate information

node1: cpu 58%, memory 21%, nvidia.com/gpu 6 (total 8)

node2: cpu 22%, memory 5%, nvidia.com/gpu 0 (total 8)

Result: Prioritize GPU machines and pool GPU resources

Log view => Top10 scores for pod

| # | Pod | Node | Score | NodeResourcesFitPlus | ScarceResourceAvoidance |
| 0 | test-scheduler1-xxx | node1 | 358 | 158 | 200 |
| 1 | test-scheduler1-xxx | node2 | 296 | 96 | 200 |

CPU Applications And Node Score

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler1
  template:
    metadata:
      labels:
        app: test-scheduler1
    spec:
      schedulerName: koord-scheduler
      containers:
        - image: dockerpull.com/nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100"
              memory: "100"
            limits:
              cpu: "100"
              memory: "100"

Node allocation rate information

node1: cpu 22%, memory 10%, nvidia.com/gpu 6 (total 8)
node2: cpu 40%, memory 50%
node3: cpu 30%, memory 20%

Result: Prioritize CPU machines and disperse CPU and MEM resources

Log view => Top10 scores for pod

| # | Pod | Node | Score | NodeResourcesFitPlus | ScarceResourceAvoidance |
| 0 | test-scheduler1-xxx | node1 | 310 | 144 | 166 |
| 1 | test-scheduler1-xxx | node2 | 262 | 62 | 200 |
| 2 | test-scheduler1-xxx | node3 | 326 | 126 | 200 |

Compatible With Native NodeResourceFit

MostAllocated

native configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
     pluginConfig:
       - args:
           scoringStrategy:
             resources:
             - name: cpu
               weight: 2
             - name: memory
               weight: 1
             type: MostAllocated
         name: NodeResourcesFit
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFit"

NodeResourcesFitPlus configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
     pluginConfig:
       - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeResourcesFitPlusArgs
          resources: 
            cpu:
              type: MostAllocated
              weight: 2
            memory:
              type: MostAllocated
              weight: 1
         name: NodeResourcesFitPlus
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFitPlus"

LeastAllocated

native configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
     pluginConfig:
       - args:
           scoringStrategy:
             resources:
             - name: cpu
               weight: 2
             - name: memory
               weight: 1
             type: LeastAllocated
         name: NodeResourcesFit
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFit"

NodeResourcesFitPlus configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
     pluginConfig:
       - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeResourcesFitPlus
          resources: 
            cpu:
              type: LeastAllocated
              weight: 2
            memory:
              type: LeastAllocated
              weight: 1
         name: NodeResourcesFitPlus
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFitPlus"

Enhanced NodeResourceFit

Introduction​

Plugin Logic Explanation​

NodeResourcesFitPlus​

ScarceResourceAvoidance​

Example​

SchedulerConfiguration Example​

GPU Applications And Node Score​

CPU Applications And Node Score​

Compatible With Native NodeResourceFit​

MostAllocated​

LeastAllocated​