Enhanced NodeResourcesFit

Introduction

The native Kubernetes NodeResourcesFit plugin can only apply a single scoring strategy to all resources, such as MostRequestedPriority or LeastRequestedPriority. In real industrial practice, however, this design does not fit certain scenarios. For example:

  1. In AI scenarios, Pods that request GPUs should preferentially fill up entire GPU machines, to prevent GPU fragmentation.
  2. Pods that request only CPU and memory should preferentially be scheduled onto non-GPU machines, so that CPU and memory on GPU machines are not consumed excessively and GPU-requiring tasks are not left pending for lack of non-GPU resources.

Therefore, it is desirable to offer both strategies at the same time to meet this business requirement. Koordinator provides the NodeResourcesFitPlus and ScarceResourceAvoidance plugins to support these two scenarios, so that:

  • Different resource types can be configured with different strategies, prioritized through weights.
  • Pods that do not request scarce resources are kept away from nodes that hold those scarce resources.

Plugin Logic

NodeResourcesFitPlus

The scheduler arguments of the NodeResourcesFitPlus plugin can be configured as follows:

resources:
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1

The score calculation and effect of the two strategies are as follows.
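For reference, these strategy names follow the standard Kubernetes NodeResourcesFit scoring semantics, so per resource the score is roughly:

  • MostAllocated (bin-packing): score = (requested / allocatable) * maxNodeScore, so nodes whose resource is already more heavily allocated score higher.
  • LeastAllocated (spreading): score = ((allocatable - requested) / allocatable) * maxNodeScore, so nodes whose resource is less allocated score higher.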

The final node score is then calculated as follows:

finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN * resourceN)] / (weight1 + weight2 + … + weightN)
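As a quick illustration with the weights configured above and hypothetical per-resource scores (not taken from the examples below) of nvidia.com/gpu = 80, cpu = 40 and memory = 60:

finalScoreNode = (2 * 80 + 1 * 40 + 1 * 60) / (2 + 1 + 1) = 260 / 4 = 65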

ScarceResourceAvoidance

The scheduler arguments of the ScarceResourceAvoidance plugin can be configured as follows:

resources:
- nvidia.com/gpu
- scarceResource1
- scarceResource2

After reading the scarce resource types from the configuration, the plugin scores how well a node matches a Pod as follows:

  1. Collect the set of resource types requested by the Pod and the set of resource types allocatable on the node.
  2. ResourceTypesUnused = ResourceTypesTotal - ResourceTypesRequested
  3. ScarceResourceTypesUnused = intersection(ResourceTypesUnused, ScarceResourceTypes)

The more unused scarce resource types a node has, the lower its score.
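A minimal Go sketch of the set logic in steps 1-3; the function name, inputs, and the linear score mapping are illustrative assumptions, not the plugin's actual implementation:

package main

import "fmt"

// scarceResourceScore mirrors steps 1-3 above: it derives the node's unused
// resource types, intersects them with the configured scarce resource types,
// and maps "more unused scarce resource types" to a lower score.
// The linear mapping onto [0, 100] is an assumption for illustration only.
func scarceResourceScore(podRequests, nodeAllocatable, scarceResources []string) int {
    requested := make(map[string]bool)
    for _, r := range podRequests {
        requested[r] = true
    }
    scarce := make(map[string]bool)
    for _, r := range scarceResources {
        scarce[r] = true
    }

    // ResourceTypesUnused = ResourceTypesTotal - ResourceTypesRequested
    // ScarceResourceTypesUnused = intersection(ResourceTypesUnused, ScarceResourceTypes)
    unusedScarce := 0
    for _, r := range nodeAllocatable {
        if !requested[r] && scarce[r] {
            unusedScarce++
        }
    }

    if len(scarceResources) == 0 {
        return 100
    }
    // More unused scarce resource types => lower score.
    return 100 - unusedScarce*100/len(scarceResources)
}

func main() {
    // A CPU/memory-only Pod leaves nvidia.com/gpu unused on a GPU node, so that node scores low.
    fmt.Println(scarceResourceScore(
        []string{"cpu", "memory"},
        []string{"cpu", "memory", "nvidia.com/gpu"},
        []string{"nvidia.com/gpu"},
    )) // 0
    // A GPU Pod leaves no scarce resource type unused, so the node scores 100.
    fmt.Println(scarceResourceScore(
        []string{"cpu", "memory", "nvidia.com/gpu"},
        []string{"cpu", "memory", "nvidia.com/gpu"},
        []string{"nvidia.com/gpu"},
    )) // 100
}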

Examples

SchedulerConfiguration Example

The schedulerConfiguration can be set up as follows:

profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: NodeResourcesFitPlusArgs
      resources:
        nvidia.com/gpu:
          type: MostAllocated
          weight: 2
        cpu:
          type: LeastAllocated
          weight: 1
        memory:
          type: LeastAllocated
          weight: 1
    name: NodeResourcesFitPlus
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: ScarceResourceAvoidanceArgs
      resources:
      - nvidia.com/gpu
    name: ScarceResourceAvoidance
  plugins:
    score:
      enabled:
      - name: NodeResourcesFitPlus
        weight: 2
      - name: ScarceResourceAvoidance
        weight: 2
      disabled:
      - name: "*"
  schedulerName: koord-scheduler

GPU Application and Node Scoring

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler1
  template:
    metadata:
      labels:
        app: test-scheduler1
    spec:
      schedulerName: koord-scheduler
      containers:
      - image: dockerpull.com/nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100"
            memory: "100"
            nvidia.com/gpu: 2
          limits:
            cpu: "100"
            memory: "100"
            nvidia.com/gpu: 2

Node allocation information:

node1: cpu 58%, memory 21%, nvidia.com/gpu 6 (total 8)

node2: cpu 22%, memory 5%, nvidia.com/gpu 0 (total 8)

Result: GPU machines are preferred and GPU resources are bin-packed; node1, whose GPUs are already partly allocated, receives the higher score.

View the scores => Top10 scores for pod:

| # | Pod | Node | Score | NodeResourcesFitPlus | ScarceResourceAvoidance |
|---|---------------------|-------|-------|----------------------|-------------------------|
| 0 | test-scheduler1-xxx | node1 | 358   | 158                  | 200                     |
| 1 | test-scheduler1-xxx | node2 | 296   | 96                   | 200                     |

The Score column is the sum of the two plugin columns (for node1, 358 = 158 + 200); node1 wins, so the GPU Pod is packed onto the partially used GPU machine.

CPU Application and Node Scoring

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler1
  template:
    metadata:
      labels:
        app: test-scheduler1
    spec:
      schedulerName: koord-scheduler
      containers:
      - image: dockerpull.com/nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100"
            memory: "100"
          limits:
            cpu: "100"
            memory: "100"

Node allocation information:

node1: cpu 22%, memory 10%, nvidia.com/gpu 6 (total 8)
node2: cpu 40%, memory 50%
node3: cpu 30%, memory 20%

Result: non-GPU machines are preferred, which keeps the CPU-only Pod away from nodes holding scarce GPU resources; node3 receives the highest score.

View the scores => Top10 scores for pod:

| # | Pod | Node | Score | NodeResourcesFitPlus | ScarceResourceAvoidance |
|---|---------------------|-------|-------|----------------------|-------------------------|
| 0 | test-scheduler1-xxx | node1 | 310   | 144                  | 166                     |
| 1 | test-scheduler1-xxx | node2 | 262   | 62                   | 200                     |
| 2 | test-scheduler1-xxx | node3 | 326   | 126                  | 200                     |

node1 is penalized by ScarceResourceAvoidance (166 versus 200) because its GPUs would go unused by this CPU-only Pod, so node3 ends up with the highest total score and is selected.

Compatibility with the Native NodeResourcesFit

MostAllocated

Native configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
      pluginConfig:
      - args:
          scoringStrategy:
            resources:
            - name: cpu
              weight: 2
            - name: memory
              weight: 1
            type: MostAllocated
        name: NodeResourcesFit
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFit"

NodeResourcesFitPlus configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
      pluginConfig:
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeResourcesFitPlusArgs
          resources:
            cpu:
              type: MostAllocated
              weight: 2
            memory:
              type: MostAllocated
              weight: 1
        name: NodeResourcesFitPlus
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFitPlus"

LeastAllocated

Native configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
      pluginConfig:
      - args:
          scoringStrategy:
            resources:
            - name: cpu
              weight: 2
            - name: memory
              weight: 1
            type: LeastAllocated
        name: NodeResourcesFit
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFit"

NodeResourcesFitPlus configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: koord-scheduler
      pluginConfig:
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeResourcesFitPlusArgs
          resources:
            cpu:
              type: LeastAllocated
              weight: 2
            memory:
              type: LeastAllocated
              weight: 1
        name: NodeResourcesFitPlus
      plugins:
        score:
          enabled:
          - name: "NodeResourcesFitPlus"