Version: v1.4

NRI Mode Resource Management

Glossary

NRI: Node Resource Interface. See: https://github.com/containerd/nri

Summary

This proposal adds an NRI mode of resource management to Koordinator, for easier deployment and in-time control of resources.

Motivation

Koordinator is a QoS-based scheduling system for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. Its runtime hooks support two working modes for different scenarios: standalone and proxy. However, both have constraints. NRI (Node Resource Interface) is a public interface for controlling node resources and a general framework for plug-in extensions to CRI-compatible container runtimes. It provides a mechanism for extensions to track the state of pods and containers and make limited modifications to their configuration. We would like to integrate the NRI framework to address the constraints of standalone and proxy modes, building on this community-recommended mechanism.

Goals

  • Support NRI mode resource management for Koordinator.
  • Support the containerd container runtime.

Non-Goals/Future Work

  • Support the Docker runtime.

Proposal

Unlike standalone and proxy modes, in NRI mode koordlet starts an NRI plugin that subscribes to pod and container lifecycle events from the container runtime (e.g. containerd, CRI-O). The koordlet NRI plugin then calls the runtime hooks to adjust pod resources or the OCI spec. The flow is:

  • Get pod/container lifecycle events and OCI-format information from the container runtime (e.g. containerd, CRI-O).
  • Transform the OCI-format information into internal protocols (e.g. PodContext, ContainerContext) to reuse existing runtime hook plugins.
  • Transform the runtime hook plugins' responses into OCI spec format.
  • Return the OCI spec format response to the container runtime (e.g. containerd, CRI-O).
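
The four steps above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than koordlet's actual code: the types ContainerEvent, ContainerCtx, Hook and Adjustment, the label key, and the CPU shares value are all hypothetical stand-ins chosen for the example, and no real NRI package is imported.

```go
package main

import "fmt"

// ContainerEvent is a hypothetical stand-in for the pod/container
// information the runtime hands to the plugin (step 1).
type ContainerEvent struct {
	PodLabels     map[string]string
	ContainerName string
}

// ContainerCtx is a hypothetical internal protocol object, playing the
// role that ContainerContext plays in this proposal.
type ContainerCtx struct {
	ContainerName string
	QoSClass      string
	CPUShares     *int64 // set by hooks; nil means "no change requested"
}

// Hook is a hypothetical runtime hook that works on the internal protocol.
type Hook func(*ContainerCtx)

// Adjustment is a stand-in for an OCI-style resource adjustment.
type Adjustment struct {
	CPUShares int64
}

// fromRuntime converts runtime information into the internal protocol (step 2).
// The label key used here is an assumption made for illustration.
func fromRuntime(ev ContainerEvent) *ContainerCtx {
	return &ContainerCtx{
		ContainerName: ev.ContainerName,
		QoSClass:      ev.PodLabels["koordinator.sh/qosClass"],
	}
}

// toAdjustment converts the hooks' result back into an OCI-style
// adjustment (step 3). It returns nil when no hook requested a change.
func toAdjustment(ctx *ContainerCtx) *Adjustment {
	if ctx.CPUShares == nil {
		return nil
	}
	return &Adjustment{CPUShares: *ctx.CPUShares}
}

// handleEvent runs steps 2 to 4 for one lifecycle event and returns the
// response that would be sent back to the runtime.
func handleEvent(ev ContainerEvent, hooks []Hook) *Adjustment {
	ctx := fromRuntime(ev)
	for _, h := range hooks {
		h(ctx)
	}
	return toAdjustment(ctx)
}

// lowerSharesForBE is an example hook: it requests a low CPU share value
// for best-effort ("BE") containers. The value 2 is arbitrary.
func lowerSharesForBE(ctx *ContainerCtx) {
	if ctx.QoSClass == "BE" {
		shares := int64(2)
		ctx.CPUShares = &shares
	}
}

func main() {
	ev := ContainerEvent{
		PodLabels:     map[string]string{"koordinator.sh/qosClass": "BE"},
		ContainerName: "worker",
	}
	if adj := handleEvent(ev, []Hook{lowerSharesForBE}); adj != nil {
		fmt.Printf("adjustment for %s: %+v\n", ev.ContainerName, *adj)
	}
}
```

Keeping the hooks on the internal protocol, with conversion only at the edges, is what would let the same hook plugins serve standalone, proxy, and NRI modes.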

nri-proposal.png

User Stories

Story 1

As a cluster administrator, I want to apply QoS policies before a pod's status becomes Running.

Story 2

As a cluster administrator, I want to deploy Koordinator to a cluster without restarts.

Story 3

As a cluster administrator, I want to adjust resource policies at runtime.

Story 4

As a GPU user, I want to inject environment variables before a pod starts running.

Requirements

  • Need to upgrade containerd to >= 1.7.0 or CRI-O to >= 1.26.0

Functional Requirements

NRI mode should support all existing functionality supported by standalone and proxy modes.

Non-Functional Requirements

Non-functional requirements are user expectations of the solution. Include considerations for performance, reliability and security.

Implementation Details/Notes/Constraints

  1. koordlet NRI plugin
type nriServer struct {
	stub    stub.Stub
	mask    stub.EventMask
	options Options // server options
}

// Configure enables 3 hooks (RunPodSandbox, CreateContainer, UpdateContainer) in NRI.
func (p *nriServer) Configure(config, runtime, version string) (stub.EventMask, error) {
	return p.mask, nil
}

// Synchronize syncs all pod/container information before the koordlet NRI plugin runs.
func (p *nriServer) Synchronize(pods []*api.PodSandbox, containers []*api.Container) ([]*api.ContainerUpdate, error) {
	return nil, nil
}

// RunPodSandbox converts the pod to a PodContext, runs the hooks, and applies the result.
func (p *nriServer) RunPodSandbox(pod *api.PodSandbox) error {
	podCtx := &PodContext{}
	podCtx.FromNri(pod)
	RunHooks(...)
	podCtx.NriDone()
	return nil
}

// CreateContainer returns the adjustment and updates produced by the hooks.
func (p *nriServer) CreateContainer(pod *api.PodSandbox, container *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	return containerCtx.NriDone()
}

// UpdateContainer returns only the container updates produced by the hooks.
func (p *nriServer) UpdateContainer(pod *api.PodSandbox, container *api.Container) ([]*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	_, updates, err := containerCtx.NriDone()
	return updates, err
}
  2. koordlet enhancement for NRI
  • PodContext
// fill PodContext from OCI spec
func (p *PodContext) FromNri(pod *api.PodSandbox) {
}

// apply QoS resource policies for pod
func (p *PodContext) NriDone() {
}
  • ContainerContext
// fill ContainerContext from OCI spec
func (c *ContainerContext) FromNri(pod *api.PodSandbox, container *api.Container) {
}

// apply QoS resource policies for container
func (c *ContainerContext) NriDone() (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
}
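
The plugin sketch above subscribes to three events through an event mask. The idea of such a mask can be illustrated with a small self-contained bitmask. Note that the EventMask type, its constants, and its methods below are hypothetical stand-ins written for this example; they are not the NRI library's own definitions.

```go
package main

import "fmt"

// EventMask is a stand-in bitmask type. The real event mask type is
// provided by the NRI packages and may differ from this sketch.
type EventMask uint32

// Hypothetical event bits, one per lifecycle event.
const (
	RunPodSandbox EventMask = 1 << iota
	StopPodSandbox
	CreateContainer
	UpdateContainer
)

// Set returns a copy of the mask with the given events enabled.
func (m EventMask) Set(events ...EventMask) EventMask {
	for _, e := range events {
		m |= e
	}
	return m
}

// IsSet reports whether the given event is enabled in the mask.
func (m EventMask) IsSet(e EventMask) bool {
	return m&e != 0
}

// koordletMask enables the three events named in this proposal.
func koordletMask() EventMask {
	return EventMask(0).Set(RunPodSandbox, CreateContainer, UpdateContainer)
}

func main() {
	mask := koordletMask()
	fmt.Println("RunPodSandbox subscribed: ", mask.IsSet(RunPodSandbox))
	fmt.Println("StopPodSandbox subscribed:", mask.IsSet(StopPodSandbox))
}
```

A runtime would only deliver events whose bit is set, so a plugin pays no per-event cost for lifecycle stages it does not handle.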

Risks and Mitigations

Alternatives

There are several approaches to extending the Kubernetes CRI (Container Runtime Interface) to manage container resources, such as standalone and proxy. In standalone mode, resource isolation parameters are injected asynchronously. In proxy mode, the proxy can intercept CRI requests from kubelet for pods and apply resource policies in time. However, proxy mode requires reconfiguring and restarting kubelet.

There is a slight difference in execution timing between NRI and proxy modes: the hook points are not exactly the same. The biggest difference is that the proxy calls koordlet hooks between kubelet and containerd, whereas in NRI mode containerd itself calls the NRI plugin (koordlet hooks). This means containerd can still do work before or after it calls the NRI plugin. For example, in NRI mode, containerd sets up the pod network first and then calls the NRI plugin (koordlet hooks) in RunPodSandbox. In proxy mode, containerd cannot do anything before the koordlet hooks run, because the proxy handles the RunPodSandbox CRI request first.

  • Standalone

    • kubelet -- CRI Request -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
    • kubelet -> Node Agent -> CRI Runtime / containers

standalone.png

  • Proxy

    • kubelet -- CRI Request -> CRI Proxy -- CRI Request (hooked) -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers

proxy.png

  • NRI

    • kubelet -- CRI Request -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
    • CRI Runtime <-> Koordlet NRI plugin

nri.png
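
The difference in hook timing between proxy and NRI modes for RunPodSandbox, as described in this section, can be written down as two ordered step lists. The step names and the helper functions below are illustrative assumptions. The sketch only encodes the ordering stated in the text and does not call any runtime.

```go
package main

import "fmt"

const (
	stepHooks   = "run koordlet hooks"
	stepNetwork = "containerd sets up pod network"
)

// runPodSandboxOrder returns the order of steps for a RunPodSandbox
// request under the given mode, following the description in the text.
func runPodSandboxOrder(mode string) []string {
	switch mode {
	case "proxy":
		// The proxy sits between kubelet and containerd, so the hooks
		// run before containerd sees the request.
		return []string{"kubelet sends RunPodSandbox", stepHooks, "containerd receives request", stepNetwork}
	case "nri":
		// containerd receives the request, sets up the network, and
		// only then calls the NRI plugin.
		return []string{"kubelet sends RunPodSandbox", "containerd receives request", stepNetwork, stepHooks}
	default:
		return nil
	}
}

// runsBefore reports whether step a occurs before step b in order.
func runsBefore(order []string, a, b string) bool {
	ia, ib := -1, -1
	for i, s := range order {
		if s == a {
			ia = i
		}
		if s == b {
			ib = i
		}
	}
	return ia >= 0 && ib >= 0 && ia < ib
}

func main() {
	for _, mode := range []string{"proxy", "nri"} {
		order := runPodSandboxOrder(mode)
		fmt.Printf("%s: hooks before network setup = %v\n",
			mode, runsBefore(order, stepHooks, stepNetwork))
	}
}
```

A hook that depends on the pod network already existing, or on it not yet existing, would therefore behave differently in the two modes.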

Upgrade Strategy

  • Need to upgrade containerd to 1.7.0+ or CRI-O to 1.26.0+
  • Need to enable NRI in the container runtime
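
Before switching a node to NRI mode, the version requirement above could be checked with a small helper. The following is a minimal sketch under stated assumptions: the function names and the runtime keys "containerd" and "cri-o" are hypothetical, the minimum versions are the ones listed in this section, and pre-release or build suffixes are ignored for simplicity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minVersions holds the minimum runtime versions listed in this section.
var minVersions = map[string][3]int{
	"containerd": {1, 7, 0},
	"cri-o":      {1, 26, 0},
}

// parseVersion parses "v1.7.2" or "1.7.2" into [major, minor, patch].
// Anything after "-" or "+" (e.g. "-rc.1") is dropped, so a release
// candidate is treated like its final release in this sketch.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		v[i] = n
	}
	return v, nil
}

// supportsNRI reports whether the given runtime version meets the
// minimum version listed for that runtime.
func supportsNRI(runtime, version string) (bool, error) {
	minV, ok := minVersions[strings.ToLower(runtime)]
	if !ok {
		return false, fmt.Errorf("unknown runtime %q", runtime)
	}
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if v[i] != minV[i] {
			return v[i] > minV[i], nil
		}
	}
	return true, nil // exactly the minimum version
}

func main() {
	for _, c := range [][2]string{{"containerd", "v1.7.2"}, {"containerd", "1.6.20"}, {"cri-o", "1.26.0"}} {
		ok, err := supportsNRI(c[0], c[1])
		fmt.Printf("%s %s: supported=%v err=%v\n", c[0], c[1], ok, err)
	}
}
```

A version check alone is not sufficient, since NRI also has to be enabled in the runtime's configuration as noted in the list above.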