NRI Mode Resource Management
Glossary
NRI: Node Resource Interface. See: https://github.com/containerd/nri
Summary
We propose enabling an NRI mode of resource management in Koordinator, for easier deployment and timely resource control.
Motivation
Koordinator is a QoS-based scheduling system for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. Its runtime hooks support two working modes for different scenarios: Standalone and Proxy. However, both modes have constraints. Standalone injects resource isolation parameters asynchronously, and Proxy requires reconfiguring and restarting kubelet. NRI (Node Resource Interface) is a public interface for controlling node resources and a general framework for plug-in extensions to CRI-compatible container runtimes. It provides a mechanism for extensions to track the state of pods/containers and make limited modifications to their configuration. We would like to integrate the NRI framework, the mechanism recommended by the community, to address the constraints of the Standalone and Proxy modes.
Goals
- Support NRI mode resource management for koordinator.
- Support containerd container runtime.
Non-Goals/Future Work
- Support the Docker runtime.
Proposal
Unlike the standalone and proxy modes, koordlet will start an NRI plugin that subscribes to pod/container lifecycle events from the container runtime (e.g. containerd, CRI-O). The koordlet NRI plugin then calls runtime hooks to adjust pod resources or the OCI spec. The flow is:
- Receive pod/container lifecycle events and OCI-format information from the container runtime (e.g. containerd, CRI-O).
- Transform the OCI-format information into internal protocols (e.g. PodContext, ContainerContext) so that the existing runtime hook plugins can be reused.
- Transform the runtime hook plugins' responses into OCI spec format.
- Return the OCI spec format response to the container runtime (e.g. containerd, CRI-O).
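The four-step flow above can be illustrated with a minimal, self-contained Go sketch. This is not koordlet code, and several pieces are assumptions:
- PodSandbox, Container and ContainerAdjustment are hypothetical, trimmed-down stand-ins for the NRI api types.
- ContainerContext only mimics the role of koordlet's internal protocol.
- The hook signature, the annotation key example.com/gpu-devices and gpuEnvHook are made up for illustration, loosely following the GPU user story. NVIDIA_VISIBLE_DEVICES serves only as an example variable name.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, trimmed-down stand-ins for the NRI types
// (the real ones live in github.com/containerd/nri/pkg/api).
type PodSandbox struct {
	Name        string
	Annotations map[string]string
}

type Container struct {
	Name string
}

type ContainerAdjustment struct {
	AddEnv []string // "KEY=VALUE" entries to add to the container
}

// ContainerContext mimics an internal protocol: pod metadata from the
// runtime event plus the environment variables requested by hooks.
type ContainerContext struct {
	PodAnnotations map[string]string
	EnvToAdd       map[string]string
}

// Hook is a hypothetical signature for a runtime hook plugin.
type Hook func(ctx *ContainerContext)

// Step 2: transform the runtime's information into the internal protocol.
func fromNri(pod *PodSandbox, _ *Container) *ContainerContext {
	return &ContainerContext{PodAnnotations: pod.Annotations, EnvToAdd: map[string]string{}}
}

// Step 3: transform the hooks' result into the response format,
// sorting keys so the output order is deterministic.
func (c *ContainerContext) toAdjustment() *ContainerAdjustment {
	keys := make([]string, 0, len(c.EnvToAdd))
	for k := range c.EnvToAdd {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	adj := &ContainerAdjustment{}
	for _, k := range keys {
		adj.AddEnv = append(adj.AddEnv, k+"="+c.EnvToAdd[k])
	}
	return adj
}

// createContainer handles one event (step 1) and returns the response (step 4).
func createContainer(pod *PodSandbox, ctr *Container, hooks []Hook) *ContainerAdjustment {
	ctx := fromNri(pod, ctr)
	for _, h := range hooks {
		h(ctx)
	}
	return ctx.toAdjustment()
}

// gpuEnvHook is an illustrative hook; the annotation key is made up.
func gpuEnvHook(ctx *ContainerContext) {
	if devs, ok := ctx.PodAnnotations["example.com/gpu-devices"]; ok {
		ctx.EnvToAdd["NVIDIA_VISIBLE_DEVICES"] = devs
	}
}

func main() {
	pod := &PodSandbox{Name: "demo", Annotations: map[string]string{"example.com/gpu-devices": "0,1"}}
	adj := createContainer(pod, &Container{Name: "main"}, []Hook{gpuEnvHook})
	fmt.Println(adj.AddEnv) // [NVIDIA_VISIBLE_DEVICES=0,1]
}
```

The point of the round trip is that hooks only see the internal context, so the same hook can serve standalone, proxy, or NRI entry points.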
User Stories
Story 1
As a cluster administrator, I want to apply QoS policies before a pod's status becomes Running.
Story 2
As a cluster administrator, I want to deploy Koordinator without restarting kubelet.
Story 3
As a cluster administrator, I want to adjust resource policies at runtime.
Story 4
As a GPU user, I want to inject environment variables before a pod starts running.
Requirements
- Requires containerd >= 1.7.0 or CRI-O >= v1.25.0.
Functional Requirements
NRI mode should support all existing functionalities supported by the standalone and proxy modes.
Non-Functional Requirements
Non-functional requirements are user expectations of the solution. Include considerations for performance, reliability and security.
Implementation Details/Notes/Constraints
- koordlet NRI plugin
type nriServer struct {
	stub    stub.Stub
	mask    stub.EventMask
	options Options // server options
}
// Configure enables 3 hooks (RunPodSandbox, CreateContainer, UpdateContainer) in NRI
func (p *nriServer) Configure(config, runtime, version string) (stub.EventMask, error) {
	return p.mask, nil // p.mask is assumed to hold exactly these three events
}
// Synchronize receives all existing pods/containers before the koordlet NRI plugin runs
func (p *nriServer) Synchronize(pods []*api.PodSandbox, containers []*api.Container) ([]*api.ContainerUpdate, error) {
	// cache pod/container information (details omitted); this placeholder returns no updates
	return nil, nil
}
func (p *nriServer) RunPodSandbox(pod *api.PodSandbox) error {
	podCtx := &PodContext{}
	podCtx.FromNri(pod)
	RunHooks(...) // pseudocode: run the registered runtime hook plugins
	podCtx.NriDone()
	return nil
}
func (p *nriServer) CreateContainer(pod *api.PodSandbox, container *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	return containerCtx.NriDone()
}
func (p *nriServer) UpdateContainer(pod *api.PodSandbox, container *api.Container) ([]*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	_, updates, err := containerCtx.NriDone() // UpdateContainer returns no adjustment
	return updates, err
}
- koordlet enhancement for NRI
- PodContext
// fill PodContext from OCI spec
func (p *PodContext) FromNri(pod *api.PodSandbox) {
}
// apply QoS resource policies for pod
func (p *PodContext) NriDone() {
}
- ContainerContext
// fill ContainerContext from OCI spec
func (c *ContainerContext) FromNri(pod *api.PodSandbox, container *api.Container) {
}
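// apply QoS resource policies for container
func (c *ContainerContext) NriDone() (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	// build the adjustment/updates from the hook results (omitted in this proposal)
	return nil, nil, nil
}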
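The following sketch shows one possible behavior for FromNri and NriDone. It assumes the context keeps the runtime's request and the hooks' response in separate fields, and that NriDone forwards only values a hook actually changed. All types are hypothetical simplifications; the real NRI resource types differ, for example by wrapping optional values in dedicated types. The change-only policy is a design choice of this sketch, not something the proposal specifies.

```go
package main

import "fmt"

// LinuxResources is a hypothetical stand-in for the NRI resource types;
// a nil pointer means "not set".
type LinuxResources struct {
	CPUShares   *uint64
	CPUQuota    *int64
	MemoryLimit *int64
}

// Container is a hypothetical stand-in for *api.Container.
type Container struct {
	ID        string
	Resources LinuxResources
}

// ContainerAdjustment is a hypothetical stand-in for *api.ContainerAdjustment.
type ContainerAdjustment struct {
	Resources LinuxResources
}

// ContainerContext keeps the runtime's request and the hooks' response apart.
type ContainerContext struct {
	Request  LinuxResources // filled by FromNri
	Response LinuxResources // filled by runtime hooks
}

// FromNri copies the resources reported by the runtime into the request.
func (c *ContainerContext) FromNri(ctr *Container) {
	c.Request = ctr.Resources
}

// changed returns resp only if a hook set it to a value different from req.
func changed[T comparable](req, resp *T) *T {
	if resp == nil || (req != nil && *req == *resp) {
		return nil
	}
	return resp
}

// NriDone builds an adjustment carrying only real changes.
func (c *ContainerContext) NriDone() *ContainerAdjustment {
	return &ContainerAdjustment{Resources: LinuxResources{
		CPUShares:   changed(c.Request.CPUShares, c.Response.CPUShares),
		CPUQuota:    changed(c.Request.CPUQuota, c.Response.CPUQuota),
		MemoryLimit: changed(c.Request.MemoryLimit, c.Response.MemoryLimit),
	}}
}

func main() {
	shares, limit := uint64(1024), int64(512<<20)
	ctx := &ContainerContext{}
	ctx.FromNri(&Container{ID: "c1", Resources: LinuxResources{CPUShares: &shares, MemoryLimit: &limit}})

	// An illustrative hook: lower CPU shares, re-assert the same memory limit.
	beShares, sameLimit := uint64(2), int64(512<<20)
	ctx.Response.CPUShares, ctx.Response.MemoryLimit = &beShares, &sameLimit

	adj := ctx.NriDone()
	fmt.Println(*adj.Resources.CPUShares, adj.Resources.MemoryLimit == nil) // 2 true
}
```

Forwarding only changed fields would keep koordlet from overwriting values it does not manage.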
Risks and Mitigations
Alternatives
There are several approaches to extending the Kubernetes CRI (Container Runtime Interface) to manage container resources, such as the standalone and proxy modes. In standalone running mode, resource isolation parameters are injected asynchronously. In proxy running mode, the proxy intercepts CRI requests for pods from kubelet and can then apply resource policies in time. However, proxy mode requires reconfiguring and restarting kubelet.
There are small differences in execution timing between the NRI and proxy modes, because their hook points are not exactly the same. The biggest difference is that the proxy calls koordlet hooks between kubelet and containerd, whereas NRI calls the NRI plugin (koordlet hooks) inside containerd. This means containerd can still do some work before or after it calls the NRI plugin. For example, in NRI running mode, containerd sets up the pod network first and then calls the NRI plugin (koordlet hooks) in RunPodSandbox. In proxy running mode, containerd has not done anything yet when the proxy runs koordlet hooks while handling the RunPodSandbox CRI request.
Standalone
- kubelet -- CRI Request -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
- kubelet -> Node Agent -> CRI Runtime / containers
Proxy
- kubelet -- CRI Request -> CRI Proxy -- CRI Request (hooked) -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
NRI
- kubelet -- CRI Request -> CRI Runtime -- OCI Spec --> OCI compatible runtime -> containers
                                         ↘         ↗
                                    Koordlet NRI plugin
Upgrade Strategy
- Upgrade containerd to 1.7.0+ or CRI-O to 1.26.0+.
- Enable NRI in the container runtime.
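Before choosing NRI mode, koordlet could compare the version reported by the runtime with the containerd minimum stated in the requirements. One possible source is the RuntimeVersion field of the CRI Version response. A minimal sketch of such a check follows. The helpers parseVersion and atLeast are hypothetical, and the accepted formats ("1.7.0", "v1.7.0") are assumptions about the version string. Pre-release suffixes are deliberately ignored, which is a simplification rather than full semver.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.7.2" or "1.7.2" into [major, minor, patch].
// Suffixes such as "-rc.1" or "+meta" are dropped, so 1.7.0-rc.1
// compares equal to 1.7.0 (a simplification, not full semver).
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	core := strings.TrimPrefix(s, "v")
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) > 3 {
		return v, fmt.Errorf("unexpected version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("unexpected version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether version have >= version minimum.
func atLeast(have, minimum string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range h {
		if h[i] != m[i] {
			return h[i] > m[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"1.6.20", "v1.7.0", "1.7.3"} {
		ok, err := atLeast(v, "1.7.0")
		fmt.Println(v, ok, err)
	}
}
```

Comparing numeric components, rather than raw strings, keeps "1.10.0" correctly above "1.7.0".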