NRI Mode Resource Management
Glossary
NRI: Node Resource Interface. See: https://github.com/containerd/nri
Summary
We propose enabling NRI-mode resource management in Koordinator to simplify deployment and to apply resource policies in time.
Motivation
Koordinator is a QoS-based scheduling system for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. Its runtime hooks support two working modes for different scenarios: Standalone and Proxy. However, both modes have constraints. NRI (Node Resource Interface) is a public interface for controlling node resources and a general framework for plug-in extensions to CRI-compatible container runtimes. It provides a mechanism for extensions to track the state of pods/containers and make limited modifications to their configuration. We would like to integrate the NRI framework to address the constraints of the Standalone and Proxy modes using this community-recommended mechanism.
Goals
- Support NRI mode resource management for koordinator.
- Support containerd container runtime.
Non-Goals/Future Work
- Support docker runtime
Proposal
Unlike the Standalone and Proxy modes, in NRI mode koordlet starts an NRI plugin that subscribes to pod/container lifecycle events from the container runtime (e.g. containerd, CRI-O). The koordlet NRI plugin then calls runtime hooks to adjust pod resources or the OCI spec. The intended flow is:
- Receive pod/container lifecycle events and OCI-format information from the container runtime (e.g. containerd, CRI-O).
- Transform the OCI-format information into internal protocols (e.g. PodContext, ContainerContext) so that existing runtime hook plugins can be reused.
- Transform the runtime hook plugins' responses into OCI spec format.
- Return the OCI spec format response to the container runtime (e.g. containerd, CRI-O).
User Stories
Story 1
As a cluster administrator, I want to apply QoS policies before a pod's status becomes Running.
Story 2
As a cluster administrator, I want to deploy Koordinator without restarting kubelet.
Story 3
As a cluster administrator, I want to adjust resource policies at runtime.
Story 4
As a GPU user, I want to inject environment variables before a pod starts running.
Requirements
- containerd must be upgraded to >= 1.7.0, or CRI-O to >= 1.26.0.
Functional Requirements
NRI mode should support all existing functionality supported by the Standalone and Proxy modes.
Non-Functional Requirements
Non-functional requirements are user expectations of the solution. Include considerations for performance, reliability and security.
Implementation Details/Notes/Constraints
- koordlet NRI plugin
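type nriServer struct {
	stub    stub.Stub
	mask    stub.EventMask // events the plugin subscribes to
	options Options        // server options
}

// Configure enables 3 hooks (RunPodSandbox, CreateContainer, UpdateContainer) in NRI.
func (p *nriServer) Configure(config, runtime, version string) (stub.EventMask, error) {
	// p.mask is expected to hold exactly the three events above.
	return p.mask, nil
}

// Synchronize syncs all pods/containers information before the koordlet NRI plugin runs.
func (p *nriServer) Synchronize(pods []*api.PodSandbox, containers []*api.Container) ([]*api.ContainerUpdate, error) {
	// No container updates are requested during synchronization here.
	return nil, nil
}

func (p *nriServer) RunPodSandbox(pod *api.PodSandbox) error {
	podCtx := &PodContext{}
	podCtx.FromNri(pod) // convert NRI pod info into the internal protocol
	RunHooks(...)       // run existing runtime hook plugins (arguments elided)
	podCtx.NriDone()    // apply QoS resource policies for the pod
	return nil
}

func (p *nriServer) CreateContainer(pod *api.PodSandbox, container *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	// Return the adjustment and updates produced by the hooks to the runtime.
	return containerCtx.NriDone()
}

func (p *nriServer) UpdateContainer(pod *api.PodSandbox, container *api.Container) ([]*api.ContainerUpdate, error) {
	containerCtx := &ContainerContext{}
	containerCtx.FromNri(pod, container)
	RunHooks(...)
	// UpdateContainer can only return updates, so any adjustment is dropped.
	_, updates, err := containerCtx.NriDone()
	return updates, err
}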
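Configure decides which lifecycle events the runtime delivers to the plugin by returning an event mask. The sketch below mirrors that idea with a hypothetical bit set so it runs on its own; event, eventMask, set, and isSet are illustrative names, and the real stub.EventMask and api.Event types in github.com/containerd/nri may define their values and methods differently.

```go
package main

import "fmt"

// event is a hypothetical enum of lifecycle events; values are illustrative only.
type event int

const (
	runPodSandbox event = iota + 1
	stopPodSandbox
	createContainer
	updateContainer
	stopContainer
)

// eventMask is a hypothetical bit set of subscribed events:
// event e is stored in bit (e - 1).
type eventMask int32

// set marks each given event as subscribed.
func (m *eventMask) set(events ...event) {
	for _, e := range events {
		*m |= 1 << (e - 1)
	}
}

// isSet reports whether event e is subscribed.
func (m eventMask) isSet(e event) bool {
	return m&(1<<(e-1)) != 0
}

func main() {
	var mask eventMask
	// The three events the koordlet NRI plugin subscribes to in Configure.
	mask.set(runPodSandbox, createContainer, updateContainer)
	fmt.Println(mask.isSet(createContainer), mask.isSet(stopContainer))
}
```

Events outside the mask (stopContainer here) are simply never delivered, which keeps the plugin off the hot path for lifecycle stages it does not handle.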
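- koordlet enhancement for NRI
  - PodContext
// FromNri fills PodContext from the NRI pod sandbox (OCI-format information)
func (p *PodContext) FromNri(pod *api.PodSandbox) {
}

// NriDone applies QoS resource policies for the pod
func (p *PodContext) NriDone() {
}
  - ContainerContext
// FromNri fills ContainerContext from the NRI pod sandbox and container (OCI-format information)
func (c *ContainerContext) FromNri(pod *api.PodSandbox, container *api.Container) {
}

// NriDone applies QoS resource policies for the container and returns them
// to the runtime as an NRI adjustment and/or container updates
func (c *ContainerContext) NriDone() (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	return nil, nil, nil // placeholder: build from hook results
}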
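One way ContainerContext.NriDone could turn hook results into a response is to compare the values received in FromNri with the values after the hooks ran, and put only the changed fields into the adjustment, so the runtime leaves everything else untouched. The sketch below assumes that design; resources, containerCtx, adjustment, fromNri, nriDone, and the "BE" hook are hypothetical simplifications, not the real api.ContainerAdjustment or koordlet types.

```go
package main

import "fmt"

// resources is a hypothetical, simplified view of container CPU settings.
type resources struct {
	CPUShares int64
	CPUQuota  int64
}

// containerCtx is a hypothetical, trimmed ContainerContext.
type containerCtx struct {
	QoSClass string    // e.g. "BE" or "LS"
	Original resources // values received from the runtime in fromNri
	Desired  resources // values after the hooks have run
}

// adjustment holds only the fields to change; nil means "leave as-is".
type adjustment struct {
	CPUShares *int64
	CPUQuota  *int64
}

// fromNri fills the context from runtime-provided values.
func fromNri(qos string, r resources) *containerCtx {
	return &containerCtx{QoSClass: qos, Original: r, Desired: r}
}

// nriDone converts hook results into a minimal adjustment.
func (c *containerCtx) nriDone() adjustment {
	var adj adjustment
	if c.Desired.CPUShares != c.Original.CPUShares {
		v := c.Desired.CPUShares
		adj.CPUShares = &v
	}
	if c.Desired.CPUQuota != c.Original.CPUQuota {
		v := c.Desired.CPUQuota
		adj.CPUQuota = &v
	}
	return adj
}

func main() {
	ctx := fromNri("BE", resources{CPUShares: 1024, CPUQuota: -1})
	// Hypothetical hook: give best-effort containers minimal CPU shares.
	if ctx.QoSClass == "BE" {
		ctx.Desired.CPUShares = 2
	}
	adj := ctx.nriDone()
	fmt.Println(*adj.CPUShares, adj.CPUQuota == nil)
}
```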
Risks and Mitigations
Alternatives
There are several approaches to extending the Kubernetes CRI (Container Runtime Interface) to manage container resources, such as Standalone and Proxy. In Standalone mode, resource isolation parameters are injected asynchronously. In Proxy mode, the proxy intercepts CRI requests from kubelet for pods and applies resource policies in time. However, Proxy mode requires reconfiguring and restarting kubelet.
There is a small difference in execution timing between NRI and Proxy modes, because their hook points are not exactly the same. The biggest difference is that Proxy mode calls koordlet hooks between kubelet and containerd, whereas in NRI mode containerd itself calls the NRI plugin (koordlet hooks), so containerd can still do work before or after calling the plugin. For example, in NRI mode containerd sets up the pod network first and then calls the NRI plugin (koordlet hooks) in RunPodSandbox, whereas in Proxy mode containerd cannot do anything before the koordlet hooks run, because the proxy handles the RunPodSandbox CRI request first.
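The timing difference in the RunPodSandbox example comes from who owns the request: a proxy wraps the runtime and runs hooks before forwarding, while under NRI the runtime calls the plugin from inside its own handler. The sketch below models only that ordering, using hypothetical names (step, record, viaProxy, viaNRI, run); it does not reproduce every step containerd actually performs.

```go
package main

import (
	"fmt"
	"strings"
)

// step appends its action(s) to a shared trace.
type step func(trace *[]string)

// record returns a step that logs a single named action.
func record(name string) step {
	return func(trace *[]string) { *trace = append(*trace, name) }
}

// viaProxy models Proxy mode: the CRI proxy intercepts RunPodSandbox and runs
// koordlet hooks before forwarding the request to the runtime.
func viaProxy(hooks, runtime step) step {
	return func(trace *[]string) {
		hooks(trace)
		runtime(trace)
	}
}

// viaNRI models NRI mode: the runtime owns the request and calls the NRI
// plugin at its own hook point, here after setting up the pod network.
func viaNRI(plugin step) step {
	return func(trace *[]string) {
		record("runtime: setup pod network")(trace)
		plugin(trace)
	}
}

// run executes a step and returns the recorded order.
func run(s step) string {
	var trace []string
	s(&trace)
	return strings.Join(trace, " -> ")
}

func main() {
	hooks := record("koordlet hooks")
	fmt.Println("proxy:", run(viaProxy(hooks, record("runtime: setup pod network"))))
	fmt.Println("nri:", run(viaNRI(hooks)))
}
```

In this model, a hook that needs pod network information can only rely on it under NRI, while a hook that must run before any runtime work can only rely on Proxy mode.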
Standalone
- kubelet -- CRI Request -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
- kubelet -> Node Agent -> CRI Runtime / containers
Proxy
- kubelet -- CRI Request -> CRI Proxy -- CRI Request (hooked) -> CRI Runtime -- OCI Spec -> OCI compatible runtime -> containers
NRI
- kubelet -- CRI Request -> CRI Runtime -- OCI Spec --> OCI compatible runtime -> containers
                                 ^    |
                                 |    v
                           Koordlet NRI plugin
Upgrade Strategy
- Upgrade containerd to 1.7.0+ or CRI-O to 1.26.0+.
- Enable NRI in the container runtime.
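Before switching a node to NRI mode, an operator may want to check the runtime version against the minimums in the upgrade strategy. The sketch below is a hypothetical pre-flight check: minNRIVersion, parseVersion, supportsNRI, and the runtime keys "containerd" and "cri-o" are illustrative names, and pre-release suffixes such as "-rc.1" are simply stripped, so a release candidate of the minimum version is treated as meeting it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minNRIVersion lists the minimum runtime versions from the upgrade strategy.
var minNRIVersion = map[string][3]int{
	"containerd": {1, 7, 0},
	"cri-o":      {1, 26, 0},
}

// parseVersion parses "v1.7.2" or "1.26.0" into [major, minor, patch].
// Pre-release/build suffixes (e.g. "-rc.1", "+meta") are ignored.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// supportsNRI reports whether the runtime version meets the minimum above.
func supportsNRI(runtime, version string) (bool, error) {
	minVer, ok := minNRIVersion[runtime]
	if !ok {
		return false, fmt.Errorf("unknown runtime %q", runtime)
	}
	got, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	// Compare major, then minor, then patch.
	for i := 0; i < 3; i++ {
		if got[i] != minVer[i] {
			return got[i] > minVer[i], nil
		}
	}
	return true, nil // equal versions satisfy the minimum
}

func main() {
	ok, err := supportsNRI("containerd", "v1.6.20")
	fmt.Println(ok, err)
}
```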