Version: v1.7

Component Guide

Introduction

Koordinator is a QoS-based scheduling system that enhances Kubernetes cluster efficiency and reliability for hybrid workloads. This guide documents each component's purpose, architecture, configuration, and operations. Components communicate through the Kubernetes API server and share configuration via ConfigMaps for coordinated resource management.

koord-manager

The koord-manager is Koordinator's control plane component. It manages CRDs and admission webhooks, and uses leader election so that only one replica runs the controllers at a time. On startup it initializes controllers, webhooks, and shared informers for processing cluster events.

Key configuration options:

  • --enable-leader-election: Enable leader election so that multiple replicas can run for high availability
  • --metrics-addr: Address on which monitoring metrics are exposed
  • --feature-gates: Control alpha/beta features
  • --config-namespace: Specify configuration namespace
  • Webhook server: Runs on port 9876 for admission control
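As a rough illustration of how these flags fit together, the sketch below renders them into a container args list. The flag names come from the list above; the helper name build_manager_args and every value shown (the metrics address, the gate name ExampleGate, the namespace) are assumptions made for the example, not documented defaults.

```python
from typing import Dict, List


def build_manager_args(
    leader_election: bool,
    metrics_addr: str,
    feature_gates: Dict[str, bool],
    config_namespace: str,
) -> List[str]:
    """Render koord-manager flags as a container args list (hypothetical helper)."""
    args = [
        f"--enable-leader-election={'true' if leader_election else 'false'}",
        f"--metrics-addr={metrics_addr}",
        f"--config-namespace={config_namespace}",
    ]
    if feature_gates:
        # Kubernetes-style feature gates are comma-separated Name=bool pairs.
        gates = ",".join(
            f"{name}={'true' if enabled else 'false'}"
            for name, enabled in sorted(feature_gates.items())
        )
        args.append(f"--feature-gates={gates}")
    return args


# All values below are placeholders chosen for the example.
args = build_manager_args(
    leader_election=True,
    metrics_addr=":8080",
    feature_gates={"ExampleGate": True},
    config_namespace="koordinator-system",
)
print(args)
```

Keeping the flags in one function makes it easier to review them together before they are pasted into a Deployment manifest.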

Component Interaction:

koord-manager
├── Webhook Server (Port 9876)
├── Leader Election (High Availability)
├── Coordinates with koord-scheduler
└── Coordinates with koordlet

koord-scheduler

The koord-scheduler extends the Kubernetes scheduler with advanced capabilities for co-located workloads through a plugin-based architecture.

Configuration is supplied through the --config flag, which points to a YAML file (typically mounted from the koord-scheduler-config ConfigMap). The file extends the Kubernetes scheduler configuration schema with Koordinator-specific components.

Key scheduling plugins:

  • LoadAwareScheduling: Scheduling based on real-time node resource utilization
  • NodeNUMAResource: NUMA-aware CPU and memory allocation
  • Reservation: Resource reservation with preemption support
  • ElasticQuota: Dynamic quota allocation and eviction
  • Coscheduling: Gang scheduling for pod groups
  • DeviceShare: Shared device management (GPU, RDMA, FPGA)
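To make the idea behind load-aware scheduling concrete, here is a minimal sketch of a filter-then-score pass over node utilization. It is not the LoadAwareScheduling plugin's actual algorithm: the NodeUsage type, the threshold values, and the weighting formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeUsage:
    name: str
    cpu_util: float  # percent, 0..100
    mem_util: float  # percent, 0..100


def filter_nodes(
    nodes: List[NodeUsage],
    cpu_threshold: float = 65.0,  # hypothetical threshold
    mem_threshold: float = 95.0,  # hypothetical threshold
) -> List[NodeUsage]:
    """Drop nodes whose current utilization exceeds either threshold."""
    return [
        n for n in nodes
        if n.cpu_util <= cpu_threshold and n.mem_util <= mem_threshold
    ]


def score_node(node: NodeUsage, cpu_weight: float = 1.0, mem_weight: float = 1.0) -> float:
    """Weighted average of free capacity; a higher score means a less loaded node."""
    free_cpu = 100.0 - node.cpu_util
    free_mem = 100.0 - node.mem_util
    return (free_cpu * cpu_weight + free_mem * mem_weight) / (cpu_weight + mem_weight)


def pick_node(nodes: List[NodeUsage]) -> Optional[str]:
    """Return the name of the best-scoring node that passes the filter, if any."""
    candidates = filter_nodes(nodes)
    if not candidates:
        return None
    return max(candidates, key=score_node).name


nodes = [
    NodeUsage("node-a", cpu_util=80.0, mem_util=40.0),  # filtered out: CPU too high
    NodeUsage("node-b", cpu_util=30.0, mem_util=50.0),
    NodeUsage("node-c", cpu_util=50.0, mem_util=20.0),
]
print(pick_node(nodes))
```

Separating the filter from the score mirrors how scheduler framework plugins are usually split into a feasibility step and a ranking step.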

Scheduler Configuration Hierarchy:

Scheduler Configuration
├── Global Settings
│   └── InsecureServing
├── Plugin Configurations
│   ├── LoadAwareScheduling (Load-aware scheduling)
│   ├── NodeNUMAResource (NUMA resource management)
│   ├── Reservation (Resource reservation)
│   ├── ElasticQuota (Elastic quota)
│   ├── Coscheduling (Gang scheduling)
│   └── DeviceShare (Device sharing)
└── Framework Extensions
    └── ServicesEngine

koordlet

The koordlet runs as a daemon on each node, collecting metrics, enforcing QoS policies, and managing runtime hooks.

Configuration uses ConfigMaps with these key settings:

  • ConfigMap name and namespace
  • States informer configuration
  • Metric cache settings
  • QoS manager configuration
  • Runtime hook configuration
  • Audit and prediction settings

Architecture subsystems:

  • MetricCache: Stores collected metrics with pluggable backends
  • MetricAdvisor: Analyzes metrics for optimization recommendations
  • QOSManager: Enforces QoS policies and resource allocation
  • RuntimeHooks: Integrates with container runtimes
  • Prediction: Provides resource usage prediction
  • StatesInformer: Maintains consistent pod and node state views

koordlet Daemon Components:

Core components and responsibilities:

  • daemon: Main daemon process containing subsystems
    • MetricAdvisor: Metrics analyzer
      • Run(stopCh): Run main loop
      • HasSynced(): Check sync status
    • StatesInformer: State informer
      • Run(stopCh): Run main loop
      • HasSynced(): Check sync status
    • MetricCache: Metrics cache
      • Run(stopCh): Run main loop
    • QOSManager: QoS manager
      • Run(stopCh): Run main loop
    • RuntimeHook: Runtime hooks
      • Run(stopCh): Run main loop
    • PredictServer: Prediction server
      • Setup(): Setup dependencies
      • Run(stopCh): Run main loop
    • ResourceUpdateExecutor: Resource update executor
      • Run(stopCh): Run main loop
    • extensionControllers: Extension controllers list
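The Run(stopCh) and HasSynced() pattern listed above can be sketched as follows. This is a simplified model rather than koordlet's code: the Subsystem and Daemon classes are hypothetical stand-ins, a threading.Event plays the role of the stop channel, and the start order shown is illustrative only.

```python
import threading
from typing import List


class Subsystem:
    """Hypothetical stand-in for a koordlet subsystem."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stopped = False
        self._synced = threading.Event()

    def run(self, stop: threading.Event) -> None:
        # A real subsystem would warm its caches before reporting as synced.
        self._synced.set()
        stop.wait()  # block until the daemon asks us to stop
        self.stopped = True

    def has_synced(self) -> bool:
        return self._synced.is_set()

    def wait_synced(self, timeout: float) -> bool:
        return self._synced.wait(timeout)


class Daemon:
    """Starts subsystems one by one, waiting for each to sync before the next."""

    def __init__(self, subsystems: List[Subsystem]) -> None:
        self.subsystems = subsystems
        self._threads: List[threading.Thread] = []

    def start(self, stop: threading.Event, sync_timeout: float = 5.0) -> None:
        for sub in self.subsystems:
            thread = threading.Thread(target=sub.run, args=(stop,), name=sub.name)
            thread.start()
            self._threads.append(thread)
            if not sub.wait_synced(sync_timeout):
                stop.set()  # release subsystems that are already running
                raise TimeoutError(f"{sub.name} did not sync in time")

    def wait(self) -> None:
        for thread in self._threads:
            thread.join()


stop = threading.Event()
daemon = Daemon([Subsystem("MetricCache"), Subsystem("StatesInformer"), Subsystem("QOSManager")])
daemon.start(stop)
all_synced = all(s.has_synced() for s in daemon.subsystems)
stop.set()     # equivalent to closing stopCh
daemon.wait()
print(all_synced, [s.stopped for s in daemon.subsystems])
```

A single shared stop signal lets every subsystem shut down together, which is the main reason each Run method receives it.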

koord-descheduler

The koord-descheduler identifies and evicts pods to improve resource utilization and cluster balance through a modular, profile-based architecture.

It operates as a Kubernetes controller manager made up of the following parts:

  • Descheduler Core: Coordinates descheduling policies
  • Controller Manager: Manages reconciliation loops for custom resources
  • Profiles: Define enabled plugins and configurations
  • Plugins: Implement descheduling strategies (deschedule, balance, filter, evict)
  • Informer Factory: Maintains cached cluster resource views
  • Eviction Limiter: Controls pod eviction rate to prevent disruption
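One way to picture the Eviction Limiter is as a sliding-window counter that refuses evictions once a budget is used up. The sketch below assumes that model; the class name, its parameters, and the injectable clock are hypothetical and do not reflect koord-descheduler's actual implementation.

```python
import time
from collections import deque
from typing import Callable, Deque


class EvictionLimiter:
    """Allow at most max_evictions within any window_seconds interval."""

    def __init__(
        self,
        max_evictions: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_evictions = max_evictions
        self.window_seconds = window_seconds
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def allow(self) -> bool:
        """Return True and record the eviction if the budget permits it."""
        now = self._clock()
        # Forget evictions that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_evictions:
            return False
        self._stamps.append(now)
        return True


# A fake clock keeps the example deterministic.
fake_now = [0.0]
limiter = EvictionLimiter(max_evictions=2, window_seconds=60.0, clock=lambda: fake_now[0])
first, second, third = limiter.allow(), limiter.allow(), limiter.allow()
fake_now[0] = 60.0  # one full window later
after_window = limiter.allow()
print(first, second, third, after_window)
```

Injecting the clock keeps the limiter testable without real sleeps; a plugin would call allow() before each eviction and skip the pod when it returns False.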

koord-descheduler Architecture:

koord-descheduler
├── Descheduler Core
│   ├── Profiles
│   │   ├── Deschedule Plugins
│   │   ├── Balance Plugins
│   │   ├── Filter Plugins
│   │   └── Evict Plugins
│   ├── Eviction Limiter
│   └── Informer Factory
│       ├── Node Informer
│       ├── Pod Informer
│       └── Custom Resource Informers
└── Controller Manager
    ├── Migration Controller
    └── Drain Controller

koord-device-daemon

The koord-device-daemon discovers and labels heterogeneous devices (GPUs, NPUs, etc.) on nodes, running as a daemon that periodically scans and updates node labels.

Key configuration flags:

  • --oneshot: Single execution mode
  • --no-timestamp: Disable timestamp in labels
  • --sleep-interval: Device discovery frequency
  • --prints-output-file: Output file path for device info

Architecture components:

  • Resource Feature Discovery: Discovers and processes device information
  • Prints Writer: Outputs device information
  • Manager Map: Registry for different hardware type device managers
  • Configuration: Manages component configuration from files, environment, and CLI

koord-device-daemon Execution Flow:

1. Start
   ↓
2. Load Configuration
   ↓
3. Create Printers
   ↓
4. Generate Device Prints
   ↓
5. Output Device Prints
   ↓
6. Check Oneshot Mode
   ├─ Yes → Exit
   └─ No → Sleep Interval
           ↓
           Rerun Discovery
           ↓
           Return to Step 4
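The execution flow above amounts to a discover, output, sleep loop with an early exit in oneshot mode. A minimal sketch follows, assuming discovery and output are plain callables; the function name run_discovery, the max_iterations guard, and the label key in the usage lines are hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional

Labels = Dict[str, str]


def run_discovery(
    discover: Callable[[], Labels],
    output: Callable[[Labels], None],
    oneshot: bool,
    sleep_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,  # test-only guard, not a daemon flag
) -> int:
    """Run the discovery loop and return how many passes were made."""
    iterations = 0
    while True:
        labels = discover()          # step 4: generate device prints
        output(labels)               # step 5: output device prints
        iterations += 1
        if oneshot:                  # step 6: exit after a single pass
            return iterations
        if max_iterations is not None and iterations >= max_iterations:
            return iterations
        sleep(sleep_interval)        # wait, then rerun discovery


# Usage with fakes so nothing actually sleeps.
written: List[Labels] = []
naps: List[float] = []
fake_discover = lambda: {"example.com/gpu.present": "true"}  # hypothetical label

once = run_discovery(fake_discover, written.append, oneshot=True,
                     sleep_interval=60.0, sleep=naps.append)
looped = run_discovery(fake_discover, written.append, oneshot=False,
                       sleep_interval=60.0, sleep=naps.append, max_iterations=3)
print(once, looped, naps)
```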

koord-runtime-proxy

The koord-runtime-proxy acts as middleware between kubelet and container runtimes, intercepting CRI calls to apply resource management policies.

Key configuration flags:

  • --koord-runtimeproxy-endpoint: Service endpoint
  • --remote-runtime-service-endpoint: Backend runtime service
  • --backend-runtime-mode: Container engine (Containerd or Docker)
  • --runtime-hook-server-key/val: Runtime hook server identification

Supports two backend modes:

  • Containerd: CRI server for Containerd runtime
  • Docker: Docker server for Docker runtime

Architecture:

  • Runtime Manager Server: Abstract interface for runtime implementations
  • CRI Server/Docker Server: Runtime-specific implementations
  • Dispatcher: Routes CRI calls to handlers
  • Resource Executor: Applies resource policies to containers

koord-runtime-proxy Workflow:

koord-runtime-proxy
├── Runtime Proxy Endpoint
├── Backend Runtime Mode
│   ├── Containerd Mode
│   │   └── CRI Server
│   └── Docker Mode
│       └── Docker Server
└── Runtime Hook Server Key/Val
    └── Skip Hook Server Pods

Data Flow:
Intercept CRI Calls → Apply Resource Policies → Forward to Backend Runtime
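The intercept, apply, forward data flow can be modelled as a small dispatcher that runs registered hooks before handing the request to the backend runtime. This is a conceptual sketch only: the RuntimeProxy class, the dictionary-based request, and the cpu_shares field are assumptions. CreateContainer is a real CRI method name, but no actual gRPC traffic is involved here.

```python
from typing import Any, Callable, Dict, List, Tuple

Request = Dict[str, Any]
Hook = Callable[[Request], Request]
Backend = Callable[[str, Request], Any]


class RuntimeProxy:
    """Runs per-method hooks on a request, then forwards it to the backend."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._hooks: Dict[str, List[Hook]] = {}

    def register_hook(self, method: str, hook: Hook) -> None:
        self._hooks.setdefault(method, []).append(hook)

    def handle(self, method: str, request: Request) -> Any:
        # Apply resource policies: each hook may return a modified request.
        for hook in self._hooks.get(method, []):
            request = hook(request)
        # Forward to the backend runtime (containerd or Docker in the real proxy).
        return self._backend(method, request)


forwarded: List[Tuple[str, Request]] = []


def fake_backend(method: str, request: Request) -> str:
    """Hypothetical backend that records what it receives."""
    forwarded.append((method, request))
    return "ok"


def lower_cpu_shares(request: Request) -> Request:
    """Hypothetical policy: cap CPU shares for the container being created."""
    updated = dict(request)  # copy so the caller's request is left untouched
    updated["cpu_shares"] = min(request.get("cpu_shares", 1024), 2)
    return updated


proxy = RuntimeProxy(fake_backend)
proxy.register_hook("CreateContainer", lower_cpu_shares)
result = proxy.handle("CreateContainer", {"name": "demo", "cpu_shares": 1024})
passthrough = proxy.handle("ListContainers", {})
print(result, forwarded)
```

Methods without registered hooks pass straight through, so the proxy stays transparent for calls it has no policy for.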

Component Communication and Integration

Koordinator components communicate through the Kubernetes API server and share configuration via ConfigMaps in a control plane/data plane pattern.

Primary communication patterns:

  • API Server Interactions: All components watch resources, update status, and create custom resources
  • Shared Configuration: ConfigMaps mounted as volumes or accessed via API server
  • Webhook Integration: koord-manager webhooks called during resource creation/updates
  • Metrics Collection: koordlet collects node metrics for scheduler decisions
  • Event Propagation: Events and status updates through API server

Integration workflow:

  1. koord-manager initializes controllers and webhooks
  2. koord-scheduler registers with Kubernetes scheduler framework
  3. koordlet starts on each node and collects metrics
  4. koord-device-daemon discovers and labels node devices
  5. koord-runtime-proxy intercepts container runtime calls
  6. Components coordinate through API server shared state

Component Communication Sequence:

Communication Flow:

1. koord-manager → API Server
   - Watch CRDs and ConfigMaps

2. koord-scheduler → API Server
   - Register as scheduler

3. koordlet → API Server
   - Report node metrics

4. koord-device-daemon → API Server
   - Update node device labels

5. kubelet → koord-runtime-proxy → Container Runtime
   - Intercept CRI calls

6. API Server → koord-scheduler
   - Provide node metrics for scheduling

7. koord-scheduler → API Server
   - Schedule pods with QoS policies

8. API Server → koordlet
   - Apply QoS policies to pods

Operational Considerations

Effective Koordinator operation requires attention to configuration, monitoring, troubleshooting, and lifecycle management.

Configuration Management: Configuration is managed through ConfigMaps, which are either mounted as volumes or read via the API server. Changes typically require component restarts.

Monitoring and Metrics: All components expose Prometheus metrics including:

  • Component health and readiness
  • API server request latencies and errors
  • Resource utilization and efficiency
  • Scheduling and descheduling performance
  • QoS policy enforcement statistics
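For ad hoc checks against a metrics endpoint, the Prometheus text exposition format is simple enough to aggregate by hand. The sketch below sums samples per metric name and derives an error ratio; the metric names are hypothetical placeholders, not names exposed by Koordinator, and the parser assumes samples carry no trailing timestamp.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


def sum_by_metric(exposition: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals: DefaultDict[str, float] = defaultdict(float)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, value = line.rsplit(None, 1)  # value is the last field
        name = series.split("{", 1)[0]        # drop the label set, if any
        totals[name] += float(value)
    return dict(totals)


# Hypothetical scrape output; the metric names are placeholders.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{code="200"} 90
example_requests_total{code="500"} 10
# TYPE example_errors_total counter
example_errors_total 10
"""

totals = sum_by_metric(SAMPLE)
error_ratio = totals["example_errors_total"] / totals["example_requests_total"]
print(totals, error_ratio)
```

In practice a Prometheus server and alerting rules would do this aggregation; the snippet is only meant for quick inspection of a scraped payload.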

Common Issues:

  • Webhook Timeouts: Increase timeout settings in large clusters
  • Resource Starvation: Adjust QoS policies and limits
  • Scheduling Failures: Verify node labels and taints/tolerations
  • Metric Collection: Check koordlet connectivity and permissions
  • Device Discovery: Verify device drivers and permissions

Lifecycle Management: Manage components as standard Kubernetes workloads with appropriate resources, limits, and probes. Perform rolling updates carefully, especially for koord-manager webhooks.

Performance and Scaling

Performance and scalability depend on cluster size, workload characteristics, and configuration.

koord-manager Scaling: A single replica is sufficient for small and medium clusters. Large clusters may need multiple replicas. Tune webhook timeouts based on load.

koord-scheduler Performance: Influenced by enabled plugins, policy complexity, and scheduling frequency. Optimize plugin configuration and use efficient informer caches.

koordlet Resource Usage: Depends on metric collection frequency, monitored resources, and QoS policy complexity. Tune collection intervals and retention based on requirements.

Scaling Guidelines:

  • Monitor component resources and adjust requests/limits
  • Scale koord-manager based on API server load
  • Optimize scheduler plugins for workload patterns
  • Tune metric collection frequency
  • Use node affinity and taints for component placement

Performance and Scaling Configuration Flow:

Cluster Deployment Process:

1. Cluster Deployment
   ↓
2. Determine koord-manager Replicas
   ├─ Small Cluster → Deploy Single Replica
   └─ Large Cluster → Deploy Multiple Replicas
   ↓
3. Configure Webhook Settings
   ↓
4. Set Webhook Timeouts
   ↓
5. Configure Concurrency Settings
   ↓
6. Configure Request Queue
   ↓
7. Configure Retry Policy
   ↓
8. Monitor Performance
   ↓
9. Adjust Configuration as Needed
   ↓
Return to Step 3 (iterate as needed)

Security and Best Practices

Deploy Koordinator securely with these best practices:

RBAC Configuration: Define the minimum required permissions for each component, following the principle of least privilege. RBAC configs are in the config/rbac directory.

Network Security: Configure network policies to restrict component communication. The webhook server should only be accessible to the API server.
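A quick way to review roles against the least privilege principle is to flag wildcard entries in their rules. The sketch below does that for a role expressed as a dictionary mirroring the standard Kubernetes RBAC manifest fields (rules, apiGroups, resources, verbs); the role name and its rules are made up for the example and are not taken from config/rbac.

```python
from typing import Any, Dict, List


def find_wildcards(role: Dict[str, Any]) -> List[str]:
    """Return one finding per rule field that grants a '*' wildcard."""
    findings: List[str] = []
    for index, rule in enumerate(role.get("rules", [])):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in rule.get(field, []):
                findings.append(f"rule {index}: wildcard in {field}")
    return findings


# Hypothetical ClusterRole, written as the dict a YAML loader would produce.
example_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "example-component-role"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["*"], "resources": ["nodes"], "verbs": ["*"]},
    ],
}

findings = find_wildcards(example_role)
print(findings)
```

Each finding is a prompt for review, not necessarily an error: some controllers legitimately need broad access, but each wildcard should be a deliberate choice.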

Secret Management: Store sensitive configuration in Secrets, not ConfigMaps. Properly manage and rotate TLS certificates.

Production Best Practices:

  • Use dedicated namespaces for components
  • Implement resource requests and limits
  • Configure liveness and readiness probes
  • Enable leader election for control plane components
  • Regularly update to latest stable versions
  • Monitor logs and metrics for issues
  • Test configuration changes in non-production first
  • Implement backup and recovery procedures

Security Considerations:

  • Audit component permissions regularly
  • Keep components updated for security fixes
  • Limit webhook timeouts to prevent DoS
  • Use network policies to restrict communication
  • Implement proper logging and monitoring
  • Follow Kubernetes pod security best practices
