Component Guide
Introduction
Koordinator is a QoS-based scheduling system that enhances Kubernetes cluster efficiency and reliability for hybrid workloads. This guide documents each component's purpose, architecture, configuration, and operations. Components communicate through the Kubernetes API server and share configuration via ConfigMaps for coordinated resource management.
koord-manager
The koord-manager is Koordinator's control plane, managing CRDs and webhooks while coordinating subsystems through leader election. It initializes controllers, webhooks, and shared informers for cluster event processing.
Key configuration options:
- --enable-leader-election: Enable high availability
- --metrics-addr: Expose monitoring metrics
- --feature-gates: Control alpha/beta features
- --config-namespace: Specify configuration namespace
- Webhook server: Runs on port 9876 for admission control
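As a rough illustration of how these options fit together, the sketch below parses a feature-gate string in the common Kubernetes `Name=true,Other=false` form and assembles an argument list from the four flags named above. The gate names, the `:8080` address, and the namespace value are placeholders chosen for the example, and the helper functions are hypothetical, not part of koord-manager.

```python
from typing import Dict, List


def parse_feature_gates(value: str) -> Dict[str, bool]:
    """Parse 'Name=true,Other=false' into a dict of gate name -> enabled."""
    gates: Dict[str, bool] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, raw = item.partition("=")
        raw = raw.strip().lower()
        if not sep or raw not in ("true", "false"):
            raise ValueError(f"invalid feature gate entry: {item!r}")
        gates[name.strip()] = raw == "true"
    return gates


def build_manager_args(leader_election: bool, metrics_addr: str,
                       feature_gates: Dict[str, bool],
                       config_namespace: str) -> List[str]:
    """Assemble an argument list from the flags listed in this guide."""
    gates = ",".join(
        f"{name}={str(enabled).lower()}"
        for name, enabled in sorted(feature_gates.items())
    )
    return [
        f"--enable-leader-election={str(leader_election).lower()}",
        f"--metrics-addr={metrics_addr}",
        f"--feature-gates={gates}",
        f"--config-namespace={config_namespace}",
    ]


if __name__ == "__main__":
    # Placeholder gate names; real gate names depend on the release in use.
    parsed = parse_feature_gates("ExampleGateA=true,ExampleGateB=false")
    for arg in build_manager_args(True, ":8080", parsed, "koordinator-system"):
        print(arg)
```

Validating the gate string before it reaches the container spec catches typos early, since a malformed value would otherwise only surface when the process starts.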
Component Interaction:
koord-manager
├── Webhook Server (Port 9876)
├── Leader Election (High Availability)
├── Coordinates with koord-scheduler
└── Coordinates with koordlet
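The leader election mentioned above can be pictured as a lease that one replica holds and keeps renewing. The following is a minimal in-memory sketch of that idea, assuming a 15-second lease duration and replica names invented for the example. A real deployment stores the lease in the Kubernetes API and relies on optimistic concurrency rather than a local object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a lease record (illustrative only)."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # seconds; an assumed value for the example


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if the candidate holds the lease after this call."""
    expired = lease.holder is None or now - lease.renew_time > lease.duration
    if lease.holder == candidate or expired:
        # Either the current leader renews, or a new leader takes over
        # a lease that nobody holds or that has expired.
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


if __name__ == "__main__":
    lease = Lease()
    print(try_acquire_or_renew(lease, "manager-a", now=0.0))
    print(try_acquire_or_renew(lease, "manager-b", now=5.0))
    print(lease.holder)
```

Only the replica that holds the lease runs the controllers, while a standby takes over once the lease stops being renewed.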
koord-scheduler
The koord-scheduler extends the Kubernetes scheduler with advanced capabilities for co-located workloads through a plugin-based architecture.
Configuration uses the --config flag pointing to a YAML file (typically provided by the koord-scheduler-config ConfigMap). The configuration extends the Kubernetes scheduler configuration schema with Koordinator-specific components.
Key scheduling plugins:
- LoadAwareScheduling: Real-time node resource utilization
- NodeNUMAResource: NUMA-aware CPU and memory allocation
- Reservation: Resource reservation with preemption support
- ElasticQuota: Dynamic quota allocation and eviction
- Coscheduling: Gang scheduling for pod groups
- DeviceShare: Shared device management (GPU, RDMA, FPGA)
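To make the load-aware idea concrete, here is a small sketch of a filter-then-score pass over node utilization. It is not the LoadAwareScheduling plugin itself: the `NodeUsage` type, the threshold values, and the scoring formula are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeUsage:
    """Observed utilization of a node, in percent (hypothetical type)."""
    name: str
    cpu_percent: float
    mem_percent: float


def pick_node(nodes: List[NodeUsage],
              cpu_threshold: float = 65.0,
              mem_threshold: float = 95.0) -> Optional[str]:
    """Filter out overloaded nodes, then prefer the least utilized one."""
    # Filter phase: drop nodes at or above either utilization threshold.
    feasible = [
        n for n in nodes
        if n.cpu_percent < cpu_threshold and n.mem_percent < mem_threshold
    ]
    if not feasible:
        return None

    # Score phase: higher score means more headroom on the node.
    def score(node: NodeUsage) -> float:
        return 100.0 - (node.cpu_percent + node.mem_percent) / 2.0

    return max(feasible, key=score).name


if __name__ == "__main__":
    candidates = [
        NodeUsage("node-a", cpu_percent=80.0, mem_percent=40.0),
        NodeUsage("node-b", cpu_percent=30.0, mem_percent=50.0),
        NodeUsage("node-c", cpu_percent=20.0, mem_percent=70.0),
    ]
    print(pick_node(candidates))
```

The split into a filter step and a score step mirrors how scheduler framework plugins are usually organized, which is why the sketch keeps the two phases separate.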
Scheduler Configuration Hierarchy:
Scheduler Configuration
├── Global Settings
│   └── InsecureServing
├── Plugin Configurations
│   ├── LoadAwareScheduling (Load-aware scheduling)
│   ├── NodeNUMAResource (NUMA resource management)
│   ├── Reservation (Resource reservation)
│   ├── ElasticQuota (Elastic quota)
│   ├── Coscheduling (Gang scheduling)
│   └── DeviceShare (Device sharing)
└── Framework Extensions
    └── ServicesEngine
koordlet
The koordlet runs as a daemon on each node, collecting metrics, enforcing QoS policies, and managing runtime hooks.
Configuration uses ConfigMaps with these key settings:
- ConfigMap name and namespace
- States informer configuration
- Metric cache settings
- QoS manager configuration
- Runtime hook configuration
- Audit and prediction settings
Architecture subsystems:
- MetricCache: Stores collected metrics with pluggable backends
- MetricsAdvisor: Analyzes metrics for optimization recommendations
- QOSManager: Enforces QoS policies and resource allocation
- RuntimeHooks: Integrates with container runtimes
- Prediction: Provides resource usage prediction
- StatesInformer: Maintains consistent pod and node state views
koordlet Daemon Components:
Core components and responsibilities:
- daemon: Main daemon process containing subsystems
- MetricAdvisor: Metrics analyzer
  - Run(stopCh): Run main loop
  - HasSynced(): Check sync status
- StatesInformer: State notifier
  - Run(stopCh): Run main loop
  - HasSynced(): Check sync status
- MetricCache: Metrics cache
  - Run(stopCh): Run main loop
- QOSManager: QoS manager
  - Run(stopCh): Run main loop
- RuntimeHook: Runtime hooks
  - Run(stopCh): Run main loop
- PredictServer: Prediction server
  - Setup(): Setup dependencies
  - Run(stopCh): Run main loop
- ResourceUpdateExecutor: Resource update executor
  - Run(stopCh): Run main loop
- extensionControllers: Extension controllers list
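The Run(stopCh) and HasSynced() pattern listed above can be sketched as a set of subsystems that each run in their own thread until a shared stop signal fires. This is a simplified model, not koordlet code: the start order is an assumption, and every subsystem is given a sync check here even though the list above shows HasSynced() only on MetricAdvisor and StatesInformer.

```python
import threading
from typing import List


class Subsystem:
    """Toy subsystem exposing run(stop) and has_synced() (hypothetical)."""

    def __init__(self, name: str, log: List[str]):
        self.name = name
        self._log = log
        self._synced = threading.Event()

    def run(self, stop: threading.Event) -> None:
        self._log.append(f"start {self.name}")
        self._synced.set()  # a real subsystem would sync its caches first
        stop.wait()         # block until the daemon is asked to stop

    def has_synced(self, timeout: float = 1.0) -> bool:
        return self._synced.wait(timeout)


def run_daemon(ordered: List[Subsystem],
               stop: threading.Event) -> List[threading.Thread]:
    """Start subsystems one by one, waiting for each to report synced."""
    threads: List[threading.Thread] = []
    for sub in ordered:
        thread = threading.Thread(target=sub.run, args=(stop,), daemon=True)
        thread.start()
        if not sub.has_synced():
            raise RuntimeError(f"{sub.name} did not sync in time")
        threads.append(thread)
    return threads


if __name__ == "__main__":
    log: List[str] = []
    stop = threading.Event()
    # Assumed start order, chosen only for this example.
    names = ["StatesInformer", "MetricCache", "MetricAdvisor", "QOSManager"]
    workers = run_daemon([Subsystem(n, log) for n in names], stop)
    stop.set()  # equivalent to closing stopCh
    for worker in workers:
        worker.join(timeout=1.0)
    print(log)
```

Waiting for each subsystem to sync before starting the next keeps later stages from acting on an empty view of pods and nodes.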
koord-descheduler
The koord-descheduler identifies and evicts pods to improve resource utilization and cluster balance through a modular, profile-based architecture.
It operates as a Kubernetes controller manager with the following parts:
- Descheduler Core: Coordinates descheduling policies
- Controller Manager: Manages reconciliation loops for custom resources
- Profiles: Define enabled plugins and configurations
- Plugins: Implement descheduling strategies (deschedule, balance, filter, evict)
- Informer Factory: Maintains cached cluster resource views
- Eviction Limiter: Controls pod eviction rate to prevent disruption
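One way to picture the Eviction Limiter is a sliding-window counter that allows at most a fixed number of evictions per time window. The sketch below assumes that design. The class name, parameters, and limits are hypothetical, and the actual limiter in koord-descheduler may work differently.

```python
from collections import deque
from typing import Deque


class EvictionLimiter:
    """Allow at most max_evictions within any window_seconds interval."""

    def __init__(self, max_evictions: int, window_seconds: float):
        self.max_evictions = max_evictions
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()  # timestamps of allowed evictions

    def allow(self, now: float) -> bool:
        # Forget evictions that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_evictions:
            return False  # budget exhausted; the caller should retry later
        self._times.append(now)
        return True


if __name__ == "__main__":
    limiter = EvictionLimiter(max_evictions=2, window_seconds=60.0)
    for t in (0.0, 10.0, 20.0, 60.0):
        print(t, limiter.allow(t))
```

Passing the current time in as an argument keeps the limiter deterministic and easy to test. A production version would read a monotonic clock instead.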
koord-descheduler Architecture:
koord-descheduler
├── Descheduler Core
│   ├── Profiles
│   │   ├── Deschedule Plugins
│   │   ├── Balance Plugins
│   │   ├── Filter Plugins
│   │   └── Evict Plugins
│   ├── Eviction Limiter
│   └── Informer Factory
│       ├── Node Informer
│       ├── Pod Informer
│       └── Custom Resource Informers
└── Controller Manager
    ├── Migration Controller
    └── Drain Controller
koord-device-daemon
The koord-device-daemon discovers and labels heterogeneous devices (GPUs, NPUs, etc.) on nodes, running as a daemon that periodically scans and updates node labels.
Key configuration flags:
- --oneshot: Single execution mode
- --no-timestamp: Disable timestamp in labels
- --sleep-interval: Device discovery frequency
- --prints-output-file: Output file path for device info
Architecture components:
- Resource Feature Discovery: Discovers and processes device information
- Prints Writer: Outputs device information
- Manager Map: Registry for different hardware type device managers
- Configuration: Manages component configuration from files, environment, and CLI
koord-device-daemon Execution Flow:
1. Start
   ↓
2. Load Configuration
   ↓
3. Create Printers
   ↓
4. Generate Device Prints
   ↓
5. Output Device Prints
   ↓
6. Check Oneshot Mode
   ├─ Yes → Exit
   └─ No → Sleep Interval
      ↓
      Rerun Discovery
      ↓
      Return to Step 4
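The execution flow above reduces to a discover, write, sleep loop with an early exit in oneshot mode. The sketch below models that loop with injected callables so it can run without hardware. The function name, the label key, and the `max_rounds` guard are additions made for the example and do not come from koord-device-daemon.

```python
import time
from typing import Callable, Dict, Optional


def run_discovery(discover: Callable[[], Dict[str, str]],
                  write: Callable[[Dict[str, str]], None],
                  oneshot: bool,
                  sleep_interval: float,
                  sleep: Callable[[float], None] = time.sleep,
                  max_rounds: Optional[int] = None) -> int:
    """Run the discovery loop and return the number of completed rounds."""
    rounds = 0
    while True:
        write(discover())  # steps 4 and 5: generate and output device prints
        rounds += 1
        # Step 6: exit after one pass in oneshot mode. max_rounds is a
        # test-only guard so the example loop can terminate.
        if oneshot or (max_rounds is not None and rounds >= max_rounds):
            return rounds
        sleep(sleep_interval)


if __name__ == "__main__":
    def fake_discover() -> Dict[str, str]:
        # Placeholder label; real label keys depend on the device manager.
        return {"example.com/gpu-model": "demo"}

    run_discovery(fake_discover, print, oneshot=True, sleep_interval=60.0)
```

Injecting `sleep` makes it possible to exercise the periodic path in tests without waiting for the real interval.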
koord-runtime-proxy
The koord-runtime-proxy acts as middleware between kubelet and container runtimes, intercepting CRI calls to apply resource management policies.
Key configuration flags:
- --koord-runtimeproxy-endpoint: Service endpoint
- --remote-runtime-service-endpoint: Backend runtime service
- --backend-runtime-mode: Container engine (Containerd or Docker)
- --runtime-hook-server-key/val: Runtime hook server identification
Supports two backend modes:
- Containerd: CRI server for Containerd runtime
- Docker: Docker server for Docker runtime
Architecture:
- Runtime Manager Server: Abstract interface for runtime implementations
- CRI Server/Docker Server: Runtime-specific implementations
- Dispatcher: Routes CRI calls to handlers
- Resource Executor: Applies resource policies to containers
koord-runtime-proxy Workflow:
koord-runtime-proxy
├── Runtime Proxy Endpoint
├── Backend Runtime Mode
│   ├── Containerd Mode
│   │   └── CRI Server
│   └── Docker Mode
│       └── Docker Server
├── Runtime Hook Server Key/Val
└── Skip Hook Server Pods
Data Flow:
Intercept CRI Calls → Apply Resource Policies → Forward to Backend Runtime
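The data flow above can be sketched as a small dispatcher that runs registered hooks for a given call and then forwards the possibly modified request to the backend. `CreateContainer` and `StopContainer` are real CRI call names, but the `RuntimeProxy` class, the request shape, and the `cpu_shares` field are simplifications assumed for this example.

```python
from typing import Callable, Dict, List

Request = Dict[str, object]
Hook = Callable[[Request], Request]
Backend = Callable[[str, Request], str]


class RuntimeProxy:
    """Toy interceptor: apply hooks per call, then forward to the backend."""

    def __init__(self, backend: Backend):
        self._backend = backend
        self._hooks: Dict[str, List[Hook]] = {}

    def register(self, call: str, hook: Hook) -> None:
        self._hooks.setdefault(call, []).append(hook)

    def handle(self, call: str, request: Request) -> str:
        # Apply resource policies: each hook receives a copy and returns
        # the request that the next hook (or the backend) should see.
        for hook in self._hooks.get(call, []):
            request = hook(dict(request))
        # Forward to the backend runtime and return its response.
        return self._backend(call, request)


if __name__ == "__main__":
    def backend(call: str, request: Request) -> str:
        print("backend received", call, request)
        return "ok"

    def set_cpu_shares(request: Request) -> Request:
        request["cpu_shares"] = 2  # illustrative policy value
        return request

    proxy = RuntimeProxy(backend)
    proxy.register("CreateContainer", set_cpu_shares)
    proxy.handle("CreateContainer", {"name": "demo"})
    proxy.handle("StopContainer", {"name": "demo"})
```

Calls without registered hooks pass through unchanged, which matches the role of a transparent proxy between kubelet and the runtime.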
Component Communication and Integration
Koordinator components communicate through the Kubernetes API server and share configuration via ConfigMaps in a control plane/data plane pattern.
Primary communication patterns:
- API Server Interactions: All components watch resources, update status, and create custom resources
- Shared Configuration: ConfigMaps mounted as volumes or accessed via API server
- Webhook Integration: koord-manager webhooks called during resource creation/updates
- Metrics Collection: koordlet collects node metrics for scheduler decisions
- Event Propagation: Events and status updates through API server
Integration workflow:
- koord-manager initializes controllers and webhooks
- koord-scheduler registers with Kubernetes scheduler framework
- koordlet starts on each node and collects metrics
- koord-device-daemon discovers and labels node devices
- koord-runtime-proxy intercepts container runtime calls
- Components coordinate through API server shared state
Component Communication Sequence:
Communication Flow:
1. koord-manager → API Server
   - Watch CRDs and ConfigMaps
2. koord-scheduler → API Server
   - Register as scheduler
3. koordlet → API Server
   - Report node metrics
4. koord-device-daemon → API Server
   - Update node device labels
5. kubelet → koord-runtime-proxy → Container Runtime
   - Intercept CRI calls
6. API Server → koord-scheduler
   - Provide scheduling metrics
7. koord-scheduler → API Server
   - Schedule pods with QoS policies
8. API Server → koordlet
   - Apply QoS policies to pods
Operational Considerations
Effective Koordinator operation requires attention to configuration, monitoring, troubleshooting, and lifecycle management.
Configuration Management: Configuration is managed through ConfigMaps that are mounted as volumes or read via the API server. Changes typically require component restarts.
Monitoring and Metrics: All components expose Prometheus metrics including:
- Component health and readiness
- API server request latencies and errors
- Resource utilization and efficiency
- Scheduling and descheduling performance
- QoS policy enforcement statistics
Common Issues:
- Webhook Timeouts: Increase timeout settings in large clusters
- Resource Starvation: Adjust QoS policies and limits
- Scheduling Failures: Verify node labels and taints/tolerations
- Metric Collection: Check koordlet connectivity and permissions
- Device Discovery: Verify device drivers and permissions
Lifecycle Management: Manage components as standard Kubernetes workloads with appropriate resources, limits, and probes. Perform rolling updates carefully, especially for koord-manager webhooks.
Performance and Scaling
Performance and scalability depend on cluster size, workload characteristics, and configuration.
koord-manager Scaling: A single replica is sufficient for small and medium clusters. Large clusters may need multiple replicas. Tune webhook timeouts based on load.
koord-scheduler Performance: Scheduling performance is influenced by the enabled plugins, policy complexity, and scheduling frequency. Optimize plugin configuration and use efficient informer caches.
koordlet Resource Usage: Resource usage depends on metric collection frequency, the number of monitored resources, and QoS policy complexity. Tune collection intervals and retention based on requirements.
Scaling Guidelines:
- Monitor component resources and adjust requests/limits
- Scale koord-manager based on API server load
- Optimize scheduler plugins for workload patterns
- Tune metric collection frequency
- Use node affinity and taints for component placement
Performance and Scaling Configuration Flow:
Cluster Deployment Process:
1. Cluster Deployment
   ↓
2. Determine koord-manager Replicas
   ├─ Small Cluster → Deploy Single Replica
   └─ Large Cluster → Deploy Multiple Replicas
   ↓
3. Configure Webhook Settings
   ↓
4. Set Webhook Timeouts
   ↓
5. Configure Concurrency Settings
   ↓
6. Configure Request Queue
   ↓
7. Configure Retry Policy
   ↓
8. Monitor Performance
   ↓
9. Adjust Configuration as Needed
   ↓
   Return to Step 3 (iterate as needed)
Security and Best Practices
Deploy Koordinator securely with these best practices:
RBAC Configuration: Define the minimum required permissions for each component, following the principle of least privilege. RBAC configs are in the config/rbac directory.
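A least-privilege review can start with a simple check for wildcard entries in RBAC rules. The sketch below assumes rules have already been loaded into dictionaries that use the standard `apiGroups`, `resources`, and `verbs` keys. The function name and the sample rules are hypothetical and are not taken from the config/rbac directory.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]


def find_overbroad_rules(rules: List[Rule]) -> List[int]:
    """Return the indexes of rules that use '*' in any checked field."""
    flagged: List[int] = []
    for index, rule in enumerate(rules):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in rule.get(field, []):
                flagged.append(index)
                break  # one wildcard is enough to flag the rule
    return flagged


if __name__ == "__main__":
    sample_rules: List[Rule] = [
        {"apiGroups": [""], "resources": ["pods"],
         "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["*"]},
    ]
    print(find_overbroad_rules(sample_rules))
```

A flagged rule is not necessarily wrong, but each one deserves a written justification during a permission audit.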
Network Security: Configure network policies to restrict component communication. The webhook server should be accessible only to the API server.
Secret Management: Store sensitive configuration in Secrets, not ConfigMaps. Properly manage and rotate TLS certificates.
Production Best Practices:
- Use dedicated namespaces for components
- Implement resource requests and limits
- Configure liveness and readiness probes
- Enable leader election for control plane components
- Regularly update to latest stable versions
- Monitor logs and metrics for issues
- Test configuration changes in non-production first
- Implement backup and recovery procedures
Security Considerations:
- Audit component permissions regularly
- Keep components updated for security fixes
- Limit webhook timeouts to prevent DoS
- Use network policies to restrict communication
- Implement proper logging and monitoring
- Follow Kubernetes pod security best practices