Koordinator defines a CRD-based Pod migration API called PodMigrationJob, through which the descheduler or other automatic fault recovery components can evict or delete Pods more safely.
Migrating Pods is an important capability that many components (such as deschedulers) rely on. It can be used to optimize scheduling or to help resolve workload runtime quality issues. Pod migration is a complex process: it involves steps such as auditing, resource allocation, and application startup, and it is interleaved with application upgrades, scaling, and the resource operation and maintenance work of cluster administrators. Managing the stability risk of this process, so that applications do not fail because their Pods were migrated, is therefore a critical problem that must be solved.
Because the PodMigrationJob CRD is oriented to the desired final state, we can track the status of each step of the migration and detect scenarios such as application upgrades and scaling, which helps ensure the stability of the workload.
- Kubernetes >= 1.18
- Koordinator >= 0.6
Please make sure Koordinator components are correctly installed in your cluster. If not, please refer to Installation.
PodMigrationJob is enabled by default. You can use it without modifying the koord-descheduler configuration.
- Create a Deployment pod-demo with the YAML file below.
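The manifest itself is not reproduced here. A minimal sketch of what pod-demo.yaml could contain follows. The container name, image, label, and resource requests are placeholders chosen for illustration, and schedulerName assumes the Pod should be placed by koord-scheduler so that resources reserved during migration can be used by it.

```yaml
# pod-demo.yaml (illustrative sketch, not the original manifest)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pod-demo                    # assumed label
  template:
    metadata:
      labels:
        app: pod-demo
    spec:
      schedulerName: koord-scheduler   # assumption: scheduled by koord-scheduler
      containers:
      - name: demo                     # placeholder container name
        image: nginx:1.25              # placeholder image
        resources:
          requests:
            cpu: 200m                  # placeholder requests
            memory: 400Mi
```

Any workload managed by a controller that recreates evicted Pods would serve the same purpose in this walkthrough.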
$ kubectl create -f pod-demo.yaml
- Check the scheduling result of the Pod
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-demo-5f9b977566-c7lvk 1/1 Running 0 41s 10.17.0.9 node-0 <none> <none>
pod-demo-5f9b977566-c7lvk is scheduled on the node node-0.
- Create a PodMigrationJob with the YAML file below to migrate pod-demo-5f9b977566-c7lvk.
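A sketch of what migrationjob-demo.yaml could look like, reconstructed from the query output in this walkthrough (Pod default/pod-demo-5f9b977566-c7lvk, a TTL of 5m, and a Reservation created before eviction). The apiVersion and the field names under spec are assumptions based on the v1alpha1 PodMigrationJob CRD and should be checked against the API definition of the installed Koordinator version.

```yaml
# migrationjob-demo.yaml (illustrative sketch; field names assumed from the v1alpha1 CRD)
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: migrationjob-demo
spec:
  paused: false            # start working immediately
  ttl: 5m                  # matches the 5m0s TTL shown in the query output
  mode: ReservationFirst   # reserve resources before evicting
  podRef:
    namespace: default
    name: pod-demo-5f9b977566-c7lvk   # replace with the name of your own Pod
```

The Pod name is generated by the Deployment, so it needs to be replaced with the name returned by kubectl get pod in your cluster.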
$ kubectl create -f migrationjob-demo.yaml
- Query migration status
$ kubectl get podmigrationjob migrationjob-demo
NAME PHASE STATUS AGE NODE RESERVATION PODNAMESPACE POD NEWPOD TTL
migrationjob-demo Succeed Complete 37s node-1 d56659ab-ba16-47a2-821d-22d6ba49258e default pod-demo-5f9b977566-c7lvk pod-demo-5f9b977566-nxjdf 5m0s
From the above results, it can be observed that:
- PHASE is Succeed and STATUS is Complete, indicating that the migration succeeded.
- NODE node-1 indicates the node where the new Pod is scheduled after the migration.
- RESERVATION d56659ab-ba16-47a2-821d-22d6ba49258e is the Reservation created during migration. The PodMigrationJob Controller tries to create the Reservation and reserve resources with it before starting to evict the Pod. Eviction is initiated only after the reservation succeeds, which ensures that resources are available for the new Pod once the old Pod has been evicted.
- PODNAMESPACE default is the namespace where the migrated Pod is located, POD pod-demo-5f9b977566-c7lvk is the Pod to be migrated, and NEWPOD pod-demo-5f9b977566-nxjdf is the newly created Pod after migration.
- TTL indicates the TTL period of the current Job.
- Query migration events
The PodMigrationJob Controller creates Events for important steps in the migration process to help users diagnose migration problems.
$ kubectl describe podmigrationjob migrationjob-demo
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ReservationCreated 8m33s koord-descheduler Successfully create Reservation "d56659ab-ba16-47a2-821d-22d6ba49258e"
Normal ReservationScheduled 8m33s koord-descheduler Assigned Reservation "d56659ab-ba16-47a2-821d-22d6ba49258e" to node "node-1"
Normal Evicting 8m33s koord-descheduler Try to evict Pod "default/pod-demo-5f9b977566-c7lvk"
Normal EvictComplete 8m koord-descheduler Pod "default/pod-demo-5f9b977566-c7lvk" has been evicted
Normal Complete 8m koord-descheduler Bind Pod "default/pod-demo-5f9b977566-nxjdf" in Reservation "d56659ab-ba16-47a2-821d-22d6ba49258e"
The latest API can be found in
Example: Manually confirm whether the migration is allowed
Eviction and migration operations bring risks to stability, so you may want to check manually and confirm that nothing is wrong before the migration is initiated.
Therefore, when creating a PodMigrationJob, set the paused field to true, and set it to false after manually confirming that execution is allowed.
If you decide not to execute it, you can update status.phase=Failed to terminate the PodMigrationJob immediately, or wait for the PodMigrationJob to expire automatically.
Here, paused indicates whether the PodMigrationJob should work or not, and ttl controls the PodMigrationJob timeout duration.
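The two fields can be seen together in a manifest such as the following sketch. It assumes that both fields live under spec in the v1alpha1 CRD and reuses the Pod from the quick start; the job name and Pod name are illustrative.

```yaml
# Sketch of a paused PodMigrationJob (field paths assumed from the v1alpha1 CRD)
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: migrationjob-demo
spec:
  # paused indicates whether the PodMigrationJob should work or not.
  paused: true
  # ttl controls the PodMigrationJob timeout duration.
  ttl: 5m
  mode: ReservationFirst
  podRef:
    namespace: default
    name: pod-demo-5f9b977566-c7lvk
```

Created this way, the job would stay idle until paused is changed to false, or until the ttl elapses.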
Example: Evict Pods only, without reserving resources
PodMigrationJob provides two migration modes:
- EvictDirectly evicts the Pod directly, without reserving resources.
- ReservationFirst reserves resources first, to ensure that they can be allocated before eviction is initiated.
If you only want to evict Pods, set the mode to EvictDirectly.
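As a sketch, a job that only evicts the Pod could look like the following. The field path spec.mode is an assumption based on the v1alpha1 CRD, and the job name and Pod name are illustrative.

```yaml
# Sketch of an evict-only PodMigrationJob (spec.mode is an assumed field path)
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: migrationjob-demo
spec:
  paused: false
  ttl: 5m
  mode: EvictDirectly      # evict without creating a Reservation first
  podRef:
    namespace: default
    name: pod-demo-5f9b977566-c7lvk
```

With this mode no Reservation is created, so there is no guarantee that the replacement Pod finds resources.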
Example: Use reserved resources when migrating
In some scenarios, resources are reserved first, and a PodMigrationJob is created only after the reservation succeeds. This reuses the arbitration mechanism provided by the PodMigrationJob Controller (to be implemented in v0.7) to ensure workload stability.
Here, reservation-0 is a Reservation created before the PodMigrationJob.
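A sketch of how the job might point at the existing Reservation. The reservationOptions.reservationRef path is an assumption about the v1alpha1 CRD and should be verified against the installed API; the job name and Pod name are illustrative.

```yaml
# Sketch of a PodMigrationJob that reuses an existing Reservation
# (reservationOptions.reservationRef is an assumed field path)
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: migrationjob-demo
spec:
  paused: false
  ttl: 5m
  mode: ReservationFirst
  podRef:
    namespace: default
    name: pod-demo-5f9b977566-c7lvk
  reservationOptions:
    # the reservation-0 created before creating PodMigrationJob
    reservationRef:
      name: reservation-0
```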
Example: Evicting Pods Gracefully
PodMigrationJob supports graceful eviction of pods.
The grace period is the duration in seconds before the object should be deleted. The value must be a non-negative integer, and zero means delete immediately. If the value is nil (not specified), the default grace period for the specified type is used.
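The grace period semantics described here are those of the Kubernetes DeleteOptions type. Assuming the PodMigrationJob spec exposes it as a deleteOptions field, a graceful eviction could be sketched as follows; the value of 60 seconds, the job name, and the Pod name are illustrative.

```yaml
# Sketch of a PodMigrationJob with a grace period
# (spec.deleteOptions is assumed to mirror Kubernetes DeleteOptions)
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: migrationjob-demo
spec:
  paused: false
  ttl: 5m
  mode: ReservationFirst
  podRef:
    namespace: default
    name: pod-demo-5f9b977566-c7lvk
  deleteOptions:
    # Seconds to wait before the Pod is deleted; 0 deletes immediately,
    # and omitting the field falls back to the Pod's default grace period.
    gracePeriodSeconds: 60
```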