Gang scheduling
Gang scheduling ensures that a group, or gang, of related jobs starts only when all required resources are available. Red Hat build of Kueue enables gang scheduling by suspending jobs until the OpenShift Container Platform cluster can guarantee the capacity to start and execute all of the related jobs in the gang together. This is also known as all-or-nothing scheduling.
Gang scheduling is important if you are working with expensive, limited resources, such as GPUs. It can prevent jobs from claiming GPUs without using them, which can improve GPU utilization and reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.
Configuring gang scheduling
As a cluster administrator, you can configure gang scheduling by modifying the gangScheduling spec in the Kueue custom resource (CR).
Kueue CR with gang scheduling configured
    apiVersion: kueue.openshift.io/v1
    kind: Kueue
    metadata:
      name: cluster
      labels:
        app.kubernetes.io/managed-by: kustomize
        app.kubernetes.io/name: kueue-operator
      namespace: openshift-kueue-operator
    spec:
      config:
        gangScheduling:
          policy: ByWorkload
          byWorkload:
            admission: Parallel
    # ...
- You can set the policy value to enable or disable gang scheduling. The possible values are ByWorkload, None, or empty ("").
  ByWorkload
    When the policy value is set to ByWorkload, each job is processed and considered for admission as a single unit. If the job does not become ready within the specified time, the entire job is evicted and retried at a later time.
  None
    When the policy value is set to None, gang scheduling is disabled.
  Empty ("")
    When the policy value is empty or set to "", the Red Hat build of Kueue Operator determines settings for gang scheduling. Currently, gang scheduling is disabled by default.
- If the policy value is set to ByWorkload, you must configure job admission settings. The possible values for the admission spec are Parallel, Sequential, or empty ("").
  Parallel
    When the admission value is set to Parallel, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity. When a deadlock occurs, the successful scheduling of pods from another job can prevent the scheduling of pods from the current job.
  Sequential
    When the admission value is set to Sequential, only pods from the currently processing job are admitted. After all of the pods from the current job have been admitted and are ready, Red Hat build of Kueue processes the next job. Sequential processing can slow down admission when the cluster has sufficient capacity for multiple jobs, but provides a higher likelihood that all of the pods for a job are scheduled together successfully.
  Empty ("")
    When the admission value is empty or set to "", the Red Hat build of Kueue Operator determines job admission settings. Currently, the admission value is set to Parallel by default.
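To illustrate the admission settings described above, the following sketch shows the same Kueue CR with the admission value set to Sequential instead of Parallel. It is a minimal example, not a recommended configuration: the metadata fields are copied from the earlier listing and might differ in your cluster, and the comments only restate the behavior described in this section.

```yaml
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  labels:
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: kueue-operator
  namespace: openshift-kueue-operator
spec:
  config:
    gangScheduling:
      # ByWorkload enables gang scheduling; None disables it.
      policy: ByWorkload
      byWorkload:
        # Sequential admits pods from one job at a time, which trades
        # admission speed for a lower risk of deadlock between jobs.
        admission: Sequential
# ...
```

Sequential admission might suit clusters where jobs regularly compete for the same scarce capacity. Where capacity is usually sufficient for several jobs at once, Parallel admission avoids the slower one-job-at-a-time processing.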