Managing distributed workloads with the Leader Worker Set Operator
You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.
Installing the Leader Worker Set Operator
You can install the Leader Worker Set Operator through the OpenShift Container Platform web console to begin managing distributed AI workloads.
-
You have access to the cluster with cluster-admin privileges.
-
You have access to the OpenShift Container Platform web console.
-
You have installed the cert-manager Operator for Red Hat OpenShift.
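The prerequisites above can also be checked from a terminal. The following is a minimal sketch, not part of the documented procedure: the function name check_lws_prereqs is hypothetical, and it assumes that you are logged in with oc and that the cert-manager Operator appears as a ClusterServiceVersion whose name contains "cert-manager".

```shell
# Hypothetical helper, shown for illustration only.
check_lws_prereqs() {
  # Prints "yes" when the current user may perform any verb on any resource,
  # which is what cluster-admin privileges grant.
  oc auth can-i '*' '*' --all-namespaces
  # Lists installed ClusterServiceVersions whose name mentions cert-manager.
  # Assumption: the Operator's CSV name contains the string "cert-manager".
  oc get csv --all-namespaces | grep -i cert-manager
}
```

Run check_lws_prereqs after logging in. A "yes" line followed by at least one cert-manager entry suggests that both prerequisites are met.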
-
Log in to the OpenShift Container Platform web console.
-
Verify that the cert-manager Operator for Red Hat OpenShift is installed.
-
Install the Leader Worker Set Operator.
-
Navigate to Ecosystem → Software Catalog.
-
Enter Leader Worker Set Operator into the filter box.
-
Select the Leader Worker Set Operator and click Install.
-
On the Install Operator page:
-
The Update channel is set to stable-v1.0, which installs the latest stable release of Leader Worker Set Operator 1.0.
-
Under Installation mode, select A specific namespace on the cluster.
-
Under Installed Namespace, select Operator recommended Namespace: openshift-lws-operator.
-
Under Update approval, select one of the following update strategies:
-
The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
-
The Manual strategy requires a user with appropriate credentials to approve the Operator update.
-
Click Install.
-
Create the custom resource (CR) for the Leader Worker Set Operator:
-
Navigate to Installed Operators → Leader Worker Set Operator.
-
Under Provided APIs, click Create instance in the LeaderWorkerSetOperator pane.
-
Click Create.
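After the installation finishes, you might want to confirm from a terminal that the Operator is running. The following is a minimal sketch under stated assumptions: the function name check_lws_operator is hypothetical, the namespace is the recommended openshift-lws-operator from the Install Operator page, and the exact ClusterServiceVersion and pod names that it lists are not documented here.

```shell
# Hypothetical helper, shown for illustration only.
check_lws_operator() {
  # Namespace recommended on the Install Operator page.
  lws_ns="openshift-lws-operator"
  # Show the Operator's ClusterServiceVersion, then the pods in its namespace.
  oc get csv -n "$lws_ns" && oc get pods -n "$lws_ns"
}
```

If you selected a different namespace during installation, change the lws_ns value accordingly.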
Deploying a leader worker set
You can use the Leader Worker Set Operator to deploy a leader worker set to assist with managing distributed workloads across nodes.
-
You have installed the Leader Worker Set Operator.
-
Create a new project by running the following command:
$ oc new-project my-namespace
-
Create a file named leader-worker-set.yaml with the following content:
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  generation: 1
  name: my-lws
  namespace: my-namespace
spec:
  leaderWorkerTemplate:
    leaderTemplate:
      metadata: {}
      spec:
        containers:
        - image: nginxinc/nginx-unprivileged:1.27
          name: leader
          resources: {}
    restartPolicy: RecreateGroupOnPodRestart
    size: 3
    workerTemplate:
      metadata: {}
      spec:
        containers:
        - image: nginxinc/nginx-unprivileged:1.27
          name: worker
          ports:
          - containerPort: 8080
            protocol: TCP
          resources: {}
  networkConfig:
    subdomainPolicy: Shared
  replicas: 2
  rolloutStrategy:
    rollingUpdateConfiguration:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  startupPolicy: LeaderCreated
where:
metadata.name
Specifies the name of the leader worker set resource.
metadata.namespace
Specifies the namespace for the leader worker set to run in.
spec.leaderWorkerTemplate.leaderTemplate
Specifies the pod template for the leader pods.
spec.leaderWorkerTemplate.restartPolicy
Specifies the restart policy for when pod failures occur. Allowed values are RecreateGroupOnPodRestart to restart the whole group or None to not restart the group.
spec.leaderWorkerTemplate.size
Specifies the number of pods to create for each group, including the leader pod. For example, a value of 3 creates 1 leader pod and 2 worker pods. The default value is 1.
spec.leaderWorkerTemplate.workerTemplate
Specifies the pod template for the worker pods.
spec.networkConfig.subdomainPolicy
Specifies the policy to use when creating the headless service. Allowed values are UniquePerReplica or Shared. The default value is Shared.
spec.replicas
Specifies the number of replicas, or leader-worker groups. The default value is 1.
spec.rolloutStrategy.rollingUpdateConfiguration.maxSurge
Specifies the maximum number of replicas that can be scheduled above the replicas value during rolling updates. The value can be specified as an integer or a percentage.
For more information about all configurable fields, see the upstream LeaderWorkerSet API documentation.
-
Apply the leader worker set configuration by running the following command:
$ oc apply -f leader-worker-set.yaml
-
Verify that pods were created by running the following command:
$ oc get pods -n my-namespace
Example output
NAME         READY   STATUS    RESTARTS   AGE
my-lws-0     1/1     Running   0          4s
my-lws-0-1   1/1     Running   0          3s
my-lws-0-2   1/1     Running   0          3s
my-lws-1     1/1     Running   0          7s
my-lws-1-1   1/1     Running   0          6s
my-lws-1-2   1/1     Running   0          6s
-
my-lws-0 is the leader pod for the first group.
-
my-lws-1 is the leader pod for the second group.
-
Review the stateful sets by running the following command:
$ oc get statefulsets
Example output
NAME       READY   AGE
my-lws     4/4     111s
my-lws-0   2/2     57s
my-lws-1   2/2     60s
-
my-lws is the leader stateful set for all leader-worker groups.
-
my-lws-0 is the worker stateful set for the first group.
-
my-lws-1 is the worker stateful set for the second group.
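The pod names in the example output follow a pattern that depends on the replicas and size values. As a rough illustration, the following sketch prints the pod names you would expect for a given leader worker set. The function name expected_pods is hypothetical, and the naming rule (leader pods named <name>-<group>, worker pods named <name>-<group>-<index>) is inferred from the example output in this procedure rather than taken from an API guarantee.

```shell
# Hypothetical helper: prints the pod names expected for a leader worker set.
# Arguments: $1 = leader worker set name
#            $2 = replicas (number of leader-worker groups)
#            $3 = size (pods per group, including the leader)
expected_pods() {
  ep_group=0
  while [ "$ep_group" -lt "$2" ]; do
    # Leader pod of this group.
    echo "$1-$ep_group"
    # Worker pods of this group; indexes start at 1 because the leader counts toward size.
    ep_index=1
    while [ "$ep_index" -lt "$3" ]; do
      echo "$1-$ep_group-$ep_index"
      ep_index=$((ep_index + 1))
    done
    ep_group=$((ep_group + 1))
  done
}

# Values from the leader-worker-set.yaml example: replicas 2, size 3.
expected_pods my-lws 2 3
```

With replicas set to 2 and size set to 3, the function prints six names, which you can compare against the NAME column of the oc get pods -n my-namespace output.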