
Coscheduling

2021-04-26 16:04

Background

Kubernetes is now widely used to orchestrate online services. To improve cluster utilization and operating efficiency, we want to use Kubernetes as a unified platform that manages online services and offline jobs together. The default scheduler schedules one Pod at a time and does not consider relationships between Pods. Many offline data-processing jobs, however, require co-scheduling: the job can only run once all of its subtasks have been created successfully. If only some subtasks start, those subtasks will wait indefinitely for the rest to be scheduled. This is exactly the Gang Scheduling scenario.
Concrete implementations of Coscheduling can be divided, according to whether "fragments" are allowed, into Explicit Coscheduling, Local Coscheduling, and Implicit Coscheduling. Explicit Coscheduling is what is commonly called Gang Scheduling: no fragments are allowed at all, i.e. "All or Nothing".
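
In the label-based implementation installed below, a gang is declared by attaching two labels to every Pod in the group; the scheduler then holds all members back until at least min-available of them can be placed at once. A minimal sketch, using a hypothetical pod group named demo-group (the same labels appear in the TFJob example later in this article):

apiVersion: v1
kind: Pod
metadata:
  name: demo-member-0          # hypothetical name, for illustration only
  labels:
    # Every Pod carrying the same pod-group name belongs to one gang.
    pod-group.scheduling.sigs.k8s.io/name: demo-group
    # No member is bound to a node until at least this many members can run together.
    pod-group.scheduling.sigs.k8s.io/min-available: "3"
spec:
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]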

Prerequisites

1. Kubernetes 1.16 or later is required.
2. Use a standard dedicated cluster of the kind provided by ACK (an Alibaba Cloud Kubernetes cluster or a kubeadm-installed cluster); Rancher-installed clusters are not supported (confirmed by testing).
3. Cluster nodes must be able to reach the public internet.
4. Helm v3 must be installed on the master node.
5. CPU and memory limits are not supported.
6. nvidia.com/gpu is supported.

Modify the configuration

wget http://kubeflow.oss-cn-beijing.aliyuncs.com/ack-coscheduling.tar.gz # download the chart package
tar zxvf ack-coscheduling.tar.gz                                         # unpack it
cd ack-coscheduling
vim values.yaml  # change schedulerCount (default: 3) to the number of master nodes in your cluster; see the sketch after this block
cd templates
# Comment out the affinity block below. Only Alibaba Cloud scheduler pods carry the
# "component" label, so on an Alibaba Cloud Kubernetes cluster leave it in place.
#affinity:
#  podAffinity:
#     requiredDuringSchedulingIgnoredDuringExecution:
#        - labelSelector:
#             matchExpressions:
#                - key: component
#                  operator: In
#                  values:
#                     - kube-scheduler
#          topologyKey: kubernetes.io/hostname
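
The exact layout of values.yaml may vary between chart versions; the relevant setting is a single field, sketched here for a cluster with three master nodes:

# values.yaml (excerpt; layout may differ between chart versions)
schedulerCount: 3   # set to the number of master nodes in the cluster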

Installation

$ helm install ack-coscheduling -n kube-system ./ack-coscheduling
$ kubectl get pods -n kube-system -w # wait for the updater pods to complete
scheduler-update-clusterrole-7g6pd     0/1     Completed   0          25m
scheduler-update-lrgj8                 0/1     Completed   0          25m
$ kubectl logs scheduler-update-lrgj8 -n kube-system  # inspect the job log
# The following output indicates a successful installation:
DEBUG update /etc/kubernetes/manifests/kube-scheduler.yaml succeed
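
This log line shows that the updater job has patched the kube-scheduler static Pod manifest on the master; the kubelet then restarts kube-scheduler with the coscheduling plugin enabled. You can confirm the scheduler pods have come back up with kubectl get pods -n kube-system | grep kube-scheduler.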

Verification

1. Run a TFJob and verify scheduling behavior by first commenting out and then enabling the pod-group labels (the full manifest is at the end of this section).
2. With pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu and pod-group.scheduling.sigs.k8s.io/min-available: "5" commented out, the two available GPU cards are occupied and only part of the job starts:
[root@master ~]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
tf-smoke-gpu-ps-0       1/1     Running   0          3s
tf-smoke-gpu-worker-0   1/1     Running   0          4s
tf-smoke-gpu-worker-1   1/1     Running   0          4s
tf-smoke-gpu-worker-2   0/1     Pending   0          4s
tf-smoke-gpu-worker-3   0/1     Pending   0          4s
3. With pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu and pod-group.scheduling.sigs.k8s.io/min-available: "5" enabled, all pods stay Pending: the group requires 5 pods, but only 2 of the 4 GPU workers could be placed, so nothing is scheduled.
[root@master ~]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
tf-smoke-gpu-ps-0       0/1     Pending   0          29m
tf-smoke-gpu-worker-0   0/1     Pending   0          29m
tf-smoke-gpu-worker-1   0/1     Pending   0          29m
tf-smoke-gpu-worker-2   0/1     Pending   0          29m
tf-smoke-gpu-worker-3   0/1     Pending   0          29m
The TFJob manifest used in this test (toggle the two pod-group labels in both the PS and Worker specs to switch between the two behaviors above):

apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "tf-smoke-gpu"
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      template:
        metadata:
          creationTimestamp: null
          labels:
            pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu  # pod group name
            pod-group.scheduling.sigs.k8s.io/min-available: "5"  # number of pods required
        spec:
          containers:
          - args:
            - python
            - tf_cnn_benchmarks.py
            - --batch_size=32
            - --model=resnet50
            - --variable_update=parameter_server
            - --flush_stdout=true
            - --num_gpus=1
            - --local_parameter_device=cpu
            - --device=cpu
            - --data_format=NHWC
            image: registry.cn-hangzhou.aliyuncs.com/kubeflow-images-public/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
            name: tensorflow
            ports:
            - containerPort: 2222
              name: tfjob-port
            resources:
              requests:
                cpu: '3'
            workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
    Worker:
      replicas: 4
      template:
        metadata:
          creationTimestamp: null
          labels:
            pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu
            pod-group.scheduling.sigs.k8s.io/min-available: "5"
        spec:
          containers:
          - args:
            - python
            - tf_cnn_benchmarks.py
            - --batch_size=32
            - --model=resnet50
            - --variable_update=parameter_server
            - --flush_stdout=true
            - --num_gpus=1
            - --local_parameter_device=cpu
            - --device=gpu
            - --data_format=NHWC
            image: registry.cn-hangzhou.aliyuncs.com/kubeflow-images-public/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
            name: tensorflow
            ports:
            - containerPort: 2222
              name: tfjob-port
            resources:
              requests:
                cpu: 2
                memory: "3Gi"
              limits:
                nvidia.com/gpu: 1
            workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
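
Save the manifest as, for example, tf-smoke-gpu.yaml and submit it with kubectl apply -f tf-smoke-gpu.yaml; this assumes the Kubeflow TFJob operator is already installed in the cluster.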

Uninstall

helm uninstall ack-coscheduling -n kube-system