Setting up a k8s cluster on Ubuntu 18.04

Last month I set up a k8s cluster with NVIDIA GPUs for my team. I am recording the steps here so that I do not forget them.

This setup uses Ubuntu 18.04 Server, Docker 19.03.2, and k8s 1.15.2.

name            version
ubuntu server   18.04
docker          19.03.2
k8s             1.15.2

Before setting up the cluster, the NVIDIA GPU driver must be installed; how to install the driver is not covered here.

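Each machine in the cluster needs a static IP address and DNS configured; otherwise containers may be unable to reach the external network.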

Installation is automated with a shell script. install.sh is as follows:

#!/bin/bash
# Install the FTP client
sudo apt-get install -y lftp

# Set the time zone
sudo ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' | sudo tee /etc/timezone > /dev/null

# Replace the apt sources with the Aliyun mirror, backing up the original first
echo "Replacing apt sources with the Aliyun mirror"
sudo mv /etc/apt/sources.list /etc/apt/sources.list.bak
sudo rm -f /etc/apt/sources.list.save
sudo cp -f sources.list /etc/apt
sudo apt-get update

# Install Docker
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce=5:19.03.2~3-0~ubuntu-bionic docker-ce-cli=5:19.03.2~3-0~ubuntu-bionic

# Install the NVIDIA container packages; make sure the NVIDIA GPU driver is already installed
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo apt-get install -y nvidia-container-runtime

# Docker configuration file
sudo mkdir -p /etc/docker
sudo cp -f daemon.json /etc/docker
sudo systemctl daemon-reload
sudo systemctl restart docker

# Install the k8s components
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=1.15.2-00 kubeadm=1.15.2-00 kubectl=1.15.2-00
# Hold the packages by name so that later upgrades do not move the pinned versions
sudo apt-mark hold kubelet kubeadm kubectl
sudo cp -f 10-kubeadm.conf /etc/systemd/system/kubelet.service.d/

# DNS settings
sudo cp -f resolved.conf /etc/systemd/resolved.conf
sudo systemctl restart systemd-resolved

That is the whole install script. The Aliyun apt sources file it uses is as follows:

#sources.list
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse

The Docker daemon.json file is as follows (on a machine without a GPU, delete the default-runtime and runtimes entries):

{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "registry-mirrors":["http://hub-mirror.c.163.com","https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://pee6w651.mirror.aliyuncs.com"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }   
    }   
}
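
After Docker has been restarted with this file, it can be worth confirming that the daemon really picked up the default runtime. The following is a minimal sketch, not part of install.sh; the helper names docker_default_runtime and expect_default_runtime are hypothetical, and it assumes the docker CLI can reach the daemon.

```shell
# Print the default runtime that the Docker daemon reports.
docker_default_runtime() {
    docker info --format '{{.DefaultRuntime}}'
}

# Return 0 if the daemon's default runtime equals the expected name.
# expect_default_runtime is a hypothetical helper, not part of install.sh.
expect_default_runtime() {
    expected="$1"
    actual="$(docker_default_runtime)"
    if [ "$actual" = "$expected" ]; then
        echo "ok: default runtime is $actual"
    else
        echo "mismatch: expected $expected, got $actual" >&2
        return 1
    fi
}

# Usage on a GPU node, after "systemctl restart docker":
#   expect_default_runtime nvidia
# On a node where the two GPU entries were deleted, Docker's stock default is runc:
#   expect_default_runtime runc
```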

The kubelet drop-in file used with kubeadm, 10-kubeadm.conf, is as follows:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice" 
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

On Ubuntu 18.04 the static IP is configured through netplan. The file is 50-cloud-init.yaml, in the following format:

# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        enp4s0:
            dhcp4: no
            addresses: [10.254.18.6/24]
            gateway4: 10.254.18.1 
    version: 2
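
The install.sh shown earlier does not copy this file anywhere, so it has to be put in place by hand. A minimal sketch of that step follows, assuming the default netplan directory /etc/netplan; install_netplan_config is a hypothetical helper name. Applying a wrong address over SSH can cut the connection, so it is safer to run this from a local console.

```shell
# Copy a netplan file into place and apply it.
# $1: path to the YAML file; $2: target directory (defaults to /etc/netplan).
# install_netplan_config is a hypothetical helper, not part of install.sh.
install_netplan_config() {
    src="$1"
    dest="${2:-/etc/netplan}"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Only apply the configuration if the copy succeeded.
    sudo cp -f "$src" "$dest/" && sudo netplan apply
}

# Usage:
#   install_netplan_config 50-cloud-init.yaml
```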

The DNS configuration file resolved.conf has the following format:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See resolved.conf(5) for details

[Resolve]
DNS=192.168.110.213 114.114.114.114
#FallbackDNS=
#Domains=
LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes

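Put install.sh, the Aliyun sources.list, Docker's daemon.json, the static IP file 50-cloud-init.yaml, and the DNS file resolved.conf in the same directory, together with 10-kubeadm.conf (the script copies it too), then run bash install.sh to install everything automatically.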
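
Because install.sh copies sources.list, daemon.json, 10-kubeadm.conf, and resolved.conf from the working directory, a missing file only shows up halfway through the run. A small pre-flight check along these lines can catch that earlier; missing_install_files is a hypothetical helper, and the file list is simply read off the cp commands in the script.

```shell
# Print the helper files that are absent from a directory (default: current one).
# Returns 0 when nothing is missing, 1 otherwise.
# missing_install_files is a hypothetical helper, not part of install.sh.
missing_install_files() {
    dir="${1:-.}"
    status=0
    for f in install.sh sources.list daemon.json 10-kubeadm.conf resolved.conf; do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Usage: run the installer only when every file is present.
#   missing_install_files . && bash install.sh
```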

To install other software versions, edit the version numbers in the script.
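
When changing versions, it helps to see which version strings the configured repositories actually offer and to confirm afterwards what was installed. The sketch below uses apt-cache madison and dpkg-query for that; installed_version and check_version are hypothetical helper names, and the version strings in the usage comments are the ones pinned in the script above.

```shell
# Print the installed version of a Debian package (empty output if not installed).
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Compare the installed version of a package with the expected one.
# check_version is a hypothetical helper, not part of install.sh.
check_version() {
    pkg="$1"
    expected="$2"
    actual="$(installed_version "$pkg")"
    if [ "$actual" = "$expected" ]; then
        echo "$pkg $actual ok"
    else
        echo "$pkg: expected $expected, found ${actual:-nothing}" >&2
        return 1
    fi
}

# List the versions the configured repositories offer:
#   apt-cache madison kubeadm
# Check the versions pinned in install.sh:
#   check_version kubeadm 1.15.2-00
#   check_version docker-ce '5:19.03.2~3-0~ubuntu-bionic'
```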

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The steps above must be run on every machine. To initialize the k8s cluster and join nodes to it, follow the article at https://blog.csdn.net/shykevin/article/details/98811021, but note one point in it:

sudo kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.15.2 --pod-network-cidr=192.169.0.0/16

The pod-network-cidr here is 192.169.0.0/16. It should be changed to 192.168.0.0/16, because the Calico network plugin manifest (http://mirror.faasx.com/k8s/calico/v3.3.2/calico.yaml) sets:

- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"

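Otherwise containers cannot reach the external network: 192.169.0.0/16 lies outside the private 192.168.0.0/16 range and is already public address space, so the article most likely contains a typo.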
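
To make the address-range point concrete, here is a small sketch that tests whether an IPv4 address falls inside one of the RFC 1918 private blocks. is_rfc1918 is a hypothetical helper written for illustration; the corrected kubeadm command in the comment is the one from the article with only the CIDR changed.

```shell
# Succeed if the dotted-quad IPv4 address lies in an RFC 1918 private block:
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
# Returns 2 if the argument does not have four dot-separated fields.
is_rfc1918() {
    old_ifs="$IFS"
    IFS=.
    set -- $1          # split the address into its four octets
    IFS="$old_ifs"
    [ "$#" -eq 4 ] || return 2
    if [ "$1" -eq 10 ]; then return 0; fi
    if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then return 0; fi
    if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then return 0; fi
    return 1
}

# Compare the CIDR base from the article with the one Calico expects.
for base in 192.169.0.0 192.168.0.0; do
    if is_rfc1918 "$base"; then
        echo "$base is private"
    else
        echo "$base is not private"
    fi
done

# Corrected init command (not executed here):
#   sudo kubeadm init --image-repository registry.aliyuncs.com/google_containers \
#       --kubernetes-version v1.15.2 --pod-network-cidr=192.168.0.0/16
```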

The GPU plugin is nvidia-device-plugin, installed as follows:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
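
Once the plugin's pods are running, GPU nodes should advertise the nvidia.com/gpu resource. The following sketch lists the allocatable GPU count per node and picks out nodes that report none; gpu_per_node and nodes_without_gpu are hypothetical helper names, and it assumes kubectl is already configured against the cluster.

```shell
# List node names and allocatable NVIDIA GPUs as reported by the API server.
# The dot in "nvidia.com" is escaped so that it is read as part of the key.
gpu_per_node() {
    kubectl get nodes --no-headers \
        -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Print the names of nodes that expose no GPU (column is "<none>" or 0).
nodes_without_gpu() {
    gpu_per_node | awk '$2 == "<none>" || $2 == "0" { print $1 }'
}

# Usage:
#   gpu_per_node
#   nodes_without_gpu
```

A GPU node that still appears in the second list usually points back to the driver or to the default-runtime setting in daemon.json.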

Reference: https://feisky.gitbooks.io/kubernetes/content/plugins/device.html

 

posted @ 2019-12-10 12:25  一瞬光阴