Building a Kubernetes Cluster
Here we use kubeadm to build the K8s cluster.
kubeadm is a tool that provides kubeadm init and kubeadm join for quickly deploying a Kubernetes cluster.
Official documentation: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm/
Getting the images
Download the images:
docker pull k8s.gcr.io/kube-apiserver:v1.16.1
docker pull k8s.gcr.io/kube-proxy:v1.16.1
docker pull k8s.gcr.io/kube-controller-manager:v1.16.1
docker pull k8s.gcr.io/kube-scheduler:v1.16.1
docker pull k8s.gcr.io/etcd:3.3.15
docker pull k8s.gcr.io/pause:3.1
docker pull k8s.gcr.io/coredns:1.6.2
Google's k8s.gcr.io registry may be unreachable from here, so skip this step for now.
[Recommended] Pull the container images required for a specific Kubernetes version:
kubeadm config images pull --kubernetes-version=v1.22.2 --image-repository registry.aliyuncs.com/google_containers
If you use this method, add the following flag when running kubeadm init:
--image-repository registry.aliyuncs.com/google_containers
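For reference, a hedged sketch of what the init command used later in this guide would look like with this flag added (the version, pod CIDR, and address are the ones used below; adjust them to your environment):
# Sketch only: same parameters as the init step later in this guide,
# plus --image-repository so the images come from the Aliyun mirror.
kubeadm init \
  --kubernetes-version=v1.22.2 \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.60.134 \
  --image-repository registry.aliyuncs.com/google_containers \
  --ignore-preflight-errors=Swap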
Special notes
Every machine must have the images.
Versions are updated with every release; if the initialization fails because of a version mismatch, the error output will tell you which version is required.
The kubeadm version and the image versions must match.
Install Docker (on all machines)
yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine
yum install -y yum-utils device-mapper-persistent-data lvm2 git
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install docker-ce -y
systemctl start docker && systemctl enable docker
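Optionally (not part of the original steps), a quick sanity check that Docker is up and which cgroup driver it uses, since that value is needed when configuring the kubelet later:
systemctl status docker --no-pager
# Docker's cgroup driver, referenced again in the kubelet configuration step
docker info | grep -i 'cgroup driver'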
Download the images from the Aliyun registry (on all machines)
[root@k8s-master ~]# cat pull.sh
#!/usr/bin/bash
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5
After the downloads finish, re-tag all the images pulled from Aliyun with their k8s.gcr.io names.
[root@k8s-master ~]# cat tag.sh
#!/usr/bin/bash
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.2 k8s.gcr.io/kube-controller-manager:v1.22.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.2 k8s.gcr.io/kube-proxy:v1.22.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.2 k8s.gcr.io/kube-apiserver:v1.22.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.2 k8s.gcr.io/kube-scheduler:v1.22.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0 k8s.gcr.io/etcd:3.5.0-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5 k8s.gcr.io/pause:3.5
Make the scripts executable:
chmod +x pull.sh tag.sh
Run pull.sh first; once the images have been pulled, run tag.sh (see the sketch below).
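A minimal sketch of running the two scripts in order and then checking the result; the grep pattern is just an assumption for a quick sanity check:
./pull.sh && ./tag.sh
# Confirm that the k8s.gcr.io tags now exist locally
docker images | grep k8s.gcr.io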
Starting the installation
Environment preparation (on all machines)
I have prepared three machines:
192.168.60.134 k8s-master
192.168.60.135 k8s-node-1
192.168.60.136 k8s-node-2
Set up local name resolution: set the hostnames and make all machines resolve each other.
# Replace the IPs above with your own and add the entries to the hosts file
vim /etc/hosts
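As an alternative to editing the file by hand, a sketch that appends the entries (assuming the example IPs above; run the matching hostnamectl command on each machine):
hostnamectl set-hostname k8s-master    # use k8s-node-1 / k8s-node-2 on the other machines
cat >> /etc/hosts <<EOF
192.168.60.134 k8s-master
192.168.60.135 k8s-node-1
192.168.60.136 k8s-node-2
EOF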
System configuration on all machines
1. Disable the firewall:
# systemctl stop firewalld
# systemctl disable firewalld
2. Disable SELinux:
# setenforce 0
3. Edit /etc/selinux/config and set SELINUX to disabled, as follows:
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
SELINUX=disabled
4. Time synchronization
# yum install -y ntpdate
# ntpdate ntp.aliyun.com
Disable the system swap partition
This has been a requirement since Kubernetes 1.8: if swap is not turned off, the kubelet will not start with the default configuration. Option 1: relax the restriction with the kubelet startup flag --fail-swap-on=false. Option 2: turn off the system swap.
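If you choose option 1, a minimal sketch (assumption: using the KUBELET_EXTRA_ARGS file that this guide configures later; that step overwrites the file, so the flag would need to be kept there as well):
cat >/etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF
# and pass --ignore-preflight-errors=Swap to kubeadm init, as done later in this guide
This guide goes with option 2 below.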
# Temporarily disable
[root@localhost /]# swapoff -a
Edit /etc/fstab and comment out the swap auto-mount entry, then use free -m to confirm that swap is off.
2. Comment out the swap partition:
# Permanently disable
[root@localhost /]# sed -i 's/.*swap.*/#&/' /etc/fstab
# Check
[root@localhost /]# free -m
              total        used        free      shared  buff/cache   available
Mem:           3935         144        3415           8         375        3518
Swap:             0           0           0
Deploy Kubernetes with kubeadm (on all machines)
Configure the yum repository:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
-
Install the latest version (do not use this)
yum makecache fast
yum install -y kubelet kubeadm kubectl ipvsadm
-
Install a specific version (use this)
Since I need to install Kubernetes 1.22.2:
yum install -y kubelet-1.22.2-0.x86_64 kubeadm-1.22.2-0.x86_64 kubectl-1.22.2-0.x86_64 ipvsadm
-
Load the IPVS-related kernel modules.
They need to be reloaded after every reboot (put them in /etc/rc.local so they load automatically at boot):
# vim /etc/rc.local
==============================
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
==============================
-
Make it executable:
chmod +x /etc/rc.local
-
Reboot
reboot
-
Configuration:
Configure the forwarding-related kernel parameters, otherwise errors may occur:
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF
-
Apply the configuration:
sysctl --system
-
If net.bridge.bridge-nf-call-iptables reports an error, load the br_netfilter module:
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
-
Check that the modules loaded successfully:
# lsmod | grep ip_vs
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145458  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          139264  7 ip_vs,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4
libcrc32c              12644  4 xfs,ip_vs,nf_nat,nf_conntrack
Configure and start the kubelet (on all machines)
-
Configure the kubelet to use the pause image (not recommended) [just do the first step below]
Get Docker's cgroup driver:
DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | cut -d' ' -f4)
echo $DOCKER_CGROUPS
-
Set the variable:
DOCKER_CGROUPS=`docker info |grep 'Cgroup' | awk ' NR==1 {print $3}'`
[root@k8s-master ~]# echo $DOCKER_CGROUPS
cgroupfs
-
Configure the kubelet's cgroups
-
The domestic source can report errors here, so we use the Google one.
-
Domestic (not recommended)
cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
EOF
-
Google (recommended)
cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=k8s.gcr.io/pause:3.5"
EOF
-
-
Start:
systemctl daemon-reload
systemctl enable kubelet && systemctl restart kubelet
-
Note!!! (this is not actually a problem)
If you run # systemctl status kubelet at this point, you will see error messages:
10月 11 00:26:43 node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
10月 11 00:26:43 node1 systemd[1]: Unit kubelet.service entered failed state.
10月 11 00:26:43 node1 systemd[1]: kubelet.service failed.
Running # journalctl -xefu kubelet to inspect the systemd logs shows that the real error is:
unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
# This error is resolved automatically once kubeadm init generates the CA certificate, so it can be ignored for now.
# In short, the kubelet keeps restarting until kubeadm init is run.
-
Configure the master node
Run the initialization:
[root@master]# kubeadm init --kubernetes-version=v1.22.2 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.60.134 --ignore-preflight-errors=Swap
Note:
--apiserver-advertise-address=192.168.60.134 --- the master's IP address.
--kubernetes-version=v1.22.2 --- change this to match your actual version.
Also double-check that the swap partition is disabled.
If it fails with a version hint, a newer version has been released.
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 24.0.7. Latest validated version: 20.10
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.60.134]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.60.134 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.60.134 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 7.004532 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: lcuyzk.xydd3lli4u74a0uc
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.60.134:6443 --token lcuyzk.xydd3lli4u74a0uc \
--discovery-token-ca-cert-hash sha256:5dd1dabc5c8843adf70384a3775559c453b47b8b53af071a96eae11390f4ec8a
The above is the complete output of the initialization; from it you can see the key steps needed to manually initialize and install a Kubernetes cluster.
The key items are:
[kubelet] generates the kubelet configuration file "/var/lib/kubelet/config.yaml"
[certificates] generates the various certificates
[kubeconfig] generates the kubeconfig files
[bootstraptoken] generates the token; write it down, as it is needed later when adding nodes to the cluster with kubeadm join
At the very end of the output is this join command with the token:
kubeadm join 192.168.60.134:6443 --token lcuyzk.xydd3lli4u74a0uc \
--discovery-token-ca-cert-hash sha256:5dd1dabc5c8843adf70384a3775559c453b47b8b53af071a96eae11390f4ec8a
Remember to save it: it is different for everyone, and it is only valid for 24 hours.
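If the token expires before all nodes have joined, it can be regenerated in one step (covered in detail in the "Regenerating the token" section at the end):
kubeadm token create --print-join-command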
Configure kubectl
Run this on the master node (the steps are also shown in the init output above):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
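A quick sanity check that kubectl can now talk to the API server; the exact output is environment-specific, but until the network plugin from the next section is installed the master will show as NotReady, roughly like this:
kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   1m    v1.22.2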
Configure the network plugin
Run this on the master node.
-
Download the configuration:
cd ~ && mkdir flannel && cd flannel
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The flannel file may be blocked from direct download, so you can try my mirror/accelerator:
curl -O https://github.muou666.com/https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
-
Open the configuration file:
vim kube-flannel.yml
# Search for "image" and find the entry below; the version may differ, so go by what is in the file
docker.io/flannel/flannel:v0.24.1
# It is usually under containers:
containers:
- name: kube-flannel
  👉 image: docker.io/flannel/flannel:v0.24.1
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
# Note: the flannel image in kube-flannel.yml is docker.io/flannel/flannel:v0.24.1 and needs to be pulled in advance
Find this image, then run the following on all machines:
docker pull docker.io/flannel/flannel:v0.24.1
Configure the network interface parameter
# If a node has multiple network interfaces, see https://github.com/kubernetes/kubernetes/issues/39701
# For now you need to use the --iface parameter in kube-flannel.yml to specify the name of the host's internal NIC,
# otherwise DNS may fail to resolve and containers may not be able to communicate.
# Download kube-flannel.yml locally and add --iface=<iface-name> to the flanneld startup arguments.
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.12.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
============================= ignore this line
  - --iface=ens33   # the NIC name to fill in; mine is ens33, yours may differ
============================= ignore this line
⚠️⚠️⚠️ The value of --iface=ens33 is your current NIC; you can also specify multiple NICs.
# Since version 1.12, kubeadm sets an extra taint on the node: node.kubernetes.io/not-ready:NoSchedule.
# This makes sense: a node does not accept scheduling before it is Ready. But if the Kubernetes network plugin
# has not been deployed yet, the node will never become Ready.
# Therefore, modify kube-flannel.yml and add a toleration for the node.kubernetes.io/not-ready:NoSchedule taint:
- key: beta.kubernetes.io/arch
  operator: In
  values:
  - arm64
hostNetwork: true
tolerations:
- operator: Exists
  effect: NoSchedule
============================= ignore this line
- key: node.kubernetes.io/not-ready   # add these three lines, at around line 165 of the file
  operator: Exists
  effect: NoSchedule
============================= ignore this line
serviceAccountName: flannel
- Start:
# kubectl apply -f ~/flannel/kube-flannel.yml
# After applying, wait a short while for the pods to come up
- Check:
# kubectl get pods --namespace kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-78fcd69978-4pwrd             1/1     Running   0          44m
coredns-78fcd69978-cdcnh             1/1     Running   0          44m
etcd-k8s-master                      1/1     Running   0          44m
kube-apiserver-k8s-master            1/1     Running   0          44m
kube-controller-manager-k8s-master   1/1     Running   0          44m
kube-proxy-bwpsl                     1/1     Running   0          44m
kube-scheduler-k8s-master            1/1     Running   0          44m

# kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45m

# kubectl get svc --namespace kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   45m

Only after the network plugin has also been installed and configured will the nodes show as Ready.
Join the node machines to the cluster
Note: this applies to the node machines only, not the master.
-
Run this on all node machines. The command is the one returned after the master was successfully initialized, i.e. the token saved earlier (it is different for everyone):
kubeadm join 192.168.60.134:6443 --token lcuyzk.xydd3lli4u74a0uc \
    --discovery-token-ca-cert-hash sha256:5dd1dabc5c8843adf70384a3775559c453b47b8b53af071a96eae11390f4ec8a
-
If it reports an error, enable IP forwarding:
sysctl -w net.ipv4.ip_forward=1
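sysctl -w only lasts until the next reboot; a sketch for making it persistent (assumption: reusing the /etc/sysctl.d/k8s.conf file created earlier):
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.d/k8s.conf
sysctl --system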
Run the following on the master
Various checks
-
View the pods:
[root@kub-k8s-master ~]# kubectl get pods -n kube-system
-
View details of an abnormal pod:
[root@kub-k8s-master ~]# kubectl describe pods kube-flannel-ds-sr6tq -n kube-system
-
View the nodes:
[root@k8s-master flannel]# kubectl get nodes
NAME         STATUS   ROLES                  AGE     VERSION
k8s-master   Ready    control-plane,master   56m     v1.22.2
k8s-node-1   Ready    <none>                 8m12s   v1.22.2
k8s-node-2   Ready    <none>                 7m49s   v1.22.2
At this point, the cluster setup is complete!
Troubleshooting
Some problems collected from the web
Errors
Problem 1: inconsistent server time across machines causes errors
Check the server time on every machine (see the sketch below).
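A minimal sketch, assuming ntpdate was installed during the environment preparation step:
date                      # compare this across all machines
ntpdate ntp.aliyun.com    # re-sync any machine whose clock is off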
=====================================
Problem 2: kubeadm init fails; the following message appears and then it times out with an error:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
Checking the kubelet status reveals the errors below: the host "master" cannot be found and the image pull fails. The pause image was being pulled from aliyuncs, but I had already downloaded the official pause image, so following the image name in the message I re-tagged the pause image with the Aliyun name, then reset the kubeadm environment and re-initialized; the error was resolved.
[root@master manifests]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2019-01-31 15:20:32 CST; 5min ago
Docs: https://kubernetes.io/docs/
Main PID: 23908 (kubelet)
Tasks: 19
Memory: 30.8M
CGroup: /system.slice/kubelet.service
└─23908 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --cgroup-driver=cgroupfs --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.432357 23908 kubelet.go:2266] node "master" not found
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.532928 23908 kubelet.go:2266] node "master" not found
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.633192 23908 kubelet.go:2266] node "master" not found
1月 31 15:25:41 master kubelet[23908]: I0131 15:25:41.729296 23908 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.733396 23908 kubelet.go:2266] node "master" not found
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.740110 23908 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1": Error response from daemon: Get https://registry.cn-hangzhou.aliyuncs.com/v2/: dial tcp 0.0.0.80:443: connect: invalid argument
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.740153 23908 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-controller-manager-master_kube-system(e8f43404e60ae844e375d50b1e39d91e)" failed: rpc error: code = Unknown desc = failed pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1": Error response from daemon: Get https://registry.cn-hangzhou.aliyuncs.com/v2/: dial tcp 0.0.0.80:443: connect: invalid argument
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.740166 23908 kuberuntime_manager.go:662] createPodSandbox for pod "kube-controller-manager-master_kube-system(e8f43404e60ae844e375d50b1e39d91e)" failed: rpc error: code = Unknown desc = failed pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1": Error response from daemon: Get https://registry.cn-hangzhou.aliyuncs.com/v2/: dial tcp 0.0.0.80:443: connect: invalid argument
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.740207 23908 pod_workers.go:190] Error syncing pod e8f43404e60ae844e375d50b1e39d91e ("kube-controller-manager-master_kube-system(e8f43404e60ae844e375d50b1e39d91e)"), skipping: failed to "CreatePodSandbox" for "kube-controller-manager-master_kube-system(e8f43404e60ae844e375d50b1e39d91e)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-controller-manager-master_kube-system(e8f43404e60ae844e375d50b1e39d91e)\" failed: rpc error: code = Unknown desc = failed pulling image \"registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1\": Error response from daemon: Get https://registry.cn-hangzhou.aliyuncs.com/v2/: dial tcp 0.0.0.80:443: connect: invalid argument"
1月 31 15:25:41 master kubelet[23908]: E0131 15:25:41.833981 23908 kubelet.go:2266] node "master" not found
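A hedged sketch of the re-tag workaround described above, assuming the official pause:3.1 image is already present locally and the kubelet is asking for the Aliyun name shown in the log:
# Give the local official pause image the name the kubelet is trying to pull
docker tag k8s.gcr.io/pause:3.1 registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1
kubeadm reset
# then run kubeadm init again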
Resolution:
Reset the kubeadm environment
Reset/remove the nodes on every node of the cluster (including the master)
1. Drain the pods from the k8s-node-1 node (on the master):
[root@kub-k8s-master ~]# kubectl drain kub-k8s-node1 --delete-local-data --force --ignore-daemonsets
2. Delete the node (on the master):
[root@kub-k8s-master ~]# kubectl delete node kub-k8s-node1
3. Reset the node (on the node, i.e. the node that was just deleted):
[root@kub-k8s-node1 ~]# kubeadm reset
Note 1: the master must also be drained, deleted, and reset. This tripped me up badly: the first time I did not drain and delete the master, and although everything looked normal afterwards, CoreDNS simply would not work. It cost me a whole day, so do not skip this.
Note 2: after the reset, the following files must be removed on the master:
# rm -rf /var/lib/cni/ $HOME/.kube/config
### Note: if the whole k8s cluster has already been built and needs to be reset, follow the steps above. If the error occurred during initialization, only step 3 is needed.
Regenerating the token
Adding nodes to the cluster after the kubeadm-generated token has expired
After initialization, kubeadm provides the token for nodes to join:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 192.168.246.166:6443 --token n38l80.y2icehgzsyuzkthi \
--discovery-token-ca-cert-hash sha256:5fb6576ef82b5655dee285e0c93432aee54d38779bc8488c32f5cbbb90874bac
By default the token is valid for 24 hours; once it expires, it can no longer be used.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solution:
1. Generate a new token:
[root@kub-k8s-master]# kubeadm token create
kiyfhw.xiacqbch8o8fa8qj
[root@kub-k8s-master]# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
gvvqwk.hn56nlsgsv11mik6 <invalid> 2018-10-25T14:16:06+08:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
kiyfhw.xiacqbch8o8fa8qj 23h 2018-10-27T06:39:24+08:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
2. Get the sha256 hash of the CA certificate:
[root@kub-k8s-master]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
3. Join the node to the cluster:
kubeadm join 18.16.202.35:6443 --token kiyfhw.xiacqbch8o8fa8qj --discovery-token-ca-cert-hash sha256:5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
A few seconds later, you should see this node in the output of kubectl get nodes run on the master.
The method above is rather tedious; this does it in one step:
[root@kub-k8s-master ~]# kubeadm token create --print-join-command
Second method:
[root@kub-k8s-master ~]# token=$(kubeadm token generate)
kubeadm token create $token --print-join-command --ttl=0
Then run on the node:
[root@kub-k8s-node1 ~]# kubeadm reset
[root@kub-k8s-node1 ~]# <run the join command generated above>