OpenShift 3.6 Installation

A customer needed it, so I had to give it a try. Unfortunately, the only offline install doc I had on hand was for 3.7, and the earlier 3.11 install had gone almost pitfall-free thanks to a colleague's excellent documentation, so I dove in without studying anything carefully.

Sure enough, a badly written /etc/ansible/hosts file led to a pile of problems. Notes below.

 

1. Issues Encountered

 

  • Images not ready

All the images had been pulled, but because I hadn't read the docs carefully, I had only run `docker save -o` on the few listed in my (3.7) document. That caused the error below, and I had to start the downloads over.

One or more required container images are not available:
                   openshift3/registry-console:v3.6,
                   registry.example.com/openshift3/ose-deployer:v3.6.173.0.130,
                   registry.example.com/openshift3/ose-docker-registry:v3.6.173.0.130,
                   registry.example.com/openshift3/ose-haproxy-router:v3.6.173.0.130,
                   registry.example.com/openshift3/ose-pod:v3.6.173.0.130
               Checked with: skopeo inspect [--tls-verify=false] [--creds=<user>:<pass>] docker://<registry>/<image>
               Default registries searched: registry.example.com, registry.access.redhat.com
               Failed connecting to: registry.example.com, registry.access.redhat.com
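A pre-flight loop like the following is a cheap way to surface missing images before kicking off the hour-long install. This is a sketch: the registry name, tag, and image list are taken from the error above, and the loop only prints the skopeo commands (pipe the output to `sh` to run them).

```shell
# Print a skopeo check command for each required image (dry run);
# pipe the output to `sh` to actually execute the checks.
REGISTRY=registry.example.com
TAG=v3.6.173.0.130

image_ref() { echo "docker://${REGISTRY}/${1}:${TAG}"; }

for img in openshift3/ose-deployer openshift3/ose-docker-registry \
           openshift3/ose-haproxy-router openshift3/ose-pod; do
    echo "skopeo inspect --tls-verify=false $(image_ref "$img")"
done
```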
  • Registry port 443 not configured

Copying the 3.11 setup, I had configured the registry on port 80 and assumed I could get away with it. Instead:

[root@master ~]# oc logs  registry-console-1-deploy -n default
--> Scaling registry-console-1 to 1
--> Waiting up to 10m0s for pods in rc registry-console-1 to become ready
E1114 13:34:58.912499       1 reflector.go:304] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:509: Failed to watch *api.Pod: Get https://172.30.0.1:443/api/v1/namespaces/default/pods?labelSelector=deployment%3Dregistry-console-1%2Cdeploymentconfig%3Dregistry-console%2Cname%3Dregistry-console&resourceVersion=1981&timeoutSeconds=412&watch=true: dial tcp 172.30.0.1:443: getsockopt: connection refused
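If the mirror registry is the stock docker-distribution service, moving it to 443 comes down to its listen address and a TLS key pair. A minimal sketch of /etc/docker-distribution/registry/config.yml, where the certificate and key paths are placeholders you would substitute:

```yaml
version: 0.1
log:
  level: info
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :443
  tls:
    certificate: /etc/pki/tls/certs/registry.example.com.crt   # placeholder path
    key: /etc/pki/tls/private/registry.example.com.key         # placeholder path
```

Restart docker-distribution after editing for the new address to take effect.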
  • Images need a retag to the short :v3.6 tag

Pulling the service-catalog image failed. This one was a big trap: each install attempt runs for over an hour before the error appears. Typical errors:

15m        13m        4    kubelet, master.example.com    spec.containers{apiserver}    Normal        Pulling        pulling image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6"
  15m        13m        4    kubelet, master.example.com    spec.containers{apiserver}    Warning        Failed        Failed to pull image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6": rpc error: code = 2 desc = All endpoints blocked.
  15m        13m        6    kubelet, master.example.com    spec.containers{apiserver}    Normal        BackOff        Back-off pulling image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6"
  15m        4m        46    kubelet, master.example.com                    Warning        FailedSync    Error syncing pod
  

The fix was as follows:

docker pull registry.example.com/openshift3/registry-console:v3.6.173.0.130
docker tag registry.example.com/openshift3/registry-console:v3.6.173.0.130 registry.example.com/openshift3/registry-console:v3.6

docker push registry.example.com/openshift3/registry-console:v3.6
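The same pull/tag/push pattern presumably applies to the other image that failed above (ose-service-catalog). A sketch that prints the commands for each repo needing the short tag, so you can review before running:

```shell
# Dry run: print the retag commands for each repo that needs a short :v3.6 tag.
FULL=v3.6.173.0.130
SHORT=v3.6

retag_cmds() {
    echo "docker pull ${1}:${FULL}"
    echo "docker tag ${1}:${FULL} ${1}:${SHORT}"
    echo "docker push ${1}:${SHORT}"
}

for repo in registry.example.com/openshift3/registry-console \
            registry.example.com/openshift3/ose-service-catalog; do
    retag_cmds "$repo"
done
```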

 

  • yum configured, but docker package not found

Installing docker on the master failed with package not found, even though every host was configured with the same yum repository. In the end I worked around it by registering the host online with subscription-manager.

  • The apiserver pod started but could not be reached; the error was:
curl: (6) Could not resolve host: apiserver.kube-service-catalog.svc; Unknown error

Fixed by changing /etc/resolv.conf to:

[root@node2 ~]# cat /etc/resolv.conf 
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local example.com
nameserver 192.168.0.105
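A quick sanity check after editing the file (a sketch): extract the first nameserver, make sure it is the node's dnsmasq address (192.168.0.105 here), then resolve the service name through it.

```shell
# Print the first nameserver entry in a resolv.conf-style file.
first_nameserver() { awk '/^nameserver/ {print $2; exit}' "$1"; }

first_nameserver /etc/resolv.conf || true   # expect 192.168.0.105 on the node
# Then confirm the service name resolves through it, e.g.:
#   dig @"$(first_nameserver /etc/resolv.conf)" apiserver.kube-service-catalog.svc.cluster.local
```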

 

Unlike 3.11, 3.6 has no prerequisite check: the install just launches, and you can only wait to see whether an error message appears, so every attempt takes a long time.

The hosts-file options are documented here; required reading if you want to avoid these pitfalls:

https://docs.okd.io/3.6/install_config/install/advanced_install.html#enabling-service-catalog

  • Metrics and other components missing after the install finished

The tail of the install log:

TASK [openshift_excluder : Enable openshift excluder] *******************************************************************************************************************
changed: [node1.example.com]
changed: [master.example.com]
changed: [node2.example.com]

PLAY RECAP **************************************************************************************************************************************************************
localhost                  : ok=15   changed=0    unreachable=0    failed=0   
master.example.com         : ok=740  changed=72   unreachable=0    failed=0   
nfs.example.com            : ok=91   changed=3    unreachable=0    failed=0   
node1.example.com          : ok=250  changed=18   unreachable=0    failed=0   
node2.example.com          : ok=250  changed=18   unreachable=0    failed=0   

Checking pods showed only these few. None of the metrics components I had configured came up, so the hosts file had to be wrong.

[root@master ~]# oc get pods --all-namespaces
NAMESPACE              NAME                       READY     STATUS              RESTARTS   AGE
default                docker-registry-1-x0hlq    1/1       Running             7          2d
default                registry-console-2-p84p6   1/1       Running             2          1d
default                router-10-ttqq9            0/1       MatchNodeSelector   0          1d
default                router-12-rfpxc            1/1       Running             1          1d
kube-service-catalog   apiserver-3ls5x            1/1       Running             1          1d
kube-service-catalog   controller-manager-7zdbc   0/1       CrashLoopBackOff    1          1d

 

[root@master ~]# oc get nodes
NAME                 STATUS    AGE       VERSION
master.example.com   Ready     2d        v1.6.1+5115d708d7
node1.example.com    Ready     2d        v1.6.1+5115d708d7
node2.example.com    Ready     2d        v1.6.1+5115d708d7

 

  • Uninstall playbook
ansible-playbook  /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml;

 

  • DNS could not start, causing the atomic-openshift-node.service to fail
Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787479   32772 mount_linux.go:203] Detected OS with systemd
Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787497   32772 docker.go:364] Connecting to docker on unix:///var/run/docker.sock
Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787510   32772 docker.go:384] Start docker client with request timeout=2m0s
Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: W1117 18:55:51.789279   32772 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: F1117 18:55:51.798668   32772 start_node.go:140] could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory
Nov 17 18:55:51 master.example.com systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Nov 17 18:55:51 master.example.com systemd[1]: Failed to start OpenShift Node.
-- Subject: Unit atomic-openshift-node.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit atomic-openshift-node.service has failed.

The workaround: copy a resolv.conf file into /etc/origin/node.

[root@master ansible]# cd /etc/origin/node
[root@master node]# ls
ca.crt            node-dnsmasq.conf  server.key                          system:node:master.example.com.key
node-config.yaml  server.crt         system:node:master.example.com.crt  system:node:master.example.com.kubeconfig
[root@master node]# cp /etc/resolv.conf .

 

  • The router failed to start; digging in showed the deploy to node2.example.com failed because it could not bind port 443
[root@master node]# oc get pods -o wide 
NAME              READY     STATUS             RESTARTS   AGE       IP              NODE
router-1-deploy   0/1       Error              0          30m       10.129.0.14     node2.example.com
router-2-55bpf    1/1       Running            0          5m        192.168.0.104   node1.example.com
router-2-deploy   1/1       Running            0          5m        10.128.0.14     node1.example.com
router-2-dw31q    1/1       Running            0          5m        192.168.0.103   master.example.com
router-2-xn9cp    0/1       CrashLoopBackOff   6          5m        192.168.0.105   node2.example.com
[root@master node]# oc logs router-2-xn9cp  
I1117 12:19:27.665452       1 template.go:246] Starting template router (v3.6.173.0.130)
I1117 12:19:27.679413       1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1936
I1117 12:19:27.700732       1 router.go:240] Router is including routes in all namespaces
E1117 12:19:27.777551       1 ratelimiter.go:52] error reloading router: exit status 1
[ALERT] 320/121927 (45) : Starting frontend public_ssl: cannot bind socket [0.0.0.0:443]

Analysis: the registry on node2 also binds port 443, which presumably conflicts, so I edited the ansible inventory and removed node2's router attribute.
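Before redeploying, it is easy to confirm who already owns the port on node2 with ss. A sketch; the filter just matches the local-address column of `ss -tln` output:

```shell
# Filter `ss -tln`-style output down to sockets bound to a given port.
listeners_on() { awk -v p=":$1\$" '$4 ~ p'; }

ss -tln 2>/dev/null | listeners_on 443
```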

To bring the metrics stack up I reworked the hosts file one more time. The hosts file that finally produced a successful install:

 

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
nfs

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=openshift-enterprise

osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16
openshift_master_api_port=8443
openshift_master_console_port=8443

openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/exports
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=10Gi
openshift_docker_additional_registries=registry.example.com
openshift_docker_insecure_registries=registry.example.com
openshift_docker_blocked_registries=registry.access.redhat.com,docker.io
openshift_image_tag=v3.6.173.0.130

openshift_enable_service_catalog=true
openshift_service_catalog_image_prefix=registry.example.com/openshift3/ose-
openshift_service_catalog_image_version=v3.6.173.0.130
ansible_service_broker_image_prefix=registry.example.com/openshift3/ose-
ansible_service_broker_etcd_image_prefix=registry.example.com/rhel7/
template_service_broker_prefix=registry.example.com/openshift3/
oreg_url=registry.example.com/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
openshift_clock_enabled=true

openshift_metrics_storage_kind=nfs
openshift_metrics_install_metrics=true
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_host=nfs.example.com
openshift_metrics_storage_nfs_directory=/exports
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=10Gi
openshift_metrics_hawkular_hostname=hawkular-metrics.apps.example.com
#openshift_metrics_cassandra_storage_type=emptydir
openshift_metrics_image_prefix=registry.example.com/openshift3/
openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_public_url=https://hawkular-metrics.apps.example.com/hawkular/metrics
openshift_metrics_image_version=v3.6.173.0.130


openshift_template_service_broker_namespaces=['openshift']
template_service_broker_selector={"node": "true"}
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
# Default login account: admin / handhand
openshift_master_htpasswd_users={'admin': '$apr1$gfaL16Jf$c.5LAvg3xNDVQTkk6HpGB1'}


#openshift_repos_enable_testing=true
openshift_disable_check=docker_image_availability,disk_availability,memory_availability,docker_storage

docker_selinux_enabled=false
openshift_docker_options=" --selinux-enabled --insecure-registry 172.30.0.0/16 --log-driver json-file --log-opt max-size=50M --log-opt max-file=3 --insecure-registry registry.example.com --add-registry registry.example.com"
osm_etcd_image=rhel7/etcd
openshift_logging_image_prefix=registry.example.com/openshift3/

openshift_hosted_router_selector='region=infra,router=true'
openshift_master_default_subdomain=app.example.com


# host group for masters
[masters]
master.example.com
# host group for etcd
[etcd]
master.example.com

# host group for nodes, includes region info
[nodes]
master.example.com openshift_node_labels="{'region': 'infra', 'router': 'true', 'zone': 'default'}" openshift_schedulable=true
node1.example.com openshift_node_labels="{'region': 'infra', 'router': 'true', 'zone': 'default'}" openshift_schedulable=true
node2.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true

[nfs]
nfs.example.com
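Since every failure above traced back to the inventory, a cheap syntax-level check can run before the playbook. This sketch only verifies that each group named under [OSEv3:children] exists as a section in the file:

```shell
# For each group listed under [OSEv3:children], confirm a matching
# [group] section exists somewhere in the inventory file.
check_groups() {
    local inv=$1
    awk '/^\[OSEv3:children\]$/ {f=1; next} /^\[/ {f=0} f && NF {print}' "$inv" |
    while read -r g; do
        grep -q "^\[$g\]" "$inv" && echo "OK $g" || echo "MISSING [$g]"
    done
}

check_groups /etc/ansible/hosts 2>/dev/null || true
```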

 

Reinstalled from scratch with this final hosts file, and this time everything finally came up:

[root@master ~]# oc get pods --all-namespaces 
NAMESPACE              NAME                         READY     STATUS    RESTARTS   AGE
default                docker-registry-1-p8p0s      1/1       Running   2          2h
default                registry-console-1-t4bw2     1/1       Running   0          1h
default                router-1-1nnt3               1/1       Running   2          2h
default                router-1-4h8tg               1/1       Running   3          2h
kube-service-catalog   apiserver-z6nmz              1/1       Running   2          1h
kube-service-catalog   controller-manager-d2jgc     1/1       Running   0          1h
openshift-infra        hawkular-cassandra-1-m6r4x   1/1       Running   0          1h
openshift-infra        hawkular-metrics-4j828       1/1       Running   1          1h
openshift-infra        heapster-rgwrw               1/1       Running   6          2h

 

Check the PVs and PVCs:

[root@master ~]# oc get pv,pvc
NAME                 CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                    STORAGECLASS   REASON    AGE
pv/registry-volume   10Gi       RWX           Retain          Bound     default/registry-claim                            26m

NAME                 STATUS    VOLUME            CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc/registry-claim   Bound     registry-volume   10Gi       RWX                          26m

 

2. Script to Save Images in Bulk

mkdir -p /root/images
for i in $(docker images | awk 'NR>1 {print $1":"$2}'); do   # NR>1 skips the header line
        imagename=$(echo "$i" | awk -F '/' '{print $NF}' | awk -F ':' '{print $1}')
        echo "$imagename"
        docker save "$i" | gzip -c > "/root/images/${imagename}.tar.gz"
done
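On the target node the archives need the reverse operation. A companion sketch (the /root/images directory matches the save loop above):

```shell
# Load every image archive produced by the save loop above.
load_all() {
    dir=$1
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] || continue     # glob did not match: no archives present
        echo "loading $f"
        gunzip -c "$f" | docker load
    done
}

load_all /root/images
```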

 

 

3. Putting Images on a Separate Disk

Add a new disk in VirtualBox, then use

fdisk -l

to find the corresponding device, e.g. /dev/sdb.

Partition it:

echo "n
p
1


w" | fdisk /dev/sdb;

Create the VG:

pvcreate /dev/sdb1;
vgcreate docker-vg /dev/sdb1;

Have docker use docker-vg:

vgs;
 
cat <<EOF > /etc/sysconfig/docker-storage-setup
VG=docker-vg
EOF

docker-storage-setup

lvextend -l 100%VG /dev/docker-vg/docker-pool
touch /etc/containers/registries.conf
systemctl start docker
systemctl enable docker

lvs  
getenforce

 

4. The ocp.repo File

[root@master ~]# cat /etc/yum.repos.d/ocp.repo 
[server]
name=server
baseurl=http://192.168.56.103:8080/repo/rhel-7-server-rpms/
enabled=1
gpgcheck=0
[datapath]
name=datapath
baseurl=http://192.168.56.103:8080/repo/rhel-7-fast-datapath-rpms/
enabled=1
gpgcheck=0
[extra]
name=extra
baseurl=http://192.168.56.103:8080/repo/rhel-7-server-extras-rpms/
enabled=1
gpgcheck=0
[ose]
name=ose
baseurl=http://192.168.56.103:8080/repo/rhel-7-server-ose-3.6-rpms/
enabled=1
gpgcheck=0

 

5. Main Installation Steps

systemctl stop firewalld
systemctl disable firewalld
systemctl mask firewalld
setenforce 0;
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config

yum clean all
yum repolist

yum install -y docker


yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion vim atomic-openshift-excluder atomic-openshift-docker-excluder lrzsz unzip atomic-openshift-utils;
yum -y install python-setuptools

yum -y update;


ssh-keygen

ssh-copy-id root@master.example.com
ssh-copy-id root@node1.example.com
ssh-copy-id root@node2.example.com

echo "n
p
1


w" | fdisk /dev/sdb;

pvcreate /dev/sdb1;
vgcreate docker-vg /dev/sdb1;

vgs;
 
cat <<EOF > /etc/sysconfig/docker-storage-setup
VG=docker-vg
EOF

docker-storage-setup

lvextend -l 100%VG /dev/docker-vg/docker-pool
touch /etc/containers/registries.conf
systemctl start docker
systemctl enable docker

lvs  
getenforce


yum -y install docker-distribution;
systemctl enable docker-distribution;
systemctl start docker-distribution;
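With the repos, docker storage, and the mirror registry in place, the install itself is one playbook run. A sketch: the playbook path is assumed from the openshift-ansible RPM (the same tree as the uninstall playbook above), and the guard makes it a no-op on machines without it installed.

```shell
# Kick off the OpenShift 3.6 advanced install.
INVENTORY=/etc/ansible/hosts
PLAYBOOK=/usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

if [ -f "$PLAYBOOK" ]; then
    ansible-playbook -i "$INVENTORY" "$PLAYBOOK"
else
    echo "playbook not found: $PLAYBOOK"
fi
```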

 

 

 

The service catalog tile shows up grayed out; one look tells you it is a Technology Preview release.

 

posted @ 2018-11-15 22:05  ericnie