离线安装ocp3.11需要注意的事情

 

检查阶段

运行部署前检查的时候

# ansible-playbook -vv playbooks/prerequisites.yml

需要看看play recap是否全过,如果不过需要定位原因,反复执行

之前在检查阶段,因为node1,node2经常连接不上master(设置为yum源)的repo/base,也就是RHEL7.6的包,暂时解决办法是在repo中分别挂在自己本地的源绕开错误。 

 

 

 部署阶段

# ansible-playbook -vv /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

安装完成后的提示,如果有不成功,解决问题以后反复执行。

 

检查安装

[root@master yum.repos.d]# oc login -u system:admin
Logged into "https://master.example.com:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-system
    management-infra
    openshift
    openshift-console
    openshift-infra
    openshift-logging
    openshift-metrics-server
    openshift-monitoring
    openshift-node
    openshift-sdn
    openshift-web-console

Using project "default".
[root@master yum.repos.d]# oc get nodes
NAME                 STATUS    ROLES     AGE       VERSION
master.example.com   Ready     master    23m       v1.11.0+d4cacc0
node1.example.com    Ready     infra     18m       v1.11.0+d4cacc0
node2.example.com    Ready     compute   18m       v1.11.0+d4cacc0

 

[root@master yum.repos.d]# oc get pods --all-namespaces
NAMESPACE                  NAME                                           READY     STATUS         RESTARTS   AGE
default                    docker-registry-1-9q962                        1/1       Running        0          17m
default                    registry-console-1-4mb7d                       1/1       Running        0          17m
default                    router-1-74pr6                                 1/1       Running        0          17m
kube-system                master-api-master.example.com                  1/1       Running        0          22m
kube-system                master-controllers-master.example.com          1/1       Running        1          22m
kube-system                master-etcd-master.example.com                 1/1       Running        0          22m
openshift-console          console-5896bbb547-df6p2                       1/1       Running        0          15m
openshift-infra            hawkular-cassandra-1-k5bg2                     1/1       Running        0          12m
openshift-infra            hawkular-metrics-6ldrw                         0/1       Pending        0          6m
openshift-infra            hawkular-metrics-858mh                         0/1       Preempting     0          12m
openshift-infra            hawkular-metrics-schema-sd7c5                  0/1       Completed      0          13m
openshift-infra            heapster-tvn6t                                 1/1       Running        0          12m
openshift-logging          logging-es-data-master-4g5tbuou-1-bcnsx        0/2       Pending        0          5m
openshift-logging          logging-es-data-master-4g5tbuou-1-deploy       1/1       Running        0          5m
openshift-logging          logging-fluentd-m5rbg                          1/1       Running        0          6m
openshift-logging          logging-fluentd-m64sn                          1/1       Running        0          6m
openshift-logging          logging-fluentd-nqpz4                          1/1       Running        0          6m
openshift-logging          logging-kibana-1-wpf2t                         2/2       Running        0          7m
openshift-metrics-server   metrics-server-845b478887-vcbkd                0/1       ErrImagePull   0          11m
openshift-monitoring       alertmanager-main-0                            3/3       Running        0          14m
openshift-monitoring       alertmanager-main-1                            3/3       Running        0          14m
openshift-monitoring       alertmanager-main-2                            3/3       Running        0          14m
openshift-monitoring       cluster-monitoring-operator-674969789d-65rxn   1/1       Running        0          16m
openshift-monitoring       grafana-7594d8dd75-cwr6p                       2/2       Running        0          15m
openshift-monitoring       kube-state-metrics-787f69cf4d-xjh76            3/3       Running        0          14m
openshift-monitoring       node-exporter-bwvpv                            2/2       Running        0          14m
openshift-monitoring       node-exporter-hzbb8                            2/2       Running        0          14m
openshift-monitoring       node-exporter-rdzlp                            2/2       Running        0          14m
openshift-monitoring       prometheus-k8s-0                               4/4       Running        1          15m
openshift-monitoring       prometheus-k8s-1                               4/4       Running        1          15m
openshift-monitoring       prometheus-operator-8544897d54-z7249           1/1       Running        0          16m
openshift-node             sync-6xthq                                     1/1       Running        0          20m
openshift-node             sync-rsgz9                                     1/1       Running        0          19m
openshift-node             sync-vsbws                                     1/1       Running        0          19m
openshift-sdn              ovs-5d2dl                                      1/1       Running        0          20m
openshift-sdn              ovs-gd4gw                                      1/1       Running        0          19m
openshift-sdn              ovs-ktpt6                                      1/1       Running        0          19m
openshift-sdn              sdn-dz8kv                                      1/1       Running        0          19m
openshift-sdn              sdn-mhbkg                                      1/1       Running        0          19m
openshift-sdn              sdn-x7tq9                                      1/1       Running        0          20m
openshift-web-console      webconsole-5db89b6cd4-5sm9d                    1/1       Running        2          16m

metrics还出不来

在master节点执行创建admin用户

# htpasswd /etc/origin/master/htpasswd admin

同时赋予admin用户权限

# oc adm policy add-cluster-role-to-user cluster-admin admin

 

在hosts文件中加入

192.168.0.103 master.example.com
192.168.0.104 console.apps.example.com
192.168.0.104 prometheus-k8s-openshift-monitoring.apps.example.com
192.168.0.104 grafana-openshift-monitoring.apps.example.com
192.168.0.104 hawkular-metrics.apps.example.com

 

 访问https://master.example.com:8443,转到cluster console下,可以访问到集群相关的监控信息

 

 

修改错误

  • Metrics

经过定位,metrics启动不了的原因主要是两点:

1.ose-metrics-server的镜像缺失,这个重新导入后解决

2.openshift-monitoring下的node2下的node-exporter-sbddr一直启动出错,经过定位发现是安装了一个gitlab软件造成的端口冲突问题,把gitlab停掉后启动成功

[root@master ~]# oc get pods  -n openshift-monitoring -o wide
NAME                                           READY     STATUS    RESTARTS   AGE       IP              NODE                 NOMINATED NODE
alertmanager-main-0                            3/3       Running   23         21h       10.129.0.69     node1.example.com    <none>
alertmanager-main-1                            3/3       Running   20         21h       10.129.0.66     node1.example.com    <none>
alertmanager-main-2                            3/3       Running   20         21h       10.129.0.68     node1.example.com    <none>
cluster-monitoring-operator-674969789d-65rxn   1/1       Running   10         21h       10.129.0.65     node1.example.com    <none>
grafana-7594d8dd75-cwr6p                       2/2       Running   18         21h       10.129.0.64     node1.example.com    <none>
kube-state-metrics-787f69cf4d-xjh76            3/3       Running   20         21h       10.129.0.71     node1.example.com    <none>
node-exporter-bwvpv                            2/2       Running   8          21h       192.168.0.104   node1.example.com    <none>
node-exporter-hzbb8                            2/2       Running   14         21h       192.168.0.103   master.example.com   <none>
node-exporter-sbddr                            2/2       Running   0          13m       192.168.0.105   node2.example.com    <none>
prometheus-k8s-0                               4/4       Running   22         21h       10.129.0.70     node1.example.com    <none>
prometheus-k8s-1                               4/4       Running   22         21h       10.129.0.67     node1.example.com    <none>
prometheus-operator-8544897d54-z7249           1/1       Running   8          21h       10.129.0.63     node1.example.com    <none>

3.openshift-infra下面的hawkular-metrics-9r5nc pod一直在pending状态,describe一下发现需要1.5G的内存,修改rc hawkular-metrics request为500m,后启动成功

[root@master ~]# oc get pods -n openshift-infra -o wide
NAME                            READY     STATUS      RESTARTS   AGE       IP            NODE                 NOMINATED NODE
hawkular-cassandra-1-k5bg2      1/1       Running     4          21h       10.130.0.42   node2.example.com    <none>
hawkular-metrics-9r5nc          1/1       Running     0          11m       10.129.0.75   node1.example.com    <none>
hawkular-metrics-schema-sd7c5   0/1       Completed   0          21h       10.130.0.3    node2.example.com    <none>
heapster-tvn6t                  1/1       Running     39         21h       10.128.0.53   master.example.com   <none>

终于也能截图展示一下了。

 

  •  EFK

经过定位主要是内存不够问题导致,所以现有的16G机器无法折腾了,看了pod启动命令,一个启动起来居然就要8G.令人发指啊!

 

posted @ 2018-11-04 22:40  ericnie  阅读(1405)  评论(0编辑  收藏  举报