Installing an offline OperatorHub on OpenShift 4.3

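Over the past few days I have been setting up an offline OperatorHub in a customer environment. An OpenShift 4.3 cluster was already installed, so the goal was only to install the Service Mesh and Serverless modules that were being evaluated. Because of earlier work I had already installed similar components in an offline 4.2 environment, so I set out with only a little preparation. I still ran into quite a few problems and pitfalls: compared with 4.2, 4.3 is much improved for offline installs, but it also hides some new traps. This post is a rough record of them.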

Thanks also to the seniors and forerunners whose guidance helped me think clearly in the middle of the chaos.

1. Build the catalog image

The network was very slow, so I suggest mirroring the base image into a local registry first and working from there.

oc image mirror registry.redhat.io/openshift4/ose-operator-registry:v4.3 registry.example.com/openshift4/ose-operator-registry:v4.3

Build the local catalog image

oc adm catalog build --appregistry-org redhat-operators  --from=registry.example.com/openshift4/ose-operator-registry:v4.3 --to=registry.example.com/olm/redhat-operators:v1 --insecure

Generate the manifests that list the images to mirror

oc adm  catalog mirror  --manifests-only   registry.example.com/olm/redhat-operators:v1  registry.example.com  --insecure 

The resulting directory structure is as follows

[root@registry test]# tree redhat-operators-manifests/
redhat-operators-manifests/
├── imageContentSourcePolicy.yaml
└── mapping.txt

Take a look at mapping.txt

registry.redhat.io/openshift-service-mesh/istio-rhel8-operator:1.0.5=registry.example.com/openshift-service-mesh/istio-rhel8-operator:1.0.5
registry.redhat.io/openshift-service-mesh/3scale-istio-adapter-rhel8@sha256:00fb544a95b16c652cc571396679c65d5889b2cfe6f1a0176f560a1678309a35=registry.example.com/openshift-service-mesh/3scale-istio-adapter-rhel8
registry.redhat.io/container-native-virtualization/kubevirt-kvm-info-nfd-plugin@sha256:bb120df34c6eef21431a074f11a1aab80e019621e86b3ffef4d10d24cb64d2df=registry.example.com/container-native-virtualization/kubevirt-kvm-info-nfd-plugin

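These are essentially all the images needed to install the operators, most of them referenced by sha256 digest, together with their mapping to the local registry server.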
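Before deciding what to download, it can help to see which namespaces mapping.txt covers. The following is only a sketch: the function name list_namespaces is made up for illustration, and it assumes the namespace is the second path component of the source reference, as in the sample lines above.

```shell
#!/bin/bash
# Print "<namespace> <count>" for every source namespace in a mapping file.
# A mapping line has the form <source>=<target>; the namespace is assumed to be
# the second path component of <source> (hypothetical helper, not part of oc).
list_namespaces() {
  local mapping_file=$1
  cut -d'=' -f1 "$mapping_file" |
    awk -F'/' 'NF >= 3 { count[$2]++ } END { for (ns in count) print ns, count[ns] }' |
    sort
}
```

Running list_namespaces redhat-operators-manifests/mapping.txt would then print one line per namespace with the number of images it contains.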

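The best approach is to follow the official procedure: mirror every image and then apply the generated manifests with the statement below. Since we only needed two modules, I did it manually instead, which meant a lot of working time and repeated image imports.

oc apply -f ./redhat-operators-manifests

The command above is the official way. I tried it in the afternoon and found that it requires a cluster, since it applies the generated imageContentSourcePolicy.yaml to it. So I wrote a script to download in bulk: first trim the list of images by namespace, then mirror them in a batch with the script.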

[root@registry redhat-operators-manifests]# cat batchmirror.sh 
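#!/bin/bash
# Mirror every "source=target" pair listed in eventing.txt with skopeo.
i=0
while IFS= read -r line
do
  # Skip blank lines
  [ -z "$line" ] && continue
  i=$((i + 1))
  echo "$i"
  # Split on the first '=' (the image references themselves contain no '=')
  source=${line%%=*}
  echo "$source"
  target=${line#*=}
  echo "$target"
  skopeo copy --all "docker://$source" "docker://$target"
  # Pause between copies
  sleep 20
done < eventing.txt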
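The "trim by namespace" step that produces the input file for the script can be sketched as below. This is an illustration under assumptions: filter_mapping is a hypothetical helper name, and the namespace is again taken to be the second path component of each line.

```shell
#!/bin/bash
# Print only the mapping lines whose source namespace is one of the given names.
# Usage: filter_mapping <mapping-file> <namespace> [<namespace> ...]
filter_mapping() {
  local mapping_file=$1
  shift
  awk -F'/' -v wanted="$*" '
    BEGIN {
      # Build a lookup table from the space-separated namespace list
      n = split(wanted, names, " ")
      for (i = 1; i <= n; i++) keep[names[i]] = 1
    }
    ($2 in keep)   # the source comes first on the line, so $2 is its namespace
  ' "$mapping_file"
}
```

For example, filter_mapping mapping.txt openshift-service-mesh > eventing.txt would write only the Service Mesh lines to the file that batchmirror.sh reads.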

 

2. Build the offline OperatorHub catalog

This step is fairly easy. First, disable the default sources:

oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

Then create a file named catalogsource.yaml

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.example.com/olm/redhat-operators:v1
  displayName: My Operator Catalog
  publisher: grpc

Create it and then check; the OperatorHub page should now list all the Red Hat operators.

oc create -f catalogsource.yaml

oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc describe catalogsource my-operator-catalog -n openshift-marketplace

 

3. Download the operator and component images per module

This step is full of pitfalls. Install the Elasticsearch Operator first and you get an Image Pull Error; then look up the exact sha256 digest in mapping.txt, for example

registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:aa0c7b11a655454c5ac6cbc772bc16e51ca5004eedccf03c52971e8228832370

With the 4.2 approach, you would only need to run

oc image mirror registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f registry.example.com/openshift4/ose-elasticsearch-operator

After it succeeds, pull the image locally to verify

podman pull registry.example.com/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f 

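You will find that it cannot be pulled at all. Reportedly this is because in 4.3 some images are referenced by the sha256 digest of a multi-level manifest (apparently a manifest list), and the fix is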

skopeo copy --all docker://registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f docker://registry.example.com/openshift4/ose-elasticsearch-operator

 

Then pack the registry's storage directory into a tar archive and unpack it in the offline environment.

Most operator images are referenced by sha256 digest, so they have to be copied with skopeo one by one. This takes a lot of time.
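The archive-and-restore step mentioned above might look like the sketch below. The function names and all paths are placeholders; the real storage directory depends on how the local registry was deployed, and it is probably safest to stop the registry while archiving.

```shell
#!/bin/bash
# Archive a registry storage directory so it can be carried to the offline site.
# storage_dir and archive are placeholders chosen by the caller.
pack_registry() {
  local storage_dir=$1 archive=$2
  tar -czf "$archive" -C "$(dirname "$storage_dir")" "$(basename "$storage_dir")"
}

# Restore the archive under the given parent directory on the offline host.
unpack_registry() {
  local archive=$1 parent_dir=$2
  mkdir -p "$parent_dir"
  tar -xzf "$archive" -C "$parent_dir"
}
```

For instance, pack_registry /path/to/registry-data registry-data.tar.gz on the connected host, followed by unpack_registry registry-data.tar.gz /path/to on the offline host, recreates the same directory there.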

 

4. The sample-registries.conf file

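The purpose of this file is to map source locations to mirror locations, so that CRI-O on the OCP nodes knows where to pull images that are still referenced by their source address.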

unqualified-search-registries = ["docker.io"]

[[registry]]
  location = "quay.io/openshift-release-dev/ocp-release"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/ocp4/openshift4"
    insecure = false

[[registry]]
  location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/ocp4/openshift4"
    insecure = false

[[registry]]
  location = "registry.redhat.io/distributed-tracing"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/distributed-tracing"
    insecure = false

[[registry]]
  location = "registry.redhat.io/openshift-service-mesh"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/openshift-service-mesh"
    insecure = false

[[registry]]
  location = "registry.redhat.io/openshift4"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/openshift4"
    insecure = false
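Since every stanza above has the same shape, the file could also be generated rather than typed by hand. A minimal sketch, assuming one mirror per source; registry_stanza is a hypothetical helper name:

```shell
#!/bin/bash
# Print one [[registry]] stanza with a single mirror, in the layout used above.
# Arguments: <source location> <mirror location>
registry_stanza() {
  local source_location=$1 mirror_location=$2
  cat <<EOF
[[registry]]
  location = "${source_location}"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "${mirror_location}"
    insecure = false

EOF
}
```

Calling registry_stanza registry.redhat.io/openshift4 YOUR_REGISTRY_URL/openshift4 reproduces the last stanza of the file; a loop over the needed namespaces would emit the rest.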

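This configuration has to be rolled out to every machine in the cluster. The rollout is carried out by the machine-config cluster operator, and the normal steps are as follows.

Encode the file in base64, create a machineconfig.yaml that embeds it, and apply it to trigger the rollout.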

cat sample-registries.conf | base64 | tr -d '\n'

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${YOUR_FILE_CONTENT_IN_BASE64}
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf

oc apply -f machineconfig.yaml
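The encode-and-paste step can be scripted so the base64 string never has to be copied by hand. This is a sketch under assumptions: render_machineconfig is a hypothetical helper, the template is the one shown above, and the role argument and the 50-<role>-container-registry-conf naming simply extend the worker example (other roles, such as master, would presumably need their own MachineConfig).

```shell
#!/bin/bash
# Print a MachineConfig that installs the given file as
# /etc/containers/registries.conf for one machine-config role.
# Arguments: <registries.conf file> <role, e.g. worker>
render_machineconfig() {
  local conf_file=$1 role=$2
  local encoded
  # Same encoding as above: base64 with the line breaks removed
  encoded=$(base64 < "$conf_file" | tr -d '\n')
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 50-${role}-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${encoded}
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf
EOF
}
```

For example, render_machineconfig sample-registries.conf worker > machineconfig.yaml produces the file that is then applied with oc apply.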

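However, the machine-config Cluster Operator in this cluster was reporting a status of false, and my attempts to repair it failed. So I took a shortcut: overwrite /etc/containers/registries.conf on every machine directly with sample-registries.conf. Remember to restart CRI-O afterwards.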

systemctl restart crio

To be sure, you can run a pull directly on a node; if everything is working, the image should come down.

podman pull registry.redhat.io/.....@sha256....

 

5. Knative

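With everything installed, I tried helloworld-go and hit an X509 error. After a long search it turned out to be a known issue. I had only ever tried this on the AWS public cloud before, so I had never seen it, but it reproduces every time when the sample image is served from a local registry.

See: https://github.com/knative/serving/issues/5126

The fix is crude: skip tag resolution for that registry in the configmap. (The snippet below is for reference only; I made the change through the web console.)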

oc -n knative-serving edit configmap config-deployment
 apiVersion: v1
 data:
   queueSidecarImage: gcr.azk8s.cn/knative-releases/knative.dev/serving/cmd/queue@sha256:5ff357b66622c98f24c56bba0a866be5e097306b83c5e6c41c28b6e87ec64c7c
   registriesSkippingTagResolving: registry.example.com
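The same change could also be made without an editor by sending a merge patch. The helper below only builds the patch body; skip_tag_patch is a hypothetical name, and the commented oc command is a sketch that has not been run against this cluster.

```shell
#!/bin/bash
# Build the JSON merge patch that sets registriesSkippingTagResolving.
# Argument: comma-separated list of registry hosts to skip.
skip_tag_patch() {
  local registries=$1
  printf '{"data":{"registriesSkippingTagResolving":"%s"}}' "$registries"
}

# Apply it (requires a logged-in oc session):
# oc -n knative-serving patch configmap config-deployment \
#   --type merge -p "$(skip_tag_patch registry.example.com)"
```

A merge patch touches only the named key, so the other entries in config-deployment are left as they are.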

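Once that was working, I found that the way eventing sources are created had changed: CronJobSource is deprecated and can no longer be created, so I had to use the commands below.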

$ oc get inmemorychannel

NAME         READY   REASON   URL                                                      AGE
imc-msgtxr   True             http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local   24s

kn source ping create msgtxr-pingsource \
--schedule="* * * * *" \
--data="This message is from PingSource" \
--sink=http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local

After creating it, everything finally worked, and I finally got the chance to catch my breath and write this down. :(

 

posted @ 2020-05-21 22:44  ericnie