Pitfalls at every step: a distribution registry as a proxy cache

Following on from my previous post "Learning distribution", this post works out a basic configuration for proxying a remote registry.

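TL;DR:

For a proxy registry:

With proxy enabled, the registry accepts neither push nor delete API requests.

Images that exist in the upstream registry show up when you list them through the proxy.

Looking up an image's digest through the proxy triggers a pull into the cache.

Pushing an image whose digest already exists upstream triggers a pull into the cache; the push itself is not accepted.

A cached image's expiry is determined by the ttl configured at the time it was cached; changing ttl only affects images cached afterwards.

In v2.8.3 ttl has no effect; v3.0.0-alpha.1 fixes it. v2 no longer receives updates, so for now the only way out is to move to v3.

The credentials in imagePullSecrets must match those in the registry's auth config, and the secret must be keyed by the registry that appears in the image name (Kubernetes does not care whether the pull actually goes through the proxy registry).

Within one hosts.toml, containerd tries every mirror listed and falls back to the upstream registry directly if they all fail.

For a standard registry (proxy disabled):

Push and delete API requests are accepted, but deletion only removes tags; blob data is kept.

Data is only actually removed by running /bin/registry garbage-collect.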
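To make the TTL point above concrete, here is a toy Python model. It is an assumption for illustration, not distribution's scheduler code, and ExpiryTable is a hypothetical name. Each cached object gets an absolute expiry computed from whatever ttl was configured when it was cached, so lowering ttl later does not shorten existing entries.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Toy model (assumption): expiry is stamped once, at cache time, from the
# ttl in force at that moment.
class ExpiryTable:
    def __init__(self, ttl: timedelta):
        self.ttl = ttl
        self.expiry: Dict[str, datetime] = {}

    def cache(self, key: str, now: datetime) -> None:
        # Absolute expiry; later ttl changes do not touch this value.
        self.expiry[key] = now + self.ttl

    def expired(self, now: datetime) -> List[str]:
        return sorted(k for k, t in self.expiry.items() if t <= now)


t0 = datetime(2024, 5, 14, 12, 0, tzinfo=timezone.utc)
table = ExpiryTable(ttl=timedelta(hours=48))
table.cache("old-image", t0)                     # cached under ttl 48h
table.ttl = timedelta(hours=1)                   # "edit config, restart"
table.cache("new-image", t0 + timedelta(hours=1))
print(table.expired(t0 + timedelta(hours=3)))    # ['new-image']
```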

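Requirements

Images live in a remote registry
Cache is cleaned up periodically
Preferably no extra middleware services
Health checks required
Prometheus metrics exposed
Preferably a proper external HTTPS certificate

Everything recorded below is theoretical.

Configuration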

version: 0.1
log:
  level: debug
  fields:
    service: registry
    environment: development
storage:
  filesystem:
    rootdirectory: /opt/registry
  delete:
    enabled: true    
  cache:
    blobdescriptor: inmemory
    blobdescriptorsize: 10000
    # With an external redis: change inmemory to redis and remove blobdescriptorsize
  maintenance:
    uploadpurging:
      enabled: true
      age: 168h
      interval: 24h
      dryrun: false  
    readonly:
      enabled: false    
# With an external redis:
# redis:
#   addr: redis-ip:redis-port
#   password: pw
#   pool:
#     maxidle: 16
#     maxactive: 64
#     idletimeout: 300s
#   dialtimeout: 10ms
#   readtimeout: 10ms
#   writetimeout: 10ms  
proxy:
  remoteurl: https://registry-1.docker.io
  username: username
  password: password    
  ttl: 48h  
http:
  addr: :80
  host: http://mirror-registry-1.docker.io
  secret: mirror-registry-1.docker.io
  debug:
    addr: :5001
    prometheus:
        enabled: true
        path: /metrics  
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3    
# Configure user authentication; otherwise the container runtime must be configured to skip authentication
# auth:
#   htpasswd:
#     realm: basic-realm
#     path: /opt/htpasswd
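The durations in this config (proxy.ttl 48h, uploadpurging age 168h and interval 24h, health interval 10s, and the commented redis 300s and 10ms timeouts) are written as Go duration strings. When reasoning about expiry later on (168h is exactly 7 days), converting them helps. Below is a minimal Python sketch. parse_go_duration is a hypothetical helper, and it handles only integer values with the h, m, s and ms units, not the full Go time.ParseDuration grammar.

```python
import re
from datetime import timedelta

# Minimal sketch: integer values with h, m, s and ms units only.
# Go's time.ParseDuration also accepts fractions, "us"/"ns" and signs.
_UNIT_MICROS = {"h": 3_600_000_000, "m": 60_000_000, "s": 1_000_000, "ms": 1_000}
_TOKEN = re.compile(r"(\d+)(ms|h|m|s)")  # "ms" is tried before "m"

def parse_go_duration(text: str) -> timedelta:
    pos, micros = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # junk between tokens
            raise ValueError(f"unsupported duration: {text!r}")
        micros += int(match.group(1)) * _UNIT_MICROS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # nothing parsed, or trailing junk
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(microseconds=micros)

for value in ("48h", "168h", "24h", "10s", "300s", "10ms"):
    print(f"{value:>5} -> {parse_go_duration(value)}")
```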

Generating htpasswd credentials

mkdir -p /opt/auth
docker run --rm \
  --entrypoint htpasswd \
  httpd:2 -Bbn testuser testpassword > /opt/auth/htpasswd

Running the service

docker

docker run -itd -p 80:80 -p 5001:5001 \
 --restart=always --name registry \
 -v /opt/registry:/opt/registry \
 -v /opt/auth/htpasswd:/opt/auth/htpasswd \
 -v /opt/docker/registry/config.yml:/etc/docker/registry/config.yml \
 registry:2

docker compose

services:
  registry:
    restart: always
    image: registry:2
    container_name: registry
    ports:
      - "80:80"
      - "5001:5001"
    volumes:
      - /opt/auth/htpasswd:/opt/auth/htpasswd
      - /opt/registry:/opt/registry
      - /opt/docker/registry/config.yml:/etc/docker/registry/config.yml

docker compose up -d

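Garbage collection

Garbage collection interferes with image uploads while it runs; in theory a proxy registry never receives uploads anyway.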

docker exec registry /bin/registry garbage-collect /etc/docker/registry/config.yml

Deleting specific images

This step requires calling the registry's HTTP API.
References:
https://distribution.github.io/distribution/spec/api/#deleting-a-layer
https://www.yoyoask.com/?p=2843

Mirror configuration

containerd

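Reference: https://github.com/containerd/containerd/blob/main/docs/hosts.md

The official docs say containerd does not need a restart when the config directory containing hosts.toml is updated,
but they don't say whether that means /etc/containerd/certs.d or /etc/containerd/certs.d/registry-1. I lean toward the latter.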

# /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."mirror-registry-1".auth]
            username = "testuser"
            password = "testpassword"

# /etc/containerd/certs.d/mirror-registry-1/hosts.toml
server = "https://registry-1.docker.io"

[host."https://mirror-registry-1.docker.io"]
  capabilities = ["pull", "resolve"]
  skip_verify = true

systemctl restart containerd
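The fallback behaviour described in the summary and in hosts.md can be modelled in a few lines. This is a toy Python model, not containerd's code. HostsToml, endpoints_to_try, pull and the fetch callable are hypothetical, and the model ignores capabilities: in containerd, a mirror is only used for the operations listed in its capabilities.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Toy model (assumption): [host."..."] entries are tried in file order,
# then the top-level `server` is used as the final fallback.
@dataclass
class HostsToml:
    server: Optional[str]
    hosts: List[str] = field(default_factory=list)

def endpoints_to_try(cfg: HostsToml) -> List[str]:
    order = list(cfg.hosts)
    if cfg.server and cfg.server not in order:
        order.append(cfg.server)
    return order

def pull(cfg: HostsToml, fetch: Callable[[str], bool]) -> str:
    # `fetch` stands in for one pull attempt; True means success.
    for endpoint in endpoints_to_try(cfg):
        if fetch(endpoint):
            return endpoint
    raise RuntimeError("every endpoint failed")

cfg = HostsToml(
    server="https://registry-1.docker.io",
    hosts=["https://mirror-registry-1.docker.io"],
)
# Mirror down -> falls back to the upstream server.
print(pull(cfg, lambda endpoint: "mirror" not in endpoint))
```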

Configuration in practice

containerd > 1.6

Configuration

version: 0.1
log:
  level: debug
  fields:
    service: registry
    environment: development
storage:
  filesystem:
    rootdirectory: /opt/registry/registry-data
  delete:
    enabled: true   
  cache:
    blobdescriptor: inmemory
    blobdescriptorsize: 10000
  maintenance:
    uploadpurging:
      enabled: true
      age: 168h
      interval: 24h
      dryrun: false 
    readonly:
      enabled: false   
proxy:
  remoteurl: https://ccr.ccs.tencentyun.com
  username: xxxx
  password: xxxx
  ttl: 48h 
http:
  addr: :5000
  secret: ccr.ccs.tencentyun.com
  debug:
    addr: :5001
    prometheus:
        enabled: true
        path: /metrics 
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3   
auth:
  htpasswd:
    realm: basic-realm
    path: /opt/registry/auth/htpasswd

Running

Create htpasswd with the steps above.
Run the container as below; config.yml is the registry's configuration file.

docker run -itd -p 5000:5000 -p 5001:5001 --name mharbor \
  -v /opt/registry/registry-data:/opt/registry/registry-data \
  -v /opt/registry/auth:/opt/registry/auth \
  -v /opt/registry/config/config.yml:/etc/docker/registry/config.yml \
  registry:2

Verifying image storage

docker version 26.1.2

$ docker pull ccr.ccs.tencentyun.com/proxmox/image:py-jdk
# ip:port is my own registry address and the image is my own too; this is just an example
$ docker tag ccr.ccs.tencentyun.com/proxmox/image:py-jdk 192.168.3.121:5000/proxmox/image:py-jdk
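# Allow the insecure registry, otherwise logging in fails with: server gave HTTP response to HTTPS client. Check whether daemon.json already has content first (this command overwrites it)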
$ echo '{"insecure-registries":["192.168.3.121:5000"]}' > /etc/docker/daemon.json
$ systemctl restart docker 
# A normal-looking push result is shown *** but this result is misleading; be sure to read the comparison test section at the end
$ docker push 192.168.3.121:5000/proxmox/image:py-jdk
The push refers to repository [192.168.3.121:5000/proxmox/image]
f333522b6436: Layer already exists 
1453d25d153a: Layer already exists 
ace481fa6a3e: Layer already exists 
ea4538b98d90: Layer already exists 
86388e04a96b: Layer already exists 
893507f6057f: Layer already exists 
2353f7120e0e: Layer already exists 
51a9318e6edf: Layer already exists 
c5bb35826823: Layer already exists


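# Judge whether the push was received from disk usage on the host
$ du -sh /opt/registry/registry-data
# The result should be roughly the image size; about 400M in my case

# Via the HTTP API
# This endpoint only lists repositories (e.g. docker.io/python), not individual image tags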
$ curl -X GET 192.168.3.121:5000/v2/_catalog -u testuser:testpassword -v 
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying 192.168.3.121...
* TCP_NODELAY set
* Connected to 192.168.3.121 (192.168.3.121) port 5000 (#0)
* Server auth using Basic with user 'testuser'
> GET /v2/_catalog HTTP/1.1
> Host: 192.168.3.121:5000
> Authorization: Basic dGVzdHVzZXI6dGVzdHBhc3N3b3Jk
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< Date: Tue, 14 May 2024 11:18:10 GMT
< Content-Length: 35
< 
{"repositories":["proxmox/image"]}

So the service is up and running.

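Configuring containerd on K8s cluster nodes to pull

Proxy configuration

If the goal is only to pull images from the private registry, the changes are as follows.

My cluster was deployed with sealos, hence the usual sealos.hub entry.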

# /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."sealos.hub:5000".auth]
            username = "admin"
            password = "passw0rd"
          [plugins."io.containerd.grpc.v1.cri".registry.configs."192.168.3.121:5000".auth]
            username = "testuser"
            password = "testpassword"

# New file: /etc/containerd/certs.d/192.168.3.121:5000/hosts.toml
# Same content as the sealos one, only the target changes
server = "http://192.168.3.121:5000"

[host."http://192.168.3.121:5000"]
  capabilities = ["pull", "resolve", "push"]
  skip_verify = true

Run systemctl restart containerd.

If config.toml sets config_path = "/etc/containerd/certs.d", configs added under that directory are picked up live,
but changes to config.toml itself always require restarting containerd.

# Pulling works
crictl pull 192.168.3.121:5000/proxmox/image:py-jdk
# Don't use ctr i pull
# because the [plugins."io.containerd.grpc.v1.cri"] section is CRI-specific and is not recognized by other containerd clients (e.g. ctr, nerdctl and Docker/Moby). See https://github.com/containerd/containerd/blob/main/docs/cri/config.md
# If you use it anyway, it still reports: http: server gave HTTP response to HTTPS client

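At this point, pulling by the upstream image address does not succeed; it fails with an error.

This shows that the current config only supports pulling from the mirror registry's own address, not by the upstream registry's address.
So to support pulling by the upstream address as well, copy the 192.168.3.121:5000 directory to ccr.ccs.tencentyun.com and change server in its hosts.toml to https://ccr.ccs.tencentyun.com.

Then pull the image again.

Once that works, other images give the same result. To confirm the cache was populated, list the repository with curl -X GET 192.168.3.121:5000/v2/proxmox/image/tags/list -u testuser:testpassword -v, run du -sh on the host, or walk the files under the directory.

Final conclusion:
[plugins."io.containerd.grpc.v1.cri".registry.configs."registry-domain".auth] provides auth for registry-domain; it does not have to match a /etc/containerd/certs.d/xxx folder. When I first read the docs I assumed they had to match, but they don't. In practice, after deleting /etc/containerd/certs.d/192.168.3.121:5000 and keeping only /etc/containerd/certs.d/ccr.ccs.tencentyun.com, pulling still works.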
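The layout that ended up working can be sketched as a small Python generator. render_hosts_toml and hosts_toml_path are hypothetical helpers, not a containerd tool. The default capabilities (pull, resolve) follow the theoretical config earlier; the file copied in practice also listed push, which the proxy rejects anyway. The key point is the directory name: it is the registry that appears in the image reference, not the mirror's address.

```python
from pathlib import PurePosixPath

CERTS_D = PurePosixPath("/etc/containerd/certs.d")

def hosts_toml_path(image_registry: str) -> PurePosixPath:
    # Named after the registry in the image reference
    # (e.g. ccr.ccs.tencentyun.com), not after the mirror.
    return CERTS_D / image_registry / "hosts.toml"

def render_hosts_toml(upstream: str, mirror: str,
                      capabilities=("pull", "resolve"),
                      skip_verify: bool = True) -> str:
    caps = ", ".join(f'"{c}"' for c in capabilities)
    return (
        f'server = "{upstream}"\n'
        "\n"
        f'[host."{mirror}"]\n'
        f"  capabilities = [{caps}]\n"
        f"  skip_verify = {'true' if skip_verify else 'false'}\n"
    )

print(hosts_toml_path("ccr.ccs.tencentyun.com"))
print(render_hosts_toml("https://ccr.ccs.tencentyun.com", "http://192.168.3.121:5000"))
```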

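Going further

Looking back, all of the above requires editing /etc/containerd/config.toml. containerd hands pods off to containerd-shim, so restarting containerd does not affect running pods, but it is still best not to touch that file. So let's try a different approach.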

imagePullSecrets

https://kubernetes.io/zh-cn/docs/tasks/configure-pod-container/pull-image-private-registry/

With this method, the Deployment references a Secret that specifies which credentials to use when pulling the image.

registry="192.168.3.121:5000"
user="testuser"
pw="testpassword"
BASE64_AUTH=`echo -n "$user:$pw" | base64`
echo "{\"auths\": {\"$registry\": {\"auth\": \"$BASE64_AUTH\"}}}" > config.json
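The same config.json can be produced in Python as a minimal sketch; docker_config_json is a hypothetical helper name. json.dumps keeps the JSON valid even if the password contains quotes or backslashes. base64.b64encode also never inserts line breaks, whereas GNU base64 wraps its output at 76 columns by default, which matters for long credentials.

```python
import base64
import json

def docker_config_json(registry: str, user: str, password: str) -> str:
    # {"auths": {<registry>: {"auth": base64("user:password")}}}
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": auth}}})

print(docker_config_json("192.168.3.121:5000", "testuser", "testpassword"))
# {"auths": {"192.168.3.121:5000": {"auth": "dGVzdHVzZXI6dGVzdHBhc3N3b3Jk"}}}
```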

Schedule the pod onto a node whose containerd has been adjusted. First delete the [plugins."io.containerd.grpc.v1.cri".registry.configs."192.168.3.121:5000".auth] section from config.toml and restart containerd.

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: my-dep
  name: my-dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-dep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: my-dep
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: level
                operator: In
                values:
                - high
      containers:
      - image: ccr.ccs.tencentyun.com/proxmox/image:py-jdk 
        name: nginx
      imagePullSecrets:
      - name: regcred

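The result: it fails.

Change the registry above to ccr.ccs.tencentyun.com and regenerate config.json and the secret.

kubectl create secret generic regcred --from-file=.dockerconfigjson=config.json --type=kubernetes.io/dockerconfigjson --dry-run=client -oyaml | kubectl apply -f -

kubectl rollout restart deployment my-dep

Success!

So the registry in the imagePullSecrets config must be the same registry as in the container's image reference. The pull really does go through the proxy registry, and the credentials really are the proxy registry's, yet a secret keyed by the proxy's address simply has no effect!!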
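The conclusion above can be illustrated with a simplified Python sketch. This is an assumption: it mimics only exact host matching, while the kubelet's real credential lookup also handles URL schemes, paths and wildcards. registry_host and matching_auth_keys are hypothetical names. The registry host is taken from the image reference, so a secret keyed by the proxy's address never matches an image named ccr.ccs.tencentyun.com/...

```python
from typing import Dict, List

def registry_host(image: str) -> str:
    first, sep, _ = image.partition("/")
    # Docker's rule of thumb: the first path component is a registry host
    # if it contains "." or ":" or is "localhost"; otherwise Docker Hub.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"

def matching_auth_keys(image: str, auths: Dict[str, dict]) -> List[str]:
    # Simplified: exact host match only.
    host = registry_host(image)
    return [key for key in auths if key == host]

image = "ccr.ccs.tencentyun.com/proxmox/image:py-jdk"
print(matching_auth_keys(image, {"192.168.3.121:5000": {}}))      # []
print(matching_auth_keys(image, {"ccr.ccs.tencentyun.com": {}}))  # ['ccr.ccs.tencentyun.com']
```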

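Cache cleanup

Earlier I set proxy.ttl to 48h. Now I shorten it, restart the registry and leave it running for an hour, to find out whether expired content is deleted automatically or only marked as expired. Time for bed; I'll check the next morning.

The next day: the first images had been pulled while ttl was 48h. Afterwards I removed the container, changed ttl to 1h and started a new one. A whole night later, the local data was still there, and running registry garbage-collect /etc/docker/registry/config.yml that morning only marked things without deleting anything.

Meanwhile, curl -X GET 192.168.3.121:5000/v2/proxmox/image/tags/list -u testuser:testpassword -v returned a large batch of tags. Only a few of those images still exist in the Tencent Cloud private registry; the rest were deleted long ago. I then pushed a test image to the Tencent Cloud registry without deploying anything locally that uses it, and querying the endpoint again showed the new image as well.

du -sh on the host, plus walking /opt/registry/registry-data/docker/registry/v2/repositories/proxmox/image/_manifests/tags (adjust the path to your setup), confirmed that nothing had been pulled into the local cache.

The three tags in that directory are the images I used to test that the proxy worked when deploying Pods; apart from those, nothing from the upstream registry is cached.

Having noticed long-deleted tags in the curl output, I pushed a golang image upstream without triggering any deployment. The curl output now included golang, while the disk usage and the data on disk did not change.

A bold experiment: what happens if I delete through the local registry with curl?

First, get the digests of the rust and golang images.

Then delete each one.

Both requests got 405 Method Not Allowed! And checking the directory again, golang had been pulled automatically!

The docs explain it (see the figure below): delete has no effect when deletion is disabled or when the target is a pull-through cache.

My previous post, "Learning distribution", translated the configuration reference and noted this as well. I just couldn't remember it, and wasted time on a comparison test.

Recall that we did enable delete in the config. So is this because of the proxy?

As the official docs describe, a proxy is a pull-through cache, and they state: "When running as a pull-through cache, the registry periodically removes old content to save disk space. Subsequent requests for removed content cause a remote fetch and local re-caching."

The proxy section of the full configuration reference also explains that ttl fixes the expiry time of cached images. Reading the two docs together: once the ttl expires, the proxy deletes the image automatically, rather than merely dropping references and relying on someone to run gc by hand.

Could it be that a downloaded image's expiry is stamped at download time, so that changing ttl later only applies to newly cached images?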

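Comparison test

Since the proxy registry cannot run the cleanup commands, let's run a comparison test instead: just disable the proxy block and change the ports and directories. The commands are: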

# Start
$ docker run -itd -p 4000:4000 -p 4001:4001 --name mharbor2 \
    -v /opt/more-registry/registry-data:/opt/more-registry/registry-data \
    -v /opt/more-registry/auth:/opt/more-registry/auth \
    -v /opt/more-registry/config/config.yml:/etc/docker/registry/config.yml \
    registry:2

# Log in
$ docker login 192.168.3.121:4000 -u testuser -p testpassword

# Push an image
$ docker tag golang 192.168.3.121:4000/proxmox/image:golang
$ docker push 192.168.3.121:4000/proxmox/image:golang
The push refers to repository [192.168.3.121:4000/proxmox/image]
5f70bf18a086: Pushed 
f740e99d0e9d: Pushed 
abdd734e88ba: Pushed 
9c70feb70723: Pushed 
8845ab872c1c: Pushed 
d7d4c2f9d26b: Pushed 
bbe1a212f7e9: Pushed

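Now the problem is clear: a normal push should report Pushed, not the Layer already exists seen earlier. Then I remembered that back then I ran docker push only after creating the Deployment had already triggered the proxy to cache the image. If I now push to 192.168.3.121:5000 an image that is not in the repository, I get: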

$ docker push 192.168.3.121:5000/proxmox/image:centos
The push refers to repository [192.168.3.121:5000/proxmox/image]
74ddd0ec08fa: Retrying in 3 seconds

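Renaming the image to 192.168.3.121:5000/push-test/push-test:centos behaves the same way.
It stays in the retrying state forever and never succeeds. So once proxy is enabled, the registry acts as a pull-through cache and there is no way to push images into the proxied repositories.

So I pushed centos to the Tencent Cloud registry instead, ran curl -X GET 192.168.3.121:5000/v2/proxmox/image/tags/list -u testuser:testpassword -v again, found centos in the result, and checked the files as well!

The proxy does not proactively pull images newly pushed upstream. However, earlier, after running curl -I -XGET --header "Accept:application/vnd.docker.distribution.manifest.v2+json" -u testuser:testpassword 192.168.3.121:5000/v2/proxmox/image/manifests/golang or curl -I -X DELETE -u testuser:testpassword -v 192.168.3.121:5000/v2/proxmox/image/manifests/sha256:5ea84c97e29bbbf597f48c8764ab85fdf4d5c348f72b4876ae2af5af6e2767c6, the image did show up locally. So I ran the following test.

# centos now exists upstream, so what happens if I push it straight to the local registry?
docker push 192.168.3.121:5000/proxmox/image:centos

It first shows 74ddd0ec08fa: Preparing, and then changes to what the figure below shows.

This proves that pushing an image the upstream does not have is rejected. Pushing one the upstream does have triggers a pull from upstream, after which the push reports Layer already exists. So querying an image's digest, as described above, should likewise trigger the download on its own. (Probably; I was too lazy to test it, hehe.)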

Back to the original purpose of the comparison test: delete the image 192.168.3.121:4000/proxmox/image:golang from 192.168.3.121:4000, which does not have proxy enabled.

curl -I -XGET --header "Accept:application/vnd.docker.distribution.manifest.v2+json"  -u testuser:testpassword  192.168.3.121:4000/v2/proxmox/image/manifests/golang 

curl -I -X DELETE -u testuser:testpassword -v 192.168.3.121:4000/v2/proxmox/image/manifests/sha256:5ea84c97e29bbbf597f48c8764ab85fdf4d5c348f72b4876ae2af5af6e2767c6

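This returns 202 Accepted: the request was handled and the image is marked as deleted.

But this is only a marked deletion!!
A closer look shows that the tag's folder is gone from the tags directory, but the data under blobs is still there.

Actually deleting the data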
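The two curl calls above (look up the digest with a manifest HEAD, then DELETE by digest) can be sketched in Python with the standard library. head_manifest, delete_manifest and send are hypothetical helpers. The Docker-Content-Digest response header, and the rule that a manifest DELETE takes a digest rather than a tag, come from the registry HTTP API spec linked earlier. The 202 and 405 outcomes are the ones observed in this post.

```python
import base64
import urllib.request

MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"

def _basic(user: str, password: str) -> str:
    return "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()

def head_manifest(base: str, repo: str, tag: str, user: str, password: str) -> urllib.request.Request:
    # Like `curl -I --header "Accept: ..."`: the digest comes back in the
    # Docker-Content-Digest header. Per this post, on a proxy it also caches the image.
    return urllib.request.Request(
        f"{base}/v2/{repo}/manifests/{tag}",
        headers={"Accept": MANIFEST_V2, "Authorization": _basic(user, password)},
        method="HEAD",
    )

def delete_manifest(base: str, repo: str, digest: str, user: str, password: str) -> urllib.request.Request:
    # Takes the digest, not the tag. 202 on a plain registry with delete
    # enabled; 405 on a proxy (urlopen raises HTTPError for that).
    return urllib.request.Request(
        f"{base}/v2/{repo}/manifests/{digest}",
        headers={"Authorization": _basic(user, password)},
        method="DELETE",
    )

def send(req: urllib.request.Request):
    # Returns (status, Docker-Content-Digest header or None).
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.headers.get("Docker-Content-Digest")

base = "http://192.168.3.121:4000"
head = head_manifest(base, "proxmox/image", "golang", "testuser", "testpassword")
# status, digest = send(head)
# send(delete_manifest(base, "proxmox/image", digest, "testuser", "testpassword"))
```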

docker exec -it mharbor2 sh

/bin/registry garbage-collect /etc/docker/registry/config.yml

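This prints log output (with debug logging there is a lot, so only this line is captured; there are many more deleting blob lines), and du -sh on the host shows the data really was deleted this time.

So the question comes back to where we started: how is expired image data deleted from a proxy registry?
A look through GitHub shows that /opt/registry/registry-data/scheduler-state.json records the expiry time of every cached image, for example: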

{
  "proxmox/image@sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd": {
    "Key": "proxmox/image@sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e0d007bd",
    "ExpiryData": "2024-05-22T06:25:22.262331442Z",
    "EntryType": 0
  }
}
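To check expiry across many entries, a small Python sketch can read scheduler-state.json and list the entries that are past their ExpiryData. The field names come from the file above. Mapping EntryType 0 to blob and 1 to manifest is my reading of distribution's proxy scheduler source and should be treated as an assumption, as should the helper names.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Tuple

ENTRY_TYPES = {0: "blob", 1: "manifest"}  # assumed mapping

def parse_expiry(text: str) -> datetime:
    # Assumes RFC 3339 in UTC ("Z") with up to nanoseconds; %f only takes
    # 6 digits, so the fraction is truncated to microseconds.
    base, _, frac = text.rstrip("Z").partition(".")
    stamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return stamp.replace(microsecond=int((frac + "000000")[:6]), tzinfo=timezone.utc)

def load_state(path: str) -> Dict[str, dict]:
    with open(path) as fh:
        return json.load(fh)

def expired_entries(state: Dict[str, dict], now: datetime) -> List[Tuple[str, str, datetime]]:
    rows = []
    for key, entry in state.items():
        expiry = parse_expiry(entry["ExpiryData"])
        if expiry <= now:
            rows.append((key, ENTRY_TYPES.get(entry["EntryType"], "unknown"), expiry))
    return rows

# Usage on the registry host (path from this post):
# state = load_state("/opt/registry/registry-data/scheduler-state.json")
# for key, kind, expiry in expired_entries(state, datetime.now(timezone.utc)):
#     print(kind, key, expiry.isoformat())
```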

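From this you can see that the expiry was computed with the default 168h, i.e. 7 days; the ttl in the config file had no effect at all (debugging date: May 15).
The timestamps are in UTC; to see local time, the container needs its timezone set.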
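The 168h conclusion can be checked with plain date arithmetic. The file stores only the expiry, so this sketch back-calculates the cache time under both candidate TTLs and prints it in UTC+8 (assumed to be the author's local timezone). Only 168h lands on the May 15 debugging date; 48h would put the cache time several days later, on May 20.

```python
from datetime import datetime, timedelta, timezone

# ExpiryData from scheduler-state.json above (sub-second part dropped).
expiry = datetime(2024, 5, 22, 6, 25, 22, tzinfo=timezone.utc)
beijing = timezone(timedelta(hours=8))  # assumed local timezone (UTC+8)

for label, ttl in [("configured 48h", timedelta(hours=48)),
                   ("default 168h", timedelta(hours=168))]:
    cached_at = expiry - ttl
    print(f"{label}: cached at {cached_at.astimezone(beijing):%Y-%m-%d %H:%M} UTC+8")
# configured 48h: cached at 2024-05-20 14:25 UTC+8
# default 168h: cached at 2024-05-15 14:25 UTC+8
```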

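In the issues, a comment says v2.x will not get fixes any more and recommends v3.0, and the ttl-related issues are all closed as resolved. It looks like I walked straight into a known bug.

I removed the container, switched to registry:3.0.0-alpha.1, pushed an image to try again and checked scheduler-state.json: fixed.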

{
  "proxmox/image@sha256:28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b": {
    "Key": "proxmox/image@sha256:28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b",
    "ExpiryData": "2024-05-15T07:35:02.264982083Z",
    "EntryType": 1
  }
}

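The expiry is finally computed with 1h. But v3.0 only has pre-release builds, and I wouldn't dare run a pre-release in production, so I'll ask in an issue.

Let's wait another hour and see whether the files are deleted automatically.

Current size of /opt/registry/registry-data: 1348784.

1h later.

Current size of /opt/registry/registry-data: 1348828; but scheduler-state.json no longer contains a record for 28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b.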

Garbage collection

$ docker exec -it mharbor sh
$ registry garbage-collect --dry-run /etc/docker/registry/config.yml
# Dry run: summarize what would be deleted; the log shows there is something eligible
57 blobs marked, 1 blobs and 0 manifests eligible for deletion
blob eligible for deletion: sha256:28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b

$ registry garbage-collect /etc/docker/registry/config.yml
# Real run: the log has one extra line for the deletion
57 blobs marked, 1 blobs and 0 manifests eligible for deletion
blob eligible for deletion: sha256:28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b
INFO[0000] Deleting blob: /docker/registry/v2/blobs/sha256/28/28e01ab32c9dbcbaae96cf0d5b472f22e231d9e603811857b295e61197e40a9b  environment=development go.version=go1.19.9 instance.id=a065de88-e1eb-4c60-abfc-06ba71ef0c70 service=registry
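If the dry run is scripted, for example to decide whether to run the real collection, the summary line can be parsed. This is a sketch: the regexes are written against the exact output shown above and may not match other registry versions. parse_gc_output is a hypothetical helper.

```python
import re

SUMMARY = re.compile(
    r"(\d+) blobs marked, (\d+) blobs and (\d+) manifests eligible for deletion"
)
ELIGIBLE = re.compile(r"blob eligible for deletion: (sha256:[0-9a-f]{64})")

def parse_gc_output(text: str) -> dict:
    match = SUMMARY.search(text)
    if match is None:
        raise ValueError("no garbage-collect summary line found")
    marked, blobs, manifests = map(int, match.groups())
    return {
        "marked": marked,
        "eligible_blobs": blobs,
        "eligible_manifests": manifests,
        "digests": ELIGIBLE.findall(text),
    }

# Example (container name from this post; output may land on stdout or stderr):
# import subprocess
# proc = subprocess.run(["docker", "exec", "mharbor", "registry", "garbage-collect",
#                        "--dry-run", "/etc/docker/registry/config.yml"],
#                       capture_output=True, text=True)
# print(parse_gc_output(proc.stdout + proc.stderr))
```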

Posted @ 2024-05-13 12:02 by 冰豆花
