Argo Rollouts

1. Introduction

Argo Rollouts is a CRD-based extension of the native Kubernetes Deployment:
1. Supports classic deployment patterns such as blue-green and canary
2. Provides canary analysis and progressive delivery
3. Integrates with Istio and ingress controllers for advanced traffic management
4. Fine-grained, weighted traffic shifting
5. Automated rollbacks and promotions
6. Manual judgement (human approval gates)
7. Customizable metric queries and analysis of business KPIs

2. Basic Concepts

2.1 Rollout

A CRD analogous to (and a drop-in replacement for) a Deployment, implementing more advanced delivery functionality.

2.2 Progressive Delivery

Progressive delivery is an evolution of CI/CD: product updates are released in a controlled, gradual way to reduce release risk. It is usually combined with automation and metric analysis to drive automatic promotion or rollback of an update. For example, by observing exported Prometheus metrics, the system can decide whether to continue the rollout or roll it back, achieving a high degree of automation.

2.3 Deployment Strategies

1. Rolling Update
Gradually replaces old-version pods with new-version pods; this is the default Deployment strategy.
2. Recreate
Deletes the old version before deploying the new one.
3. Blue-Green
Runs two complete environments side by side.
4. Canary
Deploys a fixed proportion of replicas on the new version alongside the old one; the new version is called the canary. Traffic can be split by ratio (e.g. via Istio) so that both versions run with controlled traffic, and progressive delivery then gradually shifts everything to the new version.
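As a minimal sketch (service names here are illustrative), the two Rollout-specific strategies are selected through the strategy stanza of the spec:

```yaml
# Blue-green: two named Services, cut over on promotion
strategy:
  blueGreen:
    activeService: my-active-svc    # receives production traffic
    previewService: my-preview-svc  # receives test traffic before cutover
---
# Canary: shift traffic in steps
strategy:
  canary:
    steps:
    - setWeight: 20  # send 20% of traffic to the canary
    - pause: {}      # wait for manual promotion
```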

3. Architecture

3.1 Rollout controller

The CRD controller, implementing the logic defined by the CRDs.

3.2 Rollout

The CRD itself; like a Deployment, but enabling more advanced deployment functionality.

3.3 Ingress

Can be integrated with service meshes such as Istio to control service traffic at a fine granularity.

3.4 AnalysisTemplate and AnalysisRun

Analysis
Connects a Rollout to a metrics provider such as Prometheus and defines thresholds for those metrics. The thresholds determine whether an update is considered successful, and therefore whether the rollout continues or is rolled back.
AnalysisTemplate and ClusterAnalysisTemplate
The former is a namespaced resource; the latter is cluster-scoped and usable from any namespace.
Both mainly contain the instructions for querying metrics, e.g. PromQL.
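As a sketch of such a template (the Prometheus address and query are illustrative assumptions), an AnalysisTemplate pairs a metric query with a success threshold:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name            # filled in by the Rollout that references this template
  metrics:
  - name: success-rate
    interval: 5m                  # re-run the query every 5 minutes
    successCondition: result[0] >= 0.95
    failureLimit: 3               # fail the analysis after 3 failed measurements
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090  # illustrative
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```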

3.5 Metric providers

Metric providers; Prometheus is the most common choice.

4. Installation
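Per the upstream project's documented procedure (the URLs below track the latest release; pin a specific version in production), installation looks roughly like this:

```shell
# Install the controller and CRDs into a dedicated namespace
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install the kubectl plugin (Linux amd64 shown; pick the binary for your platform)
curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

# Verify
kubectl argo rollouts version
```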

5. Usage

5.1 Basic features

5.1.1 Deploying a Rollout

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 5 # desired replica count
  strategy: # release strategy
    canary: # canary mode
      steps: # canary steps
      - setWeight: 20 # percentage of traffic sent to the canary
      - pause: {} # pause for observation; an empty pause waits indefinitely for manual promotion. On the very first deployment there is no canary, so 100% of traffic goes to the initial version
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}
  revisionHistoryLimit: 2 # number of old revisions to keep, commonly used for rollback
  selector: # label selector, same mechanism as a Deployment
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m

kubectl argo rollouts get rollout rollouts-demo --watch # watch the rollout progress

5.1.2 Updating

kubectl argo rollouts set image rollouts-demo rollouts-demo=argoproj/rollouts-demo:yellow
# rolls out the new image according to the strategy defined in 5.1.1

5.1.3 promote

# In the update above, the strategy pauses after the 20% canary step;
# this command is the manual approval that resumes the remaining steps
kubectl argo rollouts promote rollouts-demo

5.1.4 abort

# The counterpart of 5.1.3: abandons the update and rolls back to the stable (old) version
kubectl argo rollouts abort rollouts-demo

5.2 Integrating with Istio

5.2.1 Configuring the Rollout

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy: # release strategy
    canary: # canary settings
      canaryService: rollouts-demo-canary # Service used by canary traffic
      stableService: rollouts-demo-stable # Service used by stable (old-version) traffic
      trafficRouting: # traffic routing settings
        istio: # use Istio
          virtualServices: # referenced VirtualServices
          - name: rollouts-demo-vsvc1
            routes: # HTTP routes
            - http-primary
            tlsRoutes: # HTTPS/TLS routes
            - port: 443 # port
              sniHosts: # SNI hosts allowed to match
              - reviews.bookinfo.com
              - localhost
          # same structure as above
          - name: rollouts-demo-vsvc2
            routes:
              - http-secondary
            tlsRoutes:
              - port: 443
                sniHosts:
                  - reviews.bookinfo.com
                  - localhost
            tcpRoutes:
              - port: 8020

5.2.2 Creating the VirtualService

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: rollouts-demo-vsvc1 # one of the VirtualServices referenced above
spec:
  gateways:
  - rollouts-demo-gateway # gateway used by this VirtualService
  hosts:
  - rollouts-demo-vsvc1.local # hosts allowed to access
  http:
  - name: http-primary  # named HTTP route
    route: # routing rules (a list)
    - destination:
        host: rollouts-demo-stable # Service name
        port:
          number: 15372
      weight: 100 # share of traffic forwarded to this Service
    - destination:
        host: rollouts-demo-canary # Service name
        port:
          number: 15372
      weight: 0 # share of traffic forwarded to this Service
  tls: # TLS routes
  - match:
    - port: 443
      sniHosts:
      - reviews.bookinfo.com
      - localhost
    route:
    - destination:
        host: rollouts-demo-stable
      weight: 100
    - destination:
        host: rollouts-demo-canary
      weight: 0
  tcp:
  - match:
      - port: 8020
    route:
    - destination:
        host: rollouts-demo-stable
      weight: 100
    - destination:
        host: rollouts-demo-canary
      weight: 0

5.3 Testing

Initially, only the old version receives traffic. When the Rollout is updated, the controller rewrites the Istio configuration accordingly so that it always matches the current traffic rules.
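To observe this (resource names are from the examples above; the grep filter is just a convenience), watch both the Rollout and the VirtualService weights:

```shell
# Watch the rollout step through its canary weights
kubectl argo rollouts get rollout rollouts-demo --watch

# Confirm that the controller rewrote the VirtualService weights to match
kubectl get virtualservice rollouts-demo-vsvc1 -o yaml | grep -B3 weight
```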

6. Resource Reference

6.1 Rollout

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout-canary
spec:
  # Number of desired pods.
  # Defaults to 1.
  replicas: 5
  analysis:
    # number of successful analysis runs to retain, default 5
    successfulRunHistoryLimit: 10
    # number of unsuccessful analysis runs to retain, default 5
    unsuccessfulRunHistoryLimit: 10
  # mutually exclusive with the selector/template below; references an existing Deployment instead
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rollout-ref-deployment
    # "never": the Deployment is never scaled down
    # "onsuccess": the Deployment is scaled down once the Rollout is healthy
    # "progressively": the Deployment is scaled down as the Rollout scales up
    scaleDown: never|onsuccess|progressively

  # as in a Deployment, selects the pods; mutually exclusive with workloadRef
  # must match the pod template
  selector:
    matchLabels:
      app: guestbook
  template:
    spec:
      containers:
      - name: guestbook
        image: argoproj/rollouts-demo:blue

  # how long a new pod must be Running before it is considered ready to receive traffic
  minReadySeconds: 30

  # number of old ReplicaSets to retain, default 10
  revisionHistoryLimit: 3

  # pauses the rollout; can be toggled manually at any time via the CLI
  paused: true

  # maximum time for an update to make progress; after this deadline (default 600s) it is marked as failed
  progressDeadlineSeconds: 600

  # whether to abort the update once the deadline is exceeded; default false
  progressDeadlineAbort: false

  # a timestamp: restarts all pods in order, with the controller ensuring every
  # pod's creation time is later than this value
  restartAt: "2020-03-30T21:19:35Z"

  # number of recent revisions eligible for fast rollback
  rollbackWindow:
    revisions: 3
  # release strategy
  strategy:

    # Blue-green deployment
    blueGreen:

      # Reference to service that the rollout modifies as the active service.
      # Required.
      activeService: active-service

      # Pre-promotion analysis run which performs analysis before the service
      # cutover. +optional
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local

      # Post-promotion analysis run which performs analysis after the service
      # cutover. +optional
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local

      # Name of the service that the rollout modifies as the preview service.
      # +optional
      previewService: preview-service

      # The number of replicas to run under the preview service before the
      # switchover. Once the rollout is resumed the new ReplicaSet will be fully
      # scaled up before the switch occurs +optional
      previewReplicaCount: 1

      # Indicates if the rollout should automatically promote the new ReplicaSet
      # to the active service or enter a paused state. If not specified, the
      # default value is true. +optional
      autoPromotionEnabled: false

      # Automatically promotes the current ReplicaSet to active after the
      # specified pause delay in seconds after the ReplicaSet becomes ready.
      # If omitted, the Rollout enters and remains in a paused state until
      # manually resumed by resetting spec.Paused to false. +optional
      autoPromotionSeconds: 30

      # Adds a delay before scaling down the previous ReplicaSet. If omitted,
      # the Rollout waits 30 seconds before scaling down the previous ReplicaSet.
      # A minimum of 30 seconds is recommended to ensure IP table propagation
      # across the nodes in a cluster.
      scaleDownDelaySeconds: 30

      # Limits the number of old RS that can run at once before getting scaled
      # down. Defaults to nil
      scaleDownDelayRevisionLimit: 2

      # Add a delay in second before scaling down the preview replicaset
      # if update is aborted. 0 means not to scale down. Default is 30 second
      abortScaleDownDelaySeconds: 30

      # Anti Affinity configuration between desired and previous ReplicaSet.
      # Only one must be specified
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
        preferredDuringSchedulingIgnoredDuringExecution:
          weight: 1 # Between 1 - 100

      # activeMetadata will be merged and updated in-place into the ReplicaSet's spec.template.metadata
      # of the active pods. +optional
      activeMetadata:
        labels:
          role: active

      # Metadata which will be attached to the preview pods only during their preview phase.
      # +optional
      previewMetadata:
        labels:
          role: preview

    # Canary deployment
    canary:

      # Service used by the canary version
      canaryService: canary-service

      # Service used by the stable (old) version
      stableService: stable-service

      # metadata attached to the canary pods, similar to webhook injection
      canaryMetadata:
        annotations:
          role: canary
        labels:
          role: canary

      # metadata attached to the stable pods, similar to webhook injection
      stableMetadata:
        annotations:
          role: stable
        labels:
          role: stable

      # maximum number of pods that may be unavailable during the update;
      # the update pauses if this cannot be satisfied
      maxUnavailable: 1

      # maximum surge per update step as a percentage, rounded down
      maxSurge: "20%"

      # delay before scaling down the old ReplicaSet
      scaleDownDelaySeconds: 30

      # optional: minimum number of pods in the ReplicaSet routing canary traffic,
      # mainly for high availability; default 1
      minPodsPerReplicaSet: 2

      # limits how many old ReplicaSets may keep running before being scaled down
      scaleDownDelayRevisionLimit: 2

      # analysis template and arguments; e.g. runs a PromQL query against the service
      analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local

        # expose the stable / latest ReplicaSet's pod-template-hash to the analysis
        - name: stable-hash
          valueFrom:
            podTemplateHashValue: Stable
        - name: latest-hash
          valueFrom:
            podTemplateHashValue: Latest

        # expose a metadata label to the analysis
        - name: region
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['region']

      # list of steps executed during a canary update
      steps:

      # set the canary traffic weight to 20%
      - setWeight: 20

      # pause the update; supported units: s, m, h
      - pause:
          duration: 1h

      # pause indefinitely until manually promoted
      - pause: {}

      # set the canary scale as an explicit replica count instead of matching
      # the traffic weight: only scales pods, traffic is controlled separately
      - setCanaryScale:
          replicas: 3

      # same as above, but as a percentage of the spec's replica count;
      # again only controls scale, traffic is controlled separately
      - setCanaryScale:
          weight: 25

      # make the canary scale match the canary traffic weight (the default)
      - setCanaryScale:
          matchTrafficWeight: true

      # the following step is only supported with Istio
      - setHeaderRoute:
          # will be created under spec.strategy.canary.trafficRouting.managedRoutes
          name: "header-route-1"
          # header matching rules for the route
          match:
              # header name
            - headerName: "version"
              # must contain exactly one of: exact, regex, or prefix
              headerValue:
                # exact match
                exact: "2"
                # regular expression
                regex: "2.0.(.*)"
                # prefix match
                prefix: "2.0"

      # the following step is only supported with Istio
      - setMirrorRoute:
          # will be created under spec.strategy.canary.trafficRouting.managedRoutes
          name: "header-route-1"
          # percentage of matched traffic to mirror to the canary
          percentage: 100
          # matching rules; if omitted, the route is removed. Conditions within a
          # single match block are ANDed; multiple match blocks are ORed
          match:
            - method: # match on the request method: exact, regex, or prefix
                exact: "GET"
                regex: "P.*"
                prefix: "POST"
              path: # match on the request path: exact, regex, or prefix
                exact: "/test"
                regex: "/test/.*"
                prefix: "/"
              headers: # match on request headers: exact, regex, or prefix
                agent-1b:
                  exact: "firefox"
                  regex: "firefox2(.*)"
                  prefix: "firefox"

      # an inline analysis step
      - analysis:
          templates:
          - templateName: success-rate

      # an experiment step
      - experiment:
          duration: 1h
          templates:
          - name: baseline
            specRef: stable
            # optional, creates a service for the experiment if set
            service:
              # optional, service: {} is also acceptable if name is not included
              name: test-service
          - name: canary
            specRef: canary
            # optional, set the weight of traffic routed to this version
            weight: 10
          analyses:
          - name : mann-whitney
            templateName: mann-whitney
            # Metadata which will be attached to the AnalysisRun.
            analysisRunMetadata:
              labels:
                app.service.io/analysisType: smoke-test
              annotations:
                link.argocd.argoproj.io/external-link: http://my-loggin-platform.com/pre-generated-link

      # anti-affinity between the new and old ReplicaSet; only one of the two may be set
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
        preferredDuringSchedulingIgnoredDuringExecution:
          weight: 1 # Between 1 - 100

      # traffic routing via an ingress controller or Istio
      trafficRouting:
        # nginx only: total weight denominator used for traffic splitting
        maxTrafficWeight: 1000
        # declares the routes the controller manages, in order of precedence
        managedRoutes:
          - name: set-header
          - name: mirror-route
        # Istio traffic settings
        istio:
          # a single VirtualService
          virtualService:
            name: rollout-vsvc  # required
            routes:
            - primary # optional if the VirtualService has a single route; required if it has more than one
          # multiple VirtualServices
          virtualServices:
          - name: rollouts-vsvc1  # required
            routes:
              - primary
          - name: rollouts-vsvc2
            routes:
              - secondary
        # NGINX Ingress Controller routing configuration
        nginx:
          # Either stableIngress or stableIngresses must be configured, but not both.
          stableIngress: primary-ingress
          stableIngresses:
            - primary-ingress
            - secondary-ingress
            - tertiary-ingress
          annotationPrefix: customingress.nginx.ingress.kubernetes.io # optional
          additionalIngressAnnotations:   # optional
            canary-by-header: X-Canary
            canary-by-header-value: iwantsit

        # ALB Ingress Controller routing configuration
        alb:
          ingress: ingress  # required
          servicePort: 443  # required
          annotationPrefix: custom.alb.ingress.kubernetes.io # optional

        # Service Mesh Interface routing configuration
        smi:
          rootService: root-svc # optional
          trafficSplitName: rollout-example-traffic-split # optional

      # Add a delay in second before scaling down the canary pods when update
      # is aborted for canary strategy with traffic routing (not applicable for basic canary).
      # 0 means canary pods are not scaled down. Default is 30 seconds.
      abortScaleDownDelaySeconds: 30

status:
  pauseConditions:
  - reason: StepPause
    startTime: "2019-10-01T12:34:00Z"
  - reason: BlueGreenPause
    startTime: "2019-10-01T12:34:00Z"
  - reason: AnalysisRunInconclusive
    startTime: "2019-10-01T12:34:00Z"

6.1.1 Canary

A canary release means that while updating a workload, a portion of traffic is shifted to the new version for observation; once the conditions are met, the rollout proceeds step by step until the workload is fully on the new version. In a Rollout, changing the template field triggers the canary rules defined below.
# How it works
The setWeight field specifies the percentage of traffic that should be sent to the canary, and the pause step instructs the rollout to pause.
When the controller reaches a pause step, it adds a PauseCondition entry to the .status.pauseConditions field. If the pause step sets a duration, the rollout will not proceed to the next step until it has waited for that duration. Otherwise, the rollout waits indefinitely until the pause condition is removed (e.g. by a manual promote).

6.1.1.1 Basic mode

In the basic mode, traffic shifts in proportion to the replica counts:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.4
        ports:
        - containerPort: 80
  minReadySeconds: 30
  revisionHistoryLimit: 3
  strategy:
    canary:
      maxSurge: "25%"
      maxUnavailable: 0
      steps:
      - setWeight: 10 # percentage of traffic for the canary version
      - pause: # pause
          duration: 1h # supports s (default), m, h; not meant for keeping two versions long-term
      - setWeight: 20
      - pause: {} # pause until manually promoted

6.1.1.2 Dynamic canary scale

spec:
  strategy:
    canary:
      steps:
      - setCanaryScale: # control scale and weight separately
          replicas: 3 # explicit replica count without changing the traffic weight
      - setCanaryScale:
          weight: 25 # replica percentage without changing the traffic weight
      - setCanaryScale:
          matchTrafficWeight: true # resume matching the canary scale to the traffic weight
      - setWeight: 90 # without matchTrafficWeight, 90% of traffic goes to the canary while its scale stays fixed; with it, subsequent setWeight steps also create canary replicas matching the traffic weight

6.1.1.3 Dynamic stable scale

By default during a canary deployment, the old version always keeps 100% of its replicas. The advantage is that on a failed release, traffic can be switched back immediately with no startup delay. This option instead scales the stable ReplicaSet down dynamically, keeping the total pod count constant without paying for extra replicas, e.g. when scaling several applications at once would leave nodes with no spare capacity.
spec:
  strategy:
    canary:
      dynamicStableScale: true # enable this mode
      abortScaleDownDelaySeconds: 600 # delay, in seconds, before the canary pods are scaled down after an abort

6.1.1.4 Rolling-update settings

As with a Deployment, these control the maximum surge per step and how many pods may be unavailable:
  strategy:
    canary:
      maxSurge: "25%"
      maxUnavailable: 0

6.1.1.5 analysis

# optional: configures metric analysis; if the conditions are not met, the rollout is rolled back
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout-canary
spec:
  replicas: 5 # replica count, default 1
  analysis:
    successfulRunHistoryLimit: 10 # number of successful analysis runs to retain
    unsuccessfulRunHistoryLimit: 10 # number of unsuccessful runs to retain; both default to 5

6.1.1.6 antiAffinity

# anti-affinity between the new and old ReplicaSets
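As a sketch, the stanza mirrors the pod anti-affinity API; exactly one of the two fields may be set:

```yaml
strategy:
  canary:
    antiAffinity:
      # hard requirement: never co-locate new and old ReplicaSet pods
      requiredDuringSchedulingIgnoredDuringExecution: {}
      # -- or, as a soft preference instead --
      # preferredDuringSchedulingIgnoredDuringExecution:
      #   weight: 1  # 1-100
```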

6.1.1.7 canaryService

# references a Service that sends traffic only to the canary version

6.1.1.8 stableService

# likewise, a Service that sends traffic only to the stable version, effectively keeping the two isolated
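Both fields point at ordinary Services selecting the app's pods; the controller then narrows each Service to one ReplicaSet by injecting the rollouts-pod-template-hash selector. A sketch (names match the spec reference above; ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: canary-service      # referenced by spec.strategy.canary.canaryService
spec:
  selector:
    app: guestbook          # controller adds the hash selector for the canary ReplicaSet
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: stable-service      # referenced by spec.strategy.canary.stableService
spec:
  selector:
    app: guestbook          # controller adds the hash selector for the stable ReplicaSet
  ports:
  - port: 80
    targetPort: 8080
```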

6.1.1.9 maxSurge

# maximum surge per rolling-update step, default 25%

6.1.1.10 maxUnavailable

# maximum number or percentage of pods that may be unavailable during the update, default 25%

6.1.1.11 trafficRouting

Traffic routing configuration.
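As a minimal sketch using the NGINX ingress controller (Service and Ingress names are illustrative; the Ingress must already route to the stable Service):

```yaml
strategy:
  canary:
    canaryService: canary-service
    stableService: stable-service
    trafficRouting:
      nginx:
        stableIngress: primary-ingress  # existing Ingress pointing at stable-service
    steps:
    - setWeight: 10  # the controller creates a canary Ingress with this weight
    - pause: {}
```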
posted on 2024-04-12 16:26 by 要快乐不要emo