「Bug」Istio 的 Sidercar 和 IngressGateway 间歇性地报错:Envoy proxy is NOT ready

Bug 描述

IngressGateway 日志如下:

ryan@RYAN-MI-DESKTOP:~$ kubectl -n istio-system logs istio-ingressgateway-7f85d5ffcc-f4g7l
2020-04-22T01:09:24.691767Z     info    FLAG: --binaryPath="/usr/local/bin/envoy"
2020-04-22T01:09:24.691874Z     info    FLAG: --concurrency="0"
2020-04-22T01:09:24.691889Z     info    FLAG: --configPath="/etc/istio/proxy"
2020-04-22T01:09:24.691904Z     info    FLAG: --connectTimeout="10s"
2020-04-22T01:09:24.692020Z     info    FLAG: --controlPlaneAuthPolicy="NONE"
2020-04-22T01:09:24.692047Z     info    FLAG: --controlPlaneBootstrap="true"
2020-04-22T01:09:24.692054Z     info    FLAG: --customConfigFile=""
2020-04-22T01:09:24.692059Z     info    FLAG: --datadogAgentAddress=""
2020-04-22T01:09:24.692064Z     info    FLAG: --disableInternalTelemetry="false"
2020-04-22T01:09:24.692070Z     info    FLAG: --discoveryAddress="istio-pilot.istio-system.svc:15012"
2020-04-22T01:09:24.692075Z     info    FLAG: --dnsRefreshRate="300s"
2020-04-22T01:09:24.692081Z     info    FLAG: --domain="istio-system.svc.cluster.local"
2020-04-22T01:09:24.692087Z     info    FLAG: --drainDuration="45s"
2020-04-22T01:09:24.692092Z     info    FLAG: --envoyAccessLogService=""
2020-04-22T01:09:24.692096Z     info    FLAG: --envoyMetricsService=""
2020-04-22T01:09:24.692101Z     info    FLAG: --help="false"
2020-04-22T01:09:24.692106Z     info    FLAG: --id=""
2020-04-22T01:09:24.692111Z     info    FLAG: --ip=""
2020-04-22T01:09:24.692116Z     info    FLAG: --lightstepAccessToken=""
2020-04-22T01:09:24.692120Z     info    FLAG: --lightstepAddress=""
2020-04-22T01:09:24.692125Z     info    FLAG: --lightstepCacertPath=""
2020-04-22T01:09:24.692130Z     info    FLAG: --lightstepSecure="false"
2020-04-22T01:09:24.692135Z     info    FLAG: --log_as_json="false"
2020-04-22T01:09:24.692140Z     info    FLAG: --log_caller=""
2020-04-22T01:09:24.692145Z     info    FLAG: --log_output_level="default:info"
2020-04-22T01:09:24.692149Z     info    FLAG: --log_rotate=""
2020-04-22T01:09:24.692155Z     info    FLAG: --log_rotate_max_age="30"
2020-04-22T01:09:24.692160Z     info    FLAG: --log_rotate_max_backups="1000"
2020-04-22T01:09:24.692165Z     info    FLAG: --log_rotate_max_size="104857600"
2020-04-22T01:09:24.692170Z     info    FLAG: --log_stacktrace_level="default:none"
2020-04-22T01:09:24.692181Z     info    FLAG: --log_target="[stdout]"
2020-04-22T01:09:24.692186Z     info    FLAG: --mixerIdentity=""
2020-04-22T01:09:24.692191Z     info    FLAG: --outlierLogPath=""
2020-04-22T01:09:24.692197Z     info    FLAG: --parentShutdownDuration="1m0s"
2020-04-22T01:09:24.692201Z     info    FLAG: --pilotIdentity=""
2020-04-22T01:09:24.692208Z     info    FLAG: --proxyAdminPort="15000"
2020-04-22T01:09:24.692213Z     info    FLAG: --proxyComponentLogLevel="misc:error"
2020-04-22T01:09:24.692218Z     info    FLAG: --proxyLogLevel="warning"
2020-04-22T01:09:24.692224Z     info    FLAG: --serviceCluster="istio-ingressgateway"
2020-04-22T01:09:24.692230Z     info    FLAG: --serviceregistry="Kubernetes"
2020-04-22T01:09:24.692234Z     info    FLAG: --statsdUdpAddress=""
2020-04-22T01:09:24.692240Z     info    FLAG: --statusPort="15020"
2020-04-22T01:09:24.692245Z     info    FLAG: --stsPort="0"
2020-04-22T01:09:24.692250Z     info    FLAG: --templateFile=""
2020-04-22T01:09:24.692255Z     info    FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2020-04-22T01:09:24.692260Z     info    FLAG: --trust-domain="cluster.local"
2020-04-22T01:09:24.692265Z     info    FLAG: --zipkinAddress="zipkin.istio-system:9411"
2020-04-22T01:09:24.692311Z     info    Version 1.5.0-c3c353285578eb68b334fc8766746b754b6b3789-Clean
2020-04-22T01:09:24.693194Z     info    Obtained private IP [100.82.150.181]
2020-04-22T01:09:24.693320Z     info    Proxy role: &model.Proxy{ClusterID:"", Type:"router", IPAddresses:[]string{"100.82.150.181", "100.82.150.181"}, ID:"istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system", Locality:(*envoy_api_v2_core.Locality)(nil), DNSDomain:"istio-system.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), WorkloadLabels:labels.Collection(nil), IstioVersion:(*model.IstioVersion)(nil)}
2020-04-22T01:09:24.693356Z     info    PilotSAN []string(nil)
2020-04-22T01:09:24.693370Z     info    MixerSAN []string(nil)
2020-04-22T01:09:24.695137Z     info    Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: istio-ingressgateway
statNameLength: 189
tracing:
  zipkin:
    address: zipkin.istio-system:9411

2020-04-22T01:09:24.695190Z     info    JWT policy is first-party-jwt
2020-04-22T01:09:24.695693Z     info    Using user-configured CA istio-pilot.istio-system.svc:15012
2020-04-22T01:09:24.695722Z     info    istiod uses self-issued certificate
2020-04-22T01:09:24.695907Z     info    the CA cert of istiod is: -----BEGIN CERTIFICATE-----
MIIC3TCCAcWgAwIBAgIQJgBrERl3Dow3k42JaJOG4TANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIwMDQxOTA2NDgzN1oXDTMwMDQx
......
s6cBajEjOj2eqwrwoSYYvICrQGNyrLEzjez1E4sjuXaDiKYDPS8rC35J62jwEiOn
nYwCjfi+/xwxukIp8BifwrQ=
-----END CERTIFICATE-----

2020-04-22T01:09:24.696850Z     info    parsed scheme: ""
2020-04-22T01:09:24.696895Z     info    scheme "" not registered, fallback to default scheme
2020-04-22T01:09:24.696956Z     info    ccResolverWrapper: sending update to cc: {[{istio-pilot.istio-system.svc:15012  <nil> 0 <nil>}] <nil> <nil>}
2020-04-22T01:09:24.696997Z     info    ClientConn switching balancer to "pick_first"
2020-04-22T01:09:24.697302Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {CONNECTING <nil>}
2020-04-22T01:09:24.736769Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {READY <nil>}
2020-04-22T01:09:24.872608Z     info    Starting gateway SDS
2020-04-22T01:09:24.997481Z     info    sds     SDS gRPC server for workload UDS starts, listening on "/etc/istio/proxy/SDS"

2020-04-22T01:09:24.997920Z     info    sds     SDS gRPC server for ingress gateway controller starts, listening on "/var/run/ingress_gateway/sds"

2020-04-22T01:09:24.998010Z     info    PilotSAN []string{"istio-pilot.istio-system.svc"}
2020-04-22T01:09:24.998190Z     info    Starting proxy agent
2020-04-22T01:09:24.998494Z     info    sds     Start SDS grpc server
2020-04-22T01:09:24.998681Z     info    Opening status port 15020

2020-04-22T01:09:24.998806Z     info    sds     Start SDS grpc server for ingress gateway proxy
2020-04-22T01:09:24.998897Z     info    Received new config, creating new Envoy epoch 0
2020-04-22T01:09:24.998981Z     info    Epoch 0 starting
2020-04-22T01:09:25.044273Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format [Envoy (Epoch 0)] [%Y-%m-%d %T.%e][%t][%l][%n] %v -l warning --component-log-level misc:error]
[Envoy (Epoch 0)] [2020-04-22 01:09:25.199][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-04-22 01:09:25.200][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-04-22T01:09:25.231736Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default new connection
2020-04-22T01:09:25.784256Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[Envoy (Epoch 0)] [2020-04-22 01:09:26.010][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 14, no healthy upstream
[Envoy (Epoch 0)] [2020-04-22 01:09:26.011][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-04-22T01:09:26.060376Z     info    cache   Root cert has changed, start rotating root cert for SDS clients
2020-04-22T01:09:26.060838Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default pushed key/cert pair to proxy
2020-04-22T01:09:26.060881Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-1 resource:default pushed secret
2020-04-22T01:09:27.765512Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:09:29.766995Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:09:32.764881Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:34.769088Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:36.765021Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:38.767440Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:40.766358Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:42.769452Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:44.765521Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:46.764808Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:48.772710Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:50.768864Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:52.766023Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:54.774756Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:56.773031Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:09:58.765617Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:00.768655Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:02.441087Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA new connection
2020-04-22T01:10:02.441485Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA pushed root cert to proxy
2020-04-22T01:10:02.441513Z     info    sds     node:router~100.82.150.181~istio-ingressgateway-7f85d5ffcc-f4g7l.istio-system~istio-system.svc.cluster.local-2 resource:ROOTCA pushed secret
2020-04-22T01:10:02.765271Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:04.765566Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:06.773683Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:08.770177Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:10.766355Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:12.764962Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:14.775662Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:16.774207Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:18.768873Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:20.776176Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:22.769696Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:24.765587Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:26.776204Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:28.767088Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:30.785644Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:32.765494Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:34.764348Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:36.765735Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:38.764585Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:39.925673Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 3 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2020-04-22T01:10:42.770012Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:44.768852Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:46.767273Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:48.765688Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:50.783984Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:52.916101Z     info    Envoy proxy is NOT ready: Get http://127.0.0.1:15000/stats?usedonly&filter=^(cluster_manager.cds|listener_manager.lds).(update_success|update_rejected)$: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:10:54.254978Z     info    Envoy proxy is ready
2020-04-22T01:10:58.773283Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:00.780969Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:02.769031Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:04.767144Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:06.770441Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:08.771764Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:10.764956Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:12.766110Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:13.864978Z     info    Envoy proxy is ready
2020-04-22T01:11:32.773066Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:34.766067Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:36.766066Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:38.775108Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:40.766046Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:42.770071Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:44.764804Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:11:45.893578Z     info    Envoy proxy is ready
2020-04-22T01:12:12.765846Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:14.769021Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:16.765763Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:18.780407Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:20.766747Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:22.766172Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:24.770132Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:26.773160Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:28.766725Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:30.768249Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:32.097014Z     info    Envoy proxy is ready
2020-04-22T01:12:48.769664Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:50.765486Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:52.767476Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:54.767704Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:56.767033Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:12:58.766685Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:00.768763Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:02.764860Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:04.766701Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:06.781599Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:08.769040Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:13:10.046523Z     info    Envoy proxy is ready
2020-04-22T01:36:29.588987Z     info    transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-04-22T01:36:29.589216Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {CONNECTING <nil>}
2020-04-22T01:36:29.611300Z     info    pickfirstBalancer: HandleSubConnStateChange: 0xc000391d30, {READY <nil>}
[Envoy (Epoch 0)] [2020-04-22 01:37:10.305][26][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
2020-04-22T01:37:14.765362Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:16.764682Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:18.765650Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:20.765993Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:22.764939Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:24.765619Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:26.765554Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:28.766080Z     info    Envoy proxy is NOT ready: failed to get server info: Get http://127.0.0.1:15000/stats?usedonly&filter=server.state: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2020-04-22T01:37:30.031610Z     info    Envoy proxy is ready

IngressGateway 间歇性报错:Envoy proxy is NOT ready,最后因为 Readiness 探针多次失败,被 Kill 掉。

排查流程

首先怀疑是节点性能问题,因为我们通过 nodeSelector 将 Istio 限制到了两个 8c/16G 的网络专用节点上,HPA 的扩容可能导致节点性能问题。

将 HPA 的副本上下限都调成了 2,就是固定两个副本。调整后情况有所改善,不再被 OOMKilled,但是隔几十分钟还是会报一次 Envoy proxy is NOT ready

查看 Pod 信息:

ryan@RYAN-MI-DESKTOP:~$ kubectl -n istio-system describe pod istio-ingressgateway-7f85d5ffcc-f4g7l
Name:         istio-ingressgateway-7f85d5ffcc-f4g7l
Namespace:    istio-system
Priority:     0
# ...... 省略若干信息
    State:          Running
      Started:      Wed, 22 Apr 2020 09:09:24 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    0
      Started:      Wed, 22 Apr 2020 08:27:03 +0800
      Finished:     Wed, 22 Apr 2020 09:09:22 +0800
    Ready:          True
    Restart Count:  3
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
# ...... 省略
Events:
  Type     Reason     Age                   From                   Message
  ----     ------     ----                  ----                   -------
  Warning  Unhealthy  25m (x663 over 119m)  kubelet, 192-168-1-93  Readiness probe failed: Get http://100.82.150.181:15020/healthz/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

从上面的日志上看,Pod 因为长时间 Not Ready 而被重启了三次,此外这个 Pod 还触发过 OOMKilled。

这个问题连续出了一周的时间,一直没找到问题。还好是用于测试开发的集群,勉强也能用。

在 istio 上新提了 issue:IngressGateway's stats api is intermittently abnormal: failed to get server info: Get http://127.0.0.1:15000/stats

根据官方的回应,很可能是因为我们的 virtualservice 等配置过多(2000+),导致 envoy 主线程繁忙。

解决方法是每个名字空间都需要配一个专用 envoy 入口网关,让 envoy 只监听一个命名空间的配置,减小压力。
可这样的话,在最外层还得加一层路由(比如 ingress),将流量路由到各名字空间的 envoy 入口网关去。

开发集群是否有必要做这么多层路由?待观察。

=============
更新:我们升级到 istio 1.5 后,这个问题得到了缓解,基本没再出过 IngressGateway NodeReady 的问题。
并不需要为每个名字空间配一个专用入口网关。

posted @ 2020-05-05 10:39  於清樂  阅读(2107)  评论(0编辑  收藏  举报