CloudWatch Application Signals: Monitoring Microservice Call Chains Without Changing Code
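Last week I picked up a task: my boss wanted the call chains of a dozen-plus production microservices visualized, with SLO alerting on top.
The traditional approach is to integrate the OpenTelemetry SDK: change every service's code, add instrumentation, configure an exporter. Doing that across a dozen services would take a week of PR review alone.
Then I found that CloudWatch Application Signals can do it without touching application code. It relies on auto-instrumentation for Java, Python and .NET: at deploy time the agent is injected into the Pod through an init container, so neither the source nor the image has to change.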
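How it works
Application Signals works like this:
- Auto-instrumentation: a Java agent or the Python auto-instrumentation intercepts HTTP/gRPC and database calls automatically
- Trace collection: spans are generated automatically, recording the call path, latency and status code
- SLI/SLO: service-level metrics are computed automatically from the collected data and can serve as SLIs for your SLOs
- Zero code changes: the application code needs no modification at all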
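The "intercept without touching application code" point is easier to see in a toy version. The sketch below is plain Python and only illustrates the idea of wrapping an existing function at startup so that each call yields a span-like record with duration and status; it is not how the Java agent or the OpenTelemetry Python instrumentations are actually implemented, and `instrument`, `SPANS` and `OrderClient` are hypothetical names.

```python
import functools
import time

# Collected span-like records; a real agent would export these instead.
SPANS = []


def instrument(owner, attr_name, span_name):
    """Replace owner.attr_name with a wrapper that records one span per call."""
    original = getattr(owner, attr_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "OK"
        try:
            return original(*args, **kwargs)
        except Exception:
            status = "ERROR"
            raise
        finally:
            # Runs on success and on failure, like a span being ended.
            SPANS.append({
                "name": span_name,
                "duration_ms": (time.perf_counter() - start) * 1000,
                "status": status,
            })

    setattr(owner, attr_name, wrapper)


# Hypothetical "application code": it contains no tracing calls at all.
class OrderClient:
    def fetch_order(self, order_id):
        if order_id < 0:
            raise ValueError("bad order id")
        return {"id": order_id}


# The "agent" patches the class at startup, outside the application code.
instrument(OrderClient, "fetch_order", "GET /orders/{id}")
```

Calling `OrderClient().fetch_order(7)` still returns the order, and one record with status `OK` lands in `SPANS`; the class itself never imports anything tracing-related.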
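Enabling it on EKS
The simplest route is the Amazon CloudWatch Observability EKS add-on, which installs the CloudWatch Agent Operator. EKS add-ons are managed through the EKS API rather than as a Kubernetes manifest, so for a declarative setup put the add-on into an eksctl config file:
# cluster.yaml: install the CloudWatch Observability add-on
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
addons:
  - name: amazon-cloudwatch-observability
    version: v2.4.0-eksbuild.1  # optional: pin an add-on version
Apply it with eksctl create addon -f cluster.yaml.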
Or pass everything on the eksctl command line:
eksctl create addon \
  --cluster my-cluster \
  --name amazon-cloudwatch-observability \
  --region us-east-1
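The add-on is not usable the instant the command returns, so in a deploy script it helps to wait for it before annotating workloads. A minimal polling sketch, assuming you pass in a boto3 EKS client: `describe_addon` and its `addon.status` field are the real EKS API, while the function name, the timeout and the set of statuses treated as failures are my own choices.

```python
import time

# Statuses treated as unrecoverable in this sketch (a design choice, not an API rule).
FAILED_STATUSES = ("CREATE_FAILED", "UPDATE_FAILED")


def wait_for_addon_active(eks_client, cluster, addon="amazon-cloudwatch-observability",
                          timeout_s=600, poll_s=15, sleep=time.sleep):
    """Poll DescribeAddon until the add-on is ACTIVE; raise on failure or timeout."""
    waited = 0
    while True:
        status = eks_client.describe_addon(
            clusterName=cluster, addonName=addon
        )["addon"]["status"]
        if status == "ACTIVE":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"add-on {addon} is {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"add-on {addon} still {status} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
```

Usage: `wait_for_addon_active(boto3.client("eks", region_name="us-east-1"), "my-cluster")`. boto3 also ships an `addon_active` waiter for EKS that does much the same; the hand-written loop just makes the failure handling explicit and easy to stub in tests.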
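Once it is installed, add the injection annotation to the Pod template of each Deployment you want to monitor. The operator acts on Pods as they are created, so the annotation has to sit under spec.template.metadata; an annotation on the Deployment's own metadata is not picked up:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: order-service
          image: my-registry/order-service:latest
          ports:
            - containerPort: 8080
Once the Pods restart (changing the Pod template triggers a rollout; for Pods that already exist you can also run kubectl rollout restart deployment/order-service), the Java agent is injected automatically and starts collecting traces.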
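To confirm the injection actually happened, inspect the running Pod. This sketch assumes the operator wires the Java agent in through the `JAVA_TOOL_OPTIONS` environment variable, which is how the OpenTelemetry Operator's Java injection works; feed it the output of `kubectl get pod <pod-name> -o json`. The function name is hypothetical.

```python
import json


def java_agent_injected(pod_json: str, container_name: str) -> bool:
    """True if the named container's JAVA_TOOL_OPTIONS loads a -javaagent."""
    pod = json.loads(pod_json)
    for container in pod["spec"]["containers"]:
        if container["name"] != container_name:
            continue
        for env in container.get("env", []):
            # Env entries set via valueFrom have no literal "value"; treat them as empty.
            if env.get("name") == "JAVA_TOOL_OPTIONS" and "-javaagent" in env.get("value", ""):
                return True
        return False
    raise KeyError(f"container {container_name!r} not found in pod")
```

If it returns False, the usual suspects are an annotation placed outside the Pod template, or Pods that were created before the add-on was installed.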
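Configuring Python applications
Python applications use the inject-python annotation, again in the Pod template:
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
Mainstream frameworks such as Flask, Django and FastAPI are supported. The following are intercepted automatically:
- Inbound and outbound HTTP requests
- boto3 calls (DynamoDB, S3, SQS, etc.)
- Database connections (MySQL, PostgreSQL)
- Redis/Memcached calls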
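For an existing Deployment you don't have to edit YAML by hand: a merge patch on the Pod template adds the annotation and, because the template changes, triggers a rollout. A small helper that builds that patch, limited to the three languages this post covers; the helper name is hypothetical, and `dotnet` follows the OpenTelemetry Operator's `inject-dotnet` annotation.

```python
import json

# Annotation suffixes for the languages discussed in this post.
SUPPORTED_LANGUAGES = {"java", "python", "dotnet"}


def injection_patch(language: str) -> str:
    """Build a merge patch adding inject-<language> to the Pod template annotations."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        f"instrumentation.opentelemetry.io/inject-{language}": "true"
                    }
                }
            }
        }
    }
    return json.dumps(patch)
```

Pass the result to `kubectl patch deployment <name> -p '<patch>'`; existing annotations on the template are kept, since the patch only adds one key.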
Viewing the call chain
Once it is enabled, go to CloudWatch console → Application Signals → Service Map and you will see something like:
[API Gateway] → [Order Service] → [DynamoDB]
                       ↓
               [Payment Service] → [SQS] → [Notification Service]
                       ↓
              [Inventory Service] → [RDS PostgreSQL]
Each node shows:
- Request volume (RPM)
- Latency percentiles (P50/P95/P99)
- Error rate
- Availability
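To make the per-node numbers concrete, this sketch derives them from a window of raw requests. It uses the nearest-rank percentile, counts 5xx responses as faults and 4xx as errors, and defines availability as the share of requests without a fault; those definitions are assumptions for illustration, and Application Signals' own aggregation may differ in detail.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_minutes):
    """requests: list of (latency_ms, http_status) observed in the window."""
    latencies = [latency for latency, _ in requests]
    total = len(requests)
    faults = sum(1 for _, status in requests if status >= 500)        # server-side failures
    errors = sum(1 for _, status in requests if 400 <= status < 500)  # client-side errors
    return {
        "rpm": total / window_minutes,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "error_rate": errors / total,
        "fault_rate": faults / total,
        "availability": 1 - faults / total,
    }
```

With 100 requests whose latencies are 1 to 100 ms over a 5-minute window, `summarize` reports an RPM of 20 and a P99 of 99 ms.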
Setting SLOs
SLOs are Application Signals' killer feature. You can set them up in the console or through the API:
import boto3

# Note: this is the Application Signals client, not the CloudWatch one
signals = boto3.client("application-signals", region_name="us-east-1")

# Create an SLO: p99 latency of order-service must stay under 500 ms
response = signals.create_service_level_objective(
    Name="order-service-latency-slo",
    Description="Order service P99 latency < 500ms",
    SliConfig={
        "SliMetricConfig": {
            "KeyAttributes": {
                "Type": "Service",
                "Name": "order-service",
                # On EKS the default is eks:<cluster>/<namespace>; use your own name if you set one
                "Environment": "eks:my-cluster/default",
            },
            "MetricType": "LATENCY",
            "Statistic": "p99",
            "PeriodSeconds": 60,
        },
        "MetricThreshold": 500,  # milliseconds
        # Valid values: LessThan, LessThanOrEqualTo, GreaterThan, GreaterThanOrEqualTo
        "ComparisonOperator": "LessThan",
    },
    Goal={
        "AttainmentGoal": 99.9,  # 99.9% of periods must meet the threshold
        "Interval": {
            "RollingInterval": {
                "DurationUnit": "DAY",
                "Duration": 30,  # 30-day rolling window
            }
        },
    },
)
print(f"SLO created: {response['Slo']['Arn']}")
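The `AttainmentGoal` above is easier to reason about in minutes. Assuming period-based evaluation, where each 60-second period counts as good if its p99 stays under 500 ms, a 99.9% goal over 30 days tolerates roughly 43 bad minutes. The sketch below does that arithmetic; it mirrors the idea rather than Application Signals' exact evaluation rules, and the function names are hypothetical.

```python
def period_attainment(p99_per_period, threshold_ms):
    """Percentage of periods whose p99 latency stayed strictly below the threshold."""
    good = sum(1 for value in p99_per_period if value < threshold_ms)
    return 100 * good / len(p99_per_period)


def error_budget_periods(goal_pct, window_days, period_seconds=60):
    """Number of bad periods the goal tolerates within the rolling window."""
    total_periods = window_days * 24 * 3600 // period_seconds
    return total_periods * (100 - goal_pct) / 100
```

`error_budget_periods(99.9, 30)` gives about 43.2 one-minute periods out of 43,200, which is the whole error budget for the month.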
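Alarm configuration
An alarm on SLO attainment, firing when attainment drops below a floor. Strictly speaking this is an attainment alarm rather than a burn-rate alarm: it fires once the error budget is already eroded, not while it is being consumed too fast.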
import boto3

# Alarm when SLO attainment drops below a floor
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_alarm(
    AlarmName="order-service-slo-attainment",
    # Metric name as used in this post; confirm it under AWS/ApplicationSignals in the console
    MetricName="SloAttainment",
    Namespace="AWS/ApplicationSignals",
    Dimensions=[
        {"Name": "SloName", "Value": "order-service-latency-slo"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=99.0,  # alarm when attainment falls below 99%
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-team"],
)
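An attainment alarm reacts slowly to a sudden spike. A burn-rate check instead asks how fast the error budget is being consumed: a burn rate of 1 means the budget lasts exactly the SLO window. The sketch uses the multi-window pattern popularized by Google's SRE Workbook (page only when both a short and a long window burn faster than 14.4, i.e. 2% of a 30-day budget spent in one hour); the threshold and function names are assumptions, not Application Signals settings.

```python
def burn_rate(bad, total, goal_pct):
    """Observed bad fraction divided by the bad fraction the SLO allows."""
    allowed = (100 - goal_pct) / 100
    return (bad / total) / allowed


def should_page(short_window_rate, long_window_rate, threshold=14.4):
    """Multi-window check: both windows must exceed the threshold to page."""
    return short_window_rate >= threshold and long_window_rate >= threshold
```

For a 99.9% goal, 144 bad requests out of 10,000 in the last hour is a burn rate of about 14.4; requiring the long window to agree keeps a brief spike from paging anyone.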
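Pitfalls
- Init container order: if you run the Istio sidecar, watch the init container order. The OTel agent needs to be injected before the Istio proxy.
- Memory overhead: the Java agent takes an extra 50-100 MB of memory. If your Pod memory limit is tight, raise it.
- Cold-start impact: loading the Java agent makes cold start 2-3 seconds slower. That hurts on Lambda, but for long-running EKS Pods the impact is negligible.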
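- Sampling rate: on high-traffic services, don't rely on the default sampler; set one explicitly, or the CloudWatch bill can climb. The standard OpenTelemetry sampler variables on the container do the job:
  env:
    - name: OTEL_TRACES_SAMPLER
      value: "parentbased_traceidratio"
    - name: OTEL_TRACES_SAMPLER_ARG
      value: "0.1"  # sample 10%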
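- Cross-account calls: if the microservices live in different AWS accounts, you need cross-account access for X-Ray (set up through CloudWatch cross-account observability) before the full chain can be stitched together.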
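Why `parentbased_traceidratio` from the sampling pitfall keeps traces whole: the root service decides from the trace ID, and every downstream service follows the sampled flag propagated by its caller, so a trace is kept or dropped end to end. The sketch mirrors how the OpenTelemetry Python SDK's ratio sampler compares the low 64 bits of the trace ID against `ratio * 2^64`; it is simplified (real parent-based samplers also distinguish remote and local parents), and the function names are hypothetical.

```python
TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its ID fall below ratio * 2^64."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parent_based_sampled(trace_id: int, ratio: float, parent_sampled=None) -> bool:
    """Follow the parent's decision if there is one; otherwise apply the ratio (root span)."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)
```

Because the decision depends only on the trace ID, any service that has to decide on its own for the same trace reaches the same answer, and with random IDs about 10% of root spans are kept at a ratio of 0.1.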
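Cost estimate
- Application Signals is not free: it is billed on the volume of requests it observes, and X-Ray traces may be billed on top; check the CloudWatch pricing page for current rates
- Rough estimate for a typical setup (10 services, 1 million requests a day, 10% sampling): about $30-50/month, but recheck it against current pricing, since the request-based charge grows with traffic
- Compared with the operational cost of self-hosting Jaeger/Zipkin + Prometheus + Grafana, it saves a lot of hassle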
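A back-of-the-envelope calculator helps rerun the estimate for your own traffic. All prices are inputs to be taken from the current CloudWatch pricing page, and `signals_per_request` (how many billable Application Signals units one request produces, for example the inbound request plus its outbound calls) is an assumption you set yourself; the numbers in the usage line are placeholders, not real rates.

```python
def monthly_cost(requests_per_day, signals_per_request, sample_rate,
                 price_per_million_signals, price_per_million_traces, days=30):
    """Rough monthly cost, split into an Application Signals part and a trace part."""
    requests = requests_per_day * days
    # Assumption: billable signals scale with all requests, regardless of sampling.
    signals = requests * signals_per_request
    # Only sampled requests become stored traces.
    traces = requests * sample_rate
    return {
        "signals_usd": signals / 1_000_000 * price_per_million_signals,
        "traces_usd": traces / 1_000_000 * price_per_million_traces,
    }
```

For example, `monthly_cost(1_000_000, 3, 0.1, 1.0, 5.0)` with placeholder prices shows that, in this model, lowering the sampling rate only shrinks the trace part.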