Vault Deployment
- Machine layout
| DNS Name | IP Address | Node Type | Location |
| --- | --- | --- | --- |
| vault-1.domain | 10.0.0.1 | voter | zone1 |
| vault-2.domain | 10.0.0.2 | voter | zone2 |
| vault-3.domain | 10.0.0.3 | voter | zone3 |
| vault-4.domain | 10.0.0.4 | non-voter | zone1 |
| vault-5.domain | 10.0.0.5 | non-voter | zone2 |
| vault-6.domain | 10.0.0.6 | non-voter | zone3 |
| vault.domain | 10.0.0.100 | (load balancer) | (all zones) |
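- An AWS access key / secret key pair (AK/SK) with read/write permission on KMS and S3 is required (the permissions can be scoped to a specific bucket and KMS key).
- When using the scripts, edit CLUSTER_NAME, CLUSTER_IPS, and the awskms settings in the script.
- A leader node must be initialized first; only then can the other nodes join through retry_join.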
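- Self-signed certificates
- Create a self-signed CA certificate valid for 30 years (ca.key and ca.crt; the script below writes the CA certificate as ca.pub).
- Generate the node certificate, valid for 30 years:
  - The certificate (vault-public.pub).
  - The certificate's private key (vault-private.key).
  - The bundle file of the certificate authority that issued the certificate (ca.pub).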
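- Deploy the binary and systemd service
Deployment (Vault runs as the op user)
# useradd --system --user-group --shell /bin/false vault
# Vault is deployed under /data0/vault: ssl holds the certificates and license, config holds the configuration, data stores the data
mkdir -p /data0/vault/config /data0/vault/ssl /data0/vault/data /data0/vault/log
# The systemd unit below reads this environment file, so it must exist
touch /data0/vault/config/vault.env
chown -R op:op /data0/vault
curl -O https://releases.hashicorp.com/vault/1.20.0+ent/vault_1.20.0+ent_linux_amd64.zip
unzip vault_1.20.0+ent_linux_amd64.zip
mv vault /usr/bin/
chown op:op /usr/bin/vault
# Deploy the certificates to /data0/vault/ssl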
System service
# The heredoc delimiter is quoted so that the shell does not expand $MAINPID,
# and ">" is used instead of ">>" so that re-running does not append a second copy
cat > /etc/systemd/system/vault.service << 'EOF'
[Unit]
Description="HashiCorp Vault - A tool for managing secrets"
Documentation=https://developer.hashicorp.com/vault/docs
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/data0/vault/config/config.hcl
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Type=notify
EnvironmentFile=/data0/vault/config/vault.env
User=op
Group=op
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/bin/vault server -config=/data0/vault/config/config.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity
LimitCORE=0

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable vault
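- Node configuration
- Voter node (non-leader). When the cluster is first built, the leader node does not need the retry_join blocks. Non-voter nodes use the same configuration with retry_join_as_non_voter enabled. Only one node per zone becomes a voter.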
# config.hcl for vault1 (voter node example)
api_addr      = "https://10.0.0.1:8200"
cluster_addr  = "https://10.0.0.1:8201"
ui            = true
disable_mlock = true
cluster_name  = "my-test"
# log level: trace, debug, info, warn, error
log_level            = "debug"
log_file             = "/data0/vault/log/vault.log"
log_rotate_duration  = "24h"
# 10 MiB
log_rotate_bytes     = 10485760
log_rotate_max_files = 10
license_path         = "/data0/vault/ssl/vault.hclic"

listener "tcp" {
  address            = "10.0.0.1:8200"
  cluster_address    = "10.0.0.1:8201"
  # tls_disable = true
  tls_client_ca_file = "/data0/vault/ssl/ca.pub"
  tls_cert_file      = "/data0/vault/ssl/vault-public.pub"
  tls_key_file       = "/data0/vault/ssl/vault-private.key"
  tls_min_version    = "tls12"
  telemetry {
    unauthenticated_metrics_access = true
  }
}

storage "raft" {
  path = "/data0/vault/data"
  # Must be unique within the cluster
  node_id = "vault1"
  # Only one voter is allowed per zone
  autopilot_redundancy_zone = "zone1"
  # Join as a non-voter. This only takes effect the first time the node starts;
  # afterwards the information is stored in the Raft cluster state (written to
  # the Raft log and replicated to all nodes), not in the configuration file
  # retry_join_as_non_voter = true
  retry_join {
    leader_api_addr         = "https://10.0.0.1:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
  retry_join {
    leader_api_addr         = "https://10.0.0.2:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
  retry_join {
    leader_api_addr         = "https://10.0.0.3:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
  retry_join {
    leader_api_addr         = "https://10.0.0.4:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
  retry_join {
    leader_api_addr         = "https://10.0.0.5:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
  retry_join {
    leader_api_addr         = "https://10.0.0.6:8200"
    leader_ca_cert_file     = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername   = "vault.domain"
  }
}

# Settings for the test/validation phase; set to false in production
profiling {
  unauthenticated_pprof_access             = true
  unauthenticated_in_flight_request_access = true
}

adaptive_overload_protection {
  disable_write_controller = false
}

seal "awskms" {
  region     = "us-east-1"
  # Replace with the correct AK/SK
  access_key = "xxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxx"
  kms_key_id = "xxxxxxxxxxxx"
  endpoint   = "https://xxxxxxxx.kms.us-east-1.vpce.amazonaws.com"
}

telemetry {
  disable_hostname             = true
  prometheus_retention_time    = "6h"
  unauthenticated_pprof_access = true
}
- Initialization (run on the leader node)
Start Vault and run the initialization
# Start Vault
systemctl start vault
# Set the environment variables used by the vault CLI. Use the node's domain name, not its IP, because the certificate is issued for the domain name
echo "${node_ip} ${node_name}.vault.domain" >> /etc/hosts
export VAULT_ADDR=https://${node_name}.vault.domain:8200
export VAULT_CACERT=/data0/vault/ssl/ca.pub # points to the tls_client_ca_file in your configuration file
# Run the initialization. This generates a root token; record it
vault operator init
# Unseal
#vault operator unseal
# Check the status
vault status
# Restart Vault
# sudo systemctl restart vault
# Log in with the root token and continue with the steps below
vault login ${root_token}
# Enable the audit log
vault audit enable file file_path=/data0/vault/log/audit.log
# Check the audit log status
vault audit list
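If the init output needs to be captured by a script rather than copied by hand, one option is to run `vault operator init -format=json` and parse the result. The sketch below assumes the JSON carries a `root_token` field and, with an auto-unseal seal such as awskms, a `recovery_keys_b64` list. The `parse_init_output` helper and the sample payload are hypothetical and only illustrate the shape.

```python
import json


def parse_init_output(raw: str) -> dict:
    """Extract the root token and recovery keys from JSON init output.

    Assumes the field names `root_token` and `recovery_keys_b64`.
    """
    data = json.loads(raw)
    token = data.get("root_token")
    if not token:
        raise ValueError("init output has no root_token")
    return {
        "root_token": token,
        # Fall back to an empty list if the field is absent.
        "recovery_keys": data.get("recovery_keys_b64", []),
    }


# Made-up payload for illustration only; not real Vault output.
sample = json.dumps(
    {"root_token": "hvs.EXAMPLE", "recovery_keys_b64": ["a2V5MQ==", "a2V5Mg=="]}
)
result = parse_init_output(sample)
print(len(result["recovery_keys"]))
```

Whatever captures this output should store it in a secure location, since the root token grants full access.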
Check the Vault status
# Check the Vault status
vault status
# List the cluster nodes
vault operator raft list-peers
# Check the voter status of the nodes
vault operator raft autopilot state
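To check the peer list from a script, the output of `vault operator raft list-peers -format=json` can be summarized. This is a rough sketch that assumes the JSON nests the servers under `data.config.servers` with `node_id`, `voter`, and `leader` fields; verify that shape against your Vault version. The `summarize_peers` helper and the sample payload are hypothetical.

```python
import json


def summarize_peers(raw: str) -> dict:
    """Count peers and list voters and leaders from list-peers JSON output."""
    servers = json.loads(raw)["data"]["config"]["servers"]
    return {
        "total": len(servers),
        "voters": [s["node_id"] for s in servers if s.get("voter")],
        "leaders": [s["node_id"] for s in servers if s.get("leader")],
    }


# Made-up payload for illustration only.
sample = json.dumps({"data": {"config": {"servers": [
    {"node_id": "vault1", "address": "10.0.0.1:8201", "leader": True, "voter": True},
    {"node_id": "vault2", "address": "10.0.0.2:8201", "leader": False, "voter": True},
    {"node_id": "vault4", "address": "10.0.0.4:8201", "leader": False, "voter": False},
]}}})
summary = summarize_peers(sample)
print(summary)
```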
- Automated snapshot configuration
- AWS automated snapshots
# aws-snapshot.json
# JSON does not allow comments, so keep the file itself free of them.
# aws_session_token is usually not needed; if it is, add it as another key,
# for example "aws_session_token": "IQoJb3JpZ2luX2IQ...COFFEE"
{
  "storage_type": "aws-s3",
  "file_prefix": "paris",
  "interval": "8h",
  "retain": 30,
  "local_max_space": 2621440000,
  "path_prefix": "primary",
  "aws_s3_bucket": "vault-snapshots",
  "aws_s3_region": "eu-west-3",
  "aws_access_key_id": "ASI...COFFEE",
  "aws_secret_access_key": "wJalr...COFFEEKEY",
  "aws_s3_server_side_encryption": "true"
}
# Start automated snapshots
vault write sys/storage/raft/snapshot-auto/config/paris-primary @aws-snapshot.json
# Check the status
vault read sys/storage/raft/snapshot-auto/config/paris-primary
# Or
vault list sys/storage/raft/snapshot-auto/config
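As a quick sanity check on the values above, the interval and retain settings together determine how far back the retained snapshots reach. The sketch below works that out for "8h" and 30, and converts local_max_space to MiB. The helper names are hypothetical, and the duration parser only handles a single unit (s, m, or h), which is an assumption made for brevity.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(text: str) -> int:
    """Parse a simple single-unit duration such as "8h" or "30m"."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def retention_window_days(interval: str, retain: int) -> float:
    """Days covered when `retain` snapshots are kept, one per `interval`."""
    return duration_seconds(interval) * retain / 86400


print(retention_window_days("8h", 30))   # 10.0 days of history
print(2621440000 / 1024 ** 2)            # 2500.0 MiB of local space
```

With these settings the oldest retained snapshot is about ten days old, so a longer recovery horizon needs a larger retain value or a longer interval.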
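- Manual full snapshot
# Set the environment variables used by the vault CLI. Use the node's domain name, not its IP, because the certificate is issued for the domain name
echo "${node_ip} ${node_name}.vault.domain" >> /etc/hosts
export VAULT_ADDR=https://${node_name}.vault.domain:8200
export VAULT_CACERT=/data0/vault/ssl/ca.pub # points to the tls_client_ca_file in your configuration file
# Export the snapshot
vault operator raft snapshot save ${file_name}.snapshot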
- Start the service on the other (non-leader) nodes
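- LB configuration
An NLB is required for load balancing, with layer 7 (HTTPS) health checks against the Vault health endpoint. For the LB health check, see: https://developer.hashicorp.com/vault/api-docs/system/health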
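When configuring the health check, it helps to know what the status codes from `/v1/sys/health` mean. The sketch below lists the default codes as I understand them from the health API page linked above; confirm them there for your Vault version. Which codes should receive traffic is a policy choice, and `should_route` is a hypothetical helper showing one option.

```python
# Default status codes of /v1/sys/health (verify against the API docs).
HEALTH_CODES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Return a human-readable meaning for a health check status code."""
    return HEALTH_CODES.get(status_code, f"unexpected status {status_code}")


def should_route(status_code: int, include_perf_standby: bool = False) -> bool:
    """One possible LB policy: route to the active node, optionally to
    performance standbys as well."""
    if status_code == 200:
        return True
    return include_perf_standby and status_code == 473


print(describe_health(429))
print(should_route(473, include_perf_standby=True))
```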
----------------------------------end--------------------------------------------------------------
- Configuration and operations reference
- AWS auto-unseal configuration:
seal "awskms" {
  region     = "us-east-1"
  access_key = "xxxxxxx"
  secret_key = "xxxxxxxxx"
  kms_key_id = "xxxxxxxxxx"
  endpoint   = "https://xxxxxxx.amazonaws.com"
}
- Alibaba Cloud auto-unseal configuration:
seal "alicloudkms" {
  region     = "us-east-1"
  access_key = "xxxxxxx"
  secret_key = "xxxxxxxxxxxx"
  kms_key_id = "xxxxxxxxxx"
  domain     = "kms-vpc-address"
}
- Monitoring configuration:
# With unauthenticated_metrics_access = true, test fetching the metrics
curl -k https://${ip}:8200/v1/sys/metrics
# Without unauthenticated_metrics_access = true, test fetching the metrics; see below for how to create the token
curl -k --header "X-Vault-Token: ${token}" https://${ip}:8200/v1/sys/metrics
# Check the current token
vault token lookup | grep policies
# Create the Prometheus policy
vault policy write prometheus-metrics - << EOF
path "/sys/metrics" {
  capabilities = ["read", "list"]
}
EOF
# Create the token used by Prometheus
vault token create \
  -field=token \
  -policy prometheus-metrics -display-name prometheus-metrics \
  > /data0/vault/config/prometheus-token
# Prometheus configuration (scheme is https because the Vault listener has TLS enabled)
cat > /data0/prometheus/vault-prometheus.yml << EOF
scrape_configs:
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: https
    tls_config:
      insecure_skip_verify: true # or configure a CA
    authorization:
      credentials: "<Vault token>"
      # credentials_file: /etc/prometheus/prometheus-token
    static_configs:
      - targets: ['10.0.0.6:8200']
EOF
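The same metrics request can be built from Python, which is handy for a smoke test outside Prometheus. This is a minimal sketch using only the standard library; `build_metrics_request` is a hypothetical helper, and the address and token are placeholders.

```python
import urllib.request
from typing import Optional


def build_metrics_request(addr: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for Vault metrics in Prometheus format.

    The token is optional: it is not needed when the listener sets
    unauthenticated_metrics_access = true.
    """
    url = f"{addr.rstrip('/')}/v1/sys/metrics?format=prometheus"
    headers = {"X-Vault-Token": token} if token else {}
    return urllib.request.Request(url, headers=headers)


req = build_metrics_request("https://10.0.0.6:8200", token="example-token")
print(req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` together with an `ssl.create_default_context(cafile=...)` context that points at the CA file, so certificate verification stays on instead of relying on the equivalent of `curl -k`.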
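- Check the autopilot state
vault operator raft autopilot state
# Manually add a node to the cluster; run this on the node that is joining
# (use https here when TLS is enabled on the listener, as in the configuration above)
export VAULT_ADDR="https://${joining_node_ip}:8200"
vault operator raft join https://${leader_node_ip}:8200
- Change a node to a voter or non-voter
# Promote a node to voter: not available
# vault operator raft promote <node_id>
# Demote a node to non-voter: not available
# vault operator raft demote <node_id>
# Instead, remove the node and let it rejoin
# (set or unset retry_join_as_non_voter in config.hcl before starting it again)
# On the leader node, remove the node
vault operator raft remove-peer ${node_id}
# On the node being removed: stop the service, delete the data, start the service
systemctl stop vault
rm -rf /data0/vault/data
mkdir /data0/vault/data && chown op:op /data0/vault/data
systemctl start vault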
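- Scaling out
1. Install Vault on the new node
Make sure the new node matches the existing nodes (Vault version, OS, TLS support).
Set up the Vault binary and data directory, for example /usr/bin/vault and /data0/vault/data.
2. Configure Vault (config.hcl), as in the deployment above
The new node needs a configuration file, for example /data0/vault/config/config.hcl, plus the certificates. Its node_id must be unique.
3. Start the new node
Note: the new node does not need to be initialized or unsealed; it syncs its state from the existing cluster.
systemctl start vault
- Scaling in
1. To remove a node, first run the following on the leader:
vault operator raft remove-peer <node_id>
Then stop the service on the removed node:
systemctl stop vault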
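Before removing a voter it is worth checking what the change does to quorum. Raft needs a majority of voters to elect a leader, so the arithmetic is simple; the sketch below shows it. The function names are hypothetical.

```python
def quorum_size(voters: int) -> int:
    """Number of voters that must be reachable to form a majority."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """Number of voters that can fail while the cluster keeps quorum."""
    return voters - quorum_size(voters)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), failure_tolerance(n))
```

With the three voters in this layout, quorum is two and one voter can be lost. Dropping to two voters keeps quorum at two, so no failure is tolerated at all.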
- Recovering the cluster when too few machines can be restored to complete an election
- Inside the storage directory (/data0/vault/data) there is a folder named raft:
vault
└── data
    ├── raft
    │   ├── raft.db
    │   └── snapshots
    └── vault.db
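For the single remaining Vault server to reach quorum and elect itself leader, create a raft/peers.json file containing the server information. The file is a JSON array listing the server ID, address:port, and voter status of each healthy Vault server (for example, test-node1). The id must match the node_id configured in that server's storage "raft" block.
cat > /data0/vault/data/raft/peers.json << EOF
[
  {
    "id": "vault_1",
    "address": "10.0.0.6:8201",
    "non_voter": false
  }
]
EOF
- id (string: <required>) - Specifies the server ID of the server.
- address (string: <required>) - Specifies the host and port of the server. The port is the server's cluster port.
- non_voter (bool: <false>) - Controls whether the server is a non-voter.
- Restart the Vault process so that Vault loads the new peers.json file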
sudo systemctl restart vault
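- Unseal. If auto-unseal is not configured, unseal Vault and then check the status. Auto-unseal is configured in this deployment, so a manual unseal should not be needed.
vault operator unseal
- Check the status
vault operator raft list-peers
- Once the other nodes are recovered, rejoin them to the cluster, or scale out with new nodes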
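When several nodes survive, writing peers.json by hand is error-prone. The recovery file described above can be generated instead; this is a minimal sketch in which `build_peers_json` is a hypothetical helper, and the node ID and address are the example values from this section.

```python
import json


def build_peers_json(nodes) -> str:
    """Render peers.json content.

    `nodes` is an iterable of (node_id, cluster_address, non_voter) tuples,
    where cluster_address is host:port using the cluster port.
    """
    entries = []
    for node_id, address, non_voter in nodes:
        host, sep, port = address.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"address must be host:port, got {address!r}")
        entries.append(
            {"id": node_id, "address": address, "non_voter": bool(non_voter)}
        )
    return json.dumps(entries, indent=2)


content = build_peers_json([("vault_1", "10.0.0.6:8201", False)])
print(content)
```

The output would be written to /data0/vault/data/raft/peers.json on the surviving node before Vault is restarted.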
Scripts
Generate the SSL certificates
#!/usr/bin/env bash
set -euo pipefail

# ========= Configurable parameters =========
DAYS_CA=10950 # Root certificate validity (30 years)
DAYS_CERT=10950
CA_KEY="ca.key"
CA_CRT="ca.pub"
CA_SUBJECT="/C=CN/ST=GLOBAL/L=GLOBAL/O=MyOrg/OU=IT/CN=Vault-Root-CA"
VAULT_KEY="vault-private.key"
VAULT_CSR="vault.csr"
VAULT_CRT="vault-public.pub"
# Note: this is the same DN as CA_SUBJECT. Some TLS clients may treat a
# certificate whose subject equals its issuer as self-signed, so a distinct CN
# for the server certificate may be safer.
VAULT_SUBJECT="/C=CN/ST=GLOBAL/L=GLOBAL/O=MyOrg/OU=IT/CN=Vault-Root-CA"
CONFIG="openssl-san.cnf"
# Fill in your domain name and IPs (optional)
DNS_NAME="xxx-vault.com"
#IP_LIST=("10.0.0.5" "10.0.0.6")

# ========= Generate the OpenSSL SAN configuration =========
cat > "$CONFIG" <<EOF
[ req ]
default_bits = 2048
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[ req_distinguished_name ]
CN = ${DNS_NAME}

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = ${DNS_NAME}
DNS.2 = xxxx.net
DNS.3 = yyyy.com
DNS.4 = *.${DNS_NAME}
DNS.5 = *.xxxx.net
DNS.6 = *.yyyy.com
EOF

# Append multiple IPs dynamically
#i=1
#for ip in "${IP_LIST[@]}"; do
#  echo "IP.${i} = ${ip}" >> "$CONFIG"
#  ((i++))
#done

# ========= Generate the CA root certificate =========
echo "==> Generating the CA root certificate"
openssl genrsa -out "${CA_KEY}" 4096
openssl req -x509 -new -nodes -key "${CA_KEY}" -sha256 -days "$DAYS_CA" -subj "${CA_SUBJECT}" -out "${CA_CRT}"

# ========= Generate the Vault server private key and CSR =========
echo "==> Generating the Vault server private key and CSR"
openssl genrsa -out "${VAULT_KEY}" 2048
openssl req -new -key "${VAULT_KEY}" -out "${VAULT_CSR}" -subj "${VAULT_SUBJECT}" -config "${CONFIG}"

# ========= Sign the Vault server certificate with the CA =========
echo "==> Signing the Vault server certificate with the CA"
openssl x509 -req -in "${VAULT_CSR}" -CA "${CA_CRT}" -CAkey "${CA_KEY}" -CAcreateserial \
  -out "${VAULT_CRT}" -days "$DAYS_CERT" -sha256 -extensions req_ext -extfile "${CONFIG}"

mkdir -p /data0/vault/config /data0/vault/ssl /data0/vault/data /data0/vault/log
chown -R op:op /data0/vault
cp ca.pub vault-private.key vault-public.pub /data0/vault/ssl/
chown op:op /data0/vault/ssl/*
# cp ${CA_CRT} ${VAULT_CRT} ${VAULT_KEY} /data0/vault/ssl

echo "✅ Done! Files:"
echo " - CA root certificate: ${CA_CRT}"
echo " - Vault certificate: ${VAULT_CRT}"
echo " - Vault private key: ${VAULT_KEY}"
echo ""
echo "📌 Tip: distribute ${CA_CRT} to the trust store of every Vault node (or set the VAULT_CACERT environment variable),"
echo "   and configure ${VAULT_CRT} and ${VAULT_KEY} as the Vault server tls_cert_file / tls_key_file."
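The commented-out IP_LIST loop in the script appends IP entries to the alt_names section. For comparison, the sketch below generates the same section in Python and validates each IP first. `build_alt_names` is a hypothetical helper, not part of the script, and the names and addresses are the placeholders used above.

```python
import ipaddress


def build_alt_names(dns_names, ip_list=()) -> str:
    """Render an OpenSSL [ alt_names ] section with DNS and IP entries."""
    lines = ["[ alt_names ]"]
    lines += [f"DNS.{i} = {name}" for i, name in enumerate(dns_names, start=1)]
    for i, ip in enumerate(ip_list, start=1):
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        lines.append(f"IP.{i} = {ip}")
    return "\n".join(lines) + "\n"


section = build_alt_names(
    ["xxx-vault.com", "*.xxx-vault.com"], ["10.0.0.5", "10.0.0.6"]
)
print(section)
```

Adding the node IPs as SANs would allow clients to connect by IP, which the procedure above otherwise rules out because the certificate only covers domain names.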
Deployment script
#!/usr/bin/env python3
# deploy_cluster_vault.py
# One-step deployment of a Vault node in Python: directory creation, permissions,
# Vault binary download, config file generation, systemd unit file generation
import argparse  # only needed if the commented-out parser in main() is re-enabled
import os
import socket
import subprocess

CLUSTER_NAME = "my-test"
# Only one voter is allowed per zone. If there are not enough machines for 3 zones,
# machines in the same zone may be given different zone values so that there are 3 voters
CLUSTER_IPS = {
    "10.0.0.1": {"zone": "zone-a", "voter": True, "node_id": "my-test-node1", "leader": True},
    "10.0.0.2": {"zone": "zone-c", "voter": True, "node_id": "my-test-node2", "leader": False},
    "10.0.0.3": {"zone": "zone-b", "voter": True, "node_id": "my-test-node3", "leader": False},
    "10.0.0.4": {"zone": "zone-a", "voter": False, "node_id": "my-test-node4", "leader": False},
}
TLS_SERVERNAME = "vault.domain"
VAULT_VERSION = "1.20.0+ent"
VAULT_ZIP = f"vault_{VAULT_VERSION}_linux_amd64.zip"
VAULT_URL = f"https://releases.hashicorp.com/vault/{VAULT_VERSION}/{VAULT_ZIP}"
VAULT_BIN = "/usr/bin/vault"
VAULT_USER = "op"
VAULT_GROUP = "op"
CFG_FILE = "/data0/vault/config/config.hcl"


def run_cmd(cmd, check=True):
    print(f"▶️ Running: {' '.join(cmd)}")
    subprocess.run(cmd, check=check)


def get_ip():
    """Return the primary IP of this machine."""
    try:
        result = subprocess.run(
            ["ip", "-4", "addr", "show"],
            capture_output=True, text=True, check=True
        )
        for line in result.stdout.splitlines():
            line = line.strip()
            if line.startswith("inet ") and not line.startswith("inet 127"):
                return line.split()[1].split("/")[0]
    except Exception:
        return socket.gethostbyname(socket.gethostname())
    raise RuntimeError("Could not determine the local IP")


def prepare_dirs():
    """Create the Vault directories and set permissions."""
    print("📁 Creating Vault directories...")
    run_cmd(["mkdir", "-p", "/data0/vault/config", "/data0/vault/ssl", "/data0/vault/data", "/data0/vault/log"])
    run_cmd(["touch", "/data0/vault/config/vault.env"])
    run_cmd(["chown", "-R", f"{VAULT_USER}:{VAULT_GROUP}", "/data0/vault"])


def audit_log_rotate():
    # Keep 30 days of logs
    rotate_days = 30
    config_file = "/etc/logrotate.d/vault-audit"
    config_content = f"""/data0/vault/log/audit.log {{
    daily
    rotate {rotate_days}
    compress
    missingok
    notifempty
    copytruncate
    create 0640 {VAULT_USER} {VAULT_GROUP}
}}
"""
    try:
        with open(config_file, "w") as f:
            f.write(config_content)
        print(f"✅ logrotate config written: {config_file}")
    except PermissionError:
        print(f"❌ No permission to write {config_file}; run the script with sudo")


def install_vault():
    """Download and install the Vault binary."""
    if not os.path.exists(VAULT_BIN):
        print(f"⬇️ Downloading Vault {VAULT_VERSION} ...")
        run_cmd(["curl", "-O", VAULT_URL])
        run_cmd(["unzip", "-o", VAULT_ZIP])
        run_cmd(["mv", "vault", VAULT_BIN])
        run_cmd(["chown", f"{VAULT_USER}:{VAULT_GROUP}", VAULT_BIN])
    else:
        print(f"✅ {VAULT_BIN} already exists, skipping download")


def update_hosts_file(ip):
    """
    Write the current node's IP and FQDN to /etc/hosts.
    FQDN = node_id.tls_domain
    """
    hosts_file = "/etc/hosts"
    node_id = CLUSTER_IPS[ip]["node_id"]
    fqdn = f"{node_id}.{TLS_SERVERNAME}"
    line_to_add = f"{ip}\t{fqdn}\n"
    try:
        # Read the current content
        with open(hosts_file, "r") as f:
            lines = f.readlines()
        # Drop any old entry for the same FQDN
        new_lines = [l for l in lines if fqdn not in l]
        new_lines.append(line_to_add)
        # Write back to /etc/hosts
        with open(hosts_file, "w") as f:
            f.writelines(new_lines)
        print(f"✅ Updated /etc/hosts: {ip} {fqdn}")
    except PermissionError:
        print("❌ No permission to write /etc/hosts; run the script with sudo")


def generate_retry_join(hosts, tls_servername):
    block = ""
    for h in hosts:
        block += f"""  retry_join {{
    leader_api_addr = "https://{h}:8200"
    leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername = "{tls_servername}"
  }}
"""
    return block.rstrip("\n")


def generate_systemd_service():
    """Generate the Vault systemd unit file."""
    service_path = "/etc/systemd/system/vault.service"
    print(f"📝 Generating systemd unit file: {service_path}")
    unit_content = f"""[Unit]
Description="HashiCorp Vault - A tool for managing secrets"
Documentation=https://developer.hashicorp.com/vault/docs
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/data0/vault/config/config.hcl
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Type=notify
EnvironmentFile=/data0/vault/config/vault.env
User={VAULT_USER}
Group={VAULT_GROUP}
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart={VAULT_BIN} server -config=/data0/vault/config/config.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity
LimitCORE=0

[Install]
WantedBy=multi-user.target
"""
    with open(service_path, "w") as f:
        f.write(unit_content)
    print("✅ systemd unit file generated")


def main():
    # parser = argparse.ArgumentParser(description="Generate the Vault config.hcl and install Vault")
    # parser.add_argument("-c", "--cluster", required=True)
    # parser.add_argument("-n", "--node-id", required=True)
    # parser.add_argument("-z", "--zone", required=True)
    # parser.add_argument("-t", "--type", required=True, choices=["voter", "non-voter"])
    # parser.add_argument("-r", "--retry-join", required=True)
    # args = parser.parse_args()
    ip = get_ip()
    if ip not in CLUSTER_IPS:
        raise RuntimeError("The local IP is not in the cluster list")
    prepare_dirs()
    install_vault()
    # hosts = [h.strip() for h in args.retry_join.split(",") if h.strip()]
    hosts = list(CLUSTER_IPS.keys())
    join_block = generate_retry_join(hosts, TLS_SERVERNAME)

    # Generate config.hcl
    print(f"📝 Generating Vault config file: {CFG_FILE}")
    config = f"""# Generated by deploy_cluster_vault.py
api_addr = "https://{ip}:8200"
cluster_addr = "https://{ip}:8201"
ui = true
disable_mlock = true
cluster_name = "{CLUSTER_NAME}"
log_level = "debug"
log_file = "/data0/vault/log/vault.log"
log_rotate_duration = "24h"
log_rotate_bytes = 10485760
log_rotate_max_files = 10
license_path = "/data0/vault/ssl/vault.hclic"
"""
    config += f"""
listener "tcp" {{
  address = "{ip}:8200"
  cluster_address = "{ip}:8201"
  # tls_disable = true
  tls_client_ca_file = "/data0/vault/ssl/ca.pub"
  tls_cert_file = "/data0/vault/ssl/vault-public.pub"
  tls_key_file = "/data0/vault/ssl/vault-private.key"
  tls_min_version = "tls12"
  telemetry {{
    unauthenticated_metrics_access = true
  }}
}}

storage "raft" {{
  path = "/data0/vault/data"
  node_id = "{CLUSTER_IPS[ip]['node_id']}"
  autopilot_redundancy_zone = "{CLUSTER_IPS[ip]['zone']}"
"""
    if not CLUSTER_IPS[ip]["voter"]:
        config += "  retry_join_as_non_voter = true\n"
    if CLUSTER_IPS[ip]["leader"]:
        # The initial leader does not need retry_join blocks
        config += "}\n"
    else:
        config += f"{join_block}\n}}\n"
    config += """
profiling {
  unauthenticated_pprof_access = true
  unauthenticated_in_flight_request_access = true
}

adaptive_overload_protection {
  disable_write_controller = false
}

seal "awskms" {
  region = "ap-east-1"
  access_key = "xxxxxxxxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxxxxxxxxx"
  kms_key_id = "xxxxxxxxxxxxxxxx"
  endpoint = "https://kms.ap-east-1.amazonaws.com"
}

telemetry {
  disable_hostname = true
  prometheus_retention_time = "6h"
  unauthenticated_pprof_access = true
}
"""
    with open(CFG_FILE, "w") as f:
        f.write(config)
    run_cmd(["chown", f"{VAULT_USER}:{VAULT_GROUP}", CFG_FILE])
    generate_systemd_service()
    audit_log_rotate()
    update_hosts_file(ip)
    print(f"✅ Vault config generated: {CFG_FILE}")
    print(f" - node_id: {CLUSTER_IPS[ip]['node_id']}")
    print(f" - ip: {ip}")
    print(f" - zone: {CLUSTER_IPS[ip]['zone']}")
    # print(f" - is_voter: {CLUSTER_IPS[ip]['voter']}")
    print(f" - retry_join hosts: {hosts}")
    print("\nℹ️ Run the following command to enable the Vault service:")
    print("sudo systemctl daemon-reload && sudo systemctl enable vault && sudo systemctl start vault")


if __name__ == "__main__":
    main()
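Since CLUSTER_IPS is edited by hand, a quick check before deploying can catch layout mistakes such as two voters in one zone or a missing initial leader. The sketch below validates a dictionary of the same shape. `validate_layout` is a hypothetical helper and not part of the script, and the rules it enforces are the ones stated in this document.

```python
from collections import Counter


def validate_layout(cluster_ips: dict) -> list:
    """Return a list of problems found in a CLUSTER_IPS-style dictionary."""
    problems = []
    # Rule: at most one voter per zone.
    voters_per_zone = Counter(
        node["zone"] for node in cluster_ips.values() if node["voter"]
    )
    for zone, count in sorted(voters_per_zone.items()):
        if count > 1:
            problems.append(f"zone {zone} has {count} voters")
    # Rule: exactly one node is initialized first as the leader.
    leaders = [ip for ip, node in cluster_ips.items() if node["leader"]]
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 initial leader, found {len(leaders)}")
    # Rule: node_id must be unique within the cluster.
    ids = Counter(node["node_id"] for node in cluster_ips.values())
    for node_id, count in sorted(ids.items()):
        if count > 1:
            problems.append(f"node_id {node_id} is used {count} times")
    return problems


layout = {
    "10.0.0.1": {"zone": "zone-a", "voter": True, "node_id": "my-test-node1", "leader": True},
    "10.0.0.2": {"zone": "zone-c", "voter": True, "node_id": "my-test-node2", "leader": False},
    "10.0.0.3": {"zone": "zone-b", "voter": True, "node_id": "my-test-node3", "leader": False},
    "10.0.0.4": {"zone": "zone-a", "voter": False, "node_id": "my-test-node4", "leader": False},
}
print(validate_layout(layout))
```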
Automated snapshot configuration
# aws-snapshot.json
{
  "storage_type": "aws-s3",
  "file_prefix": "paris",
  "interval": "8h",
  "retain": 30,
  "local_max_space": 2621440000,
  "path_prefix": "primary",
  "aws_s3_bucket": "my-backup-vault-test",
  "aws_s3_region": "ap-east-1",
  "aws_access_key_id": "xxxxxxxxxxxxx",
  "aws_secret_access_key": "xxxxxxxxxxxxx",
  "aws_s3_server_side_encryption": "true"
}
# Start automated snapshots
vault write sys/storage/raft/snapshot-auto/config/paris-primary @aws-snapshot.json
# Check the status
vault read sys/storage/raft/snapshot-auto/config/paris-primary
# Or
vault list sys/storage/raft/snapshot-auto/config
Appendix
Deployment guide: https://developer.hashicorp.com/validated-designs/vault-solution-design-guides-vault-enterprise/deploying-vault-private-datacenter
Alibaba Cloud KMS unseal: https://developer.hashicorp.com/vault/docs/configuration/seal/alicloudkms
AWS KMS unseal: https://developer.hashicorp.com/vault/docs/configuration/seal/awskms
Vault initialization: https://developer.hashicorp.com/vault/tutorials/auto-unseal/autounseal-aws-kms#step-2-test-the-auto-unseal-feature
Snapshot management: https://developer.hashicorp.com/vault/docs/commands/operator/raft#snapshot
Storage backend configuration: https://developer.hashicorp.com/vault/docs/configuration/storage
Automated snapshot configuration: https://developer.hashicorp.com/vault/docs/sysadmin/snapshots/automate
Backends supported by automated snapshots: https://developer.hashicorp.com/vault/api-docs/system/storage/raftautosnapshots#storage_type
CLI commands: https://developer.hashicorp.com/vault/docs/commands/server#_log_format
Monitoring: https://developer.hashicorp.com/vault/tutorials/archive/monitor-telemetry-grafana-prometheus
Cluster recovery: https://developer.hashicorp.com/vault/tutorials/raft/raft-lost-quorum
Integrated storage usage and troubleshooting: https://developer.hashicorp.com/vault/tutorials/raft/raft-storage