Vault Deployment

  1. Machine layout

DNS Name        IP Address   Node Type        Location
vault-1.domain  10.0.0.1     voter            zone1
vault-2.domain  10.0.0.2     voter            zone2
vault-3.domain  10.0.0.3     voter            zone3
vault-4.domain  10.0.0.4     non-voter        zone1
vault-5.domain  10.0.0.5     non-voter        zone2
vault-6.domain  10.0.0.6     non-voter        zone3
vault.domain    10.0.0.100   (load balancer)  (all zones)
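  • An AWS access key / secret key (AK/SK) with read/write permissions on KMS and S3 is required (permissions can be scoped to a specific bucket and KMS key)
  • When using the scripts, edit CLUSTER_NAME, CLUSTER_IPS, and the awskms settings in the script
  • A leader node must be initialized first; only then can the other nodes join via retry_join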
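The layout rule behind the table (one voter per zone, three voters in total) can be checked mechanically. The following is a minimal sketch, not part of the original deployment; `NODES`, `check_layout`, and `quorum_size` are hypothetical names, and the node data is copied from the table above.

```python
from collections import Counter

# Node layout copied from the table above: (dns_name, node_type, zone).
NODES = [
    ("vault-1.domain", "voter", "zone1"),
    ("vault-2.domain", "voter", "zone2"),
    ("vault-3.domain", "voter", "zone3"),
    ("vault-4.domain", "non-voter", "zone1"),
    ("vault-5.domain", "non-voter", "zone2"),
    ("vault-6.domain", "non-voter", "zone3"),
]


def voters_per_zone(nodes):
    """Count the voter nodes in each zone."""
    return Counter(zone for _, node_type, zone in nodes if node_type == "voter")


def check_layout(nodes):
    """Return a list of problems; an empty list means one voter per zone."""
    problems = []
    counts = voters_per_zone(nodes)
    for zone in sorted({zone for _, _, zone in nodes}):
        found = counts.get(zone, 0)
        if found != 1:
            problems.append(f"{zone}: expected 1 voter, found {found}")
    return problems


def quorum_size(voter_count):
    """Raft needs a strict majority of voters to elect a leader."""
    return voter_count // 2 + 1


voters = sum(voters_per_zone(NODES).values())
print(check_layout(NODES))   # [] for the table above
print(quorum_size(voters))   # 2 of the 3 voters must be reachable
```

With three voters the cluster tolerates the loss of one voter; losing two requires the manual recovery procedure.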
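  1. Self-signed certificates

    1. Self-sign a CA certificate valid for 30 years (ca.key, ca.crt; the script later in this document names the CA certificate ca.pub)
    2. Generate the node certificate, valid for 30 years
    3. The certificate (vault-public.pub).
    4. The certificate's private key (vault-private.key).
    5. The CA bundle file of the certificate authority that issued the certificate (ca.pub).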
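  2. Deploy the binary and systemd service

  Deployment (the service runs as the op user)
# useradd --system --user-group --shell /bin/false vault
# Vault is deployed under /data0/vault: ssl holds the certificates and license, config holds the configuration, data holds the stored data
mkdir -p /data0/vault/config /data0/vault/ssl /data0/vault/data /data0/vault/log
# The unit file below references vault.env via EnvironmentFile; create it (it may be empty) so the service can start
touch /data0/vault/config/vault.env
chown -R op:op /data0/vault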
curl -O https://releases.hashicorp.com/vault/1.20.0+ent/vault_1.20.0+ent_linux_amd64.zip
unzip vault_1.20.0+ent_linux_amd64.zip
mv vault /usr/bin/
chown op:op /usr/bin/vault

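# Deploy the certificates to /data0/vault/ssl
System service
# The quoted 'EOF' keeps the shell from expanding $MAINPID; > overwrites instead of appending on reruns
cat << 'EOF' > /etc/systemd/system/vault.service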
[Unit]
Description="HashiCorp Vault - A tool for managing secrets"
Documentation=https://developer.hashicorp.com/vault/docs
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/data0/vault/config/config.hcl
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Type=notify
EnvironmentFile=/data0/vault/config/vault.env
User=op
Group=op
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/bin/vault server -config=/data0/vault/config/config.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity
LimitCORE=0

[Install]
WantedBy=multi-user.target
EOF

# systemctl daemon-reload
# systemctl enable vault
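  1. Node configuration

    • Voter node (non-leader). When the cluster is first built, the leader node does not need the retry_join blocks. Non-voter nodes also use this configuration. Only one node per zone becomes a voter.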
    # config.hcl for vault1 (voter node example)
    api_addr = "https://10.0.0.1:8200"
    cluster_addr = "https://10.0.0.1:8201"
    ui = true
    disable_mlock = true
    cluster_name = "my-test"
    # log level: trace, debug, info, warn, error
    log_level = "debug"
    log_file = "/data0/vault/log/vault.log"
    log_rotate_duration = "24h"
    # 10 MiB
    log_rotate_bytes = 10485760
    log_rotate_max_files = 10
    
    license_path = "/data0/vault/ssl/vault.hclic"
    
    listener "tcp" {
      address     = "10.0.0.1:8200"
      cluster_address = "10.0.0.1:8201"
      # tls_disable = true
      tls_client_ca_file = "/data0/vault/ssl/ca.pub"
      tls_cert_file = "/data0/vault/ssl/vault-public.pub"
      tls_key_file  = "/data0/vault/ssl/vault-private.key"
      tls_min_version = "tls12"
      telemetry {
        unauthenticated_metrics_access = true
      }
    }
    
    storage "raft" {
      path    = "/data0/vault/data"
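      # Must be unique within the cluster
      node_id = "vault1"
      # Only one voter is allowed per zone
      autopilot_redundancy_zone = "zone1"
      # Join as a non-voter. This only takes effect the first time the node starts; afterwards the information is stored in the Raft cluster state (written to the Raft log and replicated to all nodes), not in the config file
      # retry_join_as_non_voter = true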
    
      retry_join {
        leader_api_addr     = "https://10.0.0.1:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    
      retry_join {
        leader_api_addr     = "https://10.0.0.2:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    
      retry_join {
        leader_api_addr     = "https://10.0.0.3:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    
      retry_join {
        leader_api_addr     = "https://10.0.0.4:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    
      retry_join {
        leader_api_addr     = "https://10.0.0.5:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    
      retry_join {
        leader_api_addr     = "https://10.0.0.6:8200"
        leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
        leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
        leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
        leader_tls_servername = "vault.domain"
      }
    }
    
    # Settings for the test/validation phase; change to false in production
    profiling {
      unauthenticated_pprof_access = true
      unauthenticated_in_flight_request_access = true
    }
    
    adaptive_overload_protection { 
      disable_write_controller = false 
    }
    
    seal "awskms" {
      region     = "us-east-1"
      # Replace with the correct AK/SK
      access_key = "xxxxxxxxxxx"
      secret_key = "xxxxxxxxxxxx"
      kms_key_id = "xxxxxxxxxxxx"
      endpoint   = "https://xxxxxxxx.kms.us-east-1.vpce.amazonaws.com"
    }
    
    telemetry {
      disable_hostname = true
      prometheus_retention_time = "6h"
      unauthenticated_pprof_access = true
    }
  1. Initialization (run on the leader node)

      Start Vault and initialize it
    # Start Vault
    systemctl start vault
    # Set the environment variables used by the vault CLI. Use the node's domain name, not its IP, because the certificate is issued for domain names
    echo "${node_ip} ${node_name}.vault.domain" >> /etc/hosts
    export VAULT_ADDR=https://${node_name}.vault.domain:8200
    export VAULT_CACERT=/data0/vault/ssl/ca.pub   # the same CA file as tls_client_ca_file in the config file
    
    # Initialize. This generates a root token; record it
    vault operator init
    
    # Unseal
    #vault operator unseal
    # Check status
    vault status
    # Restart Vault
    # sudo systemctl restart vault
    
    # Log in with the root token to run the remaining steps
    vault login ${root_token}
    # Enable the audit log
    vault audit enable file file_path=/data0/vault/log/audit.log
    # Check the audit log status
    vault audit list

     Check Vault status

    # Check Vault status
    vault status
    
    # List the cluster nodes
    vault operator raft list-peers
    
    # Check each node's voter status
    vault operator raft autopilot state
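The peer list from the status commands above can also be summarized in a script. This is a rough sketch, assuming the output of `vault operator raft list-peers -format=json` follows the shape of the Raft configuration API response (`data.config.servers`, each entry with `node_id`, `leader`, and `voter`). `summarize_peers` and the sample payload are hypothetical and not taken from a real cluster.

```python
import json


def summarize_peers(list_peers_json):
    """Summarize leader, voters, and non-voters from list-peers JSON output."""
    payload = json.loads(list_peers_json)
    servers = payload["data"]["config"]["servers"]
    leader = next((s["node_id"] for s in servers if s.get("leader")), None)
    voters = sorted(s["node_id"] for s in servers if s.get("voter"))
    non_voters = sorted(s["node_id"] for s in servers if not s.get("voter"))
    return {"leader": leader, "voters": voters, "non_voters": non_voters}


# Made-up sample payload for illustration only.
SAMPLE = json.dumps({
    "data": {
        "config": {
            "servers": [
                {"node_id": "vault1", "address": "10.0.0.1:8201", "leader": True, "voter": True},
                {"node_id": "vault2", "address": "10.0.0.2:8201", "leader": False, "voter": True},
                {"node_id": "vault4", "address": "10.0.0.4:8201", "leader": False, "voter": False},
            ]
        }
    }
})

print(summarize_peers(SAMPLE))
```

In practice the JSON would come from running the CLI command and capturing its standard output.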
  2. Automated snapshot configuration

    • Automated snapshots to AWS
    # aws-snapshot.json
    # "aws_session_token": "IQoJb3JpZ2luX2IQ...COFFEE" is usually not needed. JSON does not allow comments, so add it as a regular key only if required
    {
      "storage_type": "aws-s3",
      "file_prefix": "paris",
      "interval": "8h",
      "retain": 30,
      "local_max_space": 2621440000,
      "path_prefix": "primary",
      "aws_s3_bucket": "vault-snapshots",
      "aws_s3_region": "eu-west-3",
      "aws_access_key_id": "ASI...COFFEE",
      "aws_secret_access_key": "wJalr...COFFEEKEY",
      "aws_s3_server_side_encryption": "true"
    }
    
    # start auto snapshot
    vault write sys/storage/raft/snapshot-auto/config/paris-primary @aws-snapshot.json
    # Check status
    vault read sys/storage/raft/snapshot-auto/config/paris-primary
    # Or
    vault list sys/storage/raft/snapshot-auto/config
    • Manual full snapshot
    # Set the environment variables used by the vault CLI. Use the node's domain name, not its IP, because the certificate is issued for domain names
    echo "${node_ip} ${node_name}.vault.domain" >> /etc/hosts
    export VAULT_ADDR=https://${node_name}.vault.domain:8200
    export VAULT_CACERT=/data0/vault/ssl/ca.pub   # the same CA file as tls_client_ca_file in the config file
    
    # Export the snapshot
    vault operator raft snapshot save ${file_name}.snapshot
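Because the file passed with `@aws-snapshot.json` has to be valid JSON, generating it from a script avoids stray comments or trailing commas. The sketch below is one way to do that, reusing the keys from the automated snapshot example above; `build_snapshot_config` and `retention_hours` are hypothetical helpers, and the credentials are placeholders.

```python
import json


def build_snapshot_config(bucket, region, access_key, secret_key,
                          interval="8h", retain=30):
    """Build the automated snapshot config using the keys from the example above."""
    return {
        "storage_type": "aws-s3",
        "file_prefix": "paris",
        "interval": interval,
        "retain": retain,
        "local_max_space": 2621440000,
        "path_prefix": "primary",
        "aws_s3_bucket": bucket,
        "aws_s3_region": region,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "aws_s3_server_side_encryption": "true",
    }


def retention_hours(interval, retain):
    """Hours of history kept; this sketch only handles whole-hour intervals such as '8h'."""
    if not interval.endswith("h"):
        raise ValueError("only intervals in hours are handled, e.g. '8h'")
    return int(interval[:-1]) * retain


config = build_snapshot_config("vault-snapshots", "eu-west-3",
                               "<access key>", "<secret key>")
print(json.dumps(config, indent=2))
print(retention_hours(config["interval"], config["retain"]))  # 240 hours, i.e. 10 days
```

With an 8-hour interval and 30 retained snapshots, the oldest snapshot kept is about 10 days old; adjust `retain` if a longer recovery window is needed.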
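  1. Start the service on the other, non-leader nodes

  1. Load balancer configuration

      An NLB is needed for load balancing. Note that an AWS NLB operates at layer 4 rather than layer 7; its target-group health check can still probe the HTTPS health endpoint. For the health check, see:
      https://developer.hashicorp.com/vault/api-docs/system/health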
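The health endpoint reports node state through its HTTP status code, which is what the load balancer health check keys on. The following sketch maps the default codes documented for `/v1/sys/health` to a routing decision; `describe_health` and `should_route` are hypothetical helpers, and whether to route to standby nodes is a design choice assumed here, not something the original prescribes.

```python
# Default status codes of GET /v1/sys/health (without query overrides).
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code):
    """Translate a health status code into a readable state."""
    return HEALTH_STATES.get(status_code, "unknown")


def should_route(status_code, include_standby=False):
    """Decide whether the load balancer should send traffic to this node."""
    if status_code == 200:
        return True
    # Standby nodes forward requests to the active node, so routing to them is optional.
    return include_standby and status_code in (429, 473)


for code in (200, 429, 503):
    print(code, describe_health(code), should_route(code))
```

Routing only to the node that returns 200 keeps all traffic on the active node; setting `include_standby=True` spreads connections across nodes at the cost of an extra forwarding hop.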
----------------------------------end--------------------------------------------------------------
 
 
 
  1. Configuration and operations reference

  • AWS auto-unseal configuration:
seal "awskms" {
  region     = "us-east-1"
  access_key = "xxxxxxx"
  secret_key = "xxxxxxxxx"
  kms_key_id = "xxxxxxxxxx"
  endpoint   = "https://xxxxxxx.amazonaws.com"
}
  • Alibaba Cloud auto-unseal configuration:
seal "alicloudkms" {
  region     = "us-east-1" 
  access_key = "xxxxxxx" 
  secret_key = "xxxxxxxxxxxx" 
  kms_key_id = "xxxxxxxxxx"
  domain = "kms-vpc-address"
}
  • Monitoring configuration:
# With unauthenticated_metrics_access = true set, test fetching metrics
curl -k https://${ip}:8200/v1/sys/metrics
# Without unauthenticated_metrics_access = true, test fetching metrics; token creation is shown below
curl -k --header "X-Vault-Token: ${token}" https://${ip}:8200/v1/sys/metrics

# Inspect the current token
vault token lookup | grep policies
# Create the Prometheus policy
vault policy write prometheus-metrics - << EOF
path "/sys/metrics" {
  capabilities = ["read", "list"]
}
EOF
# Create the token used by Prometheus
vault token create \
  -field=token \
  -policy prometheus-metrics -display-name prometheus-metrics \
  > /data0/vault/config/prometheus-token
# Prometheus configuration
cat > /data0/prometheus/vault-prometheus.yml << EOF
scrape_configs:
  - job_name: vault
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    # the listener in this deployment uses TLS, so scrape over https
    scheme: https
    tls_config:
      insecure_skip_verify: true # or configure a CA
    authorization:
      credentials: "<Vault token>"
      # credentials_file: /etc/prometheus/prometheus-token
    static_configs:
      - targets: ['10.0.0.6:8200']
EOF
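The two curl calls above differ only in the token header. A small sketch of the same request built with the Python standard library follows; `build_metrics_request` is a hypothetical helper, and the `format=prometheus` query parameter mirrors the Prometheus scrape settings above.

```python
import urllib.request
from urllib.parse import urlencode


def build_metrics_request(host, token=None, port=8200):
    """Build a GET request for /v1/sys/metrics in Prometheus format.

    The token is only needed when unauthenticated_metrics_access is not enabled.
    """
    query = urlencode({"format": "prometheus"})
    url = f"https://{host}:{port}/v1/sys/metrics?{query}"
    headers = {"X-Vault-Token": token} if token else {}
    return urllib.request.Request(url, headers=headers)


req = build_metrics_request("10.0.0.6", token="<token>")
print(req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` together with an SSL context that trusts the cluster CA, for example `ssl.create_default_context(cafile="/data0/vault/ssl/ca.pub")`, and use the node's domain name rather than its IP so that certificate verification succeeds.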
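  • Check autopilot state
vault operator raft autopilot state

# Manually add a node to the cluster; run on the node that is joining
# (if TLS is enabled as in this deployment, use https and the node's domain name instead, as in the initialization step)
export VAULT_ADDR="http://${joining_node_ip}:8200"
vault operator raft join http://${leader_ip}:8200
  • Change a node to voter / non-voter
# Change a node to a voter (not usable)
# vault operator raft promote <node_id>
# Change a node to a non-voter (not usable)
# vault operator raft demote <node_id>
# Remove the node; run on the leader
vault operator raft remove-peer ${node_id}
# On the removed node: stop the service, delete the data, start the service
systemctl stop vault
rm -rf /data0/vault/data
mkdir /data0/vault/data && chown op:op /data0/vault/data
systemctl start vault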
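  • Scaling out
1. Install Vault on the new node
Make sure the new node matches the existing nodes (Vault version, OS, TLS support).
Set up the Vault binary and data directory, e.g. /usr/bin/vault and /data0/vault/data.
2. Configure Vault (server.hcl), much like the earlier deployment
The new node needs a configuration file, e.g. /data0/vault/config/config.hcl, plus certificates; node_id must be unique
3. Start the new node
Note: the new node does not need to be initialized or unsealed; it syncs state from the existing cluster
systemctl start vault
  • Scaling in
1. To remove a node, first run on the leader:
vault operator raft remove-peer <node_id>
Then stop the service on the removed node
systemctl stop vault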
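  • How to recover the cluster when too few machines can be restored to complete an election
  1. Inside the storage directory (/data0/vault/data) there is a folder named raft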
vault
└── data
    ├── raft
    │   ├── raft.db
    │   └── snapshots
    └── vault.db
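For the one remaining Vault server to reach quorum and elect itself leader, create a raft/peers.json file containing the server information. The file is a JSON array holding the server ID, address:port, and voting information of each healthy Vault server (e.g. test-node1). The id must match the node_id in that server's raft storage configuration.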
cat > /data0/vault/data/raft/peers.json << EOF
[
  {
    "id": "vault_1",
    "address": "10.0.0.6:8201",
    "non_voter": false
  }
]
EOF
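id (string: <required>) - The server's node ID.
address (string: <required>) - The server's host and port. The port is the server's cluster port.
non_voter (bool: <false>) - Controls whether the server is a non-voter.
  2. Restart the Vault process so that Vault loads the new peers.json file
sudo systemctl restart vault
  3. Unseal. If auto-unseal is not configured, unseal Vault and then check the status. This deployment configures auto-unseal, so manual unsealing should not be needed
vault operator unseal
  4. Check status
vault operator raft list-peers
  5. Once the other nodes recover, rejoin them to the cluster, or scale out new nodes into the cluster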
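The peers.json file from the recovery steps above can be generated instead of typed by hand, which helps avoid JSON syntax mistakes under pressure. This is a minimal sketch; `build_peers` is a hypothetical helper, the survivor tuple reuses the example values from the peers.json above, and port 8201 is the cluster port used throughout this document.

```python
import json


def build_peers(survivors, cluster_port=8201):
    """Build the peers.json array.

    survivors: list of (node_id, ip, non_voter) tuples for the healthy servers.
    """
    return [
        {"id": node_id, "address": f"{ip}:{cluster_port}", "non_voter": non_voter}
        for node_id, ip, non_voter in survivors
    ]


peers = build_peers([("vault_1", "10.0.0.6", False)])
print(json.dumps(peers, indent=2))
```

Write the printed JSON to /data0/vault/data/raft/peers.json on the surviving node before restarting Vault.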
 

Scripts

Generate SSL certificates
#!/usr/bin/env bash
set -euo pipefail

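# ========= Configurable parameters =========
DAYS_CA=10950             # CA root certificate validity (30 years)
DAYS_CERT=10950           # server certificate validity (30 years)
CA_KEY="ca.key"
CA_CRT="ca.pub"
CA_SUBJECT="/C=CN/ST=GLOBAL/L=GLOBAL/O=MyOrg/OU=IT/CN=Vault-Root-CA"

VAULT_KEY="vault-private.key"
VAULT_CSR="vault.csr"
VAULT_CRT="vault-public.pub"
# Use a subject distinct from the CA's (Vault-Server is a placeholder CN): a leaf whose
# subject equals its issuer can be mistaken for a self-signed certificate by some verifiers
VAULT_SUBJECT="/C=CN/ST=GLOBAL/L=GLOBAL/O=MyOrg/OU=IT/CN=Vault-Server"

CONFIG="openssl-san.cnf"

# Your domain name and (optionally) IPs
DNS_NAME="xxx-vault.com"
#IP_LIST=("10.0.0.5" "10.0.0.6")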

# ========= Generate the OpenSSL SAN config =========
cat > $CONFIG <<EOF
[ req ]
default_bits       = 2048
distinguished_name = req_distinguished_name
req_extensions     = req_ext
prompt             = no

[ req_distinguished_name ]
CN = ${DNS_NAME}

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = ${DNS_NAME}
DNS.2 = xxxx.net
DNS.3 = yyyy.com
DNS.4 = *.${DNS_NAME}
DNS.5 = *.xxxx.net
DNS.6 = *.yyyy.com
EOF

# Add multiple IPs dynamically
#i=1
#for ip in "${IP_LIST[@]}"; do
#  echo "IP.${i} = ${ip}" >> $CONFIG
#  ((i++))
#done

# ========= Generate the CA root certificate =========
echo "==> Generating the CA root certificate"
openssl genrsa -out "${CA_KEY}" 4096
openssl req -x509 -new -nodes -key "${CA_KEY}" -sha256 -days "$DAYS_CA" -subj "${CA_SUBJECT}" -out "${CA_CRT}"

# ========= Generate the Vault server private key & CSR =========
echo "==> Generating the Vault server private key & CSR"
openssl genrsa -out "${VAULT_KEY}" 2048
openssl req -new -key "${VAULT_KEY}" -out "${VAULT_CSR}" -subj "${VAULT_SUBJECT}" -config "${CONFIG}"

# ========= Sign the Vault server certificate with the CA =========
echo "==> Signing the Vault server certificate with the CA"
openssl x509 -req -in "${VAULT_CSR}" -CA "${CA_CRT}" -CAkey "${CA_KEY}" -CAcreateserial \
  -out "${VAULT_CRT}" -days "$DAYS_CERT" -sha256 -extensions req_ext -extfile "${CONFIG}"

mkdir -p /data0/vault/config /data0/vault/ssl /data0/vault/data /data0/vault/log
chown -R op:op /data0/vault
cp ca.pub vault-private.key vault-public.pub /data0/vault/ssl/
chown op:op /data0/vault/ssl/*
# cp ${CA_CRT} ${VAULT_CRT} ${VAULT_KEY} /data0/vault/ssl

echo "✅ Done! Files:"
echo "  - CA root certificate: ${CA_CRT}"
echo "  - Vault certificate:   ${VAULT_CRT}"
echo "  - Vault private key:   ${VAULT_KEY}"
echo ""
echo "📌 Note: distribute ${CA_CRT} to the trust store of every Vault node (or set the VAULT_CACERT environment variable),"
echo "         and configure ${VAULT_CRT} and ${VAULT_KEY} as the Vault server's tls_cert_file / tls_key_file."
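The certificate above carries only DNS SANs (the IP list is commented out), which is why the CLI has to address nodes by domain name. The sketch below illustrates the usual one-label wildcard matching rule for DNS SAN entries; it is a simplified illustration rather than a full TLS hostname verifier, and `san_matches`, `covered`, and `SANS` are hypothetical names.

```python
def san_matches(hostname, pattern):
    """Match a hostname against one DNS SAN entry, allowing a leading '*.' wildcard.

    A wildcard covers exactly one label, so '*.example.com' matches
    'a.example.com' but not 'a.b.example.com' or 'example.com'.
    """
    hostname = hostname.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return hostname == pattern


def covered(hostname, sans):
    """Return True if any SAN entry matches the hostname."""
    return any(san_matches(hostname, p) for p in sans)


# DNS SAN entries shaped like the ones generated by the script above.
SANS = ["xxx-vault.com", "*.xxx-vault.com"]

print(covered("node1.xxx-vault.com", SANS))   # True
print(covered("a.b.xxx-vault.com", SANS))     # False: wildcard spans one label only
print(covered("10.0.0.1", SANS))              # False: an IP needs an IP SAN entry
```

If nodes must be reachable by IP, uncomment the IP_LIST section of the script so that IP SAN entries are added to the certificate.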

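Deployment script

#!/usr/bin/env python3
# deploy_cluster_vault.py
# One-step Vault node deployment in Python: creates directories, sets permissions, downloads the Vault binary, and generates the config file and the systemd unit file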

import argparse
import os
import socket
import subprocess

CLUSTER_NAME = "my-test"
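# Only one voter per zone. If there are not enough machines for 3 zones, machines in the same zone may be given different zone values so that there are still 3 voters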
CLUSTER_IPS = {
    "10.0.0.1": {"zone": "zone-a", "voter": True, "node_id": "my-test-node1", "leader": True},
    "10.0.0.2": {"zone": "zone-c", "voter": True, "node_id": "my-test-node2", "leader": False},
    "10.0.0.3": {"zone": "zone-b", "voter": True, "node_id": "my-test-node3", "leader": False},
    "10.0.0.4": {"zone": "zone-a", "voter": False, "node_id": "my-test-node4", "leader": False},
}

TLS_SERVERNAME = "vault.domain"
VAULT_VERSION = "1.20.0+ent"
VAULT_ZIP = f"vault_{VAULT_VERSION}_linux_amd64.zip"
VAULT_URL = f"https://releases.hashicorp.com/vault/{VAULT_VERSION}/{VAULT_ZIP}"
VAULT_BIN = "/usr/bin/vault"
VAULT_USER = "op"
VAULT_GROUP = "op"
CFG_FILE="/data0/vault/config/config.hcl"

def run_cmd(cmd, check=True):
    print(f"▶️  Running: {' '.join(cmd)}")
    subprocess.run(cmd, check=check)

def get_ip():
    """Get this machine's primary IP"""
    try:
        result = subprocess.run(
            ["ip", "-4", "addr", "show"], capture_output=True, text=True, check=True
        )
        for line in result.stdout.splitlines():
            line = line.strip()
            if line.startswith("inet ") and not line.startswith("inet 127"):
                return line.split()[1].split("/")[0]
    except Exception:
        return socket.gethostbyname(socket.gethostname())
    raise RuntimeError("Could not determine the local IP")

def prepare_dirs():
    """Create the Vault directories and set permissions"""
    print("📁 Creating Vault directories...")
    run_cmd(["mkdir", "-p", "/data0/vault/config", "/data0/vault/ssl", "/data0/vault/data", "/data0/vault/log"])
    run_cmd(["touch", "/data0/vault/config/vault.env"])
    run_cmd(["chown", "-R", f"{VAULT_USER}:{VAULT_GROUP}", "/data0/vault"])

def audit_log_rotate():
    # Keep 30 days of logs
    rotate_days = 30
    config_file = "/etc/logrotate.d/vault-audit"
    config_content = f"""/data0/vault/log/audit.log {{
    daily
    rotate {rotate_days}
    compress
    missingok
    notifempty
    copytruncate
    create 0640 {VAULT_USER} {VAULT_GROUP}
}}
"""
    try:
        with open(config_file, "w") as f:
            f.write(config_content)
        print(f"✅ logrotate config written: {config_file}")
    except PermissionError:
        print(f"❌ No permission to write {config_file}; run the script with sudo")

def install_vault():
    """Download and install the Vault binary"""
    if not os.path.exists(VAULT_BIN):
        print(f"⬇️ Downloading Vault {VAULT_VERSION} ...")
        run_cmd(["curl", "-O", VAULT_URL])
        run_cmd(["unzip", "-o", VAULT_ZIP])
        run_cmd(["mv", "vault", VAULT_BIN])
        run_cmd(["chown", f"{VAULT_USER}:{VAULT_GROUP}", VAULT_BIN])
    else:
        print(f"✅ {VAULT_BIN} already exists; skipping download")

def update_hosts_file(ip):
    """
    Write this node's IP and FQDN to /etc/hosts
    FQDN = node_id.tls_domain
    """
    hosts_file = "/etc/hosts"
    node_id = CLUSTER_IPS[ip]["node_id"]
    fqdn = f"{node_id}.{TLS_SERVERNAME}"
    line_to_add = f"{ip}\t{fqdn}\n"

    try:
        # Read the existing content
        with open(hosts_file, "r") as f:
            lines = f.readlines()

        # Drop old entries for the same FQDN
        new_lines = [l for l in lines if fqdn not in l]
        new_lines.append(line_to_add)

        # Write back to /etc/hosts
        with open(hosts_file, "w") as f:
            f.writelines(new_lines)
        print(f"✅ Updated /etc/hosts: {ip} {fqdn}")
    except PermissionError:
        print("❌ No permission to write /etc/hosts; run the script with sudo")

def generate_retry_join(hosts, tls_servername):
    block = ""
    for h in hosts:
        block += f"""
  retry_join {{
    leader_api_addr     = "https://{h}:8200"
    leader_ca_cert_file = "/data0/vault/ssl/ca.pub"
    leader_client_cert_file = "/data0/vault/ssl/vault-public.pub"
    leader_client_key_file  = "/data0/vault/ssl/vault-private.key"
    leader_tls_servername = "{tls_servername}"
  }}
"""
    return block.rstrip("\n")

def generate_systemd_service():
    """Generate the Vault systemd unit file"""
    service_path = "/etc/systemd/system/vault.service"
    print(f"📝 Writing systemd unit file: {service_path}")
    unit_content = f"""[Unit]
Description="HashiCorp Vault - A tool for managing secrets"
Documentation=https://developer.hashicorp.com/vault/docs
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/data0/vault/config/config.hcl
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Type=notify
EnvironmentFile=/data0/vault/config/vault.env
User={VAULT_USER}
Group={VAULT_GROUP}
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart={VAULT_BIN} server -config=/data0/vault/config/config.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGINT
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
LimitNOFILE=65536
LimitMEMLOCK=infinity
LimitCORE=0

[Install]
WantedBy=multi-user.target
"""
    with open(service_path, "w") as f:
        f.write(unit_content)
    print("✅ systemd unit file written")

def main():
    # parser = argparse.ArgumentParser(description="Generate the Vault config.hcl and install Vault")
    # parser.add_argument("-c", "--cluster", required=True)
    # parser.add_argument("-n", "--node-id", required=True)
    # parser.add_argument("-z", "--zone", required=True)
    # parser.add_argument("-t", "--type", required=True, choices=["voter", "non-voter"])
    # parser.add_argument("-r", "--retry-join", required=True)
    # args = parser.parse_args()

    ip = get_ip()
    if ip not in CLUSTER_IPS:
        raise RuntimeError("Local IP is not in the cluster list")
    prepare_dirs()
    install_vault()
    # hosts = [h.strip() for h in args.retry_join.split(",") if h.strip()]
    hosts = list(CLUSTER_IPS)
    join_block = generate_retry_join(hosts, TLS_SERVERNAME)

    # Generate config.hcl
    print(f"📝 Writing Vault config file: {CFG_FILE}")
    config = f"""# Generated by deploy_cluster_vault.py
api_addr = "https://{ip}:8200"
cluster_addr = "https://{ip}:8201"
ui = true
disable_mlock = true
cluster_name = "{CLUSTER_NAME}"

log_level = "debug"
log_file = "/data0/vault/log/vault.log"
log_rotate_duration = "24h"
log_rotate_bytes = 10485760
log_rotate_max_files = 10

license_path = "/data0/vault/ssl/vault.hclic"
"""

    config += f"""
listener "tcp" {{
  address     = "{ip}:8200"
  cluster_address = "{ip}:8201"
  # tls_disable = true
  tls_client_ca_file = "/data0/vault/ssl/ca.pub"
  tls_cert_file = "/data0/vault/ssl/vault-public.pub"
  tls_key_file  = "/data0/vault/ssl/vault-private.key"
  tls_min_version = "tls12"
  telemetry {{
    unauthenticated_metrics_access = true
  }}
}}

storage "raft" {{
  path    = "/data0/vault/data"
  node_id = "{CLUSTER_IPS[ip]["node_id"]}"
  autopilot_redundancy_zone = "{CLUSTER_IPS[ip]["zone"]}"
"""
    if not CLUSTER_IPS[ip]["voter"]:
        config += f"""
  retry_join_as_non_voter = true
"""
    if CLUSTER_IPS[ip]["leader"]:
        config += f"""}}\n"""
    else:
        config += f"""{join_block}
}}\n"""

    config += """
profiling {
  unauthenticated_pprof_access = true
  unauthenticated_in_flight_request_access = true
}

adaptive_overload_protection {
  disable_write_controller = false
}

seal "awskms" {
  region     = "ap-east-1"
  access_key = "xxxxxxxxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxxxxxxxxx"
  kms_key_id = "xxxxxxxxxxxxxxxx"
  endpoint   = "https://kms.ap-east-1.amazonaws.com"
}

telemetry {
  disable_hostname = true
  prometheus_retention_time = "6h"
  unauthenticated_pprof_access = true
}
"""
    with open(CFG_FILE, "w") as f:
        f.write(config)
    run_cmd(["chown", f"{VAULT_USER}:{VAULT_GROUP}", CFG_FILE])

    generate_systemd_service()
    audit_log_rotate()
    update_hosts_file(ip)

    print(f"✅ Vault config written: {CFG_FILE}")
    print(f" - node_id: {CLUSTER_IPS[ip]['node_id']}")
    print(f" - ip: {ip}")
    print(f" - zone: {CLUSTER_IPS[ip]['zone']}")
    # print(f" - is_voter: {CLUSTER_IPS[ip]['voter']}")
    print(f" - retry_join hosts: {hosts}")
    print("\nℹ️ Run the following commands to enable the Vault service:")
    print("sudo systemctl daemon-reload && sudo systemctl enable vault && sudo systemctl start vault")

if __name__ == "__main__":
    main()

 

Automated snapshot configuration

# aws-snapshot.json
{
  "storage_type": "aws-s3",
  "file_prefix": "paris",
  "interval": "8h",
  "retain": 30,
  "local_max_space": 2621440000,
  "path_prefix": "primary",
  "aws_s3_bucket": "my-backup-vault-test",
  "aws_s3_region": "ap-east-1",
  "aws_access_key_id": "xxxxxxxxxxxxx",
  "aws_secret_access_key": "xxxxxxxxxxxxx",
  "aws_s3_server_side_encryption": "true"
}
# start auto snapshot
vault write sys/storage/raft/snapshot-auto/config/paris-primary @aws-snapshot.json
# Check status
vault read sys/storage/raft/snapshot-auto/config/paris-primary
# Or
vault list sys/storage/raft/snapshot-auto/config
 

Appendix

Deployment guide: https://developer.hashicorp.com/validated-designs/vault-solution-design-guides-vault-enterprise/deploying-vault-private-datacenter
Alibaba Cloud KMS seal: https://developer.hashicorp.com/vault/docs/configuration/seal/alicloudkms
AWS KMS seal: https://developer.hashicorp.com/vault/docs/configuration/seal/awskms
Vault initialization: https://developer.hashicorp.com/vault/tutorials/auto-unseal/autounseal-aws-kms#step-2-test-the-auto-unseal-feature
Snapshot management: https://developer.hashicorp.com/vault/docs/commands/operator/raft#snapshot
Storage backend configuration: https://developer.hashicorp.com/vault/docs/configuration/storage
Automated snapshot configuration: https://developer.hashicorp.com/vault/docs/sysadmin/snapshots/automate
Backends supported by automated snapshots: https://developer.hashicorp.com/vault/api-docs/system/storage/raftautosnapshots#storage_type
CLI commands: https://developer.hashicorp.com/vault/docs/commands/server#_log_format
Monitoring: https://developer.hashicorp.com/vault/tutorials/archive/monitor-telemetry-grafana-prometheus
Cluster recovery: https://developer.hashicorp.com/vault/tutorials/raft/raft-lost-quorum
Integrated storage usage and troubleshooting: https://developer.hashicorp.com/vault/tutorials/raft/raft-storage

Posted on 2025-09-12 15:16 by 生活费
