Private deployment of a large model: DeepSeek 671B with MindIE

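  This deployment uses two Huawei 800T servers on a RoCE network to serve the W8A8 version of DeepSeek R1/V3. It follows the DeepSeek-R1 guide in the Modelers (魔乐) community; following those steps exactly is enough. The notes below record the key points where problems tend to occur: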

1. Check NPU network connectivity and collect the IP address of each host and each NPU, which are needed to write ranktable.json: for i in {0..7}; do hccn_tool -i $i -ip -g; done

  PS: I once found that the netdetect address of one NPU had been changed for no apparent reason. To set it back: hccn_tool -i 0 -netdetect -s address 100.97.2.190
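
The per-device loop above can be wrapped so that it prints one "device_id device_ip" pair per line, ready to copy into ranktable.json. This is a minimal sketch: it assumes hccn_tool prints a line of the form ipaddr:<address>, and the helper names parse_device_ip and list_device_ips are hypothetical.

```shell
# Extract the address from hccn_tool output read on stdin.
# Assumption: the output contains a line of the form "ipaddr:<address>".
parse_device_ip() {
    awk -F: '$1 == "ipaddr" { print $2; exit }'
}

# Print "<device_id> <device_ip>" for NPUs 0..7 on this host.
list_device_ips() {
    i=0
    while [ "$i" -le 7 ]; do
        ip=$(hccn_tool -i "$i" -ip -g | parse_device_ip)
        echo "$i ${ip:-UNKNOWN}"
        i=$((i + 1))
    done
}
```

Run list_device_ips once on each host. A device that prints UNKNOWN either has no IP configured or produces output in a different format than assumed here.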

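2. Write ranktable.json; the file is identical on the master and slave hosts. The first server in server_list is the master node (its rank_ids come first). server_id is the node's IP address. container_ip is the container's IP address, which equals server_id because the container runs with --net=host.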

Complete ranktable.json example:

{
    "version": "1.0",
    "server_count": "2",
    "server_list": [
        {
            "server_id": "192.168.231.228",
            "container_ip": "192.168.231.228",
            "device": [
                { "device_id": "0", "device_ip": "100.97.2.153", "rank_id": "0" },
                { "device_id": "1", "device_ip": "100.97.2.154", "rank_id": "1" },
                { "device_id": "2", "device_ip": "100.97.2.155", "rank_id": "2" },
                { "device_id": "3", "device_ip": "100.97.2.156", "rank_id": "3" },
                { "device_id": "4", "device_ip": "100.97.2.157", "rank_id": "4" },
                { "device_id": "5", "device_ip": "100.97.2.158", "rank_id": "5" },
                { "device_id": "6", "device_ip": "100.97.2.159", "rank_id": "6" },
                { "device_id": "7", "device_ip": "100.97.2.160", "rank_id": "7" }
            ]
        },
        {
            "server_id": "192.168.231.227",
            "container_ip": "192.168.231.227",
            "device": [
                { "device_id": "0", "device_ip": "100.97.2.145", "rank_id": "8" },
                { "device_id": "1", "device_ip": "100.97.2.146", "rank_id": "9" },
                { "device_id": "2", "device_ip": "100.97.2.147", "rank_id": "10" },
                { "device_id": "3", "device_ip": "100.97.2.148", "rank_id": "11" },
                { "device_id": "4", "device_ip": "100.97.2.149", "rank_id": "12" },
                { "device_id": "5", "device_ip": "100.97.2.150", "rank_id": "13" },
                { "device_id": "6", "device_ip": "100.97.2.151", "rank_id": "14" },
                { "device_id": "7", "device_ip": "100.97.2.152", "rank_id": "15" }
            ]
        }
    ],
    "status": "completed"
}
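
The device arrays follow a regular pattern: device_id restarts at 0 on every server, while rank_id keeps counting across servers. The sketch below prints the entries for one server from a list of device IPs; the function name device_entries is hypothetical, and the output still has to be pasted into the surrounding server_list structure by hand.

```shell
# Print the "device" entries for one server.
# $1 = first rank_id on this server; remaining args = device IPs in device_id order.
device_entries() {
    rank=$1
    shift
    total=$#
    id=0
    for ip in "$@"; do
        # Every entry except the last one ends with a comma.
        if [ "$id" -eq $((total - 1)) ]; then sep=""; else sep=","; fi
        printf '                { "device_id": "%s", "device_ip": "%s", "rank_id": "%s" }%s\n' \
            "$id" "$ip" "$rank" "$sep"
        id=$((id + 1))
        rank=$((rank + 1))
    done
}
```

For the second server in the example this would be called as device_entries 8 100.97.2.145 100.97.2.146 and so on through 100.97.2.152. If python3 is available, python3 -m json.tool ranktable.json is a quick syntax check of the finished file.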

3. Set the permissions of the weights directory to 750: chmod -R 750 {/path-to-weights/DeepSeek-R1}

4. In the config.json inside the weights directory, change model_type to deepseekv2: "model_type": "deepseekv2"
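
The model_type change can also be scripted so that it is applied identically on both hosts. This is a minimal sketch using portable sed; the function name set_model_type is hypothetical, and it assumes the key and its value sit on one line. A .bak copy of the original file is kept.

```shell
# Rewrite the "model_type" value in the weights directory's config.json.
# $1 = path to config.json, $2 = new value (default: deepseekv2)
set_model_type() {
    file=$1
    value=${2:-deepseekv2}
    cp "$file" "$file.bak"   # keep a backup of the original
    sed 's/"model_type"[[:space:]]*:[[:space:]]*"[^"]*"/"model_type": "'"$value"'"/' \
        "$file.bak" > "$file"
}
```

Usage: set_model_type {/path-to-weights/DeepSeek-R1}/config.json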

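5. Write the service configuration file config.json. The master and slave hosts use the same configuration, and the file is later mounted into the container at /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

"ipAddress" : "192.168.231.228", # service address (master node IP)
"port" : 1025, # service port
"httpsEnabled" : false, # HTTPS disabled; on an untrusted network this carries a high security risk
...
"multiNodesInferEnabled" : true, # enable multi-node inference
...
# if security authentication is not needed, set the following two parameters to false
"interCommTLSEnabled" : false,
"interNodeTLSEnabled" : false,
...
"npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
...
"modelName" : "DeepSeek-R1", # does not affect service startup
"modelWeightPath" : "<path to weights>",
"worldSize" : 8,
"cpuMemSize" : 40, # the default of 5 limits the input to 2559 tokens, so increase it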

Complete config.json example:

{
    "Version" : "1.0.0",

    "ServerConfig" :
    {
        "ipAddress" : "192.168.231.228",
        "managementIpAddress" : "192.168.231.228",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : false,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm",
        "tokenTimeout" : 600,
        "e2eTimeout" : 600,
        "distDPServerEnabled":false
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : true,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : false,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 2560,
            "maxInputTokenLen" : 2048,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "DeepSeek-V3",
                    "modelWeightPath" : "/app1/models/deepseek-v3-0324",
                    "worldSize" : 8,
                    "cpuMemSize" : 40,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 8192,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 200,
            "maxIterTimes" : 512,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}

 

6. Download the image (swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts) from: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f

7. Prepare the start script, start.sh:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
source /usr/local/Ascend/atb-models/set_env.sh
source /usr/local/Ascend/mindie/set_env.sh
export ATB_LLM_HCCL_ENABLE=1
export ATB_LLM_COMM_BACKEND="hccl"
export HCCL_CONNECT_TIMEOUT=7200
export WORLD_SIZE=16
export HCCL_EXEC_TIMEOUT=0
export MASTER_IP=192.168.231.228 # master node IP
export MINDIE_LOG_TO_STDOUT=1
export HCCL_OP_EXPANSION_MODE="AIV"
export INF_NAN_MODE_ENABLE=0
export PARALLEL_PARAMS=[4,4,4,4,-1,-1]

export RANKTABLEFILE=/app1/scripts/ranktable.json # location of the ranktable file
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MIES_CONTAINER_IP=192.168.231.228 # IP of the host running the container; differs between master and slave
export OMP_NUM_THREADS=1
export NPU_MEMORY_FRACTION=0.95
export MINDIE_LOG_LEVEL=info && export MINDIE_LOG_TO_STDOUT=1

cd /usr/local/Ascend/mindie/latest/mindie-service/
./bin/mindieservice_daemon &
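
Since MIES_CONTAINER_IP is the only line marked as differing between the master and slave hosts, the slave's script can be derived from the master's instead of being edited by hand. This is a minimal sketch; the function name make_node_script and the output path in the usage line are hypothetical.

```shell
# Write a copy of start.sh for another node, changing only MIES_CONTAINER_IP.
# $1 = source script, $2 = IP of the target node, $3 = output path
make_node_script() {
    sed 's/^export MIES_CONTAINER_IP=[^[:space:]]*/export MIES_CONTAINER_IP='"$2"'/' "$1" > "$3"
}
```

Usage for the second host in the ranktable example: make_node_script /app1/scripts/start.sh 192.168.231.227 /tmp/start_slave.sh, then copy the result to /app1/scripts/start.sh on that host. MASTER_IP is left untouched on purpose.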

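8. Start the container. It is advisable to stop the firewall first while verifying the whole configuration and workflow, and to start it again once the service has come up normally.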

# --name sets the container name.
# -v /app1/:/app1/ maps the working directory (start.sh, ranktable.json, weights, service config.json).
# -v /app1/scripts/config.json:... maps the service configuration file.
# -c "..." runs the start script when the container starts.
docker run -itd --privileged --net=host \
   --name=DeepSeek-V3-W8A8 \
   --shm-size 500g \
   --device=/dev/davinci0 \
   --device=/dev/davinci1 \
   --device=/dev/davinci2 \
   --device=/dev/davinci3 \
   --device=/dev/davinci4 \
   --device=/dev/davinci5 \
   --device=/dev/davinci6 \
   --device=/dev/davinci7 \
   --device=/dev/davinci_manager \
   --device=/dev/hisi_hdc \
   --device /dev/devmm_svm \
   -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
   -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
   -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
   -v /usr/local/sbin:/usr/local/sbin \
   -v /etc/hccn.conf:/etc/hccn.conf \
   -v /app1/:/app1/ \
   -v /app1/scripts/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json \
   --entrypoint=/bin/bash \
   swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC2-800I-A2-py311-openeuler24.03-lts \
   -c "sh /app1/scripts/start.sh && /bin/bash"

9. Verify the service:

curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model": "DeepSeek-V3","messages": [{"role": "user", "content": "Which model are you?"},{"role": "assistant", "content": "Hello"}],"stream": false}' http://192.168.231.230:1025/v1/chat/completions
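
For repeated checks, the same request can be wrapped in two small helpers. This is a sketch only: the function names chat_payload and ask are hypothetical, the model name is taken to be the modelName from the example config, and the prompt is inserted without JSON escaping, so it must not contain double quotes or backslashes.

```shell
# Build a minimal OpenAI-style chat request body.
# $1 = model name, $2 = user prompt (no double quotes or backslashes).
chat_payload() {
    printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "stream": false}' "$1" "$2"
}

# Send the request. $1 = host:port of the service, $2 = model name, $3 = prompt.
ask() {
    curl -sS -H "Content-Type: application/json" -X POST \
        -d "$(chat_payload "$2" "$3")" "http://$1/v1/chat/completions"
}
```

Usage: ask 192.168.231.228:1025 DeepSeek-V3 "Which model are you?"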

10. Tuning:

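  Several parameters in config.json control the upper limits on model input and output; see the configuration parameter reference for details. They must satisfy maxSeqLen >= maxInputTokenLen + maxIterTimes and maxPrefillTokens >= maxInputTokenLen.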

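  • maxSeqLen: maximum sequence length; 32768
  • maxInputTokenLen: maximum input length in tokens; 10240
  • maxPrefillTokens: must be greater than or equal to maxInputTokenLen; 10240
  • maxIterTimes: global maximum output length of the model; 22528
  • cpuMemSize: affects the maximum input length in tokens; empirical value: 40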
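
The two constraints are easy to get wrong when the values are changed independently, so a small pre-flight check can help. The sketch below takes the four values as arguments; the function name check_limits is hypothetical. With the tuned values above, 10240 + 22528 = 32768, so maxSeqLen is exactly at its lower bound.

```shell
# Check the length constraints before restarting the service.
# Args: maxSeqLen maxInputTokenLen maxPrefillTokens maxIterTimes
# Returns 0 if both constraints hold, 1 otherwise (and prints what failed).
check_limits() {
    max_seq=$1; max_in=$2; max_prefill=$3; max_iter=$4
    status=0
    if [ "$max_seq" -lt $((max_in + max_iter)) ]; then
        echo "maxSeqLen ($max_seq) < maxInputTokenLen + maxIterTimes ($((max_in + max_iter)))"
        status=1
    fi
    if [ "$max_prefill" -lt "$max_in" ]; then
        echo "maxPrefillTokens ($max_prefill) < maxInputTokenLen ($max_in)"
        status=1
    fi
    return "$status"
}
```

Usage: check_limits 32768 10240 10240 22528 && echo "limits OK"
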
posted @ 2025-07-08 16:12  badwood