Private Deployment of Large Models - DeepSeek 671B - MindIE
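This post covers deploying the W8A8 version of DeepSeek R1/V3 on two Huawei 800T servers connected over RoCE. It follows the DeepSeek-R1 guide on the Modelers community (魔乐社区), and following its steps exactly is enough. Below are the key points where things most easily go wrong: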
1. Check NPU networking, and get the IP of every NPU on each host. You need these IPs to write ranktable.json:
for i in {0..7}; do hccn_tool -i $i -ip -g; done
PS: I once found that the netdetect address of one NPU had inexplicably been changed. Set it back with:
hccn_tool -i 0 -netdetect -s address 100.97.2.190
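To avoid copying IPs by hand, the loop above can be wrapped so that it prints one "device_id device_ip" pair per NPU. This is a minimal sketch: it assumes hccn_tool -i <id> -ip -g prints a line of the form ipaddr:<ip> (compare with your own output first), and the function names parse_npu_ip and collect_npu_ips are illustrative, not part of any Ascend tool. The script only queries hardware when hccn_tool is installed.

```bash
#!/bin/bash
# Sketch: list the IP of every NPU on this host for ranktable.json.
# Assumption: `hccn_tool -i <id> -ip -g` prints a line like "ipaddr:100.97.2.153".

# Extract the IP from one hccn_tool output read on stdin (empty if absent).
parse_npu_ip() {
  sed -n 's/^ipaddr:[[:space:]]*//p' | head -n 1
}

# Print "device_id device_ip" for NPUs 0..7.
collect_npu_ips() {
  local i ip
  for i in {0..7}; do
    ip=$(hccn_tool -i "$i" -ip -g | parse_npu_ip)
    echo "$i ${ip:-UNKNOWN}"
  done
}

# Only query the hardware when hccn_tool is actually available.
if command -v hccn_tool >/dev/null 2>&1; then
  collect_npu_ips
fi
```

Run it on both hosts and keep the output; the master's lines become rank_ids 0-7 and the worker's become 8-15.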
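2. Write ranktable.json. It is identical on the master and the worker. The first server in server_list is the master node, so its devices get the lowest rank_ids. server_id is the node's IP address. container_ip is the container's IP address; here it equals server_id because the containers use host networking (--net=host).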
Complete ranktable.json example:
{
  "version": "1.0",
  "server_count": "2",
  "server_list": [
    {
      "server_id": "192.168.231.228",
      "container_ip": "192.168.231.228",
      "device": [
        { "device_id": "0", "device_ip": "100.97.2.153", "rank_id": "0" },
        { "device_id": "1", "device_ip": "100.97.2.154", "rank_id": "1" },
        { "device_id": "2", "device_ip": "100.97.2.155", "rank_id": "2" },
        { "device_id": "3", "device_ip": "100.97.2.156", "rank_id": "3" },
        { "device_id": "4", "device_ip": "100.97.2.157", "rank_id": "4" },
        { "device_id": "5", "device_ip": "100.97.2.158", "rank_id": "5" },
        { "device_id": "6", "device_ip": "100.97.2.159", "rank_id": "6" },
        { "device_id": "7", "device_ip": "100.97.2.160", "rank_id": "7" }
      ]
    },
    {
      "server_id": "192.168.231.227",
      "container_ip": "192.168.231.227",
      "device": [
        { "device_id": "0", "device_ip": "100.97.2.145", "rank_id": "8" },
        { "device_id": "1", "device_ip": "100.97.2.146", "rank_id": "9" },
        { "device_id": "2", "device_ip": "100.97.2.147", "rank_id": "10" },
        { "device_id": "3", "device_ip": "100.97.2.148", "rank_id": "11" },
        { "device_id": "4", "device_ip": "100.97.2.149", "rank_id": "12" },
        { "device_id": "5", "device_ip": "100.97.2.150", "rank_id": "13" },
        { "device_id": "6", "device_ip": "100.97.2.151", "rank_id": "14" },
        { "device_id": "7", "device_ip": "100.97.2.152", "rank_id": "15" }
      ]
    }
  ],
  "status": "completed"
}
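rank_ids must run consecutively across servers with the master first, so writing the file by hand is error-prone. The sketch below uses a hypothetical helper, gen_ranktable (not an official tool), that emits the same structure as the example. It assumes each argument has the form server_ip=dev_ip0,dev_ip1,... with the IPs listed in device_id order, and that container_ip equals server_id as in this setup.

```bash
#!/bin/bash
# Sketch: generate ranktable.json. The FIRST argument must be the master node,
# so its devices receive the lowest rank_ids. container_ip is set to server_id
# (assumes the containers run with --net=host).

gen_ranktable() {
  local rank=0 first_server=1 spec server dev_id ip
  local -a ips
  printf '{\n  "version": "1.0",\n  "server_count": "%d",\n  "server_list": [\n' "$#"
  for spec in "$@"; do
    server=${spec%%=*}                        # text before "="
    IFS=',' read -r -a ips <<< "${spec#*=}"   # comma-separated device IPs
    [ "$first_server" -eq 1 ] || printf ',\n'
    first_server=0
    printf '    {\n      "server_id": "%s",\n      "container_ip": "%s",\n      "device": [\n' "$server" "$server"
    for dev_id in "${!ips[@]}"; do
      ip=${ips[$dev_id]}
      [ "$dev_id" -eq 0 ] || printf ',\n'
      printf '        { "device_id": "%d", "device_ip": "%s", "rank_id": "%d" }' "$dev_id" "$ip" "$rank"
      rank=$((rank + 1))                      # rank_id keeps counting across servers
    done
    printf '\n      ]\n    }'
  done
  printf '\n  ],\n  "status": "completed"\n}\n'
}

# Example for the two hosts above (device IPs happen to be consecutive here):
#   gen_ranktable \
#     "192.168.231.228=$(printf '100.97.2.%s,' {153..160} | sed 's/,$//')" \
#     "192.168.231.227=$(printf '100.97.2.%s,' {145..152} | sed 's/,$//')" > ranktable.json
```

Generate the file once and copy the same file to both nodes instead of generating it separately on each.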
3. Set the permissions of the weight directory to 750, substituting your own weight path: chmod -R 750 /path-to-weights/DeepSeek-R1
4. In the config.json inside the weight directory (the model's own config, not the service config), change model_type to deepseekv2: "model_type": "deepseekv2"
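Steps 3 and 4 can be scripted together. This is a sketch under stated assumptions: prepare_weights and the WEIGHT_DIR variable are illustrative names, GNU sed is assumed for in-place editing, and the sed pattern expects "model_type" and its value on one line, as in the stock config.json. It edits the weights' config.json, not the MindIE service config.json.

```bash
#!/bin/bash
# Sketch of steps 3 and 4: patch model_type in the WEIGHTS' config.json,
# then set the weight directory permissions to 750.

prepare_weights() {
  local weight_dir=$1
  local cfg="$weight_dir/config.json"
  [ -f "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
  cp "$cfg" "$cfg.bak"    # keep the original for rollback
  # Replace whatever model_type is set (e.g. "deepseek_v3") with "deepseekv2".
  sed -i 's/"model_type"[[:space:]]*:[[:space:]]*"[^"]*"/"model_type": "deepseekv2"/' "$cfg"
  chmod -R 750 "$weight_dir"    # step 3; also covers the backup just created
  grep '"model_type"' "$cfg"    # show the result
}

# Hypothetical usage: WEIGHT_DIR=/path-to-weights/DeepSeek-R1 ./prepare_weights.sh
if [ -n "${WEIGHT_DIR:-}" ]; then
  prepare_weights "$WEIGHT_DIR"
fi
```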
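5. Write the service configuration file config.json. The master and the worker use the same file, which is later mounted into the container at /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json. Key parameters: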
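"ipAddress" : "192.168.231.228",   # service address (master node IP)
"port" : 1025,                     # service port
"httpsEnabled" : false,            # if the network is not secure, leaving HTTPS disabled ("httpsEnabled": false) carries a high security risk
...
"multiNodesInferEnabled" : true,   # enable multi-node inference
...
# if security authentication is not required, set the following two parameters to false
"interCommTLSEnabled" : false,
"interNodeTLSEnabled" : false,
...
"npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
...
"modelName" : "DeepSeek-R1",       # does not affect whether the service starts
"modelWeightPath" : "/path-to-weights/DeepSeek-R1",
"worldSize" : 8,
"cpuMemSize" : 40,                 # the default of 5 limits input to 2559 tokens, so it must be increased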
Complete config.json example:
{
  "Version" : "1.0.0",
  "ServerConfig" : {
    "ipAddress" : "192.168.231.228",
    "managementIpAddress" : "192.168.231.228",
    "port" : 1025,
    "managementPort" : 1026,
    "metricsPort" : 1027,
    "allowAllZeroIpListening" : false,
    "maxLinkNum" : 1000,
    "httpsEnabled" : false,
    "fullTextEnabled" : false,
    "tlsCaPath" : "security/ca/",
    "tlsCaFile" : ["ca.pem"],
    "tlsCert" : "security/certs/server.pem",
    "tlsPk" : "security/keys/server.key.pem",
    "tlsPkPwd" : "security/pass/key_pwd.txt",
    "tlsCrlPath" : "security/certs/",
    "tlsCrlFiles" : ["server_crl.pem"],
    "managementTlsCaFile" : ["management_ca.pem"],
    "managementTlsCert" : "security/certs/management/server.pem",
    "managementTlsPk" : "security/keys/management/server.key.pem",
    "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
    "managementTlsCrlPath" : "security/management/certs/",
    "managementTlsCrlFiles" : ["server_crl.pem"],
    "kmcKsfMaster" : "tools/pmt/master/ksfa",
    "kmcKsfStandby" : "tools/pmt/standby/ksfb",
    "inferMode" : "standard",
    "interCommTLSEnabled" : false,
    "interCommPort" : 1121,
    "interCommTlsCaPath" : "security/grpc/ca/",
    "interCommTlsCaFiles" : ["ca.pem"],
    "interCommTlsCert" : "security/grpc/certs/server.pem",
    "interCommPk" : "security/grpc/keys/server.key.pem",
    "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
    "interCommTlsCrlPath" : "security/grpc/certs/",
    "interCommTlsCrlFiles" : ["server_crl.pem"],
    "openAiSupport" : "vllm",
    "tokenTimeout" : 600,
    "e2eTimeout" : 600,
    "distDPServerEnabled" : false
  },
  "BackendConfig" : {
    "backendName" : "mindieservice_llm_engine",
    "modelInstanceNumber" : 1,
    "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
    "tokenizerProcessNumber" : 8,
    "multiNodesInferEnabled" : true,
    "multiNodesInferPort" : 1120,
    "interNodeTLSEnabled" : false,
    "interNodeTlsCaPath" : "security/grpc/ca/",
    "interNodeTlsCaFiles" : ["ca.pem"],
    "interNodeTlsCert" : "security/grpc/certs/server.pem",
    "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
    "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
    "interNodeTlsCrlPath" : "security/grpc/certs/",
    "interNodeTlsCrlFiles" : ["server_crl.pem"],
    "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
    "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
    "ModelDeployConfig" : {
      "maxSeqLen" : 2560,
      "maxInputTokenLen" : 2048,
      "truncation" : false,
      "ModelConfig" : [
        {
          "modelInstanceType" : "Standard",
          "modelName" : "DeepSeek-V3",
          "modelWeightPath" : "/app1/models/deepseek-v3-0324",
          "worldSize" : 8,
          "cpuMemSize" : 40,
          "npuMemSize" : -1,
          "backendType" : "atb",
          "trustRemoteCode" : false
        }
      ]
    },
    "ScheduleConfig" : {
      "templateType" : "Standard",
      "templateName" : "Standard_LLM",
      "cacheBlockSize" : 128,
      "maxPrefillBatchSize" : 50,
      "maxPrefillTokens" : 8192,
      "prefillTimeMsPerReq" : 150,
      "prefillPolicyType" : 0,
      "decodeTimeMsPerReq" : 50,
      "decodePolicyType" : 0,
      "maxBatchSize" : 200,
      "maxIterTimes" : 512,
      "maxPreemptCount" : 0,
      "supportSelectBatch" : false,
      "maxQueueDelayMicroseconds" : 5000
    }
  }
}
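Both JSON files are edited by hand and must be identical on the master and the worker. A quick syntax check and checksum comparison before starting the containers can save a failed launch. This is a minimal sketch that assumes python3 is available on the host (only the standard json.tool module is used); check_json and print_checksums are illustrative names. It checks syntax only: json.tool silently accepts duplicate keys, so a key that appears twice will not be reported.

```bash
#!/bin/bash
# Sketch: sanity-check hand-edited JSON files before starting the container.

# Fail on the first file that is not valid JSON.
check_json() {
  local f
  for f in "$@"; do
    if python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "OK      $f"
    else
      echo "INVALID $f" >&2
      return 1
    fi
  done
}

# Run on master and worker and compare the output: the hashes must match.
print_checksums() {
  sha256sum "$@"
}

# Example:
#   check_json /app1/scripts/config.json /app1/scripts/ranktable.json &&
#     print_checksums /app1/scripts/config.json /app1/scripts/ranktable.json
```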
6. Pull the image (swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts). Download page: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f
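7. Prepare the startup script start.sh on each node. The only line that differs between the master and the worker is MIES_CONTAINER_IP: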
#!/bin/bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
source /usr/local/Ascend/atb-models/set_env.sh
source /usr/local/Ascend/mindie/set_env.sh
export ATB_LLM_HCCL_ENABLE=1
export ATB_LLM_COMM_BACKEND="hccl"
export HCCL_CONNECT_TIMEOUT=7200
export WORLD_SIZE=16
export HCCL_EXEC_TIMEOUT=0
export MASTER_IP=192.168.231.228                     # master node IP
export MINDIE_LOG_TO_STDOUT=1
export HCCL_OP_EXPANSION_MODE="AIV"
export INF_NAN_MODE_ENABLE=0
export PARALLEL_PARAMS=[4,4,4,4,-1,-1]
export RANKTABLEFILE=/app1/scripts/ranktable.json    # location of ranktable.json
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MIES_CONTAINER_IP=192.168.231.228             # IP of the host running this container; differs between master and worker
export OMP_NUM_THREADS=1
export NPU_MEMORY_FRACTION=0.95
export MINDIE_LOG_LEVEL=info
cd /usr/local/Ascend/mindie/latest/mindie-service/
./bin/mindieservice_daemon &
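A common mistake is to copy the master's start.sh to the worker and forget to change MIES_CONTAINER_IP, or to let WORLD_SIZE drift from the ranktable. The hypothetical preflight_check below (not part of MindIE) compares the exported variables against ranktable.json with plain grep. It assumes the layout of the ranktable example: one "rank_id" key per device, and server_id written as "server_id": "<ip>".

```bash
#!/bin/bash
# Sketch: verify start.sh variables against the ranktable before starting the daemon.

preflight_check() {
  local ranktable=${RANKTABLEFILE:-} devices
  [ -f "$ranktable" ] || { echo "RANKTABLEFILE not set or missing" >&2; return 1; }
  # Total number of devices listed across all servers.
  devices=$(grep -o '"rank_id"' "$ranktable" | wc -l)
  if [ "$devices" -ne "${WORLD_SIZE:-0}" ]; then
    echo "WORLD_SIZE=${WORLD_SIZE:-unset} but $ranktable lists $devices devices" >&2
    return 1
  fi
  # This node's IP must appear as a server_id.
  if ! grep -q "\"server_id\"[[:space:]]*:[[:space:]]*\"${MIES_CONTAINER_IP:-}\"" "$ranktable"; then
    echo "MIES_CONTAINER_IP=${MIES_CONTAINER_IP:-unset} is not a server_id in $ranktable" >&2
    return 1
  fi
  echo "preflight OK: $devices devices, this node is $MIES_CONTAINER_IP"
}
```

Calling preflight_check || exit 1 just before the mindieservice_daemon line would stop start.sh early on a mismatch.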
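8. Start the container on each node. It is advisable to stop the firewall first while you verify the configuration and the whole workflow, and to re-enable it once the service is up.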
# Make sure the image tag matches the one pulled in step 6 (step 6 lists 2.0.T3, this command uses 2.0.RC2).
# Option notes:
#   --name=...           container name
#   -v /app1/:/app1/     mounts the working directory (start.sh, ranktable.json, weights, service config.json)
#   -v .../config.json   mounts the service config file into MindIE's conf directory
#   -c "..."             runs the startup script when the container starts, then keeps a shell open
docker run -itd --privileged --net=host \
  --name=DeepSeek-V3-W8A8 \
  --shm-size 500g \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /usr/local/sbin:/usr/local/sbin \
  -v /etc/hccn.conf:/etc/hccn.conf \
  -v /app1/:/app1/ \
  -v /app1/scripts/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json \
  --entrypoint=/bin/bash \
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.RC2-800I-A2-py311-openeuler24.03-lts \
  -c "sh /app1/scripts/start.sh && /bin/bash"
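The same command can be built from a bash array. This allows a comment on every option, and the eight --device entries come from a loop. It is a sketch, not a replacement for the command above: build_docker_args, DOCKER_ARGS, IMAGE and DRY_RUN are illustrative names, IMAGE defaults to the tag from step 6, and by default the script only prints the resulting command. It runs docker only when DRY_RUN=0.

```bash
#!/bin/bash
# Sketch: build the docker run arguments in an array so each option can be commented.

IMAGE=${IMAGE:-swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts}

build_docker_args() {
  local i
  DOCKER_ARGS=(run -itd --privileged --net=host
    --name=DeepSeek-V3-W8A8                       # container name
    --shm-size 500g)
  for i in {0..7}; do                             # one --device per NPU
    DOCKER_ARGS+=("--device=/dev/davinci$i")
  done
  DOCKER_ARGS+=(
    --device=/dev/davinci_manager
    --device=/dev/hisi_hdc
    --device=/dev/devmm_svm
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver
    -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi
    -v /usr/local/sbin:/usr/local/sbin
    -v /etc/hccn.conf:/etc/hccn.conf
    -v /app1/:/app1/                              # start.sh, ranktable.json, weights, config.json
    -v /app1/scripts/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
    --entrypoint=/bin/bash
    "$IMAGE"
    -c "sh /app1/scripts/start.sh && /bin/bash"   # run start.sh, then keep a shell
  )
}

build_docker_args
if [ "${DRY_RUN:-1}" = 1 ]; then
  printf 'docker'; printf ' %q' "${DOCKER_ARGS[@]}"; echo   # print only
else
  docker "${DOCKER_ARGS[@]}"
fi
```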
9. Verify the service:
curl -H "Accept: application/json" -H "Content-Type: application/json" \
  -X POST \
  -d '{"model": "DeepSeek-V3", "messages": [{"role": "user", "content": "Which model are you?"}, {"role": "assistant", "content": "Hello"}], "stream": false}' \
  http://192.168.231.228:1025/v1/chat/completions
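Loading the W8A8 weights on both nodes can take a while. Instead of re-running curl by hand, you can poll until the endpoint answers with HTTP 200. This is a minimal sketch: retry and probe_chat are illustrative helpers, and URL and MODEL default to the values from the example config (adjust them to your ipAddress, port and modelName).

```bash
#!/bin/bash
# Sketch: wait until the MindIE OpenAI-compatible endpoint answers with HTTP 200.

URL=${URL:-http://192.168.231.228:1025/v1/chat/completions}
MODEL=${MODEL:-DeepSeek-V3}

# Run "$@" up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
  local attempts=$1 interval=$2 n
  shift 2
  for ((n = 1; n <= attempts; n++)); do
    "$@" && return 0
    [ "$n" -lt "$attempts" ] && sleep "$interval"
  done
  return 1
}

# One probe: succeeds only if the service returns HTTP 200.
probe_chat() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
    -H "Content-Type: application/json" -X POST \
    -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"stream\": false}" \
    "$URL")
  [ "$code" = 200 ]
}
```

For example, retry 60 10 probe_chat && echo "service is up" makes up to 60 attempts, 10 seconds apart.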
10. Tuning:
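Several parameters in config.json cap the model's input and output lengths; see the MindIE configuration parameter reference for details. maxSeqLen and maxInputTokenLen are in ModelDeployConfig, maxPrefillTokens and maxIterTimes are in ScheduleConfig, and cpuMemSize is in ModelConfig. They must satisfy maxSeqLen >= maxInputTokenLen + maxIterTimes and maxPrefillTokens >= maxInputTokenLen. The values used here: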
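- maxSeqLen: maximum sequence length (input plus output), 32768
- maxInputTokenLen: maximum input length in tokens, 10240
- maxPrefillTokens: must be greater than or equal to maxInputTokenLen, 10240
- maxIterTimes: global maximum output length of the model, 22528 (10240 + 22528 = 32768, exactly maxSeqLen)
- cpuMemSize: affects the maximum input length in tokens; 40 works well in practice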
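The two constraints above can be checked mechanically before restarting the service. This is a minimal sketch: check_limits is an illustrative name, and it takes the four values as arguments instead of reading config.json.

```bash
#!/bin/bash
# Sketch: check the length constraints from step 10.
# Arguments: maxSeqLen maxInputTokenLen maxIterTimes maxPrefillTokens

check_limits() {
  local seq=$1 input=$2 iter=$3 prefill=$4 ok=0
  if (( seq < input + iter )); then
    echo "maxSeqLen ($seq) < maxInputTokenLen + maxIterTimes ($((input + iter)))" >&2
    ok=1
  fi
  if (( prefill < input )); then
    echo "maxPrefillTokens ($prefill) < maxInputTokenLen ($input)" >&2
    ok=1
  fi
  return "$ok"
}

check_limits 32768 10240 22528 10240 && echo "limits OK"
```

The values listed above pass (10240 + 22528 = 32768), and so do those in the complete config.json example (2048 + 512 = 2560, and 8192 >= 2048).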