MindIE inference framework
Huawei's own large-model inference framework. Docs: https://www.hiascend.com/document/detail/zh/mindie/ (pick the version in the top-left corner; the current release is 2.3.0, though 2.2.RC1 is reportedly the most stable).
Three deployment options: prebuilt image, bare metal, and container. The image is the least effort but not yet turnkey: after starting it you still have to enter the container, edit the config, and start the service by hand. Bare metal and container both require manually installing CANN, MindIE, and the other components.
MindIE image download: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f
Start the container:
docker run -it -d --net=host --shm-size=16g --privileged --restart always --name qwen72b --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro -v /app/model:/model:ro swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.2.RC1-800I-A2-py311-openeuler24.03-lts bash
Enter the container: docker exec -it qwen72b bash
Environment setup: if the service fails to start with `bin/mindieservice_daemon: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory`, set the library path:
export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/torch/lib/:$LD_LIBRARY_PATH
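The hardcoded `python3.11` path above only works for this particular image. A small sketch of a more robust alternative: locate `libtorch.so` at runtime and prepend its directory instead (simulated below with a temporary directory; in the container you would search `/usr/local/lib64` for the real library — the search root here is an assumption, not from the MindIE docs):

```shell
# Locate a shared library and prepend its directory to LD_LIBRARY_PATH,
# instead of hardcoding a python3.11-specific path.
# Simulated with a temp dir holding a stand-in libtorch.so.
libdir=$(mktemp -d)
touch "$libdir/libtorch.so"            # stand-in for the real library

# Find the directory containing libtorch.so under the search root
found=$(find "$libdir" -name 'libtorch.so' -printf '%h\n' | head -n 1)
export LD_LIBRARY_PATH="$found${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

In the container, replacing the `mktemp` setup with `found=$(find /usr/local/lib64 -name 'libtorch.so' -printf '%h\n' | head -n 1)` would survive a Python minor-version bump in a future image.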
Edit the config: /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
  "Version" : "1.0.0",
  "LogConfig" : { "logLevel" : "Info", "logFileSize" : 20, "logFileNum" : 20, "logPath" : "logs/mindie-server.log" },
  "ServerConfig" : {
    "ipAddress" : "192.168.68.12", "managementIpAddress" : "127.0.0.2",
    "port" : 9000, "managementPort" : 1027, "metricsPort" : 1028,
    "allowAllZeroIpListening" : false, "maxLinkNum" : 1000,
    "httpsEnabled" : false, "fullTextEnabled" : false,
    "tlsCaPath" : "security/ca/", "tlsCaFile" : ["ca.pem"],
    "tlsCert" : "security/certs/server.pem", "tlsPk" : "security/keys/server.key.pem",
    "tlsPkPwd" : "security/pass/key_pwd.txt",
    "tlsCrlPath" : "security/certs/", "tlsCrlFiles" : ["server_crl.pem"],
    "managementTlsCaFile" : ["management_ca.pem"],
    "managementTlsCert" : "security/certs/management/server.pem",
    "managementTlsPk" : "security/keys/management/server.key.pem",
    "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
    "managementTlsCrlPath" : "security/management/certs/",
    "managementTlsCrlFiles" : ["server_crl.pem"],
    "kmcKsfMaster" : "tools/pmt/master/ksfa", "kmcKsfStandby" : "tools/pmt/standby/ksfb",
    "inferMode" : "standard",
    "interCommTLSEnabled" : true, "interCommPort" : 1121,
    "interCommTlsCaPath" : "security/grpc/ca/", "interCommTlsCaFiles" : ["ca.pem"],
    "interCommTlsCert" : "security/grpc/certs/server.pem",
    "interCommPk" : "security/grpc/keys/server.key.pem",
    "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
    "interCommTlsCrlPath" : "security/grpc/certs/", "interCommTlsCrlFiles" : ["server_crl.pem"],
    "openAiSupport" : "vllm"
  },
  "BackendConfig" : {
    "backendName" : "mindieservice_llm_engine",
    "modelInstanceNumber" : 1,
    "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
    "tokenizerProcessNumber" : 8,
    "multiNodesInferEnabled" : false, "multiNodesInferPort" : 1120,
    "interNodeTLSEnabled" : true,
    "interNodeTlsCaPath" : "security/grpc/ca/", "interNodeTlsCaFiles" : ["ca.pem"],
    "interNodeTlsCert" : "security/grpc/certs/server.pem",
    "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
    "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
    "interNodeTlsCrlPath" : "security/grpc/certs/", "interNodeTlsCrlFiles" : ["server_crl.pem"],
    "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa", "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
    "ModelDeployConfig" : {
      "maxSeqLen" : 32768, "maxInputTokenLen" : 16384, "truncation" : false,
      "ModelConfig" : [ {
        "modelInstanceType" : "Standard",
        "modelName" : "dsqwen32b",
        "modelWeightPath" : "/model/deepseek/DeepSeek-R1-Distill-Qwen-32B",
        "worldSize" : 8,
        "cpuMemSize" : 5, "npuMemSize" : -1,
        "backendType" : "atb",
        "trustRemoteCode" : false
      } ]
    },
    "ScheduleConfig" : {
      "templateType" : "Standard", "templateName" : "Standard_LLM",
      "cacheBlockSize" : 128,
      "maxPrefillBatchSize" : 50, "maxPrefillTokens" : 16384,
      "prefillTimeMsPerReq" : 150, "prefillPolicyType" : 0,
      "decodeTimeMsPerReq" : 50, "decodePolicyType" : 0,
      "maxBatchSize" : 200, "maxIterTimes" : 16384,
      "maxPreemptCount" : 0, "supportSelectBatch" : false,
      "maxQueueDelayMicroseconds" : 5000
    }
  }
}
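A JSON syntax error in config.json makes mindieservice_daemon fail at startup with little detail, so it is worth validating the file before every restart. A minimal sketch using python3 (present in the py311 image); the stand-in config below is trimmed for the demo — point `CFG` at the real file at /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json:

```shell
# Validate the MindIE config before (re)starting the service and echo the
# fields you most often get wrong: port, device list, model name.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
{ "ServerConfig": { "port": 9000 },
  "BackendConfig": { "npuDeviceIds": [[0,1,2,3,4,5,6,7]],
    "ModelDeployConfig": { "ModelConfig": [ { "modelName": "dsqwen32b" } ] } } }
EOF
python3 - "$CFG" <<'EOF'
import json, sys
cfg = json.load(open(sys.argv[1]))          # fails loudly on a syntax error
model = cfg["BackendConfig"]["ModelDeployConfig"]["ModelConfig"][0]
print("port:", cfg["ServerConfig"]["port"])
print("devices:", cfg["BackendConfig"]["npuDeviceIds"])
print("model:", model["modelName"])
EOF
```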
Start the service (from the mindie-service directory, since the path below is relative):
nohup ./bin/mindieservice_daemon > output.log 2>&1 &
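The daemon can take minutes to load weights before its HTTP port accepts connections, so requests sent immediately after `nohup` will fail. A readiness-wait sketch using bash's `/dev/tcp` (assumes bash; demonstrated here against a throwaway `python3 -m http.server` on port 18123 — in production, probe the service port from config.json, e.g. 9000):

```shell
#!/bin/bash
# Wait until a TCP port accepts connections before sending requests.
PORT=18123
python3 -m http.server "$PORT" >/dev/null 2>&1 &   # stand-in for the daemon
SRV=$!

READY=0
for i in $(seq 1 60); do
    if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
        READY=1
        echo "port $PORT is ready (attempt $i)"
        break
    fi
    sleep 1
done

kill "$SRV" 2>/dev/null
wait "$SRV" 2>/dev/null || true
```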
Test (note: the `model` field and the port in the request must match the `modelName` and `port` set in config.json — the example below was captured against a different deployment than the sample config above):
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model": "qwen-30b","messages": [{"role": "user", "content": "你是什么模型?"},{"role": "assistant", "content": "你好"}],"stream": false}' http://192.168.231.230:1025/v1/chat/completions
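With `"openAiSupport" : "vllm"` the endpoint returns OpenAI chat-completions-style JSON. A sketch of pulling the assistant reply out of it; the sample payload below is a hand-written stand-in for illustration, not a captured response — in practice pipe the curl output into the python3 step instead:

```shell
# Extract the assistant reply from a chat/completions response body.
RESP=$(mktemp)
cat > "$RESP" <<'EOF'
{ "choices": [ { "index": 0,
                 "message": { "role": "assistant", "content": "你好" },
                 "finish_reason": "stop" } ] }
EOF
python3 - "$RESP" <<'EOF'
import json, sys
resp = json.load(open(sys.argv[1]))
# choices[0].message.content holds the generated text
print(resp["choices"][0]["message"]["content"])
EOF
```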
Productionizing: having to exec into the container and run commands by hand on every start clearly does not scale. Instead, inject the config file from the host and have a startup script run automatically.
- At container start, mount in the config file and the startup script:
docker run -it -d --net=host --shm-size=16g --privileged --restart always --name test \
--device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro \
-v /app2/models:/models -v /app2/scripts/start.sh:/start.sh -v /app2/scripts/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.2.RC1-800I-A2-py311-openeuler24.03-lts /bin/bash -c "/start.sh"
- The config file (/app2/scripts/config.json) and the startup script (/app2/scripts/start.sh) live on the host, so after editing either one a plain container restart is enough to switch parameters or models.
- Startup script: ensures start.sh is re-run automatically every time the container restarts.
#!/bin/bash
set -euo pipefail
export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/torch/lib/:${LD_LIBRARY_PATH:-}
cd "$MIES_INSTALL_PATH" || exit 1

echo "[$(date)] Stopping any old mindieservice_daemon process..."
pkill -9 -f mindieservice_daemon || true
sleep 5

echo "[$(date)] Starting mindieservice_daemon..."
# Background the daemon; without the trailing & the script would block here
# and the monitor loop below would never run
nohup ./bin/mindieservice_daemon > output.log 2>&1 &

# Monitor the service process and exit when it dies
# (key point: the script's lifetime tracks the service, so --restart always
#  brings the whole container back up when the daemon crashes)
echo "[$(date)] Monitoring mindieservice_daemon..."
sleep 5   # give the daemon a moment to come up before the first check
while pgrep -f mindieservice_daemon > /dev/null; do
    sleep 1
done

echo "[$(date)] mindieservice_daemon exited; exiting script"
exit 1