MindIE Inference Framework and Productionization

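  MindIE is Huawei's own large-model inference framework. Documentation: https://www.hiascend.com/document/detail/zh/mindie/ (choose the version at the top left). The current version is 2.3.0; 2.2.RC1 is reportedly the most stable.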

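  There are three ways to deploy it: prebuilt image, bare metal, and container. The prebuilt image takes the least effort but is still not perfect: after starting it you have to go inside, edit the configuration, and start the service by hand. The bare-metal and container options both require installing CANN, MindIE, and the other components manually.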

  1. MindIE image download page: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f

  2. Start the container:

docker run -it -d --net=host --shm-size=16g --privileged --restart always --name qwen72b --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro -v /app/model:/model:ro swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.2.RC1-800I-A2-py311-openeuler24.03-lts bash

  3. Enter the container: docker exec -it qwen72b bash

  4. Environment setup: if the service fails to start with the error "bin/mindieservice_daemon: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory", set the following environment variable:

export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/torch/lib/:$LD_LIBRARY_PATH
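
  The export above adds the entry again every time it is run and leaves a trailing ":" when LD_LIBRARY_PATH was empty (the dynamic loader reads an empty entry as the current directory). A minimal sketch of an idempotent variant follows; the helper name prepend_ld_path is made up for illustration and is not part of MindIE.

```shell
#!/bin/sh
# Prepend a directory to LD_LIBRARY_PATH only if it exists and is not listed yet.
# prepend_ld_path is a hypothetical helper name, not a MindIE or CANN command.
prepend_ld_path() {
    [ -d "$1" ] || return 1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory taken from the step above (trailing slash dropped).
prepend_ld_path /usr/local/lib64/python3.11/site-packages/torch/lib \
    || echo "torch lib directory not found" >&2
```

  The function returns non-zero when the directory is missing, so a wrong Python version in the path shows up immediately rather than as a later libtorch.so load error.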

  5. Edit the configuration file: /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

{
    "Version" : "1.0.0",
    "LogConfig" :
    {
        "logLevel" : "Info",
        "logFileSize" : 20,
        "logFileNum" : 20,
        "logPath" : "logs/mindie-server.log"
    },

    "ServerConfig" :
    {
        "ipAddress" : "192.168.68.12",
        "managementIpAddress" : "127.0.0.2",
        "port" : 9000,
        "managementPort" : 1027,
        "metricsPort" : 1028,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : true,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm"
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : false,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : true,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 32768,
            "maxInputTokenLen" : 16384,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "dsqwen32b",
                    "modelWeightPath" : "/model/deepseek/DeepSeek-R1-Distill-Qwen-32B",
                    "worldSize" : 8,
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 16384,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 200,
            "maxIterTimes" : 16384,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}
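
  The length settings in this file depend on each other. As I understand the MindIE parameter reference (treat this as an assumption and check it against the documentation for your version), maxInputTokenLen has to stay below maxSeqLen, and maxPrefillTokens must be at least maxInputTokenLen. The configuration above satisfies both (16384 < 32768 and 16384 >= 16384). A rough pre-flight check could look like the sketch below; json_scalar and check_token_limits are hypothetical helper names, and the sed expression is a line-based shortcut that only works for one-key-per-line files like this one, not a real JSON parser.

```shell
#!/bin/sh
# Print the first scalar value of "key" : value in a one-key-per-line JSON file.
# Line-based shortcut, not a JSON parser; use a real parser for anything else.
json_scalar() (
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\", ]*\)\"\{0,1\}.*/\1/p" "$2" | head -n 1
)

# Check the assumed relations between the token-length settings.
check_token_limits() (
    cfg=$1
    max_seq=$(json_scalar maxSeqLen "$cfg")
    max_in=$(json_scalar maxInputTokenLen "$cfg")
    max_prefill=$(json_scalar maxPrefillTokens "$cfg")
    if [ -z "$max_seq" ] || [ -z "$max_in" ] || [ -z "$max_prefill" ]; then
        echo "missing length setting in $cfg" >&2
        exit 2
    fi
    if [ "$max_in" -ge "$max_seq" ]; then
        echo "maxInputTokenLen ($max_in) should be below maxSeqLen ($max_seq)" >&2
        exit 1
    fi
    if [ "$max_prefill" -lt "$max_in" ]; then
        echo "maxPrefillTokens ($max_prefill) should be >= maxInputTokenLen ($max_in)" >&2
        exit 1
    fi
)

# Usage (path from step 5):
# check_token_limits /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
```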

  6. Start the service from the mindie-service directory (/usr/local/Ascend/mindie/latest/mindie-service):

nohup ./bin/mindieservice_daemon > output.log 2>&1 &
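
  Because the daemon is started in the background, the command returns before the service is ready, and loading the weights of a large model may take a while. A small polling helper, sketched below under the assumption that any HTTP response on the service port means the server is listening, avoids sending test requests too early. wait_until is a hypothetical helper name.

```shell
#!/bin/sh
# Retry a probe command once per second until it succeeds or the attempt
# budget is used up. wait_until is an illustrative helper, not a MindIE tool.
wait_until() (
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && exit 0
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep 1
    done
    exit 1
)

# Usage, with the address and port from the config.json in step 5:
# wait_until 600 curl -s -o /dev/null http://192.168.68.12:9000/ \
#     || tail -n 50 output.log
```

  If the probe never succeeds, output.log from the start command is the first place to look.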

  7. Test (the host, port, and model name in the request must match ipAddress, port, and modelName in config.json):

curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model": "dsqwen32b","messages": [{"role": "user", "content": "What model are you?"},{"role": "assistant", "content": "Hello"}],"stream": false}' http://192.168.68.12:9000/v1/chat/completions
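
  Since the request has to agree with config.json, the target can also be derived from the file instead of being typed by hand. The following is only a sketch: json_scalar, chat_url, and chat_request are hypothetical helper names, the sed expression is a line-based shortcut rather than a JSON parser, and the prompt is inserted without JSON escaping.

```shell
#!/bin/sh
# Print the first scalar value of "key" : value in a one-key-per-line JSON file.
json_scalar() (
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\", ]*\)\"\{0,1\}.*/\1/p" "$2" | head -n 1
)

# Build the chat completions URL from ipAddress and port in the config file.
chat_url() (
    printf 'http://%s:%s/v1/chat/completions\n' \
        "$(json_scalar ipAddress "$1")" "$(json_scalar port "$1")"
)

# Send one user message, using modelName from the config file.
# The prompt must not contain double quotes or backslashes (no escaping here).
chat_request() (
    cfg=$1
    prompt=$2
    curl -sS -H "Content-Type: application/json" -X POST \
        -d "{\"model\": \"$(json_scalar modelName "$cfg")\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}], \"stream\": false}" \
        "$(chat_url "$cfg")"
)

# Usage:
# chat_request /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json "What model are you?"
```

  When ipAddress is 0.0.0.0 the derived URL is only meaningful on the server itself; from another machine, substitute the server's real address.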

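  8. Productionization: having to run commands inside the container on every start is clearly not workable. Instead, inject the configuration file into the container from outside and run the startup script automatically.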

  • Inject the configuration file and the startup script when starting the container:
    docker run -it -d --net=host --shm-size=16g --privileged --restart always --name test \
    --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /app2/models:/models -v /app2/scripts/start.sh:/start.sh -v /app2/scripts/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.2.RC1-800I-A2-py311-openeuler24.03-lts /bin/bash -c "/start.sh"
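  • After changing the configuration file (/app2/scripts/config.json) or the startup script (/app2/scripts/start.sh) on the host, restart the container to run with different parameters or a different model.
  • Configuration file: taking single-node inference as the example, set allowAllZeroIpListening to true so that the service can bind to the all-zero address (0.0.0.0), at the cost of somewhat weaker security.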
    {
        "Version" : "1.0.0",
    
        "ServerConfig" :
        {
            "ipAddress" : "0.0.0.0",
            "managementIpAddress" : "0.0.0.0",
            "port" : 1025,
            "managementPort" : 1026,
            "metricsPort" : 1027,
            "allowAllZeroIpListening" : true,
            "maxLinkNum" : 1000,
            "httpsEnabled" : false,
            "fullTextEnabled" : false,
            "tlsCaPath" : "security/ca/",
            "tlsCaFile" : ["ca.pem"],
            "tlsCert" : "security/certs/server.pem",
            "tlsPk" : "security/keys/server.key.pem",
            "tlsPkPwd" : "security/pass/key_pwd.txt",
            "tlsCrlPath" : "security/certs/",
            "tlsCrlFiles" : ["server_crl.pem"],
            "managementTlsCaFile" : ["management_ca.pem"],
            "managementTlsCert" : "security/certs/management/server.pem",
            "managementTlsPk" : "security/keys/management/server.key.pem",
            "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
            "managementTlsCrlPath" : "security/management/certs/",
            "managementTlsCrlFiles" : ["server_crl.pem"],
            "kmcKsfMaster" : "tools/pmt/master/ksfa",
            "kmcKsfStandby" : "tools/pmt/standby/ksfb",
            "inferMode" : "standard",
            "interCommTLSEnabled" : true,
            "interCommPort" : 1121,
            "interCommTlsCaPath" : "security/grpc/ca/",
            "interCommTlsCaFiles" : ["ca.pem"],
            "interCommTlsCert" : "security/grpc/certs/server.pem",
            "interCommPk" : "security/grpc/keys/server.key.pem",
            "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
            "interCommTlsCrlPath" : "security/grpc/certs/",
            "interCommTlsCrlFiles" : ["server_crl.pem"],
            "openAiSupport" : "vllm",
            "tokenTimeout" : 600,
            "e2eTimeout" : 600,
            "distDPServerEnabled":false
        },
    
        "BackendConfig" : {
            "backendName" : "mindieservice_llm_engine",
            "modelInstanceNumber" : 1,
            "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
            "tokenizerProcessNumber" : 8,
            "multiNodesInferEnabled" : false,
            "multiNodesInferPort" : 1120,
            "interNodeTLSEnabled" : true,
            "interNodeTlsCaPath" : "security/grpc/ca/",
            "interNodeTlsCaFiles" : ["ca.pem"],
            "interNodeTlsCert" : "security/grpc/certs/server.pem",
            "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
            "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
            "interNodeTlsCrlPath" : "security/grpc/certs/",
            "interNodeTlsCrlFiles" : ["server_crl.pem"],
            "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
            "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
            "kvPoolConfig" : {"backend":"", "configPath":""},
            "ModelDeployConfig" :
            {
                "maxSeqLen" : 6400,
                "maxInputTokenLen" : 6000,
                "truncation" : true,
                "ModelConfig" : [
                    {
                        "modelInstanceType" : "Standard",
                        "modelName" : "im-30b",
                        "modelWeightPath" : "/models/Qwen3-30B-A3B-Instruct-2507",
                        "worldSize" : 8,
                        "cpuMemSize" : 0,
                        "npuMemSize" : -1,
                        "backendType" : "atb",
                        "trustRemoteCode" : false,
                        "async_scheduler_wait_time": 120,
                        "kv_trans_timeout": 10,
                        "kv_link_timeout": 1080
                    }
                ]
            },
    
            "ScheduleConfig" :
            {
                "templateType" : "Standard",
                "templateName" : "Standard_LLM",
                "cacheBlockSize" : 128,
    
                "maxPrefillBatchSize" : 50,
                "maxPrefillTokens" : 6000,
                "prefillTimeMsPerReq" : 150,
                "prefillPolicyType" : 0,
    
                "decodeTimeMsPerReq" : 50,
                "decodePolicyType" : 0,
    
                "maxBatchSize" : 100,
                "maxIterTimes" : 400,
                "maxPreemptCount" : 0,
                "supportSelectBatch" : false,
                "maxQueueDelayMicroseconds" : 5000,
                "maxFirstTokenWaitTime": 2500
            }
        },
    
        "LogConfig": {
            "dynamicLogLevel" : "",
            "dynamicLogLevelValidHours" : 2,
            "dynamicLogLevelValidTime" : ""
        }
    }

     

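  • Startup script: this is what makes the container run start.sh automatically each time it restarts.
    #!/bin/bash
    set -euo pipefail
    # The ${VAR:+...} form keeps `set -u` from aborting when LD_LIBRARY_PATH is unset and avoids a trailing ":".
    export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/torch/lib/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    # Expose the service monitoring metrics in Prometheus format; the port is the metricsPort value in config.json.
    export MIES_SERVICE_MONITOR_MODE=1
    cd "$MIES_INSTALL_PATH" || exit 1

    echo "[$(date)] Stopping any old mindieservice_daemon process..."
    pkill -9 -f mindie || true
    sleep 5

    echo "[$(date)] Starting mindieservice_daemon..."
    # Run in the background (&); otherwise the monitoring loop below is only reached after the daemon has exited.
    nohup ./bin/mindieservice_daemon > output.log 2>&1 &

    # Monitor the service process and exit when it exits (the key point: the script lives as long as the service does).
    echo "[$(date)] Monitoring mindieservice_daemon..."
    while pgrep -f mindieservice_daemon > /dev/null; do
        sleep 1
    done
    echo "[$(date)] mindieservice_daemon exited; exiting script"
    exit 1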
  • Test (the ports are the port and metricsPort values from the injected config.json):
    # Model access
    curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model": "im-30b","messages": [{"role": "user", "content": "What model are you?"},{"role": "assistant", "content": "Hello"}],"stream": false}' http://192.168.68.16:1025/v1/chat/completions
    # Monitoring access (Prometheus format)
    curl http://192.168.68.16:1027/metrics
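
  The startup script in this step keeps the container alive by polling pgrep. An alternative sketch is shown below: it waits on the child process directly and forwards stop signals to it. docker stop sends SIGTERM to the container's main process, and a shell running as PID 1 ignores signals it has no handler for, so without a trap the container is typically killed only after the stop timeout. run_supervised is a hypothetical helper name, and whether mindieservice_daemon shuts down cleanly on SIGTERM is an assumption I have not verified.

```shell
#!/bin/sh
# Run a command in the background, forward TERM/INT to it, and return its
# exit status. run_supervised is an illustrative helper, not part of MindIE.
run_supervised() {
    "$@" &
    sup_child=$!
    trap 'kill -TERM "$sup_child" 2>/dev/null' TERM INT
    sup_status=0
    wait "$sup_child" || sup_status=$?
    # A trapped signal interrupts wait early; if the child is still running,
    # wait again to collect its real exit status.
    if kill -0 "$sup_child" 2>/dev/null; then
        sup_status=0
        wait "$sup_child" || sup_status=$?
    fi
    trap - TERM INT
    return "$sup_status"
}

# Usage inside start.sh, replacing the nohup line and the pgrep loop:
# run_supervised ./bin/mindieservice_daemon > output.log 2>&1
```

  With --restart always, the non-zero status returned when the daemon crashes still causes Docker to restart the container, as with the polling version.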

     

   9. Tuning

 

posted @ 2026-03-06 14:44  badwood