Mistral & Mixtral setup & benchmark

Use Hugging Face's text-generation-inference (TGI) to serve the model.

Run the following command to launch the server:

docker run --gpus all --shm-size 1g -p 3000:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:1.3.3 \
    --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --num-shard 4 \
    --max-batch-total-tokens 1024000 \
    --max-total-tokens 32000
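
Loading the shards takes a few minutes; below is a minimal readiness check, assuming the /health and /info routes exposed by this TGI version:

# poll until the model is loaded; /health returns 200 once ready
until curl -s -o /dev/null -w "%{http_code}" 127.0.0.1:3000/health | grep -q 200; do
    sleep 5
done
# print the served model id, shard count, and token limits
curl 127.0.0.1:3000/info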

To access the server's Docker container from a local machine, build a bridge with an SSH tunnel (-f backgrounds ssh, -N skips running a remote command, -L forwards local port 3000 to the server's port 3000):
ssh -f -N -L 3000:localhost:3000 ludaze@10.96.15.227
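
To confirm the tunnel is listening, or to tear it down later, a sketch (the pkill pattern assumes the exact command above):

# verify local port 3000 is being forwarded
lsof -iTCP:3000 -sTCP:LISTEN
# close the backgrounded tunnel when finished
pkill -f "ssh -f -N -L 3000:localhost:3000"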

Then query the endpoint:

curl 127.0.0.1:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
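
A successful call returns JSON with a generated_text field. For token-by-token output, TGI also exposes a streaming route; a sketch using the same payload:

# the non-streaming response looks roughly like:
# {"generated_text":"\n\nDeep learning is a subset of machine learning ..."}

# stream tokens back as server-sent events
curl 127.0.0.1:3000/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'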

benchmark

MT-Bench

git clone https://github.com/lm-sys/FastChat.git
conda create -n mt_bench python=3.10
conda activate mt_bench
cd FastChat
pip install -e ".[model_worker,llm_judge]"
cd fastchat/llm_judge
python gen_model_answer.py --model-path /home/ludaze/Docker/Llama/Mixtral/Mixtral-8x7B-v0.1 --model-id Mixtral-8x7B-v0.1 --num-gpus-per-model 4 --num-gpus-total 4
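
gen_model_answer.py only writes the model's answers; MT-Bench scores come from a separate judge model. A sketch of the usual follow-up in FastChat's llm_judge, assuming GPT-4 as the judge (the API key value is a placeholder):

# grade the generated answers with GPT-4, the default MT-Bench judge
export OPENAI_API_KEY=<your-key>
python gen_judgment.py --model-list Mixtral-8x7B-v0.1 --parallel 2
# aggregate and print the per-model scores
python show_result.py --model-list Mixtral-8x7B-v0.1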