Mistral & Mixtral setup & benchmark
Use text-generation-inference (TGI) to set up the inference server.
Run the following command:
docker run --gpus all --shm-size 1g -p 3000:80 -v /data:/data ghcr.io/huggingface/text-generation-inference:1.3.3 \
--model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
--num-shard 4 \
--max-batch-total-tokens 1024000 \
--max-total-tokens 32000
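Here --num-shard 4 splits the model across 4 GPUs with tensor parallelism, --max-total-tokens caps the input-plus-output length of a single request, and --max-batch-total-tokens caps the total token budget across a batch. Once the container is running, a quick sanity check on the server itself (TGI's /info route; port 3000 comes from the -p 3000:80 mapping above):

curl 127.0.0.1:3000/info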
To access the Docker server from your local machine, set up an SSH tunnel with:
ssh -f -N -L 3000:localhost:3000 ludaze@10.96.15.227
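Here -f backgrounds ssh, -N skips running a remote command, and -L 3000:localhost:3000 forwards local port 3000 to port 3000 on the server. One way to tear the tunnel down later (the pkill pattern is just a convenience that matches the command line above):

pkill -f "ssh -f -N -L 3000:localhost:3000"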
Then run:
curl 127.0.0.1:3000/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
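TGI also serves a streaming variant of the same route; a minimal sketch with the same payload (/generate_stream returns tokens as server-sent events):

curl 127.0.0.1:3000/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'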
Benchmark
MT-Bench
git clone https://github.com/lm-sys/FastChat.git
conda create -n mt_bench python=3.10
conda activate mt_bench
cd FastChat
pip install -e ".[model_worker,llm_judge]"
cd ~/Docker/Llama/BenchMark/FastChat/fastchat/llm_judge
python gen_model_answer.py --model-path /home/ludaze/Docker/Llama/Mixtral/Mixtral-8x7B-v0.1 --model-id Mixtral-8x7B-v0.1 --num-gpus-per-model 4 --num-gpus-total 4
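After the answers are generated, MT-Bench scores them with an LLM judge. The usual follow-up in the same llm_judge directory looks like this (per the FastChat llm_judge README; gen_judgment.py calls the GPT-4 judge, so OPENAI_API_KEY must be set):

export OPENAI_API_KEY=<your-key>  # required for the GPT-4 judge
python gen_judgment.py --model-list Mixtral-8x7B-v0.1 --parallel 2
python show_result.py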