HumanEval benchmark

We use the code-eval repo (abacaj/code-eval) to run HumanEval.

Commands to set up the environment:
git clone https://github.com/abacaj/code-eval.git
cd code-eval
conda create -n human_eval python=3.10
conda activate human_eval
pip install -r requirements.txt

Generate completions (this writes ./results/llama/eval.jsonl):

python eval_llama.py
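For reference, a rough sketch of what the generation step inside eval_llama.py might look like, assuming the openai/human-eval helpers (installed via requirements.txt) and a Hugging Face causal LM; the model name and sampling settings below are illustrative assumptions, not the repo's exact code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

# assumption: any HF causal LM checkpoint works here
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

problems = read_problems()  # the 164 HumanEval tasks
samples = []
for task_id, problem in problems.items():
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=256,  # illustrative sampling settings
        do_sample=True,
        temperature=0.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    # keep only the newly generated tokens, not the prompt
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # each line of eval.jsonl needs a "task_id" and a "completion"
    samples.append(dict(task_id=task_id, completion=completion))

write_jsonl("results/llama/eval.jsonl", samples)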
Then score the completions against the HumanEval unit tests:

evaluate_functional_correctness ./results/llama/eval.jsonl
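evaluate_functional_correctness executes each completion against its task's test cases and reports pass@k. The human-eval package uses the unbiased estimator from the Codex paper; a minimal sketch of that formula:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # unbiased pass@k = 1 - C(n-c, k) / C(n, k),
    # with n samples per task and c of them correct
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 10 samples per task, 3 of them correct: pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))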

Results

Llama-2 7B:
[image: HumanEval results screenshot]

Llama-2 13B:
[image: HumanEval results screenshot]

Mistral 7B:
[image: HumanEval results screenshot]

Mixtral (?):
[image: HumanEval results screenshot]
