HumanEval benchmark
Use the code-eval harness to run HumanEval.
Commands:
# clone the evaluation harness
git clone https://github.com/abacaj/code-eval.git
cd code-eval
# set up an isolated Python 3.10 environment
conda create -n human_eval python=3.10
conda activate human_eval
pip install -r requirements.txt
# generate completions for the HumanEval problems (writes ./results/llama/eval.jsonl)
python eval_llama.py
# score the completions against the HumanEval unit tests and report pass@k
evaluate_functional_correctness ./results/llama/eval.jsonl
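For reference, evaluate_functional_correctness scores each problem with the unbiased pass@k estimator from the HumanEval paper. A minimal sketch of that estimator (assuming n generated samples per problem, of which c pass the unit tests):

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k),
    computed as a stable product to avoid large binomials."""
    if n - c < k:
        # every size-k subset must contain at least one passing sample
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 10 samples per problem, 3 passing -> pass@1 estimate of 0.3
print(pass_at_k(n=10, c=3, k=1))

The per-problem estimates are then averaged over all 164 HumanEval tasks to give the reported pass@k score.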
Results:
Llama 2 7B:

Llama 2 13B:

Mistral 7B:

Mixtral?
