Dataset formats of benchmarks

Note: the datasets fall into two types: generative (the answer is free-form natural language whose length and content have no fixed format) and selection (the answer is chosen from a fixed set of options, for example one of A, B, C, or D).

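To make the two types concrete, here is a minimal Python sketch of how each is commonly scored. The function names and normalization rules are hypothetical choices, not taken from any specific evaluation framework: a selection answer is reduced to its option letter and compared exactly, while a generative answer is normalized (lowercased, punctuation and English articles removed) and compared against one or more reference strings.

```python
import re
import string

def score_selection(prediction: str, gold_letter: str) -> bool:
    """Selection type: take the first standalone capital letter A-E (naive rule)."""
    match = re.search(r"\b([A-E])\b", prediction)
    return match is not None and match.group(1) == gold_letter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def score_generative(prediction: str, references: list) -> bool:
    """Generative type: exact match after normalization against any reference."""
    return normalize(prediction) in {normalize(ref) for ref in references}

print(score_selection("The answer is B.", "B"))                 # True
print(score_generative("The Eiffel Tower!", ["Eiffel Tower"]))  # True
```

The letter rule is deliberately simple; it would, for instance, misread a sentence starting with the article "A", which is why real harnesses usually constrain the output format.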

mmlu

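MMLU is a selection-type dataset with four options per question. The sketch below assumes the field names of the Hugging Face `cais/mmlu` release (`question`, `subject`, `choices`, and `answer` stored as an index from 0 to 3); the record values and the prompt template are invented for illustration.

```python
# Hypothetical record in the cais/mmlu layout; values are invented.
record = {
    "question": "What is 2 + 3?",
    "subject": "elementary_mathematics",
    "choices": ["4", "5", "6", "7"],
    "answer": 1,  # index into choices, i.e. option B
}

LETTERS = "ABCD"

def to_prompt(rec: dict) -> tuple:
    """Render the question with lettered options; return (prompt, gold letter)."""
    lines = [rec["question"]]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(rec["choices"])]
    lines.append("Answer:")
    return "\n".join(lines), LETTERS[rec["answer"]]

prompt, gold = to_prompt(record)
print(prompt)
print(gold)  # B
```

In the original CSV release from the MMLU authors, the four options and the answer letter are stored as separate columns instead of a list and an index.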
triviaqa

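TriviaQA is a generative dataset: the model writes a short answer, which is usually counted as correct if it matches any accepted alias. The record below keeps only a subset of the fields in the Hugging Face `trivia_qa` release (`question`, plus `value`, `aliases`, and `normalized_aliases` inside `answer`); the evidence fields such as `entity_pages` and `search_results` are omitted, the values are invented, and the normalization is a simplified assumption.

```python
import string

# Hypothetical, trimmed record in the trivia_qa layout; values are invented.
record = {
    "question": "Which planet is known as the Red Planet?",
    "answer": {
        "value": "Mars",
        "aliases": ["Mars", "Planet Mars"],
        "normalized_aliases": ["mars", "planet mars"],
    },
}

def normalize(text: str) -> str:
    """Simplified normalization: lowercase, strip punctuation and extra spaces."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def is_correct(prediction: str, rec: dict) -> bool:
    """Correct if the normalized prediction equals any normalized alias."""
    return normalize(prediction) in rec["answer"]["normalized_aliases"]

print(is_correct("Mars.", record))  # True
```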
gsm8k

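GSM8K is generative: the `answer` field holds a worked solution, with calculator annotations in `<<...>>` and the final numeric answer after a `####` marker. The sketch assumes the two-field layout (`question`, `answer`) of the Hugging Face `gsm8k` release; the record is invented, and taking the last number from the model output is one common convention, not the only one.

```python
import re
from typing import Optional

# Hypothetical record in the gsm8k layout; values are invented.
record = {
    "question": "Tom has 3 boxes with 4 apples each. How many apples does he have?",
    "answer": "Each box has 4 apples, so 3 boxes have 3*4=<<3*4=12>>12 apples.\n#### 12",
}

def final_answer(solution: str) -> str:
    """Return the text after the '####' marker, without thousands separators."""
    return solution.split("####")[-1].strip().replace(",", "")

def extract_prediction(output: str) -> Optional[str]:
    """Hypothetical rule for model output: take the last number in the text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output.replace(",", ""))
    return numbers[-1] if numbers else None

print(final_answer(record["answer"]))              # 12
print(extract_prediction("So he has 12 apples."))  # 12
```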
HumanEval

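HumanEval is a generative (code) dataset. Assuming the field names of the Hugging Face `openai_humaneval` release (`task_id`, `prompt`, `canonical_solution`, `test`, `entry_point`), a model completion is appended to `prompt`, and the `test` code defines a `check(candidate)` function that is called on the function named by `entry_point`. The toy task below is invented, and running model output with `exec` like this is only acceptable as a demonstration; real harnesses sandbox it.

```python
# Hypothetical task in the openai_humaneval layout; contents are invented.
task = {
    "task_id": "HumanEval/demo",
    "prompt": 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n    assert candidate(-1, 1) == 0\n",
    "entry_point": "add",
}

def passes(task: dict, completion: str) -> bool:
    """Run prompt + completion + test in a fresh namespace; True if check() passes."""
    namespace: dict = {}
    try:
        exec(task["prompt"] + completion + "\n" + task["test"], namespace)
        namespace["check"](namespace[task["entry_point"]])
        return True
    except Exception:
        return False

print(passes(task, task["canonical_solution"]))  # True
print(passes(task, "    return a - b\n"))        # False
```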
bbh

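BBH (BIG-Bench Hard) mixes both types: some tasks are multiple choice with targets such as `(A)`, others expect a free-form string such as `True` or a number, and predictions are typically scored by exact match against `target`. The sketch assumes the per-task JSON layout of the BIG-Bench-Hard repository (a top-level `examples` list of `input`/`target` pairs); the example content is invented.

```python
import json

# Hypothetical task file in the BBH layout; contents are invented.
task_json = """
{
  "examples": [
    {"input": "not ( True ) and ( True ) is", "target": "False"},
    {"input": "Which option is a fruit?\\nOptions:\\n(A) carrot\\n(B) apple", "target": "(B)"}
  ]
}
"""

examples = json.loads(task_json)["examples"]

def exact_match(prediction: str, target: str) -> bool:
    """Compare the whitespace-stripped prediction to the target string."""
    return prediction.strip() == target

predictions = ["False", " (B) "]
hits = [exact_match(p, ex["target"]) for p, ex in zip(predictions, examples)]
accuracy = sum(hits) / len(examples)
print(accuracy)  # 1.0
```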
hellaswag

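HellaSwag is a selection dataset: given a context, pick the most plausible of four endings. The sketch assumes a subset of the fields in the Hugging Face `hellaswag` release (`activity_label`, `ctx`, `endings`, and `label`, which is stored as a string such as `"2"`); the record is invented, and `toy_score` is a hypothetical stand-in for the language-model log-likelihood that evaluations commonly use to rank the endings.

```python
# Hypothetical, trimmed record in the hellaswag layout; values are invented.
record = {
    "activity_label": "Making tea",
    "ctx": "A man fills a kettle with water. He",
    "endings": [
        "throws the kettle into the sky.",
        "paints the kettle blue.",
        "puts the kettle on the stove and turns it on.",
        "eats the kettle.",
    ],
    "label": "2",
}

def toy_score(context: str, ending: str) -> float:
    """Hypothetical scorer; a real evaluation would use the model's log-likelihood."""
    return 1.0 if "stove" in ending else 0.0

def predict(rec: dict) -> int:
    """Index of the highest-scoring ending."""
    scores = [toy_score(rec["ctx"], ending) for ending in rec["endings"]]
    return max(range(len(scores)), key=scores.__getitem__)

print(predict(record) == int(record["label"]))  # True
```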
NQ (Natural Questions)

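NQ is often evaluated in its open-domain form, where each question comes with a short list of acceptable answers and no supporting document. The sketch assumes the layout of the Hugging Face `nq_open` release (`question`, and `answer` as a list of strings), which may differ from what the original figure showed; the record is invented. It is a generative dataset, scored here by normalized exact match against any reference.

```python
import re
import string

# Hypothetical record in the nq_open layout; values are invented.
record = {
    "question": "who wrote the novel moby dick",
    "answer": ["Herman Melville", "Melville"],
}

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(re.sub(r"\b(a|an|the)\b", " ", text).split())

def exact_match(prediction: str, answers: list) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    return normalize(prediction) in {normalize(a) for a in answers}

print(exact_match("Herman Melville.", record["answer"]))  # True
```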
MBPP

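MBPP is a generative (code) dataset. Assuming the field names of the Hugging Face `mbpp` release (`task_id`, `text` for the problem description, `code` for the reference solution, `test_list` of assert statements, plus `test_setup_code` and `challenge_test_list`), a generated program counts as correct if every assertion in `test_list` passes. The task is invented, and as with any `exec`-based check, real harnesses sandbox the code.

```python
# Hypothetical task in the mbpp layout; contents are invented.
task = {
    "task_id": 0,
    "text": "Write a function to return the square of a number.",
    "code": "def square(x):\n    return x * x",
    "test_list": ["assert square(3) == 9", "assert square(-2) == 4"],
    "test_setup_code": "",
    "challenge_test_list": [],
}

def passes(task: dict, program: str) -> bool:
    """Execute setup code, the candidate program, then each assert in test_list."""
    namespace: dict = {}
    try:
        exec(task["test_setup_code"], namespace)
        exec(program, namespace)
        for test in task["test_list"]:
            exec(test, namespace)
        return True
    except Exception:
        return False

print(passes(task, task["code"]))                        # True
print(passes(task, "def square(x):\n    return 2 * x"))  # False
```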
PIQA

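PIQA is a two-way selection dataset: given a goal, choose the better of two solutions. The sketch assumes the Hugging Face `piqa` layout (`goal`, `sol1`, `sol2`, and an integer `label` of 0 or 1); the record and the A/B prompt template are invented.

```python
# Hypothetical record in the piqa layout; values are invented.
record = {
    "goal": "Keep a bottle of water cold for longer.",
    "sol1": "Wrap the bottle in a wet towel and leave it in the shade.",
    "sol2": "Leave the bottle on a sunny windowsill.",
    "label": 0,  # 0 -> sol1 is correct, 1 -> sol2 is correct
}

def to_prompt(rec: dict) -> tuple:
    """Render as a two-option question; return (prompt, gold letter)."""
    prompt = f"Goal: {rec['goal']}\nA. {rec['sol1']}\nB. {rec['sol2']}\nAnswer:"
    return prompt, "AB"[rec["label"]]

prompt, gold = to_prompt(record)
print(gold)  # A
```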
SIQA

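SIQA (Social IQa) is a three-way selection dataset about everyday social situations. The sketch assumes the Hugging Face `social_i_qa` layout (`context`, `question`, `answerA`, `answerB`, `answerC`, and `label` stored as the string `"1"`, `"2"`, or `"3"`, i.e. 1-based); the record is invented.

```python
# Hypothetical record in the social_i_qa layout; values are invented.
record = {
    "context": "Alex stayed up all night to finish the report for Jordan.",
    "question": "How would Alex feel afterwards?",
    "answerA": "energetic",
    "answerB": "tired",
    "answerC": "bored",
    "label": "2",  # 1-based: "2" -> answerB
}

def gold_answer(rec: dict) -> str:
    """Map the 1-based string label to the text of the correct option."""
    options = [rec["answerA"], rec["answerB"], rec["answerC"]]
    return options[int(rec["label"]) - 1]

print(gold_answer(record))  # tired
```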
ARC

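ARC (AI2 Reasoning Challenge, with Easy and Challenge subsets) is a selection dataset. The sketch assumes the Hugging Face `ai2_arc` layout (`id`, `question`, `choices` with parallel `text` and `label` lists, and `answerKey`). Because some questions use numeric labels such as `"1"` to `"4"` instead of letters, and the number of options varies, the gold option is looked up by position in `choices["label"]` rather than by assuming A to D. The record is invented.

```python
# Hypothetical record in the ai2_arc layout; values are invented.
record = {
    "id": "demo-1",
    "question": "Which of these is a source of light?",
    "choices": {"text": ["a mirror", "the Sun", "the Moon"], "label": ["1", "2", "3"]},
    "answerKey": "2",
}

def gold_text(rec: dict) -> str:
    """Find the correct option by matching answerKey against the label list."""
    index = rec["choices"]["label"].index(rec["answerKey"])
    return rec["choices"]["text"][index]

print(gold_text(record))  # the Sun
```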
winogrande

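WinoGrande is a two-way selection dataset posed as fill-in-the-blank: `sentence` contains a single `_`, and `answer` (`"1"` or `"2"`) says which of `option1` or `option2` fills it. The sketch assumes the Hugging Face `winogrande` layout; the record is invented. One common way to evaluate it is to substitute each option into the blank and compare the model's likelihood of the two resulting sentences, so the sketch only builds those candidates.

```python
# Hypothetical record in the winogrande layout; values are invented.
record = {
    "sentence": "Maria gave her umbrella to Sarah because _ was going out in the rain.",
    "option1": "Maria",
    "option2": "Sarah",
    "answer": "2",
}

def candidates(rec: dict) -> list:
    """Fill the blank with each option, giving two full sentences to compare."""
    return [rec["sentence"].replace("_", rec[f"option{i}"]) for i in (1, 2)]

filled = candidates(record)
print(filled[int(record["answer"]) - 1])
```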
OpenBookQA

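OpenBookQA is a four-way selection dataset of elementary science questions. The sketch assumes the main configuration of the Hugging Face `openbookqa` release (`id`, `question_stem`, `choices` with parallel `text` and `label` lists, and `answerKey`); note that the question field is `question_stem`, not `question`. The record and the prompt template are invented.

```python
# Hypothetical record in the openbookqa layout; values are invented.
record = {
    "id": "demo-1",
    "question_stem": "Plants mainly get the energy they need from",
    "choices": {"text": ["soil", "sunlight", "rocks", "wind"], "label": ["A", "B", "C", "D"]},
    "answerKey": "B",
}

def to_prompt(rec: dict) -> str:
    """Render the stem followed by lettered options."""
    options = zip(rec["choices"]["label"], rec["choices"]["text"])
    body = "\n".join(f"{label}. {text}" for label, text in options)
    return f"{rec['question_stem']}\n{body}\nAnswer:"

print(to_prompt(record))
```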
commonsense_qa

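CommonsenseQA is a five-way selection dataset (options A to E). The sketch assumes the Hugging Face `commonsense_qa` layout (`id`, `question`, `question_concept`, `choices` with parallel `label` and `text` lists, and `answerKey`); the record is invented. Pairing the two parallel lists into a dictionary makes it easy to check a predicted letter or recover the gold answer text.

```python
# Hypothetical record in the commonsense_qa layout; values are invented.
record = {
    "id": "demo-1",
    "question": "Where would you most likely keep milk cold?",
    "question_concept": "milk",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["oven", "refrigerator", "bookshelf", "garden", "mailbox"],
    },
    "answerKey": "B",
}

# Map each option letter to its text, e.g. "B" -> "refrigerator".
options = dict(zip(record["choices"]["label"], record["choices"]["text"]))

def is_correct(predicted_letter: str, rec: dict) -> bool:
    """A prediction is correct when its letter equals answerKey."""
    return predicted_letter == rec["answerKey"]

print(options[record["answerKey"]])  # refrigerator
```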
squad

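SQuAD is generative in the note's sense, although the answer is always a span of the given passage: `answers` holds parallel lists `text` and `answer_start` (character offsets into `context`), possibly with several reference annotations. The sketch assumes the Hugging Face `squad` (v1.1) layout (`id`, `title`, `context`, `question`, `answers`); the record is invented, and the token-level F1 is a simplified version of the official metric, which also strips punctuation and articles and takes the maximum over all references.

```python
from collections import Counter

# Hypothetical record in the squad layout; values are invented.
context = "The library opened in 1901 and was expanded in 1955."
record = {
    "id": "demo-1",
    "title": "Library",
    "context": context,
    "question": "When did the library open?",
    "answers": {"text": ["1901"], "answer_start": [context.index("1901")]},
}

# answer_start is a character offset, so slicing the context recovers the span.
start = record["answers"]["answer_start"][0]
span = record["context"][start:start + len(record["answers"]["text"][0])]

def f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 (lowercase and whitespace split only)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(span)                   # 1901
print(f1("in 1901", "1901"))  # about 0.667
```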
quac

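QuAC is a generative, multi-turn dataset: a dialogue asks a sequence of questions about one Wikipedia section, later questions depend on earlier turns, and unanswerable questions use the special answer `CANNOTANSWER`. Field layouts differ between releases, so the sketch below uses a simplified, hypothetical structure (`background`, `context`, and a list of `turns`) rather than any official schema; the dialogue is invented. It shows one way to fold the dialogue history into each turn's prompt.

```python
# Simplified, hypothetical QuAC-style dialogue; the structure and values are invented.
dialogue = {
    "background": "Ada Lovelace was an English mathematician.",
    "context": "Lovelace worked with Charles Babbage on the Analytical Engine. CANNOTANSWER",
    "turns": [
        {"question": "Who did she work with?", "answer": "Charles Babbage"},
        {"question": "What was her favorite food?", "answer": "CANNOTANSWER"},
    ],
}

def build_prompts(d: dict) -> list:
    """For each turn, include the passage and all previous question/answer pairs."""
    prompts, history = [], []
    for turn in d["turns"]:
        lines = [d["background"], d["context"], *history, f"Q: {turn['question']}", "A:"]
        prompts.append("\n".join(lines))
        history += [f"Q: {turn['question']}", f"A: {turn['answer']}"]
    return prompts

prompts = build_prompts(dialogue)
print(prompts[1])
```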
boolq

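BoolQ is a yes/no selection dataset: each `question` is answered from a `passage`, and `answer` is a boolean. The sketch assumes the Hugging Face `boolq` layout (`question`, `answer`, `passage`), where questions are stored in lowercase without a question mark; the record and the prompt wording are invented.

```python
# Hypothetical record in the boolq layout; values are invented.
record = {
    "question": "is the pacific ocean larger than the atlantic ocean",
    "answer": True,
    "passage": "The Pacific Ocean is the largest of Earth's oceans, followed by the Atlantic.",
}

def to_example(rec: dict) -> tuple:
    """Render passage and question; map the boolean label to 'yes' or 'no'."""
    prompt = f"{rec['passage']}\nQuestion: {rec['question']}?\nAnswer:"
    return prompt, "yes" if rec["answer"] else "no"

prompt, gold = to_example(record)
print(gold)  # yes
```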
posted @ 2024-01-02 11:51  Daze_Lu