Dataset formats of benchmarks
Note: the datasets fall into two types: generative (the answer is natural language, with no fixed length or format) and selection (the answer is chosen from a fixed set of options, such as A, B, C, or D).
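To illustrate the two types, the Python sketch below models each as a record and scores it. The class and function names (`SelectionSample`, `GenerativeSample`, `normalize`, `score_selection`, `score_generative`) are hypothetical and do not reflect the actual schema of any benchmark listed here. The generative check is a normalized exact match, similar in spirit to the SQuAD evaluation script. The selection check is deliberately simplistic: it takes the first standalone option label found in the model output.

```python
from __future__ import annotations

import re
import string
from dataclasses import dataclass


# Hypothetical record for a selection-type item (e.g. pick one of A/B/C/D).
@dataclass
class SelectionSample:
    question: str
    options: dict[str, str]  # option label -> option text
    answer: str              # correct option label, e.g. "B"


# Hypothetical record for a generative-type item (free-form answer).
@dataclass
class GenerativeSample:
    question: str
    answers: list[str]       # one or more acceptable reference answers


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def score_selection(sample: SelectionSample, prediction: str) -> bool:
    """Correct if the first standalone option label in the output is the answer.

    Simplistic assumption: an output with no recognizable label counts as wrong.
    """
    labels = "|".join(re.escape(label) for label in sample.options)
    match = re.search(rf"\b({labels})\b", prediction)
    return match is not None and match.group(1) == sample.answer


def score_generative(sample: GenerativeSample, prediction: str) -> bool:
    """Correct if the normalized output equals any normalized reference answer."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in sample.answers)


if __name__ == "__main__":
    mc = SelectionSample("2 + 2 = ?", {"A": "3", "B": "4", "C": "5", "D": "6"}, "B")
    print(score_selection(mc, "The answer is B."))  # True

    qa = GenerativeSample("What is the capital of France?", ["Paris"])
    print(score_generative(qa, "paris."))  # True
```

A selection item needs only a single label comparison, while a generative item must be normalized before comparison. Real benchmarks often add task-specific checks on top of this. For example, code benchmarks such as HumanEval and MBPP are scored by running unit tests.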
MMLU

TriviaQA

GSM8K

HumanEval

BBH (BIG-Bench Hard)

HellaSwag

NQ (Natural Questions)

MBPP

PIQA

SIQA

ARC

WinoGrande

OpenBookQA

CommonsenseQA

SQuAD

QuAC

BoolQ
