Dataset formats for benchmarks

Note: the datasets fall into two types: generative (the answer is natural language, with no fixed length or format) and selection (the answer is chosen from a fixed set of options, such as A, B, C, or D).
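
To make the distinction concrete, here is a minimal sketch of what one sample of each type might look like. The field names ("type", "question", "options", "answer"), the sample contents, and the validate_sample helper are all assumptions made for illustration; the actual benchmark files may use a different schema.

```python
# Illustrative samples only; field names and contents are hypothetical.
generative_sample = {
    "type": "generative",
    "question": "Explain why the sky appears blue.",
    # Free-form natural language: no fixed length or format.
    "answer": "Air molecules scatter sunlight, and blue light scatters the most.",
}

selection_sample = {
    "type": "selection",
    "question": "Which planet is closest to the Sun?",
    "options": {"A": "Venus", "B": "Mercury", "C": "Earth", "D": "Mars"},
    # The answer is one of the option labels.
    "answer": "B",
}


def validate_sample(sample: dict) -> bool:
    """Return True if the sample matches the assumed shape for its type."""
    if not isinstance(sample.get("question"), str):
        return False
    answer = sample.get("answer")
    if not isinstance(answer, str):
        return False
    if sample.get("type") == "generative":
        # Generative: only require a non-empty text answer.
        return bool(answer.strip())
    if sample.get("type") == "selection":
        # Selection: the answer must be one of the option labels.
        options = sample.get("options")
        return isinstance(options, dict) and answer in options
    return False
```

Under these assumptions, a generative answer is typically scored by comparing text, while a selection answer can be checked by exact match against the option label.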