alex_bn_lee


[820] Python and R: controlling column data types when reading a CSV file

Reference: PySe-023-pandas.read_csv: reading a CSV file with specified column data types, to stop string columns from being converted to numbers

Reference: Read a delimited file (including CSV and TSV) into a tibble


Python: specify the data type by column name

import pandas as pd
# Read the "id" column as a string. Without dtype, pandas infers the type,
# so a numeric-looking column would typically be stored as an integer.
df = pd.read_csv("df.csv", dtype={"id": str})
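To make the difference concrete, here is a minimal sketch that reads the same data twice, once with inferred types and once with `dtype={"id": str}`. The CSV content and the `score` column are made up for illustration; an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up sample data: the ids have leading zeros that matter.
csv_text = "id,score\n001,9.5\n002,7.0\n"

# Without dtype, pandas infers "id" as an integer and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype={"id": str}, the values are kept exactly as written in the file.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(inferred["id"].tolist())  # [1, 2]
print(typed["id"].tolist())     # ['001', '002']
```

Columns not named in `dtype` (here `score`) are still inferred as usual.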

R: use a compact string of one-letter abbreviations for the column types

# "Ddc": column 1 is a date (D), column 2 a double (d), column 3 a character (c)
df <- readr::read_csv("df_eu_sim.csv", col_types = "Ddc")

Details are as follows:

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character
  • i = integer
  • n = number
  • d = double
  • l = logical
  • f = factor
  • D = date
  • T = date time
  • t = time
  • ? = guess
  • _ or - = skip
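pandas has no built-in equivalent of this compact string, but a rough counterpart can be sketched on top of `dtype`, `parse_dates`, and `usecols`. The helper `read_csv_compact` and the `CODE_TO_DTYPE` mapping below are hypothetical names invented for this example, not part of pandas or readr, and only a subset of the codes (c, i, d, D, ?, and _ or -) is handled.

```python
import io
import pandas as pd

# Hypothetical mapping from a few readr codes to pandas dtypes (an assumption,
# not an official correspondence).
CODE_TO_DTYPE = {"c": "string", "i": "Int64", "d": "float64"}

def read_csv_compact(csv_text, spec):
    """Read CSV text, typing each column by one character of `spec`."""
    # Read only the header row to learn the column names.
    names = list(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)
    if len(names) != len(spec):
        raise ValueError("spec needs exactly one character per column")

    dtype, dates, keep = {}, [], []
    for name, code in zip(names, spec):
        if code in ("_", "-"):
            continue              # skip this column entirely
        keep.append(name)
        if code == "D":
            dates.append(name)    # parsed as a date by pandas
        elif code == "?":
            pass                  # let pandas guess the type
        elif code in CODE_TO_DTYPE:
            dtype[name] = CODE_TO_DTYPE[code]
        else:
            raise ValueError(f"unsupported code: {code!r}")

    return pd.read_csv(
        io.StringIO(csv_text),
        usecols=keep,
        dtype=dtype,
        parse_dates=dates or False,
    )

# Made-up sample data, typed as date, double, character (like "Ddc" above).
sample = "date,value,id\n2023-03-13,1.5,001\n"
df = read_csv_compact(sample, "Ddc")
print(df.dtypes)
```

Because pandas assigns types by column name rather than by position, the sketch reads the header first and then translates each position in the string into a per-name entry.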

 

Posted on 2023-03-13 17:53 by McDelfino