【820】Python and R: controlling column data types when reading CSV files
Reference: PySe-023-pandas.read_csv: reading a CSV file with specified column data types, to fix string columns being converted to numbers
Reference: Read a delimited file (including CSV and TSV) into a tibble
Python: specify the data type for individual columns by name
import pandas as pd
# Store the "id" column as strings; without dtype, pandas infers the type and would likely read it as int
pd.read_csv("df.csv", dtype={"id": str})
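To make the effect of dtype concrete, here is a minimal sketch that reads the same data twice, once with type inference and once with "id" forced to str. The in-memory CSV text and the "value" column are made up for illustration; only the "id" column and the dtype argument come from the snippet above.

```python
import io

import pandas as pd

# Hypothetical sample data: the "id" values carry leading zeros.
csv_text = "id,value\n001,1.5\n002,2.5\n"

# Default inference: "id" is parsed as an integer column, so "001" becomes 1.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit dtype: "id" is kept as text, so the leading zeros survive.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(inferred["id"].tolist())
print(typed["id"].tolist())
```

The first list contains integers and the second contains the original strings, which is why identifiers such as codes or zip codes are usually safer to read as str.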
R: use a compact string of one-letter abbreviations instead of spelling out each column's type
df <- readr::read_csv("df_eu_sim.csv", col_types = "Ddc")
Details are as follows:
col_types
One of NULL, a cols() specification, or a string. See vignette("readr") for more details.
If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.
Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
Alternatively, you can use a compact string representation where each character represents one column:
- c = character
- i = integer
- n = number
- d = double
- l = logical
- f = factor
- D = date
- T = date time
- t = time
- ? = guess
- _ or - = skip
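As a rough illustration of how a compact string such as "Ddc" lines up with columns by position, the following Python sketch expands a spec into a name-to-type mapping using the abbreviations listed above. It is not readr's implementation; the helper expand_col_types and the column names are hypothetical and exist only for this example.

```python
# Abbreviations copied from the list above.
READR_ABBREVIATIONS = {
    "c": "character", "i": "integer", "n": "number", "d": "double",
    "l": "logical", "f": "factor", "D": "date", "T": "date time",
    "t": "time", "?": "guess", "_": "skip", "-": "skip",
}


def expand_col_types(spec, column_names):
    """Pair each column name with the type named by its character in spec."""
    if len(spec) != len(column_names):
        raise ValueError("spec needs exactly one character per column")
    return {name: READR_ABBREVIATIONS[ch] for name, ch in zip(column_names, spec)}


# Hypothetical column names for a three-column file read with "Ddc".
expanded = expand_col_types("Ddc", ["date", "value", "country"])
print(expanded)
```

Read this way, "Ddc" means the first column is a date, the second a double, and the third a character column.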