Saving and Reading List Data in Pandas
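In Pandas we often save data in CSV format. A CSV file is essentially plain text in which values are separated by a delimiter, so a value's data type can be lost when it is saved to CSV. For example, a cell holding a Python list is stored as the string form of that list. When the CSV file is read back, the data that used to be lists now appears as strings, which causes problems in later processing. The strings therefore have to be converted back into lists.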
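A minimal in-memory sketch of this round trip, assuming a small hypothetical two-row DataFrame and using `io.StringIO` in place of a file on disk. It shows the raw CSV text that `to_csv` produces for a list column, and that the cell read back by `read_csv` is a `str`, not a `list`:

```python
import io

import pandas as pd

# Hypothetical sample data: one column whose cells are Python lists
df = pd.DataFrame({'URL_domains': [['wa.me', 't.co'], ['test.in']]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
# Each list is written as its str() form; the field containing a comma is quoted.
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back: each cell is now a string, not a list
df_back = pd.read_csv(io.StringIO(csv_text))
first_cell = df_back.loc[0, 'URL_domains']
print(type(first_cell), repr(first_cell))
```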
Code example:
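import pandas as pd
# Each cell of URL_domains holds a Python list of domain names
df1 = pd.DataFrame({'URL_domains': [['wa.me', 't.co', 'goo.gl', 'fb.com'], ['tinyurl.com', 'bit.ly'], ['test.in']]})
print(df1)
# len() of a list is the number of domains it contains
df1['len_of_url_list'] = df1['URL_domains'].map(len)
print(df1)
# Save to CSV; the list cells are written out as their string form
df1.to_csv('test.csv', encoding='utf8', index=False)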
output:
URL_domains
0 [wa.me, t.co, goo.gl, fb.com]
1 [tinyurl.com, bit.ly]
2 [test.in]
URL_domains len_of_url_list
0 [wa.me, t.co, goo.gl, fb.com] 4
1 [tinyurl.com, bit.ly] 2
2 [test.in] 1
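One way to avoid depending on the `str()` form of a list is to serialize each list explicitly before saving. The sketch below swaps in JSON (`json.dumps` when writing, `json.loads` when reading) as the cell format. This is an alternative to the approach in this post, not the author's method, and it uses an in-memory buffer as a stand-in for `test.csv`:

```python
import io
import json

import pandas as pd

df1 = pd.DataFrame({'URL_domains': [['wa.me', 't.co', 'goo.gl', 'fb.com'],
                                    ['tinyurl.com', 'bit.ly'],
                                    ['test.in']]})

# Encode each list as a JSON string before writing, so the cell format is explicit
to_save = df1.copy()
to_save['URL_domains'] = to_save['URL_domains'].map(json.dumps)
csv_text = to_save.to_csv(index=False)  # in-memory stand-in for test.csv

# After reading, decode the JSON strings back into lists
df2 = pd.read_csv(io.StringIO(csv_text))
df2['URL_domains'] = df2['URL_domains'].map(json.loads)
print(df2['URL_domains'].map(len).tolist())  # [4, 2, 1]
```

JSON has a defined specification and parsers exist outside Python, which can help if the CSV is consumed by other tools.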
Reopening the saved data:
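import ast
import pandas as pd
df2 = pd.read_csv('test.csv')
print(df2)
# URL_domains is now a column of strings, so len() counts characters, not domains
df2['len_url_reopen'] = df2['URL_domains'].map(len)
print(df2)
# ast.literal_eval safely parses each string back into a Python list
df2['len_url_transform'] = df2['URL_domains'].map(ast.literal_eval).map(len)
print(df2)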
output:
URL_domains len_of_url_list
0 ['wa.me', 't.co', 'goo.gl', 'fb.com'] 4
1 ['tinyurl.com', 'bit.ly'] 2
2 ['test.in'] 1
URL_domains len_of_url_list len_url_reopen
0 ['wa.me', 't.co', 'goo.gl', 'fb.com'] 4 37
1 ['tinyurl.com', 'bit.ly'] 2 25
2 ['test.in'] 1 11
URL_domains len_of_url_list len_url_reopen len_url_transform
0 ['wa.me', 't.co', 'goo.gl', 'fb.com'] 4 37 4
1 ['tinyurl.com', 'bit.ly'] 2 25 2
2 ['test.in'] 1 11 1
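The same conversion can also be done at load time through the `converters` parameter of `pd.read_csv`, which applies a function to each raw value of the named column. In the sketch below, `parse_list` is a hypothetical helper. Mapping empty cells to an empty list is an assumption about the desired behaviour, and the CSV text is an inline copy of what `test.csv` is expected to contain:

```python
import ast
import io
import math

import pandas as pd


def parse_list(value):
    """Hypothetical helper: turn a stored list string such as "['a', 'b']" back into a list.

    Empty cells may arrive as '' or as NaN depending on how the column is read;
    both are mapped to an empty list here (an assumption, not pandas behaviour).
    """
    if isinstance(value, str):
        return ast.literal_eval(value) if value.strip() else []
    if isinstance(value, float) and math.isnan(value):
        return []
    return value  # already a list (or another non-string value): leave unchanged


# Inline stand-in for the contents of test.csv
csv_text = (
    "URL_domains,len_of_url_list\n"
    "\"['wa.me', 't.co', 'goo.gl', 'fb.com']\",4\n"
    "\"['tinyurl.com', 'bit.ly']\",2\n"
    "['test.in'],1\n"
)

# converters parses URL_domains while reading, so no separate map step is needed
df3 = pd.read_csv(io.StringIO(csv_text), converters={'URL_domains': parse_list})
df3['len_url_transform'] = df3['URL_domains'].map(len)
print(df3)
```

`ast.literal_eval` only evaluates Python literals (strings, numbers, lists, dicts and so on), so unlike `eval` it will not execute arbitrary code found in the file.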