Saving and reading list data in Pandas

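In Pandas, we usually save data in CSV format. A CSV file is essentially a string in which values are separated by a delimiter, so the data types of the values can change when they are written to CSV. For example, a column that holds lists is stored as the string representation of each list. When we open the CSV file again, the data that used to be lists comes back as strings, which causes problems in later processing. The strings therefore need to be converted back into lists.
Code example: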

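import pandas as pd

# Each cell of URL_domains holds a Python list
df1 = pd.DataFrame({'URL_domains':[['wa.me','t.co','goo.gl','fb.com'],['tinyurl.com','bit.ly'],['test.in']]})
print(df1)
# len counts the elements of each list
df1['len_of_url_list'] = df1['URL_domains'].map(len)
print(df1)
# Writing to CSV stores each list as its string representation
df1.to_csv('test.csv', encoding='utf8', index=False)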

output:

                     URL_domains
0  [wa.me, t.co, goo.gl, fb.com]
1          [tinyurl.com, bit.ly]
2                      [test.in]
                     URL_domains  len_of_url_list
0  [wa.me, t.co, goo.gl, fb.com]                4
1          [tinyurl.com, bit.ly]                2
2                      [test.in]                1

Reading the saved data back:

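import ast
import pandas as pd

df2 = pd.read_csv('test.csv')
print(df2)
# After the round trip each cell is a str, so len counts characters
df2['len_url_reopen'] = df2['URL_domains'].map(len)
print(df2)
# Parse each string back into a list before taking its length
df2['len_url_transform'] = df2['URL_domains'].map(ast.literal_eval).map(len)
print(df2)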

output:

                             URL_domains  len_of_url_list
0  ['wa.me', 't.co', 'goo.gl', 'fb.com']                4
1              ['tinyurl.com', 'bit.ly']                2
2                            ['test.in']                1
                             URL_domains  len_of_url_list  len_url_reopen
0  ['wa.me', 't.co', 'goo.gl', 'fb.com']                4              37
1              ['tinyurl.com', 'bit.ly']                2              25
2                            ['test.in']                1              11
                             URL_domains  len_of_url_list  len_url_reopen  len_url_transform
0  ['wa.me', 't.co', 'goo.gl', 'fb.com']                4              37                  4
1              ['tinyurl.com', 'bit.ly']                2              25                  2
2                            ['test.in']                1              11                  1
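
The conversion can also be done while the file is being read, instead of in a separate step afterwards. The following is a minimal sketch, not part of the original example: it passes ast.literal_eval through the converters parameter of pd.read_csv, and it writes the CSV to an in-memory io.StringIO buffer (an assumption made here so the snippet runs without touching test.csv). The names buffer, df3 and lengths are illustrative only.

```python
import ast
import io

import pandas as pd

df1 = pd.DataFrame({'URL_domains': [['wa.me', 't.co', 'goo.gl', 'fb.com'],
                                    ['tinyurl.com', 'bit.ly'],
                                    ['test.in']]})

# Write the CSV to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df1.to_csv(buffer, index=False)
buffer.seek(0)

# converters applies ast.literal_eval to every cell of URL_domains while parsing
df3 = pd.read_csv(buffer, converters={'URL_domains': ast.literal_eval})

lengths = df3['URL_domains'].map(len).tolist()
print(lengths)  # [4, 2, 1]
```

ast.literal_eval only accepts Python literals, so it will raise an error on cells that are empty or not valid list syntax; a column with missing values would need extra handling.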
posted @ 2022-06-08 10:19  EconCoder