October 4 summary

Categoricals

A pandas DataFrame can contain categorical data. For full documentation, see the categorical introduction and the API documentation.

In [127]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   .....:                    "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
   .....: 

Convert the raw grades to a categorical data type:

In [128]: df["grade"] = df["raw_grade"].astype("category")

In [129]: df["grade"]
Out[129]: 
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
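
To make the conversion a bit more concrete, here is a minimal standalone sketch (not part of the original transcript) that rebuilds the same frame and inspects what the categorical column holds: the distinct labels, available through `.cat.categories`, and one integer code per row, available through `.cat.codes`. The variable names `categories` and `codes` are only illustrative.

```python
import pandas as pd

# Same data as in the transcript above
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# The distinct labels, inferred from the data and sorted
categories = df["grade"].cat.categories.tolist()

# One integer per row, pointing into the list of categories
codes = df["grade"].cat.codes.tolist()

print(categories)
print(codes)
```

Each row stores only a small integer, while the labels themselves are kept once in the categories index.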

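Rename the categories to more meaningful names. Older pandas releases allowed assigning to Series.cat.categories in place; that assignment was removed in pandas 2.0, so use Series.cat.rename_categories and assign the result back:

In [130]: df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])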
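
As a small standalone sketch of the renaming step, `Series.cat.rename_categories` also accepts a dict that maps old labels to new ones, which avoids depending on the order of the categories. The Series `grade` and the other variable names below are illustrative and not taken from the transcript.

```python
import pandas as pd

grade = pd.Series(["a", "b", "b", "a", "a", "e"], dtype="category")

# Map each old label to its new name; a new Series is returned
renamed = grade.cat.rename_categories(
    {"a": "very good", "b": "good", "e": "very bad"}
)

# The original Series keeps its old labels
original_categories = grade.cat.categories.tolist()
renamed_values = renamed.tolist()

print(original_categories)
print(renamed_values)
```

Only the labels change here. The integer codes behind each row stay the same, so the rename does not touch the data itself.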

Reorder the categories and add the missing ones at the same time. Methods under Series.cat return a new Series by default:

In [131]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium",
   .....:                                               "good", "very good"])
   .....: 

In [132]: df["grade"]
Out[132]: 
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
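
The behaviour of `set_categories` can be checked with a short standalone sketch. It assumes a Series that already carries the renamed labels; `full_scale`, `reordered`, and `trimmed` are illustrative names. The last step goes slightly beyond the transcript: it shows that leaving an in-use label out of the new list turns the affected values into NaN.

```python
import pandas as pd

grade = pd.Series(
    ["very good", "good", "good", "very good", "very good", "very bad"],
    dtype="category",
)

full_scale = ["very bad", "bad", "medium", "good", "very good"]

# Returns a new Series; unused labels ("bad", "medium") become valid categories
reordered = grade.cat.set_categories(full_scale)
new_categories = reordered.cat.categories.tolist()

# The original Series still has only the three inferred categories
old_categories = grade.cat.categories.tolist()

# Omitting a label that is in use ("very bad") turns those rows into NaN
trimmed = grade.cat.set_categories(["good", "very good"])
missing_count = int(trimmed.isna().sum())

print(new_categories)
print(old_categories)
print(missing_count)
```

Because the call returns a new Series, the result has to be assigned back to the column, as the transcript does with df["grade"].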
posted @ 2021-10-04 21:07  不详·Christina