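October 4 summary
Categoricals
A pandas DataFrame can hold categorical data. For full documentation, see the categorical introduction and the API documentation in the pandas docs. The examples below assume pandas has been imported as pd.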
In [127]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
   .....:                    "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
   .....:
Convert the raw grades to a categorical data type:
In [128]: df["grade"] = df["raw_grade"].astype("category")
In [129]: df["grade"]
Out[129]:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
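To make the conversion more concrete, here is a minimal sketch (not part of the original notes) that rebuilds the same DataFrame and looks at what the categorical column stores: the distinct labels in `cat.categories` and an integer code per row in `cat.codes`. The variable names `categories` and `codes` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# The distinct labels, inferred from the data and sorted.
categories = df["grade"].cat.categories.tolist()

# Each row is stored as the integer position of its label in `categories`.
codes = df["grade"].cat.codes.tolist()

print(categories)  # ['a', 'b', 'e']
print(codes)       # [0, 1, 1, 0, 0, 2]
```

Storing small integer codes instead of repeated strings is the reason a categorical column usually takes less memory when there are few distinct values.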
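Rename the categories to more meaningful names with Series.cat.rename_categories(). Older pandas versions also allowed assigning to Series.cat.categories directly, but that setter was removed in pandas 2.0.
In [130]: df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])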
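Renaming can also be written with an explicit mapping, which avoids relying on the order of the existing categories. The following is a small sketch under that assumption; the Series `grade` and the result name `renamed` are illustrative and not from the original notes.

```python
import pandas as pd

grade = pd.Series(["a", "b", "b", "a", "a", "e"], dtype="category")

# rename_categories accepts a dict mapping old labels to new ones
# and returns a new Series; `grade` itself is left unchanged.
renamed = grade.cat.rename_categories(
    {"a": "very good", "b": "good", "e": "very bad"}
)

print(renamed.tolist())
print(grade.cat.categories.tolist())  # still ['a', 'b', 'e']
```

Because the result is a new Series, it has to be assigned back (for example to `df["grade"]`) for the rename to take effect.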
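Reorder the categories and add the missing ones in the same step. Methods under Series.cat return a new Series, so the result is assigned back to the column.
In [131]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium",
   .....:                                               "good", "very good"])
   .....: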
In [132]: df["grade"]
Out[132]:
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
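The sketch below illustrates three consequences of set_categories that the output above only hints at: categories with no rows are still tracked, sorting follows the category order rather than alphabetical order, and a value whose label is left out of the new list becomes missing. The names `grade`, `ordered`, `counts`, `sorted_values` and `narrowed` are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

grade = pd.Series(
    ["very good", "good", "good", "very good", "very good", "very bad"],
    dtype="category",
)

# Reorder the categories and add two that do not occur in the data.
ordered = grade.cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)

# Unused categories are kept and show up with a count of zero.
counts = ordered.value_counts(sort=False).to_dict()

# Sorting uses the category order, not lexical order.
sorted_values = ordered.sort_values().tolist()

# Leaving a label out of the new list turns its rows into NaN.
narrowed = grade.cat.set_categories(["good", "very good"])

print(counts)
print(sorted_values)
print(narrowed.isna().sum())  # 1 (the former "very bad" row)
```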
