An introduction to pandas.DataFrame.drop_duplicates
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
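By default, this method removes duplicate rows, where a row counts as a duplicate only if every column matches an earlier row. To compare only certain columns, pass them through the subset parameter.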
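As a rough illustration of subset, the sketch below rebuilds the sample DataFrame from the official demo and deduplicates it on one column and then on two. The variable names by_brand and by_brand_style are only illustrative.

```python
import pandas as pd

# Sample data taken from the official pandas demo.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Only 'brand' is compared, so the first row of each brand is kept.
by_brand = df.drop_duplicates(subset=["brand"])
print(by_brand)

# A row is a duplicate only when both 'brand' and 'style' match an earlier row.
by_brand_style = df.drop_duplicates(subset=["brand", "style"])
print(by_brand_style)
```

Note that the rating column is ignored in both calls, so rows 3 and 4 collapse into one in the second call even though their ratings differ.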
keep : {‘first’, ‘last’, False}, default ‘first’
Determines which duplicates (if any) to keep.
- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
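keep lets you choose which of the duplicated rows survives: 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False discards every row that has a duplicate.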
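One way to see what each keep value does is to look at the boolean mask returned by DataFrame.duplicated, which accepts the same keep argument. The sketch below assumes the sample data from the official demo. The kept_index dictionary is just a helper for this illustration.

```python
import pandas as pd

# Sample data taken from the official pandas demo.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

kept_index = {}
for keep in ("first", "last", False):
    # True marks the rows that drop_duplicates would remove.
    mask = df.duplicated(keep=keep)
    kept = df[~mask]
    # Filtering with the inverted mask matches drop_duplicates for the same keep.
    assert kept.equals(df.drop_duplicates(keep=keep))
    kept_index[keep] = list(kept.index)
    print(keep, kept_index[keep])
```

Rows 0 and 1 are the only fully identical pair here, so 'first' drops row 1, 'last' drops row 0, and False drops both.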
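inplace is not covered here; as elsewhere in pandas, inplace=True modifies the DataFrame itself and returns None instead of a new object.
ignore_index : bool, default False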
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
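This controls whether the index of the result is reset: with True, the surviving rows are renumbered from 0 instead of keeping their original labels.
Here is the official demo: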
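As a small sketch of what ignore_index changes, the example below compares it with calling reset_index(drop=True) on the deduplicated result. It assumes pandas 1.0.0 or later and the sample data from the official demo. The names deduped, renumbered and manual are illustrative.

```python
import pandas as pd

# Sample data taken from the official pandas demo.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Default: surviving rows keep their original index labels.
deduped = df.drop_duplicates()
print(list(deduped.index))

# ignore_index=True relabels the result 0, 1, ..., n - 1.
renumbered = df.drop_duplicates(ignore_index=True)
print(list(renumbered.index))

# Resetting the index by hand gives the same frame.
manual = deduped.reset_index(drop=True)
print(renumbered.equals(manual))
```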
In [8]: df
Out[8]:
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
In [9]: df.drop_duplicates()
Out[9]:
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
In [10]: df.drop_duplicates(ignore_index=True)
Out[10]:
     brand style  rating
0  Yum Yum   cup     4.0
1  Indomie   cup     3.5
2  Indomie  pack    15.0
3  Indomie  pack     5.0
In [11]: df.drop_duplicates(keep='last')
Out[11]:
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
In [12]: df.drop_duplicates(keep=False)
Out[12]:
     brand style  rating
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
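The session above does not show how df was created. The sketch below is one way to build a DataFrame with the same contents so the demo can be rerun, and it also combines subset with keep, which the session does not cover. The name last_per_pair is illustrative.

```python
import pandas as pd

# Rebuild the DataFrame shown in the demo; integer ratings are upcast to float
# because the list also contains 3.5.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})
print(df)

# Compare only 'brand' and 'style', and keep the last row of each pair.
last_per_pair = df.drop_duplicates(subset=["brand", "style"], keep="last")
print(last_per_pair)
```

With keep='last', the Indomie pack entry that survives is row 4 (rating 5.0) rather than row 3 (rating 15.0), so the choice of keep matters whenever the columns left out of subset differ.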