Dropping duplicate rows directly

# Drop duplicates directly
import pandas as pd
students = pd.read_excel('Students_Duplicates.xlsx')
# Rows that repeat a Name are dropped in place; the first occurrence is kept by default
students.drop_duplicates(subset='Name', inplace=True)
print(students)
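
To try this without the Excel file, here is a minimal sketch on an in-memory DataFrame. The data and the `ID` and `Score` columns are made up for illustration; only the `Name` column is known from the code above. It also leaves out `inplace=True`, in which case `drop_duplicates` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical stand-in for Students_Duplicates.xlsx (column names other than Name are assumed)
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Amy', 'Bob', 'Cara', 'Amy', 'Bob'],
    'Score': [90, 85, 70, 90, 60],
})

# Without inplace=True the result is returned and the original is left as it was
unique_students = students.drop_duplicates(subset='Name')

print(unique_students)
print(len(students), len(unique_students))
```

Assigning the result to a new name keeps the original rows available, which helps when you want to compare the data before and after.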

Dropping either the earlier or the later duplicates

# Drop duplicates
import pandas as pd
students = pd.read_excel('Students_Duplicates.xlsx')
# keep='first' (the default) keeps the first occurrence; keep='last' keeps the last one
students.drop_duplicates(subset='Name', inplace=True, keep='last')
print(students)
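
The effect of `keep` is easier to see side by side. The sketch below uses made-up data (the `ID` column is an assumption) and also shows `keep=False`, which the listing above does not use: it drops every row whose `Name` occurs more than once.

```python
import pandas as pd

# Hypothetical sample data; Amy and Bob each appear twice
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Amy', 'Bob', 'Cara', 'Amy', 'Bob'],
})

first = students.drop_duplicates(subset='Name', keep='first')
last = students.drop_duplicates(subset='Name', keep='last')
no_dupes = students.drop_duplicates(subset='Name', keep=False)

print(first['ID'].tolist())     # [1, 2, 3]
print(last['ID'].tolist())      # [3, 4, 5]
print(no_dupes['ID'].tolist())  # [3]
```

In every case the surviving rows stay in their original order; only the choice of which copy survives changes.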

Finding which rows are duplicates

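# Show which rows are duplicates
import pandas as pd
students = pd.read_excel('Students_Duplicates.xlsx')
# Boolean Series: True for every repeat of a Name after its first occurrence
dupe = students.duplicated(subset='Name')
# print(type(dupe))
# Keep only the entries that are True
dupe = dupe[dupe]
# print(dupe.index)
# dupe.index holds index labels, so select with .loc rather than .iloc
print(students.loc[dupe.index])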
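
The listing above reports only the repeated copies, not the first occurrence they duplicate. If you want every row involved in a duplicate, one possible variation is to pass `keep=False` to `duplicated` and use the result directly as a boolean mask. This is a sketch on made-up data, with the `ID` column assumed:

```python
import pandas as pd

# Hypothetical sample data; Amy and Bob each appear twice
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Amy', 'Bob', 'Cara', 'Amy', 'Bob'],
})

# keep=False marks every member of a duplicate group as True
mask = students.duplicated(subset='Name', keep=False)

# Boolean indexing selects the marked rows; sorting puts the copies next to each other
all_copies = students[mask].sort_values(['Name', 'ID'])
print(all_copies)
```

Indexing with the mask avoids the intermediate `dupe[dupe]` step and does not depend on what kind of index the DataFrame has.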

 

Posted on 2019-02-21 18:00 by Canvas2018