Removing duplicate rows directly
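```python
# Remove duplicates directly
import pandas as pd

students = pd.read_excel('Students_Duplicates.xlsx')
# Drop every row whose Name has already appeared; the first occurrence is kept
students.drop_duplicates(subset='Name', inplace=True)
print(students)
```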
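To try this without the Excel file, here is a minimal sketch that assumes a small in-memory DataFrame in place of Students_Duplicates.xlsx. The column names `ID`, `Name`, `Score` and their values are made up for illustration. It also shows that without `inplace=True`, `drop_duplicates` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical stand-in for Students_Duplicates.xlsx (made-up data)
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bob'],
    'Score': [90, 85, 70, 88, 95],
})

# Without inplace=True a new DataFrame is returned; keep='first' is the default,
# so the first Alice (row 0) and the first Bob (row 1) survive.
unique_students = students.drop_duplicates(subset='Name')
print(unique_students)
```

The surviving rows keep their original index labels (0, 1, 3), so the index has gaps; call `reset_index(drop=True)` afterwards if a continuous index is needed.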
Keeping the first or the last occurrence of each duplicate
```python
# Remove duplicate rows
import pandas as pd

students = pd.read_excel('Students_Duplicates.xlsx')
# keep='first' or keep='last': retain the first or the last occurrence of each Name
students.drop_duplicates(subset='Name', inplace=True, keep='last')
print(students)
```
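A small side-by-side comparison of the `keep` options may make the difference clearer. This is a sketch on made-up in-memory data (not the Excel file); besides `'first'` and `'last'`, pandas also accepts `keep=False`, which drops every row whose Name is repeated.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bob'],
})

first = students.drop_duplicates(subset='Name', keep='first')       # keeps IDs 1, 2, 4
last = students.drop_duplicates(subset='Name', keep='last')         # keeps IDs 3, 4, 5
unique_only = students.drop_duplicates(subset='Name', keep=False)   # keeps ID 4 only

print(first['ID'].tolist())
print(last['ID'].tolist())
print(unique_only['ID'].tolist())
```

In every case the remaining rows stay in their original order; `keep` only decides which occurrence of each Name survives.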
Finding which rows are duplicates
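```python
# Show which rows are duplicates
import pandas as pd

students = pd.read_excel('Students_Duplicates.xlsx')
# duplicated() returns a boolean Series: True for each repeat of an already-seen Name
dupe = students.duplicated(subset='Name')
# print(type(dupe))
dupe = dupe[dupe]  # keep only the True entries
# print(dupe.index)
# dupe.index holds row labels, so select with .loc (iloc expects positions)
print(students.loc[dupe.index])
```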
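The same result can be had by indexing with the boolean Series directly, which sidesteps the label-versus-position question entirely. The sketch below uses made-up in-memory data and also shows `keep=False`, which flags every row involved in a duplicate, including the first occurrence.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
students = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Alice', 'Carol', 'Bob'],
})

# Default keep='first': flags only the second and later occurrences
repeats = students[students.duplicated(subset='Name')]

# keep=False: flags every row whose Name appears more than once
all_dupes = students[students.duplicated(subset='Name', keep=False)]

print(repeats)
print(all_dupes)
```

Here `repeats` contains rows 2 and 4, while `all_dupes` contains rows 0, 1, 2 and 4.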
These notes are my own personal records, kept for future reference.