Merge是将不同的表按key关联到一个表:https://zhuanlan.zhihu.com/p/392717632

pd.merge(left,   # 待合并的2个数据框
         right, 
         how='inner',  # ‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’
         on=None, # 连接的键,默认是相同的键
         left_on=None,  # 指定不同的连接字段:键不同,但是键的取值有相同的内容
         right_on=None, 
         left_index=False,   # 根据索引来连接
         right_index=False, 
         sort=False, # 是否排序
         suffixes=('_x', '_y'),   # 改变后缀
         copy=True, 
         indicator=False,   # 显示字段来源
         validate=None)
参数解释:
left、right:待合并的数据帧
how:合并的方式,有5种:{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, 默认是 ‘inner’
1、 left:左连接,保留left的全部数据;right类似;类比于SQL的left join 或者right join
2、outer:全连接功能,类似SQL的full outer join
3、inner:交叉连接,类比于SQL的inner join
4、cross:创建两个数据帧DataFrame的笛卡尔积,默认保留左边的顺序
on:连接的列属性;默认是两个DataFrame的相同字段
left_on/right_on:指定两个不同的键进行联结
left_index、right_index:通过索引进行合并
suffixes:指定我们自己想要的后缀
indictor:显示字段的来源

一、实列

先生成4个表

import pandas as pd
df1=pd.DataFrame({'userid':list('abcd'),'age':[10,11,12,13]})
df2=pd.DataFrame({'userid':list('aca'),'score':[100,200,300]})
df3=pd.DataFrame({'userid':list('ae'),'score':[400,600]})
df4=pd.DataFrame({'userid1':list('ac'),'score':[800,900]})
df1 useridage
0 a 10
1 b 11
2 c 12
3 d 13
df2 useridscore
0 a 100
1 c 200
2 a 300
 df3useridscore
0 a 400
1 e 600
df4 userid1score
0 a 800
1 c 900

 

1、默认合并:

pd.merge(df1,df2):

  • 两个数据框df1(left)、df2(right)有相同的字段userid
  • 默认是通过相同的字段(键)进行关联,取出键中相同的值(ac),而且每个键的记录要全部显示,比如a有多条记录
 useridagescore
0 a 10 100
1 a 10 300
2 c 12 200

2、inner:(默认是内联)

pd.merge(df1,df2,how='inner')

inner称之为内连接。它会直接根据相同的列属性userid进行关联,取出属性下面相同的数据信息

 useridagescore
0 a 10 100
1 a 10 300
2 c 12 200

3、outer:外联

pd.merge(df1,df2,how='outer')

outer称之为外连接,在拼接的过程中会取两个数据框中键的并集进行拼接

  • 外连接,取出全部交集键的并集。例子中是user的并集
  • 如果某个键在某个数据框中不存在数据,则为NaN
 useridagescore
0 a 10 100.0
1 a 10 300.0
2 b 11 NaN
3 c 12 200.0
4 d 13 NaN

再看一个外联

pd.merge(df1,df3,how='outer')

 useridagescore
0 a 10.0 400.0
1 b 11.0 NaN
2 c 12.0 NaN
3 d 13.0 NaN
4 e NaN 600.0

 

4、left:以左边数据框中的键为基准;如果左边存在但是右边不存在,则右边用NaN表示

pd.merge(df1,df3,how='left')

 useridagescore
0 a 10 400.0
1 b 11 NaN
2 c 12 NaN
3 d 13 NaN

再看一个left

pd.merge(df1,df2,how='left')

 useridagescore
0 a 10 100.0
1 a 10 300.0
2 b 11 NaN
3 c 12 200.0
4 d 13 NaN

5、right:以右边数据框中的键的取值为基准;如果右边存在但是左边不存在,则左边用NaN表示

pd.merge(df1,df3,how='right')

 useridagescore
0 a 10.0 400
1 e NaN 600

再看一个right

pd.merge(df1,df2,how='right')

 useridagescore
0 a 10 100
1 a 10 300
2 c 12 200

 6、left_on和right_on:当要合并的表没有相同的的键,但这些键的取值部分有相同,则要指明左边和右面的表的键

df_rating=pd.read_csv('./ratings.csv',sep=',',engine='python',header =0,names='UserId,MoveId,Rating,Timestamp'.split(','))
df_move=pd.read_csv('./movies.csv',sep=',',engine='python',header=0,names='moveId,title,genres'.split(','))
df_rat_move=pd.merge(df_rating,df_move,left_on='MoveId',right_on='moveId',how='inne
 UserIdMoveIdRatingTimestamp
0 1 1 4.0 964982703
1 1 3 4.0 964981247
2 1 6 4.0 964982224
3 1 47 5.0 964983815
4 1 50 5.0 964982931
... ... ... ... ...
100831 610 166534 4.0 1493848402
100832 610 168248 5.0 1493850091
100833 610 168250 5.0 1494273047
100834 610 168252 5.0 1493846352
100835 610 170875 3.0 1493846415

 

 moveIdtitlegenres
0 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
1 2 Jumanji (1995) Adventure|Children|Fantasy
2 3 Grumpier Old Men (1995) Comedy|Romance
3 4 Waiting to Exhale (1995) Comedy|Drama|Romance
4 5 Father of the Bride Part II (1995) Comedy
... ... ... ...
9737 193581 Black Butler: Book of the Atlantic (2017) Action|Animation|Comedy|Fantasy
9738 193583 No Game No Life: Zero (2017) Animation|Comedy|Fantasy
9739 193585 Flint (2017) Drama
9740 193587 Bungo Stray Dogs: Dead Apple (2018) Action|Animation
9741 193609 Andrew Dice Clay: Dice Rules (1991) Comedy

 

 UserIdMoveIdRatingTimestampmoveIdtitlegenres
0 1 1 4.0 964982703 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
1 5 1 4.0 847434962 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
2 7 1 4.5 1106635946 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
3 15 1 2.5 1510577970 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
4 17 1 4.5 1305696483 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy
... ... ... ... ... ... ... ...
100831 610 160341 2.5 1479545749 160341 Bloodmoon (1997) Action|Thriller
100832 610 160527 4.5 1479544998 160527 Sympathy for the Underdog (1971) Action|Crime|Drama
100833 610 160836 3.0 1493844794 160836 Hazard (2005) Action|Drama|Thriller
100834 610 163937 3.5 1493848789 163937 Blair Witch (2016) Horror|Thriller
100835 610 163981 3.5 1493850155 163981 31 (2016) Horror