1 Introduction

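Combining data is a common task in pandas. For example, when two datasets contain overlapping records, merging them by taking their intersection or union is a good option. In the spirit of "you can never have too many skills", the Knowledge Seeker (知识追寻者) spent a little time learning how it works.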

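2 Combining Data

2.1 Joining with join()

join() combines two DataFrames into one, aligning rows on the index. By default the two DataFrames must not share any column names; if they do, pandas raises a ValueError unless lsuffix/rsuffix is supplied.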

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

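data1 = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    # The 'hobby' column appears in the printed result, so it must be defined here
    'hobby': ['reading', 'running', 'hiking'],
}
index1 = ['user1', 'user2', 'user3']
frame1 = pd.DataFrame(data1, index=index1)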

data2 = {
    'person': ['zszxz', 'craler', 'rose'],
    'number': [100, 2000, 3000],
    'activity': ['swing', 'riding', 'climbing'],
}
index2 = ['user1', 'user2', 'user3']
frame2 = pd.DataFrame(data2, index=index2)

# Rows are matched on the shared index labels user1..user3
joined = frame1.join(frame2)
print(joined)


         user  price    hobby  person  number  activity
user1   zszxz    100  reading   zszxz     100     swing
user2  craler    200  running  craler    2000    riding
user3    rose    300   hiking    rose    3000  climbing
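The "no duplicate columns" precondition can be seen directly in a small sketch. The frames `left` and `right` below are hypothetical and are not part of the original example; they deliberately share a `price` column. `lsuffix` and `rsuffix` are real parameters of `DataFrame.join`.

```python
import pandas as pd

# Hypothetical frames that deliberately share the column name "price".
left = pd.DataFrame({"user": ["zszxz", "craler"], "price": [100, 200]},
                    index=["user1", "user2"])
right = pd.DataFrame({"price": [110, 210], "number": [1, 2]},
                     index=["user1", "user2"])

# Without suffixes, the overlapping column makes join() fail.
try:
    left.join(right)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc
print("join failed:", overlap_error is not None)

# With suffixes, only the overlapping column is renamed on each side.
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(list(joined.columns))
```

When the overlapping column is really a key to match on rather than data to keep twice, `merge()` is usually the better fit than renaming with suffixes.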


2.2 Concatenating with concat()

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# By default, concat stacks the inputs row-wise (axis=0)
print(pd.concat([ser1, ser2]))
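One detail of row-wise concatenation is easy to miss: each input keeps its own index labels, so the result contains duplicates. The sketch below reuses the two Series from above and contrasts the default with `ignore_index=True`; the variable names `stacked` and `renumbered` are only illustrative.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])

# Default: the original labels are kept, so 0, 1, 2 appear twice.
stacked = pd.concat([ser1, ser2])
print(list(stacked.index))

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([ser1, ser2], ignore_index=True)
print(list(renumbered.index))
```

Renumbering is usually what you want when the old labels carry no meaning, because duplicate labels make `.loc` lookups return several rows.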


ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate column-wise (axis=1); rows are aligned on the index
print(pd.concat([ser1, ser2], axis=1))


     0    1
0  111  333
1  222  444
2  NaN  NaN


ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate column-wise; keys supplies the resulting column names
data = pd.concat([ser1, ser2], axis=1, keys=['zszxz', 'rzxx'])
print(data)


  zszxz rzxx
0   111  333
1   222  444
2   NaN  NaN
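The intersection and union mentioned in the introduction map onto the `join` parameter of `pd.concat`. This is a minimal sketch with two hypothetical frames (`left` and `right`, not from the original) whose row labels only partly overlap.

```python
import pandas as pd

# Hypothetical frames: user2 and user3 appear in both, user1 and user4 in one only.
left = pd.DataFrame({"price": [100, 200, 300]},
                    index=["user1", "user2", "user3"])
right = pd.DataFrame({"number": [2000, 3000, 4000]},
                     index=["user2", "user3", "user4"])

# join="outer" (the default) keeps the union of the row labels,
# filling cells that have no counterpart with NaN.
union = pd.concat([left, right], axis=1)
print(union)

# join="inner" keeps only the labels present in every input.
intersection = pd.concat([left, right], axis=1, join="inner")
print(intersection)
```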


2.3 Combining with combine_first()

ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# Values from ser1 take priority; ser2 fills in what ser1 lacks
data = ser1.combine_first(ser2)
print(data)


1    111
2    222
3    NaN
4    555
dtype: object


ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# Reversed: values from ser2 take priority
data = ser2.combine_first(ser1)
print(data)


1    333
2    444
3    NaN
4    555
dtype: object
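In the two results above, label 3 stays NaN because both Series are missing it, and label 4 comes from the only Series that has it. The filling behaviour is easier to see when the caller has a gap that the other Series can fill. The sketch below uses hypothetical numeric Series named `primary` and `fallback`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: primary has a gap at label 2 and no label 4 at all.
primary = pd.Series([1.0, np.nan, 3.0], index=[1, 2, 3])
fallback = pd.Series([10.0, 20.0, 30.0, 40.0], index=[1, 2, 3, 4])

# Non-null values in primary win; nulls and missing labels come from fallback.
patched = primary.combine_first(fallback)
print(patched)
```

The result covers the union of both indexes, which is why `combine_first()` works well for patching a preferred dataset with a backup one.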


2.4 Pivoting Between Axes

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking'],
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
print(frame)


         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking

• stack() pivots the columns into rows, producing a Series with a two-level index:
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking'],
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
print(frame.stack())


user1  user       zszxz
       price        100
       hobby    reading
user2  user      craler
       price        200
       hobby    running
user3  user        rose
       price        300
       hobby     hiking
dtype: object
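The stacked result is a Series whose index has two levels: the original row label and the original column name. As a small sketch of what that means in practice, the snippet below selects from such a Series. It uses a reduced two-column frame for brevity, and the name `stacked` is only illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {"user": ["zszxz", "craler", "rose"], "price": [100, 200, 300]},
    index=["user1", "user2", "user3"],
)
stacked = frame.stack()

# Two index levels: (row label, column name).
print(stacked.index.nlevels)

# A (row, column) tuple addresses a single cell.
print(stacked.loc[("user2", "price")])

# A row label alone returns every column value for that row.
print(stacked.loc["user3"])
```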

• unstack() reverses stack() and restores the original structure:
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking'],
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
sta = frame.stack()
print(sta.unstack())


         user price    hobby