Pandas: Using groupby to Group and Combine a Column
1. Group and aggregate a numeric column
```
import numpy as np
import pandas as pd

gender = ['男', '女']
skin = ['white', 'black', 'yellow']
date = ['2020-01-02', '2020-01-10', '2020-01-15', '2020-01-20', '2020-01-31']

# Build 50 rows of random sample data
data = pd.DataFrame({
    'height': np.random.randint(150, 170, 50),
    'gender': [gender[x] for x in np.random.randint(0, 2, 50)],
    'age': np.random.randint(15, 90, 50),
    'skin': [skin[x] for x in np.random.randint(0, len(skin), 50)],
    'date': [date[x] for x in np.random.randint(0, len(date), 50)],
})
print(data.head())

# Maximum age for each (gender, skin) pair.
# as_index=False keeps the group keys as regular columns.
group_result = data.groupby(by=['gender', 'skin'], as_index=False)['age'].max()
print(group_result)
```
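When more than one statistic per group is needed, pandas named aggregation is one option. The sketch below is a minimal illustration on a small hand-written frame (the values and the output column names `max_age`, `mean_age`, and `n` are assumptions chosen for the example, not part of the data above), so the result is reproducible.

```python
import pandas as pd

# Hypothetical fixed values so the output does not depend on random data
people = pd.DataFrame({
    'gender': ['男', '男', '女', '女', '女'],
    'skin': ['white', 'white', 'black', 'black', 'yellow'],
    'age': [20, 35, 18, 42, 60],
})

# Named aggregation: each keyword becomes an output column,
# written as (source column, aggregation function).
summary = people.groupby(['gender', 'skin'], as_index=False).agg(
    max_age=('age', 'max'),
    mean_age=('age', 'mean'),
    n=('age', 'count'),
)
print(summary)
```

Each row of `summary` describes one (gender, skin) pair, with the group keys kept as ordinary columns because of `as_index=False`.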
2. Collect a column into a list per group
```
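import pandas as pd

data = pd.DataFrame({'column1': ['key1', 'key1', 'key2', 'key2'],
                     'column2': ['value1', 'value2', 'value3', 'value3']})
print(data)

# Collect the column2 values of each column1 group into a list.
# The result is a Series indexed by column1, not a dict, despite the variable name.
data_dict = data.groupby('column1').column2.apply(list)
print(data_dict)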
```
Output:
  column1 column2
0    key1  value1
1    key1  value2
2    key2  value3
3    key2  value3
column1
key1    [value1, value2]
key2    [value3, value3]
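The grouped result is a Series indexed by the group key. If a plain Python dict is what you actually want, `Series.to_dict()` is one way to get it. The following is a minimal sketch using the same sample data; the names `grouped`, `lookup`, and `unique_lookup` are illustrative, and the de-duplication step is an optional extra rather than part of the example above.

```python
import pandas as pd

data = pd.DataFrame({'column1': ['key1', 'key1', 'key2', 'key2'],
                     'column2': ['value1', 'value2', 'value3', 'value3']})

# One list of column2 values per column1 group (a Series indexed by column1)
grouped = data.groupby('column1')['column2'].apply(list)

# to_dict() maps each index label to its list
lookup = grouped.to_dict()
print(lookup)

# Optional: drop duplicates inside each group while keeping first-seen order.
# dict.fromkeys preserves insertion order, so list(...) gives ordered unique values.
unique_lookup = (
    data.groupby('column1')['column2']
        .apply(lambda s: list(dict.fromkeys(s)))
        .to_dict()
)
print(unique_lookup)
```

With this data, `lookup` keeps both copies of `value3` under `key2`, while `unique_lookup` keeps only one.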
3. Concatenate a column's strings per group
```
import pandas as pd

df = pd.DataFrame({'id': [10001, 10001, 10002, 10002, 10002],
                   'skill': ['python', 'java', 'python', 'C++', 'java']})
print(df)

# Join each id's skills into one comma-separated string
data = df.groupby('id')['skill'].apply(lambda x: x.str.cat(sep=',')).reset_index()
print(data)
```
Output:
      id   skill
0  10001  python
1  10001    java
2  10002  python
3  10002     C++
4  10002    java
      id            skill
0  10001      python,java
1  10002  python,C++,java
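An alternative to the `str.cat` lambda is to pass `','.join` directly to `agg`. This is a sketch of that variant on the same data, not a replacement for the approach above; the names `joined` and `sorted_joined` are illustrative. One difference to keep in mind: `str.cat` skips missing values by default, whereas `','.join` raises a `TypeError` if a group contains NaN.

```python
import pandas as pd

df = pd.DataFrame({'id': [10001, 10001, 10002, 10002, 10002],
                   'skill': ['python', 'java', 'python', 'C++', 'java']})

# ','.join receives each group's values and returns one string per id
joined = df.groupby('id')['skill'].agg(','.join).reset_index()
print(joined)

# Variant: sort the skills inside each group before joining
sorted_joined = (
    df.groupby('id')['skill']
      .agg(lambda s: ','.join(sorted(s)))
      .reset_index()
)
print(sorted_joined)
```

Sorting uses Python's default string ordering, so uppercase names such as `C++` come before lowercase ones.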