Pandas: Using groupby to Group and Combine a Column

1. Grouping and aggregating a numeric column

```
import numpy as np
import pandas as pd

# Candidate values that the random rows are sampled from
gender = ['male', 'female']
skin = ['white', 'black', 'yellow']
date = ['2020-01-02', '2020-01-10', '2020-01-15', '2020-01-20', '2020-01-31']

# Build 50 rows of random sample data
data = pd.DataFrame({
    'height': np.random.randint(150, 170, 50),
    'gender': [gender[x] for x in np.random.randint(0, len(gender), 50)],
    'age': np.random.randint(15, 90, 50),
    'skin': [skin[x] for x in np.random.randint(0, len(skin), 50)],
    'date': [date[x] for x in np.random.randint(0, len(date), 50)],
})
print(data.head())

# Maximum age within each (gender, skin) group.
# as_index=False keeps the group keys as ordinary columns instead of the index.
group_result = data.groupby(by=['gender', 'skin'], as_index=False)['age'].max()
print(group_result)
```
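Because the data above is random, its output changes on every run. As a minimal sketch, the same grouping can be tried on a small fixed table, here with several aggregates computed at once through named aggregation. The `people` DataFrame and the output column names `max_age`, `mean_age`, and `n_rows` are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible
people = pd.DataFrame({
    'gender': ['male', 'male', 'female', 'female', 'female'],
    'skin': ['white', 'white', 'black', 'black', 'yellow'],
    'age': [30, 45, 22, 61, 37],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = people.groupby(['gender', 'skin'], as_index=False).agg(
    max_age=('age', 'max'),
    mean_age=('age', 'mean'),
    n_rows=('age', 'size'),
)
print(summary)
```

Groups are sorted by key by default, so the rows come out in the order female/black, female/yellow, male/white.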

2. Collecting a column's values into a list per group

```
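import pandas as pd

data = pd.DataFrame({'column1': ['key1', 'key1', 'key2', 'key2'],
                     'column2': ['value1', 'value2', 'value3', 'value3']})
print(data)

# Collect the column2 values of each column1 group into a Python list.
# Despite its name, data_dict is a Series indexed by column1, not a dict.
data_dict = data.groupby('column1')['column2'].apply(list)
print(data_dict)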
```

Output:

column1 column2
0 key1 value1
1 key1 value2
2 key2 value3
3 key2 value3

column1
key1 [value1, value2]
key2 [value3, value3]
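The grouped result shown above is a Series whose values are lists. If an actual Python dict is wanted, one possible approach is to call `to_dict()` on it. The sketch below also shows a variant that drops duplicates within each group. The names `lists_by_key`, `as_dict`, and `unique_by_key` are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'column1': ['key1', 'key1', 'key2', 'key2'],
                     'column2': ['value1', 'value2', 'value3', 'value3']})

# Series indexed by column1, each value a list of that group's column2 entries
lists_by_key = data.groupby('column1')['column2'].apply(list)

# Convert the Series into a plain dict: group key -> list of values
as_dict = lists_by_key.to_dict()
print(as_dict)

# Variant: keep only distinct values per group, in order of first appearance
unique_by_key = data.groupby('column1')['column2'].unique().apply(list).to_dict()
print(unique_by_key)
```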

3. Concatenating a column's strings per group

```
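import pandas as pd

df = pd.DataFrame({'id': [10001, 10001, 10002, 10002, 10002],
                   'skill': ['python', 'java', 'python', 'C++', 'java']})
print(df)

# Join each id's skills into one comma-separated string;
# reset_index() turns the group key back into a regular column.
data = df.groupby('id')['skill'].apply(lambda x: x.str.cat(sep=',')).reset_index()
print(data)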
```

Output:

id skill
0 10001 python
1 10001 java
2 10002 python
3 10002 C++
4 10002 java

id skill
0 10001 python,java
1 10002 python,C++,java
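As an alternative sketch, the same concatenation can be written by passing `','.join` to `agg`. This assumes every value in the column is a string, since `str.join` raises a `TypeError` on missing values. The variable names `joined` and `joined_sorted` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [10001, 10001, 10002, 10002, 10002],
                   'skill': ['python', 'java', 'python', 'C++', 'java']})

# Join each group's strings in their original row order
joined = df.groupby('id')['skill'].agg(','.join).reset_index()
print(joined)

# Variant: sort the skills within each group before joining
joined_sorted = (
    df.groupby('id')['skill']
      .agg(lambda s: ','.join(sorted(s)))
      .reset_index()
)
print(joined_sorted)
```

Note that `sorted` orders strings by code point, so the uppercase `C++` sorts before the lowercase names.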

posted @ 2022-05-21 10:11  EconCoder