series中MultiIndex
1.dataFrame中groupby后变成MultiIndex型的series
group=df_tmp.groupby(['SK_ID_CURR','CREDIT_ACTIVE']).size() group
SK_ID_CURR CREDIT_ACTIVE
162297 Active 1
Closed 2
215354 Active 6
Closed 1
dtype: int64
group.index
MultiIndex(levels=[[162297, 215354], ['Active', 'Closed']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['SK_ID_CURR', 'CREDIT_ACTIVE'])
for id in group.index.levels[0]: print (id)
162297 215354
#group.index.get_loc((215354,"Active"))获取索引为(215354,"Active")其对应的values中的下标
group.values[group.index.get_loc((215354,"Active"))]
6
2.衍生出各个id,Active的数量,代码如下:
df_new=pd.DataFrame({'SK_ID_CURR':list(set(df_bureau.SK_ID_CURR))}) #衍生特征首先构建一个新的dataframe
group=df_bureau.groupby(['SK_ID_CURR','CREDIT_ACTIVE']).size() df_new['CREDIT_ACTIVE_COUNT']=0 for id in group.index.levels[0]: df_new['CREDIT_ACTIVE_COUNT'].loc[df_new.SK_ID_CURR==id]=group.values[group.index.get_loc((id,"Active"))]

浙公网安备 33010602011771号