pandas tips

Find the index of the top 3 largest values in each column:

df1.apply(lambda s: pd.Series(s.nlargest(3).index))
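A small runnable sketch of the one-liner above. The frame `df1` and its values are made up for illustration. Wrapping the index in `pd.Series` gives every column the same 0..2 row positions, so the results line up into one DataFrame.

```python
import pandas as pd

# Hypothetical example data: three numeric columns with a default RangeIndex
df1 = pd.DataFrame({
    "a": [5, 1, 9, 3, 7],
    "b": [2, 8, 4, 6, 0],
    "c": [1, 2, 3, 4, 5],
})

# apply() runs once per column; nlargest(3) keeps the 3 biggest values,
# .index takes their row labels, and pd.Series(...) re-indexes them as 0, 1, 2
top3_idx = df1.apply(lambda s: pd.Series(s.nlargest(3).index))
print(top3_idx)
```

Each column of the result lists row labels in descending order of value, e.g. column `a` holds the labels of 9, 7 and 5.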

Map a DataFrame column through a dict (unknown keys become NaN):

df["ItemIdx"] = df["question"].map(lambda x: itemMap.get(x, np.nan))
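A minimal sketch with a made-up `itemMap` and `question` column. It also shows that passing the dict straight to `Series.map` gives the same NaN-for-missing behaviour without a lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: question text -> integer item index
itemMap = {"q1": 0, "q2": 1}
df = pd.DataFrame({"question": ["q1", "q2", "q3", "q1"]})

# dict.get(x, np.nan) returns NaN for unseen questions instead of raising KeyError
df["ItemIdx"] = df["question"].map(lambda x: itemMap.get(x, np.nan))

# Passing the dict directly: keys missing from the dict also become NaN
df["ItemIdx2"] = df["question"].map(itemMap)
print(df)
```

Because NaN is a float, both new columns come out as floats (`0.0`, `1.0`, `NaN`, `0.0`) rather than integers.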

Load a dictionary from a saved .pkl file:

import pickle

with open("l.pkl", "rb") as f:
    itemMap = pickle.load(f)
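A round-trip sketch showing how such a file is written in the first place. The dict contents and the temporary path are placeholders; the file name `l.pkl` is taken from the snippet above.

```python
import os
import pickle
import tempfile

# Hypothetical mapping to persist; any picklable dict works
itemMap = {"q1": 0, "q2": 1}

path = os.path.join(tempfile.mkdtemp(), "l.pkl")

# Save: open in binary write mode and use pickle.dump
with open(path, "wb") as f:
    pickle.dump(itemMap, f)

# Load: open in binary read mode and use pickle.load
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Unpickling can execute arbitrary code, so only load .pkl files from sources you trust.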

Find the start offset of each session (rows must already be sorted by session id):

offset = np.zeros(df["sessinId"].nunique()+1,dtype=np.int32)
offset[1:] = df.groupby('sessinId').size().cumsum()
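A runnable sketch of the offset trick on a tiny, made-up click log (the `item` column is hypothetical; `sessinId` is the column name used above). After sorting, `offset[i]:offset[i + 1]` is the row range of the i-th session.

```python
import numpy as np
import pandas as pd

# Hypothetical click log: one row per event, several rows per session
df = pd.DataFrame({
    "sessinId": [7, 3, 7, 3, 3, 9],
    "item": ["a", "b", "c", "d", "e", "f"],
})

# Sort so all rows of a session are contiguous; "stable" keeps the
# original event order inside each session
df = df.sort_values("sessinId", kind="stable").reset_index(drop=True)

offset = np.zeros(df["sessinId"].nunique() + 1, dtype=np.int32)
# groupby sorts its keys by default, the same order as the sorted rows,
# so the running total of group sizes is where each session ends
offset[1:] = df.groupby("sessinId").size().cumsum()

# Rows of the i-th session are df.iloc[offset[i]:offset[i + 1]]
for i in range(len(offset) - 1):
    print(df.iloc[offset[i]:offset[i + 1]]["item"].tolist())
```

Here the sessions 3, 7 and 9 have 3, 2 and 1 rows, so `offset` is `[0, 3, 5, 6]`.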

Create a dictionary from two pandas DataFrame columns:

In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
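The session above does not show how `df` was built; the frame below is an assumption reconstructed from the printed result. It also shows a plain-Python equivalent using `zip`.

```python
import pandas as pd

# Assumed input, reconstructed from the Out[9] result
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Letter": ["a", "b", "c", "d", "e"]})

# Keys from Position, values from Letter
d1 = pd.Series(df.Letter.values, index=df.Position).to_dict()

# Equivalent: pair the two columns element by element and build a dict
d2 = dict(zip(df.Position, df.Letter))

print(d1)
print(d2)
```

If the key column has duplicates, both forms keep the value from the last matching row.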

Remap values in a pandas column with a dict:

>>> df = pd.DataFrame({'col1': {0: 'w', 1: 1, 2: 2}, 'col2': {0: 'a', 1: 2, 2: np.nan}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN
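A sketch contrasting `replace` with `Series.map` on the same data. `replace` leaves values that are not dict keys untouched, while `map` with a plain dict turns them into NaN; a `dict.get(x, x)` fallback inside `map` reproduces the `replace` result for this column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": {0: "w", 1: 1, 2: 2}, "col2": {0: "a", 1: 2, 2: np.nan}})
di = {1: "A", 2: "B"}

# replace(): values that are not keys of di ("w") are kept as they are
replaced = df.replace({"col1": di})

# map() with a plain dict: values that are not keys of di become NaN
plain_map = df["col1"].map(di)

# map() with a fallback: look each value up, return it unchanged if missing
mapped = df["col1"].map(lambda x: di.get(x, x))

print(replaced["col1"].tolist(), plain_map.tolist(), mapped.tolist())
```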

Remove letters and digits in parentheses, e.g. "(A1)":

import re
config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\([a-zA-Z0-9]+\)", "", x))
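The same cleanup can be done without `apply`, using the vectorized `Series.str.replace`. The `config` frame and its `insurance` values are made up. The pattern uses `[a-zA-Z0-9]+`, so groups of several letters or digits such as `(A1)` are matched, while parentheses containing other characters are left alone.

```python
import pandas as pd

# Hypothetical "config" table with an "insurance" text column
config = pd.DataFrame({"insurance": ["Plan(A1) basic", "Plan(x) (top-up)", "Plan"]})

# \( and \) match literal parentheses; [a-zA-Z0-9]+ matches one or more ASCII letters/digits
config["cc"] = config["insurance"].str.replace(r"\([a-zA-Z0-9]+\)", "", regex=True)
print(config["cc"].tolist())
```

`(top-up)` survives because the hyphen is not in the character class.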

Remove parentheses together with everything inside them:

config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\(.*?\)", "", x))
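Why the pattern uses the non-greedy `.*?`: a short comparison on a made-up string. The greedy `.*` runs to the last closing parenthesis and deletes the text between two groups as well.

```python
import re

text = "Plan(A1) basic (premium)"

# Non-greedy: each "(...)" group is matched and removed on its own
non_greedy = re.sub(r"\(.*?\)", "", text)

# Greedy: one match from the first "(" to the last ")", swallowing " basic "
greedy = re.sub(r"\(.*\)", "", text)

print(repr(non_greedy))
print(repr(greedy))
```

Note that removing a group can leave stray spaces behind (here a trailing space), so a `.strip()` may be wanted afterwards.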

Reset the index to 0..n-1, discarding the old one:

dfff = dfff.reset_index(drop=True)
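A small sketch of when this matters: after boolean filtering, rows keep their original labels. The data is made up. `reset_index` returns a new frame, so the result has to be assigned.

```python
import pandas as pd

dfff = pd.DataFrame({"x": [10, 20, 30, 40]})

# Boolean filtering keeps the surviving rows' original labels (1 and 3 here)
filtered = dfff[dfff["x"] % 20 == 0]

# drop=True renumbers the rows 0..n-1 and throws the old labels away
renumbered = filtered.reset_index(drop=True)

# Without drop=True the old labels are kept as a new "index" column
with_old = filtered.reset_index()

print(filtered.index.tolist(), renumbered.index.tolist(), with_old.columns.tolist())
```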

Translate every element of a NumPy array according to a dict:

>>> a = np.array([[1, 2, 3],
...               [3, 2, 4]])
>>> my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
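`np.vectorize` is a convenience wrapper around a Python-level loop, so it is not fast on large arrays, and `my_dict.get` returns `None` for any missing key. A sketch of a lookup-table alternative, assuming every key is a small non-negative integer: build an array whose position `k` holds `my_dict[k]`, then index it with `a`.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# Python-level loop over every element
via_vectorize = np.vectorize(my_dict.get)(a)

# Lookup table: lut[k] == my_dict[k]; slots for unused keys (here 0) stay 0
lut = np.zeros(max(my_dict) + 1, dtype=np.int64)
lut[list(my_dict.keys())] = list(my_dict.values())

# Fancy indexing translates the whole array in one vectorized step
via_lut = lut[a]
print(via_lut)
```

The table approach only fits when keys are compact integers; for string or sparse keys, `np.vectorize` (or `pd.Series.map`) is simpler.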

Convert a pandas DataFrame to nested JSON after a groupby

Application: turning a DataFrame into documents to insert into MongoDB.

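import pandas as pd

test_dict = {"id": [1, 2, 3, 1, 2, 1],
             "name": [...],
             "math": [...],
             "English": [...]}

df = pd.DataFrame(data=test_dict)

# One list of {"math": ..., "English": ...} records per (name, id) group
e = df.groupby(["name", "id"])[["math", "English"]].apply(lambda x: x.to_dict("records"))

# e is a Series indexed by (name, id); turn the keys back into columns
sss = e.reset_index(name="questions")

result_dict = sss.to_dict("records")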
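The same pipeline with made-up values filled in for the elided `name`, `math` and `English` lists, so the nested output can be inspected. Selecting the value columns before `apply` keeps the grouping keys out of the inner records.

```python
import json
import pandas as pd

# Hypothetical scores; the names and numbers are invented for illustration
test_dict = {"id": [1, 2, 3, 1, 2, 1],
             "name": ["Ann", "Bob", "Cid", "Ann", "Bob", "Ann"],
             "math": [90, 80, 70, 85, 75, 95],
             "English": [88, 78, 68, 83, 73, 93]}
df = pd.DataFrame(data=test_dict)

# Each (name, id) group becomes a list of {"math": ..., "English": ...} dicts
e = df.groupby(["name", "id"])[["math", "English"]].apply(lambda x: x.to_dict("records"))

# One dict per group: {"name": ..., "id": ..., "questions": [...]}
result_dict = e.reset_index(name="questions").to_dict("records")
print(json.dumps(result_dict, indent=2))
```

Each element of `result_dict` is one document, e.g. Ann's entry holds her three score records in their original row order; a list like this is the shape a MongoDB driver's bulk insert expects.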

posted @ 2019-11-01 10:57 SENTIMENT_SONNE
