pandas tips
Find the index labels of the 3 largest values in each column:
df1.apply(lambda s: pd.Series(s.nlargest(3).index))
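A minimal sketch of the idiom above on made-up data. The frame contents and the string index labels are assumptions chosen so the result is easy to read; the returned frame holds index labels, not the values themselves.

```python
import pandas as pd

# Hypothetical sample data with string index labels.
df1 = pd.DataFrame(
    {"a": [5, 1, 9, 7], "b": [2, 8, 4, 6]},
    index=["w", "x", "y", "z"],
)

# For each column, collect the index labels of its three largest values,
# ordered from largest to smallest.
top3 = df1.apply(lambda s: pd.Series(s.nlargest(3).index))
print(top3)
```

Each column of top3 is read top to bottom: row 0 is the label of the largest value, row 2 the label of the third largest.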
Map a DataFrame column through a dictionary (keys not in the dictionary become NaN):
df["ItemIdx"] = df["question"].map(lambda x: itemMap.get(x, np.nan))
Load a dictionary from a saved pickle file:
with open("l.pkl", "rb") as f:
    itemMap = pickle.load(f)
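For completeness, a small round-trip sketch showing both the save and the load side. The dictionary contents and the temporary directory are assumptions for illustration only. Note that pickle.load should only be used on files you trust.

```python
import os
import pickle
import tempfile

# Hypothetical mapping from question text to item index.
item_map = {"q1": 0, "q2": 1}

# Write to a temporary directory so the sketch leaves no file behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "l.pkl")

# Save: pickle files must be opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(item_map, f)

# Load: binary read mode, then pickle.load.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```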
Find the start offset of each session (after sorting by session ID):
offset = np.zeros(df["sessinId"].nunique()+1,dtype=np.int32)
offset[1:] = df.groupby('sessinId').size().cumsum()
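A small sketch of how the offset array can be used once it is built. The sample rows and the "event" column are made up; the only real assumption is that the frame is already sorted by sessinId, so each session occupies one contiguous block of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three sessions with 3, 1 and 2 rows.
df = pd.DataFrame({
    "sessinId": [10, 10, 10, 20, 30, 30],
    "event": list("abcdef"),
})
# A stable sort keeps the original order of rows within each session.
df = df.sort_values("sessinId", kind="stable").reset_index(drop=True)

# offset[i] is the first row of the i-th session; offset[-1] is len(df).
offset = np.zeros(df["sessinId"].nunique() + 1, dtype=np.int32)
offset[1:] = df.groupby("sessinId").size().cumsum()

# Rows of the second session (position 1) via a plain positional slice.
second = df.iloc[offset[1]:offset[2]]
print(offset)
print(second)
```

The extra trailing entry is what lets every session, including the last one, be sliced as offset[i]:offset[i+1].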
Create a dictionary from two pandas DataFrame columns:
In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Remap values in pandas column with a dict
>>> df = pd.DataFrame({'col2': {0: 'a', 1: 2, 2: np.nan}, 'col1': {0: 'w', 1: 1, 2: 2}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN
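A short sketch contrasting replace with map on the same data, since the two behave differently for values that are missing from the dict. The variable names replaced and mapped are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["w", 1, 2], "col2": ["a", 2, np.nan]})
di = {1: "A", 2: "B"}

# replace: values without a matching key are left unchanged.
replaced = df["col1"].replace(di)

# map: values without a matching key become NaN.
mapped = df["col1"].map(di)

print(replaced.tolist())
print(mapped.tolist())
```

Use replace when the dict only covers some of the values and the rest should survive; use map when every value is expected to have a key and a NaN should flag the ones that do not.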
Remove parenthesized letters and digits
config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\([a-zA-Z0-9]+\)", "", x))
Remove anything in parentheses
config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\(.*?\)", "", x))
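The same two clean-ups can also be written with the vectorized string accessor instead of apply. This is a sketch on made-up insurance names; the output column names are hypothetical, and the + quantifier (one or more letters or digits inside the parentheses) is an assumption about the intended pattern.

```python
import pandas as pd

# Hypothetical sample values.
config = pd.DataFrame(
    {"insurance": ["PlanA(x1)", "PlanB (basic cover)", "PlanC"]}
)

# Remove parentheses that contain only letters and digits.
config["alnum_removed"] = config["insurance"].str.replace(
    r"\([a-zA-Z0-9]+\)", "", regex=True
)

# Remove any parenthesized text; .*? is non-greedy, so each pair of
# parentheses is removed separately.
config["all_removed"] = config["insurance"].str.replace(
    r"\(.*?\)", "", regex=True
)

print(config)
```

Raw strings (the r prefix) keep the backslashes in the pattern from being read as string escapes. The second pattern leaves the space before the parenthesis in place, so a .str.strip() may be wanted afterwards.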
Reset the index (discarding the old one)
dfff.reset_index(drop=True)
Translate every element in numpy array according to key
>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
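The example above assumes every element of the array is a key in the dict. A hedged variant for arrays that may contain unknown keys: the sentinel -1 and the name translate are choices made here for illustration, not part of the original recipe.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 9]])  # 9 is deliberately not a key in the dict
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# Fall back to a sentinel for missing keys and fix the output dtype,
# so the result stays an integer array.
translate = np.vectorize(lambda k: my_dict.get(k, -1), otypes=[np.int64])
out = translate(a)
print(out)
```

np.vectorize is a convenience wrapper around a Python-level loop, so it is fine for small arrays but not a performance tool.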
Convert a pandas DataFrame to nested JSON after groupby
# Application: loading a DataFrame into MongoDB
test_dict = {'id':[1,2,3,1,2,1],
             "name":[...],
             "math":[...],
             "English":[...]}
df = pd.DataFrame(data=test_dict)
# Collect the score columns of each (name, id) group as a list of records.
e = df.groupby(["name","id"]).apply(lambda x: x[["math","English"]].to_dict("records"))
# The unnamed result column is called 0 after reset_index; rename it.
sss = e.reset_index().rename(columns={0:"questions"})
result_dict = sss.to_dict("records")
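A self-contained sketch of the same nesting, written as a plain loop over the groups rather than groupby.apply. All names and scores below are invented for illustration, because the original lists are elided; the output shape (one document per name and id, with a "questions" list) follows the snippet above.

```python
import pandas as pd

# Hypothetical data; the original values are not shown in the note.
df = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 1],
    "name": ["ann", "bob", "cid", "ann", "bob", "ann"],
    "math": [90, 80, 70, 60, 50, 40],
    "English": [65, 75, 85, 95, 55, 45],
})

# One document per (name, id) group, with that group's score rows
# nested as a list of records under "questions".
result = [
    {
        "name": str(name),
        "id": int(id_),  # plain int, in case the group key is a NumPy scalar
        "questions": group[["math", "English"]].to_dict("records"),
    }
    for (name, id_), group in df.groupby(["name", "id"])
]

print(result[0])
```

A list of dicts like this is the shape that a bulk insert into a document store usually expects.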
