Pandas: using DataFrame.mean() and a pitfall with numpy.array()


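If a DataFrame is initialized from the object returned by numpy.array() (an ndarray), DataFrame.mean() may not work correctly. The reason is that an ndarray has a single dtype: when the nested list mixes strings (the 'dataset' labels) with numbers, numpy.array() converts every element to a string, so the x and y columns of the resulting DataFrame hold strings instead of numbers.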
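A minimal sketch of the failure, assuming the same three nested lists used in this post's example: it checks the dtype that numpy.array() picks, builds the DataFrame from the transposed array, and then converts x and y back to numbers with pd.to_numeric so that mean() has numeric columns to work on. The names df_bad and df_fixed are only illustrative.

```python
import numpy as np
import pandas as pd

data_lists = [['I', 'I', 'I', 'I', 'I'],
              [10, 8, 13, 9, 11],
              [8.04, 6.95, 7.58, 8.81, 8.33]]

arr = np.array(data_lists)
# Strings and numbers are mixed, so NumPy falls back to a Unicode string dtype
print(arr.dtype.kind)  # 'U'

# Illustrative name: every column of this frame holds strings such as '10' or '8.04'
df_bad = pd.DataFrame(data=arr.T, columns=['dataset', 'x', 'y'])
print(df_bad.dtypes)

# Convert the numeric columns back to numbers before averaging
df_fixed = df_bad.assign(x=pd.to_numeric(df_bad['x']),
                         y=pd.to_numeric(df_bad['y']))
print(df_fixed[['x', 'y']].mean())
```

Converting per column (rather than calling np.array() on everything) keeps the 'dataset' labels as strings while restoring integer and float dtypes for x and y.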

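Initializing the DataFrame from a dictionary instead makes mean() work correctly, because pandas builds each column separately and each column keeps its own dtype (strings for dataset, integers for x, floats for y):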
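A short sketch, reusing the post's data_dict, that shows the per-column dtypes and averages the numeric columns. It passes numeric_only=True, a documented parameter of DataFrame.mean(), so the string column 'dataset' is skipped explicitly; since pandas 2.0 the default is numeric_only=False, and a string column then raises an error instead of being dropped silently.

```python
import pandas as pd

data_dict = {'dataset': ['I', 'I', 'I', 'I', 'I'],
             'x': [10, 8, 13, 9, 11],
             'y': [8.04, 6.95, 7.58, 8.81, 8.33]}
df = pd.DataFrame(data=data_dict)

# Each dict entry becomes one column: x and y are numeric, dataset holds strings
print(df.dtypes)

# numeric_only=True averages only the numeric columns (x and y)
col_means = df.mean(numeric_only=True)
print(col_means)
```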

import numpy as np
import pandas as pd

# Example from the pandas documentation for DataFrame.mean():
# df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
#                   index=['cobra', 'viper', 'sidewinder'],
#                   columns=['max_speed', 'shield'])

# Build the DataFrame from a dict: each key becomes one column with its own dtype
data_dict = {'dataset': ['I', 'I', 'I', 'I', 'I'],
             'x': [10, 8, 13, 9, 11],
             'y': [8.04, 6.95, 7.58, 8.81, 8.33]}
df = pd.DataFrame(data=data_dict)

# If numpy.array() is used to build the DataFrame instead, mean() does not work correctly.
# Nested Python lists are read row by row (each inner list becomes one row), so these lists
# would have to be transposed; an ndarray can simply be transposed with .T. See:
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html?highlight=dataframe#pandas.DataFrame
# data_lists = [['I', 'I', 'I', 'I', 'I'], [10, 8, 13, 9, 11], [8.04, 6.95, 7.58, 8.81, 8.33]]
# np.array() turns every element into a string here, so x and y become string columns:
# df = pd.DataFrame(data=np.array(data_lists).T, columns=['dataset', 'x', 'y'])
# print(df)

# Relabel rows 2 and 4 as dataset "II"
df.loc[[2, 4], 'dataset'] = "II"
# print(df)

# Averages for dataset I
bool_series_I = df['dataset'] == 'I'
# print(bool_series_I)
tmp1 = df.loc[bool_series_I]
# Select the numeric columns only; axis=0 gives one average per column,
# while axis=1 would average x and y within each row
print(tmp1[['x', 'y']].mean(axis=0, skipna=True))
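To make the axis argument concrete, here is a small sketch on the same data after rows 2 and 4 are relabeled "II". It compares axis=0 (one mean per column) with axis=1 (one mean per row), and then uses groupby, an alternative not used in the original post, to get the averages of x and y for every dataset label in one call instead of filtering each label with a boolean mask.

```python
import pandas as pd

data_dict = {'dataset': ['I', 'I', 'I', 'I', 'I'],
             'x': [10, 8, 13, 9, 11],
             'y': [8.04, 6.95, 7.58, 8.81, 8.33]}
df = pd.DataFrame(data=data_dict)
df.loc[[2, 4], 'dataset'] = "II"

numeric = df[['x', 'y']]
# axis=0 (the default): one mean per column
col_means = numeric.mean(axis=0)
# axis=1: one mean per row, averaging x and y together
row_means = numeric.mean(axis=1)

# Averages of x and y for each dataset label at once
group_means = df.groupby('dataset')[['x', 'y']].mean()
print(group_means)
```

For dataset I this gives the same numbers as the boolean-mask approach (x averages 9.0 over rows 0, 1 and 3); groupby simply repeats the computation for every label.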