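2 Properties of index

Index labels may repeat, and the Index object itself is immutable. In the example below the label z appears three times in the index, and the code still runs normally.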

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

index = ['z', 's', 'z', 'x', 'z']
data = ['t', 'o', 'y', 'o', 'u']

ser = pd.Series(data = data, index = index)
print(ser)


z    t
s    o
z    y
x    o
z    u
dtype: object


index = ['z', 's', 'z', 'x', 'z']
data = ['t', 'o', 'y', 'o', 'u']

ser = pd.Series(data = data, index = index)
print(ser['z'])


z    t
z    y
z    u
dtype: object
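
The examples above show duplicate labels but not immutability, so here is a minimal sketch of that second property. It reuses the same Series layout; `is_unique` and `duplicated()` are ordinary `Index` members, and the `mutated` flag is a name made up for this illustration.

```python
import pandas as pd

ser = pd.Series(data=['t', 'o', 'y', 'o', 'u'],
                index=['z', 's', 'z', 'x', 'z'])

# Duplicate labels are allowed, and the Index can report them.
print(ser.index.is_unique)              # False
print(ser.index.duplicated().tolist())  # [False, False, True, False, True]

# Assigning to a single position of an Index raises TypeError.
try:
    ser.index[0] = 'a'
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

# Replacing the whole index with a new list of labels is allowed.
ser.index = ['a', 'b', 'c', 'd', 'e']
print(ser.index.tolist())
```

So "immutable" here means the labels cannot be edited in place; swapping in a complete new index is still fine.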


3 Common index methods

3.1 reindex

index = ['l', 'o', 'v', 'e', 'r']
data = ['i', 's', 'y', 'o', 'u']
ser = pd.Series(data = data, index = index)
reser = ser.reindex(['r', 'e', 'l', 'o', 'v', 'e'])
print(reser)


r    u
e    o
l    i
o    s
v    y
e    o
dtype: object
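
The reindex example only asks for labels that already exist. As a small sketch of what happens otherwise, the snippet below requests a label `x` that is not in the original index (the label and the `?` placeholder are chosen only for illustration).

```python
import pandas as pd

ser = pd.Series(data=['i', 's', 'y', 'o', 'u'],
                index=['l', 'o', 'v', 'e', 'r'])

# 'x' is not in the original index, so it gets NaN by default.
with_nan = ser.reindex(['l', 'x'])
print(with_nan)

# fill_value supplies a placeholder for labels that are missing.
with_default = ser.reindex(['l', 'x'], fill_value='?')
print(with_default)

# reindex returns a new Series; the original is unchanged.
print(ser.index.tolist())
```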


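3.2 Filling

Forward fill (method='ffill'): labels such as o and v sort after a, so they take the value of a; z already exists in the original index, so it keeps its own value. Try replacing a with w for a surprise: every label that sorts before w then becomes NaN.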

index = ['a', 'z']
data = ['i', 's']
ser = pd.Series(data = data, index = index)
reser = ser.reindex(['o', 'z', 'v', 'o', 'k', 'y'],method='ffill' )
print(reser)


o    i
z    s
v    i
o    i
k    i
y    i
dtype: object
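
For readers who would rather see the "replace a with w" case than try it, here is a minimal sketch. It is the same example with only the first index label changed.

```python
import pandas as pd

ser = pd.Series(data=['i', 's'], index=['w', 'z'])
reser = ser.reindex(['o', 'z', 'v', 'o', 'k', 'y'], method='ffill')
print(reser)
# o, v and k sort before w, so there is nothing earlier to carry forward: NaN.
# y sorts between w and z, so it takes the value of w ('i').
# z matches exactly and keeps 's'.
```

Forward fill looks for the nearest original label that sorts at or before the requested one, which is why only `y` and `z` receive a value here.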


Backward fill (method='bfill'): only labels that sort before s are filled with the value of s; labels that sort after s get NaN.

index = ['a', 's']
data = ['i', 's']
ser = pd.Series(data = data, index = index)
reser = ser.reindex(['l', 'z', 'v', 'o', 'k', 'y'],method='bfill' )
print(reser)


l      s
z    NaN
v    NaN
o      s
k      s
y    NaN
dtype: object
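
Letter ordering can be hard to follow, so the sketch below repeats both fill directions with a numeric index. The labels 10 and 20 and the `targets` list are assumptions made for illustration only.

```python
import pandas as pd

ser = pd.Series(data=['i', 's'], index=[10, 20])
targets = [5, 10, 15, 20, 25]

# ffill takes the value of the nearest original label at or before the target.
forward = ser.reindex(targets, method='ffill')
# bfill takes the value of the nearest original label at or after the target.
backward = ser.reindex(targets, method='bfill')

print(pd.DataFrame({'ffill': forward, 'bfill': backward}))
```

With ffill, 5 has no earlier label and becomes NaN; with bfill, 25 has no later label and becomes NaN.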


3.3 Dropping index labels

index = ['user1', 'user2', 'user3']
data = ['zszxz', 'caler', 'rose']
ser = pd.Series(index=index, data=data)
print(ser.drop('user3'))


user1    zszxz
user2    caler
dtype: object


dic = {
    'user1': {'zszxz': 0, 'craler': 1, 'rose': 2},
    'user2': {'zszxz': 3, 'craler': 4, 'rose': 5},
    'user3': {'zszxz': 6, 'craler': 7, 'rose': 8}
}
frame = pd.DataFrame(dic)
print(frame.drop(['craler','rose']))


       user1  user2  user3
zszxz      0      3      6


dic = {
    'user1': {'zszxz': 0, 'craler': 1, 'rose': 2},
    'user2': {'zszxz': 3, 'craler': 4, 'rose': 5},
    'user3': {'zszxz': 6, 'craler': 7, 'rose': 8}
}
frame = pd.DataFrame(dic)
print(frame.drop(['user1','user2'],axis=1))


        user3
zszxz       6
craler      7
rose        8
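
A few details of drop are easy to miss in the examples above, so here is a short sketch of them: drop returns a new object, `columns=` is an alternative spelling of `axis=1`, and `errors='ignore'` skips labels that do not exist. The label `nobody` is made up for the illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'user1': {'zszxz': 0, 'craler': 1, 'rose': 2},
    'user2': {'zszxz': 3, 'craler': 4, 'rose': 5},
})

# drop returns a new object; frame itself keeps all of its rows.
trimmed = frame.drop('rose')
print(trimmed.index.tolist())   # ['zszxz', 'craler']
print(frame.index.tolist())     # ['zszxz', 'craler', 'rose']

# columns= is an alternative to passing axis=1.
without_user1 = frame.drop(columns='user1')
print(without_user1.columns.tolist())  # ['user2']

# A missing label raises KeyError unless errors='ignore' is passed.
same = frame.drop('nobody', errors='ignore')
print(same.shape)  # (3, 2)
```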


4 Simple arithmetic

index = ['user1', 'user2', 'user3']
data = [6, 8, 10]
ser1 = pd.Series(data = data, index = index)
index = ['user1', 'user2', 'user3', 'user4']
data = [4, 2, 10, 20]
ser2 = pd.Series(data = data, index = index)
print(ser1 + ser2)


user1    10.0
user2    10.0
user3    20.0
user4     NaN
dtype: float64


frame1 = pd.DataFrame(
    data = np.arange(9).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

frame2 = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)
print(frame1 + frame2)


        user1  user2  user3
zszxz       0      3      6
craler      9     12     15
rose       18     21     24
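
The two DataFrames above share the same labels, so no NaN appears. As a sketch of the mismatched case, the frames `left` and `right` below (names and shapes assumed for illustration) overlap in only one row label and one column label.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(4).reshape((2, 2)),
                    columns=['user1', 'user2'],
                    index=['zszxz', 'craler'])
right = pd.DataFrame(np.arange(4).reshape((2, 2)),
                     columns=['user2', 'user3'],
                     index=['craler', 'rose'])

# The result is aligned on the union of row labels and of column labels.
total = left + right
print(total)
# Only the cell (craler, user2) exists in both frames: 3 + 0 = 3.
# Every other cell is NaN.
```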


5 Arithmetic methods

5.1 add()

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame1 = pd.DataFrame(
    data = np.arange(9).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

frame2 = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

print(frame1.add(frame2))



        user1  user2  user3
zszxz       0      3      6
craler      9     12     15
rose       18     21     24


5.2 sub()

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame1 = pd.DataFrame(
    data = np.arange(9).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

frame2 = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

print(frame2.sub(frame1))


        user1  user2  user3
zszxz       0      1      2
craler      3      4      5
rose        6      7      8


5.3 div()

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame1 = pd.DataFrame(
    data = np.arange(9).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

frame2 = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

print(frame2.div(frame1))


        user1  user2  user3
zszxz     NaN    2.0    2.0
craler    2.0    2.0    2.0
rose      2.0    2.0    2.0


5.4 mul()

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame1 = pd.DataFrame(
    data = np.arange(9).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

frame2 = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)

print(frame2.mul(frame1))


        user1  user2  user3
zszxz       0      2      8
craler     18     32     50
rose       72     98    128
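
One practical reason to use the method forms rather than the operators is the `fill_value` parameter. The sketch below reuses the two Series from section 4; `plain` and `filled` are names introduced only for this illustration.

```python
import pandas as pd

ser1 = pd.Series([6, 8, 10], index=['user1', 'user2', 'user3'])
ser2 = pd.Series([4, 2, 10, 20], index=['user1', 'user2', 'user3', 'user4'])

# With the operator, user4 exists only in ser2, so the sum is NaN.
plain = ser1 + ser2

# With fill_value=0, the missing side is treated as 0 before adding.
filled = ser1.add(ser2, fill_value=0)
print(filled)
```

Here user4 becomes 20.0 instead of NaN, because the missing value in ser1 is replaced by 0 before the addition.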


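6 Arithmetic between Series and DataFrame

Arithmetic between a Series and a DataFrame follows broadcasting rules, much like NumPy arrays. By default, as in the example below, the Series index is matched against the DataFrame columns and the operation is repeated for every row. If the Series index differs from the DataFrame columns, the result uses the union of the two, and positions whose labels do not match are filled with NaN.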

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
)
print('-----DataFrame------')
print(frame)
ser = pd.Series(data=np.arange(3), index=['user1', 'user2', 'user3'])
print('-----series------')
print(ser)
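print('-----subtraction------')
print(frame - ser)


-----DataFrame------
   user1  user2  user3
0      0      2      4
1      6      8     10
2     12     14     16
-----series------
user1    0
user2    1
user3    2
dtype: int32
-----subtraction------
   user1  user2  user3
0      0      1      2
1      6      7      8
2     12     13     14

To match the Series index against the DataFrame row index instead, use the method form with axis = 0; each Series value is then subtracted from every column of the matching row.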


# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

frame = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
)
print('-----DataFrame------')
print(frame)
ser = pd.Series(data=np.arange(3))
print('-----series------')
print(ser)
print('-----subtraction------')
print(frame.sub(ser, axis = 0))


-----DataFrame------
   user1  user2  user3
0      0      2      4
1      6      8     10
2     12     14     16
-----series------
0    0
1    1
2    2
dtype: int32
-----subtraction------
   user1  user2  user3
0      0      2      4
1      5      7      9
2     10     12     14
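
The statement that mismatched labels produce a union filled with NaN is not shown by either example, so here is a minimal sketch of it. The Series labels `user2` to `user4` are chosen for illustration so that only two of them overlap with the DataFrame columns.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 18, 2).reshape((3, 3)),
                     columns=['user1', 'user2', 'user3'])
ser = pd.Series([1, 2, 3], index=['user2', 'user3', 'user4'])

# Columns of the result are the union: user1, user2, user3, user4.
diff = frame - ser
print(diff)
# user1 has no matching Series label and user4 has no matching column,
# so both columns are entirely NaN; user2 and user3 are subtracted normally.
```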


7 Selecting values

ser.loc['user1'] and ser['user1'] work the same way: both select by label.

index = ['user1', 'user2', 'user3']
data = [6, 8, 10]
ser = pd.Series(data = data, index = index)
print(ser.loc['user1'])


6


frame = pd.DataFrame(
    data = np.arange(0,18,2).reshape((3,3)),
    columns = ['user1', 'user2', 'user3'],
    index = ['zszxz', 'craler', 'rose']
)
print(frame.loc['zszxz','user2'])


2
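
As a complement to loc, the sketch below compares it with the position-based iloc and shows how slices behave in each. The variable names `by_label`, `by_position` and `block` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 18, 2).reshape((3, 3)),
                     columns=['user1', 'user2', 'user3'],
                     index=['zszxz', 'craler', 'rose'])

by_label = frame.loc['zszxz', 'user2']   # label-based lookup
by_position = frame.iloc[0, 1]           # position-based lookup of the same cell
print(by_label, by_position)

# Label slices with loc include both endpoints.
block = frame.loc['zszxz':'craler', 'user1':'user2']
print(block)

# Positional slices with iloc exclude the stop position.
same_block = frame.iloc[0:2, 0:2]
print(same_block.equals(block))
```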


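8 Common functions

numpy.sqrt(): takes the square root of every element. Example: numpy.sqrt(frame)
DataFrame.sum(): sums each column. Example: frame.sum()
DataFrame.mean(): computes the mean of each column. Example: frame.mean()
DataFrame.count(): counts the non-NaN values in each column. Example: frame.count()
DataFrame.describe(): computes several summary statistics at once. Example: frame.describe()
DataFrame.sort_index(): sorts by index labels (alphabetically for string labels); the boolean ascending parameter selects ascending or descending order, and axis=1 sorts the column labels instead. Example: frame.sort_index(axis=1)
DataFrame.sort_values(): sorts by values; NaN is placed last by default; the required by parameter names the column or columns to sort on. Example: frame.sort_values(by='user1')
Series.sort_values(): sorts a Series by its values; it replaces the older Series.order(), which has been removed from pandas. To sort a Series by its index, use Series.sort_index(). Example: ser.sort_values()
DataFrame.rank(): ranks the values; ties get the average rank by default, and method = 'first' ranks ties in the order they appear in the data. See the official documentation for the other options and parameters. Example: frame.rank()
DataFrame.corr(): computes the pairwise correlation between columns. Example: frame.corr()
DataFrame.cov(): computes the pairwise covariance between columns. Example: frame.cov()
DataFrame.corrwith(): computes the correlation between the matching columns of two data sets. Example: frame1.corrwith(frame2)
DataFrame.fillna(): replaces every NaN with the given value. Example: frame.fillna(0)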
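
To make a few of these entries concrete, here is a small sketch on a hypothetical two-column frame that contains one NaN. The data values are invented for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'user1': [3.0, 1.0, np.nan],
                      'user2': [2.0, 2.0, 5.0]},
                     index=['zszxz', 'craler', 'rose'])

print(frame.describe())              # count, mean, std, min, quartiles, max

counts = frame.count().tolist()      # NaN is not counted
print(counts)                        # [2, 3]

ordered = frame.sort_values(by='user1')   # the NaN row goes last
print(ordered.index.tolist())        # ['craler', 'zszxz', 'rose']

avg_rank = frame['user2'].rank().tolist()                  # ties share the average rank
first_rank = frame['user2'].rank(method='first').tolist()  # ties ranked by appearance
print(avg_rank, first_rank)          # [1.5, 1.5, 3.0] [1.0, 2.0, 3.0]

filled = frame.fillna(0)['user1'].tolist()
print(filled)                        # [3.0, 1.0, 0.0]
```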
Posted 2020-05-07 15:04 by 知识追寻者