pandas (8): Reshaping and Pivoting

Reshaping with Hierarchical Indexing

Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. The two primary methods are:

stack: pivots the columns of the data into the rows

unstack: pivots the rows of the data into the columns

 

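Take the following DataFrame as an example (the session assumes `import numpy as np`, `import pandas as pd` and `from pandas import DataFrame, Series` were run beforehand):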

In [4]: data = DataFrame(np.arange(6).reshape((2,3)),index = pd.Index(['Ohio','Colorado'],name='state'),columns = pd.Index(['one','two','three'],name = 'number'))

In [5]: data
Out[5]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [6]: data.stack()  # turn the column index into the row index
Out[6]:
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

In [7]: data.unstack()  # turn the row index into the column index
Out[7]:
number  state
one     Ohio        0
        Colorado    3
two     Ohio        1
        Colorado    4
three   Ohio        2
        Colorado    5
dtype: int32
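As a quick sanity check, here is a minimal sketch (a plain script rather than an IPython session, rebuilding the same `data`) showing that `unstack` undoes `stack`. The stacked Series has a two-level index with `number` as the innermost level, and unstacking that level moves it back into the columns. Labels are re-aligned with `.loc` before comparing, so the check does not depend on how a particular pandas version orders them.

```python
import numpy as np
import pandas as pd

# Same 2x3 frame as in the session above.
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

stacked = data.stack()            # Series indexed by (state, number)
print(list(stacked.index.names))  # ['state', 'number']

# Unstacking the innermost level ('number') moves it back into the columns.
restored = stacked.unstack()
# Align rows and columns explicitly before comparing values.
restored = restored.loc[data.index, data.columns]
print((restored.values == data.values).all())  # True
```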


In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['number', 'state'])

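The repr in Out[9] comes from an older pandas release; newer versions print the integer arrays as `codes=` instead of `labels=`. A small sketch for inspecting the same structure through attributes instead of the repr, assuming pandas 0.24 or later (where `MultiIndex.codes` exists):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

idx = data.unstack().index      # MultiIndex with levels (number, state)
print(idx.nlevels)              # 2
print(list(idx.names))          # ['number', 'state']
print(idx.levels)               # unique labels of each level
print(idx.codes)                # per-row integer positions into each level
# get_level_values expands one level into a full-length array of labels.
print(list(idx.get_level_values('state')))
```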

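For a DataFrame like this one, whose row and column indexes each have a single level, both unstack and stack return a Series (if either axis had several levels, the result could still be a DataFrame).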

A Series, in turn, has only the unstack method; there is no Series.stack.
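The Series-only result depends on both axes having a single level. A sketch contrasting that with a hypothetical frame whose columns have two levels (the names `wide`, `outer` and `inner` are made up for illustration): stacking moves only the innermost column level, so a DataFrame remains. The last line checks that `Series` offers `unstack` but not `stack`.

```python
import numpy as np
import pandas as pd

# Single-level index and columns: stack() collapses everything into a Series.
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
print(type(data.stack()).__name__)   # Series

# Hypothetical two-level columns: stacking the inner level leaves 'outer' as columns.
wide = pd.DataFrame(np.arange(4).reshape((1, 4)),
                    columns=pd.MultiIndex.from_product([['a', 'b'], ['x', 'y']],
                                                       names=['outer', 'inner']))
print(type(wide.stack()).__name__)   # DataFrame

# Series exposes unstack but not stack.
print(hasattr(pd.Series, 'unstack'), hasattr(pd.Series, 'stack'))  # True False
```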

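By default, unstack operates on the innermost level; pass a level number or a level name to unstack a different level. In the session below, `result` is the Series returned by `data.stack()` above, with index levels `state` (0) and `number` (1):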

In [21]: result.unstack(0)
Out[21]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5

In [22]: result.unstack()
Out[22]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [23]: result.unstack('state')
Out[23]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
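A short sketch, assuming `result = data.stack()` (as Out[22] implies), checking that a level can be addressed by position or by name with the same effect, and that the default (`level=-1`) targets the innermost level:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()   # index levels: 0 -> 'state', 1 -> 'number'

# Level 0 and the name 'state' refer to the same level.
print(result.unstack(0).equals(result.unstack('state')))   # True
# The default level=-1 is the innermost level, here 'number'.
print(result.unstack().equals(result.unstack('number')))   # True
# After unstacking 'state', its values label the columns.
print(result.unstack('state').loc['two', 'Colorado'])      # 4
```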

If some values of a level do not appear in every group, unstack introduces missing values (NaN):

In [24]: s1 =Series([0,1,2,3],index = ['a','b','c','d'])

In [25]: s2 = Series([4,5,6],index = ['c','d','e'])

In [26]: data2 = pd.concat([s1,s2],keys = ['one','two'])

In [27]: data2
Out[27]:
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

In [28]: data2.unstack()
Out[28]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

In [29]: data2.unstack(0)
Out[29]:
   one  two
a  0.0  NaN
b  1.0  NaN
c  2.0  4.0
d  3.0  5.0
e  NaN  6.0
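When NaN is not wanted, `unstack` accepts a `fill_value` argument that is used for the missing combinations. A minimal sketch rebuilding `s1`, `s2` and `data2` from above; with an integer fill value the columns can stay integer instead of being upcast to float:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

with_nan = data2.unstack()               # missing pairs become NaN (float64)
print(int(with_nan.isna().sum().sum()))  # 3

filled = data2.unstack(fill_value=0)     # missing pairs become 0 instead
print(filled)
```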

stack, by contrast, filters out missing values by default (in the pandas version used in this post).
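A sketch of the round trip with the NaN-containing frame. This default has shifted between releases: the pandas version in this post drops the NaN cells during `stack`, while the reworked `stack` in recent pandas (opt-in via `future_stack=True` in 2.1, the default from 3.0) keeps them. Calling `dropna()` explicitly yields the same 7 entries either way:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()        # 2x5 frame with 3 NaN cells
stacked = wide.stack()        # NaN cells dropped or kept, depending on pandas version
cleaned = stacked.dropna()    # explicit, version-independent filtering
print(len(cleaned))           # 7
print(cleaned[('two', 'e')])  # 6.0
```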

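When a DataFrame is pivoted, the level being rotated (by stack or unstack) becomes the lowest level, that is the innermost level, of the resulting axis.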
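A sketch of that rule, using a hypothetical frame `df` built from `result = data.stack()` (the `left`/`right` columns and the `side` name are made up for illustration): unstacking `state` places it below `side` in the columns, and stacking `side` back places it below `number` in the row index.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

# Hypothetical two-column frame indexed by (state, number).
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

wide = df.unstack('state')        # 'state' moves into the columns
print(list(wide.columns.names))   # ['side', 'state'] ('state' is innermost)

tall = wide.stack('side')         # 'side' moves into the row index
print(list(tall.index.names))     # ['number', 'side'] ('side' is innermost)
```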

 

posted @ 2018-04-15 13:00  左手十字