pandas.DataFrame.stack: notes copied from the documentation

First up: stack.

Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack

pandas.DataFrame.stack

DataFrame.stack(level=-1, dropna=True)

Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

if the columns have a single level, the output is a Series;

if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

Parameters

level (int, str, list; default -1): Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
dropna (bool; default True): Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

Returns

DataFrame or Series: Stacked dataframe or series.

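Put simply, stack moves a level of the column labels into the row index. If the columns have a single level, the result is a Series; if they have multiple levels and only some of them are stacked, the result is still a DataFrame.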
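A minimal sketch of that rule, using two toy frames whose names (single, multi) are made up for illustration. It only checks the type of what stack returns in each case:

```python
import pandas as pd

# Hypothetical toy frames; the variable names are illustrative only.
single = pd.DataFrame([[0, 1], [2, 3]],
                      index=['cat', 'dog'],
                      columns=['weight', 'height'])

multi_cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
multi = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=multi_cols)

stacked_single = single.stack()  # one column level: nothing is left to act as columns
stacked_multi = multi.stack()    # two column levels: one level remains as columns

print(type(stacked_single).__name__)
print(type(stacked_multi).__name__)
print(stacked_single.index.nlevels, stacked_multi.index.nlevels)
```

In both cases the row index gains one level; what differs is whether any column level is left over.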

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Examples

Single level columns

import pandas as pd

df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                    index=['cat', 'dog'],
                                    columns=['weight', 'height'])

　　Stacking a dataframe with a single level column axis returns a Series:

In [27]: df_single_level_cols                                                                                                            
Out[27]: 
     weight  height
cat       0       1
dog       2       3

In [28]: r = df_single_level_cols.stack()                                                                                                

In [29]: r                                                                                                                               
Out[29]: 
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

In [30]: r.index                                                                                                                         
Out[30]: 
MultiIndex([('cat', 'weight'),
            ('cat', 'height'),
            ('dog', 'weight'),
            ('dog', 'height')],
           )

　　The output shows that the column labels have been moved into the row index, which has become a MultiIndex.
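To make the "column labels become row labels" point concrete, here is a small sketch (the variable names are illustrative) that looks up values in the stacked Series and then reverses the operation with unstack:

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])

r = df.stack()

# Each value is now addressed by a (row label, former column label) pair.
dog_height = r.loc[('dog', 'height')]

# Selecting on the outer level alone returns the former row as a Series.
cat_row = r.loc['cat']

# unstack() moves the inner index level back onto the columns.
restored = r.unstack()

print(dog_height)
print(cat_row)
print(restored)
```

The round trip through unstack is a handy way to confirm that stacking only rearranges labels and does not change the data.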

Multi level columns: simple case

multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)

　　Output:

In [38]: df_multi_level_cols1                                                                                                            
Out[38]: 
    weight       
        kg pounds
cat      1      2
dog      2      4

In [39]: df_multi_level_cols1.stack()                                                                                                    
Out[39]: 
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

　　The output shows that stack took the innermost (bottom) level of the column index and turned it into row labels.
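The innermost level is chosen because level defaults to -1. A short sketch, with illustrative variable names, that compares the default call with an explicit level=-1:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
df = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=cols)

default = df.stack()           # level defaults to -1
explicit = df.stack(level=-1)  # -1 means the last (innermost) column level

same = default.equals(explicit)

# Labels that moved from the columns into the last level of the row index.
moved_labels = list(default.index.get_level_values(-1).unique())

# What is left on the column axis.
remaining = list(default.columns)

print(same, moved_labels, remaining)
```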

Missing values

multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)

　　It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

In [41]: df_multi_level_cols2                                                                                                            
Out[41]: 
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [42]: df_multi_level_cols2.stack()                                                                                                    
Out[42]: 
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN

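　　The innermost column level is moved into the row labels to form a MultiIndex. Combinations that did not exist in the original frame, such as (height, kg), are filled with NaN.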
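A quick count shows where the NaNs come from. The sketch below (variable names are illustrative) compares the number of cells before and after stacking:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

stacked = df.stack()

original_cells = df.size       # 2 rows x 2 columns
stacked_cells = stacked.size   # 4 rows x 2 columns
missing = int(stacked.isna().sum().sum())

# Only (weight, kg) and (height, m) exist among the original columns,
# so the (weight, m) and (height, kg) cells have no value to hold.
print(original_cells, stacked_cells, missing)
```

The stacked frame has twice as many cells as the original, and the surplus is exactly the set of label combinations that never existed.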

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

In [48]: df_multi_level_cols2.stack(level=0)                                                                                             
Out[48]: 
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN

In [49]: df_multi_level_cols2                                                                                                            
Out[49]: 
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [50]: df_multi_level_cols2.stack(level=[0,1])                                                                                         
Out[50]: 
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

In [51]: df_multi_level_cols2.stack(level=[1,0])                                                                                         
Out[51]: 
cat  kg  weight    1.0
     m   height    2.0
dog  kg  weight    3.0
     m   height    4.0
dtype: float64

　　You can specify which column level(s) to stack, or stack every column level at once, in which case the result is a Series.
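The level parameter also accepts labels, not only positions. The sketch below assumes the column levels are given names; 'measure' and 'unit' are hypothetical names chosen for this example and do not appear in the original frames:

```python
import pandas as pd

# 'measure' and 'unit' are made-up level names for illustration.
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')],
                                 names=['measure', 'unit'])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=cols)

by_name = df.stack(level='measure')  # refer to the level by its label
by_position = df.stack(level=0)      # refer to the same level by position

# Stacking every column level leaves no columns, so the result is a Series.
all_levels = df.stack(level=['measure', 'unit'])

print(by_name.equals(by_position))
print(type(all_levels).__name__, all_levels.index.nlevels)
```

Naming the levels tends to make calls easier to read than bare positions, especially once there are more than two levels.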

Dropping missing values

In [52]: df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]], 
    ...:                                     index=['cat', 'dog'], 
    ...:                                     columns=multicol2)                                                                          

In [53]: df_multi_level_cols3                                                                                                            
Out[53]: 
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0

　　Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:

In other words, when every value in a resulting row is NaN, the dropna option controls whether that row is removed.

In [54]: df_multi_level_cols3.stack()                                                                                                    
Out[54]: 
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

In [55]: df_multi_level_cols3.stack(dropna=False)                                                                                        
Out[55]: 
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

　　dropna defaults to True, which means that rows whose values are all NaN are dropped from the result.
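The same filtering can be written explicitly with DataFrame.dropna(how='all'). This is a sketch of an equivalent, not a description of how stack works internally. As far as I know, newer pandas releases have been reworking how stack handles the dropna keyword, so the sketch avoids passing it and filters after the fact:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[None, 1.0], [2.0, 3.0]], index=['cat', 'dog'], columns=cols)

# dropna(how='all') removes only the rows in which every value is missing,
# regardless of whether stack() already dropped them.
stacked = df.stack().dropna(how='all')

has_cat_kg = ('cat', 'kg') in stacked.index
partially_missing = int(stacked.isna().any(axis=1).sum())

print(len(stacked), has_cat_kg, partially_missing)
```

Rows that are only partly missing, such as (dog, kg), survive; only the fully empty (cat, kg) row is gone.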

posted @ 2021-03-03 17:11 就是想学习
