pandas.DataFrame.stack: study notes (copied from the docs)
Let's start by learning stack.
Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack
pandas.DataFrame.stack
DataFrame.stack(level=-1, dropna=True)
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
Parameters
level : int, str, list, default -1
    Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
dropna : bool, default True
    Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.
Returns
DataFrame or Series
    Stacked dataframe or series.
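Put simply: stack takes a level of the column labels and moves it into the row index. If the columns have a single level, the result is a Series; if they have multiple levels, the result is still a DataFrame (unless every column level is stacked, in which case it is a Series).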
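A quick way to confirm this rule is to check the type that stack returns for each kind of column axis. The sketch below rebuilds two of the example frames from the docs (the variable names flat, cols and nested are my own) and prints the result type:

```python
import pandas as pd

# Single-level columns: stacking moves the only column level into the index,
# so no column labels remain and the result is a Series.
flat = pd.DataFrame([[0, 1], [2, 3]], index=["cat", "dog"], columns=["weight", "height"])
print(type(flat.stack()).__name__)  # Series

# Two-level columns: stacking the inner level (the default level=-1) leaves
# the outer level on the columns, so the result is still a DataFrame.
cols = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
nested = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=cols)
print(type(nested.stack()).__name__)  # DataFrame
```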
Notes
The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).
Examples
Single level columns
import pandas as pd
df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                    index=['cat', 'dog'],
                                    columns=['weight', 'height'])
Stacking a dataframe with a single level column axis returns a Series:
In [27]: df_single_level_cols
Out[27]:
     weight  height
cat       0       1
dog       2       3
In [28]: r = df_single_level_cols.stack()
In [29]: r
Out[29]:
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64
In [30]: r.index
Out[30]:
MultiIndex([('cat', 'weight'),
            ('cat', 'height'),
            ('dog', 'weight'),
            ('dog', 'height')],
           )
The output shows that the column labels were moved into the row index, and the row index became a multi-level index.
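To see concretely that each former column label became the second level of the row index, the sketch below rebuilds the single-level example and looks a value up by its (row label, column label) pair. The names df and r are just illustrative:

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]], index=["cat", "dog"], columns=["weight", "height"])
r = df.stack()

# The new index has two levels: the original row labels, then the former column labels.
print(r.index.nlevels)  # 2

# Each original cell df.loc[row, col] is now addressed by the tuple (row, col).
print(r.loc[("dog", "height")])  # 3
print(r.loc[("dog", "height")] == df.loc["dog", "height"])  # True
```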
Multi level columns: simple case
multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                    index=['cat', 'dog'],
                                    columns=multicol1)
Output:
In [38]: df_multi_level_cols1
Out[38]:
    weight
        kg pounds
cat      1      2
dog      2      4
In [39]: df_multi_level_cols1.stack()
Out[39]:
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4
The output shows that stack took the innermost (bottom) level of the column index and turned it into row labels.
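The same idea can be checked in code: after stack() with the default level=-1, only the outer column level is left on the columns, and the inner one sits in the row index. A small sketch with the simple multi-level example (variable names are illustrative):

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
df = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=cols)
stacked = df.stack()  # level=-1 by default, i.e. the innermost column level

# Only the outer column level ("weight") remains on the columns...
print(list(stacked.columns))  # ['weight']
# ...while the inner level ("kg"/"pounds") has joined the row index.
print(list(stacked.index))
# [('cat', 'kg'), ('cat', 'pounds'), ('dog', 'kg'), ('dog', 'pounds')]
```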
Missing values
multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                    index=['cat', 'dog'],
                                    columns=multicol2)
It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:
In [41]: df_multi_level_cols2
Out[41]:
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
In [42]: df_multi_level_cols2.stack()
Out[42]:
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN
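The innermost column level was moved into the rows and combined with the row labels into a multi-level index. Combinations that never existed in the original data, such as (cat, kg, height), are filled with NaN.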
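Where the NaNs come from can be counted directly: the original frame holds 4 values, but the stacked frame is a 4 by 2 grid, and only the (kg, weight) and (m, height) combinations actually existed. A sketch that rebuilds df_multi_level_cols2 under a shorter, illustrative name:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["cat", "dog"], columns=cols)
stacked = df.stack()

# 2 rows x 2 columns = 4 original values, spread over 4 rows x 2 columns after stacking.
print(df.size, stacked.size)  # 4 8
# Only 4 of the 8 cells correspond to an original value; the rest are NaN.
print(int(stacked.isna().sum().sum()))  # 4
# For example, there was never a "height" measured in "kg":
print(pd.isna(stacked.loc[("cat", "kg"), "height"]))  # True
```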
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
In [48]: df_multi_level_cols2.stack(level=0)
Out[48]:
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN
In [49]: df_multi_level_cols2
Out[49]:
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
In [50]: df_multi_level_cols2.stack(level=[0,1])
Out[50]:
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64
In [51]: df_multi_level_cols2.stack(level=[1,0])
Out[51]:
cat  kg  weight    1.0
     m   height    2.0
dog  kg  weight    3.0
     m   height    4.0
dtype: float64
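You can choose which column level(s) to move into the row index, or move all of them, in which case the result is a Series. The order of the list determines the order of the new index levels.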
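Since level accepts positions or labels, naming the column levels lets you stack by name. The sketch below gives the column levels the hypothetical names quantity and unit (they are not in the original example) and checks that stacking by name matches stacking by position, and that stacking all levels yields a Series:

```python
import pandas as pd

# Same data as df_multi_level_cols2, but with hypothetical level names added.
cols = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")], names=["quantity", "unit"])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["cat", "dog"], columns=cols)

# level accepts a position or a level name; both pick the outer level here.
by_position = df.stack(level=0)
by_name = df.stack(level="quantity")
print(by_position.equals(by_name))  # True

# Stacking every column level leaves no columns behind, so the result is a Series.
all_levels = df.stack(level=["quantity", "unit"])
print(type(all_levels).__name__)  # Series
print(list(all_levels.index.names))  # [None, 'quantity', 'unit']
```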
Dropping missing values
In [52]: df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
    ...:                                     index=['cat', 'dog'],
    ...:                                     columns=multicol2)
In [53]: df_multi_level_cols3
Out[53]:
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0
Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:
In other words, after stacking, a row whose values are all NaN, such as (cat, kg) here, is removed unless you pass dropna=False.
In [54]: df_multi_level_cols3.stack()
Out[54]:
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
In [55]: df_multi_level_cols3.stack(dropna=False)
Out[55]:
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
dropna defaults to True, which means rows whose values are all missing are not shown.
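Depending on your pandas version, the dropna keyword of stack may be deprecated (newer releases introduced a reworked stack implementation). As a version-independent sketch, the default dropna=True result above can also be reproduced by stacking and then explicitly dropping the rows whose values are all NaN with DataFrame.dropna(how="all"). The variable names are illustrative:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")])
df = pd.DataFrame([[None, 1.0], [2.0, 3.0]], index=["cat", "dog"], columns=cols)

# Drop only rows in which every value is NaN, mirroring the default dropna=True.
stacked = df.stack().dropna(how="all")

print(("cat", "kg") in stacked.index)  # False: both of its values were NaN
print(len(stacked))  # 3
```

Rows that are only partly NaN, such as (dog, kg), are kept, exactly as in Out[54].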