Python Data Analysis: Data Shifting and Data Conversion

1. Data Shifting

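　　The shift() method moves the data of a Series or DataFrame by a given number of periods. Combined with other methods it can do much more, for example computing the change from one row to the next. The syntax is as follows: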

DataFrame.shift(periods=1, freq=None, axis=0, fill_value=_NoDefault.no_default, suffix=None)

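　　Shift the data by the desired number of periods, with an optional time frequency (freq). Without freq, the data move while the index labels stay in place; with freq, the index values themselves are shifted and the data keep their alignment with them.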

Parameters:

periods：int or Sequence

　　Number of periods to shift. Can be positive or negative. If an iterable of ints, the data will be shifted once by each int. This is equivalent to shifting by one value at a time and concatenating all resulting frames. The resulting columns will have the shift suffixed to their column names. For multiple periods, axis must not be 1.

freq：DateOffset, tseries.offsets, timedelta, or str, optional

　　Offset to use from the tseries module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as “infer” then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.

axis：{0 or ‘index’, 1 or ‘columns’, None}, default 0

　　Shift direction. For Series this parameter is unused and defaults to 0.

fill_value：object, optional

　　The scalar value to use for newly introduced missing values. The default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.

suffix：str, optional

　　If str and periods is an iterable, this is added after the column name and before the shift value for each shifted column name.
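The periods description above says that passing a sequence is equivalent to shifting once per value and concatenating the results. The sketch below spells out that equivalence by hand with shift(), add_suffix() and pd.concat(), which also works on pandas releases that predate list-valued periods (added, as far as I know, in pandas 2.1). The "_1"/"_2" column naming is my own choice for illustration and is not guaranteed to match the names pandas generates, particularly when suffix is set.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15], "Col2": [13, 23, 18]})

# Shift once for each period, tag every column with that period,
# then place the shifted frames side by side (axis=1).
periods = [1, 2]
lagged = pd.concat(
    [df.shift(p).add_suffix(f"_{p}") for p in periods],
    axis=1,
)

print(lagged)
print(lagged.columns.tolist())  # ['Col1_1', 'Col2_1', 'Col1_2', 'Col2_2']
```

On a recent pandas, df.shift([1, 2]) should produce a comparable frame directly; per the parameter description, such a call cannot be combined with axis=1.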

Code examples:

import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
print(df)

### Result
#             Col1  Col2  Col3
# 2020-01-01    10    13    17
# 2020-01-02    20    23    27
# 2020-01-03    15    18    22
# 2020-01-04    30    33    37
# 2020-01-05    45    48    52

df1 = df.shift(periods=3)
print(df1)

### Result
#             Col1  Col2  Col3
# 2020-01-01   NaN   NaN   NaN
# 2020-01-02   NaN   NaN   NaN
# 2020-01-03   NaN   NaN   NaN
# 2020-01-04  10.0  13.0  17.0
# 2020-01-05  20.0  23.0  27.0

df1 = df.shift(periods=1, axis="columns")
print(df1)

### Result
#             Col1  Col2  Col3
# 2020-01-01   NaN    10    13
# 2020-01-02   NaN    20    23
# 2020-01-03   NaN    15    18
# 2020-01-04   NaN    30    33
# 2020-01-05   NaN    45    48

df1 = df.shift(periods=3, fill_value=0)
print(df1)

### Result
#             Col1  Col2  Col3
# 2020-01-01     0     0     0
# 2020-01-02     0     0     0
# 2020-01-03     0     0     0
# 2020-01-04    10    13    17
# 2020-01-05    20    23    27
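Comparing the two outputs above, the default fill turned the integers into floats (10 became 10.0) while fill_value=0 kept them as integers. The reason is that an int64 column cannot hold NaN, so pandas promotes it to float64 to make room for the missing values. A quick dtype check on a Series (built here so the snippet runs on its own):

```python
import pandas as pd

s = pd.Series([10, 20, 15, 30, 45])  # int64

shifted_nan = s.shift(3)                 # gaps filled with NaN, promoted to float64
shifted_zero = s.shift(3, fill_value=0)  # gaps filled with 0, stays int64

print(shifted_nan.dtype)   # float64
print(shifted_zero.dtype)  # int64
print(shifted_zero.tolist())  # [0, 0, 0, 10, 20]
```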

df1 = df.shift(periods=3, freq="D")
print(df1)

### Result
#             Col1  Col2  Col3
# 2020-01-04    10    13    17
# 2020-01-05    20    23    27
# 2020-01-06    15    18    22
# 2020-01-07    30    33    37
# 2020-01-08    45    48    52

df1 = df.shift(periods=3, freq="infer")
print(df1)

### Result
#             Col1  Col2  Col3
# 2020-01-04    10    13    17
# 2020-01-05    20    23    27
# 2020-01-06    15    18    22
# 2020-01-07    30    33    37
# 2020-01-08    45    48    52
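The introduction to this section notes that shift() becomes most useful when combined with other methods. A typical case is comparing each row with the previous one. The sketch below rebuilds the df from the examples above and computes the day-over-day change and growth rate of Col1; the new column names Col1_change and Col1_growth are made up for illustration. It also checks that the shift-based difference agrees with Series.diff().

```python
import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))

# Change versus the previous day: current value minus the value shifted down one row
df["Col1_change"] = df["Col1"] - df["Col1"].shift(1)

# Relative change (growth rate) versus the previous day
df["Col1_growth"] = df["Col1"] / df["Col1"].shift(1) - 1

print(df[["Col1", "Col1_change", "Col1_growth"]])

# The shift-based difference matches the built-in diff()
print(df["Col1_change"].equals(df["Col1"].diff()))  # True
```

pandas also provides Series.pct_change() for the growth-rate case, so the explicit shift() form is mostly useful when the comparison is something other than a plain difference or ratio.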

2. Data Conversion

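　　Data conversion generally covers splitting one column into several columns, transposing rows and columns, and converting a DataFrame into a dictionary, a list, or a tuple, among other operations.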

2.1. Topics covered in this post

Transposing rows and columns
Converting a Series to a dictionary
Converting a Series to a list
Converting a DataFrame to HTML

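Note: pandas offers many more conversion methods than the ones covered here; see the official documentation when you need them.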

2.2. Transposing Rows and Columns

　　To swap the rows and columns of a DataFrame, use the df.T property. Its definition is:

property DataFrame.T

Returns: DataFrame (the transposed DataFrame)

Code example:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print(df)
df1 = df.T
print(df1)

### Result
#    col1  col2
# 0     1     3
# 1     2     4

#       0  1
# col1  1  2
# col2  3  4
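One caveat when transposing: every column of a DataFrame has a single dtype, so when the original columns have different dtypes, the transposed columns fall back to object. The sketch below shows this and uses infer_objects() to recover a specific dtype after transposing back. The column names name and score are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})  # string column + int64 column
print(df.dtypes)

t = df.T  # equivalent to df.transpose()
print(t.dtypes)  # both transposed columns are object

# Transposing back alone leaves everything as object;
# infer_objects() re-infers a better dtype for each column
back = t.T.infer_objects()
print(back.dtypes)  # score is int64 again
```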

2.3. Converting a Series to a Dictionary

　　To convert a Series to a dictionary, use Series.to_dict(), defined as follows:

Series.to_dict(*, into=<class 'dict'>)

Parameters:

into：class, default dict

　　The collections.abc.MutableMapping subclass to use as the return object. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

Returns: collections.abc.MutableMapping

　　Key-value representation of Series.

Code examples:

import pandas as pd

s = pd.Series([1, 2, 3, 4])
print(s.to_dict())

### Result
# {0: 1, 1: 2, 2: 3, 3: 4}

from collections import OrderedDict, defaultdict
print(s.to_dict(into=OrderedDict))

### Result
# OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
# (Python 3.12+ displays this as OrderedDict({0: 1, 1: 2, 2: 3, 3: 4}))

# A defaultdict must be passed already initialized
dd = defaultdict(list)
print(s.to_dict(into=dd))

### Result
# defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4})
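The introduction to this section also mentions converting a whole DataFrame to a dictionary. DataFrame.to_dict() takes an orient parameter that controls the shape of the result; the sketch below compares three of its documented values ("dict", which is the default, "list" and "records") on a small made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Default orient="dict": {column -> {index -> value}}
by_column = df.to_dict()
print(by_column)   # {'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}

# orient="list": {column -> [values]}
as_lists = df.to_dict(orient="list")
print(as_lists)    # {'col1': [1, 2], 'col2': [3, 4]}

# orient="records": one dict per row
records = df.to_dict(orient="records")
print(records)     # [{'col1': 1, 'col2': 3}, {'col1': 2, 'col2': 4}]
```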

2.4. Converting a Series to a List

　　To convert a Series to a list, use Series.to_list(), defined as follows:

Series.to_list()

Returns: list

Code examples:

import pandas as pd

s = pd.Series([1, 2, 3])
print(s.to_list())

### Result
# [1, 2, 3]

# Index objects provide to_list() as well
idx = pd.Index([1, 2, 3])
print(idx)

### Result
# Index([1, 2, 3], dtype='int64')

print(idx.to_list())

### Result
# [1, 2, 3]
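to_list() returns plain Python objects rather than NumPy scalars. The introduction to this section also mentions turning a DataFrame into a list or into tuples; as far as I know pandas has no DataFrame.to_list(), but two common routes are to_numpy().tolist() for a list of rows and itertuples(index=False, name=None) for plain tuples, as in this sketch with a made-up frame:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
values = s.to_list()
print(type(values[0]))  # <class 'int'>, a Python int rather than numpy.int64

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# List of rows: each row becomes an inner list
rows = df.to_numpy().tolist()
print(rows)    # [[1, 3], [2, 4]]

# One plain tuple per row; index=False drops the index, name=None avoids namedtuples
tuples = list(df.itertuples(index=False, name=None))
print(tuples)  # [(1, 3), (2, 4)]
```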

2.5. Converting a DataFrame to HTML

　　To render a DataFrame as an HTML table, use DataFrame.to_html(), defined as follows:

DataFrame.to_html(buf=None, *, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)

Parameters:

buf：str, Path or StringIO-like, optional, default None

　　Buffer to write to. If None, the output is returned as a string.

columns：array-like, optional, default None

　　The subset of columns to write. Writes all columns by default.

col_space：str or int, list or dict of int or str, optional

　　The minimum width of each column in CSS length units. An int is assumed to be px units.

header：bool, optional

　　Whether to print column labels, default True.

index：bool, optional, default True

　　Whether to print index (row) labels.

na_rep：str, optional, default ‘NaN’

　　String representation of NaN to use.

formatters：list, tuple or dict of one-param. functions, optional

　　Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

float_format：one-parameter function, optional, default None

　　Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

sparsify：bool, optional, default True

　　Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

index_names：bool, optional, default True

　　Prints the names of the indexes.

justify：str, default None

　　How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset.

max_rows：int, optional

　　Maximum number of rows to display in the console.

max_cols：int, optional

　　Maximum number of columns to display in the console.

show_dimensions：bool, default False

　　Display DataFrame dimensions (number of rows by number of columns).

decimal：str, default ‘.’

　　Character recognized as decimal separator, e.g. ‘,’ in Europe.

bold_rows：bool, default True

　　Make the row labels bold in the output.

classes：str or list or tuple, default None

　　CSS class(es) to apply to the resulting html table.

escape：bool, default True

　　Convert the characters <, >, and & to HTML-safe sequences.

notebook：{True, False}, default False

　　Whether the generated HTML is for IPython Notebook.

border：int

　　A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.

table_id：str, optional

　　A css id is included in the opening <table> tag if specified.

render_links：bool, default False

　　Convert URLs to HTML links.

encoding：str, default “utf-8”

　　Set character encoding.

Returns: str or None

　　If buf is None, returns the result as a string. Otherwise returns None.

Code example:

import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
df1 = df.to_html()
print(df1)

### Result
# <table border="1" class="dataframe">
#   <thead>
#     <tr style="text-align: right;">
#       <th></th>
#       <th>col1</th>
#       <th>col2</th>
#     </tr>
#   </thead>
#   <tbody>
#     <tr>
#       <th>0</th>
#       <td>1</td>
#       <td>4</td>
#     </tr>
#     <tr>
#       <th>1</th>
#       <td>2</td>
#       <td>3</td>
#     </tr>
#   </tbody>
# </table>
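Several of the parameters listed above are handy when embedding the table in a web page. This sketch hides the index, attaches an extra CSS class and an id (the names my-table and scores are made up), and shows that escape=True, the default, converts < and > in cell values into HTML-safe sequences.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": ["<b>x</b>", "y"]})

html = df.to_html(
    index=False,          # do not write the row labels
    classes="my-table",   # extra CSS class, in addition to the default "dataframe"
    table_id="scores",    # id attribute on the opening <table> tag
)
print(html)

# escape=True (the default) renders "<b>x</b>" as text, not as bold markup
print("&lt;b&gt;x&lt;/b&gt;" in html)  # True
```

Passing a file path as buf writes the same markup to disk and returns None instead of a string.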

Date: February 7, 2024

posted @ 2024-02-07 09:20 一路狂奔的乌龟