Introduction to pandas

index

Row index (row labels)

columns

Column index (column names / column labels)
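As a quick illustration of the two terms above, the following sketch builds a small DataFrame and reads back its row labels and column labels. The labels r1/r2 and col1/col2 are made up for this example.

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["r1", "r2"])

row_labels = list(df.index)       # the row index
column_labels = list(df.columns)  # the column index (column names)
```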

Series

A one-dimensional labeled array (for example, a single column of a DataFrame)

Definition

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False):
"""
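data: array-like, iterable, dict, or scalar value.
      Optional (defaults to None).
      If it is a dict, its insertion order is preserved in the Series.
index: array-like or Index.
       Optional.
       Values must be hashable but need not be unique; for array-like data the length must match the length of data.
       Defaults to RangeIndex(0, 1, ..., len(data) - 1).
       If data is a dict:
           when no index is passed, the keys of data are used as the index;
           when an index is passed, the Series is reindexed with it (see example 4).
dtype: data type (str, numpy.dtype, or ExtensionDtype).
       Optional; if not given, it is inferred from data.
name: name to give to the Series (usually a string).
      Optional.
copy: whether to copy data (bool).
      Optional, default False. Only affects Series or 1-D ndarray input.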

"""
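The parameters described above can be tried out with a minimal sketch like the one below. It shows a scalar being broadcast over an explicit index, together with dtype and name, and an index with repeated labels. The names score and dup are illustrative only.

```python
import pandas as pd

# A scalar is repeated for every label of the given index.
score = pd.Series(5, index=["a", "b", "c"], dtype="float64", name="score")
score_values = score.tolist()
score_labels = list(score.index)

# Index labels do not have to be unique: selecting "x" returns both rows.
dup = pd.Series([1, 2, 3], index=["x", "x", "y"])
x_values = dup.loc["x"].tolist()
```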

Examples

1. Using a plain list as data. Modifying the Series does not affect the original list (regardless of whether copy is True)

import pandas as pd

r = [1, 2]
ser = pd.Series(r, copy=True)
ser.iloc[0] = 999
# r is still: [1, 2]
# ser is:
#    0    999
#    1      2
#    dtype: int64

2. Using a NumPy array as data with copy=False. Modifying the Series also modifies the NumPy array

import pandas as pd
import numpy as np
r = np.array([1, 2])
ser = pd.Series(r, copy=False)
ser.iloc[0] = 999

# r is now: array([999,   2])
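For contrast, here is a small sketch of the same scenario with copy=True. The Series then owns its own copy of the data, so writing to it should leave the NumPy array untouched. The variable names arr and ser_copy are chosen for this example.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2])
ser_copy = pd.Series(arr, copy=True)  # explicit copy of the array's data
ser_copy.iloc[0] = 999                # only the Series changes

arr_after = arr.tolist()
ser_after = ser_copy.tolist()
```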

3. Using a dict as data, with the dict keys as the index

import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
# ser is:
#     a   1
#     b   2
#     c   3
#     dtype: int64

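4. Using a dict as data with a custom index

Index labels that exist as keys in the dict keep their values; index labels that are not keys in the dict get the value NaN; dict keys that are not in the index are dropped (a is kept, y and z become NaN, and b and c are dropped).

The pandas docstring explains this as follows: the Index is first built from the dict keys, and the Series is then reindexed with the given index values, which is why the missing labels become NaN. (The docstring's remark that, due to the input data type, the Series has a "view" on the original data, so the data is changed as well, refers to the NumPy array case in example 2.)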

import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'y', 'z'])
# ser is:
#     a   1.0
#     y   NaN
#     z   NaN
#     dtype: float64
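The reindexing behaviour in example 4 can be checked with a short sketch: building the Series from the dict and then calling reindex should give the same result as passing index to the constructor. The names via_constructor and via_reindex are illustrative.

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}
labels = ["a", "y", "z"]

via_constructor = pd.Series(d, index=labels)
via_reindex = pd.Series(d).reindex(labels)

# equals() treats NaN values in the same positions as equal.
same_result = via_constructor.equals(via_reindex)

# Only the label that existed in the dict carries a value.
kept = via_constructor.dropna().to_dict()
```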

DataFrame

Definition

pandas.DataFrame(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None):
"""
data: ndarray, Iterable, dict, or DataFrame
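        A dict can contain Series, arrays, constants, dataclass or list-like objects.
        If data is a dict, the column names are taken from the dict keys.
        If the dict contains Series that have their own index, they are aligned by that index.
index: Index or array-like
        Index to use for the resulting DataFrame; defaults to a RangeIndex if not specified.
columns: Index or array-like
        Column labels to use when data does not provide them; defaults to a RangeIndex if not specified.
        If data already has column labels, columns selects among them instead.
dtype: dtype, default None
        Data type to force on data (only a single dtype is allowed).
        If not specified, it is inferred from data.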
copy: bool or None, default None
        Copy data from inputs.
        For dict data, the default of None behaves like ``copy=True``.  For DataFrame
        or 2d ndarray input, the default of None behaves like ``copy=False``.
"""
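The two roles of the columns parameter described above can be sketched as follows: with an unlabeled ndarray the columns default to a RangeIndex, while with a dict (which already has column labels) columns selects among them, and a label missing from the dict becomes an all-NaN column. The names unlabeled and selected are made up for this example.

```python
import numpy as np
import pandas as pd

# No column labels in the data: columns default to 0, 1, 2.
unlabeled = pd.DataFrame(np.zeros((2, 3)))
default_columns = list(unlabeled.columns)

# The dict already provides labels "a" and "b": columns selects "b"
# and adds "c", which does not exist in the dict and is filled with NaN.
selected = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, columns=["b", "c"])
selected_columns = list(selected.columns)
```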

Examples

1. Using a dict as data without specifying dtype; pandas infers the data types

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# df is:
#        col1  col2
#     0     1     3
#     1     2     4

# The inferred dtype of each column is int64
df.dtypes
# col1    int64
# col2    int64
# dtype: object

2. Using a dict as data and specifying dtype

import pandas as pd
import numpy as np

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d, dtype=np.int8)

# Every column now has dtype int8
df.dtypes
# col1    int8
# col2    int8
# dtype: object
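Because the constructor's dtype argument applies one type to all of the data, a different type per column needs another route. One option, sketched here as a suggestion rather than as part of the original post, is to call astype with a column-to-dtype mapping after construction.

```python
import pandas as pd

d = {"col1": [1, 2], "col2": [3, 4]}

# astype accepts a dict that maps column names to target dtypes.
typed = pd.DataFrame(d).astype({"col1": "int8", "col2": "float32"})

dtype_names = {name: str(dt) for name, dt in typed.dtypes.items()}
```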

3. Using a dict that contains a Series as data

import pandas as pd

d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
df = pd.DataFrame(data=d, index=[0, 1, 2, 3])

"""
 df is:
       col1  col2
    0     0   NaN
    1     1   NaN
    2     2   2.0
    3     3   3.0
"""
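The alignment in example 3 also works without passing index explicitly. In this sketch both columns are Series with different indexes, and the resulting DataFrame uses the union of the two, filling the gaps with NaN. The name aligned is illustrative.

```python
import pandas as pd

aligned = pd.DataFrame({
    "col1": pd.Series([0, 1], index=[0, 1]),
    "col2": pd.Series([2, 3], index=[2, 3]),
})

row_labels = list(aligned.index)            # union of both indexes
col1_missing = aligned["col1"].isna().tolist()
col2_missing = aligned["col2"].isna().tolist()
```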

4. Using an ndarray without column names as data

import pandas as pd
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(data, columns=['a', 'b', 'c'])

"""
df is:
       a  b  c
    0  1  2  3
    1  4  5  6
    2  7  8  9
"""

5. Using a structured ndarray (one with named fields) as data, selecting and reordering the fields with columns

import numpy as np
import pandas as pd

data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df = pd.DataFrame(data, columns=['c', 'a'])

"""
df is:
       c  a
    0  3  1
    1  6  4
    2  9  7
"""

6. Using dataclass instances as data

import pandas as pd
from dataclasses import make_dataclass

Point = make_dataclass("Point", [("x", int), ("y", int)])
df = pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])

"""
df is:
       x  y
    0  0  0
    1  0  3
    2  2  3
"""
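The same result should be obtainable with a dataclass declared through the @dataclass decorator instead of make_dataclass. The sketch below assumes a pandas version that accepts dataclass instances as rows; the field names become the column names.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Point:
    x: int
    y: int


points = pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])

point_columns = list(points.columns)
y_values = points["y"].tolist()
```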

posted on 2022-06-13 15:55 by hflsp
