Getting Started with NumPy (1): Data Types and Array Creation

Constants

NumPy provides four commonly used constants.

1. numpy.nan

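Represents a missing value (Not a Number). In NumPy 1.x, NaN and NAN are aliases of nan; some of these aliases (for example np.NaN) were removed in NumPy 2.0, so prefer the lowercase np.nan.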

import numpy as np
x = np.array([1, 2, 3, 4, np.nan, 5])
print(x)

>> [ 1.  2.  3.  4. nan  5.]

Note: two np.nan values never compare equal.

print(np.nan == np.nan)

>> False
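
Because NaN never compares equal to anything, including itself, an equality test cannot locate missing values. A minimal sketch using np.isnan and np.nansum, with array values made up purely for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, np.nan, 5])

# Element-wise comparison with np.nan is False everywhere, even at the NaN slot.
eq_mask = (x == np.nan)

# np.isnan flags the NaN entries instead.
nan_mask = np.isnan(x)

print(eq_mask.any())   # False
print(nan_mask)        # [False False False False  True False]
print(np.nansum(x))    # 15.0, the NaN entry is skipped
```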

2. numpy.inf

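Represents positive infinity. In NumPy 1.x, Inf, infty, Infinity and PINF are aliases of inf; these aliases were removed in NumPy 2.0, so prefer np.inf.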
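
A short sketch of how infinity behaves in comparisons and arithmetic. The variable name big is only an illustration:

```python
import numpy as np

big = np.finfo(np.float64).max   # largest finite float64

print(np.inf > big)              # True
print(-np.inf < -big)            # True

# np.isinf detects both positive and negative infinity.
mask = np.isinf(np.array([1.0, np.inf, -np.inf]))
print(mask)                      # [False  True  True]

# Infinity minus infinity is undefined, so the result is nan.
diff = np.inf - np.inf
print(diff)                      # nan
```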

3. numpy.pi

The constant pi, the ratio of a circle's circumference to its diameter.

print(np.pi)

>> 3.141592653589793

4. numpy.e

Euler's number e, the base of the natural logarithm.

print(np.e)

>> 2.718281828459045

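Data Types

Python has relatively few native data types: bool, int, float, str and so on. That is convenient for applications that need not care how data is represented in memory. Scientific computing, however, usually needs more control. To distinguish its own types from the Python built-ins of the same name, NumPy appends an underscore to those names (bool_, int_, float_, str_).

The table below lists the commonly used NumPy basic types.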

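Type	Size	Description
bool_ = bool8	8 bits	Boolean
int8 = byte	8 bits	Signed integer
int16 = short	16 bits	Signed integer
int32 = intc	32 bits	Signed integer
int_ = int64 = long = int0 = intp	64 bits	Signed integer
uint8 = ubyte	8 bits	Unsigned integer
uint16 = ushort	16 bits	Unsigned integer
uint32 = uintc	32 bits	Unsigned integer
uint64 = uintp = uint0 = uint	64 bits	Unsigned integer
float16 = half	16 bits	Floating point
float32 = single	32 bits	Floating point
float_ = float64 = double	64 bits	Floating point
str_ = unicode_ = str0 = unicode		Unicode string
datetime64		Date and time
timedelta64		Interval between two datetimes

Note: the equalities above reflect a typical 64-bit build of NumPy 1.x. Names such as int_, long and intp are platform dependent (int_ is 32-bit on Windows before NumPy 2.0), and several aliases (bool8, int0, uint0, float_, str0, unicode_) were removed in NumPy 2.0.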
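
To show how these types are used in practice, here is a minimal sketch that requests a type at creation time and converts with astype. The array values are arbitrary:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float32)   # request 32-bit floats explicitly
b = a.astype(np.int16)                      # astype returns a converted copy

print(a.dtype, a.itemsize)   # float32 4
print(b.dtype, b.itemsize)   # int16 2
```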

NumPy's numeric types are instances of dtype (data type) objects.

class dtype(object):
    def __init__(self, obj, align=False, copy=False):
        pass

The obj parameter is the object to be converted into a data type. Every built-in type in NumPy has a character code that uniquely identifies it:

Code	Type	Notes
b	boolean	'b1'
i	signed integer	'i1', 'i2', 'i4', 'i8'
u	unsigned integer	'u1', 'u2', 'u4', 'u8'
f	floating-point	'f2', 'f4', 'f8'
c	complex floating-point
m	timedelta64	Interval between two datetimes
M	datetime64	Date and time
O	object
S	(byte-)string	'S3' is a byte string of length 3
U	Unicode	Unicode string
V	void

Here i1 means int8, i2 means int16, and so on. For example:

a = np.dtype('i1')
print(a.type)
>> <class 'numpy.int8'>
print(a.itemsize)
>> 1
a = np.dtype('i2')
print(a.type)  
>> <class 'numpy.int16'>
print(a.itemsize)  
>> 2
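
The same pattern works for the other character codes. The sketch below collects the kind letter and item size for a handful of codes; the codes list and the info dictionary are illustrative names, not part of NumPy:

```python
import numpy as np

codes = ['b1', 'i4', 'u2', 'f8', 'c16', 'U5', 'S3']

# Map each code to (kind letter, size in bytes).
info = {code: (np.dtype(code).kind, np.dtype(code).itemsize) for code in codes}

for code, (kind, size) in info.items():
    print(code, kind, size)

# 'U5' holds 5 Unicode characters at 4 bytes each; 'S3' holds 3 bytes.
```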

The iinfo class reports the limits of an integer type.

ii16 = np.iinfo(np.int16)
print(ii16.min)  
>> -32768
print(ii16.max)  
>> 32767

ii32 = np.iinfo(np.int32)
print(ii32.min)  
>> -2147483648
print(ii32.max)  
>> 2147483647

Python's float is normally a 64-bit floating-point number, nearly equivalent to np.float64.
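
One way to check this claim is to compare np.finfo(np.float64) with sys.float_info. This is a sketch that assumes a mainstream platform where Python floats are IEEE 754 doubles:

```python
import sys
import numpy as np

fi = np.finfo(np.float64)   # floating-point counterpart of iinfo

print(fi.max == sys.float_info.max)          # True
print(fi.eps == sys.float_info.epsilon)      # True

# np.float64 is also a subclass of Python's float.
print(isinstance(np.float64(1.5), float))    # True
```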

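NumPy and Python integers behave very differently on overflow. Unlike NumPy's fixed-width integers, Python's int is flexible: it grows to hold any integer, using as much memory as it needs, and never overflows.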

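import sys
import numpy as np

# The sizes below come from one 64-bit CPython build; exact values vary by version.
a = 0
print(sys.getsizeof(a))
>> 24
a = 1
print(sys.getsizeof(a))
>> 28
a = np.iinfo(np.int32).max + 1  # iinfo.max is a Python int, so this does not overflow
print(sys.getsizeof(a))
>> 32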

A NumPy scalar that exceeds the limit of its type, by contrast, triggers a RuntimeWarning and the value wraps around.

a = np.int64(np.iinfo(np.int64).max) + 1
>> <ipython-input-48-26c3cb429bd3>:4: RuntimeWarning: overflow encountered in long_scalars
  a = np.int64(np.iinfo(np.int64).max) + 1
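
The warning above comes from scalar arithmetic. Integer arrays wrap around silently instead, which is easy to miss. A minimal sketch, with int8 chosen only to keep the numbers small:

```python
import numpy as np

a = np.array([126, 127], dtype=np.int8)
one = np.array([1, 1], dtype=np.int8)

# 127 + 1 does not fit in int8, so the second element wraps to -128.
b = a + one
print(b, b.dtype)        # [ 127 -128] int8

# Widening the type first avoids the wraparound.
wide = a.astype(np.int16) + one
print(wide, wide.dtype)  # [127 128] int16
```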

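Datetimes and Timedeltas

NumPy makes it easy to convert a string into the date-time type datetime64 (the name datetime was already taken by the date-time module in Python's standard library).

datetime64 is a date-time type that carries a unit. The available units are: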

Date unit	Meaning	Time unit	Meaning
Y	year	h	hour
M	month	m	minute
W	week	s	second
D	day	ms	millisecond
-	-	us	microsecond
-	-	ns	nanosecond
-	-	ps	picosecond
-	-	fs	femtosecond
-	-	as	attosecond

When a datetime64 is created from a string, NumPy by default picks the unit that matches the string.

a = np.datetime64('2020-03-01')
print(a, a.dtype)  
>> 2020-03-01 datetime64[D]

a = np.datetime64('2020-03')
print(a, a.dtype)  
>> 2020-03 datetime64[M]

a = np.datetime64('2020-03-08 20:00:05')
print(a, a.dtype)  
>> 2020-03-08T20:00:05 datetime64[s]

a = np.datetime64('2020-03-08 20:00')
print(a, a.dtype)  
>> 2020-03-08T20:00 datetime64[m]

a = np.datetime64('2020-03-08 20')
print(a, a.dtype)  
>> 2020-03-08T20 datetime64[h]
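
The unit can also be requested explicitly rather than inferred. A brief sketch, which also builds a range of dates with arange; the dates are arbitrary:

```python
import numpy as np

# A month string with a day unit resolves to the first day of that month.
d = np.datetime64('2020-03', 'D')
print(d, d.dtype)   # 2020-03-01 datetime64[D]

# arange accepts date strings when a datetime64 dtype is given; the stop is excluded.
days = np.arange('2020-03-01', '2020-03-05', dtype='datetime64[D]')
print(days)         # ['2020-03-01' '2020-03-02' '2020-03-03' '2020-03-04']
```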

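timedelta64 is the counterpart of Python's timedelta: it represents the difference between two datetime64 values. Note: years ('Y') and months ('M') cannot be converted to or from the other units.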

a = np.datetime64('2020-03-08') - np.datetime64('2020-03-07')
b = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')
c = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 23:00', 'D')

print(a, a.dtype)
>> 1 days timedelta64[D]
print(b, b.dtype)
>> 960 minutes timedelta64[m]
print(c, c.dtype)
>> 1 days timedelta64[D]

a = np.datetime64('2020-03') + np.timedelta64(20, 'D')
b = np.datetime64('2020-06-15 00:00') + np.timedelta64(12, 'h')
print(a, a.dtype)  
>> 2020-03-21 datetime64[D]
print(b, b.dtype)  
>> 2020-06-15T12:00 datetime64[m]

a = np.timedelta64(1, 'Y')
print(np.timedelta64(a, 'D'))
>> TypeError: Cannot cast NumPy timedelta64 scalar from metadata [Y] to [D] according to the rule 'same_kind'
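
Units that do convert (days, hours, minutes and so on) can be turned into plain numbers by dividing by a unit-sized timedelta64. A small sketch with arbitrary dates:

```python
import numpy as np

delta = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')

# Dividing two timedeltas yields a plain float.
hours = delta / np.timedelta64(1, 'h')
print(hours)       # 16.0

# astype changes the unit while keeping a timedelta64 value.
in_hours = delta.astype('timedelta64[h]')
print(in_hours)    # 16 hours
```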

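Creating Arrays

The most important data structure NumPy provides is ndarray, an N-dimensional array that plays a role similar to Python's list but holds elements of a single type.

1. Creating an ndarray from existing data

(a) Creating with array()

# Create a one-dimensional array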
a = np.array([0, 1, 2, 3, 4])
b = np.array((0, 1, 2, 3, 4))
print(a, type(a))
>> [0 1 2 3 4] <class 'numpy.ndarray'>
print(b, type(b))
>> [0 1 2 3 4] <class 'numpy.ndarray'>

# Create a two-dimensional array
c = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(c, type(c))
>> [[11 12 13 14 15]
   [16 17 18 19 20]
   [21 22 23 24 25]
   [26 27 28 29 30]
   [31 32 33 34 35]] <class 'numpy.ndarray'>

# Create a three-dimensional array
d = np.array([[(1.5, 2, 3), (4, 5, 6)],
              [(3, 2, 1), (4, 5, 6)]])
print(d, type(d))
>> [[[1.5 2.  3. ]
   [4.  5.  6. ]]
   
   [[3.  2.  1. ]
   [4.  5.  6. ]]] <class 'numpy.ndarray'>

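(b) Creating with asarray()

Both array() and asarray() convert structured data into an ndarray. The main difference shows when the source is already an ndarray: array() still makes a copy that occupies new memory, whereas asarray() does not copy as long as the dtype is unchanged.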

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
y = np.array(x)
z = np.asarray(x)
x[1][2] = 2
print(x,type(x),x.dtype)
>> [[1 1 1]
   [1 1 2]
   [1 1 1]] <class 'numpy.ndarray'> int32

print(y,type(y),y.dtype)
>> [[1 1 1]
   [1 1 1]
   [1 1 1]] <class 'numpy.ndarray'> int32

print(z,type(z),z.dtype)
>> [[1 1 1]
   [1 1 2]
   [1 1 1]] <class 'numpy.ndarray'> int32
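
The copy behaviour can also be checked directly, without mutating the data, using np.shares_memory. A minimal sketch; the array w is an added illustration of the dtype-change case:

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1]])
y = np.array(x)                       # copies by default
z = np.asarray(x)                     # same dtype requested, so no copy
w = np.asarray(x, dtype=np.float64)   # dtype differs, so a copy is made

print(z is x)                    # True
print(np.shares_memory(x, y))    # False
print(np.shares_memory(x, w))    # False
```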

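(c) Creating with fromfunction()

fromfunction() builds an array by evaluating a function over the index coordinates, which can be handy when plotting a function.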

def f(x, y):
    return 10 * x + y

x = np.fromfunction(f, (5, 4), dtype=int)
print(x)
>> [[ 0  1  2  3]
   [10 11 12 13]
   [20 21 22 23]
   [30 31 32 33]
   [40 41 42 43]]
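
It may help to know that the function is called once with whole index arrays, not once per element. The sketch below reproduces the same result by hand with np.indices; the names rows, cols, manual and built are illustrative:

```python
import numpy as np

def f(x, y):
    return 10 * x + y

# np.indices returns one index grid per axis, the same grids fromfunction passes to f.
rows, cols = np.indices((5, 4))
manual = f(rows, cols)

built = np.fromfunction(f, (5, 4), dtype=int)
print(np.array_equal(manual, built))   # True
```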

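2. Filling with zeros and ones

(a) Arrays of zeros

zeros(): returns an array of zeros with the given shape and type.
zeros_like(): returns an array of zeros with the same shape and type as the given array.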

x = np.zeros(5)
print(x)  
>> [0. 0. 0. 0. 0.]
x = np.zeros([2, 3])
print(x)
>> [[0. 0. 0.]
   [0. 0. 0.]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.zeros_like(x)
print(y)
>> [[0 0 0]
   [0 0 0]]

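(b) Arrays of ones

ones(): returns an array of ones with the given shape and type.
ones_like(): returns an array of ones with the same shape and type as the given array.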
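
These two functions mirror zeros() and zeros_like(). A minimal sketch with arbitrary shapes:

```python
import numpy as np

x = np.ones((2, 3))          # float64 by default
print(x)
# [[1. 1. 1.]
#  [1. 1. 1.]]

template = np.array([[1, 2, 3], [4, 5, 6]])
y = np.ones_like(template)   # keeps the template's shape and integer dtype
print(y)
# [[1 1 1]
#  [1 1 1]]
```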

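(c) Empty arrays

empty(): returns an uninitialized array; its elements are whatever values already sit in the allocated memory, not true random numbers.
empty_like(): returns a new uninitialized array with the same shape and type as the given array.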

x = np.empty(5)
print(x)
>> [1.95821574e-306 1.60219035e-306 1.37961506e-306 
   9.34609790e-307 1.24610383e-306]
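
Since the contents are arbitrary, an empty array is normally filled before it is read. A sketch of that pattern; the names buf and like are illustrative:

```python
import numpy as np

buf = np.empty((2, 3))       # contents are arbitrary at this point
buf[:] = 0.5                 # assign every element before reading
print(buf)
# [[0.5 0.5 0.5]
#  [0.5 0.5 0.5]]

like = np.empty_like(buf)    # same shape and dtype, also uninitialized
print(like.shape, like.dtype)   # (2, 3) float64
```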

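(d) Identity arrays

eye(): returns a 2-D array with ones on the diagonal and zeros elsewhere; the array may be non-square.
identity(): returns a square identity array.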

x = np.eye(4)
print(x)
>> [[1. 0. 0. 0.]
   [0. 1. 0. 0.]
   [0. 0. 1. 0.]
   [0. 0. 0. 1.]]

x = np.eye(2, 3)
print(x)
>> [[1. 0. 0.]
   [0. 1. 0.]]

x = np.identity(4)
print(x)
>> [[1. 0. 0. 0.]
   [0. 1. 0. 0.]
   [0. 0. 1. 0.]
   [0. 0. 0. 1.]]
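
eye() additionally accepts a diagonal offset k and a dtype, which identity() does not cover. A brief sketch:

```python
import numpy as np

# k=1 places the ones on the first diagonal above the main one.
upper = np.eye(3, k=1, dtype=int)
print(upper)
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]

# For a square shape with no offset, eye and identity agree.
same = np.array_equal(np.eye(4), np.identity(4))
print(same)   # True
```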

(e) Diagonal arrays

diag(): extracts a diagonal or constructs a diagonal array.

x = np.arange(9).reshape((3, 3))
print(x)
>> [[0 1 2]
   [3 4 5]
   [6 7 8]]
print(np.diag(x))  
>> [0 4 8]
print(np.diag(x, k=1))  
>> [1 5]
print(np.diag(x, k=-1))  
>> [3 7]

v = [1, 3, 5, 7]
x = np.diag(v)
print(x)
>> [[1 0 0 0]
   [0 3 0 0]
   [0 0 5 0]
   [0 0 0 7]]
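
The k offset works when constructing as well as when extracting. A short sketch with made-up values:

```python
import numpy as np

# Three values on the first superdiagonal need a 4x4 array.
shifted = np.diag([1, 3, 5], k=1)
print(shifted)
# [[0 1 0 0]
#  [0 0 3 0]
#  [0 0 0 5]
#  [0 0 0 0]]

# Extracting and then rebuilding keeps only the main diagonal.
m = np.arange(9).reshape((3, 3))
only_diag = np.diag(np.diag(m))
print(only_diag)
# [[0 0 0]
#  [0 4 0]
#  [0 0 8]]
```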

(f) Constant arrays

full(): returns an array filled with a constant.
full_like(): returns a constant array with the same shape and type as the given array.

x = np.full((2,), 7)
print(x)
>> [7 7]

x = np.full(2, 7)
print(x)
>> [7 7]

x = np.full((2, 7), 7)
print(x)
>> [[7 7 7 7 7 7 7]
   [7 7 7 7 7 7 7]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.full_like(x, 7)
print(y)
>> [[7 7 7]
   [7 7 7]]
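
One thing to watch with full_like() is that the fill value is cast to the template's dtype. A small sketch; 7.9 is chosen only to make the truncation visible:

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])   # integer dtype

# The fill value is cast to the integer dtype, so 7.9 becomes 7.
y = np.full_like(template, 7.9)
print(y)
# [[7 7 7]
#  [7 7 7]]

# Passing dtype explicitly keeps the fractional part.
z = np.full_like(template, 7.9, dtype=float)
print(z)
# [[7.9 7.9 7.9]
#  [7.9 7.9 7.9]]
```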

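3. Creating an ndarray from numeric ranges

arange(): returns evenly spaced values within a given interval, stepping by a fixed increment.
linspace(): returns a given number of evenly spaced values over an interval.
logspace(): returns numbers spaced evenly on a log scale.
numpy.random.rand(): returns an array of random numbers in [0, 1). The examples below use the closely related numpy.random.random().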

x = np.arange(5)
print(x)  
>> [0 1 2 3 4]

x = np.arange(3, 7, 2)
print(x)  
>> [3 5]

x = np.linspace(start=0, stop=2, num=9)
print(x)  
>> [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

x = np.logspace(0, 1, 5)
print(np.around(x, 2))
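>> [ 1.    1.78  3.16  5.62 10.  ]
# np.around returns the values rounded to the requested precision
# (exact halves round to the nearest even value).
# around(a, decimals=0, out=None)
# a: input array
# decimals: number of decimal places to round to (default 0); if negative,
#           rounds to positions left of the decimal point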

x = np.linspace(start=0, stop=1, num=5)
x = [10 ** i for i in x]
print(np.around(x, 2))
>> [ 1.    1.78  3.16  5.62 10.  ]

x = np.random.random(5)
print(x)
>> [0.41768753 0.16315577 0.80167915 0.99690199 0.11812291]

x = np.random.random([2, 3])
print(x)
>> [[0.41151858 0.93785153 0.57031309]
   [0.13482333 0.20583516 0.45429181]]
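
A few optional parameters of these functions are worth knowing. The sketch below uses retstep, endpoint and base, plus a seeded generator from np.random.default_rng so that random draws are repeatable; the seed value 0 is arbitrary:

```python
import numpy as np

# retstep=True also returns the spacing between samples.
x, step = np.linspace(0, 2, num=9, retstep=True)
print(step)      # 0.25

# endpoint=False leaves out the stop value.
half_open = np.linspace(0, 1, num=5, endpoint=False)
print(half_open)   # [0.  0.2 0.4 0.6 0.8]

# base changes the logarithm base used by logspace (default 10).
powers = np.logspace(0, 3, num=4, base=2)
print(powers)      # [1. 2. 4. 8.]

# A seeded Generator gives the same draws on every run.
rng = np.random.default_rng(seed=0)
r = rng.random(5)  # five floats in [0, 1)
```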

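4. Creating structured arrays

To create a structured array, first define the structure, then build the array with np.array(), passing the structure as the dtype argument.

(a) Defining the structure with a dictionary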

personType = np.dtype({
    'names': ['name', 'age', 'weight'],
    'formats': ['U30', 'i8', 'f8']})

a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
>> [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
   <class 'numpy.ndarray'>

(b) Defining the structure with a list of tuples

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
>> [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
   <class 'numpy.ndarray'>
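
Once the structure is defined, each field can be read as its own array by name. A minimal sketch reusing the same records:

```python
import numpy as np

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)

# Indexing with a field name returns that column.
print(a['name'])          # ['Liming' 'Mike' 'Jan']

# Fields can drive boolean masks over the records.
adults = a[a['age'] > 20]
print(adults['name'])     # ['Liming' 'Jan']

# A single record is indexed by position, then by field.
print(a[0]['weight'])     # 63.9
```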

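Array Attributes

When working with NumPy you will often want to know certain things about an array. The ndarray class exposes several convenient attributes for this.

numpy.ndarray.ndim: the number of dimensions (axes) of the array, also called its rank; a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.
numpy.ndarray.shape: the dimensions of the array, as a tuple whose length equals the number of dimensions, that is, ndim.
numpy.ndarray.size: the total number of elements, equal to the product of the entries of shape; for a matrix this is rows times columns.
numpy.ndarray.dtype: the element type of the ndarray.
numpy.ndarray.itemsize: the size of each element in bytes.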

a = np.array([1, 2, 3, 4, 5])
print(a.shape)  
>> (5,)
print(a.dtype)  
>> int32
print(a.size)  
>> 5
print(a.ndim)  
>> 1
print(a.itemsize)  
>> 4

b = np.array([[1, 2, 3], [4, 5, 6.0]])
print(b.shape)  
>> (2, 3)
print(b.dtype)  
>> float64
print(b.size)  
>> 6
print(b.ndim)  
>> 2
print(b.itemsize)  
>> 8
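
The relationships between these attributes can be verified directly. A short sketch that also uses nbytes, the total memory taken by the elements:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6.0]])

# ndim is the length of the shape tuple.
print(len(b.shape) == b.ndim)             # True

# size is the product of the shape entries.
print(b.size == np.prod(b.shape))         # True

# nbytes is the element count times the bytes per element.
print(b.nbytes == b.size * b.itemsize)    # True
print(b.nbytes)                           # 48
```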

Posted 2020-10-20 20:07 by L1ght
