一、Numpy介绍

一个强大的N维数组对象
支持大量的数据运算
集成C / C++和Fortran代码的工具
众多机器学习框架的基础库(Scipy/Pandas/scikit-learn/Tensorflow)

numpy在存储数据的时候，数据与数据的地址都是连续的，这样就给我们操作带来了好处，处理速度快。在计算机内存里是存储在一个连续空间上的，而对于这个连续空间，我们如果创建 Array 的方式不同，在这个连续空间上的排列顺序也有不同。

创建array的默认方式是 “C-type” 以 row 为主在内存中排列
如果是 “Fortran” 的方式创建的，就是以 column 为主在内存中排列

二、numpy属性

1、ndarray N维数组类型

　　数组属性

属性名字	属性解释
ndarray.shape	数组维度的元组
ndarray.flags	有关阵列内存布局的信息
ndarray.ndim	数组维数
ndarray.size	数组中的元素数量
ndarray.itemsize	一个数组元素的长度（字节）
ndarray.nbytes	数组元素消耗的总字节数

　　数组类型，dtype是numpy.dtype类型

名称	描述	简写
np.bool	用一个字节存储的布尔类型（True或False）	'b'
np.int8	一个字节大小，-128 至 127	'i'
np.int16	整数，-32768 至 32767	'i2'
np.int32	整数，-2 31 至 2 32 -1	'i4'
np.int64	整数，-2 63 至 2 63 - 1	'i8'
np.uint8	无符号整数，0 至 255	'u'
np.uint16	无符号整数，0 至 65535	'u2'
np.uint32	无符号整数，0 至 2 ** 32 - 1	'u4'
np.uint64	无符号整数，0 至 2 ** 64 - 1	'u8'
np.float16	半精度浮点数：16位，正负号1位，指数5位，精度10位	'f2'
np.float32	单精度浮点数：32位，正负号1位，指数8位，精度23位	'f4'
np.float64	双精度浮点数：64位，正负号1位，指数11位，精度52位	'f8'
np.complex64	复数，分别用两个32位浮点数表示实部和虚部	'c8'
np.complex128	复数，分别用两个64位浮点数表示实部和虚部	'c16'
np.object_	python对象	'O'
np.string_	字符串	'S'
np.unicode_	unicode类型	'U'

# 创建一个数组
a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,2,3,4])
c = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

# 类型，大小，字节数
a.dtype            # dtype('int64')
a.size               # 元素的个数 6
a.nbytes           # 总字节数 48
a.itemsize         # 一个元素的长度            

# 形状比较
# 1维形状 (4,)
# 2维形状 (2,3)
# 3维形状 (2, 2, 3)
a.shape                  # (2, 3)
b.shape                   #( 4,)
c.shape                 # (2, 2, 3)

aa = np.array([[1,2,3],[4,5,6]], dtype=np.float32)
aa.dtype         # dtype('float32')



# 创建数组的时候指定类型
arr = np.array(['python','tensorflow','scikit-learn','numpy'],dtype = np.string_)
arr.dtype      #array([b'python', b'tensorflow', b'scikit-learn', b'numpy'], dtype='|S12')

2、数据基本操作

2.1 创建0和1的数组

zero = np.zeros([3, 4])
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

ones = np.ones([3,4])
array([[1，1， 1， 1]
         [1，1， 1， 1]
         [1，1， 1， 1]]）

2.2 利用已有数组创建

# 源码
def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)

a = np.array([[1,2,3],[4,5,6]])
# 从现有的数组当中创建，默认设置copy=True
a1 = np.array(a)
# 相当于索引的形式，并没有真正的创建一个新的，默认设置copy=False
a2 = np.asarray(a)
# np.asanyarray 会返回 ndarray 或者ndarray的子类，而np.asarray 只返回 ndarray.
a3 = np.asanyarray(a)

2.3创建固定范围的数组

# np.linspace (start, stop, num, endpoint, retstep, dtype)   生成等间隔的序列
# start 序列的起始值
# stop 序列的终止值，
# 如果endpoint为true，该值包含于序列中
# num 要生成的等间隔样例数量，默认为50
# endpoint 序列中是否包含stop值，默认为ture
# retstep 如果为true，返回样例，以及连续数字之间的步长
# dtype 输出ndarray的数据类型

# 生成等间隔的数组
np.linspace(0, 100, 10)

#一个参数 默认起点0，步长为1 输出：[0 1 2]
a = np.arange(3)

#两个参数 默认步长为1 输出[3 4 5 6 7 8]
a = np.arange(3,9)

#三个参数 起点为0，终点为4，步长为0.1
a = np.arange(0, 3, 0.1)

2.4创建随机数组

# 创建均匀分布的数组
# 0~1
print(np.random.rand(10))       # [0.1316822  0.14809709 0.1721296  0.31960958 0.45602859 0.23868581
 0.3029913  0.13963264 0.34909405 0.16349485]

# 默认范围一个数
print(np.random.uniform(0, 100))      # 79.1599055190787

# 随机整数
print(np.random.randint(10))     # 6

# 正态分布
print(np.random.normal(0,1, (5, 10)))

2.5数据操作

arr = np.random.normal(0,1, (5, 10))
arr.shape   # (5, 10)
# 获取第一个数据的前5个数据
arr[0, 0:5]     # [-0.16934799  1.42139779 -0.42300578  0.87905768 -0.09310058]

# 通过索引获取数据
arr[0, 0, 1]

# 修改形状 
reshape：有返回值，所谓有返回值，即不对原始多维数组进行修改；
resize：无返回值，所谓无返回值，即会对原始多维数组进行修改；
# 让arr的行与arr列反过来
# 在转换形状的时候，一定要注意数组的元素匹配
arr.reshape([10, 5])   # 返回新数组
arr.resize([10,5])      # 修改原数组


# 修改类型
# ndarray.astype(type)
arr.reshape([10, 5]).astype(np.int32)

# 修改小数位数
# ndarray.round(arr, out) Return a with each element rounded to the given number of decimals.
np.round(arr[:2, :5], 4)

# 数组转换
arr.shape     # (5,10)
arr.T.shape    # (10,5)

# 将arr转换成bytes
arr.tostring()

3、逻辑运算

# 逻辑判断
temp > 0.5

# 赋值
temp[temp > 0.5] = 1

3.1通用函数

# np.all()
#判断[0:2,0:5]是否全是大于0的
np.all(arr[0:2,0:5] > 0)     #返回False/True

# np.unique()
# 返回新的数组的数值，不存在重复的值

#将序列中数值值唯一且不重复的值组成新的序列
change_int = arr[0:2,0:5].astype(int)
np.unique(change_int)

3.2 np.where（三元运算符）

# 通过使用np.where能够进行更加复杂的运算

np.where()
np.where(temp > 0, 1, 0)    # 把t中小于0的替换成0，其他的替换成1.

# 复合逻辑需要结合np.logical_and和np.logical_or使用
# 判断大于0.5并且小于1的，换为1，否则为0
np.where(np.logical_and(temp > 0.5, temp < 1), 1, 0)
# 判断 大于0.5或者小于-0.5的，换为1，否则为0
np.where(np.logical_or(temp > 0.5, temp < -0.5), 1, 0)

4、统计运算

min(a[, axis, out, keepdims]) Return the minimum of an array or minimum along an axis.
max(a[, axis, out, keepdims]) Return the maximum of an array or maximum along an axis.
median(a[, axis, out, overwrite_input, keepdims]) Compute the median along the specified axis.
mean(a[, axis, dtype, out, keepdims]) Compute the arithmetic mean along the specified axis.
std(a[, axis, dtype, out, ddof, keepdims]) Compute the standard deviation along the specified axis.
var(a[, axis, dtype, out, ddof, keepdims]) Compute the variance along the specified axis.

# axis 轴的取值并不一定，Numpy中不同的API轴的值都不一样，在这里，axis 0代表列, axis 1代表行去进行统计

np.max(arr, axis=1)
np.min(arr, axis=1)
np.std(arr, axis=1)
np.mean(arr, axis=1)

# 获取变化最大
np.argmax(arr, axis=1)
np.argmax(arr, axis=0)

5、数据间运算

当操作两个数组时，numpy会逐个比较它们的shape（构成的元组tuple），只有在下述情况下，两个数组才能够进行数组与数组

维度相等
shape（其中相对应的一个地方为1）

例如

arr1 = np.array([[1,2,3,2,1,4], [5,6,1,2,3,1]])
arr2 = np.array([[1], [3]])

6、矩阵运算

矩阵，英文matrix，和array的区别矩阵必须是2维的,但是array可以是多维的。

np.mat()

将数组转换成矩阵类型

6.1 矩阵乘法运算

在进行矩阵运算的时候，可以直接使用一个乘法运算API

mp.matmul

a = np.array([[80,86],
[82,80],
[85,78],
[90,90],
[86,82],
[82,90],
[78,80],
[92,94]])
b = np.array([[0.7],[0.3]])

ab = np.matmul(a, b)
print(ab)

7、合并、分割

合并：

numpy.concatenate((a1, a2, ...), axis=0)
numpy.hstack(tup) Stack arrays in sequence horizontally (column wise).
numpy.vstack(tup) Stack arrays in sequence vertically (row wise).

arr = np.random.normal(0,1,(6,10))
arr1 = arr[:2,:5]
arr2 = arr[2:4,:5]

# axis=0时候，按照数组的行方向拼接在一起  相当于vstack
arr12 = np.concatenate([arr1, arr2], axis=0)
print(arr12)
# axis=1时候，按照数组的列方向拼接在一起  相当于hstack
arr21 = np.concatenate([arr1, arr2], axis=1)
print(arr21)

# np.hstack([arr1, arr2])
# np.vstack([arr1, arr2])

分割：

numpy.split(ary, indices_or_sections, axis=0) Split an array into multiple sub-arrays.

arr = np.random.normal(0,1,(6,10))
arr1 = arr[:2,:5]
arr2 = arr[2:4,:5]

# axis=0时候，按照数组的行方向拼接在一起  相当于vstack
arr12 = np.concatenate([arr1, arr2], axis=0)
print(arr12)
# arr12 = [[ 2.36101473  0.59607656  0.05558251  1.76005188 -1.09546167]
 [-0.0909139  -2.4578533   0.03143425 -0.92920756 -0.75200382]
 [-0.78871339 -0.16194897 -0.6846899   0.26065397 -0.46875865]
 [ 0.58308561  0.3746763   0.71939439 -0.08448596 -1.78409148]]

aa = np.split(arr12,4,axis=0)  # 分割成四个数组，按行切割

aa = np.split(arr12,5,axis=0)  # 分割成五个数组，按列切割

8、IO操作与数据处理

Numpy读取

genfromtxt(fname[, dtype, comments, ...]) Load data from a text file, with missing values handled as specified.

测试数据，test.csv文件内容

import numpy as np

test = np.genfromtxt("test.csv", delimiter=',')

test  # 当我们读取本地的文件为float的时候，如果有缺失(或者为None)，就会出现nan

# 结果array([[  nan,   nan,   nan,   nan],

       [  1. , 123. ,   1.4,  23. ],

       [  2. , 110. ,   nan,  18. ],

       [  3. ,   nan,   2.1,  19. ]])

test.shape   # （4，4）
type(np.nan)    # float
# numpy不适合数据处理，pandas工具比较合适

posted on 2020-04-22 22:09 __夕颜阅读(1382) 评论(0) 收藏举报

刷新页面返回顶部