Data Analysis with NumPy

I. Creating an ndarray

1. Creating with np.array()

One-dimensional array

import numpy as np

np.array([1, 2, 3, 4, 5])

Two-dimensional array

import numpy as np

np.array([[1, 2, 3], ['a', 'b', 1.1]])

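Note:

  • By default, all elements of a NumPy ndarray share a single data type (dtype).
  • If the list passed in mixes types, NumPy converts every element to one common type, with priority str > float > int.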
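The conversion rule in the note can be checked by inspecting the dtype of a few small arrays. This is a minimal sketch; the variable names are made up for illustration, and the exact integer and string widths may differ between platforms, so only the dtype kind is printed where that matters.

```python
import numpy as np

ints = np.array([1, 2, 3])            # only ints: integer dtype
mixed_num = np.array([1, 2, 3.5])     # int + float: everything becomes float
mixed_str = np.array([1, 2.5, 'a'])   # any str present: everything becomes str

print(ints.dtype.kind)       # 'i' (signed integer)
print(mixed_num.dtype)       # float64
print(mixed_str.dtype.kind)  # 'U' (unicode string)
print(mixed_str)             # the numbers are now stored as strings
```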

 

2. Creating with NumPy's array creation routines

1. np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): an arithmetic sequence of num evenly spaced values from start to stop

np.linspace(1,100,num=50)

array([  1.        ,   3.02040816,   5.04081633,   7.06122449,
         9.08163265,  11.10204082,  13.12244898,  15.14285714,
        17.16326531,  19.18367347,  21.20408163,  23.2244898 ,
        25.24489796,  27.26530612,  29.28571429,  31.30612245,
        33.32653061,  35.34693878,  37.36734694,  39.3877551 ,
        41.40816327,  43.42857143,  45.44897959,  47.46938776,
        49.48979592,  51.51020408,  53.53061224,  55.55102041,
        57.57142857,  59.59183673,  61.6122449 ,  63.63265306,
        65.65306122,  67.67346939,  69.69387755,  71.71428571,
        73.73469388,  75.75510204,  77.7755102 ,  79.79591837,
        81.81632653,  83.83673469,  85.85714286,  87.87755102,
        89.89795918,  91.91836735,  93.93877551,  95.95918367,
        97.97959184, 100.        ])
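The endpoint and retstep parameters in the signature are easier to see on a short sequence. A small sketch with illustrative variable names:

```python
import numpy as np

# Default: stop is included, so 5 points split [0, 1] into 4 equal gaps.
closed = np.linspace(0, 1, num=5)

# endpoint=False: stop is left out, so the 5 points are 0.2 apart.
half_open = np.linspace(0, 1, num=5, endpoint=False)

# retstep=True: also return the spacing between consecutive values.
values, step = np.linspace(0, 1, num=5, retstep=True)

print(closed)     # [0.   0.25 0.5  0.75 1.  ]
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
print(step)       # 0.25
```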

2. np.arange([start, ]stop, [step, ]dtype=None): values from start up to, but not including, stop, spaced by step

np.arange(1,100,2)

# Result
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
       35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67,
       69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99])
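np.arange takes a step where np.linspace takes a count, and its stop value is never included. A brief sketch of the common call forms, with illustrative variable names:

```python
import numpy as np

only_stop = np.arange(5)              # start defaults to 0, step to 1
with_step = np.arange(1, 10, 2)       # odd numbers below 10
float_step = np.arange(0, 1, 0.25)    # a float step gives a float array

print(only_stop)   # [0 1 2 3 4]
print(with_step)   # [1 3 5 7 9]
print(float_step)  # [0.   0.25 0.5  0.75]
```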

3. np.random.randint(low, high=None, size=None, dtype='l')  # commonly used; draws random integers from low (inclusive) to high (exclusive)

np.random.seed(10)
arr = np.random.randint(0,100,size=(5,6))
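Calling np.random.seed before np.random.randint makes the "random" array reproducible, which is why the same arr can be reused in later sections. A minimal sketch (the names first and second are illustrative):

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(0, 100, size=(5, 6))

np.random.seed(10)                      # reset to the same seed
second = np.random.randint(0, 100, size=(5, 6))

print(np.array_equal(first, second))    # True: same seed, same numbers
print(first.shape)                      # (5, 6)
print(first.min() >= 0, first.max() < 100)  # True True: high is exclusive
```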

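II. ndarray attributes

Attributes to memorize: ndim (number of dimensions), shape (the length along each dimension), size (total number of elements)

They are read as arr.ndim, arr.shape and arr.size; the functions np.ndim(arr), np.shape(arr) and np.size(arr) return the same values.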
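A quick sketch of these attributes on a 5 x 6 array (built here with np.arange and reshape rather than random numbers, purely for illustration):

```python
import numpy as np

arr = np.arange(30).reshape((5, 6))

print(arr.ndim)   # 2       two dimensions
print(arr.shape)  # (5, 6)  5 rows, 6 columns
print(arr.size)   # 30      5 * 6 elements in total

# The function forms give the same answers.
print(np.ndim(arr), np.shape(arr), np.size(arr))  # 2 (5, 6) 30
```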

 

III. Basic ndarray operations

np.random.seed(10)
arr = np.random.randint(0,100,size=(5,6))
arr

array([[ 9, 15, 64, 28, 89, 93],
       [29,  8, 73,  0, 40, 36],
       [16, 11, 54, 88, 62, 33],
       [72, 78, 49, 51, 54, 77],
       [69, 13, 25, 13, 92, 86]])

 

1. Indexing

Select the two rows at index 1 and 2 by passing a list of row indices:

arr[[1,2]]

array([[29,  8, 73,  0, 40, 36],
       [16, 11, 54, 88, 62, 33]])
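Indexing with a list picks rows in whatever order the list gives, and the result is a new array rather than a window onto the original. A small sketch on a made-up 3 x 3 array:

```python
import numpy as np

arr = np.array([[ 9, 15, 64],
                [29,  8, 73],
                [16, 11, 54]])

picked = arr[[2, 0]]        # row 2 first, then row 0
print(picked)
# [[16 11 54]
#  [ 9 15 64]]

picked[0, 0] = -1           # modify the result...
print(arr[2, 0])            # 16: ...the original array is unchanged
```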

 

2. Slicing

Get the first two rows of the 2-D array

arr[0:2]

array([[ 9, 15, 64, 28, 89, 93],
       [29,  8, 73,  0, 40, 36]])

Get the first two columns of the 2-D array

In arr[:, 0:2], the slice before the comma selects rows and the slice after it selects columns.

arr[:,0:2]

array([[ 9, 15],
       [29,  8],
       [16, 11],
       [72, 78],
       [69, 13]])

 

Get the first two rows and the first two columns of the 2-D array

arr[0:2,0:2]

array([[ 9, 15],
       [29,  8]])
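One detail worth knowing about these slices: basic slicing returns a view that shares memory with the original array, so writing into the slice changes the original. A minimal sketch on an illustrative 3 x 4 array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

block = arr[0:2, 0:2]       # first two rows, first two columns
print(block)
# [[0 1]
#  [4 5]]

block[0, 0] = 100           # write through the view
print(arr[0, 0])            # 100: the original array changed too

independent = arr[0:2, 0:2].copy()   # use .copy() when a separate array is needed
independent[0, 0] = -1
print(arr[0, 0])            # still 100
```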

 

Reversing

Reverse the order of the rows

arr[::-1]

array([[69, 13, 25, 13, 92, 86],
       [72, 78, 49, 51, 54, 77],
       [16, 11, 54, 88, 62, 33],
       [29,  8, 73,  0, 40, 36],
       [ 9, 15, 64, 28, 89, 93]])

 

Reverse the order of the columns

arr[:,::-1]

array([[93, 89, 28, 64, 15,  9],
       [36, 40,  0, 73,  8, 29],
       [33, 62, 88, 54, 11, 16],
       [77, 54, 51, 49, 78, 72],
       [86, 92, 13, 25, 13, 69]])

 

Reverse both rows and columns

arr[::-1,::-1]

array([[86, 92, 13, 25, 13, 69],
       [77, 54, 51, 49, 78, 72],
       [33, 62, 88, 54, 11, 16],
       [36, 40,  0, 73,  8, 29],
       [93, 89, 28, 64, 15,  9]])
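The three reversals use a step of -1 on the row slice, the column slice, or both. On a tiny array the effect is easy to verify by eye; np.flipud and np.fliplr are NumPy helpers that produce the same results and are shown only for comparison.

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [3, 4, 5]])

rows_reversed = arr[::-1]        # [[3 4 5], [0 1 2]]
cols_reversed = arr[:, ::-1]     # [[2 1 0], [5 4 3]]
both_reversed = arr[::-1, ::-1]  # [[5 4 3], [2 1 0]]

print(np.array_equal(rows_reversed, np.flipud(arr)))  # True
print(np.array_equal(cols_reversed, np.fliplr(arr)))  # True
```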

 

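3. Reshaping

Use arr.reshape(). The new shape is normally passed as a tuple (the dimensions may also be given as separate integers), and a -1 lets NumPy infer that dimension from the total size.

1. Reshape a one-dimensional array into a multi-dimensional array

arr_1 = arr.reshape((30,))  # the flattened 30-element array from item 2 below
arr_1.reshape((-1,15))

array([[ 9, 15, 64, 28, 89, 93, 29,  8, 73,  0, 40, 36, 16, 11, 54],
       [88, 62, 33, 72, 78, 49, 51, 54, 77, 69, 13, 25, 13, 92, 86]])

2. Reshape a multi-dimensional array into a one-dimensional array

arr_1 = arr.reshape((30,))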
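A short sketch of how reshape treats -1 and what happens when the requested shape does not fit. The array is built with np.arange for illustration, and reshape_failed is a hypothetical flag used only to record the outcome.

```python
import numpy as np

flat = np.arange(30)                 # 1-D array with 30 elements

two_rows = flat.reshape((-1, 15))    # -1 is inferred as 30 / 15 = 2
grid = flat.reshape(5, 6)            # separate integers work like the tuple (5, 6)
back = grid.reshape((30,))           # back to one dimension

print(two_rows.shape)  # (2, 15)
print(grid.shape)      # (5, 6)
print(back.shape)      # (30,)

# The total size must be preserved: 30 elements cannot fill 4 equal rows.
try:
    flat.reshape((4, -1))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```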

 

4. Concatenation

np.concatenate()  # in practice, the arrays being concatenated are mostly two-dimensional

(run in Jupyter)

arr

array([[ 9, 15, 64, 28, 89, 93],
       [29,  8, 73,  0, 40, 36],
       [16, 11, 54, 88, 62, 33],
       [72, 78, 49, 51, 54, 77],
       [69, 13, 25, 13, 92, 86]])

np.concatenate((arr,arr),axis=1)  # axis=1: join side by side, so each row gets longer

array([[ 9, 15, 64, 28, 89, 93,  9, 15, 64, 28, 89, 93],
       [29,  8, 73,  0, 40, 36, 29,  8, 73,  0, 40, 36],
       [16, 11, 54, 88, 62, 33, 16, 11, 54, 88, 62, 33],
       [72, 78, 49, 51, 54, 77, 72, 78, 49, 51, 54, 77],
       [69, 13, 25, 13, 92, 86, 69, 13, 25, 13, 92, 86]])

With axis=1 the arrays must have the same number of rows. Setting axis=0 instead stacks the arrays vertically (more rows), and the arrays must then have the same number of columns so that they line up.
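Comparing the result shapes makes the axis argument concrete. This is a minimal sketch with two made-up 2 x 3 arrays; mismatch_rejected is a hypothetical flag recording that NumPy refuses arrays that do not line up.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))        # [[0 1 2], [3 4 5]]
b = np.arange(6, 12).reshape((2, 3))    # [[6 7 8], [9 10 11]]

stacked = np.concatenate((a, b), axis=0)      # b goes below a
side_by_side = np.concatenate((a, b), axis=1)  # b goes to the right of a

print(stacked.shape)       # (4, 3)
print(side_by_side.shape)  # (2, 6)

# Joining along axis=1 needs equal row counts: 2 rows vs 3 rows fails.
try:
    np.concatenate((a, np.ones((3, 3))), axis=1)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)   # True
```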

 

IV. ndarray aggregation

1. Sum: np.sum

arr.sum(axis=0)  # sum down each column

array([195, 125, 265, 180, 337, 325])

2. Maximum and minimum: np.max / np.min

These work the same way as np.sum, including the axis argument.

3. Mean: np.mean()
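The aggregation functions share the same axis convention: no axis reduces the whole array, axis=0 reduces down each column, and axis=1 reduces across each row. A small sketch on an illustrative 2 x 3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()             # 21, every element
col_sums = arr.sum(axis=0)    # [5 7 9], one value per column
row_sums = arr.sum(axis=1)    # [ 6 15], one value per row

col_max = arr.max(axis=0)     # [4 5 6]
row_min = arr.min(axis=1)     # [1 4]
col_mean = arr.mean(axis=0)   # [2.5 3.5 4.5]

print(total, col_sums, row_sums)
print(col_max, row_min, col_mean)
```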

 

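V. Sorting an ndarray

Both np.sort() and ndarray.sort() can be used, but they differ:

  • np.sort() returns a sorted copy and leaves the input unchanged
  • ndarray.sort() sorts in place, so it needs no memory for a copy, but it modifies the input

np.sort(arr,axis=0)

array([[ 9,  8, 25,  0, 40, 33],
       [16, 11, 49, 13, 54, 36],
       [29, 13, 54, 28, 62, 77],
       [69, 15, 64, 51, 89, 86],
       [72, 78, 73, 88, 92, 93]])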

 

arr.sort(axis=0)  # sorts arr in place and returns None
arr

array([[ 9,  8, 25,  0, 40, 33],
       [16, 11, 49, 13, 54, 36],
       [29, 13, 54, 28, 62, 77],
       [69, 15, 64, 51, 89, 86],
       [72, 78, 73, 88, 92, 93]])
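The difference between the two sort forms shows up clearly on a three-element array. A minimal sketch with illustrative names:

```python
import numpy as np

data = np.array([3, 1, 2])

sorted_copy = np.sort(data)   # new sorted array
print(sorted_copy)            # [1 2 3]
print(data)                   # [3 1 2]  the input is untouched

result = data.sort()          # in-place sort
print(result)                 # None: nothing is returned
print(data)                   # [1 2 3]  the input itself is now sorted
```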

 

 

VI. Getting a NumPy array from an image with matplotlib.pyplot

import matplotlib.pyplot as plt

img_arr = plt.imread('./bobo.jpg')  # read the image file into a NumPy array

plt.imshow(img_arr)  # display the image
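For a color image, the array returned by plt.imread has three dimensions (height, width, color channels), so the slicing operations from section III apply to it directly. The sketch below does not read bobo.jpg; it uses a small synthetic array as a hypothetical stand-in so that it runs without the file.

```python
import numpy as np

# Hypothetical stand-in for plt.imread('./bobo.jpg'):
# 4 pixels tall, 6 pixels wide, 3 color channels.
img_arr = np.arange(4 * 6 * 3, dtype=np.uint8).reshape((4, 6, 3))

upside_down = img_arr[::-1]      # reverse the rows: vertical flip
mirrored = img_arr[:, ::-1]      # reverse the columns: horizontal flip
cropped = img_arr[0:2, 0:3]      # top-left corner: 2 rows, 3 columns

print(img_arr.shape)      # (4, 6, 3)
print(upside_down.shape)  # (4, 6, 3)
print(cropped.shape)      # (2, 3, 3)
```

Any of these arrays could be passed to plt.imshow in the same way as img_arr.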

 

posted @ 2019-03-06 11:05  神神气气