2.2.3 Preparing the data: normalizing numeric values
from numpy import zeros, shape, tile

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # min(0) takes the minimum of each column; note the argument 0
    maxVals = dataSet.max(0)  # max(0) takes the maximum of each column; note the argument 0
    ranges = maxVals - minVals  # value range of each feature
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]  # number of samples (rows)
    normDataSet = dataSet - tile(minVals, (m, 1))  # tile builds an m-row array: [minVals, minVals, ...]
    normDataSet = normDataSet / tile(ranges, (m, 1))  # element-wise division yields the normalized values
    return normDataSet, ranges, minVals
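The function returns the following variables:
normMat: the normalized feature values, shape (1000, 3)
ranges: the range of each feature column (maximum minus minimum), shape (3,), because there are 3 feature columns
minVals: the minimum of each feature column, shape (3,)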
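The normalization above is min-max scaling: for each column, newValue = (oldValue - min) / (max - min), which maps every feature into the range 0 to 1. As a minimal sketch, the same computation can be written with NumPy broadcasting instead of tile. The name auto_norm and the toy array are assumptions made for illustration; they are not part of the original kNN module.

```python
import numpy as np

def auto_norm(data_set):
    """Min-max scale each column of a 2-D array to the range [0, 1]."""
    min_vals = data_set.min(axis=0)            # per-column minimum
    ranges = data_set.max(axis=0) - min_vals   # per-column max minus min
    # Broadcasting stretches the (n_features,) vectors across all rows,
    # so no explicit tile() call is needed.
    norm = (data_set - min_vals) / ranges
    return norm, ranges, min_vals

# Hypothetical toy data: 3 samples, 2 features on very different scales.
toy = np.array([[10.0, 1.0],
                [20.0, 3.0],
                [30.0, 5.0]])
norm, ranges, min_vals = auto_norm(toy)
print(norm)
print(ranges, min_vals)
```

Like the original, this sketch divides by zero if a column is constant (max equals min), so such columns would need separate handling.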
normMat, ranges, minVals = kNN.autoNorm(datingDataMat)

ranges
Out[174]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])

normMat
Out[175]:
array([[0.44832535, 0.39805139, 0.56233353],
       [0.15873259, 0.34195467, 0.98724416],
       [0.28542943, 0.06892523, 0.47449629],
       ...,
       [0.29115949, 0.50910294, 0.51079493],
       [0.52711097, 0.43665451, 0.4290048 ],
       [0.47940793, 0.3768091 , 0.78571804]])

len(normMat)
Out[176]: 1000

ranges
Out[177]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])

minVals
Out[178]: array([0.      , 0.      , 0.001156])
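One likely reason for returning ranges and minVals alongside the normalized matrix is that a new, unseen sample must be scaled with the same values before it can be compared against normMat. The sketch below shows that step using the ranges and minVals printed above. The sample values in new_sample are hypothetical and chosen only for illustration.

```python
import numpy as np

# Values copied from the session output above.
ranges = np.array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])
min_vals = np.array([0.0, 0.0, 0.001156])

# Hypothetical new sample with the same three features, in raw units.
new_sample = np.array([10000.0, 10.0, 0.5])

# Scale it exactly as the training data was scaled.
normalized = (new_sample - min_vals) / ranges
print(normalized)

# Undoing the scaling recovers the raw values.
restored = normalized * ranges + min_vals
```

A sample whose raw values fall outside the training minimum or maximum would be mapped outside the range 0 to 1, which is expected with this kind of scaling.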
