Machine Learning from Zero: The kNN Algorithm (Part 2)

The kNN algorithm in scikit-learn:

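from sklearn.neighbors import KNeighborsClassifier  # KNeighborsClassifier wraps scikit-learn's ready-made kNN algorithm

# Every algorithm in sklearn is packaged in an object-oriented style, so you first create an object before using it.

KNN_classifier = KNeighborsClassifier(n_neighbors=6)  # Create the object. n_neighbors is the value of k.

KNN_classifier.fit(X_train, y_train)  # fit is the fitting step. Every sklearn estimator is trained through fit; its arguments are the training features and the training labels.

# After fit, the model is stored inside the object. KNN_classifier.predict(...) expects a matrix (2-D array), so even a single sample must be passed in matrix form.

x_predict = x.reshape(1, -1)  # With only one sample x, reshape it into a one-row matrix

y_predict = KNN_classifier.predict(x_predict)  # Run the prediction

y_predict[0]  # The predicted label for that sample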
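
For reference, the calls above can be put together into one runnable script. This is only a minimal sketch: the ten training points, their labels, and the sample x are made-up values for illustration, not data from this post.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: two well-separated clusters in 2-D (an assumption for illustration)
X_train = np.array([
    [1.0, 1.2], [1.5, 1.8], [2.0, 1.0], [1.2, 2.1], [2.2, 2.0],   # class 0
    [7.0, 7.5], [7.8, 8.1], [8.2, 7.0], [6.9, 8.3], [8.0, 8.0],   # class 1
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Create the classifier with k = 6 and fit it on the training data
KNN_classifier = KNeighborsClassifier(n_neighbors=6)
KNN_classifier.fit(X_train, y_train)

# A single sample has shape (2,); predict needs a 2-D array of shape (1, 2)
x = np.array([7.5, 7.6])
x_predict = x.reshape(1, -1)

y_predict = KNN_classifier.predict(x_predict)
print(y_predict[0])
```

Here five of the six nearest neighbors of x belong to the second cluster, so the majority vote picks class 1.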

 

 

Writing and packaging our own kNN algorithm:

import numpy as np
from collections import Counter
from math import sqrt


class KNNClassifier:

    def __init__(self, k):
        """Initialize the kNN classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        # A leading underscore marks the attribute as private (by convention)
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Train the kNN classifier on the training set X_train and y_train."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k"

        # Store the data passed in by the user
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Given the samples X_predict, return the vector of predicted labels."""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"

        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Given a single sample x, return its predicted label."""
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"

        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2))
                     for x_train in self._X_train]
        # Indices of the training samples, sorted from nearest to farthest
        nearest = np.argsort(distances)

        # Labels of the k nearest samples, then a majority vote
        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topK_y)

        return votes.most_common(1)[0][0]

    def __repr__(self):
        return "KNN(k=%d)" % self.k
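
The distance loop in _predict can also be written with NumPy broadcasting instead of a Python list comprehension. The sketch below shows one way to do that for a single sample. It is not part of the class above: the function name knn_predict_one and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np
from collections import Counter


def knn_predict_one(X_train, y_train, x, k):
    """Hypothetical helper: predict the label of one sample x by majority vote."""
    # Broadcasting subtracts x from every row; summing over axis=1 gives one distance per row
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    # Indices of training samples from nearest to farthest
    nearest = np.argsort(distances)
    # Labels of the k nearest samples, converted to plain Python values for Counter
    top_k_labels = y_train[nearest[:k]].tolist()
    return Counter(top_k_labels).most_common(1)[0][0]


# Made-up toy data: two well-separated clusters (an assumption for illustration)
X_train = np.array([
    [1.0, 1.2], [1.5, 1.8], [2.0, 1.0], [1.2, 2.1], [2.2, 2.0],   # class 0
    [7.0, 7.5], [7.8, 8.1], [8.2, 7.0], [6.9, 8.3], [8.0, 8.0],   # class 1
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

label_far = knn_predict_one(X_train, y_train, np.array([7.5, 7.6]), k=3)
label_near = knn_predict_one(X_train, y_train, np.array([1.4, 1.5]), k=3)
print(label_far, label_near)
```

The voting logic is the same as in the class; only the distance computation changes, which avoids a Python-level loop over the training rows.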


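Once the class is saved to a .py file, you can load it in a Jupyter Notebook (for example one launched from Anaconda) with a magic command such as %run.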

 

posted @ 2020-07-15 10:44  菜鸟成长手札