【sklearn】【Nearest Neighbors】【1.6】

Related learning links:

1. KD-tree and Ball-tree: http://blog.csdn.net/skyline0623/article/details/8154911

  1.1 A KD-tree is simpler to construct than a Ball-tree, and its subtrees never overlap; the regions covered by Ball-tree subtrees, by contrast, can overlap.

  1.2 While searching, we compute the minimum and maximum distances from the query point to the region covered by each subtree to decide whether to recurse into it for neighbors. A point-to-sphere distance is much cheaper to compute than a point-to-hyperrectangle distance, which saves a great deal of work, so in high-dimensional settings a Ball-tree is faster than a KD-tree.

2. Unsupervised nearest neighbors: http://blog.csdn.net/mebiuw/article/details/51051453

  2.1 In essence: given a point, return the K points closest to it.

3. LOF:https://zhuanlan.zhihu.com/p/28178476 http://blog.csdn.net/mr_tyting/article/details/77371157

  3.1 LOF (Local Outlier Factor) is mainly used to detect outliers: the larger a point's LOF, the more isolated it is; the smaller, the denser its neighborhood.
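To sanity-check point 1.2 above, the two trees can be run side by side on the same data: they should return identical neighbors and distances, differing only in query cost. The random data and `leaf_size=3` here are illustrative choices, not taken from the linked posts.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.RandomState(0)
train_pt = rng.random_sample((50, 3))
test_pt = rng.random_sample((3, 3))

# Query the same 4 nearest neighbors from both tree structures.
kd_dist, kd_ind = KDTree(train_pt, leaf_size=3).query(test_pt, k=4)
bt_dist, bt_ind = BallTree(train_pt, leaf_size=3).query(test_pt, k=4)

# Same answers -- only the internal space partitioning (and hence speed) differs.
assert np.allclose(kd_dist, bt_dist)
assert (kd_ind == bt_ind).all()
```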

 

 

sklearn.neighbors  Meaning  Example
neighbors.BallTree  Ball-tree
from sklearn.neighbors import BallTree
import numpy as np

train_pt = np.random.random((50,3))
test_pt = np.random.random((3,3))
bt = BallTree(train_pt, leaf_size=3)
dist, ind = bt.query(test_pt, k = 4)
print(dist)
print(ind)

output:
[[ 0.19646157  0.21506229  0.24690478  0.25248628]
 [ 0.15633642  0.31421209  0.35672915  0.3729417 ]
 [ 0.16029188  0.17718276  0.2472842   0.25073488]]
[[34 24 30 42]
 [18 45 49 12]
 [31 14 20 22]]
neighbors.DistanceMetric  Computes distances between points; the metric is selected by name
from sklearn.neighbors import DistanceMetric

dist = DistanceMetric.get_metric('euclidean')
x = [[0,0,0],[2,2,2]]
pdst = dist.pairwise(x)
print(pdst)
print(dist.dist_to_rdist(pdst))  # reduced distance: squared euclidean

output:
[[ 0.          3.46410162]
 [ 3.46410162  0.        ]]
[[  0.  12.]
 [ 12.   0.]]
neighbors.KDTree KD-Tree
from sklearn.neighbors import KDTree
import numpy as np

train_pt = np.random.random((50,3))
test_pt = np.random.random((3,3))
bt = KDTree(train_pt, leaf_size=3)
dist, ind = bt.query(test_pt, k = 4)
print(dist)
print(ind)

output:
[[ 0.17394122  0.19157476  0.20874202  0.23487346]
 [ 0.11966198  0.12684004  0.18264254  0.20663349]
 [ 0.0728606   0.23566015  0.33500421  0.35400021]]
[[33 41 22 32]
 [18 48 38  0]
 [19 26 25 30]]
neighbors.KernelDensity  Kernel density estimation: fits a smooth density to samples and evaluates log-density at query points
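The original table row has no example, so here is a minimal sketch of `KernelDensity`; the data, `kernel='gaussian'`, and `bandwidth=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Two clusters: one near 0, one near 5.
train_x = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(train_x)

# score_samples returns log-density at the query points.
log_dens = kde.score_samples(np.array([[0.1], [5.0], [2.5]]))
print(log_dens)  # log-density is higher near the clusters than at 2.5
```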
neighbors.KNeighborsClassifier  KNN classification
from sklearn.neighbors import KNeighborsClassifier

train_x = [[0], [1], [2], [3], [4]]
train_y = [0, 0, 1, 1, 1]
neighbor = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree')
neighbor.fit(train_x, train_y)
print(neighbor.predict([[1],[3]]))
print(neighbor.predict_proba([[1],[3]]))

output:
[0 1]
[[ 0.66666667  0.33333333]
 [ 0.          1.        ]]
neighbors.KNeighborsRegressor  K-nearest-neighbors regression: predicts the average of the values of the k nearest points
from sklearn.neighbors import KNeighborsRegressor

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 1, 2, 3, 4]
neighbor = KNeighborsRegressor(n_neighbors=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2.6,2.6]]))

output:
[ 2.5]
neighbors.LocalOutlierFactor  LOF (Local Outlier Factor) outlier detection; see link 3 above
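The original table row has no example, so here is a minimal `LocalOutlierFactor` sketch; the data and `n_neighbors=2` are illustrative assumptions. `fit_predict` returns -1 for outliers and 1 for inliers, and `negative_outlier_factor_` is the negated LOF (more negative means more anomalous, matching 3.1 above).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Four points in a tight cluster near the origin, plus one far away at (5, 5).
train_x = np.array([[0, 0], [0.1, 0.1], [0.2, 0], [0, 0.2], [5, 5]])

lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(train_x)

print(labels)                        # the isolated point (5, 5) should get -1
print(lof.negative_outlier_factor_)  # negated LOF score per training point
```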
neighbors.RadiusNeighborsClassifier  Radius-based classification: collect all training points within radius of the query; the majority class among them is the prediction
from sklearn.neighbors import RadiusNeighborsClassifier

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 0, 1, 1, 1]
neighbor = RadiusNeighborsClassifier(radius=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2,2]]))

output:
[1]
neighbors.RadiusNeighborsRegressor  Radius-based regression: collect all training points within radius of the query and average their values
from sklearn.neighbors import RadiusNeighborsRegressor

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 2, 3, 4, 5]
neighbor = RadiusNeighborsRegressor(radius=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2.1,2.1]]))

output:
[ 3.]
neighbors.NearestCentroid  Classification: compute the centroid of each class in the training set; a test point is assigned the class of its nearest centroid
from sklearn.neighbors import NearestCentroid

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4],[5,5]]
train_y = [1, 1, 2, 2, 2, 3]
neighbor = NearestCentroid()
neighbor.fit(train_x, train_y)
print(neighbor.predict([[1.5,1.5]]))

output:
[1]
neighbors.NearestNeighbors  Unsupervised k-nearest-neighbors search
from sklearn.neighbors import NearestNeighbors

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
neighbor = NearestNeighbors(n_neighbors=3, radius=1.5)
neighbor.fit(train_x)
print(neighbor.kneighbors([[1,1]], n_neighbors=4))
print(neighbor.radius_neighbors([[1,1]]))
print(neighbor.kneighbors_graph([[2,2],[3,3]]).toarray())
print(neighbor.radius_neighbors_graph([[4,4]], radius=3).toarray())

output:
(array([[ 0. ,  1.41,  1.41,  2.82]]), array([[1, 2, 0, 3]]))

(array([array([ 1.41,  0. ,  1.41])], dtype=object), array([array([0, 1, 2])], dtype=object))

[[ 0.  1.  1.  1.  0.]
[ 0.  0.  1.  1.  1.]]

[[ 0.  0.  1.  1.  1.]]
neighbors.kneighbors_graph  Builds a graph connecting each point to its k nearest points
from sklearn.neighbors import kneighbors_graph

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
result = kneighbors_graph(train_x, n_neighbors=2, mode='connectivity')
print(result.toarray())

output:
[[ 1.  1.  0.  0.  0.]
 [ 1.  1.  0.  0.  0.]
 [ 0.  1.  1.  0.  0.]
 [ 0.  0.  1.  1.  0.]
 [ 0.  0.  0.  1.  1.]]
neighbors.radius_neighbors_graph  Builds a graph connecting each point to all points within radius of it
from sklearn.neighbors import radius_neighbors_graph

train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
result = radius_neighbors_graph(train_x, radius=3, mode='connectivity')
print(result.toarray())

output:
[[ 1.  1.  1.  0.  0.]
 [ 1.  1.  1.  1.  0.]
 [ 1.  1.  1.  1.  1.]
 [ 0.  1.  1.  1.  1.]
 [ 0.  0.  1.  1.  1.]] 
     

 

posted @ 2018-03-15 13:26  aclove