【sklearn】【Nearest Neighbors】【1.6】
Related reading:
1. KD-tree and Ball-tree: http://blog.csdn.net/skyline0623/article/details/8154911
1.1 A KD-tree is simpler to construct than a Ball-tree, and the regions of its subtrees never overlap, whereas the regions (balls) corresponding to a Ball-tree's subtrees may overlap.
1.2 During the search we compute the minimum and maximum distances from the query point to each subtree's region to decide whether to keep recursing for nearer neighbors. A point-to-ball distance is much cheaper to compute than a point-to-box distance, which saves a lot of work; this is why a Ball-tree is faster than a KD-tree in high dimensions.
2. Unsupervised nearest neighbors: http://blog.csdn.net/mebiuw/article/details/51051453
2.1 Given a query point, return the K points closest to it.
3. LOF: https://zhuanlan.zhihu.com/p/28178476 http://blog.csdn.net/mr_tyting/article/details/77371157
3.1 LOF (Local Outlier Factor) is used to detect outliers: the larger a point's LOF, the more isolated it is; the smaller, the denser its neighborhood.
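The LOF notes above can be sketched with sklearn's LocalOutlierFactor. This is a minimal sketch; the toy data (a dense cluster plus one far-away point) and the n_neighbors value are my own choices, not from the linked articles:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
inliers = rng.normal(0, 0.3, size=(20, 2))   # dense cluster near the origin
X = np.vstack([inliers, [[5.0, 5.0]]])       # one clearly isolated point

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)                  # 1 = inlier, -1 = outlier
scores = -lof.negative_outlier_factor_       # larger score = more isolated

print(labels[-1])       # the isolated point is flagged as an outlier
print(scores.argmax())  # and it has the largest LOF score (index 20)
```

As the notes say, the isolated point gets a large LOF (it is far from any dense neighborhood), while points inside the cluster score close to 1.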
sklearn.neighbors | Description | Example |
neighbors.BallTree | Ball-tree |
from sklearn.neighbors import BallTree
import numpy as np
train_pt = np.random.random((50,3))
test_pt = np.random.random((3,3))
bt = BallTree(train_pt, leaf_size=3)
dist, ind = bt.query(test_pt, k = 4)
print(dist)
print(ind)
output (values vary per run; the inputs are random):
[[ 0.19646157 0.21506229 0.24690478 0.25248628]
[ 0.15633642 0.31421209 0.35672915 0.3729417 ]
[ 0.16029188 0.17718276 0.2472842 0.25073488]]
[[34 24 30 42]
[18 45 49 12]
[31 14 20 22]]
|
neighbors.DistanceMetric | Computes distances between points; the metric is selected via a parameter |
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('euclidean')
x = [[0,0,0],[2,2,2]]
pdst = dist.pairwise(x)
print(pdst)
print(dist.dist_to_rdist(pdst))
output:
[[ 0. 3.46410162]
[ 3.46410162 0. ]]
[[ 0. 12.]
[ 12. 0.]]
|
neighbors.KDTree | KD-Tree |
from sklearn.neighbors import KDTree
import numpy as np
train_pt = np.random.random((50,3))
test_pt = np.random.random((3,3))
kdt = KDTree(train_pt, leaf_size=3)
dist, ind = kdt.query(test_pt, k=4)
print(dist)
print(ind)
output (values vary per run; the inputs are random):
[[ 0.17394122 0.19157476 0.20874202 0.23487346]
[ 0.11966198 0.12684004 0.18264254 0.20663349]
[ 0.0728606 0.23566015 0.33500421 0.35400021]]
[[33 41 22 32]
[18 48 38 0]
[19 26 25 30]]
|
neighbors.KernelDensity | Kernel density estimation: fits a smooth probability density from samples | |
neighbors.KNeighborsClassifier | KNN classification |
from sklearn.neighbors import KNeighborsClassifier
train_x = [[0], [1], [2], [3], [4]]
train_y = [0, 0, 1, 1, 1]
neighbor = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree')
neighbor.fit(train_x, train_y)
print(neighbor.predict([[1],[3]]))
print(neighbor.predict_proba([[1],[3]]))
output:
[0 1]
[[ 0.66666667 0.33333333]
[ 0. 1. ]]
|
neighbors.KNeighborsRegressor | K-nearest-neighbors regression: the prediction is the average of the target values of the k nearest points |
from sklearn.neighbors import KNeighborsRegressor
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 1, 2, 3, 4]
neighbor = KNeighborsRegressor(n_neighbors=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2.6,2.6]]))
output:
[ 2.5]
|
neighbors.LocalOutlierFactor | LOF outlier detection (see note 3.1 above): the larger a point's LOF, the more isolated it is | |
neighbors.RadiusNeighborsClassifier | Classifies by majority vote among all training points within radius of the query point |
from sklearn.neighbors import RadiusNeighborsClassifier
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 0, 1, 1, 1]
neighbor = RadiusNeighborsClassifier(radius=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2,2]]))
output:
[1]
|
neighbors.RadiusNeighborsRegressor | Predicts the average target value of all training points within radius of the query point |
from sklearn.neighbors import RadiusNeighborsRegressor
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
train_y = [0, 2, 3, 4, 5]
neighbor = RadiusNeighborsRegressor(radius=2)
neighbor.fit(train_x, train_y)
print(neighbor.predict([[2.1,2.1]]))
output:
[ 3.]
|
neighbors.NearestCentroid | Classification: compute the centroid of each class in the training set; a test point is assigned the class of its nearest centroid |
from sklearn.neighbors import NearestCentroid
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4],[5,5]]
train_y = [1, 1, 2, 2, 2, 3]
neighbor = NearestCentroid()
neighbor.fit(train_x, train_y)
print(neighbor.predict([[1.5,1.5]]))
output:
[1]
|
neighbors.NearestNeighbors | Unsupervised nearest-neighbors search |
from sklearn.neighbors import NearestNeighbors
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
neighbor = NearestNeighbors(n_neighbors=3, radius=1.5)
neighbor.fit(train_x)
print(neighbor.kneighbors([[1,1]], n_neighbors=4))
print(neighbor.radius_neighbors([[1,1]]))
print(neighbor.kneighbors_graph([[2,2],[3,3]]).toarray())
print(neighbor.radius_neighbors_graph([[4,4]], radius=3).toarray())
output:
(array([[ 0. , 1.41, 1.41, 2.82]]), array([[1, 2, 0, 3]]))
(array([array([ 1.41, 0. , 1.41])], dtype=object), array([array([0, 1, 2])], dtype=object))
[[ 0. 1. 1. 1. 0.]
[ 0. 0. 1. 1. 1.]]
[[ 0. 0. 1. 1. 1.]]
|
neighbors.kneighbors_graph | Builds a graph connecting each point to its k nearest points |
from sklearn.neighbors import kneighbors_graph
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
result = kneighbors_graph(train_x, n_neighbors=2, mode='connectivity')
print(result.toarray())
output:
[[ 1. 1. 0. 0. 0.]
 [ 1. 1. 0. 0. 0.]
 [ 0. 1. 1. 0. 0.]
 [ 0. 0. 1. 1. 0.]
 [ 0. 0. 0. 1. 1.]]
|
neighbors.radius_neighbors_graph | Builds a graph connecting each point to all points within radius of it |
from sklearn.neighbors import radius_neighbors_graph
train_x = [[0,0],[1,1],[2,2],[3,3],[4,4]]
result = radius_neighbors_graph(train_x, radius=3, mode='connectivity')
print(result.toarray())
output:
[[ 1. 1. 1. 0. 0.]
 [ 1. 1. 1. 1. 0.]
 [ 1. 1. 1. 1. 1.]
 [ 0. 1. 1. 1. 1.]
 [ 0. 0. 1. 1. 1.]]
|
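The table gives no example for neighbors.KernelDensity, so here is a minimal sketch. The toy data (three samples near 0 and one at 5) and the kernel/bandwidth choices are my own; score_samples returns the log of the estimated density:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# three samples clustered near 0 and one far away at 5
X = np.array([[0.0], [0.1], [0.2], [5.0]])

# fit a Gaussian kernel density estimate to the samples
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

# score_samples returns log-density; it is highest near the cluster,
# lower at the lone sample, and lowest far from all samples
log_dens = kde.score_samples(np.array([[0.1], [5.0], [10.0]]))
print(log_dens)

# draw new points from the fitted density
print(kde.sample(2, random_state=0))
```

The bandwidth plays the same role as the radius in the neighbor-graph functions above: a larger bandwidth smooths the estimate over a wider neighborhood.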