K-means Clustering: How the Centroid Update Is Computed, Step by Step

# 1. Generate sample data
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=10, centers=2, cluster_std=0.90, random_state=0)

Generated data:
[[ 1.10590929  5.61263348]
 [-0.24242331  1.4859204 ]
 [ 1.83134965  4.16756584]
 [ 2.45474443  1.19797055]
 [ 3.39993869  0.71302122]
 [ 1.66120403  4.41329484]
 [ 2.8332601   0.22971514]
 [ 2.65707227  3.42423724]
 [ 0.88337311  4.67332598]
 [ 2.33702845  0.12897749]]
 
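# 2. Initialize the centroids
import numpy as np
n_samples = 10  # equals X.shape[0]
n_clusters = 2
# Pick n_clusters distinct rows of X at random (unseeded, so the choice varies per run)
centroids = X[np.random.choice(n_samples, n_clusters, replace=False)]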

Randomly selected centroid coordinates (in this run, rows X[1] and X[6]):
[[-0.24242331  1.4859204 ]
 [ 2.8332601   0.22971514]]
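Because step 2 calls `np.random.choice` without a seed, rerunning it will usually pick different rows than X[1] and X[6]. A minimal sketch of a reproducible version, assuming NumPy's `Generator` API is acceptable here; `init_centroids` and `X_demo` are hypothetical names used only for illustration:

```python
import numpy as np

def init_centroids(X, n_clusters, seed=None):
    """Pick n_clusters distinct rows of X as the initial centroids.

    Same idea as step 2, but drawing from a seeded Generator so the
    choice is reproducible whenever `seed` is an int.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_clusters, replace=False)
    return X[idx], idx

# Toy data for illustration only (hypothetical, not the post's X)
X_demo = np.arange(20.0).reshape(10, 2)
centroids, idx = init_centroids(X_demo, n_clusters=2, seed=0)
print(idx)
print(centroids)

# To reproduce this post's run exactly, index the rows it happened to pick:
# centroids = X[[1, 6]]
```

Fixing the seed only makes the draw repeatable; it does not make any particular pair of starting rows better than another.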
 
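# 3. Compute the distance from every sample to every centroid (Manhattan distance is used here)
# Row k of `distances` holds the Manhattan (L1) distances from all samples to centroid k,
# so the array has shape (n_clusters, n_samples)
distances = np.array([
    np.sum(np.abs(X - centroid), axis=1)
    for centroid in centroids
])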
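Result (shown transposed, i.e. distances.T: one row per sample, one column per centroid):
Sample   Centroid 1   Centroid 2
0        5.47504569   7.11026915
1        0            4.33188867
2        4.75541841   4.93976114
3        2.98511758   1.34677108
4        4.41526117   1.04998467
5        4.83100179   5.35563577
6        4.33188867   0
7        4.83781242   3.37070992
8        4.31320201   6.39349783
9        3.93639467   0.59696929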
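For reference, the Manhattan (L1) distance used in step 3 is the sum of absolute coordinate differences. Checking the first table entry (sample 0 against centroid 1) by hand:

$$
d(\mathbf{x}_i, \mathbf{c}_k) = \sum_{j=1}^{2} \lvert x_{ij} - c_{kj} \rvert
$$

$$
d(\mathbf{x}_0, \mathbf{c}_1) = \lvert 1.10590929 - (-0.24242331) \rvert + \lvert 5.61263348 - 1.4859204 \rvert = 1.3483326 + 4.12671308 = 5.47504568
$$

The last digit differs from the table's 5.47504569 only because the inputs printed above are rounded to 8 decimals.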
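# 4. Assign each sample to its nearest centroid
# argmin over axis 0 (the centroid axis) gives each sample the index of its closest centroid
labels = np.argmin(distances, axis=0)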
Assignment result (one label per sample, in order):
[0 0 0 1 1 0 1 1 0 1]
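Steps 3 and 4 can also be written as a single broadcasting expression. A minimal sketch, assuming the ten points printed in step 1 are copied in by hand (8 decimals) and that the initial centroids are rows 1 and 6, as in this run. This version builds the matrix directly in the sample-by-centroid orientation of the table above, so the argmin runs along axis 1 instead of axis 0:

```python
import numpy as np

# The ten sample points printed in step 1 (copied to 8 decimals)
X = np.array([
    [ 1.10590929,  5.61263348],
    [-0.24242331,  1.4859204 ],
    [ 1.83134965,  4.16756584],
    [ 2.45474443,  1.19797055],
    [ 3.39993869,  0.71302122],
    [ 1.66120403,  4.41329484],
    [ 2.8332601 ,  0.22971514],
    [ 2.65707227,  3.42423724],
    [ 0.88337311,  4.67332598],
    [ 2.33702845,  0.12897749],
])
# The two initial centroids from step 2 (rows X[1] and X[6] in this run)
centroids = X[[1, 6]]

# (10, 1, 2) - (1, 2, 2) broadcasts to (10, 2, 2); summing |diff| over the
# coordinate axis gives L1 distances with shape (n_samples, n_clusters)
dist = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)

# Nearest centroid per sample: argmin along the centroid axis (axis=1 here)
labels = dist.argmin(axis=1)
print(labels)  # [0 0 0 1 1 0 1 1 0 1]
```

The list-comprehension form in step 3 and this broadcast form compute the same numbers; the broadcast form avoids a Python loop at the cost of materializing an (n_samples, n_clusters, n_features) temporary.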
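# 5. Update the centroids
# Each new centroid is the mean of the samples assigned to it
# (an empty cluster would produce NaN here)
new_centroids = np.array([
    X[labels == i].mean(axis=0)
    for i in range(n_clusters)
])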
Samples assigned to centroid 1 (label 0: samples 0, 1, 2, 5, 8):
[[ 1.10590929  5.61263348]
 [-0.24242331  1.4859204 ]
 [ 1.83134965  4.16756584]
 [ 1.66120403  4.41329484]
 [ 0.88337311  4.67332598]]
Column-wise mean (new centroid 1):
[1.047882554  4.070548108]
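Written out as a formula, the update for centroid 1 averages its five assigned samples coordinate by coordinate:

$$
\mathbf{c}_1^{\text{new}} = \frac{1}{\lvert C_1 \rvert} \sum_{\mathbf{x}_i \in C_1} \mathbf{x}_i, \qquad C_1 = \{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_5, \mathbf{x}_8\}
$$

$$
\begin{aligned}
\text{1st coordinate: } & \tfrac{1}{5}(1.10590929 - 0.24242331 + 1.83134965 + 1.66120403 + 0.88337311) = \tfrac{5.23941277}{5} = 1.047882554 \\
\text{2nd coordinate: } & \tfrac{1}{5}(5.61263348 + 1.4859204 + 4.16756584 + 4.41329484 + 4.67332598) = \tfrac{20.35274054}{5} = 4.070548108
\end{aligned}
$$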
Samples assigned to centroid 2 (label 1: samples 3, 4, 6, 7, 9):
[[2.45474443  1.19797055]
 [3.39993869  0.71302122]
 [2.8332601   0.22971514]
 [2.65707227  3.42423724]
 [2.33702845  0.12897749]]
Column-wise mean (new centroid 2):
[2.736408788  1.138784328]
New centroid coordinates:
[[1.04788256 4.07054811]
 [2.73640879 1.13878433]]
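The post stops after one update; K-means normally repeats steps 3 to 5 until the centroids stop moving. A minimal sketch of that loop, assuming the same data, the same initial rows (X[1], X[6]), Manhattan distance and mean update as above. Keeping an empty cluster's old centroid is an assumption of this sketch, not something the post specifies. Note also that the mean is the minimizer for squared Euclidean distance (standard K-means); for a sum of Manhattan distances the minimizer is the coordinate-wise median, so swapping `mean` for `np.median` would give the K-medians variant. `kmeans_manhattan_mean` is a hypothetical name:

```python
import numpy as np

def kmeans_manhattan_mean(X, init_centroids, max_iter=100):
    """Repeat steps 3-5 until the centroids stop moving.

    Returns (centroids, labels, number_of_passes).
    """
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for it in range(1, max_iter + 1):
        # Step 3: L1 distances, shape (n_clusters, n_samples) like `distances`
        distances = np.abs(X[None, :, :] - centroids[:, None, :]).sum(axis=2)
        # Step 4: index of the nearest centroid for each sample
        labels = distances.argmin(axis=0)
        # Step 5: mean of each cluster; keep the old centroid if a cluster is empty
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            return new_centroids, labels, it
        centroids = new_centroids
    return centroids, labels, max_iter

# The ten sample points printed in step 1 (copied to 8 decimals)
X = np.array([
    [ 1.10590929,  5.61263348],
    [-0.24242331,  1.4859204 ],
    [ 1.83134965,  4.16756584],
    [ 2.45474443,  1.19797055],
    [ 3.39993869,  0.71302122],
    [ 1.66120403,  4.41329484],
    [ 2.8332601 ,  0.22971514],
    [ 2.65707227,  3.42423724],
    [ 0.88337311,  4.67332598],
    [ 2.33702845,  0.12897749],
])

# max_iter=1 reproduces the single update shown in this post
first, first_labels, _ = kmeans_manhattan_mean(X, X[[1, 6]], max_iter=1)
print(first)

# Run to convergence
final, labels, n_passes = kmeans_manhattan_mean(X, X[[1, 6]])
print(n_passes, labels)
print(final)
```

Traced on these ten points, the second assignment step moves X[1] to the second cluster and X[7] to the first; the third pass reproduces the same labels, so the loop stops with centroids of roughly [1.6278, 4.4582] and [2.1565, 0.7511].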

posted @ 2025-06-27 10:53  华小电