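# 1. Generate sample data
import numpy as np
from sklearn.datasets import make_blobs

# 10 two-dimensional points around 2 centers; the true cluster labels are discarded (_)
X, _ = make_blobs(n_samples=10, centers=2, cluster_std=0.90, random_state=0)
Generated data: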
[[ 1.10590929 5.61263348]
[-0.24242331 1.4859204 ]
[ 1.83134965 4.16756584]
[ 2.45474443 1.19797055]
[ 3.39993869 0.71302122]
[ 1.66120403 4.41329484]
[ 2.8332601 0.22971514]
[ 2.65707227 3.42423724]
[ 0.88337311 4.67332598]
[ 2.33702845 0.12897749]]
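Conceptually, `make_blobs` places points as isotropic Gaussian noise around a few cluster centers. The NumPy sketch below imitates that idea with a hypothetical helper `toy_blobs`. It is not scikit-learn's implementation and will not reproduce the numbers above. The center box `(-10, 10)` is an assumption chosen to resemble `make_blobs`' default `center_box`.

```python
import numpy as np

def toy_blobs(n_samples, n_centers, std, seed=None, box=(-10.0, 10.0)):
    """Hypothetical helper: Gaussian blobs in 2-D (a sketch, not sklearn's code)."""
    rng = np.random.default_rng(seed)
    # One random center per cluster, drawn uniformly inside the (assumed) box
    centers = rng.uniform(box[0], box[1], size=(n_centers, 2))
    # Spread the samples over the centers as evenly as possible
    y = np.arange(n_samples) % n_centers
    # Each sample = its center + Gaussian noise with standard deviation `std`
    X = centers[y] + rng.normal(scale=std, size=(n_samples, 2))
    return X, y

X, y = toy_blobs(10, 2, 0.90, seed=0)
print(X.shape)  # (10, 2)
```

A smaller `std` gives tighter, better-separated blobs, which makes the clustering steps below easier to follow by hand.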
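# 2. Initialize the centroids
n_samples = X.shape[0]  # 10
n_clusters = 2
# Pick n_clusters distinct samples at random as the initial centroids
# (no seed is set, so a different pair may be chosen on each run)
centroids = X[np.random.choice(n_samples, n_clusters, replace=False)]
Randomly selected centroids: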
[[-0.24242331 1.4859204 ]
[ 2.8332601 0.22971514]]
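Because `np.random.choice` is called without a seed, rerunning step 2 can pick a different pair. A minimal sketch of a reproducible initialization, assuming NumPy's `Generator` API and an arbitrary seed of 42 (both choices are illustrative), is shown below. The centroids printed above happen to be rows 1 and 6 of `X` (0-based), which is how the later examples reproduce this run.

```python
import numpy as np

# Data printed in step 1 (rounded to 8 decimals)
X = np.array([
    [ 1.10590929, 5.61263348],
    [-0.24242331, 1.4859204 ],
    [ 1.83134965, 4.16756584],
    [ 2.45474443, 1.19797055],
    [ 3.39993869, 0.71302122],
    [ 1.66120403, 4.41329484],
    [ 2.8332601 , 0.22971514],
    [ 2.65707227, 3.42423724],
    [ 0.88337311, 4.67332598],
    [ 2.33702845, 0.12897749],
])
n_samples, n_clusters = X.shape[0], 2

# Seeded generator: the same seed always selects the same initial centroids
rng = np.random.default_rng(42)  # seed 42 is an arbitrary, illustrative choice
idx = rng.choice(n_samples, size=n_clusters, replace=False)
centroids = X[idx]

# Re-running with the same seed gives the same indices
idx_again = np.random.default_rng(42).choice(n_samples, size=n_clusters, replace=False)

# The pair used in this walkthrough: rows 1 and 6 of X
printed_centroids = X[[1, 6]]
```

The legacy `np.random.seed(...)` call before `np.random.choice` would also make the run repeatable; the `Generator` form keeps the random state local instead of global.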
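# 3. Compute the distance from every sample to every centroid (Manhattan distance is used here)
# distances has shape (n_clusters, n_samples): row i holds the distances to centroid i
distances = np.array([
    np.sum(np.abs(X - centroid), axis=1)
    for centroid in centroids
])
Result (shown transposed for readability: one row per sample, one column per centroid):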
Sample  Centroid 1   Centroid 2
1       5.47504569   7.11026915
2       0            4.33188867
3       4.75541841   4.93976114
4       2.98511758   1.34677108
5       4.41526117   1.04998467
6       4.83100179   5.35563577
7       4.33188867   0
8       4.83781242   3.37070992
9       4.31320201   6.39349783
10      3.93639467   0.59696929
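As a hand check of the first table entry, the Manhattan (L1) distance between a sample $\mathbf{x}$ and a centroid $\mathbf{c}$ in two dimensions, evaluated for sample 1 and centroid 1, is shown below. The inputs used here are the 8-decimal printouts, so the result differs from the tabulated 5.47504569 only in the last digit.

```latex
d_1(\mathbf{x}, \mathbf{c}) = \sum_{j=1}^{2} \lvert x_j - c_j \rvert

d_1\bigl(\mathbf{x}^{(1)}, \mathbf{c}^{(1)}\bigr)
  = \lvert 1.10590929 - (-0.24242331) \rvert + \lvert 5.61263348 - 1.4859204 \rvert
  = 1.34833260 + 4.12671308
  = 5.47504568
```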
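# 4. Assign each sample to its nearest centroid
# argmin over axis 0 (the centroid axis) gives one label per sample
labels = np.argmin(distances, axis=0)
Assignments (label 0 = centroid 1, label 1 = centroid 2):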
[0 0 0 1 1 0 1 1 0 1]
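The list comprehension in step 3 builds the distance matrix one centroid at a time. An equivalent vectorized sketch uses NumPy broadcasting; the names `dist` and the row layout are choices made here. This version produces one row per sample, so the argmin runs over `axis=1` instead of `axis=0`.

```python
import numpy as np

# Data printed in step 1 (rounded to 8 decimals)
X = np.array([
    [ 1.10590929, 5.61263348],
    [-0.24242331, 1.4859204 ],
    [ 1.83134965, 4.16756584],
    [ 2.45474443, 1.19797055],
    [ 3.39993869, 0.71302122],
    [ 1.66120403, 4.41329484],
    [ 2.8332601 , 0.22971514],
    [ 2.65707227, 3.42423724],
    [ 0.88337311, 4.67332598],
    [ 2.33702845, 0.12897749],
])
centroids = X[[1, 6]]  # the two initial centroids printed in step 2

# Broadcasting: (10, 1, 2) - (1, 2, 2) -> (10, 2, 2); summing |.| over the
# last axis yields a (10, 2) matrix: one row per sample, one column per centroid
dist = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)

# Row-wise argmin: index of the nearest centroid for each sample
labels = dist.argmin(axis=1)
print(labels)  # [0 0 0 1 1 0 1 1 0 1]
```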
# 5. Update the centroids
# Each new centroid is the mean of the samples currently assigned to it
new_centroids = np.array([
    X[labels == i].mean(axis=0)
    for i in range(n_clusters)
])
Samples assigned to centroid 1 (label 0):
[[ 1.10590929  5.61263348]
 [-0.24242331  1.4859204 ]
 [ 1.83134965  4.16756584]
 [ 1.66120403  4.41329484]
 [ 0.88337311  4.67332598]]
Column-wise mean:
[1.047882554 4.070548108]
Samples assigned to centroid 2 (label 1):
[[ 2.45474443  1.19797055]
 [ 3.39993869  0.71302122]
 [ 2.8332601   0.22971514]
 [ 2.65707227  3.42423724]
 [ 2.33702845  0.12897749]]
Column-wise mean:
[2.736408788 1.138784328]
New centroid coordinates:
[[1.04788256 4.07054811]
[2.73640879 1.13878433]]
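Steps 3 to 5 are one iteration; the algorithm repeats them until no sample changes cluster. A minimal sketch of that loop follows, written as a hypothetical helper `kmeans_l1` that keeps this walkthrough's choices (Manhattan distance for assignment, mean for the update) and assumes convergence means "labels unchanged". Standard K-means uses squared Euclidean distance, for which the mean is the optimal update; with Manhattan distance the coordinate-wise median is the minimizer (K-medians), so pairing L1 with the mean, as here, is a mixed variant.

```python
import numpy as np

def kmeans_l1(X, init_centroids, max_iter=100):
    """Hypothetical helper: repeat steps 3-5 until the assignments stop changing."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Steps 3-4: L1 distances (n_samples, k) and nearest-centroid labels
        dist = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        # Converged: no sample switched cluster since the last pass
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids, labels

# Data printed in step 1 (rounded to 8 decimals)
X = np.array([
    [ 1.10590929, 5.61263348],
    [-0.24242331, 1.4859204 ],
    [ 1.83134965, 4.16756584],
    [ 2.45474443, 1.19797055],
    [ 3.39993869, 0.71302122],
    [ 1.66120403, 4.41329484],
    [ 2.8332601 , 0.22971514],
    [ 2.65707227, 3.42423724],
    [ 0.88337311, 4.67332598],
    [ 2.33702845, 0.12897749],
])
init = X[[1, 6]]  # the initial centroids from step 2

first_pass, first_labels = kmeans_l1(X, init, max_iter=1)  # one pass = steps 3-5 above
final, final_labels = kmeans_l1(X, init)
print(final_labels)  # [0 1 0 1 1 0 1 0 0 1]
```

With `max_iter=1` the helper reproduces the step-5 centroids. On the next pass, samples 2 and 8 switch clusters, and the pass after that leaves every assignment unchanged.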