The K-Means Algorithm

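K-Means is a widely used unsupervised clustering algorithm that partitions a set of unlabeled samples into \(K\) clusters. Its goal is to group data points by similarity (usually measured by Euclidean distance) so that points within a cluster are as compact as possible and the clusters are as well separated as possible.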

Mathematically, K-Means minimizes the within-cluster variance. The usual form of the objective is to minimize the sum of squared errors (SSE):

\[\text{SSE} = \sum_{k=1}^{K} \sum_{\mathbf{x} \in S_k} \|\mathbf{x} - \mathbf{\mu}_k\|^2 \]

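Here \(\mathbf{\mu}_k\) is the center of the \(k\)-th cluster and \(S_k\) is the set of points assigned to it. The algorithm alternates between reassigning each point to its nearest center and recomputing each center as the mean of its points. Neither step can increase the SSE, so the procedure converges to a stable clustering, although this is a local minimum of the SSE and not necessarily the global one.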
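The alternation between assignment and center updates can be sketched directly in NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `lloyd_kmeans` and its parameters are hypothetical, and the centers are initialized from randomly chosen samples (an assumption made here for simplicity).

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """Illustrative K-Means loop: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers from k distinct random samples.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    def assign(centers):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        # Update step: each center becomes the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Recompute labels so they are consistent with the final centers.
    labels = assign(centers)
    sse = ((X - centers[labels]) ** 2).sum()
    return labels, centers, sse

# Usage on synthetic data: three Gaussian blobs of 100 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(loc=(2, 2), size=(100, 2)),
    data_rng.normal(loc=(-2, -2), size=(100, 2)),
    data_rng.normal(loc=(2, -2), size=(100, 2)),
])
labels, centers, sse = lloyd_kmeans(X, k=3)
print(np.round(centers, 2))
print(round(float(sse), 2))
```

Because the result depends on the initial centers, a different `seed` may lead to a different local minimum. Library implementations typically run several initializations and keep the one with the lowest SSE.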

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Prepare the data: generate 3 groups of 100 points each
np.random.seed(42)
cluster_1 = np.random.randn(100, 2) + np.array([2, 2])
cluster_2 = np.random.randn(100, 2) + np.array([-2, -2])
cluster_3 = np.random.randn(100, 2) + np.array([2, -2])
X = np.vstack([cluster_1, cluster_2, cluster_3])

# Create and fit a K-Means model that partitions the data into 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

# Get the cluster labels and print the cluster centers
labels = kmeans.labels_
centers = kmeans.cluster_centers_
print("cluster centers:")
print(np.round(centers, 2))

# Visualize the result
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', marker='o', s=30)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='Cluster Centers')
plt.title("K-Means Example")
plt.legend()
plt.show()

Output:

cluster centers:
[[-1.97 -1.96]
 [ 1.87 -2.22]
 [ 1.87  1.98]]
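To connect the fitted model back to the SSE formula, the following sketch recomputes the SSE by hand and compares it with scikit-learn's `inertia_` attribute, which holds the sum of squared distances of samples to their closest cluster center. It regenerates synthetic data in the same way as the example above so that it runs on its own. Setting `n_init=10` is a choice made here, not something taken from the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Regenerate three Gaussian blobs of 100 points each.
np.random.seed(42)
X = np.vstack([
    np.random.randn(100, 2) + np.array([2, 2]),
    np.random.randn(100, 2) + np.array([-2, -2]),
    np.random.randn(100, 2) + np.array([2, -2]),
])

# n_init=10 is an explicit choice for this sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# SSE by definition: squared distance from each point to its assigned center.
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
sse = np.sum((X - assigned_centers) ** 2)

print("manual SSE:", sse)
print("inertia_  :", kmeans.inertia_)
```

The two printed values should agree up to floating-point error.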

Posted 2025-06-20 21:03 by Undefined443
