Gaussian Naive Bayes Classification: How It Works and a From-Scratch Implementation

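Gaussian Naive Bayes (GNB) is a machine learning classification technique based on a probabilistic approach and the Gaussian distribution. Naive Bayes assumes that each parameter (also called a feature or predictor) contributes to predicting the output variable independently of the others, given the class. The final prediction combines the contributions of all parameters: it returns the probability that the dependent variable belongs to each group, and the sample is assigned to the group (class) with the highest probability.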
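To make the decision rule concrete, here is a minimal sketch of how per-feature likelihoods might be combined under the independence assumption. The priors and likelihood values are made-up numbers chosen only for illustration, and the function name `posterior` is hypothetical rather than part of any library.

```python
import numpy as np

def posterior(priors, likelihoods):
    """Combine class priors with per-feature likelihoods (naive independence).

    priors:      shape (n_classes,)
    likelihoods: shape (n_classes, n_features), p(x_j | class) for one sample
    Returns the normalized posterior probability of each class.
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    # Work in log space so that many small factors do not underflow.
    log_joint = np.log(priors) + np.log(likelihoods).sum(axis=1)
    # Subtract the maximum before exponentiating for numerical stability.
    joint = np.exp(log_joint - log_joint.max())
    return joint / joint.sum()

# Hypothetical values for one sample, three classes, and two features.
priors = [1 / 3, 1 / 3, 1 / 3]
likelihoods = [[0.02, 0.30],
               [0.01, 0.20],
               [0.001, 0.05]]
probs = posterior(priors, likelihoods)
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```

The posteriors sum to 1, and the predicted class is simply the index of the largest one.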

What is the Gaussian distribution?

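The Gaussian distribution, also called the normal distribution, is a statistical model that describes the distribution of many continuous random variables found in nature. It is characterized by its bell-shaped curve, and its two most important parameters are the mean (μ) and the standard deviation (σ). The mean locates the center of the distribution, and the standard deviation measures the "width" of the distribution around the mean.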
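The bell curve described above corresponds to the density f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)). A minimal sketch of that formula follows; the function name `gaussian_pdf` and the example values (mean 20, standard deviation 5) are illustrative assumptions, not taken from the article.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Illustrative values: mean 20, standard deviation 5.
peak = gaussian_pdf(20.0, mu=20.0, sigma=5.0)     # density at the mean
one_sd = gaussian_pdf(25.0, mu=20.0, sigma=5.0)   # density one sigma from the mean
print(peak, one_sd)
```

The density is highest at the mean and falls off symmetrically on both sides; a larger σ gives a lower, wider curve.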

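It is important to know that a normally distributed variable X is continuous and can take any value in -∞ < X < +∞, and that the total area under the curve is 1.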
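One way to check the claim about the total area is to compute how much probability lies within k standard deviations of the mean, which for any normal distribution equals erf(k / √2). The sketch below uses only the standard library; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2.0))

# The enclosed area grows toward 1 as the interval widens.
for k in (1, 2, 3, 6):
    print(k, prob_within(k))
```

For k = 1, 2 and 3 this gives the familiar figures of roughly 68%, 95% and 99.7%, and the value approaches 1 as k grows.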

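Import the necessary libraries (the code below needs at least NumPy and pandas):

import numpy as np
import pandas as pd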

Now create a dataset whose predictor variables are normally distributed.

# Create values for FeNO for the 3 classes (mean, standard deviation, sample size):
FeNO_0 = np.random.normal(20, 19, 200)
FeNO_1 = np.random.normal(40, 20, 200)
FeNO_2 = np.random.normal(60, 20, 200)
# Create values for FEV1 for the 3 classes:
FEV1_0 = np.random.normal(4.65, 1, 200)
FEV1_1 = np.random.normal(3.75, 1.2, 200)
FEV1_2 = np.random.normal(2.85, 1.2, 200)
# Create values for Broncho Dilation for the 3 classes:
BD_0 = np.random.normal(150, 49, 200)
BD_1 = np.random.normal(201, 50, 200)
BD_2 = np.random.normal(251, 50, 200)
# Create the labels for the three classes: (2) disease, (1) possible disease, (0) no disease:
not_asthma = np.zeros(200, dtype=int)
poss_asthma = np.ones(200, dtype=int)
asthma = np.full(200, 2, dtype=int)
# Concatenate the classes into one array per variable:
FeNO = np.concatenate([FeNO_0, FeNO_1, FeNO_2])
FEV1 = np.concatenate([FEV1_0, FEV1_1, FEV1_2])
BD = np.concatenate([BD_0, BD_1, BD_2])
dx = np.concatenate([not_asthma, poss_asthma, asthma])
# Build the DataFrame from the concatenated arrays:
df = pd.DataFrame({'FeNO': FeNO, 'FEV1': FEV1, 'BD': BD, 'dx': dx})
# Inspect the DataFrame (a bare expression is displayed in a notebook):
df
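A hand-written Gaussian Naive Bayes typically starts by estimating, for every class, the mean and standard deviation of each feature, along with the class priors. The sketch below shows one possible way to do that with pandas on a small stand-in dataset. The seed, the name `demo`, and the distribution parameters are assumptions made for a reproducible illustration, not values from the dataset above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility

# Small stand-in dataset: one feature, two classes (hypothetical parameters).
demo = pd.DataFrame({
    "FeNO": np.concatenate([rng.normal(20, 5, 500), rng.normal(60, 5, 500)]),
    "dx": np.repeat([0, 1], 500),
})

# Per-class mean and standard deviation of the feature.
# Note: pandas computes the sample standard deviation (ddof=1) by default.
params = demo.groupby("dx")["FeNO"].agg(["mean", "std"])
# Class priors estimated as relative class frequencies.
priors = demo["dx"].value_counts(normalize=True).sort_index()
print(params)
print(priors)
```

The same idea should carry over to the three-feature DataFrame built above, for example with `df.groupby('dx').agg(['mean', 'std'])`. The estimates will vary from run to run there because no random seed is set.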