DeepML Solutions
Contents
- Matrix times Vector
- Calculate Mean by Row or Column
- Scalar Multiplication of a Matrix
- Linear Regression Using Normal Equation
- Linear Regression Using Gradient Descent
- K-Means Clustering
- Principal Component Analysis (PCA) Implementation
- Decision Tree Learning
- Sigmoid Activation Function Understanding
- Single Neuron
- Transformation Matrix from Basis B to C
- Divide Dataset Based on Feature Threshold
- One-Hot Encoding of Nominal Values
- Implement AdaBoost Fit Method
- Implementation of Log Softmax Function
- Implement Ridge Regression Loss Function
- Leaky ReLU Activation Function
- Linear Kernel Function
- Implement Gradient Descent Variants with MSE Loss
- Implement Reduced Row Echelon Form (RREF) Function
- Implement Adam Optimization Algorithm
- Implement Gini Impurity Calculation for a Set of Classes
- Calculate Image Brightness
- Calculate Performance Metrics for a Classification Model
- Binomial Distribution Probability
- Phi Transformation for Polynomial Features
- Positional Encoding Calculator
- Adam Optimizer
- Calculate the Phi Coefficient
- Implement the Softplus Activation Function
- Implement the Softsign Activation Function
- Implement the Swish Activation Function
- Implement Masked Self-Attention
- Vector Element-wise Sum
1. Matrix times Vector
(Easy)
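A minimal sketch, assuming the usual Deep-ML convention of returning -1 when the dimensions do not match (the function name matrix_dot_vector is also an assumption):

def matrix_dot_vector(a: list[list[float]], b: list[float]) -> list[float]:
    # Dimension check: every row of a must have the same length as b
    if any(len(row) != len(b) for row in a):
        return -1
    # Each output entry is the dot product of one row of a with b
    return [sum(x * y for x, y in zip(row, b)) for row in a]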
4. Calculate Mean by Row or Column
Code
def calculate_matrix_mean(matrix: list[list[float]], mode: str) -> list[float]:
    import numpy as np
    # For column mode, transpose so each "row" below is a column of the original matrix
    matrix = matrix if mode == 'row' else zip(*matrix)
    means = []
    for row in matrix:
        means.append(np.mean(np.array(list(row))))
    return means
5. Scalar Multiplication of a Matrix
(Easy)
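A minimal sketch (the function name scalar_multiply and its signature are assumptions about the expected template):

def scalar_multiply(matrix: list[list[float]], scalar: float) -> list[list[float]]:
    # Multiply every entry of the matrix by the scalar
    return [[x * scalar for x in row] for row in matrix]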
14. Linear Regression Using Normal Equation
Prerequisites
The Normal Equation
The goal of linear regression is to find parameters \(\theta\) that minimize the mean squared error (MSE) between the predictions \(X\theta\) and the true values \(y\):
\[J(\theta) = \frac{1}{m} \lVert X\theta - y \rVert_2^2\]
By taking the derivative and setting it to zero, the normal equation gives the optimal \(\theta\) in closed form:
\[\theta = (X^T X)^{-1} X^T y\]
Code
import numpy as np

def linear_regression_normal_equation(X: list[list[float]], y: list[float]) -> list[float]:
    X = np.array(X)
    y = np.array(y)
    # Closed-form solution: theta = (X^T X)^{-1} X^T y
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    return np.round(theta, 4).flatten().tolist()
15. Linear Regression Using Gradient Descent
Prerequisites
Gradient Descent
The weight update rule is
\[\theta \leftarrow \theta - \frac{\alpha}{m} X^T (X\theta - y)\]
See No. 47 for a detailed derivation.
Code
NumPy matrix multiplication follows the rule X(m, n) @ Y(n, p); note that the array y of shape (3,) must be reshaped into a column vector of shape (3, 1).
import numpy as np

def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros((n, 1))
    for i in range(iterations):
        h = np.dot(X, theta)            # 3 x 1
        # -1 lets NumPy infer the row count automatically
        err = h - y.reshape(-1, 1)      # 3 x 1
        update = np.dot(X.T, err) / m   # 2 x 1
        theta -= alpha * update
    return theta
17. K-Means Clustering
Prerequisites
K-Means Clustering
Clustering is NP-hard, so heuristic algorithms such as K-means are used. Given a sample set \(D = \{x_1, x_2, ..., x_m\}\) and a cluster partition \((C_1, C_2, ..., C_k)\), K-means minimizes the squared error
\[E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2\]
The algorithm proceeds as follows.
- Initialize \(C_t = \varnothing\) for \(t = 1, 2, ..., k\)
- For each \(x \in D\), compute the distances \(d_i\) \((i = 1, 2, ..., k)\) from \(x\) to each centroid, assign \(x\) the label \(\lambda_i\) of the nearest centroid, and update \(C_i = C_i \cup \{x\}\)
- For each cluster \(C_i\), recompute the centroid from all of its samples: \[\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \]
- Repeat steps (1)–(3) until the centroids \(\mu = \{\mu_1, \mu_2, ..., \mu_k\}\) no longer change
- Output the cluster partition \(C\)
Code
The implementation uses Euclidean distance for the clustering.

def k_means_clustering(points: list[tuple[float, float]], k: int, initial_centroids: list[tuple[float, float]], max_iterations: int) -> list[tuple[float, float]]:
    def euc_distance(p1, p2):
        n_dims = len(p1)
        return sum((p1[i] - p2[i]) ** 2 for i in range(n_dims)) ** 0.5

    def do_clustering(centroids):
        # Assign each point the index of its nearest centroid
        labels = []
        for point in points:
            dist = [euc_distance(point, centroids[i]) for i in range(k)]
            labels.append(dist.index(min(dist)))
        return labels

    def update_centroids(labels, old_centroids, use_random=True):
        new_centroids = []
        for i in range(k):
            subpoints = [points[j] for j in range(len(points)) if labels[j] == i]
            if subpoints:
                n_dims = len(subpoints[0])
                # New centroid = coordinate-wise mean of the cluster's points
                new_centroid = tuple(
                    sum(point[d] for point in subpoints) / len(subpoints) for d in range(n_dims)
                )
                new_centroids.append(new_centroid)
            else:
                # Empty cluster: either re-seed randomly or keep the old centroid
                if use_random:
                    import random
                    new_centroids.append(random.choice(points))
                else:
                    new_centroids.append(old_centroids[i])
        return new_centroids

    current_centroids = initial_centroids
    for _ in range(max_iterations):
        current_labels = do_clustering(current_centroids)
        new_centroids = update_centroids(current_labels, current_centroids, True)
        if all(euc_distance(c, n) < 1e-6 for c, n in zip(current_centroids, new_centroids)):
            break
        current_centroids = new_centroids
    return current_centroids
19. Principal Component Analysis (PCA) Implementation
Prerequisites
Principal Component Analysis (PCA)
PCA is a common dimensionality-reduction algorithm. Given \(n\) samples of \(d\)-dimensional data, the procedure is as follows.
- Arrange the raw data into an \(n \times d\) matrix \(X\)
- Standardize each column (feature) of \(X\): \(x_i \leftarrow \frac{x_i - \mu_i}{\sigma_i}\)
- Compute the covariance matrix of the standardized data
- Compute the eigenvalues of the covariance matrix and their corresponding eigenvectors
- Stack the eigenvectors as rows, sorted by decreasing eigenvalue, and take the first \(k\) rows to form the matrix \(P\)
- \(Y = PX\) is the data reduced to \(k\) dimensions.
The problem only asks for the top \(k\) eigenvectors, so the solution covers steps (1)–(5).
Code
import numpy as np

def pca(data, k):
    # Standardize the data
    data_standardized = (data - np.mean(data, axis=0)) / np.std(data, axis=0)
    # Compute the covariance matrix
    covariance_matrix = np.cov(data_standardized, rowvar=False)
    # Eigen decomposition
    eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
    # Sort the eigenvectors by decreasing eigenvalues
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues_sorted = eigenvalues[idx]
    eigenvectors_sorted = eigenvectors[:, idx]
    # Select the top k eigenvectors (principal components)
    principal_components = eigenvectors_sorted[:, :k]
    return np.round(principal_components, 4)
20. Decision Tree Learning
Prerequisites
ID3 Decision Trees
This problem requires implementing an ID3 decision tree. The core of ID3 is to choose the splitting feature by information gain and then build the tree recursively.
First, compute the total entropy of the samples
\[E(D) = -\sum_{i} p_i \log_2 p_i\]
Then introduce the conditional entropy
\[E(D \mid Y) = \sum_{i} \frac{|D_i|}{|D|} E(D_i)\]
In the implementation, the entropy of each of the \(|Y|\) subsets obtained by splitting on feature \(Y\) is computed separately, giving \(E(Y/x_i)\); see the chooseBestFeatureToSplit function in the reference code. The splitDataSet function also removes the current feature, which makes the recursive tree construction easier.
Finally, introduce the information gain
\[G(D, A) = E(D) - E(D \mid A)\]
With these concepts in place, ID3 selects features by information gain \(G\): each step chooses the attribute \(A\) given by \(\arg\max_{A} G(A)\), i.e., the attribute whose split reduces the uncertainty of the dataset \(D\) the most.
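The reference code is not reproduced here; below is a minimal sketch of the two functions mentioned above (mirroring splitDataSet and chooseBestFeatureToSplit), assuming the dataset is a list of rows with the class label in the last column:

import math
from collections import Counter

def calc_entropy(dataset):
    # dataset: list of rows, class label in the last column (assumption)
    counts = Counter(row[-1] for row in dataset)
    n = len(dataset)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def split_dataset(dataset, axis, value):
    # Keep the rows whose feature `axis` equals `value`, removing that feature
    return [row[:axis] + row[axis + 1:] for row in dataset if row[axis] == value]

def choose_best_feature_to_split(dataset):
    base_entropy = calc_entropy(dataset)
    n_features = len(dataset[0]) - 1
    best_gain, best_feature = 0.0, -1
    for axis in range(n_features):
        values = set(row[axis] for row in dataset)
        # Conditional entropy after splitting on this feature
        new_entropy = sum(
            len(sub) / len(dataset) * calc_entropy(sub)
            for sub in (split_dataset(dataset, axis, v) for v in values)
        )
        gain = base_entropy - new_entropy
        if gain > best_gain:
            best_gain, best_feature = gain, axis
    return best_feature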
22. Sigmoid Activation Function Understanding
Prerequisites
Sigmoid
\[\sigma(z) = \frac{1}{1 + e^{-z}}\]
Code
import math

def sigmoid(z: float) -> float:  # function name assumed from the problem title
    return round(1 / (1 + math.exp(-z)), 4)
24. Single Neuron
Prerequisites
MSE (Mean Squared Error)
Code
import math

def single_neuron_model(features: list[list[float]], labels: list[int], weights: list[float], bias: float) -> (list[float], float):
    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    def dot(x, w):
        return sum([x_ * w_ for x_, w_ in zip(x, w)])

    probabilities = []
    for feature_vectors in features:
        Z = dot(feature_vectors, weights) + bias
        probabilities.append(sigmoid(Z))
    mse = sum([(p - float(l)) ** 2 for p, l in zip(probabilities, labels)]) / len(labels)
    # ROUND
    probabilities = [round(p, 4) for p in probabilities]
    mse = round(mse, 4)
    return probabilities, mse
27. Transformation Matrix from Basis B to C
Prerequisites
Change-of-Basis Matrix
The change-of-basis matrix \(P\) from basis \(B\) to basis \(C\) satisfies \(BX = CX'\) and \(X' = PX\); therefore
\[P = C^{-1} B\]
Code
def transform_basis(B: list[list[int]], C: list[list[int]]) -> list[list[float]]:
    import numpy as np
    B, C = np.array(B), np.array(C)
    C_inv = np.linalg.inv(C)
    P = C_inv @ B
    return P
31. Divide Dataset Based on Feature Threshold
This one is Easy; see the short sketch below.
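A minimal sketch, assuming the task is to split a NumPy dataset into two groups by comparing one feature column against a threshold; the exact return format expected by the grader may differ.

import numpy as np

def divide_on_feature(X: np.ndarray, feature_i: int, threshold: float):
    # Rows with feature >= threshold go to the first group, the rest to the second
    mask = X[:, feature_i] >= threshold
    return [X[mask], X[~mask]]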
34. One-Hot Encoding of Nominal Values
Write a one-hot encoding function. Good practice with NumPy, in particular index-array assignment.
Code
import numpy as np

def to_categorical(x, n_col=None):
    # Infer the number of classes if not given
    n_col = n_col or np.amax(x) + 1
    one_hots = np.zeros((x.shape[0], n_col))
    one_hots[np.arange(x.shape[0]), x] = 1  # index-array assignment
    return one_hots
38. Implement AdaBoost Fit Method
Prerequisites
Boosting
An ensemble learning method whose main idea is to boost weak base learners into a strong learner. The steps are:
- Train an initial base learner on the training set with all sample weights equal;
- Adjust the sample weights of the training set according to how the previous learner performed on it, then train a new base learner on the reweighted data;
- Repeat step 2 until \(M\) base learners have been obtained; the final ensemble is a combination of the \(M\) base learners.
As this shows, Boosting is a sequential process.
AdaBoost
For AdaBoost, remember the two key steps:
- computing the weight coefficient \(\alpha_m\) of each base learner;
- updating the training sample weights \(W\).
The detailed procedure is as follows.
Consider a binary classification problem with dataset \(D = \{ (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \}\), where each \(x_i\) is a column vector with \(d\) elements, i.e., \(x_i \in \mathcal{X} \subseteq \mathbb{R}^d\), and \(y_i\) is a scalar with \(y \in \{ +1, -1 \}\).
1) Initialize the sample weights
\[W_1 = (w_{11}, \ldots, w_{1N}), \quad w_{1i} = \frac{1}{N}\]
2) For \(m = 1, 2, \ldots, M\), repeat the following to obtain \(M\) base learners:
- Train the \(m\)-th base learner \(G_m(x)\) on the data weighted by \(W_m\);
- Compute the classification error rate of \(G_m(x)\) on the weighted training set
\[\epsilon_m = \sum_{i=1}^{N} w_{mi} \, I(G_m(x_i) \neq y_i)\]
where \(I(\cdot)\) is the indicator function (1 if the condition holds, 0 otherwise), and \(P(G_m(x_i) \neq y_i)\), the probability that the prediction of base classifier \(G_m(x)\) disagrees with the true label \(y_i\) of \(x_i\), is denoted \(\epsilon\).
3) Compute the weight coefficient of \(G_m(x)\) (the weight of the base learner in the final ensemble)
\[\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}\]
4) Update the training sample weights:
\[w_{m+1, i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))\]
5) Here \(Z_m\) is a normalization factor that makes the elements of \(W_{m+1}\) sum to 1, i.e.
\[Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i))\]
6) Build the final linear combination of classifiers
\[f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)\]
7) The final classifier is
\[G(x) = \operatorname{sign}(f(x))\]
Summary
The code itself is not complicated; work through it alongside the derivation above, and remember the updates of \(\alpha_m\) and \(W\). A minimal sketch of the fit method is given below.
This section is adapted from a post on 博客园 (cnblogs).
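Since the reference code is not reproduced above, here is a minimal sketch of an AdaBoost fit method using one-level decision stumps as base learners; the function name adaboost_fit and the stump representation are assumptions, not the Deep-ML template.

import numpy as np

def adaboost_fit(X, y, n_clf=5):
    """Fit AdaBoost with decision stumps; y must take values in {-1, +1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    # 1) Initialize uniform sample weights
    w = np.full(n_samples, 1 / n_samples)
    clfs = []
    for _ in range(n_clf):
        best = {'error': float('inf')}
        # Train a stump: pick the feature/threshold/polarity with the lowest weighted error
        for feature_i in range(n_features):
            for threshold in np.unique(X[:, feature_i]):
                for polarity in (1, -1):
                    pred = np.ones(n_samples)
                    pred[polarity * X[:, feature_i] < polarity * threshold] = -1
                    # 2) Weighted classification error
                    error = np.sum(w[y != pred])
                    if error < best['error']:
                        best = {'error': error, 'feature_i': feature_i,
                                'threshold': threshold, 'polarity': polarity, 'pred': pred}
        # 3) Learner weight alpha_m
        alpha = 0.5 * np.log((1 - best['error']) / (best['error'] + 1e-10))
        # 4) Reweight the samples and normalize (Z_m)
        w *= np.exp(-alpha * y * best['pred'])
        w /= np.sum(w)
        clfs.append({'feature_i': best['feature_i'], 'threshold': best['threshold'],
                     'polarity': best['polarity'], 'alpha': alpha})
    return clfs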
39. Implementation of Log Softmax Function
Prerequisites
Softmax
\[\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\]
Log Softmax
\[\log\mathrm{softmax}(x_i) = x_i - \log \sum_j e^{x_j}\]
Computing this logarithm directly is numerically unstable. To avoid this, subtract \(\max x\) first:
\[\log\mathrm{softmax}(x_i) = (x_i - \max x) - \log \sum_j e^{x_j - \max x}\]
Code
import numpy as np

def log_softmax(scores: list) -> np.ndarray:
    sub_max = scores - np.max(scores)
    log_submax = np.log(np.sum(np.exp(sub_max)))
    return sub_max - log_submax
43. Implement Ridge Regression Loss Function
Prerequisites
Ridge Regression
MSE loss plus an \(\ell_2\) penalty on the weights.
Code
import numpy as np

def ridge_loss(X: np.ndarray, w: np.ndarray, y_true: np.ndarray, alpha: float) -> float:
    y_pred = X @ w
    mse = np.mean((y_true - y_pred) ** 2)
    l2_penalty = np.sum(w ** 2)
    loss = mse + alpha * l2_penalty
    return loss
44. Leaky ReLU Activation Function
Prerequisites
Leaky ReLU
Leaky ReLU addresses the dying-ReLU problem by allowing a small negative slope:
\[f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}\]
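A minimal sketch; the function name leaky_relu and the default slope alpha=0.01 follow common convention and are assumptions about the expected template.

def leaky_relu(z: float, alpha: float = 0.01) -> float:
    # Pass positive inputs through; scale negative inputs by alpha
    return z if z > 0 else alpha * z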
45. Linear Kernel Function
It can be implemented by hand:
return sum(x1_ * x2_ for x1_, x2_ in zip(x1, x2))
or computed directly with np.inner.
47. Implement Gradient Descent Variants with MSE Loss
Prerequisites
This problem optimizes a simple fully connected layer (without bias)
\[\hat{y} = Xw\]
using gradient descent. The loss is MSE, whose gradient is
\[\nabla_w = \frac{2}{m} X^T (Xw - y)\]
Batch Gradient Descent
Processes the entire dataset and updates the parameters with the average gradient; \(m\) is the number of samples.
Stochastic Gradient Descent (SGD)
Updates the parameters one sample at a time.
Mini-Batch Gradient Descent
The most commonly used variant: each update processes a small batch of \(b\) samples, where \(b\) is the batch_size.
Code
import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_iterations, batch_size=1, method='batch'):
    m = len(y)
    if method == 'batch':
        batch_size = m
    elif method == 'stochastic':
        batch_size = 1
    elif method == 'mini_batch':
        batch_size = batch_size
    for _ in range(n_iterations):
        for i in range(0, m, batch_size):
            X_batch = X[i: i + batch_size]
            y_batch = y[i: i + batch_size]
            pred = X_batch.dot(weights)
            err = pred - y_batch
            gradient = 2 * X_batch.T.dot(err) / batch_size  # broadcast
            # update
            weights -= learning_rate * gradient
    return weights
48. Implement Reduced Row Echelon Form (RREF) Function
Deep-ML | Implement Reduced Row Echelon Form (RREF) Function
Prerequisites
Gaussian Elimination
- Iterate over the columns from left to right, finding the pivot (a nonzero element) of the current column.
- Swap rows: move the row containing the pivot into the current pivot position.
- Normalize the pivot row: divide the pivot row by the pivot value so that the pivot becomes 1.
- Eliminate the other rows: use the pivot row to zero out the current column's entries in every other row.
- Move on to the next column, until all pivots have been processed or the pivot index exceeds the number of rows.
Code
import numpy as np

def rref(matrix):
    mat = matrix.astype(np.float64)
    rows, cols = mat.shape
    pivot = 0
    for col in range(cols):
        if pivot >= rows:
            break
        # Find the row with the largest absolute value in the current column
        current_col = mat[pivot:, col]
        max_row = np.argmax(np.abs(current_col)) + pivot
        if mat[max_row, col] == 0:
            continue
        # Swap rows
        if max_row != pivot:
            mat[[pivot, max_row]] = mat[[max_row, pivot]]
        # Normalize the pivot row
        pivot_val = mat[pivot, col]
        mat[pivot] = mat[pivot] / pivot_val
        # Eliminate the other rows
        for i in range(rows):
            if i != pivot:
                factor = mat[i, col]
                mat[i] -= factor * mat[pivot]
        pivot += 1
    return mat
49. Implement Adam Optimization Algorithm
This problem is a duplicate of No. 87.
64. Implement Gini Impurity Calculation for a Set of Classes
Deep-ML | Implement Gini Impurity Calculation for a Set of Classes
Prerequisites
Gini Impurity
Commonly used to find the best split for a decision tree node:
\[\mathrm{Gini} = 1 - \sum_{i} p_i^2\]
where \(p_i = \frac{n_i}{n}\) is the probability of class \(i\).
Code
import numpy as np
from collections import Counter

def gini_impurity(y):
    """
    Calculate Gini Impurity for a list of class labels.

    :param y: List of class labels
    :return: Gini Impurity rounded to three decimal places
    """
    n_labels = len(y)
    probs = [float(p) / n_labels for _, p in Counter(y).items()]
    return round(1 - sum(p ** 2 for p in probs), 3)
70. Calculate Image Brightness
The main point here is validating the input; otherwise the computation is straightforward. A short sketch is shown below.
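A minimal sketch, assuming the task is the mean pixel value of a grayscale image given as a list of lists, with -1 returned for invalid input (empty, ragged, or out-of-range values); the exact validation rules and rounding expected by the grader may differ.

def calculate_brightness(img: list[list[float]]) -> float:
    # Reject empty or ragged matrices
    if not img or not img[0] or any(len(row) != len(img[0]) for row in img):
        return -1
    # Reject pixel values outside the valid range [0, 255]
    if any(not (0 <= p <= 255) for row in img for p in row):
        return -1
    total = sum(sum(row) for row in img)
    return round(total / (len(img) * len(img[0])), 2)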
77. Calculate Performance Metrics for a Classification Model
Deep-ML | Calculate Performance Metrics for a Classification Model
Prerequisites
Confusion Matrix
Accuracy
From the definition of the confusion matrix,
\[\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
Precision
\[\mathrm{Precision} = \frac{TP}{TP + FP}\]
Negative Predictive Value
\[\mathrm{NPV} = \frac{TN}{TN + FN}\]
Recall
\[\mathrm{Recall} = \frac{TP}{TP + FN}\]
Specificity
\[\mathrm{Specificity} = \frac{TN}{TN + FP}\]
F1 Score
\[F_1 = \frac{2}{\frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}}}\]
A trade-off between \(FN\)s and \(FP\)s.
Code
def performance_metrics(actual: list[int], predicted: list[int]) -> tuple:
    n = len(actual)
    TP = sum(1 if a == p == 1 else 0 for a, p in zip(actual, predicted))
    TN = sum(1 if a == p == 0 else 0 for a, p in zip(actual, predicted))
    FN = sum(actual) - TP
    FP = sum(predicted) - TP
    accuracy = float(TP + TN) / (TP + TN + FP + FN)
    precision = float(TP) / (TP + FP)
    negativePredictive = float(TN) / (TN + FN)
    recall = float(TP) / (TP + FN)
    specificity = float(TN) / (TN + FP)
    f1 = 2.0 / (1 / precision + 1 / recall)
    confusion_matrix = [[TP, FN], [FP, TN]]
    return confusion_matrix, round(accuracy, 3), round(f1, 3), round(specificity, 3), round(negativePredictive, 3)
79. Binomial Distribution Probability
Prerequisites
Binomial Distribution
\[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\]
Code
import math

def binomial_probability(n, k, p):
    C = math.comb(n, k)
    probability = C * pow(p, k) * pow(1 - p, n - k)
    return round(probability, 5)
84. Phi Transformation for Polynomial Features
(Easy)
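A minimal sketch, assuming the transformation maps each input value \(x\) to the polynomial feature vector \([x^0, x^1, \ldots, x^{degree}]\) and returns an empty list for invalid input; the function name phi_transform and the edge-case handling are assumptions.

def phi_transform(data: list[float], degree: int) -> list[list[float]]:
    # Each value x becomes the row [x^0, x^1, ..., x^degree]
    if degree < 0 or not data:
        return []
    return [[x ** d for d in range(degree + 1)] for x in data]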
85. Positional Encoding Calculator
Prerequisites
Transformer Positional Encoding
For a dimension index \(i\) (with \(0 \le i < d_{model}\)) and position \(pos\), the positional encoding is computed as follows. For even dimensions:
\[PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i / d_{model}}}\right)\]
For odd dimensions:
\[PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i / d_{model}}}\right)\]
Code
(Generated by GPT-4o)
import numpy as np

def pos_encoding(position: int, d_model: int):
    # Validate the inputs
    if position == 0 or d_model <= 0:
        return -1
    # Allocate the positional encoding array
    pos_enc = np.zeros((1, position, d_model), dtype=np.float16)
    # Fill in the encodings
    for pos in range(position):
        for i in range(d_model):
            if i % 2 == 0:
                pos_enc[0, pos, i] = np.sin(pos / (10000 ** (i / d_model)))
            else:
                pos_enc[0, pos, i] = np.cos(pos / (10000 ** ((i - 1) / d_model)))
    pos_enc = np.float16(pos_enc)
    return pos_enc
87. Adam Optimizer
Prerequisites
Adam
Adam (Kingma & Ba, 2014) is an effective optimizer that uses exponentially weighted moving averages to estimate the first moment (momentum) and the second moment of the gradient. Specifically,
\[m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\]
Typically \(\beta_1 = 0.9\) and \(\beta_2 = 0.999\). Next, correct the bias of both estimates:
\[\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}\]
Finally, rescale the gradient in a manner very similar to RMSProp, giving the update
\[g'_t = \frac{\eta \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}\]
and update the parameters
\[\theta_t = \theta_{t-1} - g'_t\]
The reference code below implements the full Adam procedure.

Code
import numpy as np

def adam_optimizer(parameter, grad, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using the Adam optimizer.
    Adjusts the learning rate based on the moving averages of the gradient and squared gradient.

    :param parameter: Current parameter value
    :param grad: Current gradient
    :param m: First moment estimate
    :param v: Second moment estimate
    :param t: Current timestep
    :param learning_rate: Learning rate (default=0.001)
    :param beta1: First moment decay rate (default=0.9)
    :param beta2: Second moment decay rate (default=0.999)
    :param epsilon: Small constant for numerical stability (default=1e-8)
    :return: tuple: (updated_parameter, updated_m, updated_v)
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * np.square(grad)
    m_bias_corr = m / (1 - np.power(beta1, t))
    v_bias_corr = v / (1 - np.power(beta2, t))
    step = learning_rate * m_bias_corr / (np.sqrt(v_bias_corr) + epsilon)
    parameter -= step
    return np.round(parameter, 5), np.round(m, 5), np.round(v, 5)
95. Calculate the Phi Coefficient
Prerequisites
The Phi Coefficient
\[\phi = \frac{n_{00} n_{11} - n_{01} n_{10}}{\sqrt{(n_{00} + n_{01})(n_{10} + n_{11})(n_{00} + n_{10})(n_{01} + n_{11})}}\]
Code
import math

def phi_corr(x: list[int], y: list[int]) -> float:
    n = len(x)
    x_00 = sum(1 for i in range(n) if x[i] == y[i] == 0)
    x_11 = sum(1 for i in range(n) if x[i] == y[i] == 1)
    x_10 = sum(1 for i in range(n) if x[i] == 1 != y[i])
    x_01 = sum(1 for i in range(n) if x[i] == 0 != y[i])
    sqrt_ = math.sqrt((x_00 + x_01) * (x_10 + x_11) * (x_00 + x_10) * (x_01 + x_11))
    val = (x_00 * x_11 - x_01 * x_10) / sqrt_
    return round(val, 4)
100. Implement the Softplus Activation Function
Prerequisites
Softplus
\[\mathrm{softplus}(x) = \ln(1 + e^x)\]
Code
import numpy as np

def softplus(x: float) -> float:  # function name assumed from the problem title
    return round(np.log(1 + np.exp(x)), 4)
100. Implement the Softsign Activation Function
Prerequisites
Softsign
\[\mathrm{softsign}(x) = \frac{x}{1 + |x|}\]
Code
def softsign(x: float) -> float:  # function name assumed from the problem title
    return round(x / (1 + abs(x)), 4)
102. Implement the Swish Activation Function
Prerequisites
Swish Activation Function
\[\mathrm{swish}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}\]
Code
import numpy as np

def swish(x: float) -> float:  # function name assumed from the problem title
    sigma = lambda x: 1 / (1 + np.exp(-x))
    return x * sigma(x)
107. Implement Masked Self-Attention
Prerequisites
Masked Self-Attention
Masked attention is a variation of the attention mechanism used primarily in sequence modeling tasks, such as language modeling and text generation. The key idea behind masked attention is to control the flow of information by selectively masking certain elements in the input sequence. This ensures that the model attends only to valid positions when computing attention scores.
Given \(Q \in \mathbb{R}^{n \times d_K}\), \(K \in \mathbb{R}^{m \times d_K}\), and \(V \in \mathbb{R}^{m \times d_V}\), compute the attention scores
\[S = \frac{Q K^T}{\sqrt{d_K}}\]
The mask \(M\) is typically causal: \(0\) where attention is allowed and \(-\infty\) (or a very large negative number) where it is blocked. Apply the mask:
\[S' = S + M\]
Before the softmax, subtract the row-wise maximum score to stabilize the computation. The output is
\[\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(S')\, V\]
Code
import numpy as np

def compute_qkv(X: np.ndarray, W_q: np.ndarray, W_k: np.ndarray, W_v: np.ndarray):
    Q = np.dot(X, W_q)
    K = np.dot(X, W_k)
    V = np.dot(X, W_v)
    return Q, K, V

def masked_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, mask: np.ndarray) -> np.ndarray:
    d_k = Q.shape[1]
    scores = np.matmul(Q, K.T) / np.sqrt(d_k)
    # Apply the mask: for an additive (numeric) mask, add it directly
    scores = scores + mask
    # For a boolean mask, use np.where instead:
    # scores = np.where(mask, -1e9, scores)
    # Softmax with the row-wise maximum subtracted for numerical stability
    attention_weights = np.exp(scores - np.max(scores, axis=1, keepdims=True))
    attention_weights = attention_weights / np.sum(attention_weights, axis=1, keepdims=True)
    return np.matmul(attention_weights, V)
121. Vector Element-wise Sum
Code
def vector_sum(a: list[float], b: list[float]):  # function name assumed from the problem title
    # Return -1 if the vectors have different lengths
    return [sum(c) for c in zip(a, b)] if len(a) == len(b) else -1
