6001 Week 1 Quiz
✅ 基础卷(第 3 套)50 题(单选 + 多选,中英双语 + 详细解释)
Q1 (Single)
What does supervised learning require?
监督学习必须具备什么?
A. Labels
B. Clusters
C. Only features
D. No data needed
Answer: A
Explanation:
Supervised learning requires labeled data (X, y).
监督学习需要带标签的数据(X, y)。
Q2 (Single)
Which task belongs to regression?
以下哪个属于回归任务?
A. Predicting temperature tomorrow
B. Spam / not spam
C. Fraud / not fraud
D. Dog or cat classification
Answer: A
Explanation:
Regression predicts continuous numeric values.
回归用于预测连续数值型结果。
Q3 (Single)
Which model outputs probabilities?
哪个模型会输出概率?
A. Logistic Regression
B. KMeans
C. PCA
D. KNN (distance only)
Answer: A
Explanation:
Logistic regression outputs probability via the sigmoid function.
逻辑回归通过 sigmoid 输出概率。
Q4 (Single)
Which metric can be most misleading under class imbalance?
哪个指标容易被类别不平衡误导?
A. Accuracy
B. Recall
C. Precision
D. F1-score
Answer: A
Explanation:
Accuracy can be very high even if the model ignores the minority class.
准确率在类别不平衡时容易误导。
Q5 (Single)
Which algorithm is used for clustering?
哪个算法用于聚类?
A. K-Means
B. Linear Regression
C. Logistic Regression
D. Decision Tree
Answer: A
Explanation:
K-Means is a classical unsupervised clustering algorithm.
K-Means 是经典聚类算法。
Q6 (Single)
Which one is a supervised learning task?
哪个是监督学习任务?
A. Customer churn prediction
B. Finding customer groups
C. PCA
D. Word embedding clustering
Answer: A
Explanation:
Churn prediction uses labeled outcomes (churn = yes/no).
流失预测属于监督学习,因为有标签。
Q7 (Single)
What does K in “KNN” represent?
KNN 中的 K 代表什么?
A. Number of neighbors
B. Number of clusters
C. Number of features
D. Number of trees
Answer: A
Explanation:
K is the number of nearest neighbors used for voting.
K 表示用于投票的邻居数量。
Q8 (Single)
What is the purpose of train/test split?
划分训练集与测试集的目的是什么?
A. Evaluate generalization
B. Increase accuracy
C. Reduce number of samples
D. Remove outliers
Answer: A
Explanation:
Test set measures model performance on unseen data.
测试集用于衡量模型的泛化能力。
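A minimal sketch (assuming scikit-learn is available; the toy arrays are made up) of holding out a test set to estimate performance on unseen data. 极简示意(假设可用 scikit-learn,数据为虚构):留出测试集以评估泛化能力。

```python
# Minimal train/test split sketch; synthetic data is illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.array([0, 1] * 5)           # toy labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```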
Q9 (Single)
Which model is sensitive to feature scaling?
哪个模型对特征缩放敏感?
A. KNN
B. Decision Tree
C. Random Forest
D. Naive Bayes
Answer: A
Explanation:
KNN uses distance, so scaling strongly affects results.
KNN 基于距离,对特征缩放非常敏感。
Q10 (Single)
Sigmoid outputs values between:
Sigmoid 输出的值在什么范围?
A. 0 to 1
B. –1 to 1
C. 0 to infinity
D. –infinity to +infinity
Answer: A
Explanation:
Sigmoid squashes numbers into (0, 1).
Sigmoid 把输入压缩到 (0,1)。
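A minimal sketch (plain NumPy; the `sigmoid` helper is written here only for illustration) showing that the sigmoid squashes any real input into (0, 1). 极简示意(NumPy,函数为示例自拟):sigmoid 把任意实数压缩到 (0,1)。

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.5, 0.9999546]
```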
Q11 (Single)
Which indicates overfitting?
哪种情况表示过拟合?
A. High train accuracy, low test accuracy
B. Low train, low test
C. Low train, high test
D. Both high
Answer: A
Explanation:
Overfitting = memorizing training data but failing on new data.
过拟合=训练很好,测试很差。
Q12 (Single)
Which is a continuous target?
以下哪个是连续型目标?
A. Salary prediction
B. Fraud detection
C. Spam detection
D. Disease yes/no
Answer: A
Explanation:
Salary is continuous → regression.
工资是连续值。
Q13 (Single)
Which metric is best for “finding all positive cases”?
如果希望尽可能识别所有正例,应关注哪个指标?
A. Recall
B. Precision
C. Accuracy
D. RMSE
Answer: A
Explanation:
Recall measures ability to find positives.
召回率衡量识别正样本的能力。
Q14 (Single)
Which method reduces dimensionality?
哪个方法用于降维?
A. PCA
B. Logistic Regression
C. Random Forest
D. KNN
Answer: A
Explanation:
PCA projects data to fewer dimensions.
PCA 将高维数据映射到低维空间。
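A minimal sketch (scikit-learn; random data used only for illustration) of projecting 4-dimensional data onto 2 principal components. 极简示意(scikit-learn,随机数据仅作演示):将 4 维数据投影到 2 个主成分。

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 4)   # 100 samples, 4 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # project onto 2 components
print(X_2d.shape)                           # (100, 2)
print(pca.explained_variance_ratio_)        # variance kept by each component
```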
Q15 (Single)
Which algorithm uses entropy or Gini?
哪个算法使用熵或基尼指数?
A. Decision Tree
B. KMeans
C. KNN
D. PCA
Answer: A
Explanation:
Decision trees split nodes using impurity measures.
决策树用熵/Gini 进行分裂。
Q16 (Multiple)
Which tasks belong to classification?
以下哪些属于分类任务?(多选)
A. Predicting loan default (yes/no)
B. Spam filtering
C. Predicting student score
D. Image dog/cat recognition
Answer: A, B, D
Explanation:
Classification works with discrete labels.
分类任务预测离散标签。
Q17 (Single)
Which part belongs to data preprocessing?
以下哪个属于数据预处理?
A. Scaling
B. Model selection
C. Hyperparameter tuning
D. Evaluation
Answer: A
Explanation:
Scaling is preprocessing before training.
缩放属于模型训练前的预处理步骤。
Q18 (Single)
Which algorithm splits data into K non-overlapping subsets?
哪个方法把数据切成 K 个互斥子集?
A. K-fold cross-validation
B. Bagging
C. Boosting
D. SMOTE
Answer: A
Explanation:
K-fold CV splits data for stable evaluation.
K 折交叉验证将数据分成 K 份以便更稳健评估。
Q19 (Single)
Which step must be done after train/test split?
以下哪一步必须在 train/test split 之后做?
A. Fit scaler on training only
B. Scale both sets independently
C. Fit scaler on whole dataset
D. Randomly resplit test set
Answer: A
Explanation:
Fit scaler on train → apply to test → avoid leakage.
仅在训练集上 fit 缩放器,再应用于测试集,避免泄漏。
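A minimal sketch (scikit-learn; synthetic data) of the leakage-safe order: fit the scaler on the training split only, then transform the test split with the same statistics. 极简示意(scikit-learn,数据为虚构):仅在训练集上 fit 缩放器,再用同一统计量转换测试集,避免泄漏。

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).rand(100, 3)
y = np.random.RandomState(1).randint(0, 2, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training mean/std
```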
Q20 (Single)
Which is true about K-Means?
关于 K-Means 下列哪项正确?
A. Must specify number of clusters
B. Produces labels automatically
C. Needs Y labels
D. Works only on text
Answer: A
Explanation:
K-Means requires choosing K.
K-Means 必须指定 K。
Q21 (Single)
Which represents “false positive”?
以下哪个代表假阳性 FP?
A. Predict positive but actually negative
B. Predict positive and actually positive
C. Predict negative and actually positive
D. Predict negative and actually negative
Answer: A
Explanation:
FP = false alarm.
假阳性表示误报。
Q22 (Single)
What happens if learning rate is too high?
学习率太大时会发生什么?
A. Divergence
B. Slow learning
C. Perfect accuracy
D. Model stops training
Answer: A
Explanation:
Large LR → updates overshoot → divergence.
学习率大→跳动太大→不收敛。
Q23 (Multiple)
Which can reduce overfitting?
哪些方法能减少过拟合?(多选)
A. Regularization
B. More training data
C. Early stopping
D. Increasing parameters without limit
Answer: A, B, C
Explanation:
Regularization, more data, early stopping help generalization.
正则化、更多数据、提前停止都能减少过拟合。
Q24 (Single)
Which of the following is a hyperparameter?
以下哪个属于超参数?
A. Learning rate
B. Model weights
C. Training labels
D. Actual predictions
Answer: A
Explanation:
Hyperparameters are chosen before training.
超参数需要在训练前设定。
Q25 (Single)
Which algorithm groups similar samples based on distance?
哪个算法根据距离把样本分组?
A. KMeans
B. Linear Regression
C. Logistic Regression
D. Bernoulli NB
Answer: A
Explanation:
KMeans uses Euclidean distance to cluster.
KMeans 使用欧氏距离聚类。
Q26 (Single)
Which method is used to handle missing values?
处理缺失值常用哪种方法?
A. Mean imputation
B. Raise error
C. Delete all data
D. Replace with random labels
Answer: A
Explanation:
Mean/median imputation is common for numerical features.
均值填充是常见做法。
Q27 (Single)
Which problem uses binary classification?
以下哪个是二分类?
A. Cancer detection (yes/no)
B. Predicting weight
C. Predicting exam score
D. Predicting house price
Answer: A
Q28 (Single)
Which is a measure of model error for regression?
哪个是回归评估指标?
A. RMSE
B. Accuracy
C. Recall
D. Precision
Answer: A
Q29 (Single)
Which model uses “distance voting”?
哪个模型使用“距离投票”?
A. KNN
B. Logistic Regression
C. PCA
D. Decision Tree
Answer: A
Q30 (Single)
Which algorithm uses centroids?
哪个算法使用质心(centroid)?
A. KMeans
B. Logistic Regression
C. KNN
D. Naive Bayes
Answer: A
Q31 (Single)
Which model requires no training phase?
哪个模型没有训练阶段?
A. KNN
B. Logistic Regression
C. Random Forest
D. SVM
Answer: A
Explanation:
KNN is a lazy learner: all computation happens at prediction time.
KNN 没有训练,预测时才计算。
Q32 (Single)
Which term describes “model is too simple”?
描述“模型太简单”的术语是?
A. Underfitting
B. Overfitting
C. Regularization
D. Scaling
Answer: A
Q33 (Single)
What is a decision boundary?
什么是决策边界?
A. The line that separates classes
B. A cluster center
C. Feature mean
D. Loss minimum
Answer: A
Explanation:
Decision boundary divides classes in feature space.
决策边界是在特征空间划分类别的线/面。
Q34 (Single)
Which technique is used to reduce noise in data?
哪个方法用于减少数据噪声?
A. Smoothing
B. Overfitting
C. PCA increase
D. More labels
Answer: A
Q35 (Single)
Which algorithm assigns each sample to the nearest centroid?
哪个算法把样本分配给最近的质心?
A. KMeans
B. Random Forest
C. Logistic Regression
D. KNN
Answer: A
Q36 (Single)
Which measure is used in linear regression?
线性回归通常优化哪个?
A. Mean Squared Error
B. Entropy
C. Gini
D. Silhouette
Answer: A
Q37 (Single)
Which model is prone to overfitting if not pruned?
哪个模型如果不剪枝容易过拟合?
A. Decision Tree
B. Logistic Regression
C. Naive Bayes
D. PCA
Answer: A
Q38 (Single)
Which is NOT a preprocessing step?
以下哪个不是数据预处理?
A. Feature scaling
B. One-hot encoding
C. Measuring accuracy
D. Handling missing values
Answer: C
Q39 (Multiple)
Which require feature scaling?
哪些模型需要特征缩放?(多选)
A. KNN
B. KMeans
C. Logistic Regression
D. Decision Tree
Answer: A, B, C
Q40 (Single)
Which is the purpose of PCA?
PCA 的目的是什么?
A. Reduce dimensionality
B. Increase number of samples
C. Create clusters
D. Remove labels
Answer: A
Q41 (Single)
Which model uses linear decision boundaries?
哪个模型使用线性决策边界?
A. Logistic Regression
B. KNN
C. Random Forest
D. KMeans
Answer: A
Q42 (Single)
Which statement about recall is correct?
关于召回率,下列哪个正确?
A. High recall means few false negatives
B. High recall means few false positives
C. Recall = TN/(TN+FP)
D. Recall ignores positives
Answer: A
Q43 (Single)
Which is a type of unsupervised learning?
哪一个是无监督学习?
A. Clustering
B. Regression
C. Classification
D. Logistic Regression
Answer: A
Q44 (Multiple)
Which algorithms can solve classification?
哪些算法能用于分类?(多选)
A. Logistic Regression
B. Decision Tree
C. KNN
D. PCA
Answer: A, B, C
Q45 (Single)
Which describes the bias–variance tradeoff?
偏差—方差权衡描述的是?
A. Simple models → high bias
B. Simple models → high variance
C. Complex models → low variance always
D. No relationship exists
Answer: A
Q46 (Single)
Which algorithm is based on probability?
哪个模型基于概率?
A. Naive Bayes
B. Decision Tree
C. KMeans
D. PCA
Answer: A
Q47 (Single)
Which model uses “if-else rule paths”?
哪个模型使用“if…else”规则路径?
A. Decision Tree
B. Logistic Regression
C. PCA
D. KNN
Answer: A
Q48 (Single)
Which algorithm predicts by finding the most similar samples?
哪个算法通过寻找最相似样本进行预测?
A. KNN
B. Logistic Regression
C. PCA
D. Naive Bayes
Answer: A
Q49 (Multiple)
Which can cause overfitting?
以下哪些会导致过拟合?(多选)
A. Too complex model
B. Too few samples
C. No regularization
D. Early stopping
Answer: A, B, C
Q50 (Single)
Which aims to evaluate model performance on unseen data?
哪个用于评估模型在未见数据上的表现?
A. Test set
B. Train set
C. Validation only
D. Features only
Answer: A
✅ 预测考试题库(Set 1)(单选 + 多选,中英双语,含答案与解析)
Q1.(单选)Supervised vs Unsupervised
Which of the following best describes supervised learning?
以下哪项最能描述监督学习?
A. Learning from labeled data
B. Discovering patterns without labels
C. Searching for maximum rewards
D. Automatically clustering similar items
✔ 答案:A
解释:
监督学习必须有标签 Y,可以进行分类或回归。
Q2.(单选)Classification vs Regression
Which task is typically solved using regression rather than classification?
下列哪个任务通常属于回归而不是分类?
A. Predicting whether an email is spam
B. Predicting customer churn (yes/no)
C. Predicting house prices
D. Fraud detection
✔ 答案:C
解释:
预测连续值(如房价) → 回归;其余都是二分类问题。
Q3.(单选)K-Means
What type of machine learning algorithm is K-Means?
K-Means 属于哪种机器学习算法?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Semi-supervised learning
✔ 答案:B
解释:
K-Means 是典型的非监督学习聚类算法,不需要标签。
Q4.(单选)Label Cost
Labeling data is labor-intensive. Which approach helps when only a small portion of data is labeled?
标注数据成本非常高,当只有少部分数据有标签时,下面哪种方法最适用?
A. Clustering
B. Semi-Supervised Learning
C. Reinforcement Learning
D. PCA
✔ 答案:B
解释:
半监督学习结合少量标签 + 大量无标签数据,是解决标签稀缺最常用方法。
Q5.(单选)Underfitting
Which of the following indicates underfitting?
下列哪项表明出现了欠拟合?
A. Low train accuracy, high test accuracy
B. High train accuracy, low test accuracy
C. Low train accuracy, low test accuracy
D. High train accuracy, high test accuracy
✔ 答案:C
解释:
欠拟合:训练和测试都表现差,模型太简单。
Q6.(单选)Overfitting
Which scenario best describes overfitting?
以下哪种情况最符合过拟合?
A. Model performs poorly on both train & test
B. Model performs very well on train but poorly on test
C. Model performs poorly on train but well on test
D. Model has high bias but low variance
✔ 答案:B
解释:
过拟合:训练集表现好,测试集表现差 → 模型记住了训练数据。
Q7.(多选)How to reduce overfitting?
Which methods help reduce overfitting? (Multiple answers)
以下哪些方法可以降低过拟合?(多选)
A. Increasing model complexity
B. Adding regularization (L1/L2)
C. Using cross-validation
D. Early stopping
E. Training longer even when validation loss goes up
✔ 答案:B, C, D
解释:
- 正则化、交叉验证、提前停止均可减少过拟合
- A 会导致更严重过拟合
- E 是典型错误(会让过拟合更严重)
Q8.(单选)Bias–Variance Tradeoff
High bias is typically associated with…
高偏差(Bias)通常意味着:
A. Model is too complex
B. Model is too simple
C. Model overfits
D. Model has high variance
✔ 答案:B
解释:
高偏差 → 模型太简单 → 欠拟合。
Q9.(多选)Metrics for imbalanced data
Which metrics are ideal for evaluating highly imbalanced datasets (e.g., fraud detection)?
哪些评价指标适合评估高度不平衡的数据(如欺诈检测)?(多选)
A. Accuracy
B. Recall
C. Precision
D. AUC
E. Mean Squared Error (MSE)
✔ 答案:B, C, D
解释:
- 不平衡数据中 Accuracy 会误导
- Recall 关键(避免漏掉欺诈)
- Precision 和 AUC 也重要
- MSE 是回归评价指标
Q10.(单选)Cross-Validation
What is the purpose of k-fold cross-validation?
k 折交叉验证的主要目的是什么?
A. Increase training speed
B. Use more complex models
C. Obtain more stable and reliable evaluation
D. Reduce data size
✔ 答案:C
解释:
CV 用于得到稳定的模型表现,减少随机划分带来的偏差。
Q11.(单选)Confusion Matrix
Which value in a confusion matrix represents “false negatives”?
混淆矩阵中哪个代表“假负类”(漏报)?
A. TP
B. FP
C. FN
D. TN
✔ 答案:C
解释:
FN = 模型预测 0,但真实是 1 → 最危险(例如漏掉欺诈或癌症)。
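A minimal sketch (scikit-learn; toy labels) of locating FN in the confusion matrix: rows are true labels, columns are predictions, so FN sits at row "actual 1", column "predicted 0". 极简示意(scikit-learn,标签为虚构):混淆矩阵中行为真实标签、列为预测,FN 位于"真实 1、预测 0"处。

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]   # two positives missed, one false alarm
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)          # 2 1 2 1
```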
Q12.(多选)Logistic Regression
Which statements about Logistic Regression are correct?
以下关于逻辑回归的说法哪些是正确的?(多选)
A. It outputs probabilities
B. It uses the sigmoid function
C. It is used for binary classification
D. It is used for clustering
E. It is linear in parameters
✔ 答案:A, B, C, E
解释:
逻辑回归是线性模型,通过 Sigmoid 输出概率,用于二分类。
Q13.(单选)PCA
What is PCA mainly used for?
PCA 主要用于?
A. Classification
B. Dimensionality reduction
C. Clustering
D. Increasing variance in data
✔ 答案:B
解释:
PCA 是降维方法,可用于去噪、压缩特征维度,以及降维后进行可视化。
Q14.(单选)AUC
What does a model with AUC = 0.5 indicate?
AUC = 0.5 的模型代表?
A. Perfect model
B. Good model
C. Random guessing
D. Always predicts positive
✔ 答案:C
Q15.(多选)Gradient Descent
Which statements about gradient descent are correct?
以下关于梯度下降的说法哪些正确?(多选)
A. It minimizes a loss function
B. Learning rate controls step size
C. Too large learning rate may cause divergence
D. Too small learning rate makes convergence slow
E. It guarantees reaching the global minimum
✔ 答案:A, B, C, D
解释:
梯度下降无法保证全局最优(非凸函数)。
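A minimal sketch (plain Python; the quadratic loss and values are illustrative, not from the course) showing how the learning rate affects gradient descent on f(w) = (w − 3)². 极简示意(纯 Python,函数与数值为举例):学习率过大时梯度下降发散,合适时收敛。

```python
def gradient_descent(lr, steps=50, w=0.0):
    """Run plain gradient descent on f(w) = (w - 3)^2."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w = w - lr * grad    # update scaled by the learning rate
    return w

print(gradient_descent(lr=0.1))   # converges near the minimum w = 3
print(gradient_descent(lr=1.5))   # overshoots every step: |w| blows up
```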
✅ 完整考卷(50 题,单选 + 多选,中英双语,含答案与详细解释)
(本套题基于 Quiz Samples.md、Chapter1 PDF 与课堂录播文字整理。)
说明:标注为 (Multiple) 的题为多选题(可有多个正确选项);其余为单选题。
每题先是英文题目与选项,紧接中文翻译;答案后给出简短清晰的解释(中英对照)。
Q1 (Single)
Which of the following best describes supervised learning?
A. The model learns from labeled data
B. The model finds hidden patterns without labels
C. The model interacts with an environment to maximize rewards
D. The model clusters similar data points together
哪项最能描述监督学习?
A. 模型从有标签数据中学习
B. 模型在无标签数据中发现隐藏模式
C. 模型与环境交互以最大化回报
D. 模型将相似数据点聚类
Answer / 答案: A
Explanation / 解释:
Supervised learning requires labeled targets (Y) and learns mapping X → Y (classification/regression).
监督学习依赖标签 Y,通过样本学习 X→Y 的映射(分类或回归)。
Q2 (Single)
Which of the following is NOT a supervised learning task?
A. Classification
B. Regression
C. Clustering
D. Decision Trees
下列哪项不是监督学习任务?
A. 分类
B. 回归
C. 聚类
D. 决策树
Answer / 答案: C
Explanation / 解释:
Clustering is unsupervised (no labels). Decision trees are a supervised algorithm when trained with labels.
聚类属于无监督学习(无标签);决策树在有标签时是监督算法。
Q3 (Single)
K-Means algorithm belongs to which learning type?
A. Supervised
B. Unsupervised
C. Reinforcement
D. Semi-supervised
K-Means 属于哪类学习?
A. 监督学习
B. 非监督学习
C. 强化学习
D. 半监督学习
Answer / 答案: B
Explanation / 解释:
K-Means clusters unlabeled data into groups — classic unsupervised method.
K-Means 对无标签数据做聚类,是典型的非监督方法。
Q4 (Single)
Which metric is most misleading on highly imbalanced classification data?
A. Precision
B. Recall
C. Accuracy
D. AUC
在高度不平衡分类任务中,哪个指标最容易误导?
A. 精确率
B. 召回率
C. 准确率
D. AUC
Answer / 答案: C
Explanation / 解释:
Accuracy can be high by predicting the majority class always; it hides poor minority-class detection.
准确率在不平衡数据上可能很高(总预测为多数类),掩盖了对少数类的遗漏。
Q5 (Single)
Which loss is commonly used for regression tasks?
A. Cross-entropy
B. Mean Squared Error (MSE)
C. Hinge loss
D. Log loss
回归任务常用哪种损失?
A. 交叉熵
B. 均方误差(MSE)
C. Hinge 损失
D. 对数损失
Answer / 答案: B
Explanation / 解释:
MSE measures squared difference between predictions and targets — standard for regression.
MSE 用于衡量预测与真实值的平方误差,是回归常用损失。
Q6 (Single)
Which activation/function compresses linear output into [0,1] for binary classification?
A. ReLU
B. Sigmoid
C. Softmax
D. Tanh
哪种函数会把线性输出压缩到 [0,1] 以用于二分类?
A. ReLU
B. Sigmoid
C. Softmax
D. Tanh
Answer / 答案: B
Explanation / 解释:
Sigmoid maps real numbers to (0,1), interpretable as probability for binary classification.
Sigmoid 将实数映射到 (0,1),便于作为二分类概率解释。
Q7 (Multiple)
Which methods help reduce overfitting? (Multiple)
A. Regularization (L1/L2)
B. Cross-validation
C. Early stopping
D. Increasing model complexity
E. Training until validation loss increases
以下哪些方法可降低过拟合?(多选)
A. 正则化(L1/L2)
B. 交叉验证
C. 提前停止
D. 增加模型复杂度
E. 在验证损失上升时继续训练
Answer / 答案: A, B, C
Explanation / 解释:
Regularization, CV and early stopping reduce overfitting. Increasing complexity and training past validation deterioration worsen overfitting.
正则化、交叉验证、提前停止能抑制过拟合;增加复杂度或在验证损失开始上升时继续训练会加剧过拟合。
Q8 (Single)
What is underfitting generally associated with?
A. High variance
B. High bias
C. Perfect training accuracy
D. Large model capacity
欠拟合通常与以下哪项相关?
A. 高方差
B. 高偏差
C. 训练准确率完美
D. 大模型容量
Answer / 答案: B
Explanation / 解释:
Underfitting = high bias (model too simple to capture patterns).
欠拟合即高偏差:模型过于简单,无法拟合数据规律。
Q9 (Single)
If a trained model has very high train accuracy but low test accuracy, this indicates:
A. Underfitting
B. Overfitting
C. Good generalization
D. Data leakage
训练集准确率高但测试集准确率低表示:
A. 欠拟合
B. 过拟合
C. 良好泛化
D. 数据泄漏
Answer / 答案: B
Explanation / 解释:
Model memorized training data (low generalization) — classic overfitting.
模型记住训练数据但不能泛化,属于过拟合。
Q10 (Single)
Which metric is threshold-independent and useful for imbalanced data?
A. Precision
B. Recall
C. F1-score
D. AUC (ROC area)
哪个指标与阈值无关,适合不平衡数据?
A. 精确率
B. 召回率
C. F1 分数
D. AUC(ROC 曲线下面积)
Answer / 答案: D
Explanation / 解释:
AUC measures ranking ability across thresholds, robust for imbalanced sets.
AUC 在不同阈值下评估模型排序能力,对不平衡数据更稳定。
Q11 (Single)
Which of the following is a density-based clustering algorithm?
A. K-Means
B. DBSCAN
C. Hierarchical Agglomerative
D. Gaussian Mixture Model
以下哪项是基于密度的聚类算法?
A. K-Means
B. DBSCAN
C. 层次聚合聚类
D. 高斯混合模型
Answer / 答案: B
Explanation / 解释:
DBSCAN defines clusters by dense regions and can detect arbitrary-shaped clusters and noise.
DBSCAN 基于点密度,能发现任意形状簇并处理噪声。
Q12 (Multiple)
Which are common ways to handle class imbalance? (Multiple)
A. Assign higher weights to minority class
B. Create synthetic minority samples (e.g., SMOTE)
C. Undersample majority class
D. Only use accuracy as metric
处理类别不平衡的常见方法有哪些?(多选)
A. 给少数类更高权重
B. 生成少数类合成样本(如 SMOTE)
C. 对多数类欠采样
D. 只用准确率作为度量
Answer / 答案: A, B, C
Explanation / 解释:
Weighting, synthetic sampling, and undersampling are standard. Accuracy-only is inappropriate for imbalance.
加权、合成样本、欠采样是常用办法;只看准确率在不平衡情形不可取。
Q13 (Single)
Which algorithm is most suitable for high-dimensional text features (sparse TF-IDF) in spam detection?
A. Linear models (Logistic/Linear SVM)
B. K-Means
C. Decision Trees (without pruning)
D. DBSCAN
在高维稀疏文本(TF-IDF)用于垃圾邮件时,最适合的算法是?
A. 线性模型(逻辑回归/线性 SVM)
B. K-Means
C. 决策树(不剪枝)
D. DBSCAN
Answer / 答案: A
Explanation / 解释:
Linear models scale well to high-dimensional sparse features and often perform strongly in text classification.
线性模型对高维稀疏向量(如 TF-IDF)高效且表现好。
Q14 (Single)
What is the primary objective when training a regression model?
A. Maximize number of classes
B. Minimize prediction error (e.g., MSE)
C. Maximize precision
D. Maximize recall
训练回归模型的主要目标是什么?
A. 增加类别数
B. 最小化预测误差(例如 MSE)
C. 最大化精确率
D. 最大化召回率
Answer / 答案: B
Explanation / 解释:
Regression minimizes prediction error such as mean squared error to best fit continuous targets.
回归目标是最小化预测误差(如 MSE),拟合连续目标值。
Q15 (Single)
What does the sigmoid function output represent in logistic regression?
A. Class label directly (0 or 1)
B. A probability between 0 and 1
C. Distance to nearest centroid
D. Error term
逻辑回归中 sigmoid 函数输出表示什么?
A. 直接类别标签(0 或 1)
B. 介于 0 和 1 的概率
C. 到最近质心的距离
D. 误差项
Answer / 答案: B
Explanation / 解释:
Sigmoid maps linear score to probability; threshold (e.g., 0.5) converts to label.
Sigmoid 将线性分数转为概率,再用阈值(如 0.5)映射为标签。
Q16 (Multiple)
Which statements about cross-validation are true? (Multiple)
A. It uses different train/validation splits to get stable estimates
B. k-fold CV always increases overfitting
C. Stratified k-fold preserves class proportions per fold
D. CV replaces need for a final test set
关于交叉验证,哪些说法正确?(多选)
A. 它使用不同训练/验证划分以获得稳定估计
B. k 折 CV 总是增加过拟合
C. 分层 k 折会在每折中保留类别比例
D. CV 可以替代最终测试集的需求
Answer / 答案: A, C
Explanation / 解释:
CV gives robust performance estimates; stratified CV keeps class ratios. CV does not inherently increase overfitting and does not replace a final hold-out test set for unbiased evaluation.
CV 提供稳定评估;分层 CV 保持类别比例。CV 不能替代最终独立测试集。
Q17 (Single)
What is the role of regularization (L2) in linear models?
A. Increase weights magnitude
B. Penalize large weights to reduce overfitting
C. Convert classification into regression
D. Add more features
正则化(L2)在线性模型中的作用是?
A. 增大权重幅度
B. 惩罚大权重以减少过拟合
C. 将分类转换为回归
D. 增加更多特征
Answer / 答案: B
Explanation / 解释:
L2 penalizes large coefficients, shrinking them and thus reducing model complexity/variance.
L2 惩罚大权重,使权重收缩,降低复杂度和方差,抑制过拟合。
Q18 (Single)
Which of these is NOT a dimensionality reduction technique?
A. PCA
B. t-SNE
C. One-hot encoding
D. Randomized PCA
以下哪个不是降维技术?
A. PCA
B. t-SNE
C. 独热编码(One-hot)
D. 随机化 PCA
Answer / 答案: C
Explanation / 解释:
One-hot encoding increases dimensionality (sparse indicator vectors); it does not reduce it. PCA and t-SNE reduce dimensions.
独热编码通常将类别扩展为高维稀疏向量,不是降维方法。
Q19 (Multiple)
Which metrics combine precision and recall? (Multiple)
A. F1-score
B. Accuracy
C. ROC AUC
D. Fβ-score
哪些指标结合了精确率与召回率?(多选)
A. F1 分数
B. 准确率
C. ROC AUC
D. Fβ 分数
Answer / 答案: A, D
Explanation / 解释:
F1 is harmonic mean of precision & recall; Fβ generalizes weighting recall more. Accuracy and AUC are different concepts.
F1 和 Fβ 都基于精确率与召回率的调和/加权平均;Accuracy/AUC 不是直接组合这两者。
Q20 (Single)
Which situation suggests you should collect more data according to the lecture flowchart?
A. Dataset size < 50 samples
B. Dataset perfectly balanced
C. Using deep learning with millions of samples
D. Model is linear and performs well
根据课堂流程图,什么时候应该收集更多数据?
A. 数据集 < 50 个样本
B. 数据集完全平衡
C. 在用深度学习并已有数百万样本
D. 模型为线性且表现良好
Answer / 答案: A
Explanation / 解释:
If sample size is very small (<~50), collect more data before modeling. (Lecture guidance).
课堂建议:样本过少(如 <50)应先收更多数据再建模。
Q21 (Single)
What does an AUC close to 1.0 indicate?
A. Random classifier
B. Perfect ranking capability
C. Always predicts negative
D. Model overfits
AUC 近 1.0 表示什么?
A. 随机分类器
B. 完美的排序能力
C. 总是预测负类
D. 模型过拟合
Answer / 答案: B
Explanation / 解释:
AUC ~1.0 indicates model distinguishes positive/negative well across thresholds (excellent).
AUC 接近 1 表示模型在不同阈值下具有很强的区分能力。
Q22 (Single)
Which is true about Decision Trees?
A. They always generalize well without tuning
B. They can overfit if depth is uncontrolled
C. They require data to be linear
D. They cannot handle categorical variables
关于决策树,哪个说法正确?
A. 不需调参总能很好泛化
B. 如果深度不受控会过拟合
C. 要求数据线性
D. 不能处理类别型变量
Answer / 答案: B
Explanation / 解释:
Deep/untuned trees overfit; trees can handle nonlinearity and categorical features.
深树容易过拟合;决策树能捕捉非线性并处理类别变量(通常通过分裂)。
Q23 (Multiple)
Which of the following are examples of supervised learning tasks? (Multiple)
A. House price prediction
B. Customer segmentation by K-Means
C. Spam detection
D. PCA for visualization
下列哪些是监督学习任务?(多选)
A. 房价预测
B. 用 K-Means 做客户分群
C. 垃圾邮件检测
D. PCA 可视化
Answer / 答案: A, C
Explanation / 解释:
House price (regression) and spam detection (classification) are supervised. K-Means and PCA are unsupervised.
房价预测与垃圾邮件检测有标签属于监督;K-Means、PCA 属于无监督。
Q24 (Single)
Which technique is often used to synthesize new minority-class samples?
A. PCA
B. SMOTE
C. KNN (k=1)
D. Standard scaling
常用来合成少数类样本的技术是?
A. PCA
B. SMOTE
C. KNN (k=1)
D. 标准化
Answer / 答案: B
Explanation / 解释:
SMOTE creates synthetic minority samples by interpolating between neighbors.
SMOTE 通过邻居插值生成少数类合成样本,常用于不平衡问题处理。
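A minimal sketch (assuming the imbalanced-learn package is installed; synthetic data) of SMOTE oversampling the minority class. 极简示意(假设已安装 imbalanced-learn,数据为合成):SMOTE 对少数类过采样。

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build a heavily imbalanced toy dataset, then rebalance it with SMOTE.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
print(Counter(y))                                   # imbalanced class counts
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_res))                               # minority class synthesized up
```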
Q25 (Single)
Which of the following best describes PCA?
A. Supervised classification algorithm
B. Dimensionality reduction via orthogonal projection
C. Density-based clustering
D. Method to increase feature count
PCA 最佳描述是?
A. 监督分类算法
B. 通过正交投影做降维
C. 基于密度的聚类
D. 增加特征数量的方法
Answer / 答案: B
Explanation / 解释:
PCA finds orthogonal components that capture maximum variance, reducing dimensions.
PCA 找到正交主成分以保留方差并降低维度。
Q26 (Single)
Which is a typical first step in supervised learning pipeline?
A. Model deployment
B. Data cleaning and EDA
C. Feature scaling after deployment
D. Final report writing
监督学习流程的典型第一步是?
A. 模型部署
B. 数据清洗与探索性数据分析(EDA)
C. 部署后做特征缩放
D. 撰写最终报告
Answer / 答案: B
Explanation / 解释:
Begin with data cleaning and exploratory analysis before feature engineering and modeling.
先做数据清洗与 EDA,再进行特征工程与建模。
Q27 (Multiple)
Which are reasons to use unsupervised learning? (Multiple)
A. No labels available
B. Discover hidden groupings
C. Direct prediction of future sales price
D. Dimensionality reduction for visualization
使用无监督学习的原因有哪些?(多选)
A. 没有标签
B. 发现隐藏群组
C. 直接预测未来销售价格
D. 用于降维可视化
Answer / 答案: A, B, D
Explanation / 解释:
Unsupervised is for unlabeled data: discover clusters, reduce dimensions; it doesn't directly predict future targets.
无监督用于无标签情况、分群与降维可视化,但不直接用于预测目标变量。
Q28 (Single)
Which of these is a sign of data leakage?
A. Validation performance much better than expected because validation used information derived from the test set
B. Using early stopping on validation set
C. Standardizing features using training statistics only
D. Using cross-validation properly
下列哪项表明存在数据泄漏?
A. 验证表现异常好,因为验证使用了来自测试集的信息
B. 在验证集上使用提前停止
C. 仅用训练集统计量做标准化
D. 正确使用交叉验证
Answer / 答案: A
Explanation / 解释:
Leakage occurs when information from test/validation leaks into training, inflating performance.
当验证/测试信息流入训练(或在选择特征时使用未来信息)会导致性能虚高,属于数据泄漏。
Q29 (Single)
What is early stopping used for?
A. Make training slower
B. Stop training when validation loss stops improving to avoid overfitting
C. Increase model capacity
D. Replace cross-validation
提前停止(early stopping)用于什么?
A. 让训练变慢
B. 当验证损失不再改善时停止训练以防过拟合
C. 增加模型容量
D. 替代交叉验证
Answer / 答案: B
Explanation / 解释:
Stop training when validation performance deteriorates to prevent overfitting.
当验证集 Loss 停止改善或上升时停止训练,可防止继续学习噪声导致过拟合。
Q30 (Single)
Which of the following best explains "bias" in ML context?
A. Variability of model predictions across datasets
B. Error from simplifying assumptions in the model
C. The model's runtime complexity
D. The size of training data
在机器学习中,“偏差(bias)”最好的解释是?
A. 模型在不同数据集上预测的波动性
B. 模型简化假设造成的误差
C. 模型运行时间复杂度
D. 训练数据大小
Answer / 答案: B
Explanation / 解释:
Bias is error due to model assumptions (too simple to capture true relationships).
偏差来源于模型的简化假设(模型太简单),导致系统性误差。
Q31 (Multiple)
Which algorithms are tree-based ensemble methods? (Multiple)
A. Random Forest
B. XGBoost
C. K-Means
D. AdaBoost
哪些算法是基于树的集成方法?(多选)
A. 随机森林
B. XGBoost
C. K-Means
D. AdaBoost
Answer / 答案: A, B, D
Explanation / 解释:
Random Forest, XGBoost, AdaBoost are tree-based ensembles; K-Means is clustering.
随机森林、XGBoost、AdaBoost 为基于树的集成方法;K-Means 用于聚类。
Q32 (Single)
Which evaluation shows how many positive samples were correctly identified?
A. Precision
B. Recall (TPR)
C. FPR
D. Accuracy
哪个评价显示被正确识别的正样本占真实正样本的比例?
A. 精确率
B. 召回率(真正率)
C. 假阳性率
D. 准确率
Answer / 答案: B
Explanation / 解释:
Recall = TP / (TP + FN): fraction of actual positives correctly found.
召回率衡量真实为正中被正确识别的比例(TP/(TP+FN))。
Q33 (Single)
Which approach can be used to create features from text?
A. TF-IDF
B. Equal-length one-hot encoding of a large vocabulary
C. Scaling continuous features only
D. K-Means on raw text
下列哪种方法可用于从文本创建特征?
A. TF-IDF
B. 对大词汇表进行等长独热编码
C. 仅缩放连续特征
D. 在原始文本上直接做 K-Means
Answer / 答案: A
Explanation / 解释:
TF-IDF creates numeric features representing term importance — standard for text classification.
TF-IDF 将文本转为数值特征,反映词频与逆文档频率,常用于文本分类。
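A minimal sketch (scikit-learn; the tiny corpus is made up) of turning raw text into TF-IDF features. 极简示意(scikit-learn,语料为虚构):将原始文本转为 TF-IDF 特征。

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["free money now", "meeting at noon", "win free prize now"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)        # sparse matrix: documents x vocabulary
print(X.shape)
print(vectorizer.get_feature_names_out())   # learned vocabulary terms
```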
Q34 (Single)
Which of the following is NOT true about DBSCAN?
A. It can find clusters of arbitrary shape
B. It requires the number of clusters k as input
C. It labels noise points as outliers
D. It uses density parameters (eps, min_samples)
关于 DBSCAN,哪项不正确?
A. 能发现任意形状的簇
B. 需要输入簇数 k
C. 会把噪声点标为异常
D. 使用密度参数(eps, min_samples)
Answer / 答案: B
Explanation / 解释:
DBSCAN does not require k; it uses density thresholds instead.
DBSCAN 不需指定簇数 k,而是依据密度参数划分簇并识别噪声。
Q35 (Single)
Which step should be done before applying many ML algorithms when features have different scales?
A. Feature scaling (standardization/normalization)
B. Increase learning rate
C. Use deeper neural network
D. One-hot encode continuous variables
在特征尺度不同时,许多算法前应先做什么?
A. 特征缩放(标准化/归一化)
B. 增大学习率
C. 使用更深的神经网络
D. 对连续变量做独热编码
Answer / 答案: A
Explanation / 解释:
Scaling ensures features contribute comparably (important for distance-based and gradient methods).
特征缩放让不同量纲的特征在模型训练中具有可比贡献,尤其对距离或梯度算法很重要。
Q36 (Multiple)
Which of the following are true about logistic regression? (Multiple)
A. Outputs probabilities via sigmoid
B. Can be regularized with L1 or L2
C. Only suitable for regression tasks
D. Decision boundary is linear in input space
关于逻辑回归,哪些说法正确?(多选)
A. 通过 sigmoid 输出概率
B. 可用 L1 或 L2 正则化
C. 仅适合回归任务
D. 在输入空间中决策边界是线性的
Answer / 答案: A, B, D
Explanation / 解释:
Logistic outputs probabilities, supports regularization, and has linear decision boundary; it's for classification (not regression).
逻辑回归输出概率,可正则化,其决策边界在原始特征空间通常是线性的,并用于分类任务。
Q37 (Single)
Which of the following is a sign of a good clustering result when using K-Means?
A. High within-cluster variance
B. Low inertia (sum of squared distances to centroids)
C. Very large number of clusters always better
D. Clusters contain single unique sample each
使用 K-Means 时,哪个现象表明聚类结果良好?
A. 组内方差高
B. 惯性(到质心平方和)低
C. 簇数越多越好
D. 每簇只有一个样本
Answer / 答案: B
Explanation / 解释:
Lower inertia indicates tighter clusters; very many clusters or single-sample clusters are not necessarily desirable.
较低的惯性意味着簇内更紧凑;过多簇或每簇仅一样本通常不理想。
Q38 (Single)
What does "pseudo-labeling" refer to?
A. Using generated labels on unlabeled data via model predictions to augment training
B. Removing labels from dataset
C. Manually relabeling all data
D. Using PCA to create labels
“伪标签”指的是什么?
A. 用模型预测的标签给无标签数据打标以扩充训练集
B. 从数据集中移除标签
C. 手动重新标注所有数据
D. 用 PCA 创建标签
Answer / 答案: A
Explanation / 解释:
Pseudo-labeling uses model predictions (with high confidence) as labels for unlabeled data in semi-supervised workflows.
伪标签是在半监督中用模型对无标签数据的高置信预测作为标签来扩充训练数据。
Q39 (Single)
Which of the following best describes "feature engineering"?
A. Automatically training models without human input
B. Creating or transforming features to improve model performance
C. Deploying model to production
D. Evaluating model with test data
哪项最能描述特征工程?
A. 无需人工自动训练模型
B. 创建或转换特征以提升模型性能
C. 将模型部署到生产环境
D. 用测试数据评估模型
Answer / 答案: B
Explanation / 解释:
Feature engineering is human-guided creation/transformation of inputs (e.g., bins, interactions, aggregates) to help models learn.
特征工程是人工设计/转换输入特征(如分箱、交互、聚合),以提升模型表现。
Q40 (Multiple)
Which of the following are true for ROC curve? (Multiple)
A. Plots TPR vs FPR at various thresholds
B. Area under ROC (AUC) = 0.5 indicates random classifier
C. ROC depends on specific classification threshold only
D. ROC is useful for imbalanced datasets
以下关于 ROC 曲线哪些正确?(多选)
A. 在不同阈值下绘制 TPR 对 FPR 曲线
B. AUC = 0.5 表示随机分类器
C. ROC 只与某个特定阈值有关
D. ROC 对不平衡数据有用
Answer / 答案: A, B, D
Explanation / 解释:
ROC summarizes TPR/FPR across thresholds; AUC=0.5 is random; ROC/AUC are useful for imbalanced data since threshold-independent.
ROC 在不同阈值下评估模型;AUC=0.5 表明随机性能;ROC/AUC 在不平衡数据场景常被采用。
Q41 (Single)
Which is a downside of very deep decision trees?
A. Better interpretability
B. Low computational cost on training
C. Prone to overfitting
D. Always better test performance
非常深的决策树的缺点是什么?
A. 更易解释
B. 训练计算成本低
C. 易过拟合
D. 总是测试表现更好
Answer / 答案: C
Explanation / 解释:
Deep trees fit noise and tend to overfit unless pruned or regularized.
深树容易拟合噪声,若不剪枝或正则化会导致过拟合。
Q42 (Single)
What does "inertia" measure in K-Means?
A. Number of clusters
B. Sum of squared distances of samples to their nearest cluster center
C. Ratio of between-cluster variance to within-cluster variance
D. Density of a cluster
K-Means 中的“惯性(inertia)”测量什么?
A. 簇数
B. 样本到最近质心的平方距离和
C. 簇间方差与簇内方差比
D. 簇的密度
Answer / 答案: B
Explanation / 解释:
Inertia = sum of squared distances to centroid; lower is tighter clustering.
惯性是样本到其簇质心平方距离之和,越小表示簇更紧凑。
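A minimal sketch (scikit-learn; blob data for illustration) showing that KMeans exposes inertia as the sum of squared distances to the nearest centroid. 极简示意(scikit-learn,数据仅作演示):KMeans 的 inertia_ 即样本到最近质心的平方距离和。

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.inertia_)      # lower inertia = tighter clusters
```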
Q43 (Single)
Which of these is true about SMOTE?
A. It undersamples majority class
B. It oversamples minority class by creating synthetic samples
C. It always fixes class imbalance perfectly without side effects
D. It is a clustering algorithm
关于 SMOTE 哪项正确?
A. 它对多数类欠采样
B. 通过生成合成样本对少数类过采样
C. 总能完美无副作用地解决不平衡
D. 它是聚类算法
Answer / 答案: B
Explanation / 解释:
SMOTE synthesizes new minority examples; it can help but may introduce noise or overlapped classes.
SMOTE 生成少数类合成样本,有时改善不平衡,但可能带来噪声或类重叠问题。
Q44 (Single)
Which is a typical reason to use stratified k-fold CV?
A. Ensure equal fold sizes only
B. Maintain class distribution in each fold for classification tasks
C. Speed up training dramatically
D. Avoid feature scaling
使用分层 k 折 CV 的典型原因是什么?
A. 仅确保折大小相等
B. 在分类任务中保持每折的类别分布
C. 显著加快训练
D. 避免特征缩放
Answer / 答案: B
Explanation / 解释:
Stratified CV keeps class proportions in folds, important for imbalanced classification.
分层抽样确保各折中类别比例一致,对不平衡分类尤其重要。
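A minimal sketch (scikit-learn; toy labels) showing that stratified k-fold keeps the class ratio in every fold. 极简示意(scikit-learn,标签为虚构):分层 k 折在每一折中保持类别比例。

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)             # 80% / 20% class ratio
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))          # each fold keeps the 4:1 ratio
```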
Q45 (Multiple)
Which items are part of a standard ML pipeline? (Multiple)
A. Data cleaning & EDA
B. Feature engineering & scaling
C. Model selection & hyperparameter tuning
D. Ignoring validation and testing
标准机器学习流水线包含哪些环节?(多选)
A. 数据清洗与探索性分析
B. 特征工程与缩放
C. 模型选择与超参调优
D. 忽略验证与测试
Answer / 答案: A, B, C
Explanation / 解释:
A robust pipeline includes cleaning, feature prep, model training and tuning; ignoring validation/test is incorrect.
完整流程包含数据清洗、特征工程、模型选择与调参;不能忽略验证或测试步骤。
Q46 (Single)
Which dataset size suggestion was mentioned in lecture as a rough guideline?
A. Less than 50 is fine for deep learning
B. 1000+ samples are commonly recommended for practical ML projects
C. Always need millions of samples for simple regression
D. Dataset size never matters
课堂中提及的粗略数据量建议是什么?
A. 少于 50 适合深度学习
B. 实践项目常建议 1000+ 样本
C. 简单回归总需数百万样本
D. 数据量无关紧要
Answer / 答案: B
Explanation / 解释:
Lecture suggested typical datasets of a few hundred to thousands+; avoid extremely small (<50) sets.
课堂建议数据量通常在数百到数万范围,过少(<50)需先收集更多数据。
Q47 (Single)
Which of the following is commonly used to evaluate regression models?
A. MSE / RMSE
B. Precision
C. ROC AUC
D. Confusion Matrix
下面哪个常用于评估回归模型?
A. MSE / RMSE
B. 精确率
C. ROC AUC
D. 混淆矩阵
Answer / 答案: A
Explanation / 解释:
MSE/RMSE measure average squared error (or its root) — standard regression metrics.
MSE/RMSE 衡量平均平方误差(或其平方根),是回归的常用指标。
Q48 (Multiple)
Which statements are correct about model deployment considerations? (Multiple)
A. Monitor model performance in production
B. Retrain only once and never again
C. Watch for data drift and concept drift
D. Ensure reproducible preprocessing/feature pipeline
关于模型部署,下列哪些说法正确?(多选)
A. 在生产中监控模型性能
B. 只训练一次,永不再训练
C. 关注数据漂移和概念漂移
D. 确保预处理/特征流水线可重现
Answer / 答案: A, C, D
Explanation / 解释:
Deployment needs monitoring, retraining on drift, and reproducible pipelines. “Train once and never retrain” is false.
部署时需监控性能、注意数据/概念漂移并保证预处理可重现;不应认为永不再训。
Q49 (Single)
In churn prediction, which metric is usually prioritized to avoid losing customers?
A. Precision
B. Recall
C. Accuracy
D. Inertia
在流失预测中,为了避免丢失客户,通常优先考虑哪个指标?
A. 精确率
B. 召回率
C. 准确率
D. 惯性
Answer / 答案: B
Explanation / 解释:
Recall (finding as many churners as possible) is prioritized to minimize missed churners (FN).
召回率能减少漏判流失(FN),对业务更重要。
Q50 (Multiple)
Which of the following actions can help when features are highly correlated (multicollinearity)? (Multiple)
A. Remove or combine correlated features
B. Use regularization (Ridge)
C. Ignore and use OLS without change
D. Use dimensionality reduction (PCA)
当特征高度相关时,可采取哪些措施?(多选)
A. 删除或合并相关特征
B. 使用正则化(岭回归)
C. 忽略并直接用普通最小二乘法(OLS)
D. 使用降维(PCA)
Answer / 答案: A, B, D
Explanation / 解释:
Remove/combine features, apply regularization, or reduce dimensions help; ignoring collinearity with OLS can make coefficients unstable.
删除/合并变量、用正则化或降维是常见解决方法;直接忽略会导致系数不稳。
✅ 综合覆盖卷(50 题,单选 + 多选,中英双语,含答案与详细解释)
本套题覆盖课程第 1–8 章的基础内容与重要考点:定义、算法用途、评价指标、预处理、正则化、泛化、交叉验证、聚类、降维、不平衡处理、文本特征、模型部署以及常见陷阱(如数据泄漏)等。
说明:题目编号 1–50;标注为 (Multiple) 的为多选题,其余为单选题。每题均给出答案与中英解释。
Q1 (Single)
What does "supervised learning" require that unsupervised learning does not?
监督学习相比无监督学习需要下列哪项?
A. Labels (Y)
B. Features (X)
C. Clustering algorithm
D. Dimensionality reduction
Answer / 答案: A
Explanation / 解释:
Supervised learning needs labeled targets (Y) for training; unsupervised works only with X.
监督学习需要标签 Y 来学习映射关系;无监督仅使用特征 X。
Q2 (Single)
Which algorithm is typically used for binary classification and produces probabilities?
哪种算法常用于二分类并输出概率?
A. K-Means
B. Logistic Regression
C. PCA
D. DBSCAN
Answer / 答案: B
Explanation / 解释:
Logistic regression outputs probabilities via the sigmoid function and is a standard binary classifier.
逻辑回归通过 sigmoid 输出概率,是二分类的标准方法。
Q3 (Single)
Which loss is commonly used for binary classification models?
二分类模型常用哪种损失?
A. Mean Squared Error
B. Cross-Entropy (Log Loss)
C. Hinge Loss only
D. Silhouette Score
Answer / 答案: B
Explanation / 解释:
Binary cross-entropy (log loss) measures difference between predicted probabilities and true labels; it's standard for probabilistic classifiers.
二分类交叉熵衡量预测概率与真实标签差距,是概率分类器常用损失。
Q4 (Multiple)
Which of the following are true about softmax? (Multiple)
关于 softmax 哪些说法正确?(多选)
A. It maps a vector to a probability distribution over classes
B. It is used for multi-class (mutually exclusive) classification
C. It outputs values in (0,1) that sum to 1
D. It is identical to sigmoid for binary classification
Answer / 答案: A, B, C
Explanation / 解释:
Softmax converts scores to a probability distribution across classes (sum=1) and is used for mutually exclusive multi-class problems. For binary classification, sigmoid is commonly used; softmax with two outputs is mathematically related but not identical in practice.
softmax 将分数转为概率分布(和为 1),用于互斥多分类。二分类常用 sigmoid;softmax 的两类形式与 sigmoid 相关但用途与实现不同。
Q5 (Single)
Which metric should you prefer when false negatives are especially costly?
当漏报(FN)代价极高时,应优先关注哪个指标?
A. Precision
B. Recall
C. Accuracy
D. Inertia
Answer / 答案: B
Explanation / 解释:
Recall = TP / (TP + FN) measures how many actual positives are found; high recall reduces false negatives. Use when missing positives is costly (e.g., fraud, disease).
召回率衡量真实正样本中被识别的比例;高召回能减少漏报,适合漏报代价大的场景。
Q6 (Single)
What is "precision" measuring?
精确率(Precision)衡量的是什么?
A. Fraction of predicted positives that are true positives
B. Fraction of actual positives that are detected
C. Overall accuracy
D. Model complexity
Answer / 答案: A
Explanation / 解释:
Precision = TP / (TP + FP): among positive predictions, the proportion that are correct.
精确率=TP/(TP+FP),表示被预测为正的样本中有多少是真正的正例。
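A minimal sketch (plain Python; the counts are made up) computing precision and recall directly from confusion-matrix counts. 极简示意(纯 Python,计数为虚构):直接由 TP/FP/FN 计算精确率与召回率。

```python
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)   # of predicted positives, how many are correct
recall = tp / (tp + fn)      # of actual positives, how many were found
print(precision, recall)     # 0.75 0.6
```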
Q7 (Single)
Which method reduces variance by averaging many variants of a model trained on bootstrap samples?
哪种方法通过对自助采样训练的多个模型取平均来降低方差?
A. Bagging (e.g., Random Forest)
B. Boosting
C. K-Means
D. PCA
Answer / 答案: A
Explanation / 解释:
Bagging builds many models on bootstrap samples and averages/votes to reduce variance; Random Forest is a bagging variant for trees.
Bagging 在 bootstrap 数据上训练多模型并取平局/投票,能降低方差,随机森林是树的 bagging 版本。
Q8 (Multiple)
Which of the following help with imbalanced classification? (Multiple)
以下哪些有助于处理类别不平衡?(多选)
A. Using class weights in loss function
B. Oversampling minority (e.g., SMOTE)
C. Undersampling majority class
D. Using accuracy as the only metric
Answer / 答案: A, B, C
Explanation / 解释:
Class weights, oversampling, undersampling help. Accuracy alone is misleading for imbalanced data.
加权、过采样、欠采样都是处理不平衡的常用方法;仅用准确率会造成误导。
Q9 (Single)
Which of the following is true about PCA?
关于 PCA 哪项为真?
A. It is supervised
B. It reduces dimensionality by maximizing variance
C. It guarantees class separation
D. It increases number of features
Answer / 答案: B
Explanation / 解释:
PCA is unsupervised and projects data to orthogonal components maximizing variance; it does not use labels nor guarantee separation by class.
PCA 是无监督的,通过正交变换保留最大方差以降维,但不考虑类别信息。
Q10 (Single)
What does a confusion matrix summarize?
混淆矩阵总结了什么?
A. Predictions vs actual labels for classification tasks
B. Correlation between features
C. Feature importance ranking
D. Cluster assignments only
Answer / 答案: A
Explanation / 解释:
Confusion matrix shows TP/FP/FN/TN counts comparing predicted vs actual labels—fundamental for classification evaluation.
混淆矩阵以表格形式展示预测与真实标签的组合(TP/FP/FN/TN),是分类评估基础。
Q11 (Single)
Which distance metric is most sensitive to magnitude differences and therefore often benefits from scaling?
哪种距离度量对数值幅度敏感,因此通常需要特征缩放?
A. Euclidean distance
B. Cosine similarity
C. Jaccard index
D. Hamming distance
Answer / 答案: A
Explanation / 解释:
Euclidean distance depends on absolute magnitudes; scaling/standardization ensures features contribute comparably. Cosine uses angles so less sensitive to scale.
欧氏距离依赖数值大小,需缩放;余弦关注方向,对尺度不敏感。
Q12 (Multiple)
Which of the following are true about cross-entropy loss for classification? (Multiple)
关于交叉熵损失,哪些说法正确?(多选)
A. It penalizes confident wrong predictions heavily
B. It is suitable for probabilistic outputs
C. It is only used for regression tasks
D. Minimizing it improves predicted probabilities
Answer / 答案: A, B, D
Explanation / 解释:
Cross-entropy penalizes confident wrong predictions and is suitable for probability outputs; minimizing it yields better-calibrated probabilities. It's not for regression.
交叉熵对确信的错误惩罚大,适用于概率输出,最小化可改善概率预测;不是回归损失。
Q13 (Single)
Which of the following statements about Gradient Descent is FALSE?
关于梯度下降,下列哪项是错误的?
A. Learning rate controls step size
B. It always converges to global minimum for non-convex loss
C. Too large learning rate may diverge training
D. It relies on gradients of loss wrt parameters
Answer / 答案: B
Explanation / 解释:
Gradient descent does not guarantee global minimum for non-convex losses (e.g., deep nets); it may find local minima or saddle points.
对于非凸函数(如深度网络),梯度下降不保证找到全局最优,只能趋向局部最优或鞍点。
Q14 (Single)
Which technique is best to encode high-cardinality categorical variables for tree-based models?
对于基于树的模型,哪种方式常用于编码高基数类别特征?
A. One-hot encoding (always)
B. Target / mean encoding with regularization
C. PCA on categories
D. TF-IDF
Answer / 答案: B
Explanation / 解释:
Target/mean encoding compresses high-cardinality categories into numeric values (with smoothing to avoid leakage). One-hot can explode dimensionality. Tree models can handle ordinal/mean-encoded features well.
目标均值编码在高基数情形下有效(需平滑/防泄漏),独热会维度爆炸。
Q15 (Multiple)
Which statements are true about model regularization? (Multiple)
A. L1 (Lasso) can produce sparse weights (feature selection)
B. L2 (Ridge) penalizes squared weights to shrink them
C. Regularization only applies to neural networks
D. Regularization helps reduce overfitting
Answer / 答案: A, B, D
Explanation / 解释:
L1 induces sparsity, L2 shrinks weights; regularization applies to many models (linear, tree prunings, neural nets) and reduces overfitting.
L1 可做特征选择,L2 收缩系数,正则化普遍适用并有助于抑制过拟合。
Q16 (Single)
Which evaluation approach is most appropriate to choose hyperparameters robustly?
选择超参数最稳健的评估方法是哪种?
A. Single train/test split
B. K-fold cross-validation
C. Only training loss
D. Random guess
Answer / 答案: B
Explanation / 解释:
K-fold CV averages performance across splits for robust hyperparameter selection; single split is susceptible to randomness.
k 折交叉验证在不同切分上平均性能,更稳健地选择超参。
Q17 (Single)
Which algorithm is most appropriate for anomaly detection where anomalies are rare and arbitrary-shaped clusters exist?
当异常稀少且簇形状任意时,最适合的算法是?
A. K-Means
B. DBSCAN
C. Linear Regression
D. Naive Bayes
Answer / 答案: B
Explanation / 解释:
DBSCAN can find arbitrary-shaped dense clusters and mark low-density points as noise/anomalies. K-Means assumes spherical clusters.
DBSCAN 能发现任意形状簇并识别低密度噪声,适用于异常检测;K-Means 假设簇近似球形。
Q18 (Multiple)
Which of the following data preprocessing steps can help gradient-based models converge faster? (Multiple)
下列哪些预处理有助于梯度类模型更快收敛?(多选)
A. Feature scaling (standardization)
B. Mean-centering
C. Random shuffling of data during mini-batch SGD
D. One-hot encoding of continuous variables
Answer / 答案: A, B, C
Explanation / 解释:
Scaling and centering help gradients be well-behaved; shuffling reduces correlation between batches. One-hot on continuous variables is wrong.
特征缩放/均值中心化能稳定梯度,随机打乱减小批间关联;应避免对连续变量做独热编码。
Q19 (Single)
Which technique helps visualize high-dimensional data clusters preserving local structure?
哪个方法在保持局部结构上可视化高维聚类效果?
A. t-SNE
B. PCA (first 2 PCs) always better
C. Linear Regression
D. Standard Scaling
Answer / 答案: A
Explanation / 解释:
t-SNE is designed to preserve local neighborhoods for visualization; PCA preserves global variance but may not reveal local clusters as well.
t-SNE 更擅长保持局部结构用于可视化,PCA 强调方差解释,可能丢失局部簇信息。
Q20 (Single)
Which of the following is a typical cause of data leakage?
下列哪个是数据泄漏的典型原因?
A. Feature calculated using future information (e.g., label-derived aggregates)
B. Using training statistics for scaling test data only
C. Using cross-validation properly
D. Train-test split done before feature engineering that uses only training info
Answer / 答案: A
Explanation / 解释:
Using features that incorporate future/label info leaks target information into predictors, inflating performance. Proper scaling uses training stats applied to test.
若特征包含未来或标签派生信息,会把答案泄露给模型,导致结果虚高。正确流程是用训练集统计量变换测试集。
Q21 (Single)
Which statement best describes overfitting?
下列哪项最能描述过拟合?
A. Low training error and low test error
B. Low training error but high test error
C. High training error and low test error
D. Model always underestimates
Answer / 答案: B
Explanation / 解释:
Overfitting: model fits noise, performs well on training but poorly on unseen test data.
过拟合指训练误差低但测试误差高,模型记住训练数据而不泛化。
Q22 (Single)
Which of the following is NOT a tree ensemble method?
下列哪项不是树集成方法?
A. Random Forest
B. XGBoost
C. KNN
D. AdaBoost
Answer / 答案: C
Explanation / 解释:
KNN is a lazy instance-based method, not a tree ensemble. Random Forest / XGBoost / AdaBoost are ensemble techniques.
KNN 基于实例检索,不是树的集成方法;其他三者为常见集成算法。
Q23 (Multiple)
Which are characteristics of KNN? (Multiple)
KNN 的特点有哪些?(多选)
A. Non-parametric, instance-based
B. Sensitive to feature scaling
C. Fast at prediction for large datasets
D. Requires choosing k and distance metric
Answer / 答案: A, B, D
Explanation / 解释:
KNN is non-parametric and lazy (stores instances). Sensitive to scaling and needs k and distance metric; prediction can be slow on large datasets.
KNN 非参数、惰性学习、对尺度敏感,需要设定 k 和距离度量;大数据时预测成本高。
Q24 (Single)
Which of the following describes "stratified sampling"?
下面哪项描述了“分层抽样”?
A. Ensuring each fold/sample preserves class proportions
B. Random sampling without regard to labels
C. Sampling to increase majority class only
D. Only used for regression
Answer / 答案: A
Explanation / 解释:
Stratified sampling preserves class distribution in splits/folds, important for imbalanced classification.
分层抽样在拆分数据时保持类别比例,对不平衡问题尤为重要。
Q25 (Single)
Which technique is used to measure clustering quality without labels?
哪种方法用于在无标签情况下衡量聚类质量?
A. Silhouette score
B. Precision
C. Recall
D. ROC AUC
Answer / 答案: A
Explanation / 解释:
Silhouette score measures how well samples are matched to their own cluster vs nearest other cluster; it's unsupervised.
轮廓系数衡量样本簇内紧密度与簇间分离度,可在无标签时评估聚类质量。
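A minimal sketch (scikit-learn; blob data and k = 3 are illustrative choices) of scoring a clustering without labels via the silhouette score. 极简示意(scikit-learn,数据与 k=3 仅作演示):用轮廓系数在无标签情况下评估聚类质量。

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))   # closer to 1 = better-separated clusters
```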
Q26 (Multiple)
Which of the following can be considered ensemble learning methods? (Multiple)
以下哪些可视为集成学习方法?(多选)
A. Bagging
B. Boosting
C. Stacking
D. PCA
Answer / 答案: A, B, C
Explanation / 解释:
Bagging (e.g., Random Forest), boosting (XGBoost), and stacking (meta-learner combining models) are ensemble strategies. PCA is dimensionality reduction.
Bagging、Boosting、Stacking 都是集成学习范式;PCA 则是降维技术。
Q27 (Single)
Which of the following is true about AUC-ROC vs Precision-Recall curve?
关于 AUC-ROC 与 PR 曲线,下列哪项为真?
A. PR curve is more informative with highly imbalanced data for positive class
B. ROC is always superior
C. PR ignores precision
D. AUC-ROC and PR identical
Answer / 答案: A
Explanation / 解释:
Precision-Recall curves focus on positive class performance and are more informative under high class imbalance; ROC can be overly optimistic.
在正类极少时,PR 曲线更能反映模型对正类的检出与误报权衡;ROC 在不平衡下可能误导。
Q28 (Single)
Which of the following best describes "early stopping"?
哪项最佳描述“提前停止”?
A. Stop training when validation performance deteriorates
B. Stop training after fixed epoch regardless of validation
C. Always train until zero training loss
D. Discard validation set
Answer / 答案: A
Explanation / 解释:
Early stopping halts training when validation loss stops improving to prevent overfitting.
提前停止在验证集性能不再改善时停止训练,以防过拟合。
Q29 (Single)
Which model type is most natural for multi-label classification (each sample can belong to multiple classes)?
哪种模型类型更适合多标签分类(样本可属于多个类)?
A. Softmax multi-class output
B. Independent sigmoid outputs per label
C. Clustering algorithms only
D. PCA
Answer / 答案: B
Explanation / 解释:
For multi-label tasks, use independent sigmoid outputs per label (each label treated as separate binary classification); softmax enforces mutual exclusivity.
多标签通常采用每个标签独立的 sigmoid(多二分类器);softmax 适用于互斥多类。
Q30 (Single)
What is a common use case for one-hot encoding?
独热编码的常见用途是什么?
A. Encode categorical variables into binary vectors for models that require numeric input
B. Normalize continuous features
C. Reduce dimensionality
D. Represent text semantics fully
Answer / 答案: A
Explanation / 解释:
One-hot converts categorical values into binary indicator vectors so many ML models can ingest them; it increases dimensionality.
独热把类别变量转为二元向量,供多数 ML 算法使用,但会增加维度。
Q31 (Multiple)
Which of the following are correct practices for handling missing values? (Multiple)
以下哪些是处理缺失值的正确做法?(多选)
A. Impute using mean/median for numerical data (when appropriate)
B. Use model-based imputation (e.g., KNN imputer)
C. Drop rows/columns if missingness is small or columns irrelevant
D. Always fill with zero without checking
Answer / 答案: A, B, C
Explanation / 解释:
Mean/median imputation, model-based imputation, or dropping when justified are common; filling zeros blindly can introduce bias.
均值/中位数填补、基于模型填补或在合理情况下删除缺失严重的列/行是可行方法;不可盲目填 0。
Q32 (Single)
Which learning paradigm does reinforcement learning fall under?
强化学习属于哪种学习范式?
A. Supervised learning
B. Unsupervised learning
C. Interaction with environment for reward (Reinforcement)
D. Dimensionality reduction
Answer / 答案: C
Explanation / 解释:
Reinforcement learning learns via environment interaction and reward signals, distinct from supervised/unsupervised paradigms.
强化学习通过与环境交互和回报信号学习,属于独立范式,不同于监督/无监督。
Q33 (Single)
What is "model calibration"?
什么是“模型校准”?
A. Adjusting predicted probabilities so they reflect true likelihoods
B. Increasing model complexity
C. Scaling features to zero mean
D. Pruning decision trees
Answer / 答案: A
Explanation / 解释:
Calibration ensures predicted probabilities correspond to observed frequencies (e.g., 0.8 predicted occurs ~80% in reality). Techniques: Platt scaling, isotonic regression.
校准使预测概率与实际发生率一致(如预测 0.8 的事件约 80% 发生),常用 Platt 或等距回归等方法。
Q34 (Single)
Which of following is a reason to use log transformation on a feature?
对特征做对数变换的理由之一是什么?
A. Reduce skewness and compress large ranges
B. Make categorical variables numeric
C. Increase outliers' influence
D. Convert nominal to ordinal
Answer / 答案: A
Explanation / 解释:
Log transform reduces right skew and compresses large magnitude differences, often improving linear model behavior.
对数可减轻右偏、压缩大范围数值,改善线性假设下的拟合效果。
Q35 (Multiple)
Which of the following are typical hyperparameters for tree-based models? (Multiple)
以下哪些是树模型的典型超参数?(多选)
A. Max depth
B. Number of estimators (trees)
C. Kernel function
D. Learning rate (for boosting)
Answer / 答案: A, B, D
Explanation / 解释:
Tree hyperparameters include max depth, number of trees; boosting methods also have learning rate. Kernel is for SVM.
树模型常调深度、树数量;boosting 还要调学习率。核函数属于 SVM。
Q36 (Single)
Which algorithm would you choose for text classification with word-order importance (e.g., sentiment)?
文本分类并且词序重要(如情感分析),通常选哪种方法?
A. Bag-of-words + logistic regression only
B. Sequence models (RNN/Transformer) or models using embeddings
C. K-Means clustering
D. PCA on raw text
Answer / 答案: B
Explanation / 解释:
Sequence models or transformer-based models capture word order and contextual semantics; bag-of-words loses order information.
序列模型或基于嵌入的 Transformer 能捕捉词序和上下文,适合情感分析。
Q37 (Single)
Which metric is sensitive to class prevalence for binary classification and better used when positives are rare?
对于二分类且正例稀少时更合适且对类别比例敏感的指标是?
A. Recall and Precision (use PR curve)
B. Accuracy
C. Inertia
D. R-squared
Answer / 答案: A
Explanation / 解释:
Precision/Recall and PR curves focus on positive class performance and are informative with rare positives; accuracy is misleading.
PR 曲线与精确/召回关注正类,适合正例稀少问题;准确率会误导。
Q38 (Single)
Which of the following best describes "feature importance" from tree models?
决策树模型中的“特征重要性”通常表示什么?
A. Contribution of feature to reducing impurity / improving splits
B. Number of zeros in feature vector
C. Correlation with target only
D. CPU cost to compute the feature
Answer / 答案: A
Explanation / 解释:
Tree feature importance commonly derived from impurity reduction (Gini/entropy) or split gains; it quantifies contribution to predictive splits.
树的特征重要性通常基于该特征在分裂中降低不纯度或带来的增益来衡量。
Q39 (Single)
Which sampling method can increase minority class examples by interpolation between neighbors?
哪种采样方法通过在邻居间插值增加少数类样本?
A. Random oversampling by duplication
B. SMOTE
C. Random undersampling
D. Stratified sampling
Answer / 答案: B
Explanation / 解释:
SMOTE synthesizes new minority samples by interpolating feature vectors of nearest neighbors, rather than duplicating.
SMOTE 通过在少数类邻居之间插值合成新样本,区别于简单复制。
Q40 (Single)
Which validation approach is advised when time-series data has temporal order?
时间序列数据有时间顺序时,应采用何种验证方式?
A. Random k-fold CV
B. Time-series split / rolling window validation
C. Stratified k-fold ignoring time
D. Shuffle data then CV
Answer / 答案: B
Explanation / 解释:
Time-series splits preserve temporal order (train on past, validate on future) to avoid leakage. Random shuffles break chronology and cause leakage.
时间序列应用滚动窗口或前后分割,使训练在过去、验证在未来,避免信息泄漏。
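A minimal sketch (scikit-learn; 12 time-ordered toy samples) of TimeSeriesSplit: each split trains on earlier indices and validates on later ones. 极简示意(scikit-learn,12 个按时间排序的虚构样本):TimeSeriesSplit 保证训练在前、验证在后。

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)      # 12 time-ordered samples
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    print(train_idx, test_idx)        # training indices always precede test indices
```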
Q41 (Single)
Which of the following is true about ROC AUC = 0.5?
ROC AUC = 0.5 表示什么?
A. Perfect classifier
B. Equivalent to random guessing
C. Always high recall
D. Model is calibrated
Answer / 答案: B
Explanation / 解释:
AUC=0.5 implies no discriminative ability; classifier ranks positives and negatives randomly.
AUC=0.5 表示模型无区分能力,等同随机猜测。
Q42 (Multiple)
Which of these are recommended when deploying a model to production? (Multiple)
部署模型到生产时推荐哪些做法?(多选)
A. Monitor data drift and model performance
B. Log predictions and inputs for debugging
C. Keep preprocessing steps reproducible and versioned
D. Never retrain the model
Answer / 答案: A, B, C
Explanation / 解释:
Production needs monitoring, logging, reproducible pipelines and retraining triggers; "never retrain" is incorrect.
生产环境需监控漂移、日志记录、可复现预处理与版本控制,并设定再训练策略。
Q43 (Single)
For multi-class classification, which averaging method treats all classes equally when computing F1?
多类分类中,哪个平均方法在计算 F1 时对所有类别一视同仁?
A. Micro-average
B. Macro-average
C. Weighted-average by support
D. Sample-average
Answer / 答案: B
Explanation / 解释:
Macro-average computes metric per class then averages equally (regardless of class frequency); micro weights by support.
Macro 对每类同等平均;micro 根据样本数加权。
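A minimal sketch (scikit-learn; toy labels) contrasting macro- and micro-averaged F1 on a multi-class example. 极简示意(scikit-learn,标签为虚构):对比 macro 与 micro 平均的 F1。

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2]
print(f1_score(y_true, y_pred, average="macro"))  # every class weighted equally
print(f1_score(y_true, y_pred, average="micro"))  # weighted by sample counts
```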
Q44 (Single)
What does "bias" usually refer to in the context of estimators?
估计器中的“偏差”通常指什么?
A. Systematic difference between expected estimator and true value
B. Variability across training sets
C. Random noise in data
D. Size of training set
Answer / 答案: A
Explanation / 解释:
Bias is systematic error: expected estimator minus true parameter; variance is variability across datasets.
偏差是估计的期望值与真实值之间的系统性差异;方差衡量不同训练集下的波动。
Q45 (Multiple)
Which of the following are common techniques for feature selection? (Multiple)
常见特征选择技术有哪些?(多选)
A. Univariate statistical tests (chi-square, ANOVA)
B. Recursive feature elimination (RFE)
C. Model-based selection (L1 regularized)
D. Random assignment of weights
Answer / 答案: A, B, C
Explanation / 解释:
Univariate tests, RFE, and model-based (e.g., L1) are standard feature selection methods; random weighting is not.
单变量检验、递归特征消除、基于模型的选择(L1)是常见方法;随机加权不是。
Q46 (Single)
Which of the following best reduces multicollinearity?
下列哪项最能减少多重共线性?
A. Remove or combine correlated features / use PCA
B. Increase learning rate
C. Use K-Means
D. Reduce sample size
Answer / 答案: A
Explanation / 解释:
Dropping/combining correlated features or using PCA reduces collinearity; regularization also helps.
删除/合并相关特征或用 PCA 降维能缓解共线性;正则化也是常用手段。
Q47 (Single)
Which algorithm tends to perform well on tabular data with heterogeneous features and missing values?
在包含异构特征和缺失值的表格数据上,通常表现良好的算法是?
A. Tree-based ensembles (Random Forest / XGBoost)
B. Vanilla linear regression without preprocessing
C. KMeans
D. t-SNE
Answer / 答案: A
Explanation / 解释:
Tree ensembles handle heterogeneous features, are robust to missing values (some implementations), and capture nonlinearity.
树集成能处理不同类型特征、对缺失值和非线性有较强鲁棒性(部分实现自带缺失处理)。
Q48 (Single)
Which of the following will NOT help prevent overfitting?
下列哪项不会帮助防止过拟合?
A. Increase regularization strength
B. Add more labeled data
C. Increase model capacity without regularization
D. Use cross-validation for model selection
Answer / 答案: C
Explanation / 解释:
Increasing model capacity without control increases overfitting risk. Other options help generalization.
在没有控制的情况下增加模型容量通常加剧过拟合;正则化、更多数据、CV 都有助。
Q49 (Single)
Which of the following is an advantage of using randomized search over grid search for hyperparameter tuning?
与网格搜索相比,随机搜索的优势是什么?
A. More efficient when only a subset of hyperparameters matter
B. Always finds global optimum
C. Does not require cross-validation
D. Guarantees lower computational cost in all cases
Answer / 答案: A
Explanation / 解释:
Random search explores more hyperparameter space efficiently and is effective when only few hyperparameters strongly affect performance. It does not guarantee global optimum.
随机搜索在高维超参空间更高效,适用于只有部分超参重要的情况,但不保证找到全局最优。
Q50 (Multiple)
Which are correct descriptions of "semi-supervised learning"? (Multiple)
哪几项正确描述半监督学习?(多选)
A. Uses a small amount of labeled data plus large unlabeled data to improve learning
B. Relies only on unlabeled data with no labels at all
C. May use pseudo-labeling or consistency regularization
D. Cannot be combined with supervised methods
Answer / 答案: A, C
Explanation / 解释:
Semi-supervised combines limited labels with abundant unlabeled data (e.g., pseudo-labeling, consistency loss). It complements supervised methods, not excludes them.
半监督结合少量标签与大量无标签(如伪标签、一致性正则)提升性能,可与监督方法结合使用。
