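Machine Learning Resources
For integrating machine learning into an application, I think the following three approaches work well:
- Train the model with scikit-learn, explain it with SHAP, export it to ONNX with the sklearn-onnx project, and then run inference from ML.NET.
- Train and run inference with scikit-learn, but do the data processing in C#/Java. In other words, the engineering side is handed to C#/Java, and the processed data is passed to the Python side through DuckDB for inference.
- Do training, inference, and data processing all in ML.NET.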
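The Python half of the first approach can be sketched roughly as follows. This is a minimal sketch under several assumptions, not a reference implementation: it uses the breast cancer dataset bundled with scikit-learn as a stand-in for real data, a RandomForestClassifier as an arbitrary model choice, and the hypothetical names `train_model`, `export_to_onnx`, `"input"`, and `model.onnx`. The SHAP explanation step and the ML.NET side are left out.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_model(random_state=0):
    """Train a small classifier on a dataset bundled with scikit-learn."""
    # Stand-in data (assumption); a real project would load its own features.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    # Return the fitted model, its held-out accuracy, and the feature count.
    return model, model.score(X_test, y_test), X.shape[1]


def export_to_onnx(model, n_features, path):
    """Convert a fitted scikit-learn model to ONNX and write it to disk."""
    # Imported lazily so that training still works without skl2onnx installed.
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # "input" is an illustrative tensor name; whatever consumes the model
    # has to bind its feature vector to the same name.
    initial_types = [("input", FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(model, initial_types=initial_types)
    with open(path, "wb") as f:
        f.write(onnx_model.SerializeToString())


if __name__ == "__main__":
    model, accuracy, n_features = train_model()
    print(f"held-out accuracy: {accuracy:.3f}")
    export_to_onnx(model, n_features, "model.onnx")
```

The resulting `model.onnx` file is what the ML.NET side would load. The input name and the float feature layout chosen here have to match what the C# code feeds in.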
General introductory material
https://developers.google.com/machine-learning/crash-course
https://github.com/microsoft/ML-For-Beginners
Open datasets for machine learning
https://archive.ics.uci.edu/datasets
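This site hosts many kinds of datasets and reports the performance of different algorithms on them, which makes it well suited to learning. One example is the Adult dataset for income prediction: https://archive.ics.uci.edu/dataset/2/adult
Datasets used by the ML.NET samples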
https://github.com/dotnet/machinelearning-samples/blob/main/docs/DATASETS.md
Datasets bundled with the rapaio jar
https://padreati.github.io/rapaio/tutorials/BuiltinDataSets.html
Python
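For machine learning and deep learning, Python undoubtedly has the best ecosystem. Within classical machine learning, scikit-learn plus SHAP is probably the most mainstream combination.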
Github Machine Learning Repositories for Data Scientists
https://www.geeksforgeeks.org/15-github-machine-learning-repositories-for-data-scientists/
https://www.geeksforgeeks.org/gradientboosting-vs-adaboost-vs-xgboost-vs-catboost-vs-lightgbm/?ref=asr3
This page covers commonly used ML algorithms, compute frameworks, hyperparameter tuning tools, and interpretability tools.
Using XGBoost in Python Tutorial
https://www.datacamp.com/tutorial/xgboost-in-python
https://www.datacamp.com/tutorial/decision-tree-classification-python
https://www.datacamp.com/tutorial/machine-learning-python
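Final paper for a machine learning course by Zhang Yuxiang (张宇翔). It is very good overall and implements most machine learning algorithms in Python.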
https://zjtdzyx.github.io/machine-learning-project/
https://github.com/zjtdzyx/machine-learning-project
Income analysis and prediction with XGBoost
https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Census income classification with XGBoost.html#Load-dataset
https://github.com/shap/shap/blob/master/notebooks/tabular_examples/tree_based_models/Census income classification with XGBoost.ipynb
https://www.kaggle.com/code/grayphantom/income-prediction-using-random-forest-and-xgboost
https://www.kaggle.com/code/apantazo/income-census-adult-xgboost#Feature-Engineering
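C# libraries
In the C# world, Microsoft's ML.NET is the most mainstream machine learning framework. One advantage is that its concepts and core API have stayed the same across many releases.
Microsoft ML.NET cookbook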
https://github.com/dotnet/machinelearning/blob/main/docs/code/MlNetHighLevelConcepts.md
https://github.com/dotnet/machinelearning/blob/main/docs/code/MlNetCookBook.md
The cookbook is fairly old, but its basic concepts still apply.
Official tutorials
https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/
PDF article: Introduction to ML.NET
https://assets.ctfassets.net/9n3x4rtjlya6/1WpeTHDK1eIRe1Toj0w8mU/eff3ee2e8eb5ed98c11bc3e46a716379/100533_1238993435_Jeff_Prosise_Machine_learning_for_C_developers_Introducing_ML.NET.pdf
PDF book:
https://ptgmedia.pearsoncmg.com/images/9780137383658/samplepages/9780137383658_Sample.pdf
Blog posts on ML.NET:
https://www.todaysoftmag.com/article/3286/machine-learning-101-with-microsoft-ml-net-part-1-3
https://rubikscode.net/2021/04/12/machine-learning-with-ml-net-evaluation-metrics/
https://rubikscode.net/2021/04/26/machine-learning-with-ml-net-sentiment-analysis/
https://rubikscode.net/2021/09/27/net-interactive-jupyter-notebooks/
https://rubikscode.net/2022/08/29/machine-learning-with-ml-net-introduction/
https://www.codemag.com/Article/1911042/ML.NET-Machine-Learning-for-.NET-Developers
https://www.microsoftpressstore.com/articles/article.aspx?p=3129454&seqNum=2
ML.NET sample project with many examples; the repository includes the datasets
https://github.com/jeffprosise/ML.NET
Sample projects that use ML.NET, with many examples; the repositories include the datasets
https://github.com/feiyun0112/machinelearning-samples.zh-cn/tree/master
https://github.com/dotnet/machinelearning-samples
Machine learning libraries implemented in C#
https://github.com/mdabros/SharpLearning
https://github.com/mdabros/XGBoostSharp
Java libraries
Smile (Statistical Machine Intelligence and Learning Engine)
It also supports SHAP.
https://github.com/haifengl/smile
https://haifengl.github.io/regression.html
https://haifengl.github.io/quickstart.html
Tribuo: a machine learning library from Oracle, under the Apache license
https://tribuo.org/
rapaio: a statistics-oriented data mining library
https://github.com/padreati/rapaio
https://padreati.github.io/rapaio/tutorials/BuiltinDataSets.html
Adds a Java kernel to Jupyter
https://github.com/padreati/rapaio-jupyter-kernel
xgboost4j, the JVM library provided officially by the XGBoost project
https://xgboost.readthedocs.io/en/latest/jvm/java_intro.html#
https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/README.md
Weka: a data mining tool that includes many classic machine learning algorithms
https://ml.cms.waikato.ac.nz/weka/
The deeplearning4j project for Java
https://github.com/deeplearning4j/deeplearning4j
