[GPU] Machine Learning in C++

1. What Is MPI?

2. Revisiting Spark

Link: https://www.zhihu.com/question/48743915/answer/115738668

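The conclusion of Matei Zaharia's PhD thesis makes a point roughly like this: an algorithm implemented directly in MPI can easily be five or six times faster than the same algorithm on Spark. However, Spark is a general data-flow processing framework. Throughout the data's lifecycle you can process it with the concrete libraries built on top of Spark, and ML is only one part of that. This generality is one of Spark's biggest selling points.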

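So if you compare this Prophet platform with Spark on ML efficiency, of course Prophet comes out faster. Many specialized ML platforms are faster than Spark, too many to list here. The reason is that Spark's MapReduce-based programming model is not a good fit for ML, especially for models with a very large number of parameters, such as LDA.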

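BTW: judged as a rigorous paper, using Spark as the only baseline, instead of running broad experimental comparisons across various platforms, algorithms and datasets, is not enough.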

3. Microsoft Distributed Machine Learning Toolkit (DMTK)

Source link: https://indico.cern.ch/event/605622/contributions/2482399/attachments/1418253/2172239/TMVA_ROOTMpi.pdf

Goto: https://github.com/Microsoft/DMTK

Ref: A First Look at Microsoft's Distributed Machine Learning Toolkit (DMTK)

DMTK includes the following projects:

DMTK framework (Multiverso): The parameter server framework for distributed machine learning.
LightLDA: Scalable, fast and lightweight system for large-scale topic modeling.
LightGBM: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Distributed word embedding: Distributed algorithm for word embedding implemented on Multiverso.