Post archive for October 20, 2017 - wlu

October 20, 2017

Summary: Use the neural network code from Nndl to fit a quadratic equation with an ANN. ref: https://github.com/mnielsen/neural-networks-and-deep-learning Prepare the training data and train the network: a=[] f=[] for xi in np.array(xrange(0,100 …

posted @ 2017-10-20 13:36 wlu Views (2603) Comments (0) Recommendations (0)

Naive Bayes in Practice with MLlib

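Summary: Introduction. This article uses the pipeline provided by the Spark (1.5.0) ml library to work through a complete text classification example. The pipeline chains together word splitting (tokenize), term frequency counting (TF), feature vector computation (TF-IDF), Naive Bayes model training, and so on. The article is based on the "20 Newsgroups" data …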

posted @ 2017-10-20 13:19 wlu Views (295) Comments (0) Recommendations (0)
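
The first three stages named in the summary above (tokenize, TF, TF-IDF) can be sketched in plain Python. This is only an illustration of what each stage computes, not the Spark ml pipeline API and not the post's own code. The function names, the sample documents, and the smoothed IDF formula `log((N + 1) / (df + 1))` are assumptions made for the example, and the Naive Bayes training step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequencies(tokens):
    """TF stage: count how often each token occurs in one document."""
    return Counter(tokens)


def inverse_document_frequencies(tokenized_docs):
    """IDF stage: weight each term by how rare it is across documents.

    Uses the smoothed form log((N + 1) / (df + 1)), an assumption
    chosen for this sketch.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((n_docs + 1) / (df + 1))
            for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Combine TF and IDF into a sparse feature vector (term -> weight)."""
    tf = term_frequencies(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


# Hypothetical sample documents, not taken from the 20 Newsgroups data.
docs = [
    "the rocket launch was delayed",
    "the hockey game was close",
    "rocket engines and rocket fuel",
]
tokenized = [tokenize(d) for d in docs]
idf = inverse_document_frequencies(tokenized)
vectors = [tf_idf(tokens, idf) for tokens in tokenized]
```

Each entry of `vectors` is the kind of feature vector a classifier such as Naive Bayes would then be trained on, paired with the document's label.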

Debezium for PostgreSQL to Kafka

Summary: In this article, we discuss the need to segregate the data models for reads and writes, and to use event sourcing to capture detailed data changes. These …

posted @ 2017-10-20 13:18 wlu Views (3789) Comments (0) Recommendations (0)

Apache Geode with Spark

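Summary: In certain scenarios, for example when a streaming RDD has to be joined with historical data to obtain profile information, the result is a join between a small RDD of new data and a very large historical RDD. A direct join in Spark is actually inefficient: RDDs have no index, so a join hashes the RDDs being joined and then shuffles them together; …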

posted @ 2017-10-20 13:13 wlu Views (533) Comments (1) Recommendations (0)
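
The cost described in the Apache Geode with Spark summary above can be illustrated with a small plain-Python sketch. It is not Spark or Geode code. `shuffle_hash_join` mimics a join without an index: both sides are repartitioned by key hash, so every record of the large side is moved and scanned. `indexed_lookup_join` probes a keyed store once per record of the small side. The dictionary stands in for such a store, and that the post takes this route is an assumption based on its title. All names and data are hypothetical.

```python
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Simulate a shuffle: route each (key, value) record by its key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def shuffle_hash_join(left, right, num_partitions=4):
    """Join without an index: repartition both sides, then match per partition."""
    joined = []
    left_parts = partition_by_key(left, num_partitions)
    right_parts = partition_by_key(right, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a hash table over this partition of the right side.
        table = defaultdict(list)
        for key, value in right_part:
            table[key].append(value)
        for key, value in left_part:
            for other in table.get(key, []):
                joined.append((key, (value, other)))
    return joined


def indexed_lookup_join(small, index):
    """Join by probing a keyed store once per record of the small side."""
    return [(key, (value, index[key])) for key, value in small if key in index]


# Hypothetical data: a large "historical" side and a small batch of new events.
history = [(f"user{i}", {"segment": i % 3}) for i in range(1000)]
new_events = [("user7", "click"), ("user42", "view"), ("nobody", "click")]

by_shuffle = sorted(shuffle_hash_join(new_events, history), key=lambda r: r[0])
by_lookup = sorted(indexed_lookup_join(new_events, dict(history)), key=lambda r: r[0])
```

Both functions return the same pairs. The difference is the work done: the first touches all 1000 historical records for every batch, while the second does three lookups.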

BigData and Machine Learning
