Post category - Big Data

Abstract: https://www.cnblogs.com/sddai/p/6110704.html Read more
posted @ 2020-11-26 13:20 ChevisZhang Views(78) Comments(0) Recommendations(0)
Abstract: https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L8.pdf Mining Data Streams 1. Read more
posted @ 2020-07-28 09:04 ChevisZhang Views(91) Comments(0) Recommendations(0)
Abstract: Pipeline: 1. Processes and learns from data 2. Is a sequence of stages; each stage is either a Transformer or an Estimator 3. input: DataFrame outpu Read more
posted @ 2020-07-22 13:06 ChevisZhang Views(124) Comments(0) Recommendations(0)
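The Pipeline entry above describes a sequence of stages, each either a Transformer or an Estimator. A minimal pure-Python sketch of that idea follows. It is not the Spark API: the classes `Scale`, `MeanCenter`, `MeanCenterModel`, the helpers `fit_pipeline` and `run_pipeline`, and the list-of-dicts stand-in for a DataFrame are all hypothetical names chosen for illustration. It assumes that an Estimator's `fit` returns a fitted Transformer.

```python
from typing import Dict, List

Row = Dict[str, float]
Table = List[Row]  # toy stand-in for a DataFrame (assumption, not Spark)


class Scale:
    """Transformer: maps a table to a new table and learns nothing."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, table: Table) -> Table:
        return [{**row, "x": row["x"] * self.factor} for row in table]


class MeanCenterModel:
    """Fitted Transformer produced by the MeanCenter estimator."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, table: Table) -> Table:
        return [{**row, "x": row["x"] - self.mean} for row in table]


class MeanCenter:
    """Estimator: fit() learns the column mean and returns a Transformer."""

    def fit(self, table: Table) -> MeanCenterModel:
        mean = sum(row["x"] for row in table) / len(table)
        return MeanCenterModel(mean)


def fit_pipeline(stages: list, table: Table) -> list:
    """Run stages in order; fit each Estimator on the data seen so far."""
    fitted = []
    for stage in stages:
        if hasattr(stage, "fit"):      # Estimator: learn, then use its model
            stage = stage.fit(table)
        table = stage.transform(table)  # pass the output on to the next stage
        fitted.append(stage)
    return fitted


def run_pipeline(fitted: list, table: Table) -> Table:
    """Apply the already-fitted stages to new data."""
    for stage in fitted:
        table = stage.transform(table)
    return table


train = [{"x": 1.0}, {"x": 3.0}]
fitted = fit_pipeline([Scale(2.0), MeanCenter()], train)
out = run_pipeline(fitted, [{"x": 5.0}])
print(out)
```

In this sketch, fitting yields a list containing only Transformers, so the same stages can be replayed on new data without learning anything again.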
Abstract: https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L7.pdf Machine Learning : 1. Read more
posted @ 2020-07-20 12:16 ChevisZhang Views(215) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L6.pdf Table recap: 1. rows: entities 2. columns: attributes Spark SQL: 1. Spark SQL is not about SQL, Read more
posted @ 2020-07-18 23:06 ChevisZhang Views(140) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L5.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es NNS problem: 1. For two d-dimensional vectors, we need to compute Read more
posted @ 2020-07-18 17:12 ChevisZhang Views(144) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L5.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es Read more
posted @ 2020-07-01 17:10 ChevisZhang Views(118) Comments(0) Recommendations(0)
Abstract: Forgot to save again, a painful loss - - Read more
posted @ 2020-07-01 08:00 ChevisZhang Views(122) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L4.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es MapReduce summary: 1. MapRe Read more
posted @ 2020-06-22 11:11 ChevisZhang Views(200) Comments(0) Recommendations(0)
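Abstract: https://blog.csdn.net/weixin_38750084/article/details/82780519 In SQL, GROUP BY is generally used together with aggregate functions such as count() and max(); it selects the key by which rows are grouped. Example reproduced from the link above: Read more
posted @ 2020-06-19 14:47 ChevisZhang Views(241) Comments(0) Recommendations(0)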
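The GROUP BY entry above says the clause picks the key over which functions such as count() and max() are evaluated. A minimal sketch of that behaviour, using Python's built-in `sqlite3` module, might look like the following. The `scores` table and its rows are hypothetical, invented for illustration, and are not the example from the linked post.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, course TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("alice", "math", 90),
        ("alice", "physics", 70),
        ("bob", "math", 60),
        ("bob", "physics", 85),
        ("bob", "chemistry", 75),
    ],
)

# GROUP BY chooses the key (student); COUNT and MAX are computed once per key.
rows = conn.execute(
    "SELECT student, COUNT(*), MAX(score) FROM scores "
    "GROUP BY student ORDER BY student"
).fetchall()
conn.close()

print(rows)  # [('alice', 2, 90), ('bob', 3, 85)]
```

Each distinct `student` value yields exactly one output row, which is the same role the key plays in a MapReduce reduce step.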
Abstract: Resilient Distributed Dataset (RDD) https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es https://www.cse.unsw.edu.au/~cs9313/20T2/ Read more
posted @ 2020-06-17 14:46 ChevisZhang Views(212) Comments(0) Recommendations(0)
Abstract: Resilient Distributed Dataset (RDD) https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L3.pdf 1. Features of RDD • In-memory computation: unlike MapReduce, which takes inte Read more
posted @ 2020-06-17 13:22 ChevisZhang Views(164) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L3.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es Introduction to MapRed Read more
posted @ 2020-06-16 21:25 ChevisZhang Views(153) Comments(1) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L2.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es Review of 3-replication 1) Last week I Read more
posted @ 2020-06-16 17:00 ChevisZhang Views(334) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L2.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es The second lecture spent 40 minutes explaining: if a dataNode Read more
posted @ 2020-06-15 17:11 ChevisZhang Views(135) Comments(0) Recommendations(0)
Abstract: https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L2.pdf https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es Hadoop: 1. • Stores big Read more
posted @ 2020-06-15 15:30 ChevisZhang Views(144) Comments(0) Recommendations(0)