Post archive: September 2016 - yuerspring

Summary: This post is based on the book 《Spark高级数据分析》 (Advanced Analytics with Spark).

posted @ 2016-09-26 21:04 yuerspring Views(302) Comments(0) Recommended(0)

Summary: Sketching out a diagram.

posted @ 2016-09-24 21:40 yuerspring Views(279) Comments(0) Recommended(0)

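Summary: MySQL read/write splitting serves much the same purpose as replication between different databases in DB2 (replication can also be used between different tables in the same database).
DB2 example:
ODS -----> DW
DW table 1 ------> DW table 2
Several schemes for MySQL master-slave replication: 从数...

posted @ 2016-09-24 21:31 yuerspring Views(149) Comments(0) Recommended(0)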

Summary: Two concepts need to be clear first: 1. ROC 2. AUC
package org.apache.spark.mllib.classification
import org.apache.log4j.Logger
import org.apache.log4j.Level
import or...

posted @ 2016-09-24 20:17 yuerspring Views(1136) Comments(0) Recommended(0)
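
The two concepts named in this summary can be illustrated without Spark. The following is a minimal pure-Python sketch, not the post's Scala code: `roc_points` and `auc` are hypothetical helper names, labels are assumed to be 0 or 1, and the area is approximated with the trapezoid rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort by descending score so that each step lowers the decision threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, summed trapezoid by trapezoid."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

The ROC curve is the list of points; the AUC is the single number obtained by integrating it, which equals the fraction of positive/negative pairs that the scores rank correctly.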

Summary:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.m...

posted @ 2016-09-21 22:18 yuerspring Views(341) Comments(0) Recommended(0)

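Summary: A classic SQL interview question: converting between rows and columns (姓名 = name, 课程 = course, 分数 = score, 语文 = Chinese, 数学 = math).
1. Rows to columns
select 姓名 as 姓名,
max(case 课程 when '语文' then 分数 else 0 end) 语文,
max(case 课程 when '数学' then 分数 else 0 end) 数学,
max(...

posted @ 2016-09-20 21:28 yuerspring Views(144) Comments(0) Recommended(0)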
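
The rows-to-columns pattern in this summary can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table name `scores`, its English column names, and the sample rows are all hypothetical, and the `GROUP BY` clause is added here because the summary is cut off before it.

```python
import sqlite3

# In-memory database with one row per (student, course) pair. Hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, course TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Zhang San", "Chinese", 74),
        ("Zhang San", "Math", 83),
        ("Li Si", "Chinese", 74),
        ("Li Si", "Math", 84),
    ],
)

# Each CASE keeps the score only for its own course and yields 0 otherwise;
# MAX over the group then picks out that single non-zero value per student.
PIVOT_SQL = """
SELECT name,
       MAX(CASE course WHEN 'Chinese' THEN score ELSE 0 END) AS chinese,
       MAX(CASE course WHEN 'Math' THEN score ELSE 0 END) AS math
FROM scores
GROUP BY name
ORDER BY name
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Without `GROUP BY name` the aggregate would collapse all students into one row, so the grouping column is what turns the long table into one row per student.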

Summary:
import java.io.PrintWriter
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.linalg.SparseVector
import org.apache.sp...

posted @ 2016-09-17 20:46 yuerspring Views(760) Comments(0) Recommended(0)

Common Kafka commands

Summary:
kafka-0.9.0.1/bin/kafka-server-start.sh ../config/server.properties &
bin/kafka-console-producer.sh --broker-list hadoop1:9092,hadoop2:...

posted @ 2016-09-17 14:45 yuerspring Views(91) Comments(0) Recommended(0)

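Key points and steps in machine learning

Summary: Key points and steps in machine learning:
1. Find a dataset and extract feature vectors (training set & test set).
2. Use the right machine learning algorithm.
3. Ensure high reliability.

posted @ 2016-09-14 21:25 yuerspring Views(157) Comments(0) Recommended(0)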
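
The first and third steps above can be sketched in plain Python. This is only an illustration, not code from the post: `train_test_split` and `accuracy` are hypothetical helpers, the 20% test fraction and the seed are arbitrary defaults, and accuracy stands in for whatever reliability measure the task actually calls for.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and split it into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the held-out labels."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

Evaluating only on the held-out test set is what makes the reliability estimate meaningful, since the model never saw those rows during training.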

Summary: How to run Sqoop 1.4.6
sqoop --options-file options11.hdfstomysql
export
--connect
jdbc:mysql://bigdatacloud:3306/test
--username
root
--password...

posted @ 2016-09-09 20:43 yuerspring Views(308) Comments(0) Recommended(0)

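Summary: Insert data into MySQL with Java, import it into Hive through Sqoop, then use Kylin to simulate building a cube and observe the build time and data expansion rate. Kylin writes its data into HBase.
Kylin
HBase 1.1.3
Hive 1.2.1
Hadoop 2.5.1
create table infoageti...

posted @ 2016-09-09 20:33 yuerspring Views(336) Comments(0) Recommended(0)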

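Spark shuffle tuning

Summary: spark.shuffle.file.buffer
Default: 32k
Description: This parameter sets the buffer size of the BufferedOutputStream used by a shuffle write task. Before data is written to a disk file it is first written into this buffer, and only once the buffer is full is it spilled to...

posted @ 2016-09-08 20:07 yuerspring Views(173) Comments(0) Recommended(0)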
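
The buffering behaviour described in this summary can be pictured with a toy model. The sketch below is plain Python, not Spark code: `count_flushes` is a hypothetical helper that counts how often a fixed-size buffer has to be flushed to disk for a stream of records, and the 64k figure is only an illustrative larger setting.

```python
from typing import Iterable


def count_flushes(record_sizes: Iterable[int], buffer_bytes: int) -> int:
    """Count disk writes when records pass through a buffer of buffer_bytes."""
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    flushes = 0
    buffered = 0
    for size in record_sizes:
        # Flush first if the next record would overflow a non-empty buffer.
        if buffered > 0 and buffered + size > buffer_bytes:
            flushes += 1
            buffered = 0
        buffered += size
    # Whatever is left at the end still has to reach the disk.
    if buffered > 0:
        flushes += 1
    return flushes


# 1000 records of 1 KiB each, written through a 32k and a 64k buffer.
records = [1024] * 1000
flushes_32k = count_flushes(records, 32 * 1024)
flushes_64k = count_flushes(records, 64 * 1024)
print(flushes_32k, flushes_64k)
```

In this model doubling the buffer halves the number of disk writes, at the cost of more memory held per shuffle write task.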

Summary: Start the metastore: hive --service metastore
Start DFS and YARN
[root@bigdatastorm bin]# ./spark-sql --master yarn --deploy-mode client --driver-memory 512m --ex...

posted @ 2016-09-05 22:30 yuerspring Views(828) Comments(0) Recommended(0)

Summary: A simple example of a Spark Streaming application
package com.orc.stream
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Second...

posted @ 2016-09-02 22:10 yuerspring Views(223) Comments(0) Recommended(0)

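A simple Spark SQL example

Summary: Spark SQL and Hive integration: http://shiyanjun.cn/archives/1113.html
It is already this late and many colleagues still have not arrived at the office, so I will just write a simple Spark SQL example. Reviewing the old teaches you the new; Confucius was wise.
package com.ib.e3
import org...

posted @ 2016-09-02 09:55 yuerspring Views(389) Comments(0) Recommended(0)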

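Summary: My colleagues have all been called into a meeting by the boss ....... it has already been going for two hours. GOD.
Broadcast variables (broadcast): the variable can only be modified on the driver side, not on the executor side. Using one is an optimization that avoids a shuffle, but it requires the RDD to be small.
Accumulators (accumulator): on the executo...

posted @ 2016-09-01 16:37 yuerspring Views(205) Comments(0) Recommended(0)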
