Post category - BigData

Summary: Covers cloudera-manager, hdfs, impala, kudu, oozie, system logs, and more; #cloudera-service-monitor log /bin/rm /var/lib/cloudera-service-monitor/ts/*/partition*/* -rf /bin/r ...
posted @ 2019-07-03 16:39 匠人先生 Views(634) Comments(0) Recommended(0)
Summary: The logstash mongodb input plugin is a third-party plugin, configured as follows: input { mongodb { uri => 'mongodb://mongo_server:27017/db' placeholder_db_dir => '/path/to/db_dir/' placeholder_db ...
posted @ 2019-06-20 15:08 匠人先生 Views(1472) Comments(0) Recommended(0)
Summary: Official site: https://mesos.github.io/chronos/ A crontab replacement for mesos clusters. Chronos: A fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based sc ...
posted @ 2019-06-19 14:51 匠人先生 Views(737) Comments(0) Recommended(0)
Summary: https://drill.apache.org/ 1 Introduction Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to suppo ...
posted @ 2019-06-16 22:23 匠人先生 Views(2115) Comments(0) Recommended(0)
Summary: ETL ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts the data from different RDBMS source systems then tra ...
posted @ 2019-06-16 21:45 匠人先生 Views(1238) Comments(0) Recommended(0)
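The extract, transform, load sequence named in the summary above can be illustrated with a very small sketch. It is only an illustration under stated assumptions: the source here is an in-memory CSV string standing in for an RDBMS source system, the target is an in-memory SQLite database, and the `sales` table and all function names are hypothetical.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read raw rows from the (stand-in) source system."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: normalize names and convert amounts to numbers."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        amount = float(row["amount"])
        cleaned.append((name, amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(csv_text, conn):
    """Run the three stages in order and return per-name totals."""
    load(conn, transform(extract(csv_text)))
    query = "SELECT name, SUM(amount) FROM sales GROUP BY name ORDER BY name"
    return conn.execute(query).fetchall()
```

For example, `run_etl("name,amount\n alice ,10.5\nbob,3\nALICE,1.5\n", sqlite3.connect(":memory:"))` merges the two differently spelled "alice" rows into one total, because the transform step normalizes the name before loading.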
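Summary: ID unification (for example, linking user IDs) comes up often. Abstractly, each record can be parsed into one or more kv pairs (id_type, id), and all records that match on some kv pair need to be merged. For example: data1: Array(('type1', 'id1'), ('type2', 'id2') ...
posted @ 2019-06-07 01:18 匠人先生 Views(545) Comments(0) Recommended(0)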
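One possible way to approach the merge described above is to treat it as a connected-components problem and solve it with union-find. The sketch below is not necessarily the approach taken in the post (the summary is truncated before the solution); the function name and the sample records in the usage note are hypothetical.

```python
from collections import defaultdict


def merge_by_shared_ids(records):
    """Group records that share at least one (id_type, id) pair, transitively.

    `records` is a list where each element is a list of (id_type, id) tuples.
    Returns one sorted list of pairs per merged group.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as root so group order is stable.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Remember the first record in which each pair was seen; any later
    # record carrying the same pair is merged with it.
    first_seen = {}
    for idx, pairs in enumerate(records):
        for pair in pairs:
            if pair in first_seen:
                union(idx, first_seen[pair])
            else:
                first_seen[pair] = idx

    groups = defaultdict(set)
    for idx, pairs in enumerate(records):
        groups[find(idx)].update(pairs)
    return [sorted(pairs) for _, pairs in sorted(groups.items())]
```

With three made-up records, `[[('type1', 'id1'), ('type2', 'id2')], [('type2', 'id2'), ('type3', 'id3')], [('type4', 'id4')]]`, the first two are merged because they share `('type2', 'id2')`, and the third stays on its own. At cluster scale the same idea is usually expressed as a graph connected-components job rather than an in-memory loop.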
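Summary: gobblin 0.10 There are many ways to persist kafka data to hdfs, such as flume, logstash, and gobblin. flume and logstash are streaming, while gobblin is batch-oriented: gobblin persists data through jobs triggered on a schedule, and nothing is read or written between one job and the next; on this point, relative to flume, logs ...
posted @ 2019-06-01 14:29 匠人先生 Views(1676) Comments(0) Recommended(0)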
Summary: spark 2.4.3 Steps for reading a hive table from spark: 1) hive-site.xml: put hive-site.xml under $SPARK_HOME/conf 2) enableHiveSupport: SparkSession.builder.enableHiveSupport().getOrCreate ...
posted @ 2019-06-01 14:05 匠人先生 Views(5345) Comments(3) Recommended(1)
Summary: When a kudu tserver's memory usage gets too high, it rejects some write requests; the log looks like this: 19/06/01 13:34:12 INFO AsyncKuduClient: Invalidating location 34b1c13d04664cc8bae6689d39b08b77($kudu_tserver:7050) f ...
posted @ 2019-06-01 13:48 匠人先生 Views(3567) Comments(0) Recommended(0)
Summary: Overview The Agent is started by init.d at start-up. It, in turn, contacts the Cloudera Manager Server and determines which processes should be running. The ...
posted @ 2019-05-28 22:51 匠人先生 Views(1915) Comments(0) Recommended(0)
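Summary: 1 Comparison Storage space comparison: Query performance comparison: 2 Design Split the data into historical data (hdfs+parquet+snappy) + recent data (kudu), which combines the advantages of both: 1) overall disk usage below 10%; 2) shorter query times; 3) recent data updated in real time; 4) recent data can be modified; 5) kudu cluster restart time reduced by 90% ...
posted @ 2019-05-27 17:45 匠人先生 Views(1834) Comments(0) Recommended(0)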
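A query layer over the historical/recent split described above has to decide which store serves which part of a requested date range. The sketch below shows one way that routing could look. The 30-day boundary, the function name, and the result keys are all assumptions made for illustration; the summary does not say where the post draws the line between historical and recent data.

```python
from datetime import date, timedelta


def split_query_range(start, end, today, recent_days=30):
    """Split the inclusive range [start, end] between the two stores.

    Assumes start <= end. Days on or after the cutoff are served from the
    recent store; earlier days are served from the historical store.
    `recent_days` is a hypothetical retention window, not a value from the post.
    """
    cutoff = today - timedelta(days=recent_days)  # first day held in the recent store
    historical = None
    recent = None
    if start < cutoff:
        historical = (start, min(end, cutoff - timedelta(days=1)))
    if end >= cutoff:
        recent = (max(start, cutoff), end)
    return {"hdfs_parquet": historical, "kudu": recent}
```

A range that straddles the cutoff produces two sub-ranges whose results would then be combined, for instance with a UNION ALL over the two tables; a range entirely on one side touches only one store.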
Summary: The kudu replication factor is set per table and can be checked with this command: # sudo -u kudu kudu cluster ksck $master ... Summary by table Name | RF | Status | Total Tablets | Healthy | Recovering | Und ...
posted @ 2019-05-27 15:16 匠人先生 Views(2814) Comments(0) Recommended(0)
Summary: To add or remove kudu data disks, you cannot simply change the fs_data_dirs setting and restart; doing so fails with: Check failed: _s.ok() Bad status: Already present: FS layout already exists; not overwriting existing layout: ...
posted @ 2019-05-25 18:25 匠人先生 Views(4481) Comments(0) Recommended(0)
Summary: The kudu rebalance command fails with: terminate called after throwing an instance of 'std::regex_error' what(): regex_error *** Aborted at 1558779043 (unix time) try "da ...
posted @ 2019-05-25 18:22 匠人先生 Views(1290) Comments(1) Recommended(0)
Summary: After creating a kudu table from impala, reading it directly from hive or spark sql fails with: Caused by: java.lang.ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler at java.net.URLCl ...
posted @ 2019-05-22 18:06 匠人先生 Views(5419) Comments(0) Recommended(1)
Summary: kudu has no command that directly shows the space used by each table; it can be checked indirectly in cloudera manager. CM is scraping and aggregating the /metrics pages from the tablet server instances for each tabl ...
posted @ 2019-05-21 20:11 匠人先生 Views(3193) Comments(0) Recommended(0)
Summary: kudu reports errors under heavy write load: 19/05/18 16:53:12 INFO AsyncKuduClient: Invalidating location fd52e4f930bc45458a8f29ed118785e3(server002:7050) for tablet 4259921cdcca477 ...
posted @ 2019-05-20 20:11 匠人先生 Views(3890) Comments(0) Recommended(0)
Summary: 1 Download https://www.mongodb.com/download-center/community For example: https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-4.0.9.tgz 2 Connect # cd $MONGODB_HOME # bi ...
posted @ 2019-05-16 10:41 匠人先生 Views(448) Comments(0) Recommended(0)
Summary: spark2.4.3+kudu1.9 1 Batch read val df = spark.read.format("kudu") .options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test_db.test_table") ...
posted @ 2019-05-15 10:43 匠人先生 Views(5244) Comments(0) Recommended(0)
Summary: Starting a coordinator in hue fails, and the page shows an "undefined" error box; the backend log shows: runcpserver.log [13/May/2019 04:34:55 -0700] middleware INFO Processing exception: 'NoneType' object ha ...
posted @ 2019-05-13 19:57 匠人先生 Views(938) Comments(0) Recommended(0)
