Post category: Experience
Summary: Spark 2.4.3. Steps for reading Hive tables from Spark: 1) hive-site.xml: put hive-site.xml under $SPARK_HOME/conf; 2) enableHiveSupport: SparkSession.builder.enableHiveSupport().getOrCreate
Summary: Overview. The Agent is started by init.d at start-up. It, in turn, contacts the Cloudera Manager Server and determines which processes should be running. The
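Summary: 1. Comparison. Storage space comparison: Query performance comparison: 2. Design. Split the data into historical data (hdfs + parquet + snappy) and recent data (kudu), which combines the advantages of both: 1) overall disk usage below 10%; 2) lower query latency; 3) recent data is updated in real time; 4) recent data remains modifiable; 5) kudu cluster restart time reduced by 90%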
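One way to picture the split is as a routing rule on a date boundary. The sketch below is only an illustration: the summary does not say how the boundary is chosen or how queries are dispatched, so the `boundary` parameter, the function name, and the store labels are all assumptions.

```python
from datetime import date

def stores_for_range(start: date, end: date, boundary: date) -> list:
    """Return which stores an inclusive [start, end] date range touches.

    Assumption: rows dated before `boundary` live in parquet on hdfs
    (history), and rows dated on or after it live in kudu (recent).
    """
    if start > end:
        raise ValueError("start must not be after end")
    stores = []
    if start < boundary:
        stores.append("hdfs-parquet")  # historical, immutable data
    if end >= boundary:
        stores.append("kudu")          # recent, updatable data
    return stores

boundary = date(2019, 5, 1)  # hypothetical cut-over date
print(stores_for_range(date(2019, 4, 1), date(2019, 4, 30), boundary))
print(stores_for_range(date(2019, 4, 20), date(2019, 5, 10), boundary))
```

A range that straddles the boundary touches both stores, which is why this kind of layout is usually paired with a view or query layer that unions the two.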
Summary: In kudu the replication factor is set per table and can be checked with the following command: # sudo -u kudu kudu cluster ksck $master ... Summary by table Name | RF | Status | Total Tablets | Healthy | Recovering | Und
Summary: After a kudu table is created from impala, reading it directly from hive or spark sql fails with: Caused by: java.lang.ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler at java.net.URLCl
Summary: kudu has no command that directly shows the space used by each table, but it can be checked indirectly in Cloudera Manager. CM is scraping and aggregating the /metrics pages from the tablet server instances for each tabl
Summary: kudu reports errors under heavy write load: 19/05/18 16:53:12 INFO AsyncKuduClient: Invalidating location fd52e4f930bc45458a8f29ed118785e3(server002:7050) for tablet 4259921cdcca477
Summary: Starting a coordinator in hue fails and the page shows an "undefined" error box. The backend log reports: runcpserver.log [13/May/2019 04:34:55 -0700] middleware INFO Processing exception: 'NoneType' object ha
Summary: /opt/cloudera/parcels/CDH/lib/hue/apps/beeswax/src/beeswax/conf.py # Deprecated DOWNLOAD_CELL_LIMIT = Config( key='download_cell_limit', default=10000
Summary: The flume kudu sink reports an error after running for a while: 19/05/05 10:15:56 WARN client.ConnectToCluster: Error receiving a response from: master:7051 org.apache.kudu.client.Recoverab
Summary: After installation kudu does not run properly: the master cannot find any tserver, and the tserver log shows many errors: Failed to heartbeat to master:7051: Invalid argument: Failed to ping master at master:7051: Clien
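Summary: Changing configuration in Cloudera Manager may fail with: Incorrect string value: '\xE7\xA8\x8B\xE5\xBA\x8F...' for column 'MESSAGE' at row 1. This is a mysql character set problem; most likely the scm database was created with the default latin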
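To see why a Latin-1 column rejects this value, it helps to decode the bytes quoted in the error. The sketch below uses only the six bytes visible in the message (the rest is elided in the original) and Python's `latin-1` codec as a stand-in for the mysql character set, which is an approximation rather than an exact model of mysql's behavior.

```python
# Bytes quoted in the error message; the message itself is truncated after them.
raw = b"\xE7\xA8\x8B\xE5\xBA\x8F"

def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

text = raw.decode("utf-8")   # the bytes are valid UTF-8 for two Chinese characters
print(text)                  # 程序 ("program")
print(fits_latin1(text))     # False: these characters have no Latin-1 encoding
```

The bytes are ordinary UTF-8 text, so the data is fine; it is the column's character set that cannot represent it.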
Summary: A docker container fails to start with: Exited (137) *** ago, for example Exited (137) 16 seconds ago. In this case docker logs returns nothing, and the only stderr-related line visible in mesos is I0409 16:56:26.408077 8583 ex
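The status 137 follows the common shell convention of 128 plus a signal number, so it corresponds to signal 9 (SIGKILL). That explains why the process leaves no log of its own. The helper below is a small sketch of that decoding; the function name is made up for illustration, and it says nothing about who sent the signal, which the truncated summary does not show.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Explain an exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"

print(describe_exit_code(137))
```

On Linux this reports SIGKILL, which a process cannot catch or handle.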
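Summary: To enable log aggregation in yarn, besides setting yarn.log-aggregation-enable=true, also check that the /tmp/logs directory exists and has the right permissions. Especially after kerberos is enabled, some directories may not be created automatically and have to be created by hand: $ hdfs dfs -mkdir /tmp $ hdfs dfs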
Summary: Users submitting jobs to yarn may run into the following errors: 1) Requested user anything is not whitelisted and has id 980,which is below the minimum allowed 1000. This is because yarn is configured with min.user.id=10
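The rule behind this message can be sketched as a simple check: a user is rejected when its numeric id is below the configured minimum and it is not on a whitelist. The Python below is a simplified model, not yarn's actual implementation. The `allowed_system_users` parameter stands in for the whitelist setting (`allowed.system.users`, as far as I know), and the default of 1000 is taken from the error message.

```python
from typing import Iterable, Optional

def check_user(user: str, uid: int, min_user_id: int = 1000,
               allowed_system_users: Iterable[str] = ()) -> Optional[str]:
    """Return an error message if the user would be rejected, else None.

    Simplified model: reject when uid < min_user_id unless whitelisted.
    """
    if uid < min_user_id and user not in set(allowed_system_users):
        return (f"Requested user {user} is not whitelisted and has id {uid},"
                f"which is below the minimum allowed {min_user_id}")
    return None

print(check_user("anything", 980))                                     # rejected
print(check_user("anything", 980, allowed_system_users=["anything"]))  # None
```

Under this model there are two ways out: lower the minimum id, or whitelist the specific user.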
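Summary: When hdfs runs short of space, besides deleting temporary or junk data, you can also lower the replication factor of some large directories and attack the problem from several sides. 1. Check: $ hdfs dfs -ls /user/hive/warehouse/temp.db/test_ext_o -rwxr-xr-x 3 hadoop supergroup 44324200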
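The saving from a lower replication factor is plain arithmetic: hdfs stores each block once per replica, so physical usage is roughly the logical size times the replication factor. The sketch below works that out. The factor 3 matches the listing above, while the target factor 2 and the 1 TiB directory size are hypothetical figures chosen for illustration.

```python
def physical_bytes(logical_bytes: int, replication: int) -> int:
    """Approximate raw disk usage: each byte is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_bytes * replication

def bytes_freed(logical_bytes: int, old_rf: int, new_rf: int) -> int:
    """Raw space released by changing the replication factor."""
    return physical_bytes(logical_bytes, old_rf) - physical_bytes(logical_bytes, new_rf)

one_tib = 1024 ** 4  # hypothetical logical size of a large directory
print(bytes_freed(one_tib, old_rf=3, new_rf=2) == one_tib)  # True
```

Going from 3 replicas to 2 frees one full copy of the data, a third of the raw space the directory occupied, at the cost of reduced fault tolerance.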
Summary: logstash versions 6.6.0 to 6.6.2 report an error when the jdbc input plugin is used with jdbc_default_timezone set: { 2012 rufus-scheduler intercepted an error: 2012 job: 2012 Rufus::Scheduler:
Summary: CPU usage on a cloud host suddenly became very high. Inspecting the server revealed anomalies: 1. crontab: # crontab -l * * * * * /tmp/.dns/y2kupdate >/dev/null 2>&1 2. iptables: # iptables -nL Chain INPUT (policy DROP) target p
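The crontab entry above has two typical warning signs: the program lives under /tmp, and it sits in a hidden directory. The sketch below is a rough heuristic that flags such entries in `crontab -l` output. The prefix list and the function name are assumptions for illustration, and a clean result does not prove the host is clean.

```python
SUSPICIOUS_PREFIXES = ("/tmp/", "/var/tmp/", "/dev/shm/")  # assumed list

def suspicious_cron_entries(crontab_text: str) -> list:
    """Return crontab lines whose program is in a temp or hidden directory."""
    flagged = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)  # five schedule fields, then the command
        if len(fields) < 6:
            continue  # skips env assignments and @reboot-style shortcuts
        program = fields[5].split()[0]
        hidden = any(part.startswith(".") and part not in (".", "..")
                     for part in program.split("/"))
        if program.startswith(SUSPICIOUS_PREFIXES) or hidden:
            flagged.append(line)
    return flagged

sample = """# sample crontab
0 3 * * * /usr/bin/find /var/log -name '*.gz' -delete
* * * * * /tmp/.dns/y2kupdate >/dev/null 2>&1
"""
print(suspicious_cron_entries(sample))
```

Only the second entry is flagged; the `find` job runs a program from a regular system path.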
Summary: hadoop.security.authentication: Kerberos -> Simple hadoop.security.authorization: true -> false dfs.datanode.address: -> from 1004 (for Kerberos) to 5
Summary: After kerberos is enabled on hdfs, the namenode reports errors and cannot connect to the journalnode: 2019-03-15 18:54:46,504 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as