Summary: Cause: the table's location is wrong. hdfs:///WebLog/Parsed may be parsed as hdfs:/WebLog/Parsed; it should be changed to hdfs://127.0.0.1:8020/WebLog/Parsed
posted @ 2020-05-21 10:13 DataNerd Views(418) Comments(0) Recommended(0)
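The difference between the two locations in the summary above can be seen by splitting each URI into its parts. The sketch below uses only Python's standard `urllib.parse`; the helper names `describe_location` and `has_explicit_namenode` are hypothetical, and the code illustrates the URI structure only, not how Hive or Hadoop resolves a location internally.

```python
from urllib.parse import urlsplit


def describe_location(location: str) -> dict:
    """Split a table location URI into scheme, host, port, and path."""
    parts = urlsplit(location)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the authority part is empty
        "port": parts.port,      # None when no port is given
        "path": parts.path,
    }


def has_explicit_namenode(location: str) -> bool:
    """True when the URI names a host, i.e. its authority part is non-empty."""
    return urlsplit(location).hostname is not None


if __name__ == "__main__":
    for loc in ("hdfs:///WebLog/Parsed", "hdfs://127.0.0.1:8020/WebLog/Parsed"):
        print(loc, describe_location(loc), has_explicit_namenode(loc))
```

The first form carries no host or port, so whatever reads it has to supply them from elsewhere; the second form states both explicitly.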
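Summary: Executed in the Hive CLI; the following error was reported. Error analysis: I had made an addition to hadoop-env.sh in Hadoop's configuration directory. HBase's lib directory contains hadoop-common-2.5.1.jar, while my Hadoop version is 2.8.5. Presumably the jar file called when the Hive statement ran was hadoop-common-2. ...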
posted @ 2020-05-11 00:18 DataNerd Views(1681) Comments(1) Recommended(0)
Summary: In addition to the Resilient Distributed Dataset (RDD) interface, the second kind of low-level API in Spark is two types of “distributed shared variab ...
posted @ 2019-03-04 10:36 DataNerd Views(310) Comments(0) Recommended(0)
Summary: This chapter covers the advanced RDD operations and focuses on key–value RDDs, a powerful abstraction for manipulating data. We also touch on some mor ...
posted @ 2019-03-04 10:03 DataNerd Views(301) Comments(0) Recommended(0)
Summary: What Are the Low-Level APIs? There are two sets of low-level APIs: one for manipulating distributed data (RDDs), and another for distributing ...
posted @ 2019-02-28 11:24 DataNerd Views(144) Comments(0) Recommended(0)
Summary: Datasets are a strictly Java Virtual Machine (JVM) language feature that works only with Scala and Java. Using Datasets, you can define the object that ...
posted @ 2019-02-23 14:51 DataNerd Views(321) Comments(0) Recommended(0)
Summary: What Is SQL? Big Data and SQL: Apache Hive. Big Data and SQL: Spark SQL. The power of Spark SQL derives from several key facts: SQL analysts can now tak ...
posted @ 2019-02-23 11:05 DataNerd Views(314) Comments(0) Recommended(0)
Summary: Spark core data sources: CSV, JSON, Parquet, ORC, JDBC/ODBC connections, plain-text files. The Structure of the Data Sources API. Read API Structure. The core s ...
posted @ 2019-02-23 09:58 DataNerd Views(434) Comments(0) Recommended(0)
Summary: Join Expressions. A join brings together two sets of data, the left and the right, by comparing the value of one or more keys of the left and right and ...
posted @ 2019-02-19 12:29 DataNerd Views(202) Comments(0) Recommended(0)
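As a rough illustration of the join idea in the summary above (comparing key values of a left and a right dataset), here is a minimal inner-join sketch in plain Python. It is not Spark's API; the function name `inner_join` and the sample rows are hypothetical, and it assumes a single shared key column.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Pair every left row with every right row that has the same key value."""
    # Index the right side by key so each left row is matched by lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for l_row in left:
        # Left rows whose key has no match on the right are dropped.
        for r_row in index.get(l_row[key], []):
            joined.append({**l_row, **r_row})
    return joined


# Hypothetical sample data.
people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
depts = [{"id": 1, "dept": "x"}, {"id": 3, "dept": "y"}]
result = inner_join(people, depts, "id")
```

Only the rows whose `id` appears on both sides end up in `result`; other join types differ in what they do with the unmatched rows.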
Summary: Types of grouping: The simplest grouping is to just summarize a complete DataFrame by performing an aggregation in a select statement. A “group by” allows you to ...
posted @ 2019-02-19 11:06 DataNerd Views(307) Comments(0) Recommended(0)
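The two kinds of grouping named in the summary above can be contrasted with a small plain-Python sketch. This is not the DataFrame API; the function names `summarize` and `group_by_sum`, the column names, and the sample rows are all hypothetical, and summation stands in for any aggregation.

```python
from collections import defaultdict


def summarize(rows, column):
    """Aggregate over the complete dataset: one result for all rows."""
    return sum(row[column] for row in rows)


def group_by_sum(rows, key, column):
    """Aggregate per key: one result for each distinct key value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[column]
    return dict(totals)


# Hypothetical sample data.
sales = [
    {"country": "US", "quantity": 3},
    {"country": "UK", "quantity": 2},
    {"country": "US", "quantity": 5},
]
total = summarize(sales, "quantity")
per_country = group_by_sum(sales, "country", "quantity")
```

Here `total` is a single number for the whole dataset, while `per_country` holds one sum per distinct `country` value.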