骑着蜗牛追火车

GitLab CI/CD Pipeline Configuration Reference, Part 4 [Parameter details: extends, rules]

Summary: extends. Introduced in GitLab 11.3. extends defines the entry names that a job using extends inherits from. It's an alternative to using YAM...

posted @ 2020-06-24 11:23 骑着蜗牛追火车 | Views (907) | Comments (0) | Recommendations (0)

GitLab CI/CD Pipeline Configuration Reference, Part 3 [Parameter details: script, before_script and after_script, stage]

Summary: script. script is the only required keyword that a job needs. It's a shell script which is executed by the Runner. For example: job: script: "bundle ex...

posted @ 2020-06-23 18:26 骑着蜗牛追火车 | Views (1316) | Comments (0) | Recommendations (0)

GitLab CI/CD Pipeline Configuration Reference, Part 2 [Parameter details: image, service]

Summary: Parameter details. The following are detailed explanations for parameters used to configure CI/CD pipelines. image: Used to specify a Docker image to us...

posted @ 2020-06-23 17:10 骑着蜗牛追火车 | Views (485) | Comments (0) | Recommendations (0)

GitLab CI/CD Pipeline Configuration Reference, Part 1 [Global, inherit, stages, workflow rules, include]

Summary: GitLab CI/CD pipelines are configured using a YAML file called .gitlab-ci.yml within each project. The .gitlab-ci.yml file defines the structure and o...

posted @ 2020-06-23 15:04 骑着蜗牛追火车 | Views (1023) | Comments (0) | Recommendations (0)

Kudu Core Concepts

Summary: Core concepts. Columnar Data Store: Kudu is a columnar data store. A columnar data store stores data in strongly-typed columns. With a proper design, it is super...

posted @ 2020-06-15 18:16 骑着蜗牛追火车 | Views (503) | Comments (0) | Recommendations (0)

Spark DataSet

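Summary: 1. Dataset concepts. A Dataset is a distributed collection of data. Dataset is a new interface introduced in Spark 1.6 that combines many of the benefits of the RDD API (including strong typing and support for lambda expressions) with the benefits of Spark SQL (its optimized execution engine). A Dataset can be constructed from JVM objects and then, through tran...

posted @ 2020-05-28 23:13 骑着蜗牛追火车 | Views (1603) | Comments (0) | Recommendations (0)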

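Spark RDD: The Difference Between map and flatMap

Summary: Where map and flatMap sit in an HDFS-to-HDFS flow. Definitions of flatMap and map: map() applies a function to each element of the RDD and builds a new RDD from the return values. flatMap() applies a function to each element of the RDD and builds a new RDD from all the contents of the returned iterators. Example: val rdd = sc.parall...

posted @ 2020-05-28 23:04 骑着蜗牛追火车 | Views (2979) | Comments (0) | Recommendations (0)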

Spark Modules and Common Examples

Summary: Apache Spark Examples. These examples give a quick overview of the Spark API. Spark is built on the concept of distributed datasets, which contain arbi...

posted @ 2020-05-28 10:18 骑着蜗牛追火车 | Views (1061) | Comments (0) | Recommendations (0)

Parquet Columnar Storage Structure

Summary: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, d...

posted @ 2020-05-27 14:32 骑着蜗牛追火车 | Views (1116) | Comments (0) | Recommendations (0)

Avro Serialization

Summary: Official site: http://avro.apache.org/docs/current/ Introduction: Apache Avro™ is a data serialization system. Avro provides: Rich data structures. A compact, fa...

posted @ 2020-05-27 10:53 骑着蜗牛追火车 | Views (754) | Comments (0) | Recommendations (0)
