Post category - Data Lake

Iceberg Table Spec
Summary: https://iceberg.apache.org/spec/#partition-transforms Iceberg builds its table format from files rather than directories, whereas traditional Hive and Hudi both distinguish table partitions by directory. This table format tracks indivi…

posted @ 2022-08-03 15:02 fxjwind Views (307) Comments (0) Recommended (0)
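The core idea in the Iceberg excerpt, recording each data file together with its partition value in table metadata instead of inferring partitions from directory names, can be sketched in a few lines of Java. This is a minimal, hypothetical model: `FileTrackedTable`, `DataFile` and `planScan` are illustrative names, not Iceberg's API, and real Iceberg keeps this file list in manifest files and derives partition values through partition transforms, both of which are omitted here.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model: the table's metadata holds an explicit list of data files,
// each carrying its own partition value, rather than relying on directory layout.
class FileTrackedTable {

    // One tracked data file: its path plus the partition tuple stored in metadata.
    record DataFile(String path, Map<String, String> partition, long recordCount) {}

    private final List<DataFile> files;

    FileTrackedTable(List<DataFile> files) {
        this.files = List.copyOf(files);
    }

    // Scan planning filters the in-memory file list; no directory listing happens,
    // and the file path does not need to encode the partition value.
    List<String> planScan(String column, String value) {
        return files.stream()
                .filter(f -> value.equals(f.partition().get(column)))
                .map(DataFile::path)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FileTrackedTable table = new FileTrackedTable(List.of(
                new DataFile("data/00001.parquet", Map.of("day", "2022-08-01"), 100),
                new DataFile("data/00002.parquet", Map.of("day", "2022-08-02"), 250),
                new DataFile("data/00003.parquet", Map.of("day", "2022-08-01"), 75)));
        // prints [data/00001.parquet, data/00003.parquet]
        System.out.println(table.planScan("day", "2022-08-01"));
    }
}
```

Because the partition value lives in metadata, all three files can sit in one flat `data/` directory; in a Hive-style layout the same filter would instead depend on paths such as `day=2022-08-01/`.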

Apache Hudi 源码分析 - HoodieTableSink
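Summary: As the code shows, the sink has several write paths: bulk_insert; append; default, i.e. upsert; plus compact and clean. The comments in the Pipelines part are very well written. bulk_insert first shuffles records into partitions by partition path, then sorts them by partition path; one p…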

posted @ 2022-07-12 18:01 fxjwind Views (1570) Comments (0) Recommended (0)
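The bulk_insert step described in the HoodieTableSink excerpt, shuffle by partition path and then sort by partition path, can be illustrated with a small single-process Java sketch. It is not Hudi's implementation: `BulkInsertShuffleSketch`, `Rec`, `shuffleByPartitionPath` and `sortEachBucket` are hypothetical names, the hash-modulo bucketing is an assumed stand-in for the Flink shuffle, and each "bucket" plays the role of one parallel write task.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BulkInsertShuffleSketch {

    // Hypothetical record: only the partition path and a key matter for this sketch.
    record Rec(String partitionPath, String key) {}

    // Step 1 (shuffle): route every record to a bucket chosen by hashing its
    // partition path, so all records of one partition path reach the same bucket.
    static List<List<Rec>> shuffleByPartitionPath(List<Rec> records, int parallelism) {
        List<List<Rec>> buckets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Rec r : records) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int bucket = Math.floorMod(r.partitionPath().hashCode(), parallelism);
            buckets.get(bucket).add(r);
        }
        return buckets;
    }

    // Step 2 (sort): within each bucket, order records by partition path so that
    // each partition path's records form one contiguous run.
    static void sortEachBucket(List<List<Rec>> buckets) {
        for (List<Rec> bucket : buckets) {
            bucket.sort(Comparator.comparing(Rec::partitionPath));
        }
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("2022/07/02", "k1"), new Rec("2022/07/01", "k2"),
                new Rec("2022/07/02", "k3"), new Rec("2022/07/03", "k4"),
                new Rec("2022/07/01", "k5"));
        List<List<Rec>> buckets = shuffleByPartitionPath(input, 2);
        sortEachBucket(buckets);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("task " + i + ": " + buckets.get(i));
        }
    }
}
```

With each partition path's records in one contiguous run per task, a writer can finish one partition's file before moving on to the next; whether Hudi relies on exactly this property is an assumption here, since the excerpt is cut off.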

Apache Hudi 源码分析 - HoodieTableSource
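Summary: There are two core operators. StreamReadMonitoringFunction runs as a single parallel instance per table: it reads the metadata, finds the updated FileSlices, and generates inputSplits. StreamReadOperator reads RowData from the inputSplits. StreamReadMonitori…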

posted @ 2022-06-15 15:12 fxjwind Views (838) Comments (0) Recommended (1)
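The division of labour between the two streaming-read operators can be sketched in plain, single-threaded Java. `StreamReadSketch`, `Instant`, `InputSplit`, `Monitor` and `Reader` are hypothetical stand-ins: `Monitor` plays the role described for StreamReadMonitoringFunction (one per table, finds new file slices, emits splits) and `Reader` plays the role of StreamReadOperator (turns a split into rows). Real Hudi uses Flink operators, its own timeline classes and RowData, none of which appear here, and instant times are assumed to be strings that sort chronologically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StreamReadSketch {

    // A committed instant on a simplified timeline, with the file slices it touched.
    record Instant(String time, List<String> fileSlices) {}

    // One unit of read work handed from the monitor to the readers.
    record InputSplit(String instantTime, String fileSlice) {}

    // Role of StreamReadMonitoringFunction: a single instance per table that polls
    // the timeline and emits splits only for instants newer than the last issued one.
    static final class Monitor {
        private String lastIssuedInstant = "";

        List<InputSplit> poll(List<Instant> timeline) {
            List<InputSplit> splits = new ArrayList<>();
            String newest = lastIssuedInstant;
            for (Instant instant : timeline) {
                if (instant.time().compareTo(lastIssuedInstant) > 0) {
                    for (String slice : instant.fileSlices()) {
                        splits.add(new InputSplit(instant.time(), slice));
                    }
                    if (instant.time().compareTo(newest) > 0) {
                        newest = instant.time();
                    }
                }
            }
            lastIssuedInstant = newest;
            return splits;
        }
    }

    // Role of StreamReadOperator: turns each split into rows; a map from file slice
    // to rows stands in for actually reading the slice's files.
    static final class Reader {
        private final Map<String, List<String>> storage;

        Reader(Map<String, List<String>> storage) {
            this.storage = storage;
        }

        List<String> read(InputSplit split) {
            return storage.getOrDefault(split.fileSlice(), List.of());
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(Map.of(
                "fs-1", List.of("row-a", "row-b"),
                "fs-2", List.of("row-c")));
        Monitor monitor = new Monitor();
        List<Instant> timeline = new ArrayList<>();
        timeline.add(new Instant("001", List.of("fs-1")));
        for (InputSplit s : monitor.poll(timeline)) {
            System.out.println(reader.read(s)); // prints [row-a, row-b]
        }
        timeline.add(new Instant("002", List.of("fs-2")));
        for (InputSplit s : monitor.poll(timeline)) {
            System.out.println(reader.read(s)); // prints [row-c]; instant 001 is not re-issued
        }
    }
}
```

Keeping the monitor single-instance means only one place decides which instants have already been issued, while the readers stay stateless with respect to the timeline and can be scaled out independently.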

Apache Hudi 源码分析 - JavaClient
Summary: JavaClient insert: @Override public List<WriteStatus> insert(List<HoodieRecord<T>> records, String instantTime) { HoodieTable<T, List<HoodieRecord<T>>…

posted @ 2022-05-30 15:24 fxjwind Views (611) Comments (0) Recommended (0)