| path | Y | N/A | Base path of the Hudi table. It is created if it does not exist; otherwise it must point to a successfully initialized Hudi table. |
| read.end-commit | Y | N/A | |
| read.start-commit | Y | N/A | |
| read.tasks | Y | N/A | |
| write.tasks | Y | N/A | |
| write.partition.format | Y | N/A | Partition path format, effective only when write.datetime.partitioning is true. Two defaults: yyyyMMddHH when the partition field type is TIMESTAMP(3) WITHOUT TIME ZONE, LONG, FLOAT, DOUBLE, or DECIMAL; yyyyMMdd when the partition field type is DATE or INT. |
| write.bucket_assign.tasks | Y | N/A | |
| archive.max_commits | N | 50 | |
| archive.min_commits | N | 40 | |
| cdc.enabled | N | false | |
| changelog.enabled | N | false | |
| clean.async.enabled | N | true | |
| clean.policy | N | KEEP_LATEST_COMMITS | Cleaning policy. Valid values: KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_BY_HOURS. Default is KEEP_LATEST_COMMITS. |
| clean.retain_commits | N | 30 | |
| clean.retain_file_versions | N | 5 | |
| clean.retain_hours | N | 24 | |
| clustering.async.enabled | N | false | |
| clustering.delta_commits | N | 4 | |
| clustering.plan.partition.filter.mode | N | NONE | Valid values: NONE, RECENT_DAYS, SELECTED_PARTITIONS, DAY_ROLLING. |
| clustering.plan.strategy.class | N | org.apache.hudi.client.clustering.plan.strategy.FlinkSizeBasedClusteringPlanStrategy | |
| clustering.tasks | Y | N/A | |
| clustering.schedule.enabled | N | false | |
| compaction.async.enabled | N | true | |
| compaction.delta_commits | N | 5 | |
| compaction.delta_seconds | N | 3600 | |
| compaction.max_memory | N | 100 | |
| compaction.schedule.enabled | N | true | |
| compaction.target_io | N | 512000 | |
| compaction.timeout.seconds | N | 1200 | |
| compaction.trigger.strategy | N | num_commits | Valid values: num_commits, time_elapsed, num_or_time. |
| hive_sync.conf.dir | Y | N/A | |
| hive_sync.table_properties | Y | N/A | |
| hive_sync.assume_date_partitioning | N | false | Assume partitions are laid out in yyyy/mm/dd format. |
| hive_sync.auto_create_db | N | true | Automatically create the database if it does not exist. |
| hive_sync.db | N | default | |
| hive_sync.table | N | unknown | |
| hive_sync.table.strategy | N | ALL | |
| hive_sync.enabled | N | false | |
| hive_sync.file_format | N | PARQUET | |
| hive_sync.jdbc_url | N | jdbc:hive2://localhost:10000 | |
| hive_sync.metastore.uris | N | '' | Hive Metastore URIs. |
| hive_sync.mode | N | HMS | |
| hive_sync.partition_fields | N | '' | |
| hive_sync.password | N | hive | |
| hive_sync.support_timestamp | N | true | |
| hive_sync.use_jdbc | N | true | |
| hive_sync.username | N | hive | |
| hoodie.bucket.index.hash.field | N | | Key for the bucket index; must be a subset of the record key fields, or the record key itself. |
| hoodie.bucket.index.num.buckets | N | 4 | |
| hoodie.datasource.merge.type | N | payload_combine | |
| hoodie.datasource.query.type | N | snapshot | |
| hoodie.datasource.write.hive_style_partitioning | N | false | |
| hoodie.datasource.write.keygenerator.type | N | SIMPLE | |
| hoodie.datasource.write.partitionpath.field | N | '' | |
| hoodie.datasource.write.recordkey.field | N | uuid | |
| hoodie.datasource.write.partitionpath.urlencode | N | false | |
| hoodie.database.name | Y | N/A | |
| hoodie.table.name | Y | N/A | |
| hoodie.datasource.write.keygenerator.class | Y | N/A | |
| index.bootstrap.enabled | N | false | |
| index.global.enabled | N | true | |
| index.partition.regex | N | * | |
| index.state.ttl | N | 0.0 | |
| index.type | N | FLINK_STATE | Valid values: BUCKET, FLINK_STATE, BLOOM, GLOBAL_BLOOM, GLOBAL_SIMPLE, HBASE, INMEMORY, SIMPLE; default is FLINK_STATE. For details see https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java or https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java |
| metadata.enabled | N | false | |
| metadata.compaction.delta_commits | N | 10 | |
| partition.default_name | N | HIVE_DEFAULT_PARTITION | |
| payload.class | N | org.apache.hudi.common.model.EventTimeAvroPayload | |
| precombine.field | N | ts | |
| read.streaming.enabled | N | false | |
| read.streaming.skip_compaction | N | false | |
| read.streaming.skip_clustering | N | false | |
| read.utc-timezone | N | true | |
| record.merger.impls | N | org.apache.hudi.common.model.HoodieAvroRecordMerger | |
| record.merger.strategy | N | eeb8d96f-b1e4-49fd-bbf8-28ac514178e5 | |
| table.type | N | COPY_ON_WRITE | Table type. Valid values: COPY_ON_WRITE or MERGE_ON_READ. |
| write.batch.size | N | 256.0 | |
| write.commit.ack.timeout | N | -1 | |
| write.ignore.failed | N | false | |
| write.insert.cluster | N | false | |
| write.log.max.size | N | 1024 | |
| write.log_block.size | N | 128 | |
| write.merge.max_memory | N | 100 | Unit: MB. |
| write.operation | N | upsert | The write operation to use. Valid values: insert, upsert, bulk_insert; default is upsert. |
| write.precombine | N | false | Whether to drop duplicate records before insert/upsert. The default false means duplicates are accepted. |
| write.parquet.block.size | N | 120 | |
| write.rate.limit | N | 0 | Number of records written per second. The default 0 means no limit. |
| write.retry.interval.ms | N | 2000 | |
| write.retry.times | N | 3 | |
| write.sort.memory | N | 128 | Unit: MB. |
| write.task.max.size | N | 1024.0 | Unit: MB. |
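
To make the reference concrete, here is a minimal Flink SQL sketch of a MERGE_ON_READ upsert sink that combines the write and compaction options above. The table name, schema, path, and parallelism values are hypothetical, and the `connector` identifier may differ across distributions; only the option keys come from the table.

```sql
-- Hypothetical MERGE_ON_READ upsert sink; adjust the path, schema,
-- and 'write.tasks' parallelism to your environment.
CREATE TABLE hudi_orders (
  uuid STRING PRIMARY KEY NOT ENFORCED,
  order_amount DOUBLE,
  ts TIMESTAMP(3),
  `dt` STRING
) PARTITIONED BY (`dt`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://nameservice/warehouse/hudi_orders',  -- required
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'write.tasks' = '4',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'precombine.field' = 'ts',
  'compaction.async.enabled' = 'true',
  'compaction.trigger.strategy' = 'num_commits',
  'compaction.delta_commits' = '5'
);
```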
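The read-side options follow the same pattern. A hedged sketch of a streaming read: with `read.streaming.enabled` set to true the source continuously polls for new commits, and `read.start-commit` controls where consumption begins (a Hudi instant time in yyyyMMddHHmmss form, or `earliest`). The table, path, and start instant below are placeholders.

```sql
-- Hypothetical streaming source over an existing Hudi table.
CREATE TABLE hudi_orders_src (
  uuid STRING,
  order_amount DOUBLE,
  ts TIMESTAMP(3),
  `dt` STRING
) PARTITIONED BY (`dt`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://nameservice/warehouse/hudi_orders',
  'table.type' = 'MERGE_ON_READ',
  'read.tasks' = '4',
  'read.streaming.enabled' = 'true',      -- poll for new commits
  'read.start-commit' = '20240101000000'  -- or 'earliest' to read from the beginning
);

SELECT * FROM hudi_orders_src;
```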
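Finally, a sketch of the `hive_sync.*` block, which registers the Hudi table in a Hive Metastore as data is written. The metastore URI, database, and table names are placeholders for illustration.

```sql
-- Hypothetical sink that also syncs table metadata to Hive via HMS.
CREATE TABLE hudi_orders_hms (
  uuid STRING PRIMARY KEY NOT ENFORCED,
  order_amount DOUBLE,
  ts TIMESTAMP(3),
  `dt` STRING
) PARTITIONED BY (`dt`) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://nameservice/warehouse/hudi_orders_hms',
  'table.type' = 'COPY_ON_WRITE',
  'hive_sync.enabled' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://metastore-host:9083',
  'hive_sync.db' = 'default',
  'hive_sync.table' = 'hudi_orders_hms',
  'hive_sync.partition_fields' = 'dt'
);
```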