Why did the Flink tables I created disappear after restarting the Flink cluster?
Flink SQL> SHOW TABLES;
Empty set
The reason your Flink tables disappeared is likely because you are using Flink's default In-Memory Catalog.
In Flink, a Catalog is the component that stores metadata about your databases, tables, and views. By default, Flink is configured with a GenericInMemoryCatalog. As the name suggests, this stores everything in the JobManager's RAM. When you restart the cluster (or even just the SQL Client session), that memory is cleared, and all your CREATE TABLE definitions are lost.
Flink SQL> SHOW CATALOGS;
+-----------------+
|    catalog name |
+-----------------+
| default_catalog |
+-----------------+
1 row in set
Flink SQL> CREATE TABLE zzh_test(`id` INT) WITH ('connector' = 'datagen');
Flink SQL> exit;
frank@ZZHPC:~$ sql-client.sh
Flink SQL> SHOW TABLES;
Empty set
There's only one catalog named 'default_catalog', no 'GenericInMemoryCatalog' catalog.
That is a very observant point! The confusion comes from the fact that default_catalog is the name given to the instance, but GenericInMemoryCatalog is the implementation (the underlying Java class) that powers it.
In Flink's default configuration, default_catalog is essentially a "blank slate" that exists only in the JobManager's volatile memory.
Why "default_catalog" is misleading
When you start a Flink cluster without any specific configuration:
- Flink automatically creates a catalog named default_catalog.
- It assigns the GenericInMemoryCatalog type to that name.
- Because it is "In-Memory," any CREATE TABLE statement you run stores the schema in a Java HashMap.
- When the cluster process exits, that HashMap is destroyed.
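If you want to see the name/implementation split for yourself, you can register a second in-memory catalog explicitly. The catalog name below is made up; generic_in_memory is the built-in type identifier for GenericInMemoryCatalog:
-- Register another catalog backed by the same GenericInMemoryCatalog class.
-- Like default_catalog, it lives only in memory and is gone after a restart.
CREATE CATALOG another_memory_catalog WITH (
    'type' = 'generic_in_memory'
);
SHOW CATALOGS;
Both catalogs now show up in SHOW CATALOGS, and both forget their tables as soon as the session ends.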
How to verify the implementation
If you want to see what is actually powering your default_catalog, you can run this command in the SQL Client:
DESCRIBE CATALOG default_catalog;
You will likely see output indicating that the type is generic_in_memory.
Flink SQL> DESCRIBE CATALOG default_catalog;
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Cannot obtain metadata information from Catalog default_catalog.
How to Fix This
To make your tables survive a restart, you have two main options:
1. Use a Persistent Catalog (Recommended)
You can connect Flink to an external metadata store that persists data on disk. The most common options are:
- Hive Catalog: Uses the Hive Metastore (backed by a relational database like MySQL or Postgres). This is the industry standard for persisting Flink SQL metadata.
- JDBC Catalog: Connects directly to a relational database (Postgres, etc.). Note that some JDBC catalog implementations are read-only or have limited support for saving new Flink-specific tables (a minimal sketch follows the Hive example below).
- Catalog-Specific Connectors: Modern table formats like Apache Iceberg, Apache Paimon, or Apache Hudi provide their own catalogs that store metadata alongside the data files (e.g., in S3 or HDFS).
Example of registering a Hive Catalog:
CREATE CATALOG my_hive WITH (
'type' = 'hive',
'default-database' = 'default',
'hive-conf-dir' = '/path/to/conf'
);
USE CATALOG my_hive;
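For comparison, here is a sketch of registering Flink's Postgres-backed JDBC catalog; the database name, credentials, and URL below are placeholders. Keep in mind that this catalog exposes the tables that already exist in Postgres rather than storing arbitrary new Flink table definitions:
-- JDBC catalog: reads table metadata directly from the relational database
CREATE CATALOG my_jdbc WITH (
    'type' = 'jdbc',
    'default-database' = 'mydb',
    'username' = 'postgres',
    'password' = 'secret',
    'base-url' = 'jdbc:postgresql://localhost:5432'
);
USE CATALOG my_jdbc;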
2. Use an Initialization SQL File
If you don't want to manage an external metastore like Hive, you can automate the recreation of your tables. You can put all your CREATE TABLE and CREATE VIEW statements into a .sql file and tell the Flink SQL Client to run it every time it starts.
Command to start SQL Client with an init file:
./bin/sql-client.sh -i init_tables.sql
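A minimal init_tables.sql could simply re-declare the tables from this session (reusing the datagen table from above; IF NOT EXISTS keeps the script safe to re-run):
-- init_tables.sql: re-creates table definitions every time the SQL Client starts
CREATE TABLE IF NOT EXISTS zzh_test (
    `id` INT
) WITH (
    'connector' = 'datagen'
);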
Key Distinction: Metadata vs. Data
It is important to remember that Flink itself does not store data.
- Metadata: The "table definition" (schema, Kafka topic name, format). This is what lives in the Catalog.
- Data: The actual records (the bytes in Kafka, S3, or MySQL). This data is safe and was not deleted by the restart. You simply lost the "map" (the table definition) that Flink uses to find and read that data.
Please give me SQL code to create a catalog for each of Iceberg, Hudi and Paimon.
To persist your table definitions in Flink 2.0.1 using Apache Iceberg, Hudi, or Paimon, you are moving toward a "Lakehouse" architecture. These catalogs are superior to the default in-memory one because they store the metadata (schema) directly in your filesystem (S3, HDFS, or local disk) alongside the data.
To use these, you must ensure the corresponding Catalog/Connector JAR is in your Flink /lib directory.
1. Apache Paimon Catalog
Paimon is highly integrated with Flink. The filesystem type catalog stores all metadata in a directory.
CREATE CATALOG paimon_catalog WITH (
'type' = 'paimon',
'warehouse' = 'file:///tmp/paimon_warehouse' -- Or s3://my-bucket/paimon
);
-- Use it to make it the active catalog
USE CATALOG paimon_catalog;
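As a quick check of persistence, you can create a throwaway table inside the Paimon catalog. The table and column names below are made up, and no 'connector' option is needed because tables created in a Paimon catalog are Paimon tables:
-- The schema is written to the warehouse directory, not to JobManager memory
CREATE TABLE orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
);
-- After restarting the cluster or SQL Client:
--   USE CATALOG paimon_catalog;
--   SHOW TABLES;   -- 'orders' is still listed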
2. Apache Iceberg Catalog
Iceberg is very flexible. While you can use Hive or JDBC for the backend, the hadoop catalog type is the simplest for local/filesystem persistence.
CREATE CATALOG iceberg_catalog WITH (
'type' = 'iceberg',
'catalog-type' = 'hadoop',
'warehouse' = 'file:///tmp/iceberg_warehouse',
'property-version' = '1'
);
USE CATALOG iceberg_catalog;
3. Apache Hudi Catalog
Hudi's Flink integration typically uses a Hudi catalog to manage its copy-on-write or merge-on-read tables.
CREATE CATALOG hudi_catalog WITH (
'type' = 'hudi',
'mode' = 'dfs', -- 'dfs' stores metadata on the filesystem; 'hms' uses a Hive Metastore
'warehouse' = 'file:///tmp/hudi_warehouse'
);
USE CATALOG hudi_catalog;
Comparison of Lakehouse Catalogs
| Feature | Paimon | Iceberg | Hudi |
| --- | --- | --- | --- |
| Primary Strength | High-speed streaming updates | Universal ecosystem support | Incremental processing |
| Metadata Storage | Filesystem (Manifests) | Filesystem (Snapshots) | Timeline/Filesystem |
| Flink Integration | Native/Very Tight | Strong | Moderate |
Crucial Step: The "USE" Command
Remember that even if you create these catalogs, Flink starts in default_catalog every time you open the SQL Client. To see your persisted tables after a restart, you must run:
USE CATALOG paimon_catalog; -- (or iceberg_catalog, etc.)
SHOW TABLES;
If you want Flink to automatically start in your new catalog, you should add these CREATE and USE statements to an initialization SQL file (sql-client.sh -i init.sql).
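For example, an init.sql along these lines re-registers the catalog and switches to it on every start (the warehouse path is a placeholder). The CREATE CATALOG statement itself has to be re-run each session because, by default, the catalog registration, unlike the tables inside it, is not persisted anywhere:
-- init.sql: run via ./bin/sql-client.sh -i init.sql
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'file:///tmp/paimon_warehouse'
);
USE CATALOG paimon_catalog;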
