Greenplum 实战-添加拓展插件

Greenplum 实战-添加拓展插件

概述

PostgreSQL提供了丰富的扩展,如模糊查询 pg_trgm、中文全文检索 zhparser、字符串相似度判断 fuzzystrmatch、数组相似度 smlar、位图计算 roaringbitmap等等。Greenplum 使用了PostgreSQL内核,理论上兼容PG扩展包。本文将介绍在Greenplum上怎么安装扩展插件。

快速入门

查看自带插件列表

已装插件列表

通过查询pg_extension视图,可看到已装的扩展。gp默认只安装了plpgsql插件

SELECT * from pg_extension;

  extname  | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-----------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql   |       10 |           11 | f              | 1.0        |           |
 plpythonu |       10 |           11 | f              | 1.0        |           |
 pg_trgm   |       10 |         2200 | t              | 1.1        |           |

可安装的插件列表

SELECT * from pg_available_extensions;
-----------------------------+-----------------+-------------------+-----------------------------------------------------------------------------------------------
 advanced_password_check     | 1.0             |                   | Advanced Password Check
 amcheck                     | 1.0             |                   | functions for verifying relation integrity
 plperlu                     | 1.0             |                   | PL/PerlU untrusted procedural language
 greenplum_fdw               | 1.0             |                   | foreign-data wrapper for remote Greenplum servers
 btree_gin                   | 1.0             |                   | support for indexing common datatypes in GIN
 plpgsql                     | 1.0             | 1.0               | PL/pgSQL procedural language
 pageinspect                 | 1.8             |                   | inspect the contents of database pages at a low level
 citext                      | 1.0             |                   | data type for case-insensitive character strings
 plpython2u                  | 1.0             |                   | PL/Python2U untrusted procedural language
 dataflow                    | 1.0             |                   | Extension which provides extra formatters and types for dataflow
 plpythonu                   | 1.0             | 1.0               | PL/PythonU untrusted procedural language
 dblink                      | 1.1             |                   | connect to other PostgreSQL databases from within a database
 postgres_fdw                | 1.0             |                   | foreign-data wrapper for remote PostgreSQL servers
 diskquota                   | 1.0             |                   | Disk Quota Main Program
 pg_trgm                     | 1.1             | 1.1               | text similarity measurement and index searching based on trigrams
 fuzzystrmatch               | 1.0             |                   | determine similarities and distance between strings
 gp_array_agg                | 1.0.0           |                   | Parallel aggregate implementation for Greenplum
 gp_distribution_policy      | 1.0             |                   | check distribution policy in a GPDB cluster
 gp_internal_tools           | 1.0.0           |                   | Different internal tools for Greenplum
 hstore                      | 1.3             |                   | data type for storing sets of (key, value) pairs
 gp_legacy_string_agg        | 1.0.0           |                   | Legacy one-argument string_agg implementation for Greenplum
 gpss                        | 1.0             |                   | Extension which implements kinds of gpss formaters and protocol buffer
 gp_parallel_retrieve_cursor | 1.0             |                   | Retrieve results of cursor in parallel
 sslinfo                     | 1.0             |                   | information about SSL certificates
 gp_pitr                     | 1.0             |                   | Distributed point-in-time-recovery functions
 pgcrypto                    | 1.1             |                   | cryptographic functions
 gp_sparse_vector            | 1.0.1           |                   | SParse vector implementation for GreenPlum
 gp_wlm                      | 0.1             |                   | Greenplum Workload Manager Extension
 metrics_collector           | 1.0             |                   | Greenplum Metrics Collector Extension
 orafce                      | 3.7             |                   | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS
 plperl                      | 1.0             |                   | PL/Perl procedural language
 tablefunc                   | 1.0             |                   | functions that manipulate whole tables, including crosstab
(32 rows)

插件详情

可看到当前拥有的扩展清单、扩展版本号、是否已安装

SELECT * from pg_available_extension_versions;
            name             | version | installed | superuser | relocatable |   schema   | requires |                                            comment
-----------------------------+---------+-----------+-----------+-------------+------------+----------+-----------------------------------------------------------------------------------------------
 advanced_password_check     | 1.0     | f         | t         | t           |            |          | Advanced Password Check
 amcheck                     | 1.0     | f         | t         | t           |            |          | functions for verifying relation integrity
 plperlu                     | 1.0     | f         | t         | f           | pg_catalog |          | PL/PerlU untrusted procedural language
 greenplum_fdw               | 1.0     | f         | t         | t           |            |          | foreign-data wrapper for remote Greenplum servers
 btree_gin                   | 1.0     | f         | t         | t           |            |          | support for indexing common datatypes in GIN
 plpgsql                     | 1.0     | t         | f         | f           | pg_catalog |          | PL/pgSQL procedural language
 pageinspect                 | 1.8     | f         | t         | t           |            |          | inspect the contents of database pages at a low level
 pageinspect                 | 1.2     | f         | t         | t           |            |          | inspect the contents of database pages at a low level
 citext                      | 1.0     | f         | t         | t           |            |          | data type for case-insensitive character strings
 citext                      | 1.1     | f         | t         | t           |            |          | data type for case-insensitive character strings
 plpython2u                  | 1.0     | f         | t         | f           | pg_catalog |          | PL/Python2U untrusted procedural language
 dataflow                    | 1.0     | f         | t         | t           |            |          | Extension which provides extra formatters and types for dataflow
 plpythonu                   | 1.0     | t         | t         | f           | pg_catalog |          | PL/PythonU untrusted procedural language
 dblink                      | 1.1     | f         | t         | t           |            |          | connect to other PostgreSQL databases from within a database
 postgres_fdw                | 1.0     | f         | t         | t           |            |          | foreign-data wrapper for remote PostgreSQL servers
 diskquota                   | 1.0     | f         | t         | t           |            |          | Disk Quota Main Program
 pg_trgm                     | 1.1     | t         | t         | t           |            |          | text similarity measurement and index searching based on trigrams
 fuzzystrmatch               | 1.0     | f         | t         | t           |            |          | determine similarities and distance between strings
 gp_array_agg                | 1.0.0   | f         | t         | t           |            |          | Parallel aggregate implementation for Greenplum
 gp_distribution_policy      | 1.0     | f         | t         | t           |            |          | check distribution policy in a GPDB cluster
 gp_internal_tools           | 1.0.0   | f         | t         | t           |            |          | Different internal tools for Greenplum
 hstore                      | 1.3     | f         | t         | t           |            |          | data type for storing sets of (key, value) pairs
 gp_legacy_string_agg        | 1.0.0   | f         | t         | t           |            |          | Legacy one-argument string_agg implementation for Greenplum
 gpss                        | 1.0     | f         | t         | t           |            |          | Extension which implements kinds of gpss formaters and protocol buffer
 gp_parallel_retrieve_cursor | 1.0     | f         | t         | f           | pg_catalog |          | Retrieve results of cursor in parallel
 sslinfo                     | 1.0     | f         | t         | t           |            |          | information about SSL certificates
 gp_pitr                     | 1.0     | f         | t         | f           |            |          | Distributed point-in-time-recovery functions
 pgcrypto                    | 1.1     | f         | t         | t           |            |          | cryptographic functions
 gp_sparse_vector            | 1.0.1   | f         | t         | t           |            |          | SParse vector implementation for GreenPlum
 gp_wlm                      | 0.1     | f         | t         | f           | gpmetrics  |          | Greenplum Workload Manager Extension
 metrics_collector           | 1.0     | f         | t         | f           | gpmetrics  |          | Greenplum Metrics Collector Extension
 orafce                      | 3.7     | f         | t         | f           |            |          | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS
 plperl                      | 1.0     | f         | f         | f           | pg_catalog |          | PL/Perl procedural language
 tablefunc                   | 1.0     | f         | t         | t           |            |          | functions that manipulate whole tables, including crosstab
(34 rows)

安装插件

加载已添加的扩展

通过命令 CREATE EXTENSION,加载pg_available_extensions视图里的扩展到数据库中。执行成功,即可直接使用。
相关命令介绍详见:http://docs-cn.greenplum.org/v6/ref_guide/sql_commands/CREATE_EXTENSION.html#topic1

CREATE EXTENSION [ IF NOT EXISTS ] extension_name
  [ WITH ] [ SCHEMA schema_name ]
           [ VERSION version ]
           [ FROM old_version ]
           [ CASCADE ]
           
示例:
CREATE EXTENSION fuzzystrmatch;

安装外部插件

需要先下载扩展包,解压到gp数据库的扩展包目录下,执行编译脚本。安装成功后,在可用插件中就可以看到该插件信息。接下来通过CREATE EXTENSION加装插件(过程同上),就完成了。

下载扩展包

如果扩展在未在视图pg_available_extension_versions中找到,就得先进行插件安装。GP团队已收集了常用的插件(实际就是PostgreSQL的扩展插件),统一维护在gp源码包中。可以通过下载源码包获得需要的插件,或单独下载需要的三方扩展包。以模糊查询 pg_trgm插件为例。
下载gp源码包,并解压

wget https://github.com/greenplum-db/gpdb/archive/refs/tags/6.18.2.tar.gz \
&& tar -zxvf gpdb-6.18.2.tar.gz

或者https://github.com/greenplum-db/gpdb-archive/archive/refs/heads/main.zip

拷贝扩展包到安装目录

gp源码包解压后,找到需要的扩展包。扩展包路径在gpdb-6.18.2/contrib下,进到该目录可看到可用的扩展包
扩展包

扩展2

复制扩展包到GP数据库扩展包安装目录($GPHOME/share/postgresql/contrib),当然也可以把插件全部拷贝过去,这里以模糊查询 pg_trgm插件为例,其路径为 gpdb-6.18.2/contrib/pg_trgm

cp -r gpdb-6.18.2/contrib/pg_trgm $GPHOME/share/postgresql/contrib

编译并安装扩展包

进到GP数据库扩展包安装目录,进到需安装的扩展包里,执行编译安装命令

cd $GPHOME/share/postgresql/contrib/pg_trgm 

扩展3

注意请切换gpadmin用户(与GP安装同用户)安装,如果当前不是请授权文件夹权限并切换到gpadmin用户

chown -R gpadmin:gpadmin $GPHOME/share/postgresql/contrib/pg_trgm 
su gpadmin

编译并安装,注意要带上USE_PGXS=1,直接执行make会报错。见附录

make USE_PGXS=1 && make USE_PGXS=1 install

验证插件安装是否成功

安装成功后,会在$GPHOME/share/postgresql/extension目录下生成插件相关的几个文件
扩展5

并且在可用插件列表中也会出现。因为笔者已安装,固直接截图插件详情里的。验证成功后,注意这时别急,如果gp为集群环境,请在sement上也进行安装

SELECT * from pg_available_extensions;

sement节点机器安装扩展插件

集群环境下,sement机器必须也安装插件。还记得gp安装时的gpscp、gpssh命令么,没错,这时可以用这两个工具批量操作。进行相同安装操作,或可直接拷贝安装成功后生成的文件(注意不单单是$GPHOME/share/postgresql/extension目录下的,没实践不展开)。此处以安装包为例。
打包插件包并拷贝到sement节点上

# 打包压缩
cd $GPHOME/share/postgresql/contrib \
&& tar -czvf pg_trgm.tar.gz ./pg_trgm

# 拷贝到segment上,seghost文件为segment机器列表
gpscp -f <seghost_list> ./pg_trgm.tar.gz=:$GPHOME/share/postgresql/contrib/pg_trgm.tar.gz
# 示例
gpscp -f /home/gpadmin/config/seghosts ./pg_trgm.tar.gz=:/opt/gpdb/share/postgresql/contrib/pg_trgm.tar.gz

执行安装脚本

# 1、ssh打通segment,批量安装
gpssh -f /home/gpadmin/config/seghosts 

# 检查下segment是否都连上了
pwd

# 2、进入插件包目录
cd $GPHOME/share/postgresql/contrib

# 3、解压缩并进入插件安装包目录下
tar -zxvf pg_trgm.tar.gz && cd pg_trgm

# 4、编译并安装,等待安装完成
make USE_PGXS=1 && make USE_PGXS=1 install

# 5、等待安装完成,退出
exit

到此segment也都装上了,外部插件已成功添加。

加载扩展到数据库中

通过命令 CREATE EXTENSION 加载插件。同加载已添加的扩展一样

CREATE EXTENSION pg_trgm

使用

插件安装并加载完成,就可以直接使用了。不同的插件,作用的范围不同,新增的特性会在不同的地方体现,详细的说明,请自行翻阅插件本身的文档。
以模糊查询插件为例,具体使用详见:http://postgres.cn/docs/13/pgtrgm.html

select show_trgm('hello 这个傻猿不太冷 怎么这么帅');  
{0x83a7ab,0x8df69b,0xaf9d36,0xaf7274,0xb6a741,0xc4ce61,0x19d70f,0x193489,0x1a5c56,"  h"," he",0x48c484,0x5c7a09,0x61e652,ell,hel,llo,"lo ",0x76572a,0x7e334d}

看到以下内容,就大功告成了。好了,你可以尽情摸索新插件带来的新特性

删除扩展

通过命令 DROP EXTENSION
相关命令介绍详见:http://docs-cn.greenplum.org/v6/ref_guide/sql_commands/DROP_EXTENSION.html#topic1

DROP EXTENSION [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

示例:
DROP EXTENSION fuzzystrmatch;

更改已安装的扩展定义

相关命令介绍详见:http://docs-cn.greenplum.org/v6/ref_guide/sql_commands/ALTER_EXTENSION.html#topic1

ALTER EXTENSION name UPDATE [ TO new_version ]
ALTER EXTENSION name SET SCHEMA new_schema
ALTER EXTENSION name ADD member_object
ALTER EXTENSION name DROP member_object

where member_object is:

  ACCESS METHOD object_name |
  AGGREGATE aggregate_name ( aggregate_signature ) |
  CAST (source_type AS target_type) |
  COLLATION object_name |
  CONVERSION object_name |
  DOMAIN object_name |
  EVENT TRIGGER object_name |
  FOREIGN DATA WRAPPER object_name |
  FOREIGN TABLE object_name |
  FUNCTION function_name ( [ [ argmode ] [ argname ] argtype [, ...] ] ) |
  MATERIALIZED VIEW object_name |
  OPERATOR operator_name (left_type, right_type) |
  OPERATOR CLASS object_name USING index_method |
  OPERATOR FAMILY object_name USING index_method |
  [ PROCEDURAL ] LANGUAGE object_name |
  SCHEMA object_name |
  SEQUENCE object_name |
  SERVER object_name |
  TABLE object_name |
  TEXT SEARCH CONFIGURATION object_name |
  TEXT SEARCH DICTIONARY object_name |
  TEXT SEARCH PARSER object_name |
  TEXT SEARCH TEMPLATE object_name |
  TRANSFORM FOR type_name LANGUAGE lang_name |
  TYPE object_name |
  VIEW object_name

and aggregate_signature is:

* | [ argmode ] [ argname ] argtype [ , ... ] |
  [ [ argmode ] [ argname ] argtype [ , ... ] ] 
    ORDER BY [ argmode ] [ argname ] argtype [ , ... ]

附录

GP支持插件列表

完整清单详见: https://github.com/greenplum-db/gpdb/tree/02f2a3f39abefa8f7cb24c392d59d0d7be8b8495/contrib
greenplum插件

阿里云-RDS PostgreSQL 插件说明&实践文档(白嫖)

地址:https://help.aliyun.com/document_detail/142340.html

阿里云上已经帮我们收集了大量插件说明文档,还整理了很多有用实践文档。除了pg的一些实操,在这神奇的网站上,大家还可以找到很多有用的实践文档。不得不感谢”996“工程师们做出的无私贡献(看吧,我没在摸鱼)。一时白嫖一时爽,一直白嫖一直爽。

RoaringBitmap插件地址
https://github.com/zeromax007/gpdb-roaringbitmap

make 时报错:…/…/src/Makefile.global: 没有那个文件或目录

RoaringBitmap插件地址

https://github.com/zeromax007/gpdb-roaringbitmap

常见编译报错处理

make 时报错:…/…/src/Makefile.global: 没有那个文件或目录

报错1

看Makefile文件即可知道如何解决
报错解决

解决方案:带上参数USE_PGXS即可

make USE_PGXS=1

make 时报错:gcc问题

一般都是没有安装gcc的原因,安装下gcc

sudo yum install gcc

报错:gp相关命令不存在
一般重新应用下环境变量即可

source ~/.base_profile

报错:source环境变量,但是gp命令不生效

已经执行了source ~/.base_profile,但是gp命令还是报错:找不到。更神奇的是在特定目录下,可以执行gp命令,其他目录都不可用。
出现这情况,请检查下配置文件的GP安装主路径是否正确。可通过env命令检查当前生效的环境变量信息

env

如果发现gp的path跟实际的有出入,请做出相应更改。可能是GPHOME变量有问题,在环境变量中指定GPHOME变量

# 查看 GPHOME变量
echo $GPHOME

# 如果与实际不符,手动指定GPHOME
vim ~/.bash_profile

# 重新执行 srouce命令
source ~/.bash_profile

转自:
https://blog.csdn.net/y408526120/article/details/121962414

posted @ 2025-06-26 16:42  数据库小白(专注)  阅读(96)  评论(0)    收藏  举报