gpfdist: Greenplum's File Loading Tool for Large Data Volumes

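My GP cluster has only four machines: one master (mdw), one standby master (smdw), and two segment nodes. I have not set up a dedicated ETL node. In production, where network and disk contention affect load speed, running gpfdist on a dedicated ETL node is recommended.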

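What gpfdist does:

It is usually installed and run on a dedicated ETL machine

It is a high-speed parallel file loading tool built on libevent

It takes full advantage of multiple nodes by loading in parallel

Its load performance is very good

It scales horizontally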

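Goal of the experiment:

From mdw, read the demo.txt file that lives on another machine, smdw (standing in for the ETL node).

1. On smdw (use a dedicated ETL machine in production), create a demo.txt file in the gpadmin home directory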

[gpadmin@gp-smdw ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:af:83:c7 brd ff:ff:ff:ff:ff:ff
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:af:83:d1 brd ff:ff:ff:ff:ff:ff
inet 192.168.159.143/24 brd 192.168.159.255 scope global noprefixroute dynamic ens37
valid_lft 1080sec preferred_lft 1080sec
inet6 fe80::322e:1dae:ab6e:5d12/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[gpadmin@gp-smdw ~]$ cat demo.txt
This|is|andyxi
GP|etl|test

[gpadmin@gp-smdw ~]$ pwd
/home/gpadmin
[gpadmin@gp-smdw ~]$
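
The sample file above can be created and sanity-checked with a short script. This is a minimal sketch, not the author's procedure: the DATA_DIR variable and the count_bad_rows helper are names made up for illustration, and a scratch directory is used by default instead of /home/gpadmin so that nothing gets overwritten.

```shell
#!/bin/sh
# Sketch: create the two-row, pipe-delimited sample file and verify its shape.
# DATA_DIR and count_bad_rows are illustrative names, not part of Greenplum.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
DEMO_FILE="$DATA_DIR/demo.txt"

# Same two rows as the demo.txt shown above.
printf '%s\n' 'This|is|andyxi' 'GP|etl|test' > "$DEMO_FILE"

# Print how many non-empty lines do not have exactly three '|'-separated fields.
count_bad_rows() {
  awk -F'|' 'NF > 0 && NF != 3 { bad++ } END { print bad + 0 }' "$1"
}

echo "rows with the wrong column count: $(count_bad_rows "$DEMO_FILE")"
```

Checking the column count before loading is cheap, and a row with too few or too many fields would otherwise only surface as an error when the external table is queried.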

 

2. On mdw, create the target table and the external table in the database

[gpadmin@gp-mdw 20221215]$ psql
psql (9.4.26)
Type "help" for help.

etl=# create table demo (c1 text,c2 text,c3 text);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'c1' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
etl=# \d demo
Table "public.demo"
Column | Type | Modifiers
--------+------+-----------
c1 | text |
c2 | text |
c3 | text |
Distributed by: (c1)

etl=# create external table demo_ext (like demo) LOCATION
('gpfdist://gp-smdw:8080/demo.txt') FORMAT 'TEXT' (DELIMITER '|');
CREATE EXTERNAL TABLE
etl=# \d demo_ext;
External table "public.demo_ext"
Column | Type | Modifiers
--------+------+-----------
c1 | text |
c2 | text |
c3 | text |
Type: readable
Encoding: UTF8
Format type: text
Format options: delimiter '|' null '\N' escape '\'
External options: {}
External location: gpfdist://gp-smdw:8080/demo.txt
Execute on: all segments
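
The session above only defines the two tables; it does not move any rows. A minimal sketch of the load itself follows, assuming gpfdist is already serving demo.txt on gp-smdw:8080 at the time it runs (an external table can only be read while the service is up). TARGET_DB, LOAD_SQL, CHECK_SQL and load_demo are illustrative names, and "etl" is simply the database seen in the psql prompt.

```shell
#!/bin/sh
# Sketch: copy rows from the external table into the target table.
# TARGET_DB, LOAD_SQL, CHECK_SQL and load_demo are illustrative names.
TARGET_DB="${TARGET_DB:-etl}"

# Selecting from demo_ext makes the segments pull demo.txt from gpfdist.
LOAD_SQL='INSERT INTO demo SELECT * FROM demo_ext;'
CHECK_SQL='SELECT count(*) FROM demo;'

load_demo() {
  # ON_ERROR_STOP makes psql stop and return non-zero if the INSERT fails,
  # for example when gpfdist is not reachable from the segments.
  psql -d "$TARGET_DB" -v ON_ERROR_STOP=1 <<SQL
$LOAD_SQL
$CHECK_SQL
SQL
}

# Print the statement for review; call load_demo on mdw to actually run it.
echo "$LOAD_SQL"
```

Running load_demo as gpadmin on mdw would execute both statements. The statements are fed through standard input so the sketch does not depend on psql features newer than the 9.4 client shown above.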

 

3. Start the gpfdist service on smdw

 

[gpadmin@gp-smdw ~]$ nohup gpfdist >gpfdist.log
nohup: ignoring input and redirecting stderr to stdout

 

[gpadmin@gp-smdw ~]$ cat gpfdist.log
2022-12-17 15:47:34 82596 INFO Before opening listening sockets - following listening sockets are available:
2022-12-17 15:47:34 82596 INFO IPV6 socket: [::]:8080
2022-12-17 15:47:34 82596 INFO IPV4 socket: 0.0.0.0:8080
2022-12-17 15:47:34 82596 INFO Trying to open listening socket:
2022-12-17 15:47:34 82596 INFO IPV6 socket: [::]:8080
2022-12-17 15:47:34 82596 INFO Opening listening socket succeeded
2022-12-17 15:47:34 82596 INFO Trying to open listening socket:
2022-12-17 15:47:34 82596 INFO IPV4 socket: 0.0.0.0:8080
2022-12-17 15:47:34 82596 INFO Opening listening socket succeeded
Serving HTTP on port 8080, directory /home/gpadmin
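
The command above relies on gpfdist's defaults (port 8080, current directory) and stays in the foreground of that terminal. A minimal sketch of a more explicit start is shown below, under the assumption that the file lives in /home/gpadmin as in this post. GPFDIST_DIR, GPFDIST_PORT, GPFDIST_LOG and the two functions are illustrative names, not part of Greenplum.

```shell
#!/bin/sh
# Sketch: start gpfdist in the background with explicit settings.
# GPFDIST_DIR, GPFDIST_PORT, GPFDIST_LOG and both functions are illustrative names.
GPFDIST_DIR="${GPFDIST_DIR:-/home/gpadmin}"   # directory that holds demo.txt
GPFDIST_PORT="${GPFDIST_PORT:-8080}"          # port seen in the log above
GPFDIST_LOG="${GPFDIST_LOG:-$GPFDIST_DIR/gpfdist.log}"

# Print the command line so it can be reviewed before anything is started.
gpfdist_command() {
  printf 'gpfdist -d %s -p %s\n' "$GPFDIST_DIR" "$GPFDIST_PORT"
}

start_gpfdist() {
  # -d sets the directory to serve, -p sets the HTTP port.
  # The trailing & returns the prompt; nohup keeps the process alive after logout.
  nohup gpfdist -d "$GPFDIST_DIR" -p "$GPFDIST_PORT" > "$GPFDIST_LOG" 2>&1 &
  echo "gpfdist started, PID $!"
}

gpfdist_command
```

Calling start_gpfdist on the ETL host launches the service and prints its PID, which can later be passed to kill to stop it.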

4. Verify the service from smdw (use the ETL machine in production)

[gpadmin@gp-smdw ~]$ curl -H "X-GP-PROTO:0" gp-smdw:8080/demo.txt
This|is|andyxi
GP|etl|test

 

[gpadmin@gp-smdw ~]$ curl gp-smdw:8080/demo.txt
[gpadmin@gp-smdw ~]$

[gpadmin@gp-smdw ~]$ curl -v gp-smdw:8080/demo.txt
* About to connect() to gp-smdw port 8080 (#0)
* Trying 192.168.159.143...
* Connected to gp-smdw (192.168.159.143) port 8080 (#0)
> GET /demo.txt HTTP/1.1
> User-Agent: curl/7.29.0
> Host: gp-smdw:8080
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 invalid request (no gp-proto)
< Content-length: 0
< Expires: 0
< X-GPFDIST-VERSION: 6.22.0 build commit:4b6c079bc3aed35b2f161c377e208185f9310a69 Open Source
< Cache-Control: no-cache
< Connection: close
<
* Closing connection 0
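
The three curl runs above show that gpfdist only serves the file when the request carries an X-GP-PROTO header; without it the server answers 400 with an empty body, which is why the plain curl call prints nothing. A minimal sketch of a scripted check is given below. gpfdist_url and probe_gpfdist are illustrative helper names, and the host, port and file name are the ones used in this post.

```shell
#!/bin/sh
# Sketch: compare gpfdist's HTTP status with and without the X-GP-PROTO header.
# gpfdist_url and probe_gpfdist are illustrative helper names.

# Build the HTTP URL for a file served by gpfdist: host, port, file name.
gpfdist_url() {
  printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Print the status code for one request; pass "with-proto" to add the header.
probe_gpfdist() {
  url="$1"
  if [ "$2" = "with-proto" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' -H 'X-GP-PROTO: 0' "$url"
  else
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  fi
}

URL="$(gpfdist_url gp-smdw 8080 demo.txt)"
echo "$URL"
```

With the service running, probe_gpfdist "$URL" with-proto and probe_gpfdist "$URL" can be run one after the other; going by the verbose output above, the second call should print 400.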

 

posted @ 2022-12-17 15:37  青空如璃