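Installing and testing ClickHouse on CentOS 7

1. System requirements

ClickHouse can run on any Linux, FreeBSD, or Mac OS X system with an x86_64, AArch64, or PowerPC64LE CPU architecture.

The pre-built binaries are typically compiled for x86_64 and use the SSE 4.2 instruction set, so unless stated otherwise, a CPU that supports SSE 4.2 is an additional system requirement. The following command checks whether the current CPU supports it: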

$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
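
For scripted installs, the same check can be wrapped in a small function and combined with an architecture check. This is only a sketch: the names has_cpu_flag and preflight are made up for illustration, and the optional file argument exists only so the check can be pointed at a saved copy of /proc/cpuinfo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FLAG appears as a whole word in the
# cpuinfo file (defaults to /proc/cpuinfo on Linux).
has_cpu_flag() {
    local flag="$1" file="${2:-/proc/cpuinfo}"
    grep -qw -- "$flag" "$file" 2>/dev/null
}

# Hypothetical helper: prints a short report for this machine.
preflight() {
    echo "architecture: $(uname -m)"
    if has_cpu_flag sse4_2; then
        echo "SSE 4.2 supported"
    else
        echo "SSE 4.2 not supported"
    fi
}

preflight
```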

2. Installation and startup

First, add the official repository:

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64

To use the latest version, replace stable with testing (recommended only for test environments).

Then run this command to install the packages:

sudo yum install clickhouse-server clickhouse-client

Run the following command to start the service in the background:

sudo service clickhouse-server start
Logs can be found in the /var/log/clickhouse-server/ directory.

If the service does not start, check the configuration file /etc/clickhouse-server/config.xml.
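
After starting the service, it can take a moment before the server accepts connections. A minimal sketch of a retry loop follows. wait_until is a hypothetical helper, and the probe in the usage comment assumes the HTTP interface is on its default port 8123 and that the error log has its default file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command up to ATTEMPTS times,
# sleeping DELAY seconds after each failure.
# Usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
wait_until() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example (assumes the default HTTP port and log file name):
#   wait_until 30 1 curl -sf http://localhost:8123/ping \
#       || tail -n 50 /var/log/clickhouse-server/clickhouse-server.err.log
```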

You can connect to the service with the command-line client:

clickhouse-client

By default it connects to localhost:9000 as the 'default' user with no password.
The client can also connect to a remote server, for example:

clickhouse-client --host=example.com

Check that the system works:

milovidov@hostname:~/work/metrica/src/src/Client$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.

:) SELECT 1

SELECT 1

┌─1─┐
│ 1 │
└───┘

1 rows in set. Elapsed: 0.003 sec.

:)

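3. Importing the sample dataset

Now it is time to fill our ClickHouse server with some sample data. In this tutorial we will use anonymized data from Yandex.Metrica, the first service that ran ClickHouse in production, well before it became open source (more on that in the history section). There are multiple ways to import the Yandex.Metrica dataset, and for this tutorial we will go with the most realistic one.

Download and extract the table data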

curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=$(nproc) > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=$(nproc) > visits_v1.tsv

The extracted files are about 10 GB in size.
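
Before importing, a quick look at the extracted files can catch a truncated or failed download. The sketch below counts the tab-separated fields on the first line of a file. tsv_field_count is an illustrative helper name, and the demo file is created only to show the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of tab-separated fields
# on the first line of a TSV file.
tsv_field_count() {
    head -n 1 "$1" | awk -F'\t' '{ print NF }'
}

# Demo on a tiny throwaway file; for the real data pass hits_v1.tsv.
demo=$(mktemp)
printf 'a\tb\tc\n1\t2\t3\n' > "$demo"
echo "fields: $(tsv_field_count "$demo")"   # prints "fields: 3"
rm -f "$demo"
```

If the first line of a downloaded file shows only one field, the file is probably not the tab-separated data it should be.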

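If downloading from overseas servers is slow, the data files are also available from a network drive:

Link: https://pan.baidu.com/s/1LFzoWq-IdVONJra1lHN-PA  Extraction code: 4zez

Creating the tables

Like most database management systems, ClickHouse logically groups tables into "databases". There is a default database, but we will create a new one named tutorial: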

clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"

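Compared with databases, the syntax for creating tables is much more complicated (see the reference). In general, a CREATE TABLE statement has to specify three key things:

Name of the table to create.
Table schema, i.e. the list of columns and their data types.
Table engine and its settings, which determine all the details of how queries to this table are physically executed.

Yandex.Metrica is a web analytics service, and the sample dataset does not cover its full functionality, so there are only two tables to create: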

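hits is a table with every action performed by all users on all websites covered by the service.
visits is a table that contains pre-built sessions instead of individual actions.

Open the client in multi-line SQL mode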

clickhouse-client -m

Let's look at and run the actual CREATE TABLE queries for these tables:

CREATE TABLE tutorial.hits_v1
(
WatchID UInt64,
JavaEnable UInt8,
Title String,
GoodEvent Int16,
EventTime DateTime,
EventDate Date,
CounterID UInt32,
ClientIP UInt32,
ClientIP6 FixedString(16),
RegionID UInt32,
UserID UInt64,
CounterClass Int8,
OS UInt8,
UserAgent UInt8,
URL String,
Referer String,
URLDomain String,
RefererDomain String,
Refresh UInt8,
IsRobot UInt8,
RefererCategories Array(UInt16),
URLCategories Array(UInt16),
URLRegions Array(UInt32),
RefererRegions Array(UInt32),
ResolutionWidth UInt16,
ResolutionHeight UInt16,
ResolutionDepth UInt8,
FlashMajor UInt8,
FlashMinor UInt8,
FlashMinor2 String,
NetMajor UInt8,
NetMinor UInt8,
UserAgentMajor UInt16,
UserAgentMinor FixedString(2),
CookieEnable UInt8,
JavascriptEnable UInt8,
IsMobile UInt8,
MobilePhone UInt8,
MobilePhoneModel String,
Params String,
IPNetworkID UInt32,
TraficSourceID Int8,
SearchEngineID UInt16,
SearchPhrase String,
AdvEngineID UInt8,
IsArtifical UInt8,
WindowClientWidth UInt16,
WindowClientHeight UInt16,
ClientTimeZone Int16,
ClientEventTime DateTime,
SilverlightVersion1 UInt8,
SilverlightVersion2 UInt8,
SilverlightVersion3 UInt32,
SilverlightVersion4 UInt16,
PageCharset String,
CodeVersion UInt32,
IsLink UInt8,
IsDownload UInt8,
IsNotBounce UInt8,
FUniqID UInt64,
HID UInt32,
IsOldCounter UInt8,
IsEvent UInt8,
IsParameter UInt8,
DontCountHits UInt8,
WithHash UInt8,
HitColor FixedString(1),
UTCEventTime DateTime,
Age UInt8,
Sex UInt8,
Income UInt8,
Interests UInt16,
Robotness UInt8,
GeneralInterests Array(UInt16),
RemoteIP UInt32,
RemoteIP6 FixedString(16),
WindowName Int32,
OpenerName Int32,
HistoryLength Int16,
BrowserLanguage FixedString(2),
BrowserCountry FixedString(2),
SocialNetwork String,
SocialAction String,
HTTPError UInt16,
SendTiming Int32,
DNSTiming Int32,
ConnectTiming Int32,
ResponseStartTiming Int32,
ResponseEndTiming Int32,
FetchTiming Int32,
RedirectTiming Int32,
DOMInteractiveTiming Int32,
DOMContentLoadedTiming Int32,
DOMCompleteTiming Int32,
LoadEventStartTiming Int32,
LoadEventEndTiming Int32,
NSToDOMContentLoadedTiming Int32,
FirstPaintTiming Int32,
RedirectCount Int8,
SocialSourceNetworkID UInt8,
SocialSourcePage String,
ParamPrice Int64,
ParamOrderID String,
ParamCurrency FixedString(3),
ParamCurrencyID UInt16,
GoalsReached Array(UInt32),
OpenstatServiceName String,
OpenstatCampaignID String,
OpenstatAdID String,
OpenstatSourceID String,
UTMSource String,
UTMMedium String,
UTMCampaign String,
UTMContent String,
UTMTerm String,
FromTag String,
HasGCLID UInt8,
RefererHash UInt64,
URLHash UInt64,
CLID UInt32,
YCLID UInt64,
ShareService String,
ShareURL String,
ShareTitle String,
ParsedParams Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
IslandID FixedString(16),
RequestNum UInt32,
RequestTry UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;

CREATE TABLE tutorial.visits_v1
(
CounterID UInt32,
StartDate Date,
Sign Int8,
IsNew UInt8,
VisitID UInt64,
UserID UInt64,
StartTime DateTime,
Duration UInt32,
UTCStartTime DateTime,
PageViews Int32,
Hits Int32,
IsBounce UInt8,
Referer String,
StartURL String,
RefererDomain String,
StartURLDomain String,
EndURL String,
LinkURL String,
IsDownload UInt8,
TraficSourceID Int8,
SearchEngineID UInt16,
SearchPhrase String,
AdvEngineID UInt8,
PlaceID Int32,
RefererCategories Array(UInt16),
URLCategories Array(UInt16),
URLRegions Array(UInt32),
RefererRegions Array(UInt32),
IsYandex UInt8,
GoalReachesDepth Int32,
GoalReachesURL Int32,
GoalReachesAny Int32,
SocialSourceNetworkID UInt8,
SocialSourcePage String,
MobilePhoneModel String,
ClientEventTime DateTime,
RegionID UInt32,
ClientIP UInt32,
ClientIP6 FixedString(16),
RemoteIP UInt32,
RemoteIP6 FixedString(16),
IPNetworkID UInt32,
SilverlightVersion3 UInt32,
CodeVersion UInt32,
ResolutionWidth UInt16,
ResolutionHeight UInt16,
UserAgentMajor UInt16,
UserAgentMinor UInt16,
WindowClientWidth UInt16,
WindowClientHeight UInt16,
SilverlightVersion2 UInt8,
SilverlightVersion4 UInt16,
FlashVersion3 UInt16,
FlashVersion4 UInt16,
ClientTimeZone Int16,
OS UInt8,
UserAgent UInt8,
ResolutionDepth UInt8,
FlashMajor UInt8,
FlashMinor UInt8,
NetMajor UInt8,
NetMinor UInt8,
MobilePhone UInt8,
SilverlightVersion1 UInt8,
Age UInt8,
Sex UInt8,
Income UInt8,
JavaEnable UInt8,
CookieEnable UInt8,
JavascriptEnable UInt8,
IsMobile UInt8,
BrowserLanguage UInt16,
BrowserCountry UInt16,
Interests UInt16,
Robotness UInt8,
GeneralInterests Array(UInt16),
Params Array(String),
Goals Nested(
ID UInt32,
Serial UInt32,
EventTime DateTime,
Price Int64,
OrderID String,
CurrencyID UInt32),
WatchIDs Array(UInt64),
ParamSumPrice Int64,
ParamCurrency FixedString(3),
ParamCurrencyID UInt16,
ClickLogID UInt64,
ClickEventID Int32,
ClickGoodEvent Int32,
ClickEventTime DateTime,
ClickPriorityID Int32,
ClickPhraseID Int32,
ClickPageID Int32,
ClickPlaceID Int32,
ClickTypeID Int32,
ClickResourceID Int32,
ClickCost UInt32,
ClickClientIP UInt32,
ClickDomainID UInt32,
ClickURL String,
ClickAttempt UInt8,
ClickOrderID UInt32,
ClickBannerID UInt32,
ClickMarketCategoryID UInt32,
ClickMarketPP UInt32,
ClickMarketCategoryName String,
ClickMarketPPName String,
ClickAWAPSCampaignName String,
ClickPageName String,
ClickTargetType UInt16,
ClickTargetPhraseID UInt64,
ClickContextType UInt8,
ClickSelectType Int8,
ClickOptions String,
ClickGroupBannerID Int32,
OpenstatServiceName String,
OpenstatCampaignID String,
OpenstatAdID String,
OpenstatSourceID String,
UTMSource String,
UTMMedium String,
UTMCampaign String,
UTMContent String,
UTMTerm String,
FromTag String,
HasGCLID UInt8,
FirstVisit DateTime,
PredLastVisit Date,
LastVisit Date,
TotalVisits UInt32,
TraficSource Nested(
ID Int8,
SearchEngineID UInt16,
AdvEngineID UInt8,
PlaceID UInt16,
SocialSourceNetworkID UInt8,
Domain String,
SearchPhrase String,
SocialSourcePage String),
Attendance FixedString(16),
CLID UInt32,
YCLID UInt64,
NormalizedRefererHash UInt64,
SearchPhraseHash UInt64,
RefererDomainHash UInt64,
NormalizedStartURLHash UInt64,
StartURLDomainHash UInt64,
NormalizedEndURLHash UInt64,
TopLevelDomain UInt64,
URLScheme UInt64,
OpenstatServiceNameHash UInt64,
OpenstatCampaignIDHash UInt64,
OpenstatAdIDHash UInt64,
OpenstatSourceIDHash UInt64,
UTMSourceHash UInt64,
UTMMediumHash UInt64,
UTMCampaignHash UInt64,
UTMContentHash UInt64,
UTMTermHash UInt64,
FromHash UInt64,
WebVisorEnabled UInt8,
WebVisorActivity UInt32,
ParsedParams Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
Market Nested(
Type UInt8,
GoalID UInt32,
OrderID String,
OrderPrice Int64,
PP UInt32,
DirectPlaceID UInt32,
DirectOrderID UInt32,
DirectBannerID UInt32,
GoodID String,
GoodName String,
GoodQuantity Int32,
GoodPrice Int64),
IslandID FixedString(16)
)
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;

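You can run these queries in the interactive mode of clickhouse-client (just launch it in a terminal without specifying a query in advance), or try one of the alternative interfaces if you prefer.

As we can see, hits_v1 uses the basic MergeTree engine, while visits_v1 uses the Collapsing variant.

Importing the data

Data is imported into ClickHouse with an INSERT INTO query, as in many other SQL databases. However, the data is usually supplied in one of the supported serialization formats rather than in a VALUES clause (which is also supported).

The files we downloaded earlier are in tab-separated format, so here is how to import them with the console client: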

clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
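
The two import commands differ only in table and file name, so they can be folded into one function that also refuses to run when the input file is missing or empty. This is a sketch under the same assumptions as the commands above (files in the current directory, default connection settings). import_tsv is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import one TSV file into one table, refusing to
# start if the file is missing or empty.
import_tsv() {
    local table="$1" file="$2"
    if [ ! -s "$file" ]; then
        echo "skip $table: $file is missing or empty" >&2
        return 1
    fi
    clickhouse-client \
        --query "INSERT INTO $table FORMAT TSV" \
        --max_insert_block_size=100000 < "$file"
}

# Example:
#   import_tsv tutorial.hits_v1 hits_v1.tsv
#   import_tsv tutorial.visits_v1 visits_v1.tsv
```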

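ClickHouse has a lot of settings to tune, and one way to specify them in the console client is through arguments, as we can see with --max_insert_block_size. The easiest way to find out which settings are available, what they mean, and what their defaults are is to query the system.settings table: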

SELECT name, value, changed, description
FROM system.settings
WHERE name LIKE '%max_insert_b%'
FORMAT TSV

max_insert_block_size 1048576 0 "The maximum block size for insertion, if we control the creation of blocks for insertion."

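You can also OPTIMIZE the tables after import. Tables configured with a MergeTree-family engine always merge data parts in the background to optimize data storage (or at least check whether it makes sense). These queries force the table engine to do the storage optimization right now instead of some time later: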

clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"

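These queries start an I/O- and CPU-intensive operation, so if the table keeps receiving new data, it is better to leave it alone and let the merges run in the background.

Query test

Now we can check whether the table import succeeded: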

clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
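
One way to judge whether an import is complete is to compare the table's row count with the number of lines in the source file, since the TSV format puts one row on each line. The sketch below assumes the TSV files are still in the current directory. tsv_row_count and report_import are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the number of lines in a TSV file (one row per line).
tsv_row_count() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Compare an expected and an actual row count and report the result.
report_import() {
    local table="$1" expected="$2" actual="$3"
    if [ "$expected" = "$actual" ]; then
        echo "$table: OK ($actual rows)"
    else
        echo "$table: MISMATCH (file has $expected, table has $actual)"
        return 1
    fi
}

# Example for the hits table (needs a running server and the TSV file):
#   report_import tutorial.hits_v1 \
#       "$(tsv_row_count hits_v1.tsv)" \
#       "$(clickhouse-client --query 'SELECT count() FROM tutorial.hits_v1')"
```

For visits_v1 the two numbers can legitimately differ after OPTIMIZE ... FINAL, because CollapsingMergeTree removes pairs of rows that cancel each other.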

Example queries

SELECT
StartURL AS URL,
AVG(Duration) AS AvgDuration
FROM tutorial.visits_v1
WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
GROUP BY URL
ORDER BY AvgDuration DESC
LIMIT 10

SELECT
sum(Sign) AS visits,
sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
(100. * goal_visits) / visits AS goal_percent
FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
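
The second query uses sum(Sign) rather than count() because visits_v1 is a CollapsingMergeTree table: an update is stored as a cancelling row with Sign = -1 followed by a new row with Sign = 1, and the old pair may not have been merged away yet. The toy example below imitates that arithmetic with awk on made-up rows (VisitID, Sign). It illustrates the idea only and does not touch ClickHouse.

```shell
#!/usr/bin/env bash
# Sum the Sign column (second tab-separated field) of rows read from stdin.
net_visits() {
    awk -F'\t' '{ total += $2 } END { print total + 0 }'
}

# Made-up rows: visit 101 was written, cancelled, and rewritten;
# visit 102 was written once. Four rows, but only two visits.
printf '101\t1\n101\t-1\n101\t1\n102\t1\n' | net_visits   # prints 2
```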

4. Windows client management tool: DBeaver

Download link
https://dbeaver.io/download/

Network drive download (for users in China): same link as above

Edit the config and add a user

vi /etc/clickhouse-server/config.xml
# change the listen host
<listen_host>0.0.0.0</listen_host>

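vi /etc/clickhouse-server/users.xml
# add a root user with password root, inside the <users> section

<users>
    <default>
        <password></password>
        <networks>
            <ip>::/0</ip>
        </networks>
        <profile>default</profile>
        <quota>default</quota>
    </default>
    <root>
        <password>root</password>
        <networks>
            <ip>::/0</ip>
        </networks>
        <profile>default</profile>
        <quota>default</quota>
    </root>
</users>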
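
Storing the password in plain text is fine for a test machine, but ClickHouse also accepts a SHA-256 hash in a password_sha256_hex element in place of password. A minimal sketch for producing the hash, assuming the sha256sum utility from coreutils is available (sha256_hex is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the SHA-256 digest of a string as lowercase hex.
sha256_hex() {
    printf '%s' "$1" | sha256sum | awk '{ print $1 }'
}

# Generate the value to paste into <password_sha256_hex>...</password_sha256_hex>.
sha256_hex 'root'
```

printf '%s' is used instead of echo so that no trailing newline ends up in the hashed value.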

Create a new ClickHouse connection in DBeaver

Query test

posted @ 2020-07-11 09:24  sentangle