[Flink Series, Part 18] History Server Returns: How to Integrate Yarn Logs

This article applies to Flink 1.11+.

The HistoryServer part requires at least Flink 1.16.

First, the relevant section of the official Flink documentation:

JobManager

The archiving of completed jobs happens on the JobManager, which uploads the archived job information to a file system directory. You can configure the directory to archive completed jobs in flink-conf.yaml by setting a directory via jobmanager.archive.fs.dir.

# Directory to upload completed job information
jobmanager.archive.fs.dir: hdfs:///completed-jobs

HistoryServer

The HistoryServer can be configured to monitor a comma-separated list of directories via historyserver.archive.fs.dir. The configured directories are regularly polled for new archives; the polling interval can be configured via historyserver.archive.fs.refresh-interval.

# Monitor the following directories for completed jobs
historyserver.archive.fs.dir: hdfs:///completed-jobs

# Refresh every 10 seconds
historyserver.archive.fs.refresh-interval: 10000

The contained archives are downloaded and cached in the local filesystem. The local directory for this is configured via historyserver.web.tmpdir.

Check out the configuration page for a complete list of configuration options.

Log Integration

Flink does not provide built-in methods for archiving logs of completed jobs. However, if you already have log archiving and browsing services, you can configure HistoryServer to integrate them (via historyserver.log.jobmanager.url-pattern and historyserver.log.taskmanager.url-pattern). In this way, you can directly link from HistoryServer WebUI to logs of the relevant JobManager / TaskManagers.

# HistoryServer will replace <jobid> with the relevant job id
historyserver.log.jobmanager.url-pattern: http://my.log-browsing.url/<jobid>

# HistoryServer will replace <jobid> and <tmid> with the relevant job id and taskmanager id
historyserver.log.taskmanager.url-pattern: http://my.log-browsing.url/<jobid>/<tmid>

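Integration approach

The Log Integration section shows that Flink's History UI exposes two kinds of URL links. If you implement a log-browsing service behind those links, Yarn logs can be opened directly without modifying the Flink source code.

For an existing real-time computing platform, implementing a simple URL translator is therefore the cheapest and easiest-to-maintain option.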
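As a rough sketch of the translator's entry point, the function below splits an incoming request URL into the job ID and the optional TaskManager ID that the HistoryServer substitutes into the two URL patterns. The names LogRequest and parse_log_request are hypothetical, and the path layout assumes the patterns are configured as http://flink.slankka.com/<jobid> and http://flink.slankka.com/<jobid>/<tmid>.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class LogRequest(NamedTuple):
    """Hypothetical container for one log-link request."""
    job_id: str
    tm_id: Optional[str]  # None means the JobManager log was requested


def parse_log_request(url: str) -> LogRequest:
    """Split /<jobid> or /<jobid>/<tmid> into its parts."""
    # Drop empty segments so leading and trailing slashes do not matter.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 1:
        return LogRequest(parts[0], None)
    if len(parts) == 2:
        return LogRequest(parts[0], parts[1])
    raise ValueError(f"unexpected log URL path: {url!r}")
```

A real service would then look up the target Yarn URL for the parsed request and answer with an HTTP redirect; that part depends on the platform and is not shown.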

Prerequisites

The ApplicationId has already been associated with the Flink Job ID.
Jobs are deployed on Yarn in Per-Job mode.

Note: the solution below is for reference only.

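How to get the JobManager log link

For example, given http://flink.slankka.com/<jobId>, look up the job's instance history by jobId to find the corresponding applicationId, then query the Yarn REST API and build the Yarn URL of the JobManager log from the response.

The Yarn REST API endpoint is /ws/v1/cluster/apps/{appid}; the log URL is in the response at JSONPath app/amContainerLogs.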
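The Yarn lookup can be sketched as follows, assuming the applicationId has already been resolved from the job ID by the platform (that lookup is platform-specific and not shown). The function names and the ResourceManager base URL passed in are hypothetical; only the endpoint path and the app/amContainerLogs field come from the text above.

```python
import json
from urllib.request import urlopen


def yarn_app_url(rm_base: str, app_id: str) -> str:
    """Build the Yarn REST API URL for one application."""
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def extract_am_container_logs(payload: dict) -> str:
    """Read the JobManager (ApplicationMaster) log URL at app/amContainerLogs."""
    return payload["app"]["amContainerLogs"]


def jobmanager_log_url(rm_base: str, app_id: str, timeout: float = 5.0) -> str:
    """Query the ResourceManager and return the JobManager log URL."""
    with urlopen(yarn_app_url(rm_base, app_id), timeout=timeout) as resp:
        return extract_am_container_logs(json.load(resp))
```

Keeping the URL building and the JSON extraction separate from the HTTP call makes both easy to check without a running cluster.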

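How to get the TaskManager log link

For http://flink.slankka.com/<jobId>/<tmId>, the procedure is slightly different:

Call the History UI REST API /jobs/{jobid} and read vertices to get the vertex IDs.
Call the History UI REST API /jobs/{jobid}/vertices/{vertexid}/taskmanagers to get the TaskManager entries.
Use the taskmanager-id to find the NodeManager's short host name.
Append Yarn's full server domain to the short name.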
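The four steps above can be sketched over already-fetched JSON responses. This is a minimal sketch under assumptions: the field names vertices[].id, taskmanagers[].host and taskmanagers[].taskmanager-id, and the host:port form of host, reflect my understanding of the Flink REST responses and should be checked against your Flink version. All function names are hypothetical.

```python
from typing import Iterable, List, Optional


def vertex_ids(job_payload: dict) -> List[str]:
    """Step 1: collect vertex IDs from the /jobs/{jobid} response."""
    return [v["id"] for v in job_payload.get("vertices", [])]


def find_tm_short_host(tm_payloads: Iterable[dict], tm_id: str) -> Optional[str]:
    """Steps 2 and 3: scan the per-vertex taskmanagers responses for tm_id.

    Assumes each entry carries "taskmanager-id" and a "host" of the form
    "shortname:port"; returns the short name, or None if tm_id is absent.
    """
    for payload in tm_payloads:
        for tm in payload.get("taskmanagers", []):
            if tm.get("taskmanager-id") == tm_id:
                return tm["host"].split(":")[0]
    return None


def full_nodemanager_host(short_name: str, yarn_domain: str) -> str:
    """Step 4: append the Yarn server domain to the short host name."""
    return f"{short_name}.{yarn_domain.lstrip('.')}"
```

The caller would fetch /jobs/{jobid}/vertices/{vertexid}/taskmanagers once per vertex ID and pass the responses in; stopping at the first match is enough because one TaskManager runs on one host.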

An example

Here the TaskManager host is incomplete, but it matches the prefix of the Yarn host name.
Appending the domain therefore gives ddn130160.yarn.slankka.com.

An example of the final URL:

http://hist.yarn.slankka.com:19888/jobhistory/logs/ddn130160.yarn.slankka.com:8041/container_e15_1665284980006_8340_01_000002/container_e15_1665284980006_8340_01_000002/slankka
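A URL of this shape can be assembled with a small helper. This sketch assumes that the container ID is the TaskManager's taskmanager-id, that the last path segment is the user who submitted the job, and that 8041 is the NodeManager port, as in the example above. The function name is hypothetical.

```python
def yarn_history_log_url(history_base: str, nm_host: str, nm_port: int,
                         container_id: str, user: str) -> str:
    """Build a Yarn JobHistory log URL for one container.

    The container ID appears twice in the path, matching the example URL.
    """
    return (
        f"{history_base.rstrip('/')}/jobhistory/logs/"
        f"{nm_host}:{nm_port}/{container_id}/{container_id}/{user}"
    )


url = yarn_history_log_url(
    "http://hist.yarn.slankka.com:19888",
    "ddn130160.yarn.slankka.com",
    8041,
    "container_e15_1665284980006_8340_01_000002",
    "slankka",
)
```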

posted @ 2023-05-31 12:01 一杯半盏
