Flink Basics (5): DataStream Overview (5) — Preparing the Development Environment and Writing Your First Flink Program
1 Writing a Flink program in IDEA
Writing the Flink program in Scala
This project uses Flink 1.11.0, the latest release at the time of writing. The Maven project configuration is given below.
- Create a new Maven project in IntelliJ IDEA
- Check `Create from archetype`, then click the `Add Archetype` button; enter `org.apache.flink` for GroupId, `flink-quickstart-scala` for ArtifactId, and `1.11.0` for Version, then click `OK`
- Click the right arrow to expand the drop-down list, select `flink-quickstart-scala:1.11.0`, and click `Next`
- Enter `FlinkTutorial` for Name, `com.atguigu` for GroupId, and `FlinkTutorial` for ArtifactId, then click `Next`
- It is best to use IDEA's default Maven tool, `Bundled (Maven 3)`; click `Finish` and wait a moment for the project to be created
Write the `WordCount.scala` program
```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object StreamingJob {

  /** Main program method */
  def main(args: Array[String]): Unit = {

    // get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // get input data by connecting to the socket
    val text: DataStream[String] = env.socketTextStream("localhost", 9999, '\n')

    // parse the data, group it, window it, and aggregate the counts
    val windowCounts = text
      .flatMap { w => w.split("\\s") }
      .map { w => WordWithCount(w, 1) }
      .keyBy("word")
      .timeWindow(Time.seconds(5))
      .sum("count")

    // print the results with a single thread, rather than in parallel
    windowCounts.print().setParallelism(1)

    env.execute("Socket Window WordCount")
  }

  /** Data type for words with count */
  case class WordWithCount(word: String, count: Long)
}
```
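The `timeWindow(Time.seconds(5))` call groups each key's records into fixed, non-overlapping 5-second buckets. As a rough sketch of how a tumbling window maps an event timestamp to its bucket (plain Java, not the Flink API; the class and method names here are made up for illustration):

```java
// Plain-Java sketch (not Flink API): how a 5-second tumbling window
// assigns an event timestamp to the start of its window bucket.
public class TumblingWindowSketch {
    static final long WINDOW_SIZE_MS = 5_000L;

    // Start of the window that contains the given timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    public static void main(String[] args) {
        // Events at 1.2s and 4.9s share the [0s, 5s) window;
        // an event at 5.1s falls into the next one.
        System.out.println(windowStart(1_200));  // 0
        System.out.println(windowStart(4_900));  // 0
        System.out.println(windowStart(5_100));  // 5000
    }
}
```

Records whose timestamps land in the same bucket are aggregated together by `sum`, and each window's result is emitted when the window closes.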
Open a terminal and run the following command:
$ nc -lk 9999
Then run the program from IDEA.
Writing the Flink program in Java
- Create a new Maven project in IntelliJ IDEA
- Check `Create from archetype`, then click the `Add Archetype` button; enter `org.apache.flink` for GroupId, `flink-quickstart-java` for ArtifactId, and `1.14.4` for Version, then click `OK`
- Click the right arrow to expand the drop-down list, select `flink-quickstart-java:1.14.4`, and click `Next`
- Enter `FlinkTutorial` for Name, `com.atguigu` for GroupId, and `FlinkTutorial` for ArtifactId, then click `Next`
- It is best to use IDEA's default Maven tool, `Bundled (Maven 3)`; click `Finish` and wait a moment for the project to be created
pom.xml
```xml
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>flink_test</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>Flink Quickstart Job</name>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <flink.version>1.14.4</flink.version>
    <target.java.version>1.8</target.java.version>
    <scala.binary.version>2.11</scala.binary.version>
    <maven.compiler.source>${target.java.version}</maven.compiler.source>
    <maven.compiler.target>${target.java.version}</maven.compiler.target>
    <log4j.version>2.17.1</log4j.version>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <dependencies>
    <!-- Apache Flink dependencies -->
    <!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>

    <!-- Add connector dependencies here. They must be in the default scope (compile). -->
    <!-- Example:
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    -->

    <!-- Add logging framework, to produce console output when running in the IDE. -->
    <!-- These dependencies are excluded from the application JAR by default. -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Java Compiler -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>${target.java.version}</source>
          <target>${target.java.version}</target>
        </configuration>
      </plugin>

      <!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
      <!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.1</version>
        <executions>
          <!-- Run shade goal on package phase -->
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <artifactSet>
                <excludes>
                  <exclude>org.apache.flink:flink-shaded-force-shading</exclude>
                  <exclude>com.google.code.findbugs:jsr305</exclude>
                  <exclude>org.slf4j:*</exclude>
                  <exclude>org.apache.logging.log4j:*</exclude>
                </excludes>
              </artifactSet>
              <filters>
                <filter>
                  <!-- Do not copy the signatures in the META-INF folder.
                       Otherwise, this might cause SecurityExceptions when using the JAR. -->
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>org.example.StreamingJob</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>

    <pluginManagement>
      <plugins>
        <!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <versionRange>[3.1.1,)</versionRange>
                    <goals>
                      <goal>shade</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <ignore/>
                  </action>
                </pluginExecution>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <versionRange>[3.1,)</versionRange>
                    <goals>
                      <goal>testCompile</goal>
                      <goal>compile</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <ignore/>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
```
Write the `WordCount.java` program
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountFromSocket {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStream<String> stream = env.socketTextStream("localhost", 9999);
        stream.flatMap(new Tokenizer()).keyBy(r -> r.f0).sum(1).print();

        env.execute("Flink Streaming Java API Skeleton");
    }

    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] stringList = value.split("\\s");
            for (String s : stringList) {
                // use out.collect to emit each (word, 1) pair downstream
                out.collect(new Tuple2<>(s, 1));
            }
        }
    }
}
```
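Stripped of the Flink API, what `Tokenizer` plus `keyBy`/`sum` computes is a running per-word count. A minimal plain-Java sketch of that logic is below (class and method names are ours, not from the tutorial). Note one subtlety of `split("\\s")`: consecutive whitespace produces empty tokens, which the job above would count as words of their own; the sketch filters them out, which is an addition, not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the word-count logic (no Flink involved).
public class WordCountSketch {
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.split("\\s")) {
            if (word.isEmpty()) {
                continue; // split("\\s") yields empty tokens on repeated whitespace
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "hello" appears twice, "world" once
        System.out.println(count("hello world  hello"));
    }
}
```

In the streaming job, the same accumulation happens per key inside Flink's keyed state rather than in a local `HashMap`.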
2 Downloading the Flink runtime and submitting a JAR
Download link: http://mirror.bit.edu.cn/apache/flink/flink-1.11.1/flink-1.11.1-bin-scala_2.11.tgz
Then extract the archive:
$ tar xvfz flink-1.11.1-bin-scala_2.11.tgz
Start the Flink cluster:
$ cd flink-1.11.1
$ ./bin/start-cluster.sh
Open the Flink WebUI to check the cluster status: http://localhost:8081
Package the project in IDEA with `maven package`.
Submit the packaged JAR:
$ cd flink-1.11.1
$ ./bin/flink run <absolute path to the packaged JAR>
Stop the Flink cluster:
$ ./bin/stop-cluster.sh
The standard-output logs are located in the `log` folder:
$ cd flink-1.11.1/log
This article is from cnblogs, by 秋华 (Qiuhua). When reposting, please cite the original link: https://www.cnblogs.com/qiu-hua/p/13412865.html