Flink (5): Setting Up and Testing a Flink Development Environment in IDEA (2)

Developing a real-time program in IDEA: a stream-processing case study, WordCountStreaming
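Before either example will compile, the IDEA project needs the Flink dependencies on its classpath. As a minimal sketch (the artifact list and versions are assumptions, chosen to match the flink-1.6.x installs used later in this post), a build.sbt could look like:

// build.sbt -- minimal sketch; Flink/Scala versions are assumptions
name := "flink-wordcount"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.6.1",
  "org.apache.flink" %% "flink-streaming-scala" % "1.6.1"
)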

1. Scala code

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.time.Time

object SocketWindowWordCountScala {
  def main(args: Array[String]): Unit = {
    // Data type that holds a word and the number of times it has occurred
    case class WordWithCount(word: String, count: Long)

    // port: the socket port to connect to
    val port: Int = try {
      ParameterTool.fromArgs(args).getInt("port")
    } catch {
      case e: Exception => {
        System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'")
        return
      }
    }

    // Get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // Connect to the socket to read the input data
    val text = env.socketTextStream("node21", port, '\n')

    // This implicit-conversion import is required; without it the flatMap call below does not compile
    import org.apache.flink.api.scala._

    // Parse the data, group it, window it, and aggregate with sum
    val windowCounts = text
      .flatMap { w => w.split("\\s") }
      .map { w => WordWithCount(w, 1) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1))
      .sum("count")

    // Print the result, using a parallelism of one
    windowCounts.print().setParallelism(1)

    env.execute("Socket Window WordCount")
  }
}
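A side note: the try/catch with an early return above can be avoided, since ParameterTool also has a getInt overload that takes a default value. A minimal sketch of that variant:

import org.apache.flink.api.java.utils.ParameterTool

// Sketch: fall back to port 9000 instead of aborting when --port is missing
val port: Int = ParameterTool.fromArgs(args).getInt("port", 9000)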

---- My own walkthrough ----

Scala code:

package WordCount

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

/**
  * @Author : ASUS and xinrong
  * @Version : 2020/8/29
  *
  * Stream processing - WordCountStreaming
  */
object WordCountStreaming2 {
  def main(args: Array[String]): Unit = {
    // 1. The execution environment
    val environment = StreamExecutionEnvironment.getExecutionEnvironment
    // 2. Connect to the socket source
    val text = environment.socketTextStream("192.168.212.111", 9000, '\n')
    // 3. Split, group, window, and aggregate
    val windowCounts = text
      .flatMap(w => w.split(" "))
      .map(w => WordWithCounts(w, 1L)) // custom result type, defined below
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1))
      .sum("count")
    // Print
    windowCounts.print()
    // Execute
    environment.execute("Scala Window")
  }
  case class WordWithCounts(word: String, count: Long)
}
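One detail worth noting: keyBy("word") looks the field up by name at runtime, so a typo in the string only fails when the job starts. The Scala API also accepts a key-selector function, which the compiler checks; a sketch of the same pipeline fragment in that style (surrounding code assumed unchanged):

val windowCounts = text
  .flatMap(w => w.split(" "))
  .map(w => WordWithCounts(w, 1L))
  .keyBy(_.word) // compile-time-checked key selector instead of the "word" field name
  .timeWindow(Time.seconds(5), Time.seconds(1))
  .sum("count")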

Test:

First, start a local listener with the nc command:

[root@bigdata111 flink-1.6.2]# nc -l 9000

Then start the program in IDEA.

Type a line of input into nc and watch the IDEA console: each result prints via the case class's default toString, e.g. WordWithCounts(a,1), possibly prefixed by the index of the printing subtask (screenshots omitted).

Next, quickly type six a's in a row. Because the job uses a 5-second window that slides every second, each a falls into up to five overlapping windows, so the console prints a climbing count for a several times before the entries age out of the window (screenshots omitted).

2. Java code

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WordCount {
    // First open the port on your VM: nc -l 9000
    public static void main(String[] args) throws Exception {
        // The socket port to read from
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No port specified, using the default 9000");
            port = 9000;
        }

        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the socket to read the input data
        DataStreamSource<String> text = env.socketTextStream("192.168.1.52", port, "\n");

        // Process the data
        DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split("\\s");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }) // flatten each line into <word, count> records
                .keyBy("word") // group records with the same word
                .timeWindow(Time.seconds(2), Time.seconds(1)) // window size and slide interval
                .sum("count");

        // Print the result to the console
        windowCount.print()
                .setParallelism(1); // use a single parallel task
        // Note: Flink is lazily evaluated, so nothing above runs until execute() is called
        env.execute("streaming word count");
    }

    /**
     * Holds a word and the number of times it has occurred
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}

Run the test

First, start a local listener with the nc command:

[itstar@node21 ~]$ nc -l 9000

If starting the listener fails with -bash: nc: command not found, install nc first; the online install command is yum -y install nc

(You can watch port 9000 with netstat: netstat -anlp | grep 9000)

Then run the official Flink example program from IDEA.

Type input on node21 (screenshot omitted).

Cluster test

Here the official example is tested on a single machine.

[itstar@node21 flink-1.6.1]$ pwd
/opt/flink-1.6.1
[itstar@node21 flink-1.6.1]$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host node21.
Starting taskexecutor daemon on host node21.
[itstar@node21 flink-1.6.1]$ jps
StandaloneSessionClusterEntrypoint
TaskManagerRunner
Jps
[itstar@node21 flink-1.6.1]$ ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

Words are counted over 5-second time windows (processing time, tumbling windows) and printed to stdout. Monitor the TaskManager's output file and write some text into nc (each line of input is sent to Flink when you press Enter). (Screenshot omitted.)
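Note the contrast with the IDEA examples above, which used sliding windows, while this official jar uses a tumbling window. In the Flink 1.6 Scala API the difference is only the number of timeWindow arguments; a small self-contained sketch (hostname and port are placeholders):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Sketch: tumbling vs. sliding processing-time windows (Flink 1.6 Scala API)
object WindowKindsSketch {
  case class WordWithCount(word: String, count: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val keyed = env.socketTextStream("localhost", 9000)
      .flatMap(_.split("\\s"))
      .map(w => WordWithCount(w, 1L))
      .keyBy("word")
    // Tumbling: each element falls into exactly one 5-second window
    keyed.timeWindow(Time.seconds(5)).sum("count").print()
    // Sliding: a 5-second window evaluated every second, so each element
    // is counted in up to five overlapping windows
    keyed.timeWindow(Time.seconds(5), Time.seconds(1)).sum("count").print()
    env.execute("window kinds sketch")
  }
}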

---- My own walkthrough ----

Java code:

package WordCount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * @Author : ASUS and xinrong
 * @Version : 2020/8/29 & 1.0
 *
 * Stream processing - WordCountStreaming
 */
public class WordCountStreaming {
    public static void main(String[] args) throws Exception {
        // 1. Define the socket port
        int port=9000;
        // 2. The execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 3. Create the source (the Java API takes the delimiter as a String)
        DataStreamSource<String> text = env.socketTextStream("192.168.212.111", port, "\n");
        // 4. Transform the data into the custom result type
        DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String line, Collector<WordWithCount> out) throws Exception {
                for (String word : line.split(" ")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
                .timeWindow(Time.seconds(2), Time.seconds(1)) // time window (window size, slide interval)
                .sum("count");
        windowCount.print();
        env.execute("Streaming word Count"); // execute, with a job name
    }
    /**
     * Custom result type
     */
    public static class WordWithCount {
        public String word;
        public Long count;

        public WordWithCount() {
        }


        public WordWithCount(String word, Long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}

Test:

First, start a local listener with the nc command:

[root@bigdata111 flink-1.6.2]# nc -l 9000

If starting the listener fails with -bash: nc: command not found, install nc first: connect the VM to the network, then run yum -y install nc.

(nc is a tool for opening ports.)

Then run nc -l 9000 again, and run the program from IDEA.

Type input on bigdata111 (screenshot omitted).

While it runs, the IDEA console shows the window counts (screenshots omitted).