1. How a job is split into Tasks

Example:

        DataStream<String> lines = env.socketTextStream(args[0], Integer.parseInt(args[1]));
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line,
                    Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] words = line.split(" ");
                for(String word : words) {
                     Tuple2<String, Integer> tp = Tuple2.of(word, 1);
                     out.collect(tp);
                }
            }
        });
        
        SingleOutputStreamOperator<Tuple2<String, Integer>> sumned = wordAndOne.keyBy(0).sum(1);
        // Call a sink (a sink must be called)
        sumned.print();

The Flink web UI shows the job as follows:

 

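There are 3 tasks (stages) and 9 subtasks in total. The parallelism changes between source and flatMap (the socket source always runs with parallelism 1), so a task boundary is placed there. Between flatMap and keyBy, keyBy hash-partitions the data by key, so another task boundary is placed there.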
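
The counts above can be reproduced with a small model. The sketch below is plain Java, not Flink's internal chaining logic. It assumes a job parallelism of 4 (inferred from the 9 subtasks) and uses a simplified rule: two neighbouring operators share a task only when their parallelism matches and no keyBy sits between them. The names TaskSplitSketch, Op, splitIntoTasks and countSubtasks are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TaskSplitSketch {

    /** One operator of the pipeline; hashBefore marks a keyBy on its input edge. */
    static final class Op {
        final String name;
        final int parallelism;
        final boolean hashBefore;

        Op(String name, int parallelism, boolean hashBefore) {
            this.name = name;
            this.parallelism = parallelism;
            this.hashBefore = hashBefore;
        }
    }

    /** Groups a linear pipeline into tasks and returns the parallelism of each task. */
    static List<Integer> splitIntoTasks(List<Op> ops) {
        List<Integer> tasks = new ArrayList<>();
        Op prev = null;
        for (Op op : ops) {
            // Simplified rule: same parallelism and no hash partitioning in between.
            boolean chainable = prev != null
                    && prev.parallelism == op.parallelism
                    && !op.hashBefore;
            if (!chainable) {
                tasks.add(op.parallelism); // a new task starts at this operator
            }
            prev = op;
        }
        return tasks;
    }

    /** Each task runs as many subtasks as its parallelism. */
    static int countSubtasks(List<Integer> tasks) {
        int total = 0;
        for (int p : tasks) {
            total += p;
        }
        return total;
    }

    /** The word count job from the example, with an assumed parallelism of 4. */
    static List<Op> wordCount() {
        return Arrays.asList(
                new Op("socket source", 1, false),
                new Op("flatMap", 4, false),
                new Op("sum", 4, true), // keyBy(0) hash-partitions the input of sum
                new Op("print sink", 4, false));
    }

    public static void main(String[] args) {
        List<Integer> tasks = splitIntoTasks(wordCount());
        System.out.println(tasks.size() + " tasks, " + countSubtasks(tasks) + " subtasks");
    }
}
```

Under these assumptions the model yields 3 tasks with parallelism 1, 4 and 4, which adds up to 9 subtasks.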

2. startNewChain and disableChaining (methods that change how tasks are split)

Official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/runtime.html

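startNewChain: starts a new chain beginning with this operator. The chain is cut just before this operator, so it is no longer chained to its upstream operator, while the operators after it can still chain to it.

disableChaining: isolates this operator into a Task of its own. It is not chained with the operators before or after it.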
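
The two definitions above can be modeled in a few lines of plain Java. This is a simplified sketch, not Flink's implementation: the class ChainSketch and the enum Chaining are hypothetical stand-ins, and the model assumes that all operators have the same parallelism and forward connections, so only the chaining setting decides where a chain is cut.

```java
import java.util.ArrayList;
import java.util.List;

class ChainSketch {

    /** Stand-in for an operator's chaining setting (hypothetical, for illustration). */
    enum Chaining {
        ALWAYS, // default: may chain to the operator before it
        HEAD,   // models startNewChain(): begins a chain, later operators may join it
        NEVER   // models disableChaining(): chains with nothing on either side
    }

    /** Splits a linear run of operators into chains according to their settings. */
    static List<List<String>> buildChains(String[] names, Chaining[] settings) {
        List<List<String>> chains = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            // An operator joins the previous chain only if it allows chaining
            // and the operator before it has not disabled chaining.
            boolean joinsPrevious = i > 0
                    && settings[i] == Chaining.ALWAYS
                    && settings[i - 1] != Chaining.NEVER;
            if (!joinsPrevious) {
                chains.add(new ArrayList<String>());
            }
            chains.get(chains.size() - 1).add(names[i]);
        }
        return chains;
    }

    public static void main(String[] args) {
        String[] ops = {"flatMap", "filter", "map"};
        // Default settings
        System.out.println(buildChains(ops,
                new Chaining[] {Chaining.ALWAYS, Chaining.ALWAYS, Chaining.ALWAYS}));
        // filter marked as the head of a new chain
        System.out.println(buildChains(ops,
                new Chaining[] {Chaining.ALWAYS, Chaining.HEAD, Chaining.ALWAYS}));
        // filter with chaining disabled
        System.out.println(buildChains(ops,
                new Chaining[] {Chaining.ALWAYS, Chaining.NEVER, Chaining.ALWAYS}));
    }
}
```

In this model the default settings give one chain holding flatMap, filter and map. Marking filter as HEAD gives two chains (flatMap alone, then filter and map together), and marking it as NEVER gives three chains of one operator each.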

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        
        DataStream<String> lines = env.socketTextStream(args[0], Integer.parseInt(args[1]));
        SingleOutputStreamOperator<String> word = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String line,
                    Collector<String> out) throws Exception {
                String[] words = line.split(" ");
                for(String word : words) {
                     out.collect(word);
                }
            }
        });
        
        SingleOutputStreamOperator<String> filterd = word.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.startsWith("h");
            }
        });
        
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = filterd.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                return Tuple2.of(value, 1);
            }
        });
        
        SingleOutputStreamOperator<Tuple2<String, Integer>> sumned = wordAndOne.keyBy(0).sum(1);
        sumned.print();
        env.execute("StreamWordCount");
    }

Job plan graph:

 

After calling startNewChain on the filter operator:

        SingleOutputStreamOperator<String> filterd = word.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.startsWith("h");
            }
        }).startNewChain();

Job plan graph:

 

After calling disableChaining on the filter operator:

        SingleOutputStreamOperator<String> filterd = word.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return value.startsWith("h");
            }
        }).disableChaining();

Job plan graph:

3. Slot sharing groups

 

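If no slot sharing group name is set, it defaults to "default". Once a name is set, the current operator and the operators after it use the new slot sharing group. Example: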

        SingleOutputStreamOperator<String> word = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String line,
                    Collector<String> out) throws Exception {
                String[] words = line.split(" ");
                for(String word : words) {
                     out.collect(word);
                }
            }
        }).slotSharingGroup("doit");

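After the slot sharing group name is changed, the parallelism must be reduced to 3 before the job can run: there are only 4 slots in total, and the source still occupies one slot in the default group.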
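
The slot arithmetic behind this can be checked with a short calculation. Operators in the same slot sharing group can share a slot, so each group needs as many slots as its highest parallelism, and the job needs the sum over all groups. The sketch below is plain Java with hypothetical names (SlotSketch, requiredSlots, wordCountSlots). It assumes the socket source runs with parallelism 1 in the "default" group and that every operator from flatMap onward runs in the "doit" group.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlotSketch {

    /** Sums, over all slot sharing groups, the highest parallelism found in each group. */
    static int requiredSlots(String[] groups, int[] parallelism) {
        Map<String, Integer> maxPerGroup = new LinkedHashMap<>();
        for (int i = 0; i < groups.length; i++) {
            maxPerGroup.merge(groups[i], parallelism[i], Integer::max);
        }
        int total = 0;
        for (int p : maxPerGroup.values()) {
            total += p;
        }
        return total;
    }

    /** Slots needed by the example job when flatMap and later operators run with parallelism p. */
    static int wordCountSlots(int p) {
        // Order: source, flatMap, filter, map, sum, print
        String[] groups = {"default", "doit", "doit", "doit", "doit", "doit"};
        int[] parallelism = {1, p, p, p, p, p};
        return requiredSlots(groups, parallelism);
    }

    public static void main(String[] args) {
        int availableSlots = 4; // taken from the text above
        for (int p = 4; p >= 3; p--) {
            int needed = wordCountSlots(p);
            System.out.println("parallelism " + p + " needs " + needed
                    + " slots, fits: " + (needed <= availableSlots));
        }
    }
}
```

With parallelism 4 the job would need 1 + 4 = 5 slots, one more than the 4 available. With parallelism 3 it needs 1 + 3 = 4 slots, which fits.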