1. Serial Source (socketTextStream, fromElements, fromCollection)
import java.util.Arrays;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Sources with a parallelism of 1
 */
public class SourceDemo {
    public static void main(String[] args) throws Exception {
        // Streaming job: create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A Source creates the initial abstract dataset;
        // DataStream is that abstract dataset.

        // 1. socketTextStream
        // DataStream<String> socketTextStream = env.socketTextStream("localhost", 8888);

        // 2. fromElements parallelizes a client-side collection into an abstract dataset,
        //    usually for tests and experiments. It is bounded: although this is a streaming
        //    program, it exits once all the data has been processed.
        // DataStream<Integer> nums = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9);

        // 3. fromCollection
        DataStream<Integer> nums = env.fromCollection(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));

        // Parallelism of the source
        System.out.println("=====================source parallelism==========:" + nums.getParallelism());

        SingleOutputStreamOperator<Integer> filtered = nums.filter(new FilterFunction<Integer>() {
            @Override
            public boolean filter(Integer num) throws Exception {
                return num % 2 == 0;
            }
        });

        // Parallelism of the transformation
        System.out.println("=====================transformation parallelism==========:" + filtered.getParallelism());

        filtered.print();
        env.execute("SourceDemo");
    }
}
The program prints:
=====================source parallelism==========:1
=====================transformation parallelism==========:4
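Why the two numbers differ: fromCollection (like socketTextStream and fromElements) is a non-parallel source, so it always runs with parallelism 1, while the filter defaults to the environment's parallelism (here 4, the number of local cores). A minimal sketch of overriding those defaults with setParallelism; the class name and the chosen values are illustrative, not from the original:

import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Default parallelism for every operator in this job (2 is arbitrary)
        env.setParallelism(2);

        DataStream<Integer> nums = env.fromCollection(Arrays.asList(1, 2, 3, 4));
        // Note: a non-parallel source keeps parallelism 1 regardless of the
        // environment setting; asking for more would fail when the job graph is built.

        SingleOutputStreamOperator<Integer> filtered = nums
                .filter(num -> num % 2 == 0)
                .setParallelism(1); // per-operator override

        filtered.print();
        env.execute("ParallelismDemo");
    }
}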
2. Parallel Source (fromParallelCollection, generateSequence)
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// fromParallelCollection
DataStream<Long> nums = env.fromParallelCollection(new NumberSequenceIterator(1, 10), Long.class);
// DataStream<Long> nums = env.generateSequence(1, 100);

// Parallelism of the source
System.out.println("=====================source parallelism==========:" + nums.getParallelism());

SingleOutputStreamOperator<Long> filtered = nums.filter(new FilterFunction<Long>() {
    @Override
    public boolean filter(Long num) throws Exception {
        return num % 2 == 0;
    }
});

// Parallelism of the transformation
System.out.println("=====================transformation parallelism==========:" + filtered.getParallelism());

filtered.print();
env.execute("SourceDemo");
The program prints:
=====================source parallelism==========:4
=====================transformation parallelism==========:4
This is a bounded stream, so the program stops automatically once all the data has been processed. The sketch below shows how the bounded range is split across the parallel subtasks.
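To make the split visible, each element can be tagged with the subtask that processes it (the map is chained to the source, so its subtask index matches the source's). A small sketch; the RichMapFunction and the output format are illustrative, not from the original:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.NumberSequenceIterator;

public class ParallelSourceSplitDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> nums = env.fromParallelCollection(new NumberSequenceIterator(1, 10), Long.class);

        // Tag each element with the index of the subtask that handles it,
        // making the split of the 1..10 range across subtasks visible
        nums.map(new RichMapFunction<Long, String>() {
            @Override
            public String map(Long value) {
                int subtask = getRuntimeContext().getIndexOfThisSubtask();
                return "subtask " + subtask + " -> " + value;
            }
        }).print();

        env.execute("ParallelSourceSplitDemo");
    }
}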
3. Parallel Source (readTextFile)
DataStream<String> lines = env.readTextFile(args[0]);
As above, this source is also parallel; if no parallelism is set explicitly, the default is tied to the number of CPU cores on the local machine. It is also a bounded stream: the program stops automatically once the file has been fully read.
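A minimal runnable sketch around readTextFile, assuming the file (or directory) path is passed as the first program argument, as above; the class name is illustrative:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadTextFileDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded source: reads the file once, then the job finishes
        DataStream<String> lines = env.readTextFile(args[0]);
        System.out.println("=====================source parallelism==========:" + lines.getParallelism());

        lines.print();
        env.execute("ReadTextFileDemo");
    }
}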
4. Parallel Source (reading from Kafka)
Add the dependency:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    Properties props = new Properties();
    // Kafka broker address
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.87.131:9092");
    // Consumer group ID
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaSource");
    // If no offset has been committed yet, start consuming from the earliest record
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    // Let the Kafka consumer commit offsets automatically
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");

    // KafkaSource
    FlinkKafkaConsumer<String> kafkaSource =
            new FlinkKafkaConsumer<String>("flinktest1", new SimpleStringSchema(), props);
    DataStream<String> lines = env.addSource(kafkaSource);
    System.out.println("=================kafkaSource parallelism " + lines.getParallelism() + "================");

    // Sink
    lines.print();
    env.execute("KafkaSource");
}
The program prints:
=================kafkaSource parallelism 4================
Start a Kafka producer and write some data; the program receives and prints it automatically. (Note that the topic created below has only 2 partitions while the source parallelism is 4, so only two of the four source subtasks actually receive data.)
-- Create the topic
bin/kafka-topics.sh --create --zookeeper 192.168.87.130:2181,192.168.87.131:2181,192.168.87.132:2181 --partitions 2 --replication-factor 2 --topic flinktest1

-- Produce data
bin/kafka-console-producer.sh --broker-list 192.168.87.131:9092 --topic flinktest1
>hello
>hello spring flink
Because enable.auto.commit is set, offsets are committed automatically: if the Flink program is stopped while the Kafka producer keeps writing, restarting it resumes consumption from the first unconsumed record rather than from the beginning.
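With auto-commit, offsets are committed on the Kafka consumer's own timer, independently of Flink's state. A common alternative, sketched below under the assumption that checkpointing is acceptable for the job (the class name and the 5-second interval are illustrative), is to disable auto-commit and commit offsets when a Flink checkpoint completes:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class KafkaSourceWithCheckpoints {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 5 seconds (interval chosen only for illustration)
        env.enableCheckpointing(5000);

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.87.131:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaSource");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Disable the consumer's own auto-commit; Flink will manage offsets
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        FlinkKafkaConsumer<String> kafkaSource =
                new FlinkKafkaConsumer<>("flinktest1", new SimpleStringSchema(), props);
        // Commit offsets back to Kafka only when a checkpoint completes
        kafkaSource.setCommitOffsetsOnCheckpoints(true);

        env.addSource(kafkaSource).print();
        env.execute("KafkaSourceWithCheckpoints");
    }
}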