Avro Serialization Framework
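1. Introduction to Avro
- Avro began as a subproject of Hadoop and is now a top-level Apache project
- Avro is a high-performance middleware based on binary data transfer [it has also been used for data transfer between clients and servers in HBase and Hive]
- Avro serializes data, which makes it suitable for exchanging large volumes of data both remotely and locally
- Avro can dynamically load a defined data structure (Schema), which improves performance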
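To illustrate the last point, here is a minimal sketch of loading a schema at runtime instead of compiling it into a class. It assumes the org.apache.avro:avro dependency (version 1.8.2, as used in the demo later in this article) is on the classpath. The Point schema, its fields, and the DynamicSchemaSketch class are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

final class DynamicSchemaSketch {

    // Hypothetical schema, defined as a string so it can be loaded at runtime.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Point\",\"namespace\":\"example\","
      + "\"fields\":[{\"name\":\"label\",\"type\":\"string\"},"
      + "{\"name\":\"x\",\"type\":\"int\"}]}";

    /** Serializes one record with a runtime-parsed schema, then reads it back. */
    public static GenericRecord roundTrip(String label, int x) {
        try {
            // Parse the schema at runtime; no generated Java class is involved.
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            // Fill a generic record by field name.
            GenericRecord record = new GenericData.Record(schema);
            record.put("label", label);
            record.put("x", x);

            // Encode the record to Avro binary in memory.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();

            // Decode the bytes again using the same schema.
            BinaryDecoder decoder =
                DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecord copy = roundTrip("origin", 7);
        // String fields come back as Avro's Utf8 type, so convert explicitly.
        System.out.println(copy.get("label").toString() + " " + copy.get("x"));
    }
}
```

The generic API trades compile-time type safety for flexibility: fields are accessed by name, which is convenient when the schema is only known at runtime.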
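2. Features of Avro
- A rich set of data types: 8 primitive types (null, boolean, int, long, float, double, bytes, string) and 6 complex types (record, enum, array, map, union, fixed)
- A compact, fast binary data format
- A container file format for persisting data
- A remote procedure call (RPC) framework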
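Part of what makes the binary format compact is that, per the Avro specification, int and long values are written with variable-length zig-zag coding, so small values take a single byte. The sketch below reimplements that encoding for int using only the JDK, purely to show the idea. ZigZagSketch and its method names are made up for this illustration; real code would rely on Avro's own encoders.

```java
import java.io.ByteArrayOutputStream;

final class ZigZagSketch {

    /** Encodes an int as zig-zag followed by a base-128 variable-length integer. */
    public static byte[] encodeInt(int value) {
        // Zig-zag maps values of small magnitude to small unsigned values:
        // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        int zigzag = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, least significant group first.
        // A set high bit means that more bytes follow.
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80);
            zigzag >>>= 7;
        }
        out.write(zigzag);
        return out.toByteArray();
    }

    /** Reverses encodeInt. */
    public static int decodeInt(byte[] bytes) {
        int zigzag = 0;
        int shift = 0;
        for (byte b : bytes) {
            zigzag |= (b & 0x7F) << shift;
            shift += 7;
        }
        // Undo the zig-zag mapping.
        return (zigzag >>> 1) ^ -(zigzag & 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, 25, 6500, Integer.MAX_VALUE};
        for (int sample : samples) {
            byte[] encoded = encodeInt(sample);
            System.out.println(sample + " -> " + encoded.length
                + " byte(s), decoded back to " + decodeInt(encoded));
        }
    }
}
```

With this scheme a value such as an age of 25 fits in one byte and a salary of 6500 in two, instead of the fixed four bytes of a plain Java int.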
Getting-started demo
1. Create a Maven project and add the dependencies to pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.blb</groupId>
  <artifactId>Avro</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>Avro</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.avro/avro-tools -->
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-tools</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-compiler</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-ipc</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.8.2</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>schema</goal>
            </goals>
            <configuration>
              <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
              <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
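2. Create the User.avsc file in the configured directory (src/main/avro/, the sourceDirectory set for avro-maven-plugin in the POM above)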
{
  "namespace": "com.blb",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "int"},
    {"name": "salary", "type": "int"},
    {"name": "age", "type": "int"},
    {"name": "address", "type": "string"}
  ]
}
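3. Use the avro-maven-plugin to generate a Java class from the .avsc file. Because the plugin's schema goal is bound to the generate-sources phase in the POM above, running mvn generate-sources (or any later phase, such as mvn compile) generates the com.blb.User class under src/main/java/.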
4. Serialization example
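// Imports needed by the enclosing JUnit 4 test class
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;
import com.blb.User; // class generated from User.avsc

/**
 * Serialization test
 */
@Test
public void write() {
    // Initialize the User objects
    User user = new User("张三", 1, 6500, 25, "云南");
    User user1 = new User("李四", 2, 7000, 20, "湖北");
    DatumWriter<User> dw = new SpecificDatumWriter<>(User.class);
    // try-with-resources closes the writer even if an append fails
    try (DataFileWriter<User> dfw = new DataFileWriter<>(dw)) {
        // Create the underlying output file:
        //   schema - the schema of the class being serialized
        //   file   - the output file
        dfw.create(user.getSchema(), new File("G://hadoop//Avro//src//test//user.txt"));
        dfw.append(user);
        dfw.append(user1);
    } catch (IOException e) {
        // IOException covers more than a missing file, so report the actual cause
        System.out.println("Failed to write the Avro file: " + e.getMessage());
    }
}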
5. Deserialization test
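// Imports needed by the enclosing JUnit 4 test class
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;
import org.junit.Test;
import com.blb.User; // class generated from User.avsc

/**
 * Deserialization test
 */
@Test
public void read() {
    DatumReader<User> dr = new SpecificDatumReader<>(User.class);
    File file = new File("G://hadoop//Avro//src//test//user.txt");
    // try-with-resources closes the reader when iteration finishes or fails
    try (DataFileReader<User> dfr = new DataFileReader<>(file, dr)) {
        // Iterate over the records stored in the file
        while (dfr.hasNext()) {
            System.out.println(dfr.next());
        }
    } catch (IOException e) {
        // IOException covers more than a missing file, so report the actual cause
        System.out.println("Failed to read the Avro file: " + e.getMessage());
    }
}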