Avro Serialization Framework

1. Introduction to Avro

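  • Avro started as a subproject of Apache Hadoop (it has since become a top-level Apache project)
  • Avro is high-performance middleware built on binary data transfer (HBase and Hive can also use it to move data between client and server)
  • Avro serializes data and is well suited to exchanging large volumes of data, both remotely and locally
  • Avro can load a schema (the definition of a data structure) dynamically at runtime, so generated code is optional, which helps performance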

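2. Avro Features

  • Rich data types: 8 primitive types (null, boolean, int, long, float, double, bytes, string) and 6 complex types (record, enum, array, map, union, fixed)
  • A fast, compact (and compressible) binary data format
  • Container files for persisting data
  • Remote procedure call (RPC) support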
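
To make the "compact binary" point concrete: according to the Avro specification, int and long values are written with zig-zag coding followed by a base-128 variable-length encoding, so values with a small magnitude occupy a single byte. The class below is a minimal, JDK-only sketch of that rule (ZigZagVarint and encodeLong are hypothetical names, not part of the Avro library); in real code Avro's BinaryEncoder does this for you.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical sketch of how Avro's binary encoding writes int and long values:
 * zig-zag mapping followed by a base-128 variable-length encoding.
 */
public class ZigZagVarint {

    /** Encodes a value the way the Avro spec describes for int and long. */
    public static byte[] encodeLong(long n) {
        // Zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so small magnitudes stay small
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits at a time, lowest group first; the high bit means "more bytes follow"
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] samples = {0, -1, 1, 63, 64, 6500, -6500};
        for (long v : samples) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(v)) {
                hex.append(String.format("%02X ", b));
            }
            System.out.println(v + " -> " + hex.toString().trim());
        }
    }
}
```

Running main prints lines such as 0 -> 00, -1 -> 01, 64 -> 80 01 and 6500 -> C8 65: the sign is folded into the lowest bit, and each byte's high bit signals that another byte follows.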

Getting-started demo

1. Create a Maven project and add the dependencies to pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.blb</groupId>
  <artifactId>Avro</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>Avro</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.avro/avro-tools -->
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-tools</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-compiler</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-ipc</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.8.2</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>schema</goal>
            </goals>
            <configuration>
              <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
              <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

2. Create the User.avsc file in the directory the plugin reads schemas from (src/main/avro/)

{
  "namespace": "com.blb",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name","type": "string"},
    {"name": "id","type": "int"},
    {"name": "salary","type": "int"},
    {"name": "age","type": "int"},
    {"name": "address","type": "string"}
  ]
}
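
The schema also fixes the byte layout. Per the Avro specification, a record is encoded as its field values in declaration order, with no field names or tags; a string is its UTF-8 byte length (as a long) followed by the bytes, and int/long values use zig-zag varints. The sketch below hand-encodes one User record following User.avsc using only the JDK (UserBinaryLayout and its methods are hypothetical illustration names; real code lets SpecificDatumWriter do this).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the Avro binary layout of one User record:
 * fields written in schema order, without field names or tags.
 */
public class UserBinaryLayout {

    // Zig-zag + base-128 varint, used by Avro for both int and long
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: UTF-8 byte length as a long, then the UTF-8 bytes
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Encodes the fields in the order declared in User.avsc: name, id, salary, age, address. */
    public static byte[] encodeUser(String name, int id, int salary, int age, String address) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, id);
        writeLong(out, salary);
        writeLong(out, age);
        writeString(out, address);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "张三" and "云南", written as Unicode escapes so the source compiles under any encoding
        byte[] bytes = encodeUser("\u5f20\u4e09", 1, 6500, 25, "\u4e91\u5357");
        System.out.println(bytes.length + " bytes");
    }
}
```

For the first user in the serialization example this prints 18 bytes. Because field names never appear in the data, a reader cannot decode it without the writer's schema, which is why an Avro container file stores the schema in its header.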

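3. Generate the Java class from the .avsc file with avro-maven-plugin: the plugin's schema goal is bound to the generate-sources phase, so running mvn generate-sources (or mvn compile) writes com/blb/User.java into src/main/java/ (the package comes from the schema's namespace)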

4. Serialization example

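import com.blb.User;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test;

import java.io.File;
import java.io.IOException;

public class AvroWriteTest {

    /**
     * Serialization test
     */
    @Test
    public void write() throws IOException {
        // Create User objects with the all-args constructor of the generated class
        User user = new User("张三", 1, 6500, 25, "云南");
        User user1 = new User("李四", 2, 7000, 20, "湖北");

        // SpecificDatumWriter serializes instances of the generated (specific) class
        DatumWriter<User> dw = new SpecificDatumWriter<>(User.class);

        // try-with-resources flushes and closes the file even if append() fails
        try (DataFileWriter<User> dfw = new DataFileWriter<>(dw)) {
            // create() opens the underlying output file and writes the container header
            //   schema - the schema of the class being serialized
            //   file   - the output path (adjust to your environment)
            dfw.create(user.getSchema(), new File("G://hadoop//Avro//src//test//user.txt"));
            dfw.append(user);
            dfw.append(user1);
        }
    }
}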
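
Despite the .txt name, the output is not a text file: DataFileWriter produces an Avro object container file, whose header starts with the four bytes 'O', 'b', 'j', 1, followed by metadata (including the writer's schema) and a sync marker. The JDK-only sketch below checks for that magic; AvroMagicCheck and looksLikeAvroContainer are hypothetical names, not Avro APIs.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical helper: checks for the 4-byte magic that starts every Avro object container file.
 */
public class AvroMagicCheck {

    // 'O', 'b', 'j', followed by the format version byte 1
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    public static boolean looksLikeAvroContainer(File file) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = new FileInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop until 4 bytes or EOF
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the magic
                }
                read += n;
            }
        }
        return Arrays.equals(header, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Default path mirrors the serialization test; pass another path as the first argument
        File file = new File(args.length > 0 ? args[0] : "G://hadoop//Avro//src//test//user.txt");
        System.out.println(file + " is an Avro container file: " + looksLikeAvroContainer(file));
    }
}
```

Pointed at the file written by the serialization test it prints true, while an ordinary text file prints false.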

5. Deserialization example

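import com.blb.User;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;
import org.junit.Test;

import java.io.File;
import java.io.IOException;

public class AvroReadTest {

    /**
     * Deserialization test (run the serialization test first so the file exists)
     */
    @Test
    public void read() throws IOException {
        // SpecificDatumReader maps each record back onto the generated User class
        DatumReader<User> dr = new SpecificDatumReader<>(User.class);
        // The writer's schema is read from the file header; try-with-resources closes the file
        try (DataFileReader<User> dfr =
                     new DataFileReader<>(new File("G://hadoop//Avro//src//test//user.txt"), dr)) {
            // Iterate over the records in the file
            while (dfr.hasNext()) {
                System.out.println(dfr.next());
            }
        }
    }
}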

 

posted @ 2020-04-13 20:08  itch