Software Engineering Assignment 2: Individual Project [Paper Plagiarism Check]

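Assignment Overview

Course this assignment belongs to: Software Engineering
Where the assignment requirements are posted: Individual Project
Goal of this assignment: complete the paper plagiarism-check project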

GitHub

https://github.com/Who-i-m/3121004978/tree/第二次作业_论文查重

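PSP Table

| PSP2.1 | Personal Software Process Stages | Estimated time (min) | Actual time (min) |
| --- | --- | --- | --- |
| Planning | Planning | 75 | 60 |
| Estimate | Estimate how long the task will take | 10 | 10 |
| Development | Development | 500 | 550 |
| Analysis | Requirements analysis | 400 | 300 |
| Design Spec | Write the design document | 20 | 20 |
| Design Review | Design review | 10 | 10 |
| Coding Standard | Coding standards | 30 | 35 |
| Design | Detailed design | 60 | 80 |
| Coding | Implementation | 400 | 350 |
| Code Review | Code review | 100 | 50 |
| Test | Testing | 100 | 80 |
| Reporting | Reporting | 100 | 100 |
| Test Report | Test report | 30 | 30 |
| Size Measurement | Workload measurement | 50 | 50 |
| Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 60 | 60 |
| Total | | 1945 | 1785 |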

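Interface Design and Implementation

1. Requirements Analysis

Design a plagiarism-checking algorithm for papers. Given a file containing the original paper and a file containing a plagiarized version of it (produced by adding, deleting, and modifying text in the original), write the duplication rate to an answer file.

Input and output must go through files, as specified below:

Passed as a command-line argument: the absolute path of the original paper file.

Passed as a command-line argument: the absolute path of the plagiarized paper file.

Passed as a command-line argument: the absolute path of the answer file to write.

Note: the value written to the answer file is a floating-point number, accurate to two decimal places.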
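The input/output rules above can be sketched as follows. This is a minimal illustration, not the project's actual entry point: the class name `CliSketch`, the helper names, and the argument order (original, plagiarized, answer) are assumptions, and the similarity computed in `main` is only a placeholder for the real SimHash-based calculation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical entry point; names and argument order are assumptions.
final class CliSketch {

    /** Formats a similarity with exactly two decimal places, e.g. 0.8 becomes "0.80". */
    static String formatAnswer(double similarity) {
        return String.format(Locale.ROOT, "%.2f", similarity);
    }

    /** Checks that exactly three arguments were given and converts them to paths. */
    static Path[] parseArgs(String[] args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: <original file> <plagiarized file> <answer file>");
        }
        return new Path[] {Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2])};
    }

    public static void main(String[] args) throws IOException {
        Path[] paths = parseArgs(args);
        String original = new String(Files.readAllBytes(paths[0]), StandardCharsets.UTF_8);
        String suspect = new String(Files.readAllBytes(paths[1]), StandardCharsets.UTF_8);
        // Placeholder only: the real program would compare SimHash fingerprints here.
        double similarity = original.equals(suspect) ? 1.0 : 0.0;
        Files.write(paths[2], formatAnswer(similarity).getBytes(StandardCharsets.UTF_8));
    }
}
```

Using `Locale.ROOT` keeps the decimal separator a dot regardless of the machine's locale, which matters when the answer file is checked automatically.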

2. Program Design Flow

image

3. Development Environment

Language: Java 8
IDE: IntelliJ IDEA 2023

4. Project Dependencies

Build tool: Maven
Unit testing dependency: JUnit 4.12

<dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
</dependency>

Profiling tool: JProfiler 11
External JAR dependency: HanLP, a Chinese language processing package

<dependency>
      <groupId>com.hankcs</groupId>
      <artifactId>hanlp</artifactId>
      <version>portable-1.5.4</version>
</dependency>

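5. Interface Implementation

5.1 TXT file read/write module

Class: IOUtils
Methods:
1. read: reads the contents of a txt file and returns them as a String;
2. write: writes a number to a txt file.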
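A minimal sketch of what such a read/write helper could look like is shown below. It is not the project's `IOUtils`: the class name `IOUtilsSketch`, the UTF-8 encoding, the append-on-write behavior, and the choice to rethrow failures as `UncheckedIOException` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

// Hypothetical stand-in for IOUtils; behavior on errors is an assumption.
final class IOUtilsSketch {

    /** Reads the whole file as UTF-8 text. */
    static String read(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read " + path, e);
        }
    }

    /** Appends the value, formatted with two decimals, as one line of the file. */
    static void write(double value, String path) {
        String line = String.format(Locale.ROOT, "%.2f", value) + System.lineSeparator();
        try {
            Files.write(Paths.get(path), line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException("cannot write " + path, e);
        }
    }
}
```

Appending rather than overwriting lets repeated `write` calls on the same path accumulate one result per line; a program that must leave exactly one answer in the file would truncate instead.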

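5.2 SimHash computation module

Class: SimHashUtils
Methods:
1. getHash: takes a String, computes its hash with MD5, and returns it as a string;
2. getSimHash: takes a String, computes its SimHash value, and returns it as a string.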
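The two methods can be illustrated with the sketch below. It is an approximation under stated assumptions, not the project's code: the class name `SimHashSketch` is hypothetical, the fingerprint is taken to be the 128 bits of an MD5 digest written as a string of '0' and '1', and `getSimHash` here accepts an already tokenized list with every token weighted 1, whereas the project takes a raw String and relies on HanLP for word segmentation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch; tokenization and weighting are simplified assumptions.
final class SimHashSketch {
    static final int BITS = 128; // an MD5 digest is 128 bits long

    /** Returns the MD5 digest of the token as a 128-character binary string. */
    static String getHash(String token) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(token.getBytes(StandardCharsets.UTF_8));
            String bits = new BigInteger(1, digest).toString(2);
            // Left-pad with zeros so every hash has exactly BITS characters.
            StringBuilder sb = new StringBuilder(BITS);
            for (int i = bits.length(); i < BITS; i++) {
                sb.append('0');
            }
            return sb.append(bits).toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Merges the token hashes: each bit position is decided by a majority vote. */
    static String getSimHash(List<String> tokens) {
        int[] votes = new int[BITS];
        for (String token : tokens) {
            String hash = getHash(token);
            for (int i = 0; i < BITS; i++) {
                // A '1' bit votes +1 and a '0' bit votes -1 (weight 1 per occurrence).
                votes[i] += hash.charAt(i) == '1' ? 1 : -1;
            }
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int vote : votes) {
            sb.append(vote > 0 ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Because each bit is a vote over all tokens, two texts that share most of their tokens end up with fingerprints that differ in only a few positions, which is what makes a bitwise distance meaningful as a similarity measure.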

5.3 Hamming distance module

Class: HammingUtils
Methods:
1. getHammingDistance: takes two SimHash values and computes their Hamming distance;
2. getSimilarity: takes two SimHash values and returns their similarity.
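As a rough sketch of these two methods, assuming fingerprints are equal-length strings of '0' and '1' and that similarity is defined as one minus the fraction of differing bits. The class name `HammingSketch` and the decision to throw on mismatched lengths are assumptions; the project's `HammingUtils` may handle that case differently.

```java
// Hypothetical sketch of the Hamming distance and similarity calculation.
final class HammingSketch {

    /** Counts the positions at which the two fingerprints differ. */
    static int getHammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("fingerprints must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Returns 1 - distance / length, so identical fingerprints give 1.0. */
    static double getSimilarity(String a, String b) {
        int distance = getHammingDistance(a, b);
        return a.isEmpty() ? 1.0 : 1.0 - (double) distance / a.length();
    }
}
```

For 128-bit fingerprints, a distance of 16 would give a similarity of 1 - 16/128 = 0.875 under this definition.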

6. Program Testing

6.1 Performance Analysis

Memory usage
image

Method calls
image

7. Unit Tests

7.1 IOUtilsTest

package com.genhang.utils;

import org.junit.Test;

public class IOUtilsTest {
    @Test
    public void readTest() {
        // Path exists: the read succeeds
        String str = IOUtils.read("src/main/resources/orig.txt");
        String[] strings = str.split(" ");
        for (String string : strings) {
            System.out.println(string);
        }
    }

    @Test
    public void writeTest() {
        // Path exists: the write succeeds
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (int i = 0; i < elem.length; i++) {
            IOUtils.write(elem[i], "src/main/resources/ans.txt");
        }
    }

    @Test
    public void readFailTest() {
        // Path does not exist: the read fails
        String str = IOUtils.read("src/main/resources/none.txt");
    }

    @Test
    public void writeFailTest() {
        // Invalid path: the write fails
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (int i = 0; i < elem.length; i++) {
            IOUtils.write(elem[i], "src/main/resources/ans.txt");
        }
    }
}

image

7.2 SimHashUtilsTest

package com.genhang.utils;

import org.junit.Test;

public class SimHashUtilsTest {
    @Test
    public void getHashTest(){
        String[] strings = {"余华", "是", "一个", "有趣", "的", "作家"};
        for (String string : strings) {
            String stringHash = SimHashUtils.getHash(string);
            System.out.println(stringHash.length());
            System.out.println(stringHash);
        }
    }

    @Test
    public void getSimHashTest(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_add.txt");
        System.out.println(SimHashUtils.getSimHash(str0));
        System.out.println(SimHashUtils.getSimHash(str1));
    }
}

image

7.3 HammingUtilsTest

package com.genhang.utils;

import org.junit.Test;

@SuppressWarnings("all")
public class HammingUtilsTest {
    @Test
    public void getHammingDistanceTest() {
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_add.txt");
        int distance = HammingUtils.getHammingDistance(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        System.out.println("Hamming distance: " + distance);
        System.out.println("Similarity: " + (100 - distance * 100 / 128) + "%");
    }

    @Test
    public void getHammingDistanceFailTest() {
        // Test the case where str0.length() != str1.length()
        String str0 = "10101010";
        String str1 = "1010101";
        System.out.println(HammingUtils.getHammingDistance(str0, str1));
    }

    @Test
    public void getSimilarityTest() {
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_add.txt");
        int distance = HammingUtils.getHammingDistance(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        double similarity = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        System.out.println("Hamming distance between str0 and str1: " + distance);
        System.out.println("Similarity between str0 and str1: " + similarity);
    }
}

image

7.4 MainTest

package com.genhang.main;

import com.genhang.utils.HammingUtils;
import com.genhang.utils.IOUtils;
import com.genhang.utils.SimHashUtils;
import org.junit.Test;

public class MainTest {
    @Test
    public void origAndAllTest(){
        String[] str = new String[6];
        str[0] = IOUtils.read("src/main/resources/orig.txt");
        str[1] = IOUtils.read("src/main/resources/orig_0.8_add.txt");
        str[2] = IOUtils.read("src/main/resources/orig_0.8_del.txt");
        str[3] = IOUtils.read("src/main/resources/orig_0.8_dis_1.txt");
        str[4] = IOUtils.read("src/main/resources/orig_0.8_dis_10.txt");
        str[5] = IOUtils.read("src/main/resources/orig_0.8_dis_15.txt");
        String ansFileName = "src/main/resources/ansAll.txt";
        for(int i = 0; i <= 5; i++){
            double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str[0]), SimHashUtils.getSimHash(str[i]));
            IOUtils.write(ans, ansFileName);
        }
    }

    @Test
    public void origAndOrigTest(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig.txt");
        String ansFileName = "src/main/resources/ansOrigAndOrigTest.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }

    @Test
    public void origAndAddTest(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_add.txt");
        String ansFileName = "src/main/resources/ansOrigAndAddTest.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }

    @Test
    public void origAndDelTest(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_del.txt");
        String ansFileName = "src/main/resources/ansOrigAndDelTest.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }

    @Test
    public void origAndDis1Test(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_dis_1.txt");
        String ansFileName = "src/main/resources/ansOrigAndDis1Test.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }

    @Test
    public void origAndDis10Test(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_dis_10.txt");
        String ansFileName = "src/main/resources/ansOrigAndDis10Test.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }

    @Test
    public void origAndDis15Test(){
        String str0 = IOUtils.read("src/main/resources/orig.txt");
        String str1 = IOUtils.read("src/main/resources/orig_0.8_dis_15.txt");
        String ansFileName = "src/main/resources/ansOrigAndDis15Test.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        IOUtils.write(ans, ansFileName);
    }
}


image

7.5 Exception Test: ShortStringExceptionTest

image

posted @ 2023-09-17 15:23  一梦见浮生