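Software Engineering Assignment 2: Individual Project
Assignment Overview
| Course this assignment belongs to | Software Engineering |
|---|---|
| Assignment requirements | Individual Project |
| Goal of this assignment | Complete the individual project: design a plagiarism-detection algorithm for papers |
GitHub link
PSP Table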
| PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes) |
|---|---|---|---|
| Planning | Planning | 20 | 20 |
| · Estimate | · Estimate how long the task will take | 340 | 360 |
| Development | Development | 300 | 320 |
| · Analysis | · Requirements analysis (including learning new technologies) | 100 | 100 |
| · Design Spec | · Produce the design document | 30 | 25 |
| · Design Review | · Design review | 20 | 15 |
| · Coding Standard | · Coding standard (define suitable conventions for the current development) | 30 | 40 |
| · Design | · Detailed design | 30 | 40 |
| · Coding | · Coding | 30 | 30 |
| · Code Review | · Code review | 20 | 15 |
| · Test | · Testing (self-testing, fixing code, committing changes) | 40 | 55 |
| Reporting | Reporting | 40 | 40 |
| · Test Report | · Test report | 15 | 15 |
| · Size Measurement | · Measure the workload | 10 | 10 |
| · Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 15 | 15 |
| | Total | 360 | 380 |
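Module Interface Design and Implementation

- TxtIOUtil class: reads a given file into a String, and can also write a String out to a specified file
- SimHashUtils class: takes a String, computes its SimHash value, and returns it as a string
- HammingUtils class: takes two simHash values, computes the Hamming distance between them, and derives and returns the similarity
- main class: the program entry point; file locations are passed as command-line arguments, and it calls the classes in the Util package to output the result
- MainTest class: the unit test class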
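
The project's source is not reproduced in this post, so the following is only a minimal sketch of how a SimHash fingerprint and a Hamming-distance similarity can fit together. The class name `SimHashSketch`, the two-character tokens, the FNV-1a-style hash, the 64-bit fingerprint length, and the `1 - distance / length` similarity formula are all assumptions made for illustration; the actual SimHashUtils and HammingUtils may differ in every one of these choices.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the project's actual SimHashUtils/HammingUtils.
class SimHashSketch {
    static final int BITS = 64; // assumed fingerprint length

    // FNV-1a-style 64-bit hash over UTF-16 code units (an assumed hash choice).
    static long hashToken(String token) {
        long hash = 0xcbf29ce484222325L;
        for (int i = 0; i < token.length(); i++) {
            hash ^= token.charAt(i);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Builds a fingerprint from overlapping two-character tokens,
    // weighting each token by how often it occurs in the text.
    static String getSimHash(String text) {
        Map<String, Integer> weights = new HashMap<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            weights.merge(text.substring(i, i + 2), 1, Integer::sum);
        }
        int[] votes = new int[BITS];
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            long h = hashToken(entry.getKey());
            for (int bit = 0; bit < BITS; bit++) {
                boolean set = ((h >>> bit) & 1L) == 1L;
                votes[bit] += set ? entry.getValue() : -entry.getValue();
            }
        }
        StringBuilder fingerprint = new StringBuilder(BITS);
        for (int bit = 0; bit < BITS; bit++) {
            fingerprint.append(votes[bit] > 0 ? '1' : '0');
        }
        return fingerprint.toString();
    }

    // Counts the positions at which two equal-length bit strings differ.
    static int getHammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Fingerprints must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Maps the distance to [0, 1]: identical fingerprints give 1.0.
    static double getSimilarity(String a, String b) {
        return 1.0 - (double) getHammingDistance(a, b) / a.length();
    }

    public static void main(String[] args) {
        String h1 = getSimHash("the quick brown fox jumps over the lazy dog");
        String h2 = getSimHash("the quick brown fox jumped over a lazy dog");
        System.out.println(getSimilarity(h1, h2));
    }
}
```

Representing the fingerprint as a string of '0' and '1' characters mirrors the description above, where the hash is returned as a string; a `long` with `Long.bitCount` on the XOR would be a more compact alternative.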
Performance Analysis of the Module Interfaces


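Memory is consumed mainly by the creation of floating-point values, arrays, and collections. These allocations come from the methods in the Util package, and no improvement is needed.
Unit Tests
- Test1 through Test7 compare the original text with its plagiarized variants: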
```java
@Test
public void Test1(){
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str2 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
    String str3 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
    String str4 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
    String str5 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
    String str6 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
    String ansFileName = "src/test/resources/test/test1.txt";
    double ans1 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str1));
    double ans2 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str2));
    double ans3 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str3));
    double ans4 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str4));
    double ans5 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str5));
    double ans6 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str6));
    TxtIOUtil.writeTxt(ans1, ansFileName);
    TxtIOUtil.writeTxt(ans2, ansFileName);
    TxtIOUtil.writeTxt(ans3, ansFileName);
    TxtIOUtil.writeTxt(ans4, ansFileName);
    TxtIOUtil.writeTxt(ans5, ansFileName);
    TxtIOUtil.writeTxt(ans6, ansFileName);
}

@Test
public void Test2(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String ansFileName = "src/test/resources/test/test2.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test3(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
    String ansFileName = "src/test/resources/test/test3.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test4(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
    String ansFileName = "src/test/resources/test/test4.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test5(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
    String ansFileName = "src/test/resources/test/test5.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test6(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
    String ansFileName = "src/test/resources/test/test6.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test7(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
    String ansFileName = "src/test/resources/test/test7.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
```
- Test coverage: (screenshot)
- Test run time: (screenshot)
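
The tests call `TxtIOUtil.readTxt` and `TxtIOUtil.writeTxt`, whose source is not shown in this post. As a rough illustration only, a file helper with the same call shapes might look like the sketch below. The class name `TxtIOSketch`, the UTF-8 encoding, the choice to return an empty string when a file cannot be read, and the choice to append one value per line are all assumptions, not the project's confirmed behavior.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not the project's actual TxtIOUtil.
class TxtIOSketch {

    // Reads the whole file as UTF-8 text.
    // Assumption: an unreadable or missing file yields "" instead of an exception.
    static String readTxt(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "";
        }
    }

    // Appends one similarity value per line, creating the file if needed.
    // Assumption: append mode, so several results can share one file.
    static void writeTxt(double value, String path) {
        String line = Double.toString(value) + System.lineSeparator();
        try {
            Files.write(Paths.get(path), line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.io.tmpdir") + "/txtio_sketch_demo.txt";
        writeTxt(0.83, path);
        System.out.println(readTxt(path));
    }
}
```

Returning an empty string for a missing file keeps call sites free of checked exceptions, at the cost of making a missing file indistinguishable from an empty one.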
Exception Handling
- Test for a file that does not exist:
```java
/**
 * Test for a file that does not exist
 * @throws Exception if either file could not be read
 */
@Test
public void Test8() throws Exception {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_.txt");
    String ansFileName = "src/test/resources/test/test8.txt";
    // Compare by content; == would compare object references, not the text
    if (str0.isEmpty() || str1.isEmpty()) {
        throw new Exception("File does not exist");
    }
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
```
- Test for an empty file:
```java
/**
 * Test for an empty file
 */
@Test
public void Test9(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.1.txt");
    String ansFileName = "src/test/resources/test/test9.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
```
- Test for a file with too little text:
```java
/**
 * Test for a file with too little text
 */
@Test
public void Test10(){
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.2.txt");
    String ansFileName = "src/test/resources/test/test10.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
```
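- Code added to the getSimHash method of SimHashUtils to handle these exceptional cases:
```java
try {
    // Reject empty input outright
    if (str.length() == 0) throw new Exception("File is empty");
    // Texts shorter than 200 characters are too short to compare reliably
    if (str.length() < 200) throw new ShortStringException("Text is too short to judge reliably!");
} catch (Exception e) {
    e.printStackTrace();
    return null;
}
```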
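
The definition of `ShortStringException` is not shown in this post. As one possible way to structure the same checks, the sketch below declares it as a checked exception and moves the validation into a separate method that lets the exception reach the caller. The names `InputCheckSketch` and `requireUsableText`, the use of `IllegalArgumentException` for empty input, and the decision to propagate rather than catch are assumptions for illustration; only the 200-character threshold comes from the snippet above.

```java
// Hypothetical sketch; the project's real ShortStringException is not shown in the post.
class InputCheckSketch {
    // Minimum usable length, taken from the threshold in the snippet above.
    static final int MIN_LENGTH = 200;

    // Rejects empty or too-short text before any hashing is attempted.
    static void requireUsableText(String text) throws ShortStringException {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("File is empty");
        }
        if (text.length() < MIN_LENGTH) {
            throw new ShortStringException("Text is too short to judge reliably");
        }
    }

    public static void main(String[] args) {
        try {
            requireUsableText("too short");
        } catch (ShortStringException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}

// Assumed to be a checked exception carrying only a message.
class ShortStringException extends Exception {
    ShortStringException(String message) {
        super(message);
    }
}
```

Letting the exception propagate means the caller decides how to report the problem, and no null fingerprint is ever handed on to the similarity calculation.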


