First Programming Assignment (Java)

Paper Plagiarism Check
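| Introduction to Engineering | https://edu.cnblogs.com/campus/jmu/ComputerScience21/homework/13034 |
| ----------------- | --------------- |
| Where the assignment requirements are | https://edu.cnblogs.com/campus/jmu/ComputerScience21/homework/13034 |
| Paper plagiarism check | Uses code to turn the comparison of text into a comparison of shared words |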

import java.util.HashMap;
import java.util.Map;

public class PaperPlagiarismChecker {
    public static void main(String[] args) {
        String paper1 = "This is the content of paper 1.";
        String paper2 = "This is the content of paper 2.";

        double similarity = calculateSimilarity(paper1, paper2);
        System.out.println("Similarity: " + similarity);
    }

    private static double calculateSimilarity(String paper1, String paper2) {
        Map<String, Integer> wordFrequency1 = calculateWordFrequency(paper1);
        Map<String, Integer> wordFrequency2 = calculateWordFrequency(paper2);

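        int commonWords = 0; // occurrences shared by both papers
        int totalWords = 0;  // total occurrences in the first paper

        for (String word : wordFrequency1.keySet()) {
            if (wordFrequency2.containsKey(word)) {
                // A shared word counts as often as it appears in the paper that uses it less.
                commonWords += Math.min(wordFrequency1.get(word), wordFrequency2.get(word));
            }
            totalWords += wordFrequency1.get(word);
        }

        // Guard against an empty first paper, which would otherwise yield NaN (0.0 / 0).
        if (totalWords == 0) {
            return 0.0;
        }
        return (double) commonWords / totalWords;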
    }

    private static Map<String, Integer> calculateWordFrequency(String paper) {
        Map<String, Integer> wordFrequency = new HashMap<>();

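        // Lowercase, trim, and split on runs of whitespace.
        String[] words = paper.toLowerCase().trim().split("\\s+");

        for (String word : words) {
            if (word.isEmpty()) {
                continue; // a blank paper splits into a single empty token
            }
            wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
        }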

        return wordFrequency;
    }
}

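Approach

I computed the similarity between two papers. First, the content of each paper is converted to lowercase and split on whitespace into words. Then a HashMap counts how often each word occurs in the paper. Next, the word frequencies of the two papers are compared: the number of shared words is counted and divided by the total number of words in the first paper, which gives the similarity. However, I have not yet worked out how to compare files for duplication.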
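As a rough sketch of how the same measure might be applied to two files, the version below reads each file into a string and then reuses the word-frequency comparison. It assumes Java 11 or later (for `Files.readString` and `Path.of`) and UTF-8 text files. The names `FileSimilaritySketch`, `wordFrequency`, `similarity`, and `fileSimilarity` are made up for this illustration and are not part of the assignment. For the two sample sentences, 6 of the 7 words in the first paper also appear in the second, so the measure is 6/7, about 0.857.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and structure are assumptions, not the assignment's API.
public class FileSimilaritySketch {

    // Counts lowercase, whitespace-separated words, skipping empty tokens.
    static Map<String, Integer> wordFrequency(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Shared word occurrences divided by the word count of the first text.
    static double similarity(String text1, String text2) {
        Map<String, Integer> freq1 = wordFrequency(text1);
        Map<String, Integer> freq2 = wordFrequency(text2);
        int common = 0;
        int total = 0;
        for (Map.Entry<String, Integer> entry : freq1.entrySet()) {
            common += Math.min(entry.getValue(), freq2.getOrDefault(entry.getKey(), 0));
            total += entry.getValue();
        }
        return total == 0 ? 0.0 : (double) common / total;
    }

    // Reads both files as UTF-8 text (assumption) and compares their contents.
    static double fileSimilarity(Path file1, Path file2) throws IOException {
        return similarity(Files.readString(file1), Files.readString(file2));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Two file paths given on the command line.
            System.out.println("Similarity: " + fileSimilarity(Path.of(args[0]), Path.of(args[1])));
        } else {
            // Fall back to the sample sentences from the post.
            System.out.println("Similarity: " + similarity(
                    "This is the content of paper 1.",
                    "This is the content of paper 2."));
        }
    }
}
```

Because the divisor is the word count of the first text only, the measure is not symmetric: swapping the two files can change the result.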

posted @ 2023-09-20 00:00  svv666