Java: Word Statistics

Design idea:

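First, letter frequency. I wrote a method that reads the contents of a text file into a string variable. As the assignment requires, all letters in the string are then converted to lower case. Two arrays are created: one stores the letters that occur in the string, the other stores the corresponding counts, and a separate variable holds the total number of letter occurrences. To make sorting by frequency easy, each letter is stored in an object together with its count and its frequency. The objects are then sorted and the results printed.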
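The letter-frequency steps above can be sketched more compactly as follows. This is an illustrative alternative, not the post's code: it assumes only the ASCII letters a to z are counted, indexes one `int[26]` array by `ch - 'a'` instead of keeping two parallel arrays, and replaces the hand-written swap sort with `List.sort`. The class names `LetterFrequency` and `LetterStat` are hypothetical; `LetterStat` plays the role of the post's `zimu` object.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LetterFrequency {
    // Hypothetical holder: a letter bundled with its count and relative frequency,
    // playing the same role as the zimu class in the post.
    public static final class LetterStat {
        public final char letter;
        public final int count;
        public final double frequency;

        LetterStat(char letter, int count, double frequency) {
            this.letter = letter;
            this.count = count;
            this.frequency = frequency;
        }
    }

    // Counts the letters a-z case-insensitively (all other characters are ignored)
    // and returns the letters that occur, sorted by descending count.
    // List.sort is stable, so letters with equal counts stay in alphabetical order.
    public static List<LetterStat> analyze(String text) {
        int[] counts = new int[26]; // counts[0] is 'a', counts[25] is 'z'
        int total = 0;
        for (char ch : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++;
                total++;
            }
        }
        List<LetterStat> stats = new ArrayList<>();
        for (int k = 0; k < 26; k++) {
            if (counts[k] > 0) { // total > 0 whenever this runs, so the division is safe
                stats.add(new LetterStat((char) ('a' + k), counts[k], (double) counts[k] / total));
            }
        }
        stats.sort(Comparator.comparingInt((LetterStat s) -> s.count).reversed());
        return stats;
    }

    public static void main(String[] args) {
        for (LetterStat s : analyze("Hello, World")) {
            System.out.printf("%c: count %d, frequency %.2f%n", s.letter, s.count, s.frequency);
        }
    }
}
```

For "Hello, World" the first line printed is for `l` (3 of the 10 letters). Using a stable library sort avoids the nested swap loops and makes the ordering of ties predictable.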

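Second, word counts. Again, the contents of the text file are first read into a string for easier processing. To split the string into words, punctuation marks are replaced with spaces and the string is then split on spaces. The remaining steps mirror the letter statistics: each word and its count are stored in an object, the objects are sorted and printed, and the user can choose how many of the most frequent words to print.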
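A minimal sketch of the same word-count idea is shown below. It swaps in a different tokenizer than the post: instead of replacing `. , ? !` with spaces, it assumes that any character other than a letter or an apostrophe separates words, and it uses `HashMap.merge` in place of the get/put branches. The class and method names (`WordCount`, `count`, `top`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordCount {
    // Counts words case-insensitively. Assumption: every run of characters other
    // than a-z and the apostrophe acts as a separator.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!word.isEmpty()) { // a leading separator yields one empty token
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    // Returns the n most frequent words (fewer if there are not that many);
    // words with equal counts are ordered alphabetically.
    public static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("The cat saw the dog. The dog ran!");
        for (Map.Entry<String, Integer> e : top(counts, 2)) {
            System.out.println("Word: " + e.getKey() + " Count: " + e.getValue());
        }
    }
}
```

Clamping `n` with `Math.min` means asking for more words than exist simply returns all of them instead of indexing past the end of the list.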

Source code:

package findword;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
class zimu{
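    char zm;    // the letter
    int ci;     // number of occurrences (an int, so it no longer prints as "3.0")
    String pl;  // relative frequency, formatted to two decimal places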
    zimu()
    {
        zm=0;
        ci=0;
        pl=null;
    }
}
class dc{
    String name;
    int num;
    dc()
    {
        name=null;
        num=-1;
    }
}
public class Main {
    public static void findword(String text) throws IOException{
        @SuppressWarnings("resource")
        Scanner scan=new Scanner(System.in);
        int i=0;
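        // punctuation marks that are replaced by spaces before splitting
        String[] array = {".",",","?","!",";",":","\"","(",")"};
        for (int i1 = 0; i1 < array.length; i1++) {
            text = text.replace(array[i1]," ");
        }
        // split on runs of whitespace (spaces, tabs, line breaks)
        String[] textArray = text.split("\\s+");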
        Map<String, Integer> map = new TreeMap<String, Integer>();
        for (int i1 = 0; i1 < textArray.length; i1++) {
            String key = textArray[i1];
            // convert to lower case
            String key_l = key.toLowerCase();
            if(!"".equals(key_l)){
                Integer num = map.get(key_l);
                if(num == null || num == 0){
                    map.put(key_l, 1);
                }
                else if(num > 0){
                    map.put(key_l, num+1);
                }
            }
        }
        i = map.size(); // number of distinct words
        dc [] z=new dc[i];
        for(int m=0;m<=i-1;m++) {
            z[m]=new dc();
        }
        int j=0;
        for(String e:map.keySet()) {
            if(z[j]!=null) {
                z[j].name=e;
                z[j].num=map.get(e);
            }
//            if(z[j]!=null&&!nousejudge(e,"nouse.txt")) {
//                z[j].name=e;
//                z[j].num=map.get(e);
//            }
            j++;
        }
        dc t=new dc();
        for(int m=0;m<=i-1;m++)
        {
            for(int n=m;n<=i-1;n++) {
                if(z[m]!=null&&(z[m].num<z[n].num)) {
                    t=z[m];
                    z[m]=z[n];
                    z[n]=t;
                }
            }
        }
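        for(int p=0;p<=i-1;p++) {
            System.out.println("Word: "+z[p].name+" Count: "+z[p].num);
        }
        System.out.println("Enter how many of the most frequent words to print:");
        int b=scan.nextInt();
        // clamp b so that asking for more words than exist does not run past the end of the array
        for(int m=0;m<Math.min(b,i);m++) {
            System.out.println("Word: "+z[m].name+" Count: "+z[m].num);
        }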
    }
    public static void judgezimu(String str1)
    {
        char zm[]=new char[26];
        int ci[]=new int[26];
        DecimalFormat df = new DecimalFormat("0.00");
        double sum=0;
        int i;
        int flag=0;
        String str=str1.toLowerCase();
        int count;
        char chs[]=str.toCharArray();
        for(char ch='a';ch<='z';ch++)
        {
            
            count=0; // counter for the current letter
            for(i=0;i<chs.length;i++)
            {
                if(ch==chs[i])
                    count++;
            }
            if(count!=0) {
                zm[flag]=ch;
                ci[flag]=count;
                sum=sum+count;
                flag++;
            }
        }
        zimu z[]=new zimu[flag];
        for(int m=0;m<flag;m++) {
            z[m]=new zimu();
        }
        for(i=0;i<flag;i++)
        {
            z[i].zm=zm[i];
            z[i].ci=ci[i];
            z[i].pl=df.format(ci[i]/sum);
        }
        zimu t=new zimu();
        for(i=0;i<flag;i++)
        {
            for(int j=0;j<flag;j++)
            {
                if(z[i].ci>z[j].ci)
                {
                    t=z[i];
                    z[i]=z[j];
                    z[j]=t;
                }
            }
        }
        for(i=0;i<flag;i++)
        {
            System.out.println(z[i].zm+": count: "+z[i].ci+", frequency: "+z[i].pl);
        }
    }
    public static String readtxt(String txt) throws IOException
    {
        File file = new File(txt); // the file to read
        StringBuilder sb = new StringBuilder(); // accumulates the file contents
        // try-with-resources closes the reader even if reading fails; a missing file now
        // raises FileNotFoundException instead of a later NullPointerException
        try (BufferedReader bReader = new BufferedReader(new FileReader(file))) {
            String s;
            while ((s = bReader.readLine()) != null) { // readLine() strips the line terminator
                sb.append(s).append(' '); // separate lines with a space so words on adjacent lines do not merge
            }
        }
        return sb.toString();
    }
    public static boolean nousejudge(String danci,String txt) throws IOException {
        String str=readtxt(txt);
        String[] nouse = str.split(" ");
        for(int i=0;i<nouse.length;i++)
        {
            if(danci.equals(nouse[i]))
            {
                return true; // the word is in the stop-word list
            }
        }
        return false;
    }
    public static void main(String[] args) throws IOException {
        String str = readtxt("zimu.txt");
        judgezimu(str);
        String str1 = readtxt("danci.txt");
        findword(str1);
    }
}

Main.java
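In the code above, `nousejudge` (called only from the commented-out lines in `findword`) re-reads the stop-word file for every word it checks. One possible alternative, sketched here under the assumption that the stop-word file holds whitespace-separated words, is to parse the list once into a `HashSet` and remove those keys from the count map in one pass. The class `StopWords` and its methods are hypothetical; `parse` takes the string that `readtxt("nouse.txt")` would return, which keeps the sketch free of file I/O.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class StopWords {
    // Builds the stop-word set once from whitespace-separated text
    // instead of re-reading the file for every word.
    public static Set<String> parse(String stopWordText) {
        Set<String> stopWords = new HashSet<>();
        for (String w : stopWordText.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) { // "".split(...) yields a single empty token
                stopWords.add(w);
            }
        }
        return stopWords;
    }

    // Returns a sorted copy of the counts with every stop word removed;
    // the input map is left unchanged.
    public static Map<String, Integer> filter(Map<String, Integer> counts, Set<String> stopWords) {
        Map<String, Integer> kept = new TreeMap<>(counts);
        kept.keySet().removeAll(stopWords); // removing from keySet() removes the map entries
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("the", 3);
        counts.put("dog", 2);
        counts.put("a", 1);
        System.out.println(filter(counts, parse("the a an"))); // prints {dog=2}
    }
}
```

A set lookup costs roughly constant time per word, whereas the original check scans the whole list (and reads the file) each time.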

Result screenshots:

Summary:

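In this exercise I wrote the step that reads a text file into a string variable as its own method and called it from main, which kept main short and clear. In future I should split programs into more methods, one per task. Also, when two related attributes need to be handled together, putting them in one object makes them much easier to work with.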

Posted on 2019-05-05 07:39 by 丸za
