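Given a string that contains only English letters and spaces, implement a function that finds the 10 letters that occur most often in it (case-insensitive).
Approach:
(1) Read the content from a file and convert it to a character array
(2) Check that the content meets the requirements
(3) Count how often each letter occurs
(4) Find the top N letters by frequency
(5) Print those letters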
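Some edge cases need to be handled as well:
(1) Does the content contain only English letters and spaces?
(2) What should happen if the file contains no letters at all?
(3) What should happen if the input has only three distinct letters, such as "abc", but the 10 most frequent letters are requested?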
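The three questions above can each be answered by a small check. The following is only a minimal sketch operating on a `String` rather than a file; the class name `EdgeCaseSketch` and the helper names `isValid`, `distinctLetters`, and `resultSize` are hypothetical and chosen for illustration.

```java
final class EdgeCaseSketch {

    // Question 1: true only if every character is an ASCII letter or a space.
    static boolean isValid(String text) {
        for (char c : text.toCharArray()) {
            boolean isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!isLetter && c != ' ') {
                return false;
            }
        }
        return true;
    }

    // Question 2: number of distinct letters, ignoring case; 0 means "no letters".
    static int distinctLetters(String text) {
        boolean[] seen = new boolean[26];
        int count = 0;
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z' && !seen[c - 'a']) {
                seen[c - 'a'] = true;
                count++;
            }
        }
        return count;
    }

    // Question 3: never report more letters than actually occur.
    static int resultSize(String text, int n) {
        return Math.min(distinctLetters(text), n);
    }
}
```

For the input "abc" with N = 10, `resultSize` yields 3, so only three letters would be reported.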
The complete Java code is as follows:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ComputeLetterFrequency {

    // Number of letters to report, as required by the problem statement.
    private static final int TOP_N = 10;

    public static void main(String[] args) {
        String fileName = "C:/article.txt";
        try {
            // Read the content from the file and convert it to a character array
            char[] charArray = readCharArrayFromFile(fileName);
            // Check that the content meets the requirements
            checkContentValid(charArray);
            // Count how often each letter occurs
            int[] frequency = computeFrequency(charArray);
            // Get the top N letters by frequency
            int[] tops = getTopLettersByNum(frequency, TOP_N);
            // Print those letters
            System.out.println("The most frequent letters are:");
            for (int i = 0; i < tops.length; i++) {
                char c = (char) ('a' + tops[i]);
                System.out.print(c + "\t");
            }
        } catch (FileNotFoundException e) {
            System.out.println(fileName + " not found, so the program exits!");
            System.exit(-1);
        } catch (RuntimeException e) {
            e.printStackTrace();
        }
    }

    // Throws if the content contains anything other than English letters and spaces.
    public static void checkContentValid(char[] charArr) {
        for (int i = 0; i < charArr.length; i++) {
            boolean isLetter = (charArr[i] >= 'a' && charArr[i] <= 'z')
                    || (charArr[i] >= 'A' && charArr[i] <= 'Z');
            if (!isLetter && charArr[i] != ' ') {
                throw new RuntimeException(
                        "The content is invalid: it should contain only English letters and spaces");
            }
        }
    }

    // Returns how many distinct letters occur at least once; throws if there are none.
    public static int getValidLettersLength(int[] frequency) {
        int validLength = 0;
        for (int i = 0; i < frequency.length; i++) {
            if (frequency[i] != 0) {
                validLength++;
            }
        }
        if (validLength == 0) {
            throw new RuntimeException("No valid letter in file");
        }
        return validLength;
    }

    // Returns the indices (0 = 'a', 25 = 'z') of the most frequent letters, highest first.
    public static int[] getTopLettersByNum(int[] frequency, int num) {
        // Number of letters that occur at least once
        int validLength = getValidLettersLength(frequency);
        // If fewer than num letters occur, size the result by the letters that do occur
        int[] tops = new int[Math.min(validLength, num)];
        // Work on a copy so that the caller's array is not modified
        int[] remaining = frequency.clone();
        for (int i = 0; i < tops.length; i++) {
            int max = -1;
            int maxIndex = 0;
            for (int j = 0; j < remaining.length; j++) {
                if (remaining[j] > max) {
                    max = remaining[j];
                    maxIndex = j;
                }
            }
            tops[i] = maxIndex;
            // Mark this letter as taken so that the next pass skips it
            remaining[maxIndex] = -1;
        }
        return tops;
    }

    // Counts occurrences of each letter, treating upper and lower case as the same letter.
    public static int[] computeFrequency(char[] charArr) {
        int[] frequency = new int[26];
        for (int i = 0; i < charArr.length; i++) {
            if (charArr[i] >= 'a' && charArr[i] <= 'z') {
                frequency[charArr[i] - 'a']++;
            } else if (charArr[i] >= 'A' && charArr[i] <= 'Z') {
                frequency[charArr[i] - 'A']++;
            }
        }
        return frequency;
    }

    // Reads the whole file into a character array; line breaks are dropped.
    public static char[] readCharArrayFromFile(String fileName) throws FileNotFoundException {
        File file = new File(fileName);
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                sb.append(scanner.nextLine());
            }
        }
        return sb.toString().toCharArray();
    }
}
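Since the problem statement speaks of a string rather than a file, the same idea can also be sketched as a single function that takes a `String`. This is an alternative illustration, not the code above: the class name `TopLettersSketch` and the method `topLetters` are hypothetical, it swaps the repeated maximum search for a stable sort, and it assumes that returning an empty list (rather than throwing) is acceptable when the input contains no letters.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLettersSketch {

    // Returns up to n letters in lowercase, most frequent first.
    // Letters with equal counts stay in alphabetical order.
    static List<Character> topLetters(String text, int n) {
        int[] frequency = new int[26];
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                frequency[c - 'a']++;
            } else if (c >= 'A' && c <= 'Z') {
                frequency[c - 'A']++;
            } else if (c != ' ') {
                throw new IllegalArgumentException(
                        "Only English letters and spaces are allowed");
            }
        }

        // Collect the letters that actually occur, in alphabetical order.
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < frequency.length; i++) {
            if (frequency[i] > 0) {
                seen.add(i);
            }
        }

        // List.sort is stable, so ties keep their alphabetical order.
        seen.sort((a, b) -> Integer.compare(frequency[b], frequency[a]));

        // Take at most n letters, or fewer if fewer distinct letters occur.
        List<Character> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, seen.size()); i++) {
            result.add((char) ('a' + seen.get(i)));
        }
        return result;
    }
}
```

For example, `TopLettersSketch.topLetters("Hello World", 3)` returns `[l, o, d]`, and `TopLettersSketch.topLetters("abc", 10)` returns `[a, b, c]`.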
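Author:
Chris Wang
Source:
http://chriswang.cnblogs.com/
The copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be given in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.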