How Java Detects and Reads Text Files in Different Encodings (Repost)
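Most people know that a .txt file is usually saved in one of four encodings: "GBK", "UTF-8", "Unicode" (little-endian UTF-16), or "UTF-16BE". What tells them apart is the byte order mark (BOM) written at the start of the file: EF BB BF for UTF-8, FF FE for "Unicode", and FE FF for UTF-16BE. A GBK file has no BOM. To avoid garbled text, read the file header first and pick the decoding from it. Note that a file without a BOM (including UTF-8 saved without one) is treated as GBK by this approach. The method below does this.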
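    import java.io.BufferedInputStream;
    import java.io.FileInputStream;

    /**
     * Detects a file's encoding from its byte order mark (BOM).
     * @param fileName path of the file to inspect
     * @return the encoding name; "GBK" when no BOM is found
     * @throws Exception if the file cannot be opened or read
     */
    public static String codeString(String fileName) throws Exception {
        // try-with-resources closes the stream even if reading fails
        try (BufferedInputStream bin = new BufferedInputStream(
                new FileInputStream(fileName))) {
            // Combine the first two bytes into one value for the switch
            int p = (bin.read() << 8) + bin.read();
            String code;
            switch (p) {
                case 0xefbb:
                    // The UTF-8 BOM is EF BB BF, so confirm the third byte as well
                    code = (bin.read() == 0xbf) ? "UTF-8" : "GBK";
                    break;
                case 0xfffe:
                    // FF FE: little-endian UTF-16 ("Unicode"); the "UTF-16"
                    // decoder reads the BOM itself to pick the byte order
                    code = "UTF-16";
                    break;
                case 0xfeff:
                    code = "UTF-16BE";
                    break;
                default:
                    // No BOM found: assume GBK
                    code = "GBK";
            }
            return code;
        }
    }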
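The constants in the switch are the first two bytes of each encoding's byte order mark. As a minimal sketch (the class `BomBytes` and the method `bomHex` are hypothetical names, not part of the original post), encoding the character U+FEFF with Java's standard charsets reproduces those bytes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BomBytes {

    // Encodes U+FEFF in the given charset and formats the resulting bytes as hex.
    static String bomHex(Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8    : " + bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16LE : " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
        System.out.println("UTF-16BE : " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
    }
}
```

The UTF-8 mark is three bytes long, so checking only the first two (EF BB) leaves a small chance of misreading a file that merely happens to start with those bytes.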
Then read the text through a character stream:
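    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStreamReader;

    // readText is an illustrative name; code is the encoding returned by the method above
    public static String readText(String fileName) throws Exception {
        String code = codeString(fileName);
        StringBuilder sBuffer = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), code))) {
            String strTmp;
            // Read line by line
            while ((strTmp = in.readLine()) != null) {
                sBuffer.append(strTmp).append("\n");
            }
        }
        // Drop the BOM character that the UTF-8 and UTF-16BE decoders leave at the start
        if (sBuffer.length() > 0 && sBuffer.charAt(0) == '\uFEFF') { sBuffer.deleteCharAt(0); }
        return sBuffer.toString();
    }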
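To check the whole detect-then-decode flow without touching the disk, the same idea can be applied to a byte array. The following is only a sketch under stated assumptions: `BomRoundTrip`, `detect`, `decode`, and `withBom` are hypothetical names, the sample data is built in memory rather than read from a file, and FF FE is reported explicitly as "UTF-16LE". For a real file, the bytes could come from `Files.readAllBytes`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative, not from the original post.
class BomRoundTrip {

    // BOM check in the spirit of codeString, applied to bytes already in memory.
    static String detect(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xff) == 0xef
                && (data[1] & 0xff) == 0xbb && (data[2] & 0xff) == 0xbf) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xff) == 0xff && (data[1] & 0xff) == 0xfe) {
            return "UTF-16LE";
        }
        if (data.length >= 2 && (data[0] & 0xff) == 0xfe && (data[1] & 0xff) == 0xff) {
            return "UTF-16BE";
        }
        return "GBK"; // no BOM: fall back to GBK, as the method in the post does
    }

    // Decodes the bytes with the detected charset and drops a leading BOM character.
    static String decode(byte[] data) {
        String text = new String(data, Charset.forName(detect(data)));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Builds sample "file content": a BOM followed by the text, in the given charset.
    static byte[] withBom(String text, Charset cs) {
        return ("\uFEFF" + text).getBytes(cs);
    }

    public static void main(String[] args) {
        String sample = "\u7f16\u7801 test"; // two Chinese characters plus ASCII
        Charset[] charsets = {StandardCharsets.UTF_8,
                StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE};
        for (Charset cs : charsets) {
            byte[] data = withBom(sample, cs);
            // Prints the detected name and whether the decoded text matches the sample
            System.out.println(detect(data) + " -> " + decode(data).equals(sample));
        }
    }
}
```

Input without a BOM is decoded as GBK here, which assumes the running JDK ships the GBK charset; a UTF-8 file saved without a BOM would be misread under that fallback.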