[Dev Notes] dom4j Deserialization: the 0xXX Invalid Character Problem

Problems:

1. dom4j garbled-text error: "Invalid byte 1 of 1-byte UTF-8 sequence"
2. org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1b) was found in the CDATA section
3. dom4j xml read Unicode: 0x1b
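
The second error can be reproduced without dom4j. The sketch below uses only the JDK's built-in parser, on the assumption that it is the same parser dom4j's SAXReader ends up delegating to when no other one is configured. The class name InvalidXmlCharDemo and the helper tryParse are made up for this illustration. The exact message text depends on the JDK and locale, so no expected output is shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InvalidXmlCharDemo {
    /** Returns the parser's error message, or null if the document parses cleanly. */
    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O problems are not what this demo is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0x1B (ESC) is not a legal XML 1.0 character, even inside CDATA.
        String bad = "<msg><![CDATA[hello\u001Bworld]]></msg>";
        String good = "<msg><![CDATA[hello world]]></msg>";
        System.out.println("bad  -> " + tryParse(bad));
        System.out.println("good -> " + tryParse(good));
    }
}
```

Wrapping the text in CDATA does not help, because the character range restriction applies to the whole document. The offending characters have to be removed before parsing.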

Solutions:

Keep only valid characters

// Keep only the characters that XML 1.0 allows
public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // Nothing to filter.
    StringBuilder out = new StringBuilder(in.length()); // Holds the output.

    // Iterate by code point, not by char. A char never exceeds 0xFFFF, so the
    // 0x10000-0x10FFFF test can only match once surrogate pairs are combined;
    // a char-by-char loop would silently drop emoji and other supplementary characters.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i);
        i += Character.charCount(current);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}

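Remove invalid characters with a regular expression

// Filter out invalid characters
// Note: this regular expression is not exhaustive. It only removes
//  0x00 - 0x08
//  0x0b - 0x0c
//  0x0e - 0x1f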
 
public static String stripNonValidXMLChars(String str) {
  if (str == null || "".equals(str)) {
    return str;
  }
  return str.replaceAll("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "");
}
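
The regular expression above only covers the ASCII control characters. A fuller version can be sketched by negating the allowed ranges of XML 1.0, so that unpaired surrogates, 0xFFFE and 0xFFFF are removed as well. The class name XmlCharRegex and the method strip are illustrative names, not part of dom4j or the JDK.

```java
import java.util.regex.Pattern;

class XmlCharRegex {
    // Negated class: matches any code point that is NOT an allowed XML 1.0 character.
    // Allowed: 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    static String strip(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Precompiling the pattern avoids recompiling it on every call,
        // which String.replaceAll would do.
        return INVALID_XML_CHARS.matcher(s).replaceAll("");
    }
}
```

A call such as XmlCharRegex.strip(text) keeps valid surrogate pairs intact, because Java's regex engine matches character classes by code point.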

Both methods above take a String. You can also work on a File object directly, for example:

    // Needs java.io.*; `log` is assumed to be the enclosing class's logger.
    // Assumes an ASCII-compatible encoding such as UTF-8, where every byte of a
    // multi-byte sequence is >= 0x80, so dropping bytes below 0x20 is safe.
    private static ByteArrayInputStream filterXmlSpecialChars(File file) {
        try (InputStream input = new BufferedInputStream(new FileInputStream(file));
             ByteArrayOutputStream bytestream = new ByteArrayOutputStream()) {

            int ch;
            while ((ch = input.read()) != -1) {
                // read() returns 0-255. A Java byte is signed (-128..127), so
                // comparing raw bytes against ranges like 0x20-0xD7FF would
                // discard every non-ASCII byte and corrupt multi-byte text.
                // At byte level only the control bytes can be filtered.
                if (ch == 0x9 || ch == 0xA || ch == 0xD || ch >= 0x20) {
                    bytestream.write(ch);
                }
            }
            return new ByteArrayInputStream(bytestream.toByteArray());
        } catch (IOException e) { // also covers FileNotFoundException
            log.error(e.getMessage(), e);
        }
        return null;
    }
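
Reading the whole file into memory can be expensive for large documents. As an alternative sketch, the filtering can be done while streaming by wrapping a Reader. The class XmlCharFilterReader and its readAll helper are hypothetical names for this illustration. It works on decoded chars, so the caller must open the Reader with the correct charset, and it passes surrogates through without checking that they are properly paired.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class XmlCharFilterReader extends FilterReader {

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    // Allowed at char level: tab, LF, CR and 0x20-0xFFFD.
    // Surrogates (0xD800-0xDFFF) are kept so that supplementary characters survive.
    private static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n;
        while ((n = super.read(buf, off, len)) != -1) {
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (isAllowed(c)) {
                    buf[off + kept++] = c; // compact the valid chars in place
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was invalid: read again instead of returning 0.
        }
        return -1;
    }

    /** Convenience helper for demos and tests: drains a reader into a String. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A filtered reader built this way, for example around an InputStreamReader opened with StandardCharsets.UTF_8, can be handed to any parser that accepts a Reader. dom4j's SAXReader has a read(Reader) overload that should fit.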

Reference: https://blog.csdn.net/yan3013216087/article/details/81450658

posted @ 2023-02-08 18:17  虹梦未来