Notes on SAX XML Parsing in Java

Copyright notice:

Reposting is welcome, but please keep the original source of the article.

Author: GavinCT

Source: http://www.cnblogs.com/ct2011/p/4002738.html

When can parsed values be assigned to the object?

Most SAX parsing examples found online assign the object's fields inside the handler's characters method.
Sample code:

private TransportFile parseXML(String xml) {
	SAXParserFactory saxfac = SAXParserFactory.newInstance();
	try {
	    SAXParser saxparser = saxfac.newSAXParser();
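	    // Encode explicitly: getBytes() without a charset uses the platform default,
	    // which may not match the utf-8 encoding declared in the XML.
	    InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));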
	    MySAXHandler handler = new MySAXHandler();
	    saxparser.parse(is, handler);
	    return handler.getData();
	} catch (ParserConfigurationException e) {
	    e.printStackTrace();
	} catch (SAXException e) {
	    e.printStackTrace();
	} catch (FileNotFoundException e) {
	    e.printStackTrace();
	} catch (IOException e) {
	    e.printStackTrace();
	}
	return null;
}
private class MySAXHandler extends DefaultHandler{
	String currentTagName = "";
	TransportFile mData = null ;
	@Override
	public void startElement(String uri, String localName, String qName,
	        Attributes attributes) throws SAXException {
	    currentTagName = qName ;
	    if("file".equals(qName)){
	        mData = new TransportFile();
	    }
	}

	@Override
	public void characters(char[] ch, int start, int length)
	        throws SAXException {
	    // Each call overwrites the field, so only the last chunk of a long value is kept.
	    String str = new String(ch,start,length);
	    if("guid".equals(currentTagName)){
	        mData.guid = str;
	    }else if("name".equals(currentTagName)){
	        mData.name = str;
	    }else if("type".equals(currentTagName)){
	        mData.type = str;
	    }else if("length".equals(currentTagName)){
	        mData.length = Long.parseLong(str);
	    }else if("index".equals(currentTagName)){
	        mData.index = Integer.parseInt(str);
	    }else if("count".equals(currentTagName)){
	        mData.count = Integer.parseInt(str);
	    }else if("data".equals(currentTagName)){
	        mData.data = Base64.decode(str);
	    }
	}

	@Override
	public void endElement(String uri, String localName, String qName)
	        throws SAXException {
	    currentTagName = "";
	}

	public TransportFile getData(){
	    return mData ;
	}
}

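In ordinary use the code above works, but it breaks when the content of a single tag in the XML is very long.
In my tests the SAX parser delivered only about 1 KB of character data per characters call; anything beyond that arrived in several further calls. The exact chunk size depends on the parser implementation: SAX allows a parser to report contiguous character data either in one chunk or split across several.
So here is the problem: if the value is parsed and assigned in characters, each later chunk overwrites the field on the returned object, so only the last chunk survives and the rest of the data is lost.
Therefore, for data assigned to the returned object, the characters stage should only append to a buffer; parsing and assignment must wait until endElement. Otherwise the bug in the code found online shows up as soon as the content gets large.
Here is my code: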

    private class MySAXHandler extends DefaultHandler{
        String currentTagName = "";
        TransportFile mData = null ;
        // Accumulates the text of the current element across characters() calls.
        private StringBuilder mStringBuilder;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            currentTagName = qName ;
            // Start a fresh buffer for every element.
            mStringBuilder = new StringBuilder();
            if("file".equals(qName)){
                mData = new TransportFile();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            // May be called several times for one element: only append here.
            mStringBuilder.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            // The element is complete, so the buffered text can be parsed now.
            String str = mStringBuilder.toString();
            if("guid".equals(currentTagName)){
                mData.guid = str;
            }else if("name".equals(currentTagName)){
                mData.name = str;
            }else if("type".equals(currentTagName)){
                mData.type = str;
            }else if("length".equals(currentTagName)){
                mData.length = Long.parseLong(str);
            }else if("index".equals(currentTagName)){
                mData.index = Integer.parseInt(str);
            }else if("count".equals(currentTagName)){
                mData.count = Integer.parseInt(str);
            }else if("data".equals(currentTagName)){
                mData.data = Base64.decode(str);
            }
            currentTagName = "";
        }

        public TransportFile getData(){
            return mData ;
        }
    }
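
The chunking described above can be observed with a small standalone program. This is only a sketch: the class name SaxChunkDemo, its CountingHandler, and the generated `<file><data>...</data></file>` document are made up for illustration and are not part of my project. It counts the characters calls for one long element and records both the last chunk (what an assign-in-characters handler would end up with) and the accumulated text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxChunkDemo {

    /** Collects the text of <data> and counts how often characters() fires for it. */
    static class CountingHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inData = false;
        int calls = 0;          // number of characters() callbacks inside <data>
        String lastChunk = "";  // what an "assign in characters()" handler would keep
        String result = null;   // complete text, set in endElement

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            if ("data".equals(qName)) {
                inData = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inData) {
                calls++;
                lastChunk = new String(ch, start, length);
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("data".equals(qName)) {
                result = text.toString();
                inData = false;
            }
        }
    }

    /** Builds <file><data>ABC...</data></file> with textLength payload characters. */
    static String buildXml(int textLength) {
        StringBuilder sb = new StringBuilder("<file><data>");
        for (int i = 0; i < textLength; i++) {
            sb.append((char) ('A' + i % 26));
        }
        return sb.append("</data></file>").toString();
    }

    /** Parses the XML string and returns the handler holding the collected statistics. */
    static CountingHandler parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CountingHandler handler = new CountingHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        CountingHandler h = parse(buildXml(100_000));
        System.out.println("characters() calls: " + h.calls);
        System.out.println("last chunk length:  " + h.lastChunk.length());
        System.out.println("accumulated length: " + h.result.length());
    }
}
```

The number of calls and the size of each chunk vary between parser implementations, so the program prints them rather than assuming a value. Whatever the chunking, the accumulated length should match the payload length, while the last chunk alone may be shorter.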

Notes on the parameters of the characters method

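ch is the character array the parser is currently working through; it is not exactly the content inside the tag.
Below is the output of ch, start, and length in characters while parsing the first tag: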

===========characters ch: <?xml version='1.0' encoding='utf-8' standalone='yes' ?><file><guid>678c6f92-d617-40af-bb87-a80c3b2be91f</guid><name>0CAQLTZGO.jpg</name><type>image</type><length>71374</length><index>0</index><count>1</count><data>/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoK.....
===========characters start:31
===========characters length:36

The data actually needed is the length characters of the ch array starting at start.
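
A minimal sketch of the difference, without a real parser: the buffer string below is a hypothetical stand-in for the parser's internal array, and CharactersSliceDemo and slice are names invented for this illustration.

```java
class CharactersSliceDemo {

    /** Returns only the part of ch that a characters(ch, start, length) call refers to. */
    static String slice(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        // Hypothetical parser buffer: it holds more than the current text node.
        String buffer = "<file><guid>678c6f92</guid><name>a.jpg</name></file>";
        char[] ch = buffer.toCharArray();
        int start = buffer.indexOf("678c6f92");
        int length = "678c6f92".length();

        // Wrong: converts the whole array, markup included.
        System.out.println("whole array: " + new String(ch));
        // Right: only the characters from start to start + length.
        System.out.println("slice:       " + slice(ch, start, length));
    }
}
```

In a handler the same rule applies to StringBuilder.append(ch, start, length): pass the offsets through instead of appending the whole array.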

posted @ 2014-10-01 08:48  GavinCT