XML Document Processing

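1) CDATA sections are delimited by <![CDATA[ and ]]>. They are a special form of character data: you can use them to include strings that contain characters such as <, > and & without having those characters interpreted as markup. For example: <![CDATA[<]]>. Note that a CDATA section cannot contain the string ]]>.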

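2) Processing instructions are instructions intended for the applications that process the XML document. They are delimited by <? and ?>. The XML declaration uses the same syntax, for example <?xml version="1.0"?>, although strictly speaking the XML specification treats the declaration separately from processing instructions.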

3) Comments are delimited by <!-- and -->.
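
As a minimal sketch of how the three constructs above surface in a parser, the following parses a small document held in a string and lists what the DOM reports. The class name `MarkupKindsDemo`, the `app-hint` instruction, and the `note` element are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class MarkupKindsDemo {

    // Describes each top-level node of the document: processing
    // instructions, comments, and the root element with its text.
    static List<String> describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        NodeList top = doc.getChildNodes();
        for (int i = 0; i < top.getLength(); i++) {
            Node n = top.item(i);
            if (n instanceof ProcessingInstruction) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                out.add("PI target=" + pi.getTarget() + " data=" + pi.getData());
            } else if (n instanceof Comment) {
                out.add("comment=" + ((Comment) n).getData().trim());
            } else if (n instanceof Element) {
                // getTextContent returns the CDATA content as plain text
                out.add("element " + ((Element) n).getTagName() + " text=" + n.getTextContent());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document: one PI, one comment, one CDATA section
        String xml = "<?app-hint mode=\"fast\"?>"
                + "<!-- a comment -->"
                + "<note><![CDATA[a < b && b > c]]></note>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

The characters < and & inside the CDATA section come back as ordinary text, so the application never sees them as markup.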

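4) There are two kinds of XML parsers: tree parsers (DOM), which read the whole document into a tree structure, and streaming parsers (such as SAX, or the StAX parser used in section 6), which produce events as they read the document. The tree parser is introduced first.

Package: org.w3c.dom

The DOM parser interfaces have been standardized by the W3C, and the org.w3c.dom package contains the objects and methods used during parsing (the factory and builder classes themselves come from javax.xml.parsers). To parse an XML document, first create a Document object, as in this example: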

        DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
        DocumentBuilder builder=factory.newDocumentBuilder();
        Document doc=builder.parse(f);

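The argument to parse can be a file or a stream. Call getDocumentElement to get the root element of the XML document, getTagName() to get an element's name, getChildNodes() to get its child nodes as a NodeList, and getTextContent() to get an element's text value.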

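Example:

        // root element
        Element root=doc.getDocumentElement();
        // the collection of child nodes
        NodeList nodeList=root.getChildNodes();
        // number of child nodes in the collection
        int count=nodeList.getLength();
        // name of the element
        String rootName=root.getTagName();
        // text value of the element, including the text of its descendants
        String rootText=root.getTextContent();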

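Use instanceof to check whether a node is an element. Because Text nodes are leaf nodes, you can call getData to get the string held in a Text node. It is best to call trim on the value returned by getData to strip surrounding whitespace.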

        for (int i = 0; i < nodeList.getLength(); i++) {
            // get a single node
            Node childNode = nodeList.item(i);
            // check whether it is an element
            if (childNode instanceof Element) {
                Element element = (Element) childNode;
                // assumes the element has exactly one child and that child is a Text node, e.g. <name>zhangsan</name>
                // the Text node here is the "zhangsan" node
                Text textNode = (Text) element.getFirstChild();
                // call getData to get its text value
                String textString = textNode.getData().trim();
            }
            else{
                // TODO other handling
            }
        }
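
The traversal described above can be tried without a file on disk. The following is a minimal sketch, assuming a made-up `person` document and a hypothetical class name `ChildTextDemo`. It also guards the cast to Text, since an empty element has no first child.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class ChildTextDemo {

    // Maps the tag name of each child element of the root to its trimmed text.
    static Map<String, String> childTexts(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between elements shows up as Text nodes and is skipped here
            if (child instanceof Element) {
                Node first = child.getFirstChild();
                // Guard the cast: empty elements return null from getFirstChild
                if (first instanceof Text) {
                    result.put(((Element) child).getTagName(), ((Text) first).getData().trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        String xml = "<person>\n  <name> zhangsan </name>\n  <age>30</age>\n  <note/>\n</person>";
        System.out.println(childTexts(xml)); // prints {name=zhangsan, age=30}
    }
}
```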

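5) Locating information with XPath

Package: javax.xml.xpath

Walking the whole DOM tree to find the value of one particular node is tedious, but with the XPath language you can easily obtain the value of a specified node or attribute.

An XPath expression can describe a set of nodes in an XML document. For example, the XPath /gridbag/row describes all row elements that are children of the root element gridbag. Use the [] operator to select a specific element: /gridbag/row[1] denotes the first row (indexes start at 1). Use the @ operator to get an attribute value: for example, /gridbag/row[1]/@anchor gets the value of the anchor attribute of the first row element.

The following code shows how an XPath expression is created and used: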

        XPathFactory xPathFactory=XPathFactory.newInstance();
        XPath xPath=xPathFactory.newXPath();
        // evaluate returns a string, so it is well suited to fetching text values
        String value= xPath.evaluate("/gridbag/row[1]", doc);

The first argument of evaluate is the XPath expression; the second is the DOM object against which the expression is evaluated.
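
As a minimal sketch of the expressions discussed in this section, the following evaluates them against a small gridbag document held in a string. The attribute values, the `cell` elements, and the class name `XPathDemo` are assumptions made for illustration. It also uses the three-argument form of evaluate with XPathConstants.NODESET, which returns the matching nodes rather than a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDemo {

    // Hypothetical sample document with two row elements
    static final String XML =
            "<gridbag>"
          + "<row anchor=\"west\"><cell>A</cell></row>"
          + "<row anchor=\"east\"><cell>B</cell></row>"
          + "</gridbag>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Two-argument evaluate: the result is returned as a String.
    static String firstRowAnchor(Document doc) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("/gridbag/row[1]/@anchor", doc);
    }

    // Three-argument evaluate: NODESET asks for the matching nodes themselves.
    static int rowCount(Document doc) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xPath.evaluate("/gridbag/row", doc, XPathConstants.NODESET);
        return rows.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(firstRowAnchor(doc)); // prints west
        System.out.println(rowCount(doc));       // prints 2
    }
}
```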

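6) Streaming parser

Package: javax.xml.stream

The StAX parser is a "pull parser": rather than receiving callbacks, your code asks the parser for the next event. Use the following basic loop to iterate over all events: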

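        InputStream in=new FileInputStream("G:\\test.xml");
        XMLInputFactory factory=XMLInputFactory.newInstance();
        XMLStreamReader parser=factory.createXMLStreamReader(in);
        while (parser.hasNext()) {
            // event is the code of the current event
            int event=parser.next();
            if (event==XMLStreamConstants.START_ELEMENT) {
                // element handling
                String elementNameString=parser.getLocalName();// element name
                // TODO
            } else if (event==XMLStreamConstants.CHARACTERS) {
                // getText() is only valid on text events such as CHARACTERS;
                // calling it on START_ELEMENT throws IllegalStateException
                String elementValueString=parser.getText().trim();// text value
                // TODO
            }
            // Call parser methods to obtain event details
        }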
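
The pull loop can also be exercised on an in-memory string. The following is a minimal sketch, assuming a made-up `person` document and a hypothetical class name `StaxDemo`. It records start, text, and end events in order, and enables the standard IS_COALESCING property so that adjacent character data is reported as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {

    // Pulls events from the parser and records them as short strings.
    static List<String> events(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data into one CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    out.add("start:" + parser.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS && !parser.isWhiteSpace()) {
                    // getText is valid here because the current event is CHARACTERS
                    out.add("text:" + parser.getText().trim());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    out.add("end:" + parser.getLocalName());
                }
            }
        } finally {
            parser.close();
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document
        String xml = "<person><name>zhangsan</name><age>30</age></person>";
        for (String e : events(xml)) {
            System.out.println(e);
        }
    }
}
```

Handling text in its own CHARACTERS branch keeps the loop valid for elements that contain child elements, where reading text directly at the start tag would fail.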

7) Complete example:

Recursively read the attributes and node values of an XML document and store them in a HashMap as key-value pairs.

package com;

import org.w3c.dom.*;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;

/**
 * Parses an XML file.
 * Created by ysp on 2018-09-17.
 */
public class XmlUtils {

    private static final String XML_PATH = "src/main/resources/com/xmlUtils.xml";

    private static HashMap<String, String> hashMap = new HashMap<String, String>(); 

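    /**
     * Parses the XML file and returns the collected key-value pairs.
     * @return the map of attribute and text values collected so far
     */
    public HashMap<String,String> getHashMap() {
        try {
            Document document = getDocument();
            // skip the traversal if the document could not be loaded
            if (document != null) {
                Element root = document.getDocumentElement();
                putElements(root);
            }
        } catch (IOException io) {
            io.printStackTrace();
            System.out.println("IOException:" + io.getMessage());
        } catch (ParserConfigurationException parser) {
            parser.printStackTrace();
            System.out.println("ParserConfigurationException:" + parser.getMessage());
        } catch (SAXException sax) {
            sax.printStackTrace();
            System.out.println("SAXException:" + sax.getMessage());
        }
        // returning here rather than from a finally block avoids silently
        // discarding unexpected runtime exceptions
        return hashMap;
    }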

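    /**
     * Recursively visits every node and puts attribute and text values into the HashMap.
     * @param root the element whose children are traversed
     * @return the shared map
     * @throws IOException
     * @throws ParserConfigurationException
     * @throws SAXException
     */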
    private HashMap<String,String> putElements(Element root)throws IOException,ParserConfigurationException,SAXException {
        NodeList nodeList = root.getChildNodes();
        int nodeLen=nodeList.getLength();
        for (int n = 0; n < nodeLen; n++) {
            Node node = nodeList.item(n);
            if (node instanceof Element) {
                Element element = (Element) node;
                // attribute values
                NamedNodeMap namedNodeMap = element.getAttributes();
                int lenAttr = namedNodeMap.getLength();
                for (int i = 0; i < lenAttr; i++) {
                    Node node1 = namedNodeMap.item(i);
                    String k = node1.getNodeName().trim();
                    String v = node1.getNodeValue().trim();
                    if (!k.isEmpty() && !v.isEmpty() && !hashMap.containsKey(k)) {
                        hashMap.put(k, v);
                    }
                }
                if(element.hasChildNodes()){
                    putElements(element);
                }
            }
            else if(node.getNodeType()==Node.TEXT_NODE){
                Text text = (Text) node;
                String v = text.getData().trim();
                String k = text.getParentNode().getNodeName().trim();
                if ((!k.isEmpty()) && (!v.isEmpty()) && (!hashMap.containsKey(k))) {
                    hashMap.put(k, v);
                }
            }else{
                continue;
            }
        }
        return hashMap;
    }

    /**
     * Creates the Document object from the classpath resource com/xmlUtils.xml.
     * @return the parsed document
     * @throws ParserConfigurationException
     * @throws SAXException
     * @throws IOException
     */
    private Document getDocument()throws ParserConfigurationException,SAXException,IOException {
        java.net.URL url = this.getClass().getClassLoader().getResource("com/xmlUtils.xml");
        // getResource returns null when the resource is not on the classpath
        if (url == null) {
            throw new IOException("Resource not found: com/xmlUtils.xml");
        }
        File file=new File(url.getFile());
        DocumentBuilder builder=getBuilder();
        // let parsing errors propagate to the caller instead of returning null
        return builder.parse(file);
    }

    /**
     * Creates the DocumentBuilder object.
     * @return the configured builder
     * @throws ParserConfigurationException
     */
    private DocumentBuilder getBuilder() throws ParserConfigurationException{
        DocumentBuilderFactory documentBuilderFactory= DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setIgnoringElementContentWhitespace(true);
        documentBuilderFactory.setExpandEntityReferences(false);
        DocumentBuilder builder=documentBuilderFactory.newDocumentBuilder();
        return builder;
    }
}


posted @ 2017-05-09 19:53 YSP
