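Parsing XML files in Python: the ElementTree approach

Environment

Python: 3.4.4

Prepare the XML file

First, create an XML file named countries.xml. Its content is taken from the official Python documentation.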

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Prepare the Python file

Create a file named test_ET.py that parses the XML file.

#!/usr/bin/python
# -*- coding=utf-8 -*-

import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element

tree = ET.parse('countries.xml')

nodes = tree.findall("country")

for node in nodes:
    # search node & attribute & text
    print("*****Country*****")
    if node.attrib["name"]:
        print("Name:", node.attrib["name"])

    rank = node.find("rank")
    print("Rank:", rank.text)

    year = node.find("year")
    print("Year:", year.text)

    gdppc = node.find("gdppc")
    print("Gdppc:", gdppc.text)

    neighbors = node.findall("neighbor")
    for neighbor in neighbors:
        print("Neighbor:", neighbor.attrib["name"])

    # add node (appended as a child of <rank>)
    rank = node.find("rank")
    element = Element("rank_next", {"name": "Rank", "create": "20151231"})
    element.text = "5"
    rank.append(element)

    # delete node
    year = node.find("year")
    node.remove(year)

    # add node attribute
    node.set("force", "NewForce")
    # update node attribute
    node.set("name", "NewNode")
    # delete node attribute
    neighbors = node.findall("neighbor")
    for neighbor in neighbors:
        del neighbor.attrib["direction"]

    # add node text
    neighbors = node.findall("neighbor")
    for neighbor in neighbors:
        neighbor.text = "Hello,Neighbor"
    # update node text
    gdppc = node.find("gdppc")
    gdppc.text = "11111"
    # delete node text
    rank = node.find("rank")
    rank.text = ""

tree.write("./out.xml", encoding="utf-8",xml_declaration=True)
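The script above assumes that every country element has a name attribute and rank, year and gdppc children. If one is missing, node.attrib["name"] raises KeyError and rank.text raises AttributeError, because find() returns None. A minimal sketch of a more tolerant lookup, using an inline sample (SAMPLE and the default strings are made up for illustration) rather than countries.xml:

```python
import xml.etree.ElementTree as ET

# Inline sample so the sketch runs without countries.xml on disk.
# The second <country> deliberately lacks a name attribute and a <year> child.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
    <country>
        <rank>4</rank>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

summaries = []
for country in root.findall("country"):
    # get() returns the default instead of raising KeyError.
    name = country.get("name", "(unnamed)")
    # findtext() returns the default when the child element is absent.
    year = country.findtext("year", default="n/a")
    summaries.append((name, year))

print(summaries)
```

Whether a default value or an explicit error is the better choice depends on how strictly the input is expected to follow the schema.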

Results

Console:

>python test_ET.py
*****Country*****
Name: Liechtenstein
Rank: 1
Year: 2008
Gdppc: 141100
Neighbor: Austria
Neighbor: Switzerland
*****Country*****
Name: Singapore
Rank: 4
Year: 2011
Gdppc: 59900
Neighbor: Malaysia
*****Country*****
Name: Panama
Rank: 68
Year: 2011
Gdppc: 13600
Neighbor: Costa Rica
Neighbor: Colombia

The out.xml file:

<?xml version='1.0' encoding='utf-8'?>
<data>
    <country force="NewForce" name="NewNode">
        <rank><rank_next create="20151231" name="Rank">5</rank_next></rank>
        <gdppc>11111</gdppc>
        <neighbor name="Austria">Hello,Neighbor</neighbor>
        <neighbor name="Switzerland">Hello,Neighbor</neighbor>
    </country>
    <country force="NewForce" name="NewNode">
        <rank><rank_next create="20151231" name="Rank">5</rank_next></rank>
        <gdppc>11111</gdppc>
        <neighbor name="Malaysia">Hello,Neighbor</neighbor>
    </country>
    <country force="NewForce" name="NewNode">
        <rank><rank_next create="20151231" name="Rank">5</rank_next></rank>
        <gdppc>11111</gdppc>
        <neighbor name="Costa Rica">Hello,Neighbor</neighbor>
        <neighbor name="Colombia">Hello,Neighbor</neighbor>
    </country>
</data>
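Note that rank_next ends up nested inside rank in out.xml, because rank.append(element) adds the new element as a child of rank. If a sibling placed right after rank is wanted instead, one possible approach is to insert into the parent element. The sketch below uses a small inline fragment, not the full file:

```python
import xml.etree.ElementTree as ET

country = ET.fromstring("<country><rank>1</rank><gdppc>141100</gdppc></country>")
rank = country.find("rank")

# Build the new element (tag name and text mirror the script above).
rank_next = ET.Element("rank_next")
rank_next.text = "5"

# Insert into the parent right after <rank>, making it a sibling, not a child.
position = list(country).index(rank) + 1
country.insert(position, rank_next)

child_tags = [child.tag for child in country]
print(child_tags)
```

An Element holds no reference to its parent, so the parent (here country) has to be known in order to add a sibling.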

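Notes

ElementTree has a convenient, friendly API. The code is easy to work with, parsing is fast, and memory use is low.

It is a very good fit for processing XML documents.

Reference: https://docs.python.org/2/library/xml.etree.elementtree.html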
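ET.parse() builds the whole tree in memory. For large files where memory matters, ET.iterparse() can process elements as they are completed. A minimal sketch, assuming the input is available as a byte stream (XML_BYTES below is an inline stand-in for a real file):

```python
import io
import xml.etree.ElementTree as ET

# Inline stand-in for a large file on disk.
XML_BYTES = b"""<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
    <country name="Panama"><rank>68</rank></country>
</data>"""

names = []
# iterparse yields (event, element) pairs; "end" fires once an element is complete.
for event, elem in ET.iterparse(io.BytesIO(XML_BYTES), events=("end",)):
    if elem.tag == "country":
        names.append(elem.get("name"))
        # Drop the element's children, text and attributes once processed.
        elem.clear()

print(names)
```

The cleared elements remain attached to the root as empty shells, so this reduces memory use rather than eliminating it.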

tree = ET.parse('countries.xml')

Parses countries.xml and returns an ElementTree object.
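ET.parse() returns an ElementTree wrapper, and getroot() gives the root Element. ET.fromstring() parses a string and returns the root Element directly. A small sketch of the difference, using a throwaway temporary file (sample.xml and XML_TEXT are illustrative names):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XML_TEXT = '<data><country name="Panama"><rank>68</rank></country></data>'

# Write a throwaway file so ET.parse() has something to read.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(XML_TEXT)
    tree = ET.parse(path)  # the file is fully read before parse() returns

root = tree.getroot()                       # Element for <data>
root_from_string = ET.fromstring(XML_TEXT)  # also an Element, no wrapper

print(type(tree).__name__, root.tag, root_from_string.tag)
```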

tree.write("./out.xml", encoding="utf-8",xml_declaration=True)

Writes the element tree to a file, using UTF-8 encoding and including an XML declaration.

write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml")
Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 (default is None). default_namespace sets the default XML namespace (for “xmlns”). method is either "xml", "html" or "text" (default is "xml").
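To see the effect of xml_declaration without touching the disk, the tree can be written to an in-memory buffer. A minimal sketch (render is a hypothetical helper defined here, not part of ElementTree):

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<data><country name='Panama'/></data>"))

def render(**write_kwargs):
    """Write the tree to an in-memory buffer and return the raw bytes."""
    buffer = io.BytesIO()
    tree.write(buffer, **write_kwargs)
    return buffer.getvalue()

# Explicit declaration, as in the script above.
with_declaration = render(encoding="utf-8", xml_declaration=True)
# Defaults: US-ASCII encoding and xml_declaration=None, so no declaration.
default_output = render()

print(with_declaration)
print(default_output)
```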
Posted 2015-12-31 16:52 by 微微微笑