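Traversing XML and querying it with XPath in Python using the lxml library (tag names, attribute names, attribute values, and text between tag pairs)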

XML example:

Version 1:

<?xml version="1.0" encoding="UTF-8"?><country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang></provinces></country>

This version contains no spaces or line breaks.

Python example:

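from lxml import etree

# Written for Python 2 (print statements).
class r_xpath_xml(object):
    def __init__(self):
        self.xmetrpa = etree.parse('info.xml')  # parse the XML file into an element tree

    def xpxm(self):
        xpxlm = self.xmetrpa
        print etree.tostring(xpxlm)  # print the serialized XML
        root = xpxlm.getroot()  # get the root element of the tree
        print root.tag, ' ',  # print the root tag name; the trailing comma suppresses the newline
        print root.items()  # attribute names and values of the root
        for a in root:  # iterate over the direct children of the root
            # print the tag name, attributes, text, and Python type
            print a.tag, a.items(), a.text, ' printed type: ', type(a)
            for b in a:  # second level
                print b.tag, b.items(), b.text
                for c in b:  # third level
                    print c.tag, c.items(), c.text
                    for d in c:  # fourth level (empty in this document)
                        print d.tag, d.items(), d.text, d
        print xpxlm.xpath('//node()')  # every node in the document, text nodes included
        print '====================================================================================================='
        xa = xpxlm.xpath('//heilongjiang/*')  # all child elements of heilongjiang
        print xa
        for xb in xa:
            print xb.tag, xb.items(), xb.text
        xc = xpxlm.xpath('//xinjiang/*')  # all child elements of xinjiang
        print xc, 'type:', type(xc)
        for xd in xc:
            print xd.tag, xd.items(), xd.text

if __name__ == '__main__':
    xpx = r_xpath_xml()
    xpx.xpxm()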
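Nested for loops walk the element hierarchy. tag returns the tag name, items() returns the attributes as a list of ('attribute name', 'attribute value') tuples, much like dict.items(), and text returns the text between the opening and closing tags. tag, items(), and text all belong to <type 'lxml.etree._Element'>.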
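The same traversal can be written without one loop per level. The following is a minimal sketch in Python 3 syntax (unlike the Python 2 listing above), assuming a shortened inline copy of the version-one document instead of info.xml; the helper name describe is hypothetical and only serves this illustration.

```python
from lxml import etree

# Shortened inline copy of the version-one document (an assumption for this sketch).
XML = (
    '<country name="chain"><provinces>'
    '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
    '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
    '</provinces></country>'
)


def describe(root):
    """Return (depth, tag, attributes, text) for every element, in document order."""
    rows = []

    def walk(element, depth):
        # tag: tag name, items(): list of (name, value) tuples, text: text inside the tag pair
        rows.append((depth, element.tag, element.items(), element.text))
        for child in element:  # iterating an element yields its direct children
            walk(child, depth + 1)

    walk(root, 0)
    return rows


root = etree.fromstring(XML)
rows = describe(root)
for depth, tag, attrs, text in rows:
    print('  ' * depth + tag, attrs, text)
```

Recursion handles any nesting depth, so the code does not need to know in advance how many levels the document has. Run against the full info.xml, the original Python 2 class prints the result shown next.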
Output:
<country name="chain"><provinces><heilongjiang name="citys"><haerbin/><daqing/></heilongjiang><guangdong name="citys"><guangzhou/><shenzhen/><huhai/></guangdong><taiwan name="citys"><taibei/><gaoxiong/></taiwan><xinjiang name="citys"><wulumuqi waith="tianqi">&#26228;</wulumuqi></xinjiang></provinces></country>
country   [('name', 'chain')]
provinces [] None  printed type:  <type 'lxml.etree._Element'>
heilongjiang [('name', 'citys')] None
haerbin [] None
daqing [] None
guangdong [('name', 'citys')] None
guangzhou [] None
shenzhen [] None
huhai [] None
taiwan [('name', 'citys')] None
taibei [] None
gaoxiong [] None
xinjiang [('name', 'citys')] None
wulumuqi [('waith', 'tianqi')] 晴
[<Element country at 0x2d47b20>, <Element provinces at 0x2d47990>, <Element heilongjiang at 0x2d479b8>, <Element haerbin at 0x2d47558>, <Element daqing at 0x2d47328>, <Element guangdong at 0x2d47300>, <Element guangzhou at 0x2d476e8>, <Element shenzhen at 0x2d47530>, <Element huhai at 0x2d472d8>, <Element taiwan at 0x2d47260>, <Element taibei at 0x2d47238>, <Element gaoxiong at 0x2d47080>, <Element xinjiang at 0x2d47710>, <Element wulumuqi at 0x2d47968>, u'\u6674']
=====================================================================================================
[<Element haerbin at 0x2d479b8>, <Element daqing at 0x2d47148>]
haerbin [] None
daqing [] None
[<Element wulumuqi at 0x2d47968>] type: <type 'list'>
wulumuqi [('waith', 'tianqi')] 晴
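The queries above select elements by tag name only. XPath can also filter on attribute names and attribute values and return attribute values or text directly. The sketch below, in Python 3 syntax, assumes a shortened inline copy of the version-one document; the variable names are illustrative.

```python
from lxml import etree

# Shortened inline copy of the version-one document (an assumption for this sketch).
XML = (
    '<country name="chain"><provinces>'
    '<heilongjiang name="citys"><haerbin/><daqing/></heilongjiang>'
    '<xinjiang name="citys"><wulumuqi waith="tianqi">晴</wulumuqi></xinjiang>'
    '</provinces></country>'
)
root = etree.fromstring(XML)

# Attribute name: every element that has a "waith" attribute, whatever its value.
has_waith = [e.tag for e in root.xpath('//*[@waith]')]

# Attribute value: every element whose "name" attribute equals "citys".
named_citys = [e.tag for e in root.xpath('//*[@name="citys"]')]

# Selecting the attribute itself returns a list of strings, not elements.
waith_values = root.xpath('//wulumuqi/@waith')

# text() returns the text between the tag pair, again as a list of strings.
wulumuqi_text = root.xpath('//wulumuqi/text()')

print(has_waith, named_citys, waith_values, wulumuqi_text)
```

As in the original listing, xpath() returns a list even when only one node matches, so single results still need indexing or iteration.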

XML example:

Version 2:

<?xml version="1.0" encoding="UTF-8"?>
<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <guangdong name="citys"><city:guangzhou/><city:shenzhen/><city:zhuhai/></guangdong>
        <taiwan name="citys"><city:taibei/><city:gaoxiong/></taiwan>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>    
    </provinces>
</country>

Example:
print xpxlm.xpath('//node()')

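Output:
The result now contains whitespace and newline strings, and the namespaced elements are shown with their namespace URI in braces.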
[<Element country at 0x2e79b20>, '\n    ', <Element provinces at 0x2e79990>, '\n        ', <Element {http://www.w3school.com.cn/furniture}table at 0x2e79710>, '\n        ', <Element heilongjiang at 0x2e799b8>, <Element {http://www.w3school.com.cn/furniture}haerbin at 0x2e79328>, <Element {http://www.w3school.com.cn/furniture}daqing at 0x2e79968>, '\n        ', <Element guangdong at 0x2e79530>, <Element {http://www.w3school.com.cn/furniture}guangzhou at 0x2e79300>, <Element {http://www.w3school.com.cn/furniture}shenzhen at 0x2e792d8>, <Element {http://www.w3school.com.cn/furniture}zhuhai at 0x2e79260>, '\n        ', <Element taiwan at 0x2e79238>, <Element {http://www.w3school.com.cn/furniture}taibei at 0x2e79080>, <Element {http://www.w3school.com.cn/furniture}gaoxiong at 0x2e79058>, '\n        ', <Element xinjiang at 0x2e796e8>, <Element {http://www.w3school.com.cn/furniture}wulumuqi at 0x2e79558>, u'\u6674', '\n        ', '    \n    ', '\n']
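Because the city elements live in a namespace, an XPath expression that uses their bare names no longer matches them. One way to query them is to pass a prefix-to-URI mapping through the namespaces argument of xpath(). The sketch below uses Python 3 syntax and a shortened inline copy of the version-two document; the name NS is just a local variable chosen for this example.

```python
from lxml import etree

# Shortened inline copy of the version-two document (an assumption for this sketch).
XML = '''<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>
    </provinces>
</country>'''

# The prefix used in the expression only has to match this mapping,
# not the prefix written in the document.
NS = {'city': 'http://www.w3school.com.cn/furniture'}
root = etree.fromstring(XML)

# An unprefixed name matches only elements that are in no namespace.
unprefixed = root.xpath('//haerbin')

# With the mapping, the prefixed name finds the element.
prefixed = root.xpath('//city:haerbin', namespaces=NS)

# heilongjiang itself is not namespaced; its children are.
cities = root.xpath('//heilongjiang/city:*', namespaces=NS)

print(unprefixed, [e.tag for e in prefixed], [e.tag for e in cities])
```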

Removing the whitespace:

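        xp = xpxlm.xpath('//node()')
        print xp,  # trailing comma: the next print continues on this line
        for i in xp:
            # text nodes come back as strings; skip them and keep the elements
            if isinstance(i, basestring):
                continue
            else:
                print i.tag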

The check inside the loop skips the text nodes, which is how the whitespace and newline strings are dropped.
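The filtering can also be pushed into the XPath expression itself. As a sketch in Python 3 syntax, assuming a shortened inline copy of the version-two document: '//*' selects element nodes only, and a normalize-space() predicate keeps only text nodes that contain something other than whitespace.

```python
from lxml import etree

# Shortened inline copy of the version-two document (an assumption for this sketch).
XML = '''<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>
    </provinces>
</country>'''
root = etree.fromstring(XML)

# Filtering in Python, as in the loop above: text results are str subclasses.
all_nodes = root.xpath('//node()')
elements_only = [n for n in all_nodes if not isinstance(n, str)]

# Filtering in XPath: '*' matches element nodes and never text nodes.
same_elements = root.xpath('//*')

# Text nodes that are not pure whitespace.
non_blank_text = root.xpath('//text()[normalize-space()]')

print(len(all_nodes), len(elements_only), non_blank_text)
```

Both approaches yield the same elements here; the XPath form avoids building the intermediate list of whitespace strings.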

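Output (the printed list is omitted, and country follows it on the same line because of the trailing comma; the namespace URI http://www.w3school.com.cn/furniture is abbreviated to city):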

provinces
{city}table
heilongjiang
{city}haerbin
{city}daqing
guangdong
{city}guangzhou
{city}shenzhen
{city}zhuhai
taiwan
{city}taibei
{city}gaoxiong
xinjiang
{city}wulumuqi
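The tag attribute actually returns the full namespace URI in braces, as in the '//node()' result shown earlier. To print a shorter prefix:name form, one option is to combine the element's prefix attribute with etree.QName, which splits a tag into namespace and local name. This is a sketch in Python 3 syntax over a shortened inline copy of the version-two document; short_tag is a hypothetical helper name.

```python
from lxml import etree

# Shortened inline copy of the version-two document (an assumption for this sketch).
XML = '''<country name="chain">
    <provinces>
        <city:table xmlns:city="http://www.w3school.com.cn/furniture">
        <heilongjiang name="citys"><city:haerbin/><city:daqing/></heilongjiang>
        <xinjiang name="citys"><city:wulumuqi>晴</city:wulumuqi></xinjiang>
        </city:table>
    </provinces>
</country>'''


def short_tag(element):
    """Return 'prefix:name' for a prefixed element, or just 'name' otherwise."""
    local = etree.QName(element).localname  # tag without the {uri} part
    return '{}:{}'.format(element.prefix, local) if element.prefix else local


root = etree.fromstring(XML)
short_tags = [short_tag(e) for e in root.iter()]
for name in short_tags:
    print(name)
```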

 

 

 





posted @ 2017-09-09 13:19  LLSix