Parsing XML files with Python

https://www.cnblogs.com/handsome1013/p/10058838.html
Using ET.parse
https://www.cnblogs.com/yezuhui/p/6853323.html

https://blog.csdn.net/gz153016/article/details/90216737

Introduction to xml.etree.ElementTree, the Python 3 XML parsing module

https://blog.csdn.net/asty9000/article/details/93627226?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase

Removing duplicate XML nodes

https://blog.csdn.net/u014203484/article/details/74332815

import xml.etree.ElementTree as ET      # import the XML module

tree = ET.parse('GHO.xml')              # parse the given XML file
root = tree.getroot()                   # get the root element
data = root.find('Data')                # find the 'Data' element under the root
for obs in data:                        # iterate over every child of 'Data'
    for item in obs:                    # iterate over every child of each 'obs' element
        key = item.attrib               # attribute dict (attrib is a property, not a method)
        print(list(key))                # print the attribute names
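
To try the traversal above without the GHO.xml file, here is a minimal sketch that parses an inline string instead. The element and attribute names in SAMPLE (Observation, Dim, Category, Code) and the helper collect_attribute_names are assumptions made up for illustration; the real file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real GHO.xml layout may differ.
SAMPLE = """<GHO>
  <Data>
    <Observation>
      <Dim Category="COUNTRY" Code="CHN"/>
      <Dim Category="YEAR" Code="2015"/>
    </Observation>
  </Data>
</GHO>"""


def collect_attribute_names(xml_text):
    """Return the attribute names of every grandchild of <Data>."""
    root = ET.fromstring(xml_text)   # fromstring returns the root element directly
    data = root.find('Data')
    names = []
    if data is None:                 # find() returns None when nothing matches
        return names
    for obs in data:
        for item in obs:
            names.append(list(item.attrib))   # attrib is a dict; list() gives its keys
    return names


print(collect_attribute_names(SAMPLE))
```

Checking the result of find() against None avoids a TypeError when the file has no Data element.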

How to read attributes and node content.

How do you extract id and name, together with their values, from data?

Explanation

Two approaches (shown here with the Java DOM API):
1. Get the node first
String strID = node.getAttributes().getNamedItem("id").getNodeValue();
String strName = node.getAttributes().getNamedItem("name").getNodeValue();
2. Get the element first
String strID = element.getAttribute("id");
String strName = element.getAttribute("name");
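
For comparison, the same lookup with Python's xml.etree.ElementTree might look like the sketch below. The sample document, its attribute values, and the helper read_id_and_name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a <data> element carrying id and name attributes.
SAMPLE = '<root><data id="1" name="alpha">some text</data></root>'


def read_id_and_name(xml_text):
    """Return (id, name, text) for the first <data> child, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find('data')
    if node is None:
        return None
    # get() returns None for a missing attribute; node.attrib['id'] would raise KeyError
    return node.get('id'), node.get('name'), node.text


print(read_id_and_name(SAMPLE))
```

Attribute values come back as strings, so convert them (for example with int()) if a number is needed.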

A small exercise

#!/usr/bin/env python
import xml.etree.ElementTree as ET

tree = ET.parse('abcdefg.xml')
root = tree.getroot()

# './/*' matches every descendant element of the root
iter_elem = root.findall('.//*')
print(len(iter_elem))

context = []  # text of every <source> child found
for element in iter_elem:
    # skip elements that carry no text
    if element.text is None:
        continue
    print("hello")
    # look for a direct <source> child of this element
    src_elem = element.find("source")
    if src_elem is None:
        continue
    context.append(src_elem.text)

    print("attri :%s" % src_elem.attrib)
    print("tag :%s" % src_elem.tag)

    # for item in src_elem:
    #     print(item.text)
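
The exercise above visits every element and then looks for a direct source child. If only the source elements matter, Element.iter can reach them directly. This is a minimal sketch; the sample document and the helper source_texts are assumptions, since the content of abcdefg.xml is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real abcdefg.xml may be laid out differently.
SAMPLE = """<TS>
  <context>
    <message><source lang="en">Open</source></message>
    <message><source>Close</source></message>
  </context>
</TS>"""


def source_texts(xml_text):
    """Return (text, attributes) for every <source> element in document order."""
    root = ET.fromstring(xml_text)
    # iter('source') walks the whole subtree, at any depth
    return [(elem.text, dict(elem.attrib)) for elem in root.iter('source')]


for text, attrs in source_texts(SAMPLE):
    print("text :%s attri :%s" % (text, attrs))
```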


Delete duplicate nodes:

import xml.etree.ElementTree as ET
path = 'in.xml'
tree = ET.parse(path)
root = tree.getroot()
prev = None

def elements_equal(e1, e2):
    if type(e1) != type(e2):
        return False
    if e1.tag != e2.tag: return False
    if e1.text != e2.text: return False
    if e1.tail != e2.tail: return False
    if e1.attrib != e2.attrib: return False
    if len(e1) != len(e2): return False
    return all([elements_equal(c1, c2) for c1, c2 in zip(e1, e2)])

for page in root:                     # iterate over pages
    elems_to_remove = []
    for elem in page:
        if elements_equal(elem, prev):
            print("found duplicate: %s" % elem.text)   # equal function works well
            elems_to_remove.append(elem)
            continue
        prev = elem
    for elem_to_remove in elems_to_remove:
        page.remove(elem_to_remove)
tree.write("out.xml")
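
The loop above compares each element only with prev, so it catches a duplicate only when it directly follows its twin. A possible variant for duplicates that are not adjacent is sketched below. It assumes that two elements count as equal when their serialized form matches, which is stricter than elements_equal in one respect: the same attributes written in a different order can serialize differently. The sample document and the helper remove_duplicate_children are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second <a>1</a> in the first page is a duplicate.
SAMPLE = "<doc><page><a>1</a><b>2</b><a>1</a></page><page><a>1</a></page></doc>"


def remove_duplicate_children(root):
    """Drop repeated children within each page; return how many were removed."""
    removed = 0
    for page in root:
        seen = set()
        for elem in list(page):          # iterate over a copy while removing
            key = ET.tostring(elem)      # serialized subtree, tail text included
            if key in seen:
                page.remove(elem)
                removed += 1
            else:
                seen.add(key)
    return removed


root = ET.fromstring(SAMPLE)
count = remove_duplicate_children(root)
print(count, ET.tostring(root, encoding='unicode'))
```

The seen set is reset for every page, so an element that repeats in a different page is kept.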

  

 


Recommended blog posts on using the RapidXml library:
https://blog.csdn.net/wqvbjhc/article/details/7662931
https://www.cnblogs.com/kanego/articles/2247602.html
http://www.oschina.net/question/873634_81784
http://blog.sina.com.cn/s/blog_a459dcf501019393.html

 

 

 

Posted 2020-07-15 20:05 by 七星望