Parsing XML with Python


Sometimes you need to read data from an XML file for further processing, which means handling XML in Python. There are several ways to do this, described below.

1. Using the xml.etree.ElementTree module

This is the simplest and easiest XML parsing option in the Python standard library, and it suits most use cases.

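Basic steps:

Import the module: import xml.etree.ElementTree as ET
Parse the XML:
- From a file: tree = ET.parse('file.xml')
- From a string: root = ET.fromstring(xml_string)
Get the root element: root = tree.getroot() (not needed after ET.fromstring, which already returns the root element)
Traverse elements:
- Iterate over direct children: for child in root:
- Use find/findall to locate specific elements
- Use element.text to read an element's text content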
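
The steps above can be sketched end to end with an in-memory string, so no file is needed. The XML content and the id attribute below are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML used only for this sketch
xml_string = """
<books>
    <book id="b1"><title>title1</title></book>
    <book id="b2"><title>title2</title></book>
</books>
"""

# fromstring returns the root Element directly (no getroot() call needed)
root = ET.fromstring(xml_string)

# Iterating over an element yields its direct children
child_tags = [child.tag for child in root]

# find returns the first matching direct child, findall returns all of them
first_book = root.find("book")
all_books = root.findall("book")

# .text holds the element's text, .get reads an attribute value
titles = [book.find("title").text for book in all_books]
first_id = first_book.get("id")

print(child_tags, titles, first_id)
```

Note that find and findall with a plain tag name only look at direct children of the element they are called on.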

Example

Example XML
<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <title>title1</title>
        <author>author1</author>
        <price>price1</price>
    </book>
    <book>
        <title>title2</title>
        <author>author2</author>
        <price>price2</price>
    </book>
    <book>
        <title>title3</title>
        <author>author3</author>
        <price>price3</price>
    </book>
</books>

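import xml.etree.ElementTree as ET

# Parse from a file
tree = ET.parse('books.xml')
root = tree.getroot()

# Iterate over all book elements that are direct children of the root
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    print(f"Title: {title}, Author: {author}")

# Iterate over every price element at any depth
# (iter takes a tag name and walks the whole subtree; it is not an XPath query)
for price in root.iter('price'):
    print(price.text)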
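
One caveat with the loop above: find returns None when a child element is missing, so accessing .text on the result would raise AttributeError. Here is a minimal sketch of a more tolerant variant, using findtext with a default value and the limited XPath syntax that ElementTree supports ('.//price'). The sample string, with its missing author, is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second book has no <author> element
xml_string = """
<books>
    <book><title>title1</title><author>author1</author><price>price1</price></book>
    <book><title>title2</title><price>price2</price></book>
</books>
"""
root = ET.fromstring(xml_string)

rows = []
for book in root.findall("book"):
    # findtext returns the default instead of raising when the child is absent
    title = book.findtext("title", default="")
    author = book.findtext("author", default="unknown")
    rows.append((title, author))

# './/price' is an XPath-style path: match <price> at any depth below root
prices = [p.text for p in root.findall(".//price")]

print(rows)
print(prices)
```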

2. Using the DOM approach (xml.dom.minidom)

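Suited to cases that need full DOM tree manipulation. Memory usage is relatively high, since minidom builds the whole document as a tree of node objects.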

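Basic steps:

Import the module: from xml.dom import minidom
Parse the XML: dom = minidom.parse('file.xml')
Get the document element: root = dom.documentElement
Traverse nodes:
- Use the childNodes attribute
- Use getElementsByTagName to find elements by tag name
- Use getElementById to find an element by ID (minidom only treats an attribute as an ID once it has been marked as one, for example with setIdAttribute; otherwise the lookup returns None)
- node.getAttribute("name") reads an attribute value
- node.firstChild.data reads the text of the node's first child, which is normally its text node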
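
The traversal calls listed above can be sketched with minidom.parseString so that the example needs no file. The XML and its name attribute are hypothetical. One thing worth seeing in practice is that childNodes also contains the whitespace between tags as text nodes, so it usually needs filtering.

```python
from xml.dom import minidom

# Hypothetical XML; the "name" attribute is made up for this sketch
xml_string = """<books>
    <book name="first"><title>title1</title></book>
    <book name="second"><title>title2</title></book>
</books>"""

dom = minidom.parseString(xml_string)
root = dom.documentElement

# childNodes holds every node type, including whitespace-only text nodes,
# so keep only the element nodes
element_children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

# getElementsByTagName searches the whole subtree
names = [book.getAttribute("name") for book in dom.getElementsByTagName("book")]
titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]

print(len(root.childNodes), len(element_children))
print(names, titles)
```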

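from xml.dom import minidom

# Parse the file into a Document object
dom = minidom.parse('test1.xml')
print(dom.nodeName)  # the Document node itself: "#document"
# documentElement is always the root element, even if a comment precedes it
print(dom.documentElement.tagName)
# Collect every book element in the document
books = dom.getElementsByTagName("book")
print(books.length)
for book in books:
    # The first child of <title> is its text node
    title = book.getElementsByTagName('title')[0].firstChild.data
    print(title)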
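
Reading firstChild.data assumes the element actually contains text. For an empty element such as <title/>, firstChild is None and the access fails. A small helper can sidestep this; get_text below is a hypothetical name, not part of minidom, and the sample XML is made up.

```python
from xml.dom import minidom


def get_text(node):
    """Concatenate the data of all direct text children (hypothetical helper)."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Hypothetical sample: the second book has an empty <title/>
xml_string = "<books><book><title>title1</title></book><book><title/></book></books>"
dom = minidom.parseString(xml_string)

titles = [
    get_text(book.getElementsByTagName("title")[0])
    for book in dom.getElementsByTagName("book")
]
print(titles)
```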

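3. Handling namespaces

Namespace basics

A namespace is identified by a URI (usually written as a URL) and is typically declared with an xmlns attribute.

Example: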

   <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>

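An element in a prefixed namespace is written as prefix:element, for example ns:element. With a default namespace (xmlns="..." with no prefix, as in the urlset example), elements carry no prefix but still belong to that namespace.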
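
To see what this means on the Python side, here is a minimal sketch using the sitemap namespace from the example. ElementTree stores each tag as "{namespace-uri}local-name", so a plain tag name does not match namespaced elements. The url and loc children, the placeholder address, and the sm prefix are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Sitemap-style document using a default namespace; the address is a placeholder
xml_content = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
</urlset>
"""
root = ET.fromstring(xml_content)

# Tags are stored as "{namespace-uri}local-name"
print(root.tag)

# A plain name does not match elements that live in a namespace
unqualified = root.findall("url")

# Mapping an arbitrary prefix (here "sm") to the URI makes the lookup work
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
locs = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]

print(unqualified)
print(locs)
```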

Steps for handling namespaces in Python

Step 1: define a namespace dictionary

namespaces = {
    'ns': 'http://example.com/ns',
    'prefix': 'http://other-namespace.com'
}

Step 2: use the namespace prefix when searching for elements

# Find a single element
element = root.find('ns:element_name', namespaces)

# Find multiple elements
elements = root.findall('ns:element_name', namespaces)

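Complete example

import xml.etree.ElementTree as ET

# Sample XML content (with a namespace)
xml_content = '''
<root xmlns:ns="http://example.com/ns">
    <ns:item>value1</ns:item>
    <ns:item>value2</ns:item>
</root>
'''

# Parse the XML
root = ET.fromstring(xml_content)

# Define the namespace mapping (the prefix is local to the query;
# only the URI has to match the one declared in the document)
namespaces = {
    'ns': 'http://example.com/ns'
}

# Find elements using the namespace prefix
items = root.findall('ns:item', namespaces)
for item in items:
    print(item.text)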
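
As an alternative to a prefix dictionary, the namespace URI can be written directly into the tag name in "{uri}name" form. The sketch below also uses the "{*}" wildcard, which assumes Python 3.8 or newer. The sample document is assumed here for illustration.

```python
import xml.etree.ElementTree as ET

xml_content = """
<root xmlns:ns="http://example.com/ns">
    <ns:item>value1</ns:item>
    <ns:item>value2</ns:item>
</root>
"""
root = ET.fromstring(xml_content)

# Fully qualified name: "{uri}local-name", no prefix mapping needed
qualified = [item.text for item in root.findall("{http://example.com/ns}item")]

# "{*}item" matches item in any namespace (assumes Python 3.8+)
wildcard = [item.text for item in root.findall("{*}item")]

print(qualified)
print(wildcard)
```

The dictionary form tends to read better when many lookups share a namespace, while the qualified form is handy for a one-off query.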

posted @ 2025-06-18 17:14 最大的敌人是自律
