Parsing XML files with Python (the simple case)
2008-03-02 17:32
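SAX parsing is the most straightforward place to start. It is also somewhat forgiving of errors in the XML file: because the parser works as a stream of events, the handler still receives the events that precede the first well-formedness error, although the parser then raises an exception.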
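To make the event-driven model concrete, here is a minimal sketch that only records the start-tag and end-tag events the parser fires. The class name EventLogger and the inline SAMPLE document are assumptions made for illustration; they are not part of the original example.

```python
import xml.sax
import xml.sax.handler


class EventLogger(xml.sax.handler.ContentHandler):
    """Hypothetical helper: record every start/end tag event in order."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attributes):
        # Copy the attributes into a plain dict so they are easy to inspect.
        self.events.append(("start", name, dict(attributes.items())))

    def endElement(self, name):
        self.events.append(("end", name))


# A one-book document, inlined so the sketch runs without any file on disk.
SAMPLE = (b'<catalog><book isbn="0-596-00085-5">'
          b'<title>Programming Python</title></book></catalog>')

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

Each element produces one start event and one end event, in document order. A handler that wants to extract data only has to remember where it currently is in that sequence.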
Take an XML file, book.xml:
<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
    <author>Jones, Drake</author>
  </book>
  <book isbn="0-596-00085-5">
    <title>Programming Python</title>
    <author>Lutz</author>
  </book>
  <book isbn="0-596-00281-5">
    <title>Learning Python</title>
    <author>Lutz, Ascher</author>
  </book>
  <book isbn="0-596-00797-3">
    <title>Python Cookbook</title>
    <author>Martelli, Ravenscroft, Ascher</author>
  </book>
  <!-- imagine more entries here -->
</catalog>
Then write a BookHandler, as follows:
# -*- coding: utf-8 -*-
import pprint
import xml.sax
import xml.sax.handler


class BookHandler(xml.sax.handler.ContentHandler):
    """Build an ISBN -> title mapping from SAX parser events."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inTitle = False    # True while inside a <title> element
        self.mapping = {}       # ISBN -> title text

    def startElement(self, name, attributes):
        if name == "book":                    # start of a <book> tag
            self.buffer = ""                  # reset the title text buffer
            self.isbn = attributes["isbn"]    # save ISBN for use as dict key
        elif name == "title":                 # start of a <title> tag
            self.inTitle = True               # collect the text that follows

    def characters(self, data):
        if self.inTitle:                      # text inside <title>
            self.buffer += data               # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":                   # end of the <title> tag
            self.inTitle = False
            self.mapping[self.isbn] = self.buffer    # store title under its ISBN


parser = xml.sax.make_parser()
handler = BookHandler()
parser.setContentHandler(handler)
parser.parse('book.xml')
pprint.pprint(handler.mapping)
The output is:
{u'0-596-00085-5': u'Programming Python',
 u'0-596-00128-2': u'Python & XML',
 u'0-596-00281-5': u'Learning Python',
 u'0-596-00797-3': u'Python Cookbook'}
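This is a fairly simple case, though. Note also that every key and value in the result is a unicode string, which is what the u prefix in this Python 2 output indicates. Under Python 3 every str is Unicode and the prefix is not printed.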
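As for errors in the file, the sketch below feeds the parser a document containing a bare "&", which is not well-formed XML. It is only an illustration under stated assumptions: the class name StartTagRecorder, the BROKEN document, and its isbn value are all hypothetical, and the behaviour described is that of the default expat-based parser.

```python
import xml.sax
import xml.sax.handler


class StartTagRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical helper: remember the name of every start tag seen."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.seen = []

    def startElement(self, name, attributes):
        self.seen.append(name)


# The bare "&" (instead of "&amp;") makes this document not well-formed.
BROKEN = (b"<catalog><book isbn='x'>"
          b"<title>Python & XML</title></book></catalog>")

recorder = StartTagRecorder()
error = None
try:
    xml.sax.parseString(BROKEN, recorder)
except xml.sax.SAXParseException as exc:
    error = exc    # the default error handler raises on fatal errors

print(recorder.seen)    # start tags delivered before the parser gave up
print(error)            # the parse error, including its position
```

The handler should already hold the start tags that precede the bad character by the time the exception is raised. In that sense a SAX handler can salvage part of a damaged file, but the file itself still has to be fixed (here by writing &amp;) before it parses completely.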