您的位置:首页 > 编程语言 > Python开发

python3基础教程 项目3:万能的XML

2018-08-01 16:08 232 查看

模块介绍:

在python中使用sax方式处理xml要先引入xml.sax中的parse函数,还有xml.sax.handler中的ContentHandler

parse函数:用于解析xml文件

ContentHandler类:包含多个事件处理方法(详见https://blog.csdn.net/mengzhongdaima/article/details/81283117

几个注意点:

getattr()函数:用于返回一个对象属性值。

callable() 函数:用于检查一个对象是否是可调用的。

os.join()函数:使用正确的分隔符(‘/’)将多条路径合二为一。

os.makedirs()函数:在指定的路径中创建必要的目录。

变量名后加逗号:将该变量转换成元组类型。(详见https://blog.csdn.net/mengzhongdaima/article/details/81331674

以下是代码及注释:

[code]from xml.sax.handler import ContentHandler
from xml.sax import parse
import os

class Dispatcher: #分派器类,该类负责为指定的需处理的事件查找与其对应的处理程序

def dispatch(self, prefix, name, attrs=None):
#负责查找合适的处理程序、创建参数元素并使用这些参数调用处理程序
mname = prefix + name.capitalize()
dname = 'default' + prefix.capitalize()
method = getattr(self, mname, None)
if callable(method): args = ()
else:
method = getattr(self, dname, None)
args = name,
if prefix == 'start':args += attrs,
if callable(method): method(*args)

#以下两条为基本的事件处理程序,它们只是调用方法dispatch
def startElement(self, name, attrs):
self.dispatch('start', name, attrs)

def endElement(self, name):
self.dispatch('end', name)

class WebsiteConstructor(Dispatcher, ContentHandler):

passthrough = False     #利用passthrough确定当前是否在某一元素(xml文本块)内

def __init__(self, directory):
self.directory = [directory]
self.ensureDirectory()

def ensureDirectory(self):
path = os.path.join(*self.directory)
os.makedirs(path, exist_ok = True)      #在指定的路径中创建必要的目录

def characters(self, chars):    #遇到字符串自动调用
if self.passthrough: self.out.write(chars)

def defaultStart(self, name, attrs):    #处理除了标题和文件头以外的xml块
if self.passthrough:
self.out.write('<' + name)
for key,val in attrs.items():
self.out.write(' {}="{}"'.format(key,val))
self.out.write('>')
def defaultEnd(self, name):
if self.passthrough:
self.out.write('</{}>'.format(name))

def startDirectory(self, attrs):
self.directory.append(attrs['name'])
self.ensureDirectory()

def endDirectory(self):
self.directory.pop()

def startPage(self, attrs):
filename = os.path.join(*self.directory + [attrs['name'] + '.html'])
self.out = open(filename, 'w')
self.writeHeader(attrs['title'])
self.passthrough = True

def endPage(self):
self.passthrough = False
self.writeFooter()
self.out.close()

def writeHeader(self, title):   #将首部写入文件
self.out.write('<html>\n <head>\n   <title>')
self.out.write(title)
self.out.write('</title>\n </head>\n <body>\n')

def writeFooter(self):      #将尾部写入文件
self.out.write('\n </body>\n</html>\n')

parse('website.xml',WebsiteConstructor('public_html'))

website.xml:

[code]<website>
<page name = "index" title = "Home Page">
<h1> Welcome to My Home Page</h1>

<p>Hi, there. My name is Mr.Gumby, and this is my home page.
Here are some of my interests
</p>

<ul>
<li><a href = "interests/shouting.html">Shouting</a></li>
<li><a href = "interests/sleeping.html">Sleeping</a></li>
<li><a href = "interests/eating.html">Eating</a></li>
</ul>
</page>
<directory name="interests">
<page name="shouting" title="Shoutin">
<h1>Mr.Gumby's Shouting Page</h1>
<p>...</p>
</page>
<page name="sleeping" title="Sleeping">
<h1>Mr.Gumby's Sleeping Page</h1>
<p>...</p>
</page>
<page name="eating" title="Eating">
<h1>Mr.Gumby's Eating Page</h1>
<p>...</p>
</page>
</directory>
</website>

 

阅读更多
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: