
Java XML and dom4j: the ABCs

2014-12-22 19:59
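XML shows up everywhere in Java applications, from web.xml to the configuration files of the various frameworks. Most of the WSDP (Java Web Services Developer Pack) is XML-related as well: JAXP (Java API for XML Processing), JAXB (Java Architecture for XML Binding), SJSXP (Sun Java Streaming XML Parser), and so on. The WSDP page is at http://www.oracle.com/technetwork/java/webservicespack-jsp-140788.html. This article is only a starting point; the subject is broad. After reading it, readers may want to move on to two books: Java and XML, and Java Web Services.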

part 1: java xml

1. XML parsers

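Most Java programmers know that the platform usually only defines the API, while the implementations are developed by different vendors. With XML the official API arrived somewhat late: DOM and SAX already existed before JAXP wrapped them, and StAX was only folded in later (JAXP 1.4). The known technologies are: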

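1.1 The W3C DOM (w3c.org), a language-neutral, platform-neutral Document Object Model. It is used both in the front end and on the server. A Document is an in-memory tree that can be read and written; it does well at non-sequential reads and at writes, and is usually combined with XPath.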

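1.2 SAX (saxproject.org), also available across languages, is a processing model based on event callbacks. Its advantage over DOM is lower memory use; it reads the document sequentially.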

1.3 StAX, led by BEA under JSR 173 (https://www.jcp.org/en/jsr/detail?id=173), tries to unify the territory of DOM and SAX. I have not used it, so I will not say more.
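To make the difference between the three models concrete, here is a minimal sketch that extracts the same titles three ways using only the JDK's built-in JAXP classes. The class name ThreeModelsSketch and the sample feed are made up for illustration; this is not code from the article.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class ThreeModelsSketch {
    // Hypothetical sample feed, not taken from the article.
    static final String XML =
        "<rss><channel>"
        + "<item><title>first</title></item>"
        + "<item><title>second</title></item>"
        + "</channel></rss>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM: build the whole tree in memory, then query it with XPath. */
    static List<String> domTitles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream(xml));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/rss/channel/item/title", doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    /** SAX: the parser pushes events into callbacks; nothing is kept unless we keep it. */
    static List<String> saxTitles(String xml) throws Exception {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        return titles;
    }

    /** StAX: the application pulls the next event from the reader when it wants one. */
    static List<String> staxTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(domTitles(XML));
        System.out.println(saxTitles(XML));
        System.out.println(staxTitles(XML));
    }
}
```

All three methods should return the same two titles for the sample input. Note that the SAX and StAX variants match every title element they meet, while the XPath expression only selects titles under item; the sample has no other titles, so the results agree.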

DOM, SAX, and StAX are selling points (processing models), not parsers. One parser I know of is Apache Xerces: http://xerces.apache.org/

It supports:

    SAX 2.0.2

    DOM Level 3 Core, Load and Save

    DOM Level 2 Core, Events, Traversal and Range

    JAXP 1.4

    StAX 1.0 Event API (javax.xml.stream.events)

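2. What is JAXP

JAXP is Java's wrapper around SAX and DOM: through the single JAXP API you can use DOM, and you can also use SAX. To parse XML with SAX, use SAXParserFactory; to parse with DOM, use DocumentBuilderFactory.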
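A small sketch of what "API from the platform, implementation from a vendor" looks like in practice: both factories are obtained through the JAXP entry points, and the concrete class behind them depends on what is installed. The class name JaxpFactorySketch is made up for illustration, and the printed class names will vary with the JDK and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

final class JaxpFactorySketch {
    /** Name of the concrete DOM factory that JAXP resolved at runtime. */
    static String domImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Name of the concrete SAX factory that JAXP resolved at runtime. */
    static String saxImplementation() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM factory: " + domImplementation());
        System.out.println("SAX factory: " + saxImplementation());
    }
}
```

The calling code only ever names the javax.xml.parsers types, which is why a different parser (for example Xerces on the classpath) can be swapped in without source changes.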

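3. What is JAXB

JAXB builds a bridge between Java objects and XML, so that you do not have to deal with DOM, SAX, or StAX: what you work with is either XML or a Java bean. Use a Marshaller to turn Java objects into XML (one instance per XML document, or several instances in one document), and an Unmarshaller to restore the data in the XML to Java object instances. The package is javax.xml.bind.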

part 2: dom4j

1. dom4j is not an XML parser. It differs from JDOM in that it provides a set of abstract XML interfaces. The top-level interface is Node; Attribute, Branch, CDATA, CharacterData, Comment, Document (not the W3C Document), DocumentType, Element, Entity, ProcessingInstruction, and Text are all sub-interfaces of Node.

2. The default factory: DocumentFactory

There are also several special-purpose sub-factories: BeanDocumentFactory, DatatypeDocumentFactory, DatatypeElementFactory, DOMDocumentFactory, IndexedDocumentFactory, NonLazyDocumentFactory, UserDataDocumentFactory

A word on DOMDocumentFactory: it extends DocumentFactory and implements org.w3c.dom.DOMImplementation. If a method accepts an org.w3c.dom.Element, you can pass it an org.dom4j.Element created by a DOMDocumentFactory instance.

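3. dom4j can also build its documents from SAX, DOM, or pull parsing. The readers are all in the org.dom4j.io package: org.dom4j.io.DOMReader (converts an existing W3C DOM tree), org.dom4j.io.SAXReader (its underlying parser is created through JAXP), and org.dom4j.io.XPP3Reader (based on the XPP3 pull parser rather than StAX; StAX input goes through org.dom4j.io.STAXEventReader). For these three readers the method to call is read.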

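4. Serialization: this means output to a String, a file, or the console, for which you can use org.dom4j.io.XMLWriter. Besides that there are:

org.dom4j.io.DOMWriter, which takes an org.dom4j.Document and returns an org.w3c.dom.Document,

org.dom4j.io.SAXWriter, which writes to an org.xml.sax.ContentHandler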

5. Parsing an RSS URL with dom4j

5.1 Using ElementHandler

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        // Fires while the reader is still parsing, once per /rss/channel/item element
        reader.addHandler("/rss/channel/item", new ElementHandler() {
            final ItemChildElementHandler titleHandler = new ItemChildElementHandler();
            final ItemChildElementHandler linkHandler = new ItemChildElementHandler();
            final ItemChildElementHandler dateHandler = new ItemChildElementHandler();
            final ItemChildElementHandler descripHandler = new ItemChildElementHandler();

            @Override
            public void onStart(ElementPath elementPath) {
                // Entering an item: register handlers for its child elements
                elementPath.addHandler("title", titleHandler);
                elementPath.addHandler("link", linkHandler);
                elementPath.addHandler("pubDate", dateHandler);
                elementPath.addHandler("description", descripHandler);
            }

            @Override
            public void onEnd(ElementPath elementPath) {
                // Leaving the item: unregister the child handlers and build the entry
                elementPath.removeHandler("title");
                elementPath.removeHandler("link");
                elementPath.removeHandler("pubDate");
                elementPath.removeHandler("description");
                try {
                    // processRemoteLink and processDate are helper methods (not shown in this listing)
                    URL curURL = processRemoteLink(linkHandler.getNodeContent(), url);
                    Date curDate = processDate(dateHandler.getNodeContent());
                    items.add(
                        new RssItem(
                            curURL,
                            titleHandler.getNodeContent(),
                            descripHandler.getNodeContent(),
                            curDate));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
        });
        try {
            reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
        return !items.isEmpty();
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    // Remembers the name and text of the most recently seen child element
    private class ItemChildElementHandler implements ElementHandler {
        private String tagName;
        private String tagText;

        @Override
        public void onStart(ElementPath elementPath) {
            Element elt = elementPath.getCurrent();
            tagName = elt.getName();
        }

        @Override
        public void onEnd(ElementPath elementPath) {
            // The text is only complete once the end tag has been reached
            Element elt = elementPath.getCurrent();
            tagText = elt.getText();
        }

        @SuppressWarnings("unused")
        public String getNodeNames() {
            return tagName;
        }

        public String getNodeContent() {
            return tagText;
        }
    }
}
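The listing calls processRemoteLink and processDate without showing them. The following is only a guess at what such helpers might look like, using JDK classes alone: the first resolves a possibly relative link against the feed URL, the second parses an RFC 822 style pubDate. The class name RssHelpersSketch, the date pattern, and the choice to return null on unparseable dates are all assumptions, not the author's code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class RssHelpersSketch {
    /** Resolves a possibly relative link against the feed URL (assumed behaviour). */
    static URL processRemoteLink(String link, URL feedUrl) throws MalformedURLException {
        // An absolute link is returned unchanged; a relative one is resolved against feedUrl.
        return new URL(feedUrl, link.trim());
    }

    /** Parses a pubDate such as "Mon, 22 Dec 2014 19:59:00 +0800"; null if it does not match. */
    static Date processDate(String text) {
        if (text == null) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format =
                new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.ENGLISH);
        try {
            return format.parse(text.trim());
        } catch (ParseException e) {
            return null;
        }
    }
}
```

Real feeds use several date layouts, so a production version would probably try more than one pattern before giving up.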


The RssItem object

@ThreadSafe
public class RssItem implements Serializable {
    private static final long serialVersionUID = 673250215751499564L;
    /**
     * Link address of the entry
     */
    private final URL url;
    /**
     * Title of the entry
     */
    private final String title;
    /**
     * Short description of the entry
     */
    private final String description;
    /**
     * Publication date of the entry
     */
    private final Date date;

    public RssItem(
            URL url,
            String title,
            String description,
            Date date) {
        this.url = url;
        this.title = title;
        this.description = description;
        this.date = date;
    }

    public URL getUrl() {
        return url;
    }

    public String getTitle() {
        return title;
    }

    public String getDescription() {
        return description;
    }

    public Date getDate() {
        return date;
    }

    @Override
    public int hashCode() {
        //ETC
    }

    @Override
    public boolean equals(Object obj) {
        //ETC
    }

    @Override
    public String toString() {
        //ETC
    }
}


Test

public class SAXRssParserTest {

    public static void main(String[] args) {
        String b = "http://news.baidu.com/n?cmd=7&loc=4075&name=%D1%CC%CC%A8&tn=rss";
        final long beginTime = System.nanoTime();
        SAXRssParser sap = new SAXRssParser();
        try {
            if (sap.parser(new URL(b))) {
                List<RssItem> news = sap.getEntryList();
                System.out.println("size:" + news.size());
                for (RssItem ri : news) {
                    System.out.println("title:" + ri.getTitle() + "@" + ri.getDate());
                    System.out.println("link:" + ri.getUrl());
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
        final long endTime = System.nanoTime();
        // Elapsed time in seconds
        System.out.println("used Second: " + (endTime - beginTime) / 1.0e9);
    }

}


5.2 Using VisitorSupport

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        Document document;
        try {
            // Unlike 5.1, the whole document is read first and visited afterwards
            document = reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
            return false;
        }
        final RssVisitorSupport rvs = new RssVisitorSupport(url);
        document.accept(rvs);
        items.addAll(rvs.getNews());
        return rvs.getTotalStep() > 0;
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    class RssVisitorSupport extends VisitorSupport {
        private int step = 0;
        private RssItemBuilder build = null;
        private final List<RssItem> news;
        private final URL referURL;

        public RssVisitorSupport(final URL referURL) {
            this.referURL = referURL;
            this.news = new ArrayList<>();
        }

        @Override
        public void visit(Element node) {
            String eleName = node.getName();
            if (eleName.equals("item")) {
                // A new item starts: begin collecting its fields
                build = new RssItemBuilder();
                step++;
            }
            if (eleName.equals("title") && build != null) {
                build.setTitle(node.getText());
            }
            if (eleName.equals("link") && build != null) {
                try {
                    build.setURL(processRemoteLink(node.getText(), referURL));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
            if (eleName.equals("pubDate") && build != null) {
                build.setDate(processDate(node.getText()));
            }
            if (eleName.equals("description") && build != null) {
                build.setDescription(node.getText());
            }
            if (build != null && !build.isEmpty()) {
                news.add(build.build());
                build = null; // without resetting this, duplicate entries appear
            }
        }

        public int getTotalStep() {
            return step;
        }

        public List<RssItem> getNews() {
            return news;
        }
    }
}

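Because RssItem is designed as an immutable object, RssVisitorSupport works with a helper object, RssItemBuilder, which follows the Builder pattern. For more on the Builder design pattern, see this article: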

Builder Design Pattern in Java
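RssItemBuilder itself is not listed in the article. The sketch below shows one way such a builder could be written so that the calls made in RssVisitorSupport (setTitle, setURL, setDate, setDescription, isEmpty, build) would work. The nested Item class is a stand-in for RssItem so that the block compiles on its own, and the meaning given to isEmpty() (true while any field is still missing) is an assumption inferred from how the visitor uses it.

```java
import java.net.URL;
import java.util.Date;

final class RssItemBuilderSketch {
    /** Minimal stand-in for the article's immutable RssItem. */
    static final class Item {
        final URL url;
        final String title;
        final String description;
        final Date date;

        Item(URL url, String title, String description, Date date) {
            this.url = url;
            this.title = title;
            this.description = description;
            this.date = date;
        }
    }

    /** Mutable collector; the immutable Item is only created once every field is present. */
    static final class RssItemBuilder {
        private URL url;
        private String title;
        private String description;
        private Date date;

        RssItemBuilder setURL(URL url) { this.url = url; return this; }
        RssItemBuilder setTitle(String title) { this.title = title; return this; }
        RssItemBuilder setDescription(String description) { this.description = description; return this; }
        RssItemBuilder setDate(Date date) { this.date = date; return this; }

        /** Assumed meaning: true while at least one field has not been set yet. */
        boolean isEmpty() {
            return url == null || title == null || description == null || date == null;
        }

        Item build() {
            if (isEmpty()) {
                throw new IllegalStateException("item is incomplete");
            }
            return new Item(url, title, description, date);
        }
    }
}
```

With this reading of isEmpty(), the visitor adds an entry only after title, link, pubDate, and description have all been visited, which also means an item missing any of the four is silently dropped.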

I tested several RSS URLs and found: VisitorSupport > ElementHandler > Iterator

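6. JAXB example

Scenario: the back-office programs I used to write all had a feature-management menu. Before I knew what JAXB was, I would always create an XML file and use some parser to build a singleton from it at program startup.

6.1 The feature-management menu XML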

<?xml version="1.0" encoding="UTF-8"?>
<root ico="sec">
    <group name="Member management" link="/user" symbol="sec_1">
        <item>
            <anchor>Member list</anchor>
            <id>child_1_1</id>
            <link>/user</link>
        </item>
        <item>
            <anchor>Personal information</anchor>
            <id>child_1_2</id>
            <link>/user/person</link>
        </item>
        <item>
            <anchor>Company information</anchor>
            <id>child_1_3</id>
            <link>/user/company</link>
        </item>
        <item>
            <anchor>Security questions</anchor>
            <id>child_1_4</id>
            <link>/user/secret</link>
        </item>
        <item>
            <anchor>Credit records</anchor>
            <id>child_1_5</id>
            <link>/user/trust</link>
        </item>
    </group>
    <group name="Product management" link="/product" symbol="sec_2">
        <item>
            <anchor>Product list</anchor>
            <id>child_2_1</id>
            <link>/product</link>
        </item>
        <item>
            <anchor>Trading accounts</anchor>
            <id>child_2_2</id>
            <link>/product/account</link>
        </item>
        <item>
            <anchor>Extension fields</anchor>
            <id>child_2_3</id>
            <link>/product/field</link>
        </item>
        <item>
            <anchor>Type templates</anchor>
            <id>child_2_4</id>
            <link>/product/field/template</link>
        </item>
    </group>
    <group name="Order management" link="/order" symbol="sec_3">
        <item>
            <anchor>Order list</anchor>
            <id>child_3_1</id>
            <link>/order</link>
        </item>
        <item>
            <anchor>Inventory management</anchor>
            <id>child_3_2</id>
            <link>/order/inventory</link>
        </item>
        <item>
            <anchor>Review management</anchor>
            <id>child_3_3</id>
            <link>/order/pointer</link>
        </item>
    </group>
    <group name="Finance management" symbol="sec_4">
        <item>
            <anchor>Online banking channels</anchor>
            <id>child_4_1</id>
            <link>/channel</link>
        </item>
        <item>
            <anchor>Top-up records</anchor>
            <id>child_4_2</id>
            <link>/channel/cache</link>
        </item>
        <item>
            <anchor>Bank card management</anchor>
            <id>child_4_3</id>
            <link>/bank/card</link>
        </item>
        <item>
            <anchor>Bill management</anchor>
            <id>child_4_4</id>
            <link>/bill</link>
        </item>
        <item>
            <anchor>Cash records</anchor>
            <id>child_4_5</id>
            <link>/bank/saction</link>
        </item>
        <item>
            <anchor>Alipay transfer records</anchor>
            <id>child_4_6</id>
            <link>/bill/ali</link>
        </item>
    </group>
    <group name="News management" link="/news" symbol="sec_5">
        <item>
            <anchor>News list</anchor>
            <id>child_5_1</id>
            <link>/news</link>
        </item>
        <item>
            <anchor>News categories</anchor>
            <id>child_5_2</id>
            <link>/news/category</link>
        </item>
        <item>
            <anchor>News title flags</anchor>
            <id>child_5_3</id>
            <link>/news/level</link>
        </item>
    </group>
    <group name="System management" symbol="sec_6">
        <item>
            <anchor>Complaints / feedback</anchor>
            <id>child_6_1</id>
            <link>/feedback</link>
        </item>
        <item>
            <anchor>Activity log</anchor>
            <id>child_6_2</id>
            <link>/user/active</link>
        </item>
        <item>
            <anchor>Member levels</anchor>
            <id>child_6_3</id>
            <link>/user/level</link>
        </item>
        <item>
            <anchor>SMS messages</anchor>
            <id>child_6_4</id>
            <link>/recaptcha</link>
        </item>
        <item>
            <anchor>Site messages</anchor>
            <id>child_6_5</id>
            <link>/message</link>
        </item>
        <item>
            <anchor>Keywords</anchor>
            <id>child_6_6</id>
            <link>/word</link>
        </item>
    </group>
</root>

6.2 Parsing the XML file above with dom4j's SAXReader

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import net.project.entity.Group;
import net.project.entity.GroupItem;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

/**
 * Traditional SAX-based parsing
 * @author xiaofanku
 * 20130701
 */
public class ParserManagerPanel {
    private static ParserManagerPanel instance = null;
    private final List<Group> group;

    private ParserManagerPanel(InputStream stream) {
        this.group = new ArrayList<Group>();

        try {
            parser(new SAXReader().read(stream));
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    private void parser(final Document doc) {
        // Every <group> element becomes one Group object
        List<Node> list = doc.selectNodes("//group");
        for (Iterator<Node> iter = list.iterator(); iter.hasNext(); ) {
            Element currentGroup = (Element) iter.next();

            Group mg = new Group();
            // The link attribute is optional; fall back to "-" when it is absent
            String defaultLink = currentGroup.attributeValue("link");
            if (defaultLink == null || defaultLink.isEmpty()) {
                defaultLink = "-";
            }
            mg.setLink(defaultLink);
            mg.setName(currentGroup.attributeValue("name"));
            mg.setSymbol(currentGroup.attributeValue("symbol"));

            List<Node> groupChild = currentGroup.selectNodes("./item");
            for (Node currentItem : groupChild) {
                Element anchor = (Element) currentItem.selectSingleNode("./anchor");
                Element idEle = (Element) currentItem.selectSingleNode("./id");
                Element link = (Element) currentItem.selectSingleNode("./link");
                if (anchor == null || idEle == null || link == null) {
                    // Skip incomplete <item> entries rather than catching NullPointerException
                    continue;
                }
                GroupItem item = new GroupItem();
                item.setAnchor(anchor.getText());
                item.setId(idEle.getText());
                item.setLink(link.getText());
                mg.getItems().add(item);
            }
            group.add(mg);
        }
    }

    // synchronized so that concurrent first calls cannot create two instances
    public static synchronized ParserManagerPanel getInstance(InputStream input) {
        if (instance == null) {
            instance = new ParserManagerPanel(input);
        }
        return instance;
    }

    public List<Group> getStruct() {
        return group;
    }
}

The objects involved

public class Group implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private String symbol;
    private String link;
    private List<GroupItem> items = null;

    public Group() {
        items = new ArrayList<>();
    }
    // getters and setters omitted
}

public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;
    private String anchor;
    private String id;
    private String link;

    public GroupItem() {
    }
    // getters and setters omitted
}


6.3 With JAXB you only need to add a few annotations; dom4j is not needed at all to turn the XML into objects

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="group")
public class Group implements Serializable {
    private static final long serialVersionUID = 1L;

    // Mapped from the attributes of <group>
    @XmlAttribute
    private String name;

    @XmlAttribute
    private String symbol;

    @XmlAttribute(required = false)
    private String link;

    // Each nested <item> element becomes one GroupItem
    @XmlElement(name="item")
    private List<GroupItem> items = null;

    public Group() {
        items = new ArrayList<>();
    }
    // getters and setters omitted
}

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="item")
public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;

    // Mapped from the child elements of <item>
    @XmlElement
    private String anchor;

    @XmlElement
    private String id;

    @XmlElement
    private String link;

    public GroupItem() {
    }
    // getters and setters omitted
}

Add one new class

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="root")
public class GroupPanel {

    // Each <group> element under <root> becomes one Group
    @XmlElement(name="group")
    private List<Group> groups = null;

    @XmlAttribute
    private String ico;

    public GroupPanel() {
        groups = new ArrayList<>();
    }
    // getters and setters omitted
}

Finally, the test code that invokes it
JAXBContext jc = JAXBContext.newInstance(GroupPanel.class, Group.class, GroupItem.class);
Unmarshaller u = jc.createUnmarshaller();
GroupPanel gs = (GroupPanel) u.unmarshal(new File("/managerGroup.xml"));