
Parsing XML with SAX in Java

2015-04-30 16:28
<?xml version='1.0' encoding='UTF-8'?>
<samples>
<search_results><query id="7015">the raven</query><engine status="OK" timestamp="2014-05-15 13:43:06" name="CiteSeerX" id="FW14-e004"/><snippets><snippet id="FW14-e004-7015-01"><link cache="FW14-topics-docs/e004/7015_01.html" timestamp="2014-05-15 13:43:07">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.51.7167&rank=1</link><title>The Raven System</title><description>The Raven System

by Donald Acton, Terry Coatta, Gerald Neufeld , 1992

"... The Raven System 1 Donald Acton, Terry Coatta and Gerald Neufeld Technical Report TR 92-15 August ..."

Abstract \- Cited by 7 (4 self) \- Add to MetaCart

This report describes the distributed object-oriented system, <em>Raven</em>. <em>Raven</em> is both a distributed</description></snippet><snippet id="FW14-e004-7015-02"><link cache="FW14-topics-docs/e004/7015_02.html" timestamp="2014-05-15 13:43:08">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.35.4932&rank=2</link><title>The Raven System</title><description>The Raven System

by Donald Acton Terry, Terry Coatta, Gerald Neufeld , 1992

"... The Raven System 1 Donald Acton, Terry Coatta and Gerald Neufeld Technical Report TR 92-15 August ..."

Abstract \- Add to MetaCart

This report describes the distributed object-oriented system, <em>Raven</em>. <em>Raven</em> is both a distributed</description></snippet><snippet id="FW14-e004-7015-03"><link cache="FW14-topics-docs/e004/7015_03.html" timestamp="2014-05-15 13:43:08">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.276.8949&rank=3</link><title>book In the Company of Crows and Ravens</title><description>book In the Company of Crows and Ravens

by Marzluff Jm, John Marzluff, Tony Angell, Quote Reverend Henry Ward Beecher’s

"... Book Reviews/Science in the Media Living with the Trickster: Crows, Ravens, and Human Culture ..."

Abstract \- Add to MetaCart

Few groups of wild animals inspire such extreme opinions in the humans who observe them than</description></snippet><snippet id="FW14-e004-7015-04"><link cache="FW14-topics-docs/e004/7015_04.html" timestamp="2014-05-15 13:43:09">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.395.6124&rank=4</link><title>Design by Raven Design</title><description>Design by Raven Design

by Third-level Programmes, Edited Irene Sheridan, Dr Margaret Linehan

"... by Raven Design Printed by City Print Ltd © CIT Press 2011 ISBN 978-1-906953-07-2 The toolkit includes ..."

Abstract \- Add to MetaCart

Work Placement in Third-Level Programmes is one of a number of significant outputs of the Roadmap for Employment–Academic Partnerships (REAP) Project. This report draws together for the first time perspectives on placement from all of the key stakeholders. In addition to providing a unique overview of the placement experience the project team have used the information gathered to develop a useful, transferable toolkit for placement. Publication Information Although every effort has been made to ensure the accuracy of the material contained in this publication, complete accuracy cannot be guaranteed. All or part of this publication may be reproduced without further</description></snippet><snippet id="FW14-e004-7015-05"><link cache="FW14-topics-docs/e004/7015_05.html" timestamp="2014-05-15 13:43:09">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.149.3392&rank=5</link><title>Basic objects in natural categories</title><description>Basic objects in natural categories

by Eleanor Rosch, Carolyn B. Mervis, Wayne D. Gray, David M, Penny Boyes-braem \- Cognitive Psychology , 1976

"... , & Raven, 1966); and finally, the location of natural groupings at a particular level of abstraction ..."

Abstract \- Cited by 487 (1 self) \- Add to MetaCart

Categorizations which humans make of the concrete world are not arbitrary but highly determined. In taxonomies of concrete objects, there is one level of abstraction at which the most basic category cuts are made. Basic categories are those which carry the most information, possess the highest category cue validity, and are, thus, the most differentiated from one another. The four experiments of Part I define basic objects by demonstrating that in taxonomies of common concrete nouns in English based on class inclusion, basic objects are the most inclusive categories whose members: (a) possess significant numbers of attributes in common, (b) have motor programs which are similar to one another, (c) have similar shapes, and (d) can be identified from averaged shapes of members of the class. The eight experiments of Part II explore implications of the structure of categories. Basic objects are shown to be the most inclusive categories for which a concrete image of the category as a whole can be formed, to be the first categorizations made during perception of the environment, to be the earliest categories sorted and earliest named by children, and to be the categories</description></snippet><snippet id="FW14-e004-7015-06"><link cache="FW14-topics-docs/e004/7015_06.html" timestamp="2014-05-15 13:43:10">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.300.2871&rank=6</link><title>A Bayesian Model of Rule Induction in Raven’s Progressive Matrices</title><description>A Bayesian Model of Rule Induction in Raven’s Progressive Matrices

by Daniel R. Little, Stephan Lewandowsky, Crawley Wa, Thomas L. Griffiths (tom

"... A Bayesian Model of Rule Induction in Raven’s Progressive Matrices Daniel R. Little (daniel ..."

Abstract \- Cited by 1 (0 self) \- Add to MetaCart

<em>Raven’s</em> Progressive Matrices (<em>Raven</em>, <em>Raven</em>, & Court, 1998) is one of the most prevalent assays</description></snippet><snippet id="FW14-e004-7015-07"><link cache="FW14-topics-docs/e004/7015_07.html" timestamp="2014-05-15 13:43:11">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.225.3291&rank=7</link><title>A Structure-Mapping Model of Raven’s Progressive Matrices</title><description>A Structure-Mapping Model of Raven’s Progressive Matrices

by Andrew Lovett, Kenneth Forbus, Jeffrey Usher

"... A Structure-Mapping Model of Raven’s Progressive Matrices Andrew Lovett (andrew ..."

Abstract \- Cited by 5 (2 self) \- Add to MetaCart

We present a computational model for solving <em>Raven’s</em> Progressive Matrices. This model combines</description></snippet><snippet id="FW14-e004-7015-08"><link cache="FW14-topics-docs/e004/7015_08.html" timestamp="2014-05-15 13:43:12">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.231.3664&rank=8</link><title>RAVEN – Active Learning of Link Specifications</title><description>RAVEN – Active Learning of Link Specifications

by Axel-cyrille Ngonga Ngomo, Jens Lehmann, Sören Auer, Konrad Höffner

"... RAVEN – Active Learning of Link Specifications Axel-Cyrille Ngonga Ngomo, Jens Lehmann, Sören Auer ..."

Abstract \- Cited by 7 (1 self) \- Add to MetaCart

for a link discovery problem is a tedious task that must still be carried out manually. We present <em>RAVEN</em></description></snippet><snippet id="FW14-e004-7015-09"><link cache="FW14-topics-docs/e004/7015_09.html" timestamp="2014-05-15 13:43:13">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.103.8446&rank=9</link><title>RAVEN: Real-Time Analyzing and Verification Environment</title><description>RAVEN: Real-Time Analyzing and Verification Environment

by Jürgen Ruf \- Journal on Universal Computer Science (J.UCS), Springer , 2001

"... RAVEN: Real-Time Analyzing and Verification Environment Jürgen Ruf (University of Tübingen ..."

Abstract \- Cited by 16 (3 self) \- Add to MetaCart

Abstract: In this paper we present the real-time verification and analysis tool <em>RAVEN</em>. <em>RAVEN</em></description></snippet><snippet id="FW14-e004-7015-10"><link cache="FW14-topics-docs/e004/7015_10.html" timestamp="2014-05-15 13:43:13">http://citeseerx.ist.psu.edu/viewdoc/summary;jsessionid=EBAC5670A019281E8386BCB54B1D1398?doi=10.1.1.39.9827&rank=10</link><title>The Advantages of Evolutionary Computation</title><description>The Advantages of Evolutionary Computation

by David B. Fogel , 1997

"... variants. Others (Atmar, 1979; Raven and Johnson, 1986, pp. 400-401) have suggested that it is more ..."

Abstract \- Cited by 396 (5 self) \- Add to MetaCart

Evolutionary computation is becoming common in the solution of difficult, realworld problems in industry, medicine, and defense. This paper reviews some of the practical advantages to using evolutionary algorithms as compared with classic methods of optimization or artificial intelligence. Specific advantages include the flexibility of the procedures, as well as the ability to self-adapt the search for optimum solutions on the fly. As desktop computers increase in speed, the application of evolutionary algorithms will become routine. 1 Introduction Darwinian evolution is intrinsically a robust search and optimization mechanism. Evolved biota demonstrate optimized complex behavior at every level: the cell, the organ, the individual, and the population. The problems that biological species have solved are typified by chaos, chance, temporality, and nonlinear interactivities. These are also characteristics of problems that have proved to be especially intractable to classic methods of o...</description></snippet></snippets></search_results></samples>


import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;
import java.util.Vector;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.lucene.document.*;

public class SAXXMLDocument extends DefaultHandler {

    // Text content and attributes of the element currently being read.
    private StringBuilder elementBuffer = new StringBuilder();
    private Map<String, String> attributeMap = new HashMap<String, String>();

    // Lookup tables keyed by engine id, filled by initResourceInfo().
    private HashMap<String, String> vertical = new HashMap<String, String>();
    private HashMap<String, String> urls = new HashMap<String, String>();

    // Placeholders for building Lucene documents; the code that fills them is still commented out.
    private Vector<Document> docs;
    private Document doc;

    private String queryid, querytext, engineid, enginename, verticalid, snippetid, engineurl;

    public Document getDocument(InputStream is) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();

        try {
            SAXParser parser = spf.newSAXParser();
            parser.parse(is, this);
        } catch (Exception e) {
            throw new Exception("Cannot parse XML document", e);
        }
        // doc is never assigned while the "new Document()" lines stay commented out,
        // so this currently returns null.
        return doc;
    }

    @Override
    public void startDocument() {
        //doc = new Document();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Every start tag resets the buffer and the attribute map, so the state seen in
        // endElement() belongs to the most recently opened element. A nested child tag
        // (for example <em> inside <description>) therefore discards the parent's earlier text.
        elementBuffer.setLength(0);
        attributeMap.clear();
        for (int i = 0; i < atts.getLength(); i++) {
            attributeMap.put(atts.getQName(i), atts.getValue(i));
        }
        if (qName.equals("snippet")) {
            //doc = new Document();
            snippetid = attributeMap.get("id");
        }
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // The parser may deliver one text node in several chunks, so append instead of assigning.
        elementBuffer.append(text, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("query")) {
            queryid = attributeMap.get("id");
            querytext = elementBuffer.toString();

            System.out.println(queryid);
            System.out.println(querytext);
        }
        else if (qName.equals("engine")) {
            engineid = attributeMap.get("id");
            engineurl = urls.get(engineid);
            // Attribute names are case-sensitive: the sample uses name="...", not Name="...".
            enginename = attributeMap.get("name");
            verticalid = vertical.get(engineid);

            System.out.println(engineid);
            System.out.println(elementBuffer.toString());
            System.out.println("v: " + engineid + verticalid);
        }
        else if (qName.equals("link")) {
            System.out.println("link: " + elementBuffer.toString());
        }
        else if (qName.equals("title")) {
            System.out.println("title: " + elementBuffer.toString());
        }
        else if (qName.equals("description")) {
            System.out.println("description: " + elementBuffer.toString());
        }
        else if (qName.equals("snippet")) {
            //docs.add(doc);
            // end of this snippet
            System.out.println("________________________________________________");
            //System.out.println("snippet" + elementBuffer.toString());
            System.out.println("________________________________________________");
        }
    }

    public static void main(String[] args) throws Exception {
        SAXXMLDocument handler = new SAXXMLDocument();
        handler.initResourceInfo();
        String input_file = "E:\\FW14-topics-search\\e004\\7015.xml";
        // Close the input stream once parsing has finished.
        try (InputStream in = new FileInputStream(new File(input_file))) {
            Document doc = handler.getDocument(in);
            System.out.println(doc);
        }
    }

    public void initResourceInfo() throws FileNotFoundException {
        // Tab-separated file: column 0 = engine id, column 2 = URL, column 3 = vertical.
        try (Scanner cin = new Scanner(new File("E:\\resources_fedweb2014.txt"))) {
            if (cin.hasNextLine()) {
                cin.nextLine(); // skip the first line (assumed to be a header row)
            }
            while (cin.hasNextLine()) {
                String[] s = cin.nextLine().split("\t");
                if (s.length < 4) {
                    continue; // skip blank or malformed lines
                }
                //System.out.println(s[0] + " # " + s[3]);
                vertical.put(s[0], s[3]);
                urls.put(s[0], s[2]);
            }
        }
    }
}
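
The handler above clears its text buffer on every start tag, so inline markup such as <em> inside <description> would cut off whatever text was collected before it. The following is a minimal sketch of one way around that, not the original author's method: the buffer is reset only when a field element (link, title, description) opens, and nested tags are left alone. The names SnippetCollector, Snippet and parse, as well as the short inline XML string, are made up for illustration. Lucene is left out so the example runs with the JDK alone.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects one Snippet object per <snippet> element.
public class SnippetCollector extends DefaultHandler {

    // Hypothetical value class, not part of the original listing.
    public static final class Snippet {
        public final String id;
        public String link = "";
        public String title = "";
        public String description = "";

        Snippet(String id) {
            this.id = id;
        }
    }

    private final List<Snippet> snippets = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Snippet current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("snippet")) {
            current = new Snippet(atts.getValue("id"));
        } else if (qName.equals("link") || qName.equals("title") || qName.equals("description")) {
            // Reset only when a field starts; nested tags such as <em> keep the buffer intact.
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // outside any <snippet>
        }
        switch (qName) {
            case "link":
                current.link = text.toString().trim();
                break;
            case "title":
                current.title = text.toString().trim();
                break;
            case "description":
                current.description = text.toString().trim();
                break;
            case "snippet":
                snippets.add(current);
                current = null;
                break;
            default:
                break; // e.g. </em>: nothing to do
        }
    }

    // Parses an XML string and returns the collected snippets.
    public static List<Snippet> parse(String xml) throws Exception {
        SnippetCollector handler = new SnippetCollector();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.snippets;
    }

    public static void main(String[] args) throws Exception {
        // Made-up input that mimics the structure of the sample; note the escaped ampersand.
        String xml = "<snippets><snippet id=\"s1\">"
                + "<link>http://example.org/?a=1&amp;rank=1</link>"
                + "<title>The Raven System</title>"
                + "<description>About <em>Raven</em>, a system</description>"
                + "</snippet></snippets>";
        for (Snippet s : parse(xml)) {
            System.out.println(s.id + " | " + s.title + " | " + s.description + " | " + s.link);
        }
    }
}
```

With this arrangement the text inside <em> simply becomes part of the surrounding description. Note also that a raw & inside element text has to be written as &amp; in the input file; otherwise the parser rejects the document as not well-formed.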