How to Use the Stanford Parser
2014-03-31 17:04
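1. What is the Stanford Parser?
The Stanford Parser is one of a series of tools provided by the Stanford NLP Group, and it is used for syntactic parsing. It supports English, Chinese, German, French, Arabic, and other languages. Compiled jar files, source code, javadoc, and so on can be downloaded here (http://nlp.stanford.edu/software/lex-parser.shtml#Download).
The FAQ is at http://nlp.stanford.edu/software/parser-faq.shtml, and reading through it clears up most basic questions. Of course, you do need to read English, right? Haha.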
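2. How do you use the Stanford Parser?
First, go to http://nlp.stanford.edu/software/lex-parser.shtml#Download and download the package.
Here I downloaded 3.3.1, the latest version at the time of writing.
Extracting the archive gives you the parser's files, including the jar files and the demo sources.
Among them are two demos (ParserDemo and ParserDemo2), which Stanford provides to help users understand the parser. So how do you import them into Eclipse and run them?
First create a project (here named parser), then right-click it, choose Properties, then Build Path, and add the two jar files (the parser jar and the models jar). After that you can call the classes in them directly.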
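Before pasting the demo in, it can help to confirm that both jars really ended up on the build path. The following is only a minimal sketch of such a check, using nothing but the JDK. The class name ClasspathCheck and its two helper methods are my own illustration and are not part of the Stanford distribution; the class name and model path it looks up are the ones the demo code itself uses.

```java
/** Sketch: checks whether the parser classes and the English model are visible on the classpath. */
class ClasspathCheck {

    /** Returns true if the named class can be located (it is not initialized). */
    static boolean hasClass(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns true if the named resource can be found on the classpath. */
    static boolean hasResource(String path) {
        return ClasspathCheck.class.getClassLoader().getResource(path) != null;
    }

    public static void main(String[] args) {
        // The parser jar should provide this class.
        System.out.println("parser classes found: "
                + hasClass("edu.stanford.nlp.parser.lexparser.LexicalizedParser"));
        // The models jar should provide this resource (the demo's default grammar).
        System.out.println("English model found:  "
                + hasResource("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"));
    }
}
```

If either line prints false, the corresponding jar is probably missing from the project's build path.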
Then paste the following code in directly:
import java.io.IOException;
import java.io.StringReader;
import java.util.*;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Label;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo2 {

  /** This example shows a few more ways of providing input to a parser.
   *
   *  Usage: ParserDemo2 [grammar [textFile]]
   */
  public static void main(String[] args) throws IOException {
    // Default to the English PCFG grammar, which is loaded from the models jar.
    String grammar = args.length > 0 ? args[0] : "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
    // -maxLength 80 limits the parser to sentences of at most 80 tokens.
    String[] options = { "-maxLength", "80", "-retainTmpSubcategories" };
    LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
    TreebankLanguagePack tlp = lp.getOp().langpack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

    Iterable<List<? extends HasWord>> sentences;
    if (args.length > 1) {
      // A text file was given: let DocumentPreprocessor split it into tokenized sentences.
      DocumentPreprocessor dp = new DocumentPreprocessor(args[1]);
      List<List<? extends HasWord>> tmp =
        new ArrayList<List<? extends HasWord>>();
      for (List<HasWord> sentence : dp) {
        tmp.add(sentence);
      }
      sentences = tmp;
    } else {
      // Showing tokenization and parsing in code a couple of different ways.
      // First way: a sentence that is already split into words.
      String[] sent = { "This", "is", "an", "easy", "sentence", "." };
      List<HasWord> sentence = new ArrayList<HasWord>();
      for (String word : sent) {
        sentence.add(new Word(word));
      }

      // Second way: raw text that still has to be tokenized.
      // Note: sent2 holds several sentences, but the tokenizer does not split
      // sentences, so the whole paragraph reaches the parser as one token list,
      // which is longer than the -maxLength limit of 80 set above.
      String sent2 = ("It has long been known that the rate of oxidative metabolism (the process that uses oxygen to convert food into energy) in any animal has a profound effect on its living patterns. The high metabolic rate of small animals, for example, gives them sustained power and activity per unit of weight, but at the cost of requiring constant consumption of food and water. Very large animals, with their relatively low metabolic rates, can survive well on a sporadic food supply, but can generate little metabolic energy per gram of body weight. If only oxidative metabolic rate is considered, therefore, one might assume that smaller, more active, animals could prey on larger ones, at least if they attacked in groups. Perhaps they could if it were not for anaerobic glycolysis, the great equalizer.");
      // Use the default tokenizer for this TreebankLanguagePack
      Tokenizer<? extends HasWord> toke =
        tlp.getTokenizerFactory().getTokenizer(new StringReader(sent2));
      List<? extends HasWord> sentence2 = toke.tokenize();

      List<List<? extends HasWord>> tmp =
        new ArrayList<List<? extends HasWord>>();
      tmp.add(sentence);
      tmp.add(sentence2);
      sentences = tmp;
    }

    for (List<? extends HasWord> sentence : sentences) {
      // Print the phrase-structure tree in Penn Treebank bracket format.
      Tree parse = lp.parse(sentence);
      parse.pennPrint();
      System.out.println();

      // Print the typed dependencies derived from the tree.
      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
      System.out.println(tdl);
      System.out.println();

      System.out.println("The words of the sentence:");
      for (Label lab : parse.yield()) {
        if (lab instanceof CoreLabel) {
          System.out.println(((CoreLabel) lab).toString("{map}"));
        } else {
          System.out.println(lab);
        }
      }
      System.out.println();
      // Print the words together with their part-of-speech tags.
      System.out.println(parse.taggedYield());
      System.out.println();
    }

    // This method turns the String into a single sentence using the
    // default tokenizer for the TreebankLanguagePack.
    String sent3 = "This is one last test!";
    lp.parse(sent3).pennPrint();
  }

  private ParserDemo2() {} // static methods only

}
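Running the program prints, for each sentence, the Penn Treebank style parse tree, the CC-processed typed dependencies, the words of the sentence, and the tagged words. The last thing printed is the parse tree of sent3.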
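To make the bracketed tree format easier to read, here is a small sketch that pulls the word/tag pairs back out of a tree string with plain JDK string handling, without the parser jars. The class PennLeaves and its method are hypothetical helpers of mine, and the tree string in main is a hand-written illustration of the format rather than captured parser output. The sketch assumes that words never contain literal parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extracts word/TAG pairs from a Penn Treebank style bracketed tree string. */
class PennLeaves {

    // A preterminal looks like "(TAG word)": a tag and a word with no nested brackets.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^()\\s]+)\\s+([^()\\s]+)\\)");

    /** Returns the leaves of the bracketed tree, left to right, as "word/TAG" strings. */
    static List<String> taggedWords(String bracketed) {
        List<String> out = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(bracketed);
        while (m.find()) {
            out.add(m.group(2) + "/" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written example of the bracket format, not actual parser output.
        String tree = "(ROOT (S (NP (DT This)) (VP (VBZ is) "
                + "(NP (DT an) (JJ easy) (NN sentence))) (. .)))";
        System.out.println(taggedWords(tree));
        // prints [This/DT, is/VBZ, an/DT, easy/JJ, sentence/NN, ./.]
    }
}
```

Inner nodes such as S, NP, and VP are skipped because they are followed by another opening bracket instead of a word. Inside a real program, parse.taggedYield() from the demo gives you the same information directly from the Tree object.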