CoNLL Multi-lingual Dependency Parsing Format
2014-07-07 00:56
CoNLL task page: http://ilk.uvt.nl/conll/
---------------------------------------------------------
Data files contain sentences separated by a blank line.
A sentence consists of one or more tokens, each one starting on a new line.
A token consists of ten fields described in the table below. Fields are separated by a single tab character. Space/blank characters are not allowed within fields.
All data files will contain these ten fields, although only the ID, FORM, CPOSTAG, POSTAG, HEAD and DEPREL columns are guaranteed to contain non-dummy (i.e. non-underscore) values for all languages.
Data files are UTF-8 encoded (Unicode). If you think this will be a problem, have a look here.
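The rules above can be sketched as a small reader. This is only a minimal Python sketch under stated assumptions, not an official tool: the names `FIELDS` and `read_conll` are made up for illustration, and the sample sentences and their tags are invented rather than taken from any treebank.

```python
# Column names in the order given by the format description.
FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
          "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def read_conll(text):
    """Split CoNLL-formatted text into sentences (lists of token dicts).

    Hypothetical helper: every value is kept as a string, including '_'.
    """
    sentences, current = [], []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} fields, got {len(values)}")
        if any(" " in value for value in values):
            raise ValueError(f"line {line_no}: space character inside a field")
        current.append(dict(zip(FIELDS, values)))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Invented sample data: two sentences separated by a blank line.
SAMPLE = (
    "1\tThe\tthe\tDT\tDT\t_\t2\tNMOD\t_\t_\n"
    "2\tdog\tdog\tNN\tNN\t_\t3\tSBJ\t_\t_\n"
    "3\tbarks\tbark\tVB\tVBZ\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tHello\thello\tUH\tUH\t_\t0\tROOT\t_\t_\n"
)

sentences = read_conll(SAMPLE)
print(len(sentences), [token["FORM"] for token in sentences[0]])
```

To read a real data file, the text could be loaded with `open(path, encoding="utf-8").read()`, since the files are UTF-8 encoded; `path` here stands for wherever the data file is stored.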
Some questions:
1. What is FEATS?
2. Is PHEAD only used for projective languages?
Data format
Field number: | Field name: | Description: |
---|---|---|
1 | ID | Token counter, starting at 1 for each new sentence. |
2 | FORM | Word form or punctuation symbol. |
3 | LEMMA | Lemma or stem (depending on particular data set) of word form, or an underscore if not available. |
4 | CPOSTAG | Coarse-grained part-of-speech tag, where tagset depends on the language. |
5 | POSTAG | Fine-grained part-of-speech tag, where the tagset depends on the language, or identical to the coarse-grained part-of-speech tag if not available. |
6 | FEATS | Unordered set of syntactic and/or morphological features (depending on the particular language), separated by a vertical bar (\|), or an underscore if not available. |
7 | HEAD | Head of the current token, which is either a value of ID or zero ('0'). Note that depending on the original treebank annotation, there may be multiple tokens with a HEAD of zero. |
8 | DEPREL | Dependency relation to the HEAD. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'. |
9 | PHEAD | Projective head of the current token, which is either a value of ID or zero ('0'), or an underscore if not available. Note that depending on the original treebank annotation, there may be multiple tokens with a PHEAD of zero. The dependency structure resulting from the PHEAD column is guaranteed to be projective (but is not available for all languages), whereas the structure resulting from the HEAD column is always available but will be non-projective for some sentences of some languages. |
10 | PDEPREL | Dependency relation to the PHEAD, or an underscore if not available. The set of dependency relations depends on the particular language. Note that depending on the original treebank annotation, the dependency relation may be meaningful or simply 'ROOT'. |
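The FEATS and HEAD/PHEAD descriptions in the table can be illustrated with a short Python sketch. The helper names `parse_feats` and `is_projective` are hypothetical, the feature string `num=sg|gen=m` is an invented example (actual feature names depend on the treebank), and the check assumes one common reading of "projective": the artificial root sits at position 0 and no two arcs may cross.

```python
def parse_feats(feats):
    """Turn a FEATS value into an unordered set; '_' means no features."""
    return set() if feats == "_" else set(feats.split("|"))


def is_projective(heads):
    """Return True if no two dependency arcs cross.

    heads[i] is the HEAD (or PHEAD) value of the token with ID i + 1,
    given as an int; 0 stands for the artificial root at position 0.
    """
    # Represent each arc by its left and right end positions.
    arcs = [(min(head, dep), max(head, dep))
            for dep, head in enumerate(heads, start=1)]
    for a_left, a_right in arcs:
        for b_left, b_right in arcs:
            # Two arcs cross when exactly one end of the second arc
            # lies strictly inside the first.
            if a_left < b_left < a_right < b_right:
                return False
    return True


print(parse_feats("num=sg|gen=m") == {"num=sg", "gen=m"})
print(is_projective([2, 0, 2]))     # heads of a three-token sentence
print(is_projective([3, 4, 0, 3]))  # arcs 1-3 and 2-4 cross
```

Under this reading, running the check on the integer values of the PHEAD column should always succeed where that column is available, while the HEAD column may fail it for some sentences. The quadratic loop is kept for clarity; it is adequate for sentence-length inputs.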