Chinese Word Segmentation Resources

2013-06-17 15:19
1. ICTCLAS: Java, Linux C, and Windows C versions are all available for download at http://www.ictclas.org/index.html.

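2. imdict-chinese-analyzer: the intelligent Chinese word segmentation module of the imdict intelligent dictionary, written by Gao Xiaoping. Its algorithm is based on the Hidden Markov Model (HMM). It is a Java reimplementation of ICTCLAS, the Chinese word segmenter developed at the Institute of Computing Technology, Chinese Academy of Sciences, and can provide Chinese word segmentation directly to the Lucene search engine. It can also be downloaded from http://www.ictclas.org/index.html.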

3. LingPipe is a suite of Java libraries for the linguistic analysis of human language (http://alias-i.com/lingpipe/index.html). Its word segmentation component can either train a model from data or use a pre-built model downloaded from the website.

4. MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm

5. Chinese word segmentation for Lucene

6. Chinese word segmentation projects on the OSChina (开源中国) open-source community

7. Microsoft Research S-MSRSeg
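
MMSEG (item 4) is built on the maximum matching algorithm. To illustrate only the basic idea, here is a minimal forward maximum matching sketch in Python; it is not MMSEG itself, which layers further ambiguity-resolution rules on top of plain matching. The function name `forward_max_match` and the toy dictionary `TOY_DICT` are hypothetical, made up for this example.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Greedy left-to-right segmentation: at each position, take the longest
    dictionary word that starts there; otherwise emit a single character."""
    words: list[str] = []
    limit = max(max_len, 1)  # guard against a non-positive window size
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(limit, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"研究", "研究生", "生命", "的", "起源"}
MAX_LEN = max(len(w) for w in TOY_DICT)

print(forward_max_match("研究生命的起源", TOY_DICT, MAX_LEN))
# ['研究生', '命', '的', '起源']
```

On this input the greedy match produces 研究生 / 命 / 的 / 起源 instead of the intended 研究 / 生命 / 的 / 起源 ("study the origin of life"). This overlapping ambiguity is the kind of error that MMSEG's extra rules, and several of the papers listed here, try to resolve.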

2012

Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
Sun, Xu and Wang, Houfeng and Li, Wenjie
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Sun, Weiwei and Wan, Xiaojun
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Sun, Weiwei and Uszkoreit, Hans
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Joint Chinese Word Segmentation, POS Tagging and Parsing
Qian, Xian and Liu, Yang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
Li, Zhongguo and Zhou, Guodong
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
Jiang, Wenbin and Meng, Fandong and Liu, Qun and Lü, Yajuan
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Hatori, Jun and Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data
Wang, Yiou and Kazama, Jun'ichi and Tsuruoka, Yoshimasa and Chen, Wenliang and Zhang, Yujie and Torisawa, Kentaro
Proceedings of the 5th International Joint Conference on Natural Language Processing

Enhancing Chinese Word Segmentation Using Unlabeled Data
Sun, Weiwei and Xu, Jia
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Sun, Weiwei
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Li, Zhongguo
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Syntactic Processing using the Generalized Perceptron and Beam Search
Zhang, Y. and Clark, S.
Computational Linguistics

A New Unsupervised Approach to Word Segmentation
Wang, H. and Zhu, J. and Tang, S. and Fan, X.
Computational Linguistics

2010

A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model
Zhang, Yue and Clark, Stephen
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Joint Tokenization and Translation
Xiao, Xinyan and Liu, Yang and Hwang, YoungSook and Liu, Qun and Lin, Shouxun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

A Character-Based Joint Model for Chinese Word Segmentation
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

A Local Generative Model for Chinese Word Segmentation
Zhang, K. and Sun, M. and Xue, P.
Information Retrieval Technology

Joint training and decoding using virtual nodes for cascaded segmentation and tagging tasks
Qian, X. and Zhang, Q. and Zhou, Y. and Huang, X. and Wu, L.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Jiang, Wenbin and Huang, Liang and Liu, Qun
Proceedings of the 47th ACL

Character-Level Dependencies in Chinese: Usefulness and Learning
Zhao, Hai
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

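Integrated Chinese Lexical-Syntactic Analysis Based on Character Dependency Trees (基于字依存树的中文词法-句法一体化分析)
Zhao, Hai and Kit, Chunyu and Song, Yan
Recent Advances in Chinese Computational Linguistics Research (2007-2009)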

Chinese Word Segmentation and Short-Text Classification Based on CRFs (基于 CRFs 的中文分词和短文本分类技术)
Teng, Shaohua

A Simple and Efficient Model Pruning Method for Conditional Random Fields
Zhao, H. and Kit, C.

Chinese text segmentation: A hybrid approach using transductive learning and statistical association measures
Tsai, R. T. H.
Expert Systems with Applications

Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Punctuation as Implicit Annotations for Chinese Word Segmentation
Li, Zhongguo and Sun, Maosong
Computational Linguistics

An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Proc. of ACL-IJCNLP 2009

2008

Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Mi, Haitao and Liu, Qun
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Joint Word Segmentation and {POS} Tagging Using a Single Perceptron
Zhang, Yue and Clark, Stephen
Proceedings of ACL-08: HLT

A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Jiang, Wenbin and Huang, Liang and Liu, Qun and Lü, Yajuan
Proceedings of ACL-08: HLT

Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition
Zhao, Hai and Kit, Chunyu
The Sixth SIGHAN Workshop on Chinese Language Processing

An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework
Zhao, Hai and Kit, Chunyu
The Third International Joint Conference on Natural Language Processing (IJCNLP-2008), Hyderabad, India

Bayesian semi-supervised Chinese word segmentation for statistical machine translation
Xu, J. and Gao, J. and Toutanova, K. and Ney, H.
Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1

Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation
Qiao, W. and Sun, M. and Menzel, W.
Text, Speech and Dialogue

Information retrieval oriented word segmentation based on character associative strength ranking
Liu, Y. and Wang, B. and Ding, F. and Xu, S.
Proceedings of the Conference on Empirical Methods in Natural Language Processing

2007

Chinese Segmentation with a Word-Based Perceptron Algorithm
Zhang, Yue and Clark, Stephen

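Chinese Word Segmentation Based on Effective Substring Tagging (基于有效子串标注的中文分词)
Zhao, Hai and Kit, Chunyu
Journal of Chinese Information Processing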

A Decade Review of Chinese Word Segmentation (中文分词十年回顾)
Huang, Chang-Ning and Zhao, Hai
Journal of Chinese Information Processing

A dual-layer CRFs based joint decoding method for cascaded segmentation and labeling tasks
Shi, Y. and Wang, M.
Proceedings of IJCAI

A hybrid approach to word segmentation and POS tagging
Nakagawa, Tetsuji and Uchimoto, Kiyotaka
Annual Meeting of the Association for Computational Linguistics

Rethinking Chinese word segmentation: tokenization, character classification, or wordbreak identification
Huang, Chu-Ren and Simon, Petr and Hsieh, Shu-Kai and Prévot, L.
Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions

2006

Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Zhang, Ruiqiang and Kikui, Genichiro and Sumita, Eiichiro
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

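Research on Fast Lookup Algorithms for Chinese Dictionaries (汉语词典的快速查询算法研究)
Li, Jiangbo and Zhou, Qiang and Chen, Zushun
Journal of Chinese Information Processing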

An improved Chinese word segmentation system with conditional random field
Zhao, H. and Huang, C. N. and Li, M.
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

Discriminative pruning of language models for Chinese word segmentation
Li, J. and Wang, H. and Ren, D. and Li, G.
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Contextual Dependencies in Unsupervised Word Segmentation
Goldwater, Sharon and Griffiths, Thomas L. and Johnson, Mark
Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

2005

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Gao, Jianfeng and Li, Mu and Huang, Chang-Ning and Wu, Andi
Computational Linguistics

A conditional random field word segmenter for SIGHAN Bakeoff 2005
Tseng, H. and Chang, P. and Andrew, G. and Jurafsky, D. and Manning, C.
Proceedings of the Fourth {SIGHAN} Workshop on Chinese Language Processing

Perceptron Learning for Chinese Word Segmentation
Li, Y. and Miao, C. and Bontcheva, K. and Cunningham, H.
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (SIGHAN-05)

The Second International Chinese Word Segmentation Bakeoff
Emerson, Thomas
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

A Statistic Study of Three-character Unknown Words in Chinese
Duan, {ZWXZH}
Journal of Chinese Language and Computing

2004

Chinese Segmentation and New Word Detection using Conditional Random Fields
Peng, Fuchun and Feng, Fangfang and McCallum, Andrew
Proceedings of Coling 2004

Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Ng, Hwee Tou and Low, Jin Kiat
Proceedings of EMNLP 2004

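Lexicon-Free Automatic Chinese Word Segmentation Based on an Unsupervised Learning Strategy (基于无指导学习策略的无词表条件下的汉语自动分词)
Sun, Maosong and Xiao, Ming and Tsou, Benjamin K.
Chinese Journal of Computers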

Applying conditional random fields to Japanese morphological analysis
Kudo, T. and Yamamoto, K. and Matsumoto, Y.
Proc. of EMNLP

Adaptive Chinese word segmentation
Gao, J. and Wu, A. and Li, M. and Huang, C. N. and Li, H. and Xia, X. and Qin, H.
Proceedings of ACL-2004

Unsupervised segmentation of Chinese corpus using accessor variety
Feng, Haodi and Chen, Kang and Kit, Chunyu and Deng, Xiaotie
Natural Language Processing – IJCNLP 2004

Accessor variety criteria for Chinese word extraction
Feng, Haodi and Chen, Kang and Deng, Xiaotie and Zheng, Weimin
Computational Linguistics

2003

HHMM-based Chinese lexical analyzer ICTCLAS
Zhang, H. P. and Yu, H. K. and Xiong, D. Y. and Liu, Q.
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17

Chinese lexical analysis using hierarchical hidden Markov model
Zhang, H. P. and Liu, Q. and Cheng, X. Q. and Zhang, H. and Yu, H. K.
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17

Chinese Word Segmentation as LMR Tagging
Xue, Nianwen and Shen, Libin
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17

Chinese Word Segmentation as Character Tagging
Xue, Nianwen
Computational Linguistics and Chinese Language Processing

The first international Chinese word segmentation bakeoff
Sproat, R. and Emerson, T.
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

A maximum entropy Chinese character-based parser
Luo, X.

Improved source-channel models for Chinese word segmentation
Gao, J. and Li, M. and Huang, C. N.
Proceedings of the 41st Annual Meeting on Association for Computational Linguistics

Chinese word segmentation using minimal linguistic knowledge
Chen, A.
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Combining segmenter and chunker for Chinese word segmentation
Asahara, M. and Goh, C. L. and Wang, X. and Matsumoto, Y.
Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing

2002

Combining classifiers for Chinese word segmentation
Xue, Nianwen and Converse, Susan
Proceedings of the 1st SIGHAN Workshop on Chinese Language Processing

Corpus-based methods in Chinese morphology
Sproat, R. and Shih, C.
Tutorial at the 19th COLING

Corpus-based methods in Chinese morphology and phonology
Sproat, R. and Shih, C.
COLING 2002

2001

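A Review of Research on Automatic Chinese Word Segmentation (汉语自动分词研究评述)
Sun, Maosong and Tsou, Benjamin K.
Contemporary Linguistics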

Defining and automatically identifying words in Chinese
Xue, Nianwen

Self-supervised Chinese word segmentation
Peng, F. and Schuurmans, D.
Advances in Intelligent Data Analysis

2000

A compression-based algorithm for Chinese word segmentation
Teahan, W. J. and McNab, Rodger and Wen, Yingying and Witten, Ian H.
Computational Linguistics

1999

Discovering Chinese words from unsegmented text (poster abstract)
Ge, X. and Pratt, W. and Smyth, P.
Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

1998

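An Automatic Chinese Word Segmentation System Combining String Frequency Statistics and Word-Form Matching (串频统计和词形匹配相结合的汉语自动分词系统)
Liu, Ting and Wu, Yan
Journal of Chinese Information Processing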

Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Sun, Maosong and Shen, Dayang and Tsou, Benjamin K.
Proceedings of the 17th international conference on Computational linguistics-Volume 2

A hybrid approach to word segmentation
Kazakov, D. and Manandhar, S.
Lecture Notes in Computer Science

1997

Word Segmentation Issues in Chinese Information Processing (中文信息处理中的分词问题)
Huang, Chang-Ning
Applied Linguistics

An unsupervised iterative method for Chinese new lexicon extraction
Chang, J. S. and Su, K. Y.
International Journal of Computational Linguistics & Chinese Language Processing

1996

A stochastic finite-state word-segmentation algorithm for Chinese
Sproat, R. and Gale, W. and Shih, C. and Chang, N.
Computational Linguistics

Useg: A retargetable word segmentation procedure for information retrieval
Ponte, J. M. and Croft, W. B.
Symposium on Document Analysis and Information Retrieval

1992

An efficient implementation of trie structures
Aoe, Jun-Ichi and Morimoto, Katsushi and Sato, Takashi
Software: Practice and Experience

Word identification for Mandarin Chinese sentences
Chen, K. J. and Liu, S. H.
Proceedings of the 14th conference on Computational linguistics-Volume 1
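
Several entries above recast segmentation as per-character labeling, notably Xue (2003), "Chinese Word Segmentation as Character Tagging", and Xue and Shen (2003), "Chinese Word Segmentation as LMR Tagging". Character-level CRF segmenters such as Peng et al. (2004) and Tseng et al. (2005) likewise label each character, though with their own tag sets. The Python sketch below shows only the conversion between a segmented sentence and per-character labels. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single-character word), which is similar in spirit to, but not necessarily identical with, the tag sets in those papers; the function names are hypothetical.

```python
from __future__ import annotations


def words_to_tags(words: list[str]) -> list[str]:
    """Label every character: S = single-character word,
    B / M / E = beginning / middle / end of a multi-character word."""
    tags: list[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        elif len(w) > 1:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: list[str]) -> list[str]:
    """Rebuild the word sequence from characters and their B/M/E/S labels."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: list[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # A word is complete after an E (end) or S (single) label.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a malformed sequence that stops mid-word
        words.append(current)
    return words


sentence = ["研究", "生命", "的", "起源"]
tags = words_to_tags(sentence)
print(tags)  # ['B', 'E', 'B', 'E', 'S', 'B', 'E']
print(tags_to_words("".join(sentence), tags))  # ['研究', '生命', '的', '起源']
```

In such a setup, a character-level model (for example a CRF, a perceptron, or an HMM) would be trained to predict the tag sequence for unsegmented text, and a function like `tags_to_words` would turn the predicted tags back into words.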