SIGIR 2016 Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval
2016-10-30 04:44
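Summary: This paper analyzes how to improve the ad-hoc retrieval task with the Paragraph Vector (PV) model, and proposes three modifications to the PV model aimed at the IR setting. Experiments show that retrieval with the improved model outperforms language models (LMs) enhanced with topic models.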
Source: SIGIR'16
Abstract: Incorporating topic level estimation into language models has been shown to be beneficial for information retrieval (IR) models such as cluster-based retrieval and LDA-based document representation. Neural embedding models, such as paragraph vector (PV) models, on the other hand have shown their effectiveness and efficiency in learning semantic representations of documents and words in multiple Natural Language Processing (NLP) tasks. However, their effectiveness in information retrieval is mostly unknown. In this paper, we study how to effectively use the PV model to improve ad-hoc retrieval. We propose three major improvements over the original PV model to adapt it for the IR scenario: (1) we use a document frequency-based rather than the corpus frequency-based negative sampling strategy so that the importance of frequent words will not be suppressed excessively; (2) we introduce regularization over the document representation to prevent the model overfitting short documents along with the learning iterations; and (3) we employ a joint learning objective which considers both the document-word and word-context associations to produce better word probability estimation. By incorporating this enhanced PV model into the language modeling framework, we show that it can significantly outperform the state-of-the-art topic enhanced language models.
Download link: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1227
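To make improvement (1) from the abstract concrete, here is a minimal Python sketch contrasting a corpus frequency-based noise distribution with a document frequency-based one for negative sampling. It is an illustration only, not the authors' implementation: the function name `noise_distributions`, the `power` exponent, and the toy documents are all assumptions introduced here, and the paper may define the distribution differently.

```python
from collections import Counter


def noise_distributions(docs, power=1.0):
    """Return (corpus-frequency, document-frequency) noise distributions.

    `docs` is a list of tokenized documents. `power` is a hypothetical
    smoothing exponent (an assumption, not taken from the paper).
    """
    cf = Counter()  # total occurrences of each word in the corpus
    df = Counter()  # number of documents that contain each word
    for doc in docs:
        cf.update(doc)
        df.update(set(doc))  # count each word at most once per document

    def normalize(counts):
        weights = {w: c ** power for w, c in counts.items()}
        total = sum(weights.values())
        return {w: v / total for w, v in weights.items()}

    return normalize(cf), normalize(df)


# Toy corpus (made up for illustration).
docs = [
    ["the", "the", "the", "retrieval"],
    ["the", "model"],
    ["retrieval", "model", "model"],
]
p_cf, p_df = noise_distributions(docs)
print("corpus-frequency based:  ", p_cf)
print("document-frequency based:", p_df)
```

In this toy corpus, "the" repeats several times within one document, so it receives more negative-sampling mass under the corpus frequency-based distribution than under the document frequency-based one. That is the sense in which the second strategy penalizes frequent words less.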