#Paper Reading# SumView: A Web-based engine for summarizing product reviews and customer opinions
2017-03-04 00:05
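Paper title: SumView: A Web-based engine for summarizing product reviews and customer opinions
Paper URL: http://www.sciencedirect.com/science/article/pii/S0957417412007865
Published in: Expert Systems with Applications, 2012 (CCF rank C journal, impact factor 2.981)
Overview:
This paper mainly uses NMF (non-negative matrix factorization) for text summarization. It implements a system that crawls the reviews of an Amazon product and automatically generates a summary for each product feature the user cares about (e.g. price, size, quality).
1. The overall pipeline is as follows: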
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/a9b96d9de56d4d39b8ec8af32a5a9106)
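2. Feature recommendation:
① Filter the terms so that every term in matrix D is a noun or a noun phrase;
② Select the 20 terms in D with the highest tf-idf values;
③ From these 20 terms, pick the top 5 that have the most adjectives occurring around them; these 5 are the common features recommended to the user (size, price, etc.);
④ Users can also type in the features they want to know about, which makes the system more usable;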
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/d77ff0d8414a8527171e47176c1c5b78)
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/34bcfe6df285d0e162430c7c433afb30)
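The recommendation steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the reviews are already tokenized and POS-tagged, the tag names `NOUN` and `ADJ` are hypothetical, only single-token nouns are handled (no noun phrases), and the smoothed tf-idf weighting and the `window` parameter are assumptions, since the notes do not specify them.

```python
import math
from collections import Counter


def recommend_features(tagged_sentences, n_candidates=20, n_recommend=5, window=1):
    """Return up to n_recommend noun terms ranked by nearby-adjective count.

    tagged_sentences: list of sentences, each a list of (token, tag) pairs.
    The tag names "NOUN" and "ADJ" are an assumed, hypothetical tag set.
    """
    n = len(tagged_sentences)
    tf = Counter()  # total occurrences of each noun
    df = Counter()  # number of sentences that contain each noun
    for sent in tagged_sentences:
        nouns = [tok.lower() for tok, tag in sent if tag == "NOUN"]
        tf.update(nouns)
        df.update(set(nouns))

    # Steps 1-2: nouns only, keep the n_candidates terms with the highest tf-idf.
    # The smoothed IDF below is an assumption, not taken from the paper.
    tfidf = {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
    candidates = sorted(tfidf, key=lambda t: (-tfidf[t], t))[:n_candidates]

    # Step 3: count adjectives within `window` tokens of each candidate term.
    adj_counts = Counter({t: 0 for t in candidates})
    for sent in tagged_sentences:
        for i, (tok, tag) in enumerate(sent):
            term = tok.lower()
            if tag != "NOUN" or term not in adj_counts:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            adj_counts[term] += sum(1 for _, t in sent[lo:hi] if t == "ADJ")

    # Rank by adjective count; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda t: (-adj_counts[t], t))[:n_recommend]
```

Terms typed in by the user (step ④) would simply be appended to, or substituted for, the returned list.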
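3. Summarization process:
① K is the number of feature terms selected by the user;
② Initialize each column of U with the sentence in matrix D in which the corresponding feature term occurs most often;
③ Initialize V^T = (U^T*U)^(-1)*U^T*A;
④ Run NMF from this initialization;
⑤ For each topic, pick the sentence with the largest value in matrix V as the summary;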
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/c2ff709c85b2b21a0a92095cb54896d4)
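A rough NumPy sketch of the summarization steps above, under several assumptions that are not in the notes: `A` is taken to be a non-negative term-by-sentence matrix, `feature_rows` (a hypothetical parameter) holds the row index of each selected feature term, negative values from the least-squares initialization are clipped to a small positive `eps`, and step ④ uses the standard multiplicative updates for the Frobenius-norm objective. The paper's exact update rule may differ.

```python
import numpy as np


def summarize(A, feature_rows, n_iter=200, eps=1e-9):
    """Pick one sentence index per feature term via seeded NMF.

    A: non-negative term-by-sentence matrix (terms x sentences).
    feature_rows: row index in A of each user-selected feature term (K entries).
    """
    A = np.asarray(A, dtype=float)

    # Step 2: seed column k of U with the sentence where feature k occurs most.
    seeds = [int(np.argmax(A[r])) for r in feature_rows]
    U = A[:, seeds] + eps

    # Step 3: V^T = (U^T U)^(-1) U^T A; pinv equals this when U has full
    # column rank and also copes with a singular U^T U.
    Vt = np.linalg.pinv(U) @ A
    Vt = np.clip(Vt, eps, None)  # assumption: clip so the factors stay non-negative

    # Step 4: multiplicative updates that do not increase ||A - U V^T||_F.
    for _ in range(n_iter):
        Vt *= (U.T @ A) / (U.T @ U @ Vt + eps)
        U *= (A @ Vt.T) / (U @ Vt @ Vt.T + eps)

    # Step 5: for each topic, take the sentence with the largest weight in V.
    picks = [int(j) for j in np.argmax(Vt, axis=1)]
    return picks, U, Vt
```

Seeding U from real sentences ties each topic to one feature term from the start, so topic k can be read as "the summary for feature k" without a separate matching step.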
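4. Improvement: removing redundancy from the summary
The basic idea: when choosing the C-th summary sentence (one summary sentence per topic), pick from the not-yet-selected sentences the one that maximizes, under that topic, (its probability value, i.e. its entry in matrix V) minus λ times (its average similarity to the C-1 summary sentences already selected). λ controls the weight of the similarity penalty. (This improvement is not used in the experiments below, however.)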
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/6ee197f0bd862a1f7fcaf79fdf1e214d)
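The redundancy-aware selection described above can be sketched as a greedy loop. This is only an illustration: cosine similarity over term vectors is an assumed choice of similarity measure, and the names `topic_scores`, `sentence_vecs` and `lam` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_diverse(topic_scores, sentence_vecs, lam=0.5):
    """Greedily pick one sentence per topic, penalizing similarity to earlier picks.

    topic_scores[c][j]: weight of sentence j under topic c (row c of V^T).
    sentence_vecs[j]: term vector of sentence j, used for the similarity term.
    Assumes there are at least as many sentences as topics.
    """
    chosen = []
    for scores in topic_scores:
        best, best_val = None, -math.inf
        for j, score in enumerate(scores):
            if j in chosen:
                continue  # only consider sentences not selected yet
            if chosen:
                sims = [cosine(sentence_vecs[j], sentence_vecs[i]) for i in chosen]
                penalty = sum(sims) / len(sims)  # average over the C-1 earlier picks
            else:
                penalty = 0.0
            val = score - lam * penalty
            if val > best_val:
                best, best_val = j, val
        chosen.append(best)
    return chosen
```

With `lam=0` this reduces to taking the highest-scoring unselected sentence per topic; larger values trade topic weight for diversity.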
5. Example output
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/7e5f370ab64f3e67a67a57eb5c9a2672)
6. Comparison of results
① On the DUC2005 dataset
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/7a2ab7ddcd6042221f0d03498773f744)
② On the DUC2006 dataset
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/b0f3b99082573ad819b77e67fb4933a6)
③ Human ratings
![](https://oscdn.geek-share.com/Uploads/Images/Content/202012/14/a30970a14ca5bdc43d8074ea8d1c3ee2)
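7. Thoughts
The approach seems fairly simple and the results are not very strong; the overall motivation is to build a system that crawls reviews and generates summaries on the fly.
The above is only my personal view. My knowledge is limited, so if you find any mistakes or omissions, please point them out. Thanks!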