INTRODUCTION
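Background
Product reviews are numerous and varied.
Two tasks
- To find product features that have been commented on by reviewers
- To decide whether the comments are positive or negative
This paper focuses on task 2: we want to accurately identify the semantic orientations of opinions expressed on each product feature by each reviewer, i.e., to decide whether each opinion is positive, negative, or neutral.
Method of [13]:
Use the opinion words near a feature to judge the attitude toward that feature, deciding simply by comparing the number of positive and negative words.
Drawback:
- The sentiment orientation of a word differs from context to context, so relying on an expert-provided knowledge base is not feasible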
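The counting baseline described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from [13]; the tiny lexicon and the function name `count_based_orientation` are assumptions made for the example.

```python
import re

# Toy opinion lexicon (assumed for illustration; real lexicons are far larger).
POSITIVE = {"great", "amazing", "good", "excellent"}
NEGATIVE = {"short", "bad", "poor", "terrible"}

def count_based_orientation(sentence):
    """Return 1 (positive), -1 (negative) or 0 (neutral) by counting opinion words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    # Sign of (pos - neg): more positive words wins, more negative words loses.
    return (pos > neg) - (pos < neg)

mixed = "The voice quality is great but the battery life is short"
print(count_based_orientation(mixed))
```

On the mixed sentence the counts tie, so every feature in it gets the same neutral verdict. This is the kind of case the distance-based model is meant to handle.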
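Proposed method
- Use external information and evidence from other sentences and reviews to infer the sentiment orientation of opinion words, without prior knowledge or any user input
- For words with opposite sentiments in the same sentence, propose a new model: considering the distance between each opinion word and the product feature. This turns out to be highly effective.
Evaluation method
Test data
- The benchmark review data set used in [13, 28]: a large number of reviews of 5 products
- A new data set covering 3 products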
RELATED WORK
Two research directions
Sentiment classification: classifying the sentiment of a given text, at the document level or the sentence level
Feature-based opinion mining
> Finer-grained
e.g., “the voice quality of this phone is great and so is the reception, but the battery life is short.” “voice quality”, “reception” and “battery life” are features. The opinions on “voice quality” and “reception” are positive, and the opinion on “battery life” is negative.
Two approaches to classifying words and phrases
- Corpus-based approaches: rely on co-occurrence patterns of words [10, 32, 34]
- Dictionary-based approaches: use synonyms and antonyms in WordNet to determine word sentiments based on a set of seed opinion words [1, 8, 13, 17]
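The dictionary-based idea can be illustrated with a small propagation sketch. It does not call WordNet: the `SYNONYMS` and `ANTONYMS` tables below are hand-made stand-ins, and `propagate` is a hypothetical helper name. Synonyms inherit the orientation of a known word and antonyms receive the opposite one.

```python
from collections import deque

# Hand-made stand-ins for WordNet synonym/antonym links (assumed data).
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "pleasant": ["unpleasant"]}

def propagate(seeds):
    """Spread orientations (+1 / -1) from seed words through the link tables."""
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        so = orientation[word]
        neighbours = [(s, so) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -so) for a in ANTONYMS.get(word, [])]
        for other, other_so in neighbours:
            if other not in orientation:  # keep the first label assigned
                orientation[other] = other_so
                queue.append(other)
    return orientation

labels = propagate({"good": 1})
print(sorted(labels.items()))
```

Starting from the single seed "good", the sketch labels "poor" and "unpleasant" as negative without either being a seed.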
Lexicon-based method
Proposed in [13], also used in [17], and improved in [28] by a more sophisticated method based on relaxation labeling
A domain-specific system
[37] analyzes movie reviews
The extraction of comparative sentences and relations
[14] extracts comparative relations
Similar work
Holistic lexicon-based approach
Identifying domain opinion words: [11, 16] use conjunction rules: two words joined by "and" have the same sentiment orientation
“this room is beautiful and spacious”: both “beautiful” and “spacious” are positive opinion words
Differences: that work also uses linguistic rules or conventions, but this paper still differs
- The same word can express different sentiments even within one domain, so the feature and the opinion word must be considered together
For example, in the following review sentences in the camera domain, “the battery life is very long” and “it takes a long time to focus”, “long” is positive in the first sentence, but negative in the second.
- More flexible: no prior training knowledge is needed, and decisions can be made online
PROBLEM DEFINITION
Object: An object O is an entity which can be a product, person, event, organization, or topic. It is associated with a pair, O: (T, A), where T is a hierarchy or taxonomy of components (or parts), subcomponents, and so on, and A is a set of attributes of O. Each component has its own set of subcomponents and attributes.
T: components and subcomponents; A: the set of attributes of the object
This forms a tree: the root node is the object, the non-root nodes are components and subcomponents, and every node has its own attributes.
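The tree view of an object can be written down as a small data structure. This is only a sketch: the `Node` class, the camera example, and the choice to treat both component names and attributes as "features" are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A component (or the object itself) with its attributes and subcomponents."""
    name: str
    attributes: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def features(self):
        """Yield every component name and attribute in this subtree, depth-first."""
        yield self.name
        yield from sorted(self.attributes)
        for child in self.children:
            yield from child.features()

# Hypothetical object: the root is the camera, the children are its components.
camera = Node("camera", {"size", "weight"}, [
    Node("battery", {"battery life"}),
    Node("lens", {"zoom"}),
])
print(list(camera.features()))
```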
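Explicit and implicit features: a feature mentioned in the sentence is explicit; a feature that is not mentioned but can be inferred is implicit
Explicit feature: “The battery life of this camera is too short”
Implicit feature: “This camera is too large” (the feature "size" is implied)
Opinion passage on a feature: a whole passage may express an opinion on one feature, or a single sentence may express opinions on several features
Explicit and implicit opinions: the difference between an opinion that is stated directly and one that has to be inferred
Explicit opinion: “The picture quality of this camera is amazing”
Implicit opinion: “The earphone broke in two days”
Opinion holder: the person or organization that holds the opinion
“John expressed his disagreement on the treaty”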
Semantic orientation of an opinion
Model definition
$$F=\{f_1,f_2,\dots,f_n\}$$: the set of features
$$W=\{W_1,W_2,\dots,W_n\}$$: the collection of synonym sets for the n features, where $$W_k$$ holds the words and phrases that refer to $$f_k$$
Each opinion holder j comments on a subset of the features, $$S_j\subseteq F$$
For each feature $$f_k\in S_j$$ that opinion holder j comments on, he/she chooses a word or phrase from $$W_k$$ to describe the feature, and then expresses a positive, negative or neutral opinion on it.
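To make the notation concrete, the sketch below represents the synonym sets as a dictionary and recovers the set of features a single review comments on. The synonym lists and the helper name `features_commented_on` are made up for the example; matching by whole-word search is a simplifying assumption.

```python
import re

# Hypothetical synonym sets W_k, keyed by the canonical feature f_k.
W = {
    "picture quality": {"picture quality", "picture", "image", "photo"},
    "battery life": {"battery life", "battery"},
}

def features_commented_on(review):
    """Return S_j: the features whose synonym set has a whole-word match in the review."""
    text = review.lower()
    found = set()
    for feature, synonyms in W.items():
        if any(re.search(r"\b" + re.escape(s) + r"\b", text) for s in synonyms):
            found.add(feature)
    return found

print(features_commented_on("The photo is sharp but the battery dies quickly"))
```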
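Research tasks
Input: reviews $$D$$
$$F$$ and $$W$$ both unknown
- Identify and extract the object features
- Determine the sentiment orientation of the opinions on each feature
- Group the synonyms of features
$$F$$ known, $$W$$ unknown
Task 3 becomes grouping with respect to the given feature set
$$F$$ and $$W$$ both known
Only task 2 needs to be done
Concrete setting
A phone company analyzes user reviews of some of its models. This is the third case: the set of features of interest and the synonym sets for each feature are already given.
Output: for each text $$d\in D$$, pairs $$(f, SO)$$, each giving a feature and the semantic (opinion) orientation expressed on it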
THE PROPOSED TECHNIQUE
关键难点
- 如何结合多元观点词语做最终判断
- 如何解决领域词问题
- 如何解决语言结构对观点词寓意倾向的改变问题
Opinion Words, Phrase and Idioms
使用opinion lexicon (Each set is usuallyobtained through a bootstrapping process [13] using the WordNet.)
除了原本的形容词副词,还添加了名词和动词 —— lists of context dependent opinion words
POS(part-of-speech) tagging——词性
idioms:收集了1000多条习语
Non-opinion phrases containing opinion words
"pretty large"——pretty
Aggregating Opinions for a Feature
$$score(f)=\sum_{w_i:\,w_i\in s\,\wedge\,w_i\in V}\frac{w_i.SO}{dis(w_i,f)}$$
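The score can be sketched in a few lines, reading $$w_i.SO$$ as the orientation (+1 or -1) of opinion word $$w_i$$, $$V$$ as the opinion lexicon, $$s$$ as the sentence containing the feature, and $$dis(w_i,f)$$ as the number of word positions between the opinion word and the feature. The toy `LEXICON`, the regex tokenizer, and the exact way distance is measured to a multi-word feature are assumptions of this sketch.

```python
import re

# Toy opinion lexicon V with orientations w.SO (assumed values).
LEXICON = {"great": 1, "amazing": 1, "good": 1, "short": -1, "poor": -1, "bad": -1}

def score(sentence, feature):
    """Sum w.SO / dis(w, f) over the opinion words in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    feat = feature.lower().split()
    # Position of the first occurrence of the (possibly multi-word) feature.
    start = next((i for i in range(len(tokens)) if tokens[i:i + len(feat)] == feat), None)
    if start is None:
        return 0.0
    end = start + len(feat) - 1
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON and not (start <= i <= end):
            # Distance in words to the nearest edge of the feature span.
            dist = start - i if i < start else i - end
            total += LEXICON[tok] / dist
    return total

s = "The voice quality of this phone is great but the battery life is short"
print(score(s, "voice quality"), score(s, "battery life"))
```

Because each opinion word is discounted by its distance, "great" dominates for "voice quality" while "short" dominates for "battery life", so the two features in the same sentence receive opposite signs.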
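For some features, the feature indicator is itself an opinion word, so $$score(f)$$ is determined directly by the orientation of that word
“This camera is very reliable”
Negation rules: a negation word or phrase usually reverses the opinion expressed in the sentence
Negation word: “no”
Negation patterns: "stop" + verb-ing, "quit" + verb-ing
Negation rules
- Negation Negative -> Positive, e.g., "no problem"
- Negation Positive -> Negative, e.g., "not good"
- Negation Neutral -> Negative, e.g., "does not work"
Non-negation phrases that contain a negation word
"not just"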
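The three negation rules and the "not just" exception can be sketched as below. The negation word list, the three-token look-back window, and the function name `orientation_at` are assumptions for illustration; the notes do not specify how far a negation word reaches.

```python
# Assumed word lists; the notes only give "no" and "not just" as examples.
NEGATION_WORDS = {"no", "not", "never"}
NON_NEGATION_PHRASES = {("not", "just"), ("not", "only")}

def orientation_at(tokens, i, lexicon, window=3):
    """Orientation of tokens[i] after applying the negation rules."""
    so = lexicon.get(tokens[i], 0)  # 0 means neutral / not an opinion word
    for j in range(max(0, i - window), i):
        if tokens[j] in NEGATION_WORDS:
            nxt = tokens[j + 1] if j + 1 < len(tokens) else None
            if (tokens[j], nxt) in NON_NEGATION_PHRASES:
                continue  # e.g. "not just": contains "not" but does not negate
            # Negation Negative -> Positive; Negation Positive/Neutral -> Negative
            return 1 if so == -1 else -1
    return so

lexicon = {"good": 1, "problem": -1}
print(orientation_at(["no", "problem"], 1, lexicon))
print(orientation_at(["does", "not", "work"], 2, lexicon))
```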
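But-clause rule
If no orientation can be found in the "but" clause, look at the clause before "but" and negate its orientation
Phrases that contain "but" but are not but-clauses
"not only ... but also"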
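A rough sketch of the but-clause rule follows. It scores a clause by summing lexicon orientations, which is a simplification of the distance-weighted score, and the lexicon and function names are hypothetical.

```python
import re

LEXICON = {"great": 1, "good": 1, "short": -1, "poor": -1}  # toy lexicon (assumed)

def clause_orientation(clause):
    """Sign of the summed orientations of the opinion words in a clause."""
    total = sum(LEXICON.get(t, 0) for t in re.findall(r"[a-z']+", clause.lower()))
    return (total > 0) - (total < 0)

def but_clause_orientation(sentence):
    """Orientation of the part after "but", falling back to the negated first clause."""
    text = sentence.lower()
    # "not only ... but also" contains "but" without expressing a contrast.
    if re.search(r"\bnot only\b.*\bbut also\b", text):
        return clause_orientation(text)
    parts = re.split(r"\bbut\b", text, maxsplit=1)
    if len(parts) == 1:
        return clause_orientation(text)
    before, after = parts
    so = clause_orientation(after)
    return so if so != 0 else -clause_orientation(before)

print(but_clause_orientation("The picture quality is great, but the zoom is another story"))
```

In the example the clause after "but" carries no opinion word, so the rule negates the positive orientation of the first clause.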
Handling Context Dependent Opinions
Holistic analysis
Intra-sentence conjunction rule
“The battery life is very long”
“This camera takes great pictures and has a long battery life”: the rule infers that “long” is positive here, because it is joined to the positive “great” and there is no “but”
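The conjunction rule can be sketched as an inference over other review sentences that mention the same feature and the same context-dependent word. Everything here is a simplification under stated assumptions: the lexicon is a toy one, the feature is matched as a substring, a contrary word such as "but" flips the inferred orientation, and the votes from all matching sentences are simply added up.

```python
import re

LEXICON = {"great": 1, "amazing": 1, "terrible": -1}  # words with known orientation (assumed)
CONTRARY = {"but", "however"}                          # words signalling a change of opinion

def infer_orientation(reviews, feature, word):
    """Infer the orientation of `word` for `feature` from co-occurring known opinion words."""
    votes = 0
    for sentence in reviews:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if word not in tokens or feature not in " ".join(tokens):
            continue
        known = sum(LEXICON.get(t, 0) for t in tokens)
        so = (known > 0) - (known < 0)
        # Without a contrary word the unknown word shares the known orientation.
        votes += -so if CONTRARY & set(tokens) else so
    return (votes > 0) - (votes < 0)

reviews = [
    "This camera takes great pictures and has a long battery life",
    "It takes a long time to focus, which is terrible",
]
print(infer_orientation(reviews, "battery life", "long"))
print(infer_orientation(reviews, "focus", "long"))
```

The same word "long" comes out positive for "battery life" and negative for "focus", which is why the feature and the opinion word are looked up together.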
Pseudo intra-sentence conjunction rule
“The battery life is very long”
“The camera has a long battery life, which is great”: the same inference applies even though there is no explicit “and”
Inter-sentence conjunction rule
“The picture quality is amazing. However, the battery life is short”
Majority view
Synonym and antonym rule
EMPIRICAL EVALUATION
Experimental data
Experimental results
- Author:Wenxuan Wang
- URL:http://preview.tangly1024.com/article/206d0968-b245-8050-b4c8-c7a3395688b5
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!