Course materials
A Holistic Lexicon-Based Approach to Opinion Mining
2025-6-2
2025-6-10

INTRODUCTION

Background

Product reviews are numerous and complex, which makes them hard to digest.

Two tasks

  1. To find product features that have been commented on by reviewers
  1. To decide whether the comments are positive or negative
This paper focuses on task 2: we want to accurately identify the semantic orientations of opinions expressed on each product feature by each reviewer, i.e., decide whether each is positive, negative, or neutral.

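Method of [13]:

It uses the opinion words near a feature to determine the attitude toward that feature, simply by comparing the numbers of positive and negative words.
Weaknesses:
  1. The orientation of a word varies with context, and relying on an expert-supplied knowledge base to capture this is impractical.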

    Proposed method

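    1. Infers the orientation of opinion words from external information and evidence in other sentences and reviews, requiring no prior knowledge or extra input.
    1. For sentences containing opinion words of opposite orientations, a new model that considers the distance between each opinion word and the product feature. This turns out to be highly effective.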

    Evaluation

    Test data
    1. The benchmark review data set used in [13, 28]: a large number of reviews of 5 products
    1. A new data set of reviews of 3 products

    RELATED WORK

    Two research directions

    Sentiment classification: classifying the sentiment of a given text, at the document level or the sentence level
    Feature-based opinion mining: finer-grained, determining the opinion on each individual feature
    e.g., “The voice quality of this phone is great and so is the reception, but the battery life is short.”
    “voice quality”, “reception” and “battery life” are features. The opinions on “voice quality” and “reception” are positive, and the opinion on “battery life” is negative.

    Two approaches to determining the orientation of words and phrases

    1. Corpus-based approaches: use co-occurrence patterns of words [10, 32, 34]
    1. Dictionary-based approaches: use synonyms and antonyms in WordNet to determine word sentiments based on a set of seed opinion words [1, 8, 13, 17]
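A minimal sketch of the dictionary-based idea follows. It assumes a tiny hand-written synonym/antonym table in place of WordNet, and the names `SYNONYMS`, `ANTONYMS` and `expand_lexicon` are hypothetical. Orientations spread outward from the seed words: they stay the same across synonym links and flip across antonym links.

```python
from collections import deque

# Hypothetical toy stand-in for WordNet synonym/antonym relations.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}


def expand_lexicon(seeds: dict) -> dict:
    """Breadth-first propagation of orientations (+1 / -1) from seed words.

    A synonym inherits the orientation of the word it was reached from,
    an antonym gets the opposite one. The first orientation assigned to a
    word is kept (a simplification; real systems must resolve conflicts).
    """
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        so = orientation[word]
        neighbours = [(s, so) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -so) for a in ANTONYMS.get(word, [])]
        for nxt, nxt_so in neighbours:
            if nxt not in orientation:
                orientation[nxt] = nxt_so
                queue.append(nxt)
    return orientation


lexicon = expand_lexicon({"good": +1})
print(lexicon)
```

Starting from the single seed “good”, the word “poor” ends up negative. It is a synonym of “bad”, which is an antonym of “good”.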

    Lexicon-based method

    Proposed in [13], also used in [17], and improved in [28] by a more sophisticated method based on relaxation labeling.

    A domain specific system

    [37] analyzes movie reviews.

    The extraction of comparative sentences and relations

    [14] extracts comparative relations.

    Closely related work

    Holistic lexicon-based approach
    Identifying domain opinion words: [11, 16] use conjunction rules, under which two words joined by “and” have the same orientation.
    e.g., in “this room is beautiful and spacious”, both “beautiful” and “spacious” are positive opinion words.
    Differences: this paper also uses linguistic rules and conventions, but it differs in two ways:
    1. The same word can express different orientations even within one domain, so the feature and the opinion word must be considered together.
      1. For example, consider the review sentences “the battery life is very long” and “it takes a long time to focus” in the camera domain. “long” is positive in the first sentence but negative in the second.
    1. More flexible: it needs no prior training and can make decisions online.

    PROBLEM DEFINITION

    Object: An object O is an entity which can be a product, person, event, organization, or topic. It is associated with a pair, O:(T, A), where T is a hierarchy or taxonomy of components (or parts), subcomponents, and so on, and A is a set of attributes of O. Each component has its own set of subcomponents and attributes.
    T: the hierarchy of components and subcomponents; A: the set of the object's attributes
    This forms a tree: the root node is the object, the non-root nodes are components and subcomponents, and every node has its own set of attributes.
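The object model O:(T, A) can be sketched as a tree in which each node carries its own attribute set. This is only an illustrative data structure. `Node`, `all_features` and the camera example are hypothetical. Counting the object and every component name as features, alongside the attributes, is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One node of the component tree T; the root node is the object O itself."""
    name: str
    attributes: Set[str] = field(default_factory=set)  # this node's attribute set
    children: List["Node"] = field(default_factory=list)  # components / subcomponents


def all_features(node: Node) -> Set[str]:
    """Collect node names and attributes of the whole subtree as features."""
    features = {node.name} | node.attributes
    for child in node.children:
        features |= all_features(child)
    return features


# Hypothetical example: a camera with two components.
camera = Node("camera", {"size", "weight"}, [
    Node("battery", {"battery life"}),
    Node("lens", {"zoom"}),
])
print(sorted(all_features(camera)))
```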
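    Explicit and implicit features: a feature that is mentioned in the sentence is explicit; one that is not mentioned but can be inferred is implicit.
    Explicit feature: “The battery life of this camera is too short”
    Implicit feature: “This camera is too large” (the feature, size, is implied by “large”)
    Opinion passage on a feature: a passage may express an opinion on a single feature, and a single sentence may express opinions on several features.
    Explicit and implicit opinions: an explicit opinion is stated directly, while an implicit opinion has to be inferred.
    Explicit opinion: “The picture quality of this camera is amazing”
    Implicit opinion: “The earphone broke in two days”
    Opinion holder: the person or organization that holds the opinion
    “John expressed his disagreement on the treaty” (John is the opinion holder)
    Semantic orientation of an opinion: whether the opinion is positive, negative, or neutral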

    Model definition

    $$F=\{f_1,f_2,\dots,f_n\}$$: the set of features
    $$W=\{W_1,W_2,\dots,W_n\}$$: the synonym sets of the n features, where $$W_k$$ is the set of synonyms of $$f_k$$
    Each opinion holder j comments on a subset of the features $$S_j\subseteq F$$.
    For each feature $$f_k\in S_j$$ that opinion holder j comments on, he/she chooses a word or phrase from $$W_k$$ to describe the feature, and then expresses a positive, negative or neutral opinion on it.
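The model can be sketched with plain Python containers, assuming small hand-made sets. `F`, `W`, `to_feature` and `commented_features` are hypothetical names. The lookup maps whatever word a reviewer chose from $$W_k$$ back to $$f_k$$, and the set of features found in one holder's review plays the role of $$S_j$$.

```python
from typing import List, Optional, Set

# Hypothetical feature set F and synonym sets W, where W[k] is the set W_k of f_k.
F = ["picture quality", "battery life", "size"]
W = [
    {"picture quality", "picture", "image", "photo"},
    {"battery life", "battery", "power"},
    {"size", "dimensions"},
]


def to_feature(mention: str) -> Optional[str]:
    """Map the word or phrase a reviewer used to the feature f_k whose W_k contains it."""
    for f_k, w_k in zip(F, W):
        if mention.lower() in w_k:
            return f_k
    return None


def commented_features(mentions: List[str]) -> Set[str]:
    """S_j: the subset of F that opinion holder j comments on."""
    return {f for f in map(to_feature, mentions) if f is not None}


print(commented_features(["Photo", "power", "lens"]))
```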

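    Tasks

    Input: a set of reviews $$D$$

    Case 1: $$F$$ and $$W$$ are both unknown

    1. Identify and extract the product features
    1. Determine the orientation of the opinion on each feature
    1. Group synonymous features

    Case 2: $$F$$ is known, $$W$$ is unknown

    Task 3 becomes grouping the extracted feature words under the given features.

    Case 3: $$F$$ and $$W$$ are both known

    Only task 2 needs to be solved.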

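    Setting of this paper

    Example: a phone maker analyzing user reviews of some of its models already has the set of features it cares about and the synonyms of each feature (Case 3).
    Output: for each review $$d\in D$$, a set of pairs $$(f, SO)$$, where $$f$$ is a feature and $$SO$$ is the semantic orientation of the opinion on it.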

    THE PROPOSED TECHNIQUE

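    Key challenges

    1. How to combine multiple opinion words in a sentence into a final decision
    1. How to handle context-dependent opinion words, whose orientation depends on the domain or feature
    1. How to handle language constructs that change the orientation of opinion words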

    Opinion Words, Phrases and Idioms

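    Uses an opinion lexicon of positive and negative opinion words (each set is usually obtained through a bootstrapping process [13] using WordNet).
    Besides adjectives and adverbs, the lexicon also contains nouns and verbs, plus lists of context-dependent opinion words.
    Part-of-speech (POS) tagging is used to determine the word class of each word.
    Idioms: more than 1,000 idioms were collected.
    Non-opinion phrases containing opinion words:
    e.g., in “pretty large”, “pretty” means “fairly” and expresses no opinion.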

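Finding opinion words, idioms and non-opinion phrases can be sketched as longest-match lookup over tokens, assuming a tiny hypothetical lexicon. `LEXICON`, `NON_OPINION` and `find_opinion_words` are illustrative names. The real lexicon and idiom list are far larger, and POS tagging is omitted here. A matched non-opinion phrase such as “pretty large” is consumed, so its opinion word is never reported.

```python
from typing import List, Tuple

# Hypothetical mini-lexicon (+1 positive, -1 negative); keys are token tuples
# so that multi-word idioms can be stored next to single words.
LEXICON = {
    ("great",): 1,
    ("pretty",): 1,
    ("poor",): -1,
    ("cost", "an", "arm", "and", "a", "leg"): -1,  # idiom
}
# Phrases in which an opinion word carries no opinion.
NON_OPINION = {("pretty", "large")}


def find_opinion_words(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (position, orientation) for each opinion word or idiom.

    Longer phrases are tried first; a matched non-opinion phrase is consumed
    without producing an opinion.
    """
    longest = max(len(p) for p in list(LEXICON) + list(NON_OPINION))
    found: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in NON_OPINION:
                i += n
                break
            if phrase in LEXICON:
                found.append((i, LEXICON[phrase]))
                i += n
                break
        else:  # no phrase starts at position i
            i += 1
    return found


print(find_opinion_words("the lens is pretty large".split()))  # []
print(find_opinion_words("these lenses cost an arm and a leg".split()))  # [(2, -1)]
```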
    Aggregating Opinions for a Feature

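    $$score(f)=\sum_{w_i:\,w_i\in s\,\wedge\,w_i\in V}\frac{w_i.SO}{dis(w_i,f)}$$
    Here $$s$$ is the sentence containing feature $$f$$ and $$V$$ is the set of all opinion words, including idioms. $$w_i.SO$$ is the semantic orientation of opinion word $$w_i$$ (+1 positive, -1 negative, 0 for context-dependent words). $$dis(w_i,f)$$ is the distance between $$f$$ and $$w_i$$ in $$s$$, so opinion words far from $$f$$ get a lower weight. The opinion on $$f$$ is positive if $$score(f)>0$$, negative if $$score(f)<0$$, and neutral otherwise.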
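    Some features are themselves opinion words. In that case, the orientation of the opinion on the feature is simply the orientation of the feature word itself, e.g.,
    “This camera is very reliable”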
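A minimal sketch of the distance-weighted score follows. It assumes that $$dis(w_i,f)$$ is the token distance and that a multi-word feature sits at the position of its last token. Both are assumptions, and `label` and `feature_orientation` are hypothetical names. It also covers the case where the feature word is itself an opinion word.

```python
from typing import List, Tuple


def label(value: float) -> str:
    """Map a score to an orientation label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def feature_orientation(feature_idx: int, opinions: List[Tuple[int, int]]) -> str:
    """Orientation of the opinion on the feature at token position feature_idx.

    opinions holds (token position, SO) for every opinion word w_i in the sentence.
    dis(w_i, f) is taken to be the token distance (an assumption).
    """
    for pos, so in opinions:
        if pos == feature_idx:  # the feature word is itself an opinion word
            return label(so)
    score = sum(so / abs(pos - feature_idx) for pos, so in opinions)
    return label(score)


tokens = "the picture quality is great , the battery life is poor".split()
opinions = [(4, 1), (10, -1)]  # "great" and "poor"
print(feature_orientation(tokens.index("quality"), opinions))  # positive
print(feature_orientation(tokens.index("life"), opinions))  # negative
```

For “picture quality” the score is 1/2 - 1/8 = 0.375, and for “battery life” it is 1/4 - 1/2 = -0.25. The nearer opinion word dominates even though both words occur in the same sentence.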
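    Negation rule: a negation word or phrase reverses the opinion expressed in the sentence.
    Negation words: e.g., “no”, “not”
    Negation patterns: e.g., “stop doing”, “quit doing”
    Cases:
    Negation + Negative → Positive: “no problem”
    Negation + Positive → Negative: “not good”
    Negation + Neutral → Negative: “does not work”
    Non-negation phrases that contain negation words are excluded, e.g.,
    “not just”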
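The negation cases can be sketched as follows. The negation word list, the non-negation phrase list and the 3-token look-back window are all hypothetical choices. `is_negated` and `apply_negation` are illustrative names, and patterns such as “stop doing” are not covered.

```python
from typing import List

NEGATION_WORDS = {"no", "not", "never"}
# Phrases that contain a negation word but do not negate anything.
NON_NEGATION_PHRASES = {("not", "just")}


def is_negated(tokens: List[str], pos: int, window: int = 3) -> bool:
    """True if a negation word appears within `window` tokens before `pos`
    and does not start a non-negation phrase such as "not just"."""
    for i in range(max(0, pos - window), pos):
        if tokens[i] in NEGATION_WORDS and tuple(tokens[i:i + 2]) not in NON_NEGATION_PHRASES:
            return True
    return False


def apply_negation(so: int, negated: bool) -> int:
    """Negation + negative -> positive; negation + positive -> negative;
    negation + neutral -> negative."""
    if not negated:
        return so
    return -so if so != 0 else -1


print(apply_negation(-1, is_negated(["no", "problem"], 1)))  # 1
print(apply_negation(1, is_negated(["not", "just", "good"], 2)))  # 1
```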
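    But-clause rule
    The opinions before and after “but” are opposite. The orientation of the but-clause is determined first. If it cannot be determined, take the orientation of the clause before “but” and reverse it.
    Phrases that contain “but” without forming a but-clause are excluded, e.g.,
    “not only … but also”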
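A sketch of the but-clause rule follows, assuming a toy lexicon and a clause orientation given by the sign of summed word orientations. `TOY_LEXICON`, `clause_orientation` and `but_clause_orientation` are hypothetical names. Only a single “but” is handled.

```python
from typing import List, Optional

# Hypothetical toy lexicon; words not listed are treated as orientation 0.
TOY_LEXICON = {"great": 1, "amazing": 1, "poor": -1, "blurry": -1}


def clause_orientation(clause: List[str]) -> int:
    """Sign of the summed orientations in a clause: +1, -1, or 0 (undecided)."""
    total = sum(TOY_LEXICON.get(t, 0) for t in clause)
    return (total > 0) - (total < 0)


def but_clause_orientation(tokens: List[str]) -> Optional[int]:
    """Orientation of the clause after "but" under the but-clause rule.

    Returns None when there is no but-clause, including "not only ... but also".
    """
    if "but" not in tokens:
        return None
    k = tokens.index("but")
    if tokens[k + 1:k + 2] == ["also"] and "only" in tokens[:k]:
        return None  # "not only ... but also" is not a but-clause
    after = clause_orientation(tokens[k + 1:])
    if after != 0:
        return after
    # Undecided after "but": use the opposite of the clause before it.
    return -clause_orientation(tokens[:k])


print(but_clause_orientation("the lens is great but the zoom is slow".split()))  # -1
```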

    Handling Context Dependent Opinions

    Holistic approach: use external evidence from other sentences and other reviews

    Intra-sentence conjunction rule

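    Goal: determine the orientation of “long” in “The battery life is very long”.
    Evidence: “This camera takes great pictures and has a long battery life” contains no “but”, so it expresses a single orientation. Since “great” is positive, the rule infers that “long” is positive for “battery life”.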

    Pseudo intra-sentence conjunction rule

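    Goal: again, the orientation of “long” in “The battery life is very long”.
    Evidence: “The camera has a long battery life, which is great” has no explicit “and”, yet it still expresses one orientation, so “long” is inferred to be positive from “great”.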
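Both conjunction rules above can be sketched with one function. If a sentence contains no but-like word, it is assumed to carry one orientation, and a context-dependent word takes that orientation for the given feature. `KNOWN`, `BUT_WORDS` and `infer_from_sentence` are hypothetical. The word lists are toy assumptions, and the (feature, word) key reflects that the inferred orientation is specific to the feature.

```python
from typing import List, Optional, Tuple

# Hypothetical lists: words with a known orientation, and "but"-like words that
# signal a change of orientation inside a sentence.
KNOWN = {"great": 1, "amazing": 1, "poor": -1}
BUT_WORDS = {"but", "however", "although", "though", "except"}


def infer_from_sentence(tokens: List[str], feature: str, word: str
                        ) -> Optional[Tuple[Tuple[str, str], int]]:
    """Infer the orientation of context-dependent `word` on `feature`.

    Covers the intra-sentence and pseudo intra-sentence conjunction rules: if the
    sentence has no but-like word, it is assumed to express one orientation, so
    `word` takes the orientation of the known opinion words around it.
    """
    if BUT_WORDS & set(tokens):
        return None
    total = sum(KNOWN.get(t, 0) for t in tokens)
    if total == 0:
        return None
    return (feature, word), (1 if total > 0 else -1)


s1 = "this camera takes great pictures and has a long battery life".split()
s2 = "the camera has a long battery life , which is great".split()
print(infer_from_sentence(s1, "battery life", "long"))  # (('battery life', 'long'), 1)
print(infer_from_sentence(s2, "battery life", "long"))  # (('battery life', 'long'), 1)
```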

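    Inter-sentence conjunction rule

    Adjacent sentences usually express the same orientation unless a word such as “but” or “however” signals a change. In “The picture quality is amazing. However, the battery life is short”, “however” reverses the orientation, so “short” is negative for “battery life”.
    Majority view: if a (feature, opinion word) pair receives different orientations in different places, the most frequent orientation is used.
    Synonym and antonym rule: if a word is found to be positive (or negative) in a context, its synonyms take the same orientation and its antonyms the opposite orientation in that context.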
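The majority view and the synonym and antonym rule can be sketched over the (feature, word) inferences collected by the conjunction rules. `majority_view` and `apply_synonym_antonym_rule` are hypothetical names, and dropping tied pairs is an assumption, since these notes do not say how ties are handled.

```python
from collections import Counter
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (feature, opinion word)


def majority_view(evidence: List[Tuple[Key, int]]) -> Dict[Key, int]:
    """Keep, for each (feature, word) pair, the orientation inferred most often.
    Ties are dropped (an assumption)."""
    votes: Dict[Key, Counter] = {}
    for key, so in evidence:
        votes.setdefault(key, Counter())[so] += 1
    result = {}
    for key, counter in votes.items():
        if counter[1] != counter[-1]:
            result[key] = 1 if counter[1] > counter[-1] else -1
    return result


def apply_synonym_antonym_rule(lexicon: Dict[Key, int],
                               synonyms: Dict[str, List[str]],
                               antonyms: Dict[str, List[str]]) -> Dict[Key, int]:
    """For the same feature, synonyms get the same orientation and antonyms the
    opposite; orientations that are already known are not overwritten."""
    extended = dict(lexicon)
    for (feature, word), so in lexicon.items():
        for syn in synonyms.get(word, []):
            extended.setdefault((feature, syn), so)
        for ant in antonyms.get(word, []):
            extended.setdefault((feature, ant), -so)
    return extended


evidence = [(("battery life", "long"), 1)] * 2 + [
    (("battery life", "long"), -1),
    (("focus time", "long"), -1),
]
views = majority_view(evidence)
extended = apply_synonym_antonym_rule(views, {"long": ["lengthy"]}, {"long": ["short"]})
print(extended[("battery life", "short")], extended[("focus time", "short")])  # -1 1
```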

    EMPIRICAL EVALUATION

    Experimental data
    Experimental results