The Sentiment Analysis Symposium was a great experience for me! Back in
Paris, I first thought of updating my last post on Opinion Mining and
Sentiment Analysis. But the update grew heavier and heavier, so here's an
enhanced one.
Context
For more than a decade now, researchers from Text and Data Analytics,
Computer Science, Computational Linguistics and Natural Language Processing,
among others, have been working on technologies that could help analyze how
people feel or what people think about something. Today, a great number of
commercial offerings have been built on what should still be regarded as a
research program. Here are some basic clues to get an idea of how this kind
of content analysis technology works.
One of the major issues in dealing with huge amounts of User-Generated Content
(UGC) published online is mining opinions, which means detecting their
polarity, identifying the target(s) they aim at and the arguments they rely
on. Opinion Mining/Sentiment Analysis tools are, simply put, derived from
Information Extraction (such as Named Entity detection) and Natural Language
Processing technologies (such as syntactic parsing). In short, they work like
an enhanced search engine with complex data calculation abilities and
knowledge bases.
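To make the idea of polarity and target detection concrete, here is a minimal
sketch in Python. It assumes a tiny hand-made polarity lexicon and a small
list of known targets standing in for a knowledge base; both resources and
all names (POLARITY_LEXICON, KNOWN_TARGETS, analyze) are hypothetical, and a
real system would rely on parsing and much richer resources.

```python
import re

# Hypothetical toy resources: stand-ins for real lexicons and knowledge bases.
POLARITY_LEXICON = {
    "great": 1, "love": 1, "good": 1,
    "poor": -1, "terrible": -1, "hate": -1,
}
KNOWN_TARGETS = {"battery", "screen", "camera"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(text):
    """Sum the lexicon scores of the tokens and list the known targets found."""
    tokens = tokenize(text)
    score = sum(POLARITY_LEXICON.get(tok, 0) for tok in tokens)
    targets = sorted({tok for tok in tokens if tok in KNOWN_TARGETS})
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"polarity": label, "score": score, "targets": targets}


if __name__ == "__main__":
    print(analyze("Great camera, good screen"))
    print(analyze("I love the screen but the battery is terrible"))
```

Note that the second sentence comes out as neutral overall, because the
positive and negative words cancel out: the sketch does not attach each
opinion to its own target, which is precisely where syntactic parsing comes
in.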
Applications with pieces of linguistics inside
Four types of applications are put forth in Pang & Lee's (2008) reference
survey:
- those seeking customer insight, in movie or product review websites or in
social networks;
- specific integrations within CRM (Customer Relationship Management) or
e-commerce systems;
- strategic foresight and e-reputation applications;
- and, last but not least, political discourse analysis.
Automated text summarization also stands out as a very promising subtask, as
it is currently closely linked to data visualization for information
summarization.
Among the numerous issues related to Opinion Mining and Sentiment Analysis
systems addressed in Pang & Lee's (2008) survey, I would pinpoint two of
particular interest from a linguistic point of view:
- linguistic features (e.g. syntactic properties and negation modeling) and
statistical features (e.g. the type/token distribution within large amounts
of text), as an important issue for improving systems;
- current processes for adapting Linguistic Resources, such as lexicons or
dictionaries, to various domains, as an impediment to cost-cutting and
reusability.
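As an illustration of the statistical features mentioned in the list above,
the sketch below computes a basic type/token distribution for a text: the
number of tokens (running words), the number of types (distinct words) and
their ratio. The function name type_token_stats and the simple regex
tokenization are assumptions made for the example, not a reference
implementation.

```python
import re
from collections import Counter


def type_token_stats(text):
    """Count tokens and types in a text and compute the type/token ratio."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Guard against empty input to avoid a division by zero.
    ratio = n_types / n_tokens if n_tokens else 0.0
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": ratio,
        "most_common": counts.most_common(3),
    }


if __name__ == "__main__":
    print(type_token_stats("the cat and the dog and the bird"))
```

A low ratio points to a repetitive vocabulary and a high one to a more varied
vocabulary, but the ratio depends heavily on text length, so it should only
be compared across texts of similar size.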
Not as easy as it seems
Indeed, the Social Media industry expresses a growing interest in, and need
for, NLP technologies to overcome issues such as accuracy, robustness and
multilingualism. Sentiment Analysis & Opinion Mining became a promising
business field a couple of years ago, as a very well documented post by
Doug Henschen for InformationWeek explains.
But quick recipes are easily found on the web, as a glance at Quora's « How
does Sentiment Analysis Work? » thread shows. Also, a Manichean view of
things, which implies an insurmountable dichotomy between "Linguistic
Resources" and "Machine Learning", is widespread in the industry right now.
In his write-up of the latest Sentiment Analysis Symposium's insights, Neil
Glassman puts forth that there is a way
« Between those on one side who feel the accuracy of automated
sentiment analysis is sufficient and those on the other side who feel we can
only rely on human analysis », adding that « most in the field concur with
/the idea that/ we need to define a methodology where the software and the
analyst collaborate to get over the noise and deliver accurate analysis. »
So the word is out!
Putting forth the benefits of Textometry
Textometry is one of the major steps towards new methodologies for achieving
such a goal. Simply put, it is a branch of the statistical study of
linguistic data, in which a text is considered as possessing its own internal
structure. Textometric methods and tools make it possible to bypass the
information extraction step (qualitative coding), by:
- applying statistical and probabilistic calculations to the units that make
up comparable texts in a corpus;
- providing robust methods for processing data without depending on external
resources (lexicons, dictionaries or ontologies, for example);
- analyzing the distribution of objects within the corpus framework;
- streamlining the building of corpus-driven Linguistic Resources that can
be projected onto the data and incrementally enhanced for various purposes,
such as Named Entity Recognition and paraphrase matching, resources for deep
thematic analysis, and resources for opinion analysis.
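To give a flavor of the kind of probabilistic calculation listed above, here
is a small sketch of a specificity-style score: the probability, under a
hypergeometric model, of seeing a word at least as often as observed in one
part of a corpus if its occurrences were spread at random. This is one
measure commonly associated with textometry, used here as an assumption for
illustration; the function and parameter names are hypothetical and do not
come from any particular tool.

```python
from math import comb


def over_representation_probability(corpus_size, part_size, word_total, word_in_part):
    """Return P(X >= word_in_part) under a hypergeometric model.

    corpus_size  -- number of tokens in the whole corpus
    part_size    -- number of tokens in the part under study
    word_total   -- occurrences of the word in the whole corpus
    word_in_part -- occurrences of the word in the part under study
    """
    all_draws = comb(corpus_size, part_size)
    upper = min(word_total, part_size)
    favorable = sum(
        comb(word_total, k) * comb(corpus_size - word_total, part_size - k)
        for k in range(word_in_part, upper + 1)
    )
    return favorable / all_draws


if __name__ == "__main__":
    # A word occurring 4 times in a 10-token corpus, all 4 in a 5-token part.
    print(over_representation_probability(10, 5, 4, 4))
```

The smaller the probability, the more the word can be considered specific to
that part of the corpus. No lexicon or dictionary is involved: the score
comes from the corpus counts alone.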
Kurt Williams, Mindshare Technologies CTO, accurately wraps it up as follows:
« Using Textometry to leverage opinion analysis. It can be used to
cluster authors who share similar opinions together. One approach for improving
opinion mining, rather than starting with the individual leveling phrases,
start with the context of the conversation first. In other words, many
approaches often skip the step of analyzing the context of the text. »
You can find out more in the following presentation, given at the Sentiment
Analysis Symposium.
So this must be what makes a hot topic: an emerging market, industrial R&D
and academics chasing better solutions and improved systems, and a
multidisciplinary field of interest!
Post scriptum
Special thanks to Seth Grimes, who chaired the Sentiment Analysis Symposium,
and Neil Glassman, who kindly quoted me in his post.
Post Update
Just to let you know that Seth Grimes has kindly provided videos of the
SAS'11 Talks and Lightning Talks. You can find my French-accented talk here.