A 46.8% positive introduction to Sentiment Analysis

Few Natural Language Processing (NLP) tools are touted as “blatantly useful” as often as sentiment analysis. Companies want to know why their products are bad, and they want to know it automatically.


Some of the first automatic sentiment classification experiments used film reviews as data. You have the actual text, and you have the rating that the person who wrote the review gave the film. You find that there is a correlation. Pang et al. report accuracies around 80% for the task of classifying reviews as positive (2.5 stars or more out of 5) or negative (less than 2.5 stars out of 5).
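As a rough sketch of that setup, the snippet below turns star ratings into positive/negative labels using the 2.5-star cut-off and measures how often a classifier's guesses agree with them. The ratings and guesses are made up for illustration, and the function names are my own, not from any paper.

```python
from typing import List


def label_from_stars(stars: float, threshold: float = 2.5) -> str:
    """Map a star rating (out of 5) to a binary sentiment label."""
    return "positive" if stars >= threshold else "negative"


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of predictions that match the labels derived from ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical ratings and classifier guesses, for illustration only.
ratings = [4.5, 1.0, 2.5, 2.0, 5.0]
gold = [label_from_stars(r) for r in ratings]
guesses = ["positive", "negative", "negative", "negative", "positive"]

print(gold)
print(accuracy(guesses, gold))
```

Here the third guess is wrong (2.5 stars counts as positive), so four of the five guesses agree with the ratings.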

The most basic technique they used comes down to giving each word in the dictionary a score (negative if the word is “negative”, and positive if the word is “positive”). You add all the scores of all the words in a review. If it adds up to a positive number you guess the review is favourable.
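A minimal sketch of that word-score idea is below. The tiny lexicon and its scores are hypothetical and hand-picked (in practice the scores would be learned from labelled reviews), and the tokeniser is deliberately crude.

```python
import re
from typing import Dict, List

# Hypothetical hand-picked scores; a real lexicon would be far larger,
# and its scores would normally be learned from labelled reviews.
WORD_SCORES: Dict[str, float] = {
    "excellent": 2.0,
    "good": 1.0,
    "boring": -1.5,
    "terrible": -2.0,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_score(text: str) -> float:
    """Add up the scores of all words; unknown words count as 0."""
    return sum(WORD_SCORES.get(word, 0.0) for word in tokenize(text))


def classify(text: str) -> str:
    """Guess 'positive' only if the total score is above zero."""
    return "positive" if review_score(text) > 0 else "negative"


print(classify("A good film with an excellent ending"))
print(classify("An excellent cast, but a boring and terrible script."))
```

The second review contains "excellent", but "boring" and "terrible" outweigh it, so the total is negative.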

To make this technique work better you can also make a list of all pairs of words and give the pairs scores (for example, “not” can have a score of -0.3 and “excellent” a score of 2.0, but “not excellent” can have a score of -3.3). In practice, for the film review data, using this trick increases the accuracy from 78.7% to 80.6% (or from 72.8% to 82.7% if you use an SVM as your classifier).
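Here is a small sketch of the word-pair trick using the example scores above. Two things are my assumptions rather than anything stated in the papers: that "pairs" means adjacent words, and that a pair's score is added on top of the scores of its individual words, as it would be in a linear model with both word and word-pair features.

```python
import re
from typing import List

# Scores taken from the example in the text.
UNIGRAM_SCORES = {"not": -0.3, "excellent": 2.0}
BIGRAM_SCORES = {"not excellent": -3.3}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens: List[str]) -> List[str]:
    """Return every pair of adjacent tokens, joined by a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def review_score(text: str) -> float:
    """Sum the word scores and the word-pair scores (assumed additive)."""
    tokens = tokenize(text)
    unigram_total = sum(UNIGRAM_SCORES.get(t, 0.0) for t in tokens)
    bigram_total = sum(BIGRAM_SCORES.get(b, 0.0) for b in bigrams(tokens))
    return unigram_total + bigram_total


print(round(review_score("Excellent!"), 2))
print(round(review_score("Not excellent."), 2))
```

With words alone, "not excellent" would score -0.3 + 2.0 = 1.7 and look favourable. Adding the pair score of -3.3 brings it to about -1.6, which is the behaviour you actually want.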

Another early task, called opinion classification, was classifying text as subjective or objective. An early dataset for this task consisted of film reviews that were all assumed to be subjective, and film synopses taken from IMDb that were assumed to be objective. The basic logistic regression classifier described above gets around 99% accuracy for this task. I found the problem, at least on this dataset, to be uninteresting.
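For the curious, this is roughly how a logistic regression over word features can learn such scores. It is a bare-bones sketch: the six training sentences are invented for illustration (they are not the review/synopsis dataset), and the learning rate and number of epochs are arbitrary choices of mine.

```python
import math
import re
from typing import Dict, List, Tuple

# Toy sentences invented for illustration: 1 = subjective, 0 = objective.
TRAIN: List[Tuple[str, int]] = [
    ("i loved this film", 1),
    ("what a terrible boring movie", 1),
    ("i think the acting was great", 1),
    ("a detective investigates a murder in paris", 0),
    ("two brothers travel across the country", 0),
    ("a young woman inherits a farm", 0),
]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def predict_proba(weights: Dict[str, float], bias: float, text: str) -> float:
    """Probability that the text is subjective (binary word features)."""
    z = bias + sum(weights.get(w, 0.0) for w in set(tokenize(text)))
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Tuple[str, int]], epochs: int = 200,
          lr: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Fit word weights with plain batch gradient descent on the log loss."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        grad_w: Dict[str, float] = {}
        grad_b = 0.0
        for text, label in data:
            error = predict_proba(weights, bias, text) - label
            grad_b += error
            for w in set(tokenize(text)):
                grad_w[w] = grad_w.get(w, 0.0) + error
        bias -= lr * grad_b / len(data)
        for w, g in grad_w.items():
            weights[w] = weights.get(w, 0.0) - lr * g / len(data)
    return weights, bias


weights, bias = train(TRAIN)
for text, label in TRAIN:
    print(label, round(predict_proba(weights, bias, text), 2), text)
```

On six hand-made sentences this only shows the mechanics. It says nothing about how well the approach does on real data.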


Much of the subsequent work in sentiment classification involves more sophisticated ways of distinguishing a “good” labelling from an “as good as Big Rigs: Over the Road Racing” labelling. These include adding syntactic information such as part-of-speech tags or actually building parse trees before classification.

Knowing whether a text is positive or negative can only get you so far. You might also be interested in whether the speaker is angry, or sarcastic (which is very difficult to detect, even for humans). You also want a more fine-grained classification. Take the opinion piece:

“The apple was shiny and pretty, but it left a bitter taste in my mouth”. *

You don’t want the analyser to just tell you “+0.1 positive”. You want: “if object is apple and aspect is appearance then score=+2.0, if object is apple and aspect is taste then score=-1.5”. The results reported for these tasks are unsatisfactory to varying degrees.
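To make the desired output concrete, here is one way it could be represented. The class and field names are hypothetical, and the two opinions are written in by hand from the apple example; nothing here actually extracts them from text, which is the hard part.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AspectOpinion:
    """One opinion about a single aspect of a single object."""
    target: str   # the object being discussed, e.g. "apple"
    aspect: str   # the property being judged, e.g. "taste"
    score: float  # positive is favourable, negative is unfavourable


def index_by_aspect(opinions: List[AspectOpinion]) -> Dict[Tuple[str, str], float]:
    """Build a lookup table from (target, aspect) to score."""
    return {(o.target, o.aspect): o.score for o in opinions}


# Hand-written from the apple sentence, not produced by an analyser.
opinions = [
    AspectOpinion(target="apple", aspect="appearance", score=2.0),
    AspectOpinion(target="apple", aspect="taste", score=-1.5),
]

table = index_by_aspect(opinions)
print(table[("apple", "appearance")])
print(table[("apple", "taste")])
```

Keeping the scores separate per aspect preserves exactly the information that a single overall number throws away.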


Everyone knows that the best place to get raw text is from social media. This is why we have:

If you want to know more, go read a few papers.

This post was put through uClassify, an online sentiment analyser, and it was found to be:


negative: 53.2

positive: 46.8


happy: 53.9

upset: 46.1


corporate: 93.0

personal: 7.0

* Also see


  1. 1
    Katie Paine on Thursday 24 May, 10:01 AM

    What you neglect to warn people about is ensuring the integrity of the data. Even if you get sentiment 90% correct, you are assuming that the data is valid. However, we find in our research that about 30% of social media data is from spam bots, content farms and/or has no human component at all.

    • 2
      Dirko on Tuesday 29 May, 10:00 AM

      That is a good point, Katie.

      In the experiments Barent and I did, however, almost all the tweets were valid. We looked at tweets containing the search term “democracy”. But I can imagine that for different terms and problems one would get vastly different results.

      Spam detection and relevancy detection are whole research fields on their own, and as you point out they should actually be closely tied to sentiment analysis and always part of a social media sentiment analyzer (maybe a subject of future research).
