1 feb 2017

The data set

On January 5th 2017, the Obama administration released its social media data.

Among these data were all of the tweets the former president made from the @POTUS account.

Dan Shiffman cleaned up the text format slightly for the Obamathon event at ITP - NYU. This cleaned data - found here - forms the basis for this project.

For the sentiment analysis, the AFINN-111 data set is used. It contains 2477 words, each of which has been manually rated on a scale from -5 (most negative) to 5 (most positive).
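As a rough sketch of how such a lexicon could be loaded: AFINN-111 is distributed as a tab-separated text file with one term and its score per line. The snippet below parses a few inline sample lines rather than the real file so that it runs on its own. The column names `word` and `score` are chosen to match the code in the following sections, and the file name mentioned in the comment is an assumption about where the file is stored.

```r
# A few lines in the tab-separated AFINN layout: term<TAB>score.
# (Scores taken from the examples used elsewhere in this post.)
sample_lines <- c("joy\t3", "wrong\t-2", "happy\t3")

# The same call should work on the real file, e.g.
# read.delim("AFINN-111.txt", ...), assuming that is its local name.
afinn <- read.delim(
  text = paste(sample_lines, collapse = "\n"),
  header = FALSE,                    # the file has no header row
  col.names = c("word", "score"),
  quote = "",                        # keep apostrophes in terms intact
  stringsAsFactors = FALSE
)

str(afinn)
```

Setting `quote = ""` matters because some terms contain apostrophes, which would otherwise be read as quoting characters.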

Finding the sentiment of a single word

Regular expressions are used to find the word in the AFINN data and return its sentiment score. Words that are not in the data get a score of 0:

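# Look up the AFINN score of a single word.
# `afinn` is a data frame with the columns `word` and `score`.
sentiment <- function(word) {
  # Anchor the pattern so only an exact, whole-entry match counts
  lookup <- grep(paste0("^", word, "$"), afinn$word)
  if (length(lookup) == 0) {
    return(0)  # not in the lexicon: treat as neutral
  }
  afinn[lookup,]$score[1]
}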

sapply(c("joy", "wrong", "dachshund"), sentiment)
##       joy     wrong dachshund 
##         3        -2         0
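Because `grep()` treats `word` as a pattern, a token containing regex metacharacters (a stray `(` or `+`, for example) could raise an error or match unexpectedly. One possible alternative, shown here only as a sketch and not as the method used in this post, is an exact lookup with `match()`. The function name `sentiment_exact` and the tiny `afinn_demo` table are illustrative assumptions, as is the lower-casing step.

```r
# Hypothetical stand-in for the AFINN data, with the same column names.
afinn_demo <- data.frame(
  word = c("joy", "wrong"),
  score = c(3L, -2L),
  stringsAsFactors = FALSE
)

# Exact, vectorised lookup: no regular expressions involved.
sentiment_exact <- function(words, lexicon = afinn_demo) {
  # Lower-casing is an added assumption (the AFINN entries are lower-case).
  idx <- match(tolower(words), lexicon$word)
  # Words missing from the lexicon (idx is NA) score 0.
  ifelse(is.na(idx), 0L, lexicon$score[idx])
}

scores <- sentiment_exact(c("joy", "wrong", "dachshund", "c++("))
print(scores)
```

Since `match()` is vectorised, this variant takes a whole vector of words at once, so no `sapply()` is needed.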

Inserting a span for use with CSS

We want to use CSS for coloring words with non-zero sentiment:

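insert_sentiment_span <- function(string, word, sentiment) {
  # \\b limits the match to whole words, so "happy" is not matched inside "unhappy"
  capture <- paste0("\\b(", word, ")\\b")
  # The class name encodes the score, e.g. "sentiment3" or "sentiment-2"
  replacement <-
    paste0('<span class="sentiment',
           as.character(sentiment),
           '">\\1</span>')
  gsub(capture, replacement, string, ignore.case = TRUE)
}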

insert_sentiment_span("Happy birthday!", "happy", 3)
## [1] "<span class=\"sentiment3\">Happy</span> birthday!"

Putting it all together

This process is repeated for each word in a tweet to get a sentiment score for the tweet as a whole. That score is then compared with the overall distribution of tweet sentiments and with the average sentiment around the time of the tweet.
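The steps described above might be combined roughly as follows. This is a sketch under stated assumptions, not the code behind this post. The helper `score_tweet`, the demo lexicon `afinn_demo`, the tokenisation rule and the choice to sum the word scores are all illustrative, and the vector `other_scores` holds made-up numbers standing in for the scores of the other tweets.

```r
# Hypothetical stand-in for the AFINN data.
afinn_demo <- data.frame(
  word = c("happy", "joy", "wrong"),
  score = c(3L, 3L, -2L),
  stringsAsFactors = FALSE
)

score_tweet <- function(tweet, lexicon = afinn_demo) {
  # Split the lower-cased text on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(tweet), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # Score every token; tokens missing from the lexicon count as 0.
  idx <- match(tokens, lexicon$word)
  scores <- ifelse(is.na(idx), 0L, lexicon$score[idx])

  # Wrap each distinct non-neutral word in a span carrying its score.
  html <- tweet
  for (w in unique(tokens[scores != 0])) {
    s <- lexicon$score[match(w, lexicon$word)]
    html <- gsub(paste0("\\b(", w, ")\\b"),
                 paste0('<span class="sentiment', s, '">\\1</span>'),
                 html, ignore.case = TRUE)
  }

  # Assumption: the tweet score is the sum of its word scores.
  list(score = sum(scores), html = html)
}

result <- score_tweet("Happy birthday! Nothing wrong with joy.")
print(result$score)
cat(result$html, "\n")

# Made-up scores for other tweets, only to illustrate the comparison step:
# the share of tweets scoring at or below this one.
other_scores <- c(-2, 0, 0, 1, 4, 5)
print(mean(other_scores <= result$score))
```

One caveat of this simple loop is that a lexicon word such as "class" would also match inside markup inserted earlier. A more careful version would build the HTML token by token instead of running repeated substitutions over the whole string.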

Sample output