Timothy Jurka’s R package “sentiment” provides two main functions, described below.
classify_emotion
classify_emotion assigns each text to one of the following emotion categories:
- anger,
- disgust,
- fear,
- joy,
- sadness, and
- surprise
Two algorithms are available for this:
- a naive Bayes classifier trained on Carlo Strapparava and Alessandro Valitutti’s emotions lexicon, and
- a simple voter procedure.
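To give a feel for what a voter procedure does, here is a minimal sketch in base R. It is not the package's implementation: the lexicon `toy_lexicon` and the helper `vote_emotion` are hypothetical and only illustrate the idea that each lexicon word found in the text casts one vote for its emotion.

```r
# Toy emotion lexicon (hypothetical; not the lexicon shipped with the package)
toy_lexicon <- c(happy = "joy", love = "joy", angry = "anger",
                 hate = "anger", afraid = "fear", cry = "sadness")

# Hypothetical helper: every lexicon word in the text casts one vote
vote_emotion <- function(text, lexicon) {
  # split the lower-cased text on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  votes <- lexicon[words[words %in% names(lexicon)]]
  # no lexicon word found: no best fit
  if (length(votes) == 0) return(NA_character_)
  counts <- table(votes)
  # emotion with the most votes (first one wins a tie)
  names(counts)[which.max(counts)]
}

vote_emotion("I love this latte, so happy", toy_lexicon)
vote_emotion("I hate waiting, it makes me angry and I cry", toy_lexicon)
vote_emotion("Just a coffee", toy_lexicon)
```

Texts with no lexicon words come back as NA, which is also why a best-fit column can contain missing values.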
classify_polarity
In contrast to the classification of emotions, the classify_polarity function classifies text as positive or negative. Here the classification can be done either with a naive Bayes algorithm trained on Janyce Wiebe’s subjectivity lexicon or with a simple voter algorithm.
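The naive Bayes idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used by the package: the lexicon `toy_polarity`, the helper `score_polarity`, and the add-one smoothing are all hypothetical choices made for this example.

```r
# Toy subjectivity lexicon (hypothetical words and labels)
toy_polarity <- data.frame(
  word     = c("good", "great", "love", "bad", "awful", "hate"),
  polarity = c("positive", "positive", "positive",
               "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical scorer: log prior plus one log-likelihood term per lexicon
# word in the text, with add-one smoothing
score_polarity <- function(text, lexicon, prior = 0.5) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words %in% lexicon$word]
  # nothing to score: no best fit
  if (length(words) == 0) return(NA_character_)
  classes <- c("positive", "negative")
  v <- nrow(lexicon)  # vocabulary size
  scores <- sapply(classes, function(cl) {
    class_words <- lexicon$word[lexicon$polarity == cl]
    n <- length(class_words)
    hits <- sum(words %in% class_words)
    misses <- length(words) - hits
    log(prior) + hits * log(2 / (n + v)) + misses * log(1 / (n + v))
  })
  # class with the highest log score
  names(scores)[which.max(scores)]
}

score_polarity("good coffee, great staff, bad queue", toy_polarity)
score_polarity("awful and bad", toy_polarity)
```

Working with log scores avoids multiplying many small probabilities, and the smoothing keeps a word from the other class from zeroing out a score.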
Important Note:
The R package “sentiment” depends on Duncan Temple Lang’s Rstem package, which is only available at Omegahat.
At the time of this writing, I’m using version 0.4-1 (I downloaded and installed the tar.gz file from the package website).
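Installing from a downloaded source archive can be sketched as follows. The file name and location are assumptions (the snippet supposes the version above refers to Rstem and that the archive sits in a Downloads folder), so adjust the path to wherever the file was saved.

```r
# Hypothetical location of the downloaded source archive; adjust as needed
rstem_tarball <- path.expand(file.path("~", "Downloads", "Rstem_0.4-1.tar.gz"))

if (file.exists(rstem_tarball)) {
  # repos = NULL tells R to install from the local file, not a repository
  install.packages(rstem_tarball, repos = NULL, type = "source")
} else {
  message("Archive not found at ", rstem_tarball)
}
```

Building a source package may require compiler tools to be installed on the system.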
Example with tweets talking about “starbucks”
Step 1: Load the necessary packages
# required packages
library(twitteR)
library(sentiment)
library(plyr)
library(ggplot2)
library(tm)  # removeWords, stopwords, Corpus, TermDocumentMatrix (used in Step 7)
library(wordcloud)
library(RColorBrewer)
Step 2: Let’s collect some tweets containing the term “starbucks”
# harvest some tweets
some_tweets = searchTwitter("starbucks", n=1500, lang="en")
# get the text
some_txt = sapply(some_tweets, function(x) x$getText())
Step 3: Prepare the text for sentiment analysis
# remove retweet entities
some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
# remove at people
some_txt = gsub("@\\w+", "", some_txt)
# remove punctuation
some_txt = gsub("[[:punct:]]", "", some_txt)
# remove numbers
some_txt = gsub("[[:digit:]]", "", some_txt)
# remove html links (punctuation is already gone, so a link is one "http..." word)
some_txt = gsub("http\\w+", "", some_txt)
# collapse runs of spaces into a single space (replacing with "" would glue words together)
some_txt = gsub("[ \t]{2,}", " ", some_txt)
# remove leading and trailing spaces
some_txt = gsub("^\\s+|\\s+$", "", some_txt)

# define "tolower error handling" function
try.error = function(x)
{
  # create missing value
  y = NA
  # tryCatch error
  try_error = tryCatch(tolower(x), error=function(e) e)
  # if not an error
  if (!inherits(try_error, "error"))
    y = tolower(x)
  # result
  return(y)
}
# lower case using try.error with sapply
some_txt = sapply(some_txt, try.error)

# remove NAs in some_txt
some_txt = some_txt[!is.na(some_txt)]
names(some_txt) = NULL
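To see what the cleaning steps do to a single tweet, here is a small sketch that wraps them in one function. The helper name `clean_tweet` and the sample tweet are made up for illustration, and this variant collapses runs of spaces into a single space so that neighbouring words stay separated.

```r
# Hypothetical helper wrapping the cleaning steps for one or more tweets
clean_tweet <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x)  # retweet entities
  x <- gsub("@\\w+", "", x)                        # remaining @mentions
  x <- gsub("[[:punct:]]", "", x)                  # punctuation
  x <- gsub("[[:digit:]]", "", x)                  # digits
  x <- gsub("http\\w+", "", x)                     # links, now a single "http..." word
  x <- gsub("[ \t]{2,}", " ", x)                   # collapse runs of spaces
  x <- gsub("^\\s+|\\s+$", "", x)                  # trim both ends
  tolower(x)
}

# made-up tweet, not real data
clean_tweet("RT @coffeefan: Got 2 lattes at Starbucks today!! http://t.co/abc123")
```

Note that the link is removed after the punctuation: once the dots and slashes are gone, the whole URL is a single word starting with "http", which the `http\\w+` pattern can match.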
Step 4: Perform Sentiment Analysis
# classify emotion
class_emo = classify_emotion(some_txt, algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"

# classify polarity
class_pol = classify_polarity(some_txt, algorithm="bayes")
# get polarity best fit
polarity = class_pol[,4]
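The indexing by column number can be illustrated on a mock result. The sketch below assumes the emotion result is a character matrix with six score columns followed by a best-fit column; the column names and all values are placeholders invented for this example, not output from the package.

```r
# Mock of the assumed result shape: six score columns plus a best-fit label
# (column names and values are placeholders, not real classifier output)
scores <- matrix("0", nrow = 3, ncol = 6,
                 dimnames = list(NULL, c("ANGER", "DISGUST", "FEAR",
                                         "JOY", "SADNESS", "SURPRISE")))
mock_emo <- cbind(scores, BEST_FIT = c("joy", NA, "sadness"))

# selecting by name reads more clearly than selecting column 7
best <- mock_emo[, "BEST_FIT"]
# texts with no best fit get an explicit label
best[is.na(best)] <- "unknown"
best
```

If the real result carries column names like these, selecting by name protects the code against a change in column order.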
Step 5: Create data frame with the results and obtain some general statistics
# data frame with results
sent_df = data.frame(text=some_txt, emotion=emotion,
                     polarity=polarity, stringsAsFactors=FALSE)

# sort data frame
sent_df = within(sent_df,
  emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
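The "sort" step does not reorder rows; it reorders the factor levels by frequency so that later plots show the most common emotion first. A small sketch with made-up labels (`toy_emotion` is hypothetical data):

```r
# made-up emotion labels, not real classifier output
toy_emotion <- c("joy", "unknown", "joy", "anger", "unknown", "unknown")

# frequency of each label, most common first
freq <- sort(table(toy_emotion), decreasing = TRUE)

# factor whose levels follow that frequency order
toy_factor <- factor(toy_emotion, levels = names(freq))
levels(toy_factor)
```

Plotting functions that respect level order will then draw the bars from most to least frequent.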
The first 15 rows of sent_df can be viewed with head(sent_df, 15).

Step 6: Let’s do some plots of the obtained results
# plot distribution of emotions
# (opts() and theme_text() were removed from ggplot2; labs(title=) and
# theme(plot.title = element_text()) replace them; geom_bar counts rows by default)
ggplot(sent_df, aes(x=emotion)) +
  geom_bar(aes(fill=emotion)) +
  scale_fill_brewer(palette="Dark2") +
  labs(x="emotion categories", y="number of tweets",
       title="Sentiment Analysis of Tweets about Starbucks\n(classification by emotion)") +
  theme(plot.title = element_text(size=12))

# plot distribution of polarity
ggplot(sent_df, aes(x=polarity)) +
  geom_bar(aes(fill=polarity)) +
  scale_fill_brewer(palette="RdGy") +
  labs(x="polarity categories", y="number of tweets",
       title="Sentiment Analysis of Tweets about Starbucks\n(classification by polarity)") +
  theme(plot.title = element_text(size=12))
Step 7: Separate the text by emotions and visualize the words with a comparison cloud
# separating text by emotion
emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in 1:nemo)
{
  tmp = some_txt[emotion == emos[i]]
  emo.docs[i] = paste(tmp, collapse=" ")
}

# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))

# create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos

# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"),
                 scale = c(3,.5), random.order = FALSE, title.size = 1.5)
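The matrix passed to the comparison cloud has one row per term and one column per emotion, with word counts in the cells. The sketch below builds such a matrix by hand in base R for two made-up documents. It is only meant to show the shape of the data, not how the tm package computes it; `toy_docs` and `toy_tdm` are hypothetical names.

```r
# one made-up "document" per emotion (all tweets of that emotion pasted together)
toy_docs <- c(joy = "love latte love morning", anger = "hate queue latte")

# split each document into words
tokens <- strsplit(toy_docs, " ", fixed = TRUE)

# vocabulary across all documents
terms <- sort(unique(unlist(tokens)))

# count every vocabulary term in every document: rows = terms, columns = emotions
toy_tdm <- sapply(tokens, function(tok) {
  as.vector(table(factor(tok, levels = terms)))
})
rownames(toy_tdm) <- terms
toy_tdm
```

A comparison cloud then sizes each word by how much more frequent it is in one column than across the columns overall, which is why the documents are grouped by emotion first.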