Hey everyone...

It's time we appreciated the power of R.

In this post we will fetch tweets about two famous personalities, teams, etc. from Twitter and try to work out which of the two is more popular with respect to a common comparison parameter.

Oh, one thing: I hope you have already done the handshake with Twitter using your credentials. If not, please refer to my post "Second Step" to do that.

OK let's proceed now...

********************* Code In R to Accomplish The Mission ********************

#Packages used below: twitteR for searchTwitter(), gplots for barplot2()
library(twitteR)
library(gplots)

#Get Tweets for a searchTerm
TweetFrame<-function(searchTerm,maxTweets)
{
  #Fetch up to maxTweets tweets that match searchTerm
  twtlist<-searchTwitter(searchTerm,n=maxTweets,cainfo="cacert.pem")
  #Convert each tweet to a one-row data frame and stack the rows
  return(do.call("rbind",lapply(twtlist,as.data.frame)))
} #End of Function TweetFrame

#Function to do a popularity check
popularityCheck<-function(name1,name2,count)
{
  name1DF<-TweetFrame(name1,count)
  name2DF<-TweetFrame(name2,count)

  #Sort by arrival time, earliest tweet first
  sortname1<-name1DF[order(as.integer(name1DF$created)),]
  sortname2<-name2DF[order(as.integer(name2DF$created)),]

  #Gaps between consecutive tweets; differencing the epoch seconds keeps the unit in seconds
  eventdelays1<-diff(as.integer(sortname1$created))
  eventdelays2<-diff(as.integer(sortname2$created))

  #The mean delay of the first term becomes the common ground of comparison
  meanof1<-mean(eventdelays1)

  sumval1<-sum(eventdelays1<=round(meanof1,1))
  res1<-poisson.test(sumval1,count)$conf.int

  #Same threshold (meanof1) for the second term, so the two counts are comparable
  sumval2<-sum(eventdelays2<=round(meanof1,1))
  res2<-poisson.test(sumval2,count)$conf.int

  #Proportion of tweets arriving within the threshold
  p1<-as.single(sumval1/count)
  p2<-as.single(sumval2/count)

  #Lower and upper bounds of the confidence intervals
  l1<-as.single(res1[1])
  l2<-as.single(res2[1])
  u1<-as.single(res1[2])
  u2<-as.single(res2[2])

  barplot2(c(p1,p2),ci.l=c(l1,l2),ci.u=c(u1,u2),plot.ci=TRUE,
           names.arg=c(name1,name2))
} #End of Function popularityCheck

******************************** END**********************************


**First, let's see what this code does, and then we will see how it does it...**

Well, over the past few days people were so engrossed in FIFA 2014 that there was a flood of posts and tweets. So why not run a popularity check on the FIFA teams?

**Input**

Team 1: Argentina (#argentina)

Team 2: Germany (#germany)

Number of Tweets extracted from Twitter: 500 (for each team)

`>popularityCheck("#argentina","#germany",500)`

**Output**

So, this is what we got: a bar plot with one bar per team and a confidence-interval bar on each. On this basis of comparison, the plot shows Argentina to be more popular than Germany.

Now let us understand how we got this...

First, look at the **TweetFrame(searchTerm,maxTweets)** function. It takes a "searchTerm", say #germany, and a "maxTweets" value, say 500, and fetches up to 500 tweets in list form. The result is stored in the variable twtlist. The content of twtlist is rather haphazard, so to give it a proper tabular format we convert the list into a data frame and return it.
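The list-to-data-frame step can be tried without a Twitter connection. The following is only a minimal sketch: the three records are made up, and each one is a plain named list standing in for the status objects that searchTwitter() would return, so the column names here are assumptions for illustration.

```r
# Hypothetical stand-in for the list returned by searchTwitter():
# plain named lists with made-up content, not real twitteR status objects.
twtlist <- list(
  list(text = "Go Germany!",   screenName = "fan_a", created = "2014-07-13 19:00:05"),
  list(text = "What a match",  screenName = "fan_b", created = "2014-07-13 19:00:01"),
  list(text = "Extra time...", screenName = "fan_c", created = "2014-07-13 19:00:09")
)

# Turn every element into a one-row data frame, then stack the rows.
rows <- lapply(twtlist, function(x) as.data.frame(x, stringsAsFactors = FALSE))
tweetDF <- do.call("rbind", rows)

# Parse the timestamp column so it can be sorted and differenced later.
tweetDF$created <- as.POSIXct(tweetDF$created, tz = "UTC")

print(dim(tweetDF))  # 3 rows, 3 columns
str(tweetDF)
```

do.call("rbind", ...) is used because rbind() expects the data frames as separate arguments, while lapply() hands them back wrapped in a single list.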
Now let us look at the function **popularityCheck(name1,name2,count)**, which is of more concern. name1 and name2 are the two search terms and count is the number of tweets we need to extract.
The **first two** lines of the function take these terms and prepare two separate data frames.
The next two lines sort the respective data frames in order of the arrival times of the tweets. The earliest tweet is kept first, and so on.

The next two lines prepare two vectors, **eventdelays1** and **eventdelays2**, which keep the differences between the arrival times of consecutive tweets.
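The sorting and differencing steps can be checked on a handful of made-up timestamps. This is a small sketch under that assumption; demoDF and its arrival times are invented for illustration and are not real tweets.

```r
# Made-up arrival times (seconds after an arbitrary start), deliberately unsorted.
start <- as.POSIXct("2014-07-13 19:00:00", tz = "UTC")
demoDF <- data.frame(created = start + c(40, 0, 12, 95, 15))

# order() sorts ascending, so the earliest tweet ends up in the first row.
# drop = FALSE keeps the one-column result as a data frame.
sorted <- demoDF[order(as.integer(demoDF$created)), , drop = FALSE]

# Difference the epoch seconds so that the gaps are always in seconds.
eventdelays <- diff(as.integer(sorted$created))
print(eventdelays)  # 12 3 25 55
```

Five tweets give four gaps, so the delay vector is always one element shorter than the number of tweets.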
Next we compute the mean of **eventdelays1**, named **meanof1**, and count the number of tweets whose delay falls within that mean value. **This becomes the ground for comparison: we then count the tweets for the second search term whose delays fall within meanof1 as well.** The two counts are kept in **sumval1** and **sumval2**.
The lines that assign p1 and p2 turn these counts into the probabilities of a tweet coming within meanof1, by dividing each count by the number of tweets.
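Here is a small worked example of the counting step, using made-up delay vectors rather than real tweet data. The name threshold is introduced only for this sketch, and the sketch divides by the number of gaps, whereas the function in this post divides by count.

```r
# Made-up gaps (in seconds) between consecutive tweets for two search terms.
eventdelays1 <- c(2, 5, 1, 30, 4, 3, 60, 2, 8, 5)
eventdelays2 <- c(20, 45, 10, 90, 15, 70, 8, 33, 12, 50)

# The mean gap of the first term is the shared threshold (12 seconds here).
threshold <- mean(eventdelays1)

# A comparison yields TRUE/FALSE values; sum() counts the TRUEs.
sumval1 <- sum(eventdelays1 <= threshold)  # 8 gaps of the first term qualify
sumval2 <- sum(eventdelays2 <= threshold)  # 3 gaps of the second term qualify

# Proportion of gaps at or below the threshold for each term.
p1 <- sumval1 / length(eventdelays1)
p2 <- sumval2 / length(eventdelays2)
print(c(threshold = threshold, p1 = p1, p2 = p2))
```

Using one threshold for both terms is what makes the proportions comparable: the term whose tweets arrive faster gets the larger share of short gaps.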

**l1 and u1 are the lower and upper bounds of the 95% confidence interval that poisson.test computes for the rate of tweets arriving within 'meanof1', that is, sumval1 out of 500. The same goes for l2 and u2.**
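To see what poisson.test returns, here is a short sketch with made-up counts: 333 of 500 tweets within the threshold for one term and 200 of 500 for the other. The counts and the overlap check are assumptions for illustration, not results from the run described in this post.

```r
# Made-up counts of tweets that arrived within the threshold, out of 500 each.
res1 <- poisson.test(333, 500)
res2 <- poisson.test(200, 500)

# conf.int holds the lower and upper bound of the 95% interval for the rate.
ci1 <- res1$conf.int
ci2 <- res2$conf.int
print(ci1)
print(ci2)

# Two intervals overlap when each one starts before the other one ends.
overlap <- ci1[1] <= ci2[2] && ci2[1] <= ci1[2]
print(overlap)
```

When the two intervals do not overlap, the difference between the two rates is unlikely to be down to sampling noise alone.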
And then the godfather line is executed, which is barplot2(...) [*please select the package gplots from the package window; if it is not there, you can install it by running install.packages("gplots")*], with the arguments shown above, which are self-explanatory. This command plots the graph, and an instance is shown above. The plot shows that people are tweeting more about Argentina and comparatively less about Germany. Well, this is it....
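If gplots is not at hand, a similar picture can be sketched with base R graphics alone. Note that this swaps in barplot() plus arrows() in place of barplot2(); the proportions, interval bounds and labels below are made up for illustration.

```r
# Made-up proportions and interval bounds for two hypothetical search terms.
p      <- c(0.66, 0.40)
lower  <- c(0.60, 0.35)
upper  <- c(0.74, 0.46)
labels <- c("term A", "term B")

pdf(NULL)  # null device, so the sketch runs without opening a plot window

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(p, names.arg = labels, ylim = c(0, max(upper) * 1.1))

# Arrows with flat heads on both ends act as confidence-interval bars.
arrows(mids, lower, mids, upper, angle = 90, code = 3, length = 0.1)

invisible(dev.off())
```

Remove the pdf(NULL) and dev.off() lines to see the plot on screen.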
Well, I hope it was worth your time... Please feel free to make any suggestions.

The next thing I am going to do is build a word cloud in R using tweets, and I am having fun doing it. Till my next post, as I say, Happy Learning...
