Archive

Posts Tagged ‘Twitter’

Presidents on Twitter

January 5, 2012

I saw that a new version of the twitteR package was released a few weeks back and thought I should test the code I wrote some time ago, and do something interesting at the same time. So I came up with the idea of checking out how Presidents are doing on Twitter.

Not many Presidents are on Twitter yet, so my sample is fairly small. Here is my list:

  • @dilmabr (Brazil)
  • @CFKArgentina (Argentina – I think you could have guessed it from the nickname anyway)
  • @JuanManSantos (Colombia)
  • @chavezcandanga (Venezuela)
  • @sebastianpinera (Chile)
  • @BARACKOBAMA (USA)
  • @Number10gov (UK)

The idea was simply to plot the number of followers per president, with the size of each point representing their number of tweets (statuses).

It is interesting to see the result above; I would have put David Cameron in second place, not Hugo Chavez, and trying to answer the “why” of this brings up a lot more questions, such as:

  • How long have the Presidents been on Twitter? (a rough sketch of how to check this follows the list)
  • How do they use it?
  • How do people in each country use Twitter?
  • Does population matter?
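
For the first question, here is a minimal sketch of how one might pull each account’s sign-up date with twitteR. It assumes the user object exposes the date via user$created (recent versions of the package do; older releases may name it differently) and it reuses the lkuplist vector of handles defined in the code below.

library(twitteR)
# assumption: user$created returns the account's sign-up date as a date/POSIXct
account.age.days <- sapply(lkuplist, function(u) {
  as.numeric(Sys.Date() - as.Date(getUser(u)$created))  # days on Twitter
})
account.age.days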

Anyway, for those of you interested in the code, here it is:

library(twitteR)
library(ggplot2)
base <- NULL
lkuplist <- c('dilmabr','CFKArgentina','JuanManSantos','chavezcandanga','sebastianpinera',
              'BARACKOBAMA','Number10gov')

for(users in lkuplist){
  user <- getUser(users)
  userName <- screenName(user)
  followers <- followersCount(user)
  friends <- friendsCount(user)
  statuses <- statusesCount(user)
  # build this user's row and append it to the results
  # (rbind on a NULL base simply returns the new row on the first pass)
  temp <- as.data.frame(cbind(user=userName,followers=followers,friends=friends,statuses=statuses)
                        ,stringsAsFactors=F)
  base <- rbind(base,temp)
  rm(temp)

}
# convert char to numbers
base <- transform(base, followers = as.numeric(followers),
                friends = as.numeric(friends),
                statuses = as.numeric(statuses))
# reorder variables
base  <- transform(base,user = reorder(user, followers))

p <- ggplot(base,aes(x=user,y=followers,color=statuses)) + geom_point(aes(size=statuses))
p <- p +scale_color_gradient() + coord_flip()
p <- p+scale_y_continuous(formatter = "comma",breaks=c(1000000,2500000,5000000,10000000))
p
Categories: analytics

Twitter Analysis using R

February 6, 2011

One of the things that normally slows me down when I am learning a new tool outside working hours is the lack of an interesting project or task that forces me to dig into the sort of activities required to finish the project successfully.

On the other hand, I am always on the lookout for the latest data visualisation techniques and trends, so I asked myself: what could I do that would not only improve my R programming skills but also help with data visualisation? Well, the answer is: Twitter.

Twitter holds so much data that the range of analyses anyone can do is unlimited (or limited only by one’s skills). Fortunately R has a package called twitteR that lets you pull out this data effortlessly. The functionality of the package ranges from scanning the public timeline, to looking at the timeline of a specific user, to looking at the followers and friends of a specific user. There are some issues with protected accounts and some constraints on the number of requests allowed per hour, so it was impossible to hammer the server too hard.
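
To give a flavour of the package, here are a few of the kinds of calls it offers. Function names have shifted a little between twitteR versions, so treat this as an indicative sketch rather than an exact recipe.

library(twitteR)
pub  <- publicTimeline()                # a sample of recent public tweets
mine <- userTimeline("Altons", n = 20)  # a specific user's recent tweets
usr  <- getUser("Altons")               # user object with follower/friend counts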

The aim of this analysis is pretty basic: pull out data for some users, along with their followers and friends, then draw some graphics using different packages and hopefully uncover interesting things.

Due to the constraints I mentioned before, I had to select a random sample (honestly, it was random) of 5 users from our #rstats community. The victims are: Rarchive, sachaepskamp, peterflom, Altons (my own account) and CMastication.

Since the package returns data only for a given user at a time, and since (as far as I understood at the time) an R function can only return a single object, I had to write 2 functions:

  • TwitterCrawler.r: requires one parameter, the Twitter username; the aim of this function is to pull out the friends and followers of a given user, along with their basic information: # of tweets, # of followers, # of friends.
  • TwitterUserStats.r: the same as the above, but it pulls only the user itself (a rough sketch of what such a function might look like follows this list).
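
The real functions are linked further down; purely as an illustration, here is a minimal sketch of what something like TwitterUserStats.r could look like, reusing the twitteR accessor functions that appear in the Presidents post earlier on this page.

TwitterUserStats <- function(username){
  user <- getUser(username)
  data.frame(ScreenName = screenName(user),
             Friends    = friendsCount(user),
             Followers  = followersCount(user),
             TweetCount = statusesCount(user),
             stringsAsFactors = FALSE)
}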

My first attempt was to build a single function to do all of the above, and I did, but I couldn’t return 2 data frames at the same time, which is why I built 2 functions instead.
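
For what it’s worth, an R function can hand back both data frames at once by wrapping them in a named list. A minimal sketch built on the two functions above:

TwitterBoth <- function(username){
  user.df    <- TwitterUserStats(username)  # the user's own stats
  network.df <- TwitterCrawler(username)    # the user's followers and friends
  list(user = user.df, network = network.df)
}
# usage: res <- TwitterBoth("Altons"); res$user holds the stats, res$network the network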

Please note that the current figures on Twitter may not match this sample, as I pulled the data a few weeks ago.

For the sake of this post I’ll show the main R code where I call the functions, and I’ll give you the link to the functions further down (if I get my head around GitHub).

The main program is as follows:
# Required functions
source("C:\\Users\\Altons\\Documents\\R\\Rfunctions\\TwitterCrawler.r")
source("C:\\Users\\Altons\\Documents\\R\\Rfunctions\\TwitterUserStats.r")

TwitterUserDB <- NULL
NetworkDB <- NULL
TwitterList <- data.frame(user=c("Rarchive","sachaepskamp","peterflom","Altons","CMastication"))

for(i in 1:nrow(TwitterList)) {
  # pull the network (followers & friends) and the user's own stats
  CrawlerDB <- TwitterCrawler(toString(TwitterList[i,1]))
  UserDB <- TwitterUserStats(toString(TwitterList[i,1]))
  # stack the results into the two analysis data frames
  TwitterUserDB <- data.frame(rbind(TwitterUserDB,UserDB))
  NetworkDB <- data.frame(rbind(NetworkDB,CrawlerDB))
}

So, as you can see, our main data frames for the analysis are TwitterUserDB and NetworkDB. The first one holds information about our victims, while the second one holds information about their networks.

Coming from the SAS world, I like to see the resulting data frame (a data set in SAS) once it has been created, so I find the fix() function quite handy for checking my data frames.

> fix(TwitterUserDB)

Pretty basic, isn’t it? But very handy to check out whether or not the data looks right.

TwitterUserDB has the ScreenName, number of friends, number of followers and the number of tweets for every user.

Let’s do the same for NetworkDB:

> fix(NetworkDB)

NetworkDB contains information about the followers and friends of each user. With this information we could, for example, see how active a user’s network is, or analyse the correlation between tweets and followers.
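
For instance, checking that correlation is a one-liner, assuming NetworkDB has columns named TweetCount and Followers (the actual names may differ, so check names(NetworkDB) first):

# how strongly do tweet volume and follower counts move together across the network?
cor(NetworkDB$TweetCount, NetworkDB$Followers, use = "complete.obs")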

I had an issue with the TwitterBirth variable (the date when a user signed up to Twitter): the function returned it as text and I couldn’t figure out how to convert it to a date. I hope one of you can shed some light on this.
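
For what it’s worth, here is a minimal sketch of one way to do the conversion, assuming TwitterBirth lives in NetworkDB and holds strings in Twitter’s usual created_at format, e.g. "Tue Feb 08 19:30:04 +0000 2011":

Sys.setlocale("LC_TIME", "C")  # month/day names need to be parsed in English
NetworkDB$TwitterBirth <- as.POSIXct(NetworkDB$TwitterBirth,
                                     format = "%a %b %d %H:%M:%S %z %Y",
                                     tz = "UTC")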

Now let’s move on to the analysis. Here I experimented with 2 graphical packages: base R graphics (the standard graphics package that comes with the main installation) and ggplot2.

So let’s do some bar charts:

My first try was the barplot() function along with png() to write the graph to an external file. The data frame under analysis is TwitterUserDB.

# Analyse TwitterUserDB
Tweets <- TwitterUserDB$TweetCount
Tweets.legend <- TwitterUserDB$ScreenName

# Graphout holds the output directory for the charts (defined elsewhere)
png(paste(Graphout,"barchart1.png"))
barplot(Tweets, names.arg=Tweets.legend, cex.names=0.9, ylab="# of Tweets", ylim=c(0, 22000))
dev.off()

Well, not bad, but let’s add some colours. Here I found the colors() function, which lists all the colour names; that’s useful, but it would be even better if I could actually see the colours.
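
A quick base-graphics sketch to actually see those colours (nothing Twitter-specific, just looping over colors() and drawing a swatch for each name):

n <- length(colors())
par(mar = c(0, 0, 0, 0))
plot(0, 0, type = "n", xlim = c(0, 26), ylim = c(0, 26),
     axes = FALSE, xlab = "", ylab = "")
# draw each colour as a small filled rectangle on a 26-column grid
for(i in seq_len(n)){
  x <- (i - 1) %% 26
  y <- (i - 1) %/% 26
  rect(x, y, x + 1, y + 1, col = colors()[i], border = NA)
}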

# with colors
mycolors <- c("skyblue2","red3","tomato4","orange3","yellow3")

png(paste(Graphout,"barchart2.png"))
barplot(Tweets, names.arg=Tweets.legend, cex.names=0.9, ylab="# of Tweets", ylim=c(0, 22000), col=mycolors)
dev.off()

Yeah, a bit better, and I guess I could do the same for the other variables, but these univariate graphics don’t provide much insight beyond a simple count.

Before continuing with this analysis I have to say that these comparisons are quite unfair, since the users in my sample did not join Twitter at the same point in time, so bear this in mind. Perhaps if I had fixed the TwitterBirth variable I could have compared apples with apples.
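
As an aside, if TwitterBirth were converted to a proper date (as sketched earlier), one crude way to level the playing field would be to normalise tweet counts by account age. This assumes TwitterUserDB also carries a TwitterBirth column, which may not be the case in the current data frames.

# hypothetical: tweets per day of account life puts early and late joiners on a comparable footing
TwitterUserDB$DaysOnTwitter <- as.numeric(Sys.Date() - as.Date(TwitterUserDB$TwitterBirth))
TwitterUserDB$TweetsPerDay  <- TwitterUserDB$TweetCount / TwitterUserDB$DaysOnTwitter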

Having said that, let’s crack on! The next graph combines the 3 main variables, Friends, Followers and TweetCount, to give a “full” picture of our users.

library('ggplot2')

png(paste(Graphout,"3dchart.png"))
p <- ggplot(TwitterUserDB, aes(Friends, Followers, size=TweetCount, label=ScreenName))
# scale_area() comes from the ggplot2 version current at the time of writing
p <- p + geom_point(colour="blue") + scale_area(to=c(1,20)) + geom_text(size=3)
# x is Friends and y is Followers, so the axis labels go that way round
print(p + xlab("# Friends") + ylab("# Followers"))
dev.off()

Well, that’s interesting. A simple idea could be that the more you tweet, the more followers you get. Obviously this is simplistic, but it is interesting to see that Rarchive has pretty much the same number of tweets as peterflom and a lot more than CMastication, yet is nowhere near them in terms of followers or friends. In fact, Rarchive is closer to my own account, Altons, and to sachaepskamp, who are light users in terms of tweets.

Could we say that in order to get a lot of followers you have to tweet quite often, but the quality of your tweets must be good as well? Well, this could be true for mere mortals, but perhaps not for presidents, politicians and artists.

Anyway, the idea of this was just to get used to R as a tool for analytics rather than to uncover hidden patterns in the Twitter #rstats community, so I’d better post this for the time being, as life has a way of getting in the way of plans.

Hope you enjoyed the post, and I’ll come back with a new post about NetworkDB as soon as I can.

Regards,

Alberto


Categories: analytics