Archive for January, 2012

Suicide vs Divorce rates by country using ggplot

January 10, 2012 4 comments

I was looking for data I could use with geom_text() in ggplot2 and came across World Health Organization data on suicide rates by country, which turned out to be very handy for my example.

I used scale_colour_gradient2() with three colors (red, gray and black), but it only picked up gray and black, and I still don’t know why. 😦

Anyway, here is the graph:

The number of suicides is per 100,000 people and the number of divorces is per 1,000 people. (I know, I should have added this to the graph.)

The size and color of each country show the number of female suicides for every 10 male suicides (ratio_f_m, if that makes sense!?). So China has the same number of suicides among women as among men, followed by Kuwait and South Korea. The correlation coefficient was .02867, and 0.5507 excluding Maldives.
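To see how a single far-away point can move a correlation this much, here is a minimal sketch. The numbers are made up for illustration and are not the WHO data; the data frame `toy` and its "Outlier" row are assumptions.

```r
# Toy data, invented for illustration; the last row plays the role of an outlier
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "Outlier"),
  divorce = c(1.0, 1.5, 2.0, 2.5, 3.0, 11.0),
  suicide = c(5.0, 8.0, 9.0, 13.0, 15.0, 1.0)
)

# Pearson correlation on all rows
r_all <- cor(toy$divorce, toy$suicide)

# Same correlation after dropping the outlying row
trimmed <- subset(toy, country != "Outlier")
r_trimmed <- cor(trimmed$divorce, trimmed$suicide)

print(c(all = r_all, without_outlier = r_trimmed))
```

In this toy case the one outlying row is enough to flip the sign of the coefficient, which is why it is worth reporting the value both with and without it.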

As you can see, Maldives is far away from the cloud, and I bet $10 that the key drivers behind that are the sun, beaches and really small bikinis and not population size 😉

With a little bit of Photoshop, the above graph will look like this one:

Code and data can be found on GitHub.

UPDATE:

Thanks to Louis, I’ve managed to show the three colors properly. The code has been updated accordingly.

See comments for further details.

library(XLConnect)
library(ggplot2)

wb <- loadWorkbook('divorce_vs_suicide.xlsx')
df <- wb['Sheet1']
df$Col6 <- NULL
df$Col7 <- NULL

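p <- ggplot(na.omit(df), aes(x=divorce, y=suicide, label=country))
# centre the colour scale on the middle of the observed ratio range
p <- p + geom_text(aes(colour=ratio_f_m, size=ratio_f_m)) +
  scale_colour_gradient2(low='red', mid="gray", high="black",
                         midpoint=mean(range(na.omit(df$ratio_f_m))))
# later ggplot2 releases use range= in place of to=
p <- p + scale_size(range=c(3,5)) + theme_bw()
# later ggplot2 releases use theme()/element_blank() in place of opts()/theme_blank()
p <- p + theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank())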
p
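A likely reason only gray and black showed up at first: scale_colour_gradient2() uses midpoint = 0 by default, so when every ratio_f_m value is positive, all of the data lands on the mid-to-high half of the scale. The sketch below uses invented values to show the effect of centering the midpoint on the data range; `ratio` is a hypothetical stand-in for df$ratio_f_m.

```r
# Invented values standing in for df$ratio_f_m (women per 10 men), all positive
ratio <- c(1.2, 2.5, 3.1, 4.8, 10.0)

# With a midpoint of 0, no value falls below the midpoint,
# so the 'low' colour is never used
n_below_zero <- sum(ratio < 0)

# Centring the midpoint on the data range puts values on both sides of it
mid <- mean(range(ratio))
n_low_side <- sum(ratio < mid)
n_high_side <- sum(ratio >= mid)

print(c(below_zero = n_below_zero, low_side = n_low_side, high_side = n_high_side))
```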

Presidents in Twitter

January 5, 2012 2 comments

I saw the release of a new version of the twitteR package a few weeks back and thought I should test the code I wrote some time ago, but also do something interesting at the same time. So I came up with the idea of checking how Presidents are doing on Twitter.

Not many Presidents are on Twitter yet, so my sample is fairly small. Here is my list:

  • @dilmabr (Brazil)
  • @CFKArgentina (Argentina; I think you could have guessed it from the nickname anyway)
  • @JuanManSantos (Colombia)
  • @chavezcandanga (Venezuela)
  • @sebastianpinera (Chile)
  • @BARACKOBAMA (USA)
  • @Number10gov (UK)

My idea was simply to plot the number of followers for each president, with the size of each point showing their number of tweets (statuses).

It is interesting to see the above. I’d have put David Cameron in second place, not Hugo Chavez, and trying to answer the “why” of this brings up a lot more questions, like:

  • How long have Presidents been on Twitter?
  • How do they use it?
  • How do people in each country use Twitter?
  • Does population matter?

Anyway, for those of you interested in the code, here it is:

library(twitteR)
library(ggplot2)
base <- NULL
lkuplist <- c('dilmabr','CFKArgentina','JuanManSantos','chavezcandanga','sebastianpinera',
              'BARACKOBAMA','Number10gov')

for(screen_name in lkuplist){
  user <- getUser(screen_name)
  # one row per account; data.frame() keeps the counts numeric,
  # so no character-to-number conversion is needed afterwards
  temp <- data.frame(user=screenName(user),
                     followers=as.numeric(followersCount(user)),
                     friends=as.numeric(friendsCount(user)),
                     statuses=as.numeric(statusesCount(user)),
                     stringsAsFactors=FALSE)
  # base starts as NULL, so the first rbind() simply returns temp
  base <- rbind(base,temp)
}
# order accounts by number of followers for the plot
base  <- transform(base,user = reorder(user, followers))

p <- ggplot(base,aes(x=user,y=followers,color=statuses)) + geom_point(aes(size=statuses))
p <- p +scale_color_gradient() + coord_flip()
# later ggplot2 releases use labels= (here with scales::comma) in place of formatter=
p <- p+scale_y_continuous(labels = scales::comma,breaks=c(1000000,2500000,5000000,10000000))
p
Categories: analytics