March 22, 2014

The characteristics of popular R packages

Today I describe the characteristics of popular R packages by adding CRAN Task Views information to the ranking data I have used in previous posts. First I fetch the CRAN Task Views data and reshape it from one row per package within each view to one row per package listing all of its views.

Data Handling

Get the CRAN Task Views data (ctv)

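library(ctv)
library(plyr)
# Names of all task views available on the mirror
views <- lapply(available.views(repos="http://cran.rstudio.com/"), function(x) x$name)
# One row per (view, package) pair. .get_pkgs_from_ctv_or_repos is an
# unexported ctv helper (hence :::), so it may change between ctv versions.
viewsDF <- ldply(views, function(x){
  data.frame(
    view=x,
    package=unlist(ctv:::.get_pkgs_from_ctv_or_repos(x, repos="http://cran.rstudio.com/")),
    stringsAsFactors=FALSE)
  })
# Collapse to one row per package, listing its views in a single string
viewsDF <- ddply(viewsDF, .(package), summarise,
                 views=paste(view, collapse=", "))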
head(viewsDF)
##   package                  views
## 1     abc               Bayesian
## 2   abind           Multivariate
## 3     abn                     gR
## 4 acepack         SocialSciences
## 5     acs        WebTechnologies
## 6  actuar Distributions, Finance
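
To make the reshaping step easier to follow without a network connection, here is a minimal sketch of the same idea in base R. The data frame `toy` and the package names `pkgA`, `pkgB`, `pkgC` are made up for illustration, and `tapply` stands in for the `ddply` call above.

```r
# Toy data in "one row per (view, package)" form; package names are hypothetical.
toy <- data.frame(
  view    = c("Finance", "Distributions", "Bayesian", "Finance"),
  package = c("pkgA", "pkgA", "pkgB", "pkgC"),
  stringsAsFactors = FALSE
)

# For each package, join its (sorted) view names into one string.
collapsed <- tapply(toy$view, toy$package,
                    function(v) paste(sort(v), collapse = ", "))

# Rebuild a data frame with one row per package.
toyViews <- data.frame(package = names(collapsed),
                       views   = as.vector(collapsed),
                       stringsAsFactors = FALSE)
print(toyViews)
```

A package that belongs to several views, like `pkgA` here, ends up with a comma-separated list, just as actuar does in the output above.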

Join the ctv data and the ranking

Next, join the data above with the ranking data.

packageRanking2013 <- read.csv("http://dl.dropboxusercontent.com/u/956851/RStudio_CRAN_data.csv", as.is=TRUE, encoding="UTF-8")
packageRanking2013 <- merge(packageRanking2013, viewsDF, all.x=TRUE)
packageRanking2013 <- packageRanking2013[order(packageRanking2013$count, decreasing=TRUE),]
rownames(packageRanking2013) <- seq_len(nrow(packageRanking2013))
# Use an integer rank so the column sorts numerically, not as text
packageRanking2013 <- cbind(rank=seq_len(nrow(packageRanking2013)),
                            packageRanking2013)
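
The join keeps every ranked package, even those without a task view. Here is a small self-contained sketch of that behaviour with made-up data; the package names and counts are hypothetical, and I assume the ranking file has `package` and `count` columns like the ones used above.

```r
# Hypothetical ranking data: one row per package with a download count.
ranking <- data.frame(package = c("pkgA", "pkgB", "pkgD"),
                      count   = c(50, 120, 80),
                      stringsAsFactors = FALSE)

# Hypothetical task view data: pkgD is deliberately missing.
taskViews <- data.frame(package = c("pkgA", "pkgB"),
                        views   = c("Distributions, Finance", "Bayesian"),
                        stringsAsFactors = FALSE)

# all.x=TRUE is a left join: unmatched packages get NA in 'views'.
joined <- merge(ranking, taskViews, by = "package", all.x = TRUE)

# Sort by download count and prepend an integer rank.
joined <- joined[order(joined$count, decreasing = TRUE), ]
rownames(joined) <- NULL
joined <- cbind(rank = seq_len(nrow(joined)), joined)
print(joined)
```

Here `pkgD` keeps its place in the ranking with `NA` in the `views` column, which is how packages outside the Task Views appear in the real table.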

Results

The top 100 ranking is shown below. I'm surprised that six of the top 10 aren't registered in any CRAN Task View. Looking closely, all of those six except the proto package were written by Hadley Wickham. Moreover, they are installed as dependencies when ggplot2, the top-ranked package, is installed. It seems that ggplot2 pulls the related packages up the ranking.
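
A count like "six of the top 10" can be checked by looking for missing values in the `views` column. The sketch below uses a made-up ranked table (the package names are hypothetical) rather than the real data, so it only illustrates the technique.

```r
# Hypothetical ranked table; NA in 'views' means "not in any task view".
ranked <- data.frame(
  rank    = 1:5,
  package = c("pkgA", "pkgB", "pkgC", "pkgD", "pkgE"),
  views   = c(NA, "Graphics", NA, NA, "TimeSeries"),
  stringsAsFactors = FALSE
)

# Restrict to the top n rows, then count and list the unregistered packages.
n <- 3
top <- head(ranked, n)
nMissing <- sum(is.na(top$views))
missingPkgs <- top$package[is.na(top$views)]
print(nMissing)
print(missingPkgs)
```

With the real data, the same two lines applied to `head(packageRanking2013, 10)` should give the count discussed above.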

Expanding to the top 20, you can see that basic packages rank high. They provide useful data structures (zoo), graphics (ggplot2, lattice and other graphics packages), and glue to other languages (rJava, Rcpp). Because these are common to all data handling, it makes sense that they are popular.

library(rCharts)
dt1 <- dTable(packageRanking2013[1:100,],
        sScrollX="600px", sScrollY="400px")
dt1$show("iframesrc", cdn=TRUE)

2 comments:

  1. Great post! I'm also working on a tool analyzing RStudio logs among other related data, which will be released hopefully in the middle of the next week at blog.rapporter.net

  2. Hi!
    As for me, I have used installr package to get the RStudio logs and to do some data handling.
    I'm looking forward to your new tool and post!
