Today, to describe the characteristics of popular R packages, I will add CRAN Task Views information to the ranking data that I have used in previous posts. First I will fetch the CRAN Task Views data and reshape it from one row per view-package pair into one row per package that lists all of its views.
Data Handling
Get the CRAN Task Views data (ctv)
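library(ctv)
library(plyr)
# Names of all task views available on the mirror
views <- lapply(available.views(repos="http://cran.rstudio.com/"), function(x) x$name)
# One row per (view, package) pair; note that this relies on an internal ctv function
viewsDF <- ldply(views, function(x){
  data.frame(
    view=x,
    package=unlist(ctv:::.get_pkgs_from_ctv_or_repos(x, repos="http://cran.rstudio.com/")),
    stringsAsFactors=FALSE)
})
# Collapse to one row per package, with its views joined by ", "
viewsDF <- ddply(viewsDF, .(package), summarise,
                 views=paste(view, collapse=", "))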
head(viewsDF)
##   package                  views
## 1     abc               Bayesian
## 2   abind           Multivariate
## 3     abn                     gR
## 4 acepack         SocialSciences
## 5     acs        WebTechnologies
## 6  actuar Distributions, Finance
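The reshaping step can also be tried offline on a small hand-made table. The following is only a minimal sketch: it uses base R's `aggregate` instead of `ddply`, and the `toyViews` data frame is made up for illustration (its rows simply mirror part of the output above).

```r
# Toy stand-in for the long table: one row per (view, package) pair.
# These rows are hypothetical and only mimic the output shown above.
toyViews <- data.frame(
  view    = c("Bayesian", "Distributions", "Finance", "Multivariate"),
  package = c("abc", "actuar", "actuar", "abind"),
  stringsAsFactors = FALSE
)

# Collapse to one row per package, joining its view names with ", ".
toyCollapsed <- aggregate(view ~ package, data = toyViews,
                          FUN = paste, collapse = ", ")
names(toyCollapsed)[2] <- "views"
print(toyCollapsed)
```

A package that appears in several views, like `actuar` here, ends up with all of them in a single `views` string.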
Join the ctv data and the ranking
Next, join the data above with the ranking data.
packageRanking2013 <- read.csv("http://dl.dropboxusercontent.com/u/956851/RStudio_CRAN_data.csv", as.is=TRUE, encoding="UTF-8")
packageRanking2013 <- merge(packageRanking2013, viewsDF, all.x=TRUE)
packageRanking2013 <- packageRanking2013[order(packageRanking2013$count, decreasing=TRUE),]
rownames(packageRanking2013) <- seq_len(nrow(packageRanking2013))
# Use a numeric rank rather than the character row names
packageRanking2013 <- cbind(rank=seq_len(nrow(packageRanking2013)),
                            packageRanking2013)
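Because `all.x=TRUE` makes the merge a left join, packages without any task view stay in the ranking and get `NA` in `views`. A small sketch with made-up data illustrates the join and the ranking step; the names `toyRanking` and `toyViewsByPkg`, the counts, and the view assignments are all hypothetical.

```r
# Hypothetical download counts; not real data.
toyRanking <- data.frame(
  package = c("zoo", "plyr", "ggplot2"),
  count   = c(300L, 500L, 900L),
  stringsAsFactors = FALSE
)
# Hypothetical view listing; plyr is deliberately left out.
toyViewsByPkg <- data.frame(
  package = c("ggplot2", "zoo"),
  views   = c("Graphics", "Econometrics, Finance"),
  stringsAsFactors = FALSE
)

# Left join on the shared "package" column; plyr has no view, so it gets NA.
toyJoined <- merge(toyRanking, toyViewsByPkg, by = "package", all.x = TRUE)

# Sort by count (descending) and attach a numeric rank.
toyJoined <- toyJoined[order(toyJoined$count, decreasing = TRUE), ]
rownames(toyJoined) <- NULL
toyJoined <- cbind(rank = seq_len(nrow(toyJoined)), toyJoined)
print(toyJoined)
```

Naming the key explicitly with `by = "package"` is a choice made for this sketch; it avoids accidentally joining on any other column the two tables might share.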
Results
The top 100 ranking is shown below. I'm surprised that six of the top 10 aren't registered in any CRAN Task View. On closer inspection, all of these except the proto package were written by Hadley Wickham. Moreover, they are all pulled in as dependencies when ggplot2, the top-ranked package, is installed. It seems that ggplot2 lifts the related packages up the ranking with it.
Expanding to the top 20, you can see that basic packages rank high. They provide useful data structures (zoo), graphics (ggplot2, lattice and other graphics packages) and glue to other languages (rJava, Rcpp). Because these needs are common to almost all data handling, it is understandable that such packages are popular.
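The number of top-ranked packages without a task view can be counted directly, since the merge leaves `views` as `NA` for them. Here is a minimal sketch on made-up data; the helper name `countUnregistered` is an assumption for illustration and not part of any package.

```r
# Hypothetical helper: number of packages among the top n rows
# whose `views` entry is NA (not listed in any CRAN Task View).
countUnregistered <- function(ranking, n) {
  top <- head(ranking, n)
  sum(is.na(top$views))
}

# Made-up ranking, assumed to be already sorted by download count.
toyTop <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  views   = c("Graphics", NA, NA, "Finance"),
  stringsAsFactors = FALSE
)
print(countUnregistered(toyTop, 3))
```

Applied to the real data, a call such as `countUnregistered(packageRanking2013, 10)` would give the count for the top 10.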
library(rCharts)
dt1 <- dTable(packageRanking2013[1:100,],
sScrollX="600px", sScrollY="400px")
dt1$show("iframesrc", cdn=TRUE)
Great post! I'm also working on a tool analyzing RStudio logs among other related data, which will be released hopefully in the middle of the next week at blog.rapporter.net
Hi!
As for me, I have used installr package to get the RStudio logs and to do some data handling.
I'm looking forward to your new tool and post!