April 18, 2014
TokyoR #extra 17 April 2014
On 17 April 2014, I attended the Tokyo R user group meeting (TokyoR). It was an extra meeting and the first one to welcome a guest from abroad. Welcome to Japan, Mr. Gesmann! It was held at the NIFTY Corporation office in Shinjuku, Tokyo. The session started at 20:00 with three presentations and two lightning talks. All presentations were given in English.
1. Interactive charts with googleVis in R
The first presentation was given by Markus Gesmann, the author of the googleVis package. When it comes to interactive charts in R, googleVis is a very well-known package, along with rCharts. He presented an overview and the design of the package. He also introduced new features (released just a few days ago!). I have used the package a lot, so his talk was especially impressive to me.
2. Visualization of Supervised Learning with {arules} & {arulesViz}
Takashi J Ozaki talked about data visualization for association rules. He clearly demonstrated examples with the arules and arulesViz packages. At the end of the presentation, he pointed out the pros and cons of using these packages, which I found very insightful. Please check out his presentation.
3. Trading volume mapping in recent R environment
@teramonagi gave a talk about the trading volume of bitcoins. He had a good command of recent R packages such as dplyr, magrittr, pings, rMaps, and so on. When he explained the origin of %>% in the magrittr package, he mentioned the convenience of the F# language. I hope he will give an insightful talk about F# in data science someday.
Lightning talks
After the three presentations, two lightning talks were given. In only 5 minutes each, the presenters covered impressive content. I list only the titles below. For more details, please check out their blogs and slides.
Salmon Visualization!!! Shota Yasui (@housecat442)
How do you keep our motivation? Daisuke Ichikawa (-> me )
Next Tokyo R
The next meeting is scheduled for 19 April (tomorrow! Don't miss it!). Thanks to NIFTY Corporation for hosting the event.
April 5, 2014
TokyoR #37 29 March 2014
The session started at 14:30 with five presentations and seven lightning talks, followed by drinks.
Presentations
The talks at this meeting were given in Japanese, so the English titles below are my own translations.
1. Learn R in 10 minutes
2. Introduction to data mining with R
The first presentation was given by Nobuaki Ooshiro from Yahoo Japan. Second, @Prunus1350, who runs the meetup "Pattern recognition for beginners", introduced an R textbook, "Learning data mining with R". These first two presentations were very useful introductions to R.
3. Feature selection with R packages
@srctaha talked about feature selection, that is, selecting subsets of variables that together have good predictive power. First, he introduced the main approaches to feature selection: wrappers, filters, and embedded methods. Second, he presented useful R packages for feature selection (seven packages!). Because his explanation was full of both theory and simulations, I could smoothly grasp the elements of feature selection.
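To make the filter/wrapper distinction concrete, here is a minimal sketch in base R on the built-in mtcars data. It is not taken from the talk and does not use the seven packages he presented; choosing mpg as the target, absolute correlation as the filter score, and step() as the wrapper search are my own assumptions. Embedded methods (such as the lasso) are left out because they need an extra package.

```r
# Sketch of two feature-selection approaches on the built-in mtcars data.
data(mtcars)
target <- "mpg"
predictors <- setdiff(names(mtcars), target)

# Filter method: score each predictor on its own by |Pearson correlation|
# with the target, then keep the top-ranked ones.
scores <- sapply(predictors, function(v) abs(cor(mtcars[[v]], mtcars[[target]])))
ranked <- sort(scores, decreasing = TRUE)
top3 <- names(ranked)[1:3]
print(round(ranked, 3))

# Wrapper method: judge subsets by refitting the model; here, backward
# stepwise selection by AIC with base R's step().
full_model <- lm(mpg ~ ., data = mtcars)
selected <- step(full_model, direction = "backward", trace = 0)
print(formula(selected))
```

The filter looks at each variable in isolation, while step() judges variables by how the whole model performs, so the two approaches can disagree about which variables matter.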
4. Discrete choice models with R
Hiroki Sano (@sanoche16), who works as a data scientist at a consulting firm, gave a talk about discrete choice models. If you work in marketing research, you cannot avoid discrete choice models. He explained how discrete choice models work and how to apply them with the mlogit package.
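The core of the multinomial logit model, the basic discrete choice model, is that each alternative j gets a utility V_j and is chosen with probability exp(V_j) / sum_k exp(V_k). The sketch below computes these probabilities by hand in base R. The coefficients and the three brands are hypothetical values I made up; estimating the coefficients from observed choices is the part a package like mlogit does for you.

```r
# Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)
logit_prob <- function(V) {
  expV <- exp(V - max(V))  # subtracting the max avoids overflow; probabilities are unchanged
  expV / sum(expV)
}

# Hypothetical coefficients: utility falls with price and rises with quality
beta_price <- -0.5
beta_quality <- 1.2

# Hypothetical choice set of three brands
alts <- data.frame(
  brand = c("A", "B", "C"),
  price = c(3, 2, 4),
  quality = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Deterministic utility of each alternative, then choice probabilities
V <- beta_price * alts$price + beta_quality * alts$quality
probs <- setNames(logit_prob(V), alts$brand)
print(round(probs, 3))
```

Brand C is the most expensive here, but its higher quality outweighs the price under these made-up coefficients, so it gets the largest share.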
Shota Yasui (@housecat442) showed an example of decision-making analysis, taking house hunting as the example. He used the XML package to acquire house rent data, estimated the value of each house with a regression model, and then compared the estimates with the actual rents. His strategy was simple but powerful. Like him, I hope that many more people will make use of various data for their own decision making.
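The idea of comparing a regression estimate with the actual price can be sketched as follows. The listings are simulated, not real rent data, and the two features (floor area and walking minutes to the station) are my own assumptions; in his talk the data was acquired with the XML package.

```r
# Hypothetical sketch: estimate a "fair" rent with linear regression, then
# compare it with the listed rent to spot relatively cheap listings.
set.seed(1)
n <- 50
# Simulated listings (not real data): floor area (m^2) and minutes to the station
listings <- data.frame(
  area = round(runif(n, 20, 60)),
  walk_min = round(runif(n, 1, 20))
)
listings$rent <- 2 + 0.25 * listings$area - 0.15 * listings$walk_min +
  rnorm(n, sd = 0.8)

# Fit the regression and compute each listing's predicted rent
fit <- lm(rent ~ area + walk_min, data = listings)
listings$predicted <- predict(fit, newdata = listings)
listings$residual <- listings$rent - listings$predicted

# The most negative residuals are listings cheaper than the model expects
print(head(listings[order(listings$residual), ], 5))
```

A large negative residual only means "cheaper than this model expects"; it may also reflect something the model does not capture, such as the age of the building.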
Lightning talks
After the five presentations, seven lightning talks were given. In only 5 minutes each, the presenters covered impressive content.
I just list the titles below. For more details, please check out the slides.
- Quantile regression with R @aich8
- Introduction to private decision analysis -> me
- A little thought of Kernel and SVM @tetsuroito
- How to handle missing data @Hiro_macchan
- R coding standard @soultoru
- Introduction to SIR model @yoshi_fit
- Interactive visualization with rCharts
The next meeting is scheduled for 19 April.
Thanks to NIFTY Corporation for hosting the event.
April 4, 2014
Vine from R
Have you ever used Vine? It is a web service for sharing short video clips, each limited to only 6 seconds. There are 60 million clips uploaded to the site.
There is an unofficial web API for Vine. Why unofficial? Because Vine has not officially released the API. So keep in mind that this API may change without notice.
Today, I will try this API from R. There are three steps.
Step 1. Register vine
Step 2. Check the API reference
Step 3. Use from R
Step 1. Register vine
First, you need to download the mobile app (iOS or Android) and create an account. It seems that some of the APIs can be used without registration.
Step 2. Check the API reference
Second, let's move on to the API reference site and check the documentation. It seems that Vine has at least 10 APIs.
Step 3. Use the API from R
Finally, we are ready to get data from Vine through the API. Here I show a simple case: getting popular clips. In the next post, I will introduce other APIs and my library for the API.
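library(httr)
library(jsonlite)
# Replace these placeholders with your own Vine account credentials
USERNAME <- "your_email@example.com"
PASSWORD <- "your_password"
# Authenticate (note the double slash in "https://")
auth <- POST("https://api.vineapp.com/users/authenticate",
             body = list(username = USERNAME,
                         password = PASSWORD))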
Get the URLs of the 20 popular clips
library(httr)
library(jsonlite)
req <- GET("https://api.vineapp.com/timelines/popular")
res <- content(req, as="text")
parsed <- fromJSON(res)
# URLs of the 20 popular clips
parsed$data$records$permalinkUrl
## [1] "https://vine.co/v/MiAlnJALiYL" "https://vine.co/v/MiAZ6wO5HMV"
## [3] "https://vine.co/v/MiA7ePtrmDU" "https://vine.co/v/MibnqQeVmYU"
## [5] "https://vine.co/v/Miqrw79lbMi" "https://vine.co/v/MiqmmMhlKAT"
## [7] "https://vine.co/v/MiKLzrvjiiI" "https://vine.co/v/MiKE9YBp2pQ"
## [9] "https://vine.co/v/MiADHM3XvjE" "https://vine.co/v/MiAWO9xPgOh"
## [11] "https://vine.co/v/MiA6IpwBqi5" "https://vine.co/v/MiK3MU5VzFE"
## [13] "https://vine.co/v/MiA5Q9Y0uQX" "https://vine.co/v/MiAIEYbOL7r"
## [15] "https://vine.co/v/Miq6uKYQzD7" "https://vine.co/v/MiAmWPWQxD9"
## [17] "https://vine.co/v/MiKYAY07xM6" "https://vine.co/v/MiAWpjg20Vv"
## [19] "https://vine.co/v/MiAz2FAWq99" "https://vine.co/v/MiKVx3Kr1t0"
April 1, 2014
How do you check the task completion in R?
When you run your analysis in R, how do you spend your time while waiting? Do you watch the progress bar, or do something else? I have divided this into three cases by how long the analysis takes: very short, short, and long. For each case, I selected R packages and functions that suit it.
Case 1. Very short
This case covers tasks that take less than three minutes, short enough that you can bear to keep watching the screen. I think the txtProgressBar function is suitable for this case. With this function, you can visualize the progress.
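# txtProgressBar() defaults to min = 0 and max = 1, so pass i/10
pb <- txtProgressBar()
for (i in 1:10) {
  Sys.sleep(0.5)  # stands in for one step of the analysis
  setTxtProgressBar(pb, i/10)
}
close(pb)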
Added on 2014-04-02:
The tkProgressBar function of the tcltk package would also be helpful. Please check out the example of tkProgressBar. (Thanks to Mr. Hayashi!)
Case 2. Short
In this case, you have to wait 5 to 15 minutes for the results of the analysis. I suspect you would work on another task in the meantime, so a notification is needed. You can choose between two types of notification: a pop-up or a sound. To show a pop-up message in R, you can use the tcltk package as follows.
library(tcltk)
Sys.sleep(5)  # stands in for the analysis
tkmessageBox(message = "Finished!", icon = "info", type = "ok")
You can also use a sound notification with the pingr package. Its ping() function can alert you with 9 sounds. Besides the built-in sounds, you can play any WAV file.
# This package is not uploaded to CRAN
devtools::install_github("rasmusab/pingr")
library(pingr)
ping(5)
Case 3. Long
The final case is when the task takes a long time to complete. In such a case, you would probably be away from your display, so you will want to receive the notification on a mobile device, for example an iPhone, an Android phone, or a tablet. There are two ways: E-mail and push notifications from a mobile app.
You can send E-mail from R with the mailR package (it requires Java). If you use Gmail, the code looks like this.
library(mailR)
sender <- "sender@gmail.com"
recipients <- c("recipients@gmail.com")
# You need to get an application-specific password (see the link below)
# https://support.google.com/accounts/answer/185833
email <- send.mail(from = sender,
to = recipients,
subject="Subject of the email",
body = "Body of the email",
smtp = list(host.name = "smtp.gmail.com",
port = 465,
user.name = "yourname@gmail.com",
passwd = "your application specific password", ssl = TRUE),
authenticate = TRUE,
send = TRUE)
If you want to send an HTML mail, you can use the EasyHTMLReport package.
Push notifications are convenient as well. With the pushoverr package and the Pushover API, you can receive push notifications when the task is completed. Besides R, the Pushover API can be connected with various services such as IFTTT and GitHub (check here for other services).
After registering and acquiring your USER KEY and APP TOKEN, you can send a notification as follows.
library(pushoverr)
yourkey <- "XXXXXX"
yourtoken <- "XXXXXX"
pushover(message = "Mission complete!!!", user = yourkey, token = yourtoken)
To summarise, R offers various ways to notify you, depending on your needs. If you use other ways, please let me know.
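All three cases share one pattern: run the task, then get told when it ends. A small wrapper makes that reusable. notify_when_done() is a hypothetical helper of mine, not part of any package above; it accepts any notifier function (message() by default), so a tcltk pop-up, a pingr sound, or a mailR or pushoverr call could be plugged in. It uses on.exit() so that you are notified even when the task fails.

```r
# Hypothetical helper: evaluate `expr`, then call `notifier` with a short
# status message, whether the task finished or failed with an error.
notify_when_done <- function(expr, notifier = message) {
  start <- Sys.time()
  status <- "failed"
  on.exit({
    secs <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    notifier(sprintf("Task %s after %.1f seconds", status, secs))
  })
  result <- expr        # lazy evaluation: the task actually runs here
  status <- "finished"
  result
}

# Example: the default notifier prints a message; swap in a pop-up,
# a sound, an E-mail or a push notification instead
total <- notify_when_done({
  Sys.sleep(1)          # stands in for a long analysis
  sum(1:100)
})
print(total)
```

For a pop-up, for instance, you could pass notifier = function(m) tkmessageBox(message = m).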
March 31, 2014
Why don't you use %>% ?
One of the features of the dplyr package, which is well known as a very useful package for data manipulation, is the %.% operator. It is called the chain operator, and it chains operations together as follows.
library(dplyr)
iris %.% group_by(Species) %.% summarise(avg = mean(Sepal.Width))
## Source: local data frame [3 x 2]
##
## Species avg
## 1 setosa 3.428
## 2 versicolor 2.770
## 3 virginica 2.974
%.% works like the pipe operator in UNIX. It is simple but powerful: you no longer need to create temporary objects for intermediate results.
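To see what chaining saves you, compare a version with a temporary object to a chained one. This sketch uses base R only; %then% is a hypothetical toy operator defined for illustration, which simply passes its left-hand side to a one-argument function on the right. The real %.% and %>% are more capable: they insert the left-hand side as the first argument of an ordinary call such as group_by(Species).

```r
# With a temporary object: split Sepal.Width by species, then average each group
groups <- split(iris$Sepal.Width, iris$Species)
avg_tmp <- sapply(groups, mean)

# Toy pipe operator (hypothetical): x %then% f is just f(x)
`%then%` <- function(lhs, rhs) rhs(lhs)

# The same computation as a left-to-right chain, with no named intermediate object
avg_pipe <- iris %then%
  (function(d) split(d$Sepal.Width, d$Species)) %then%
  (function(g) sapply(g, mean))

print(avg_pipe)
```

The averages agree with the dplyr output shown above (3.428, 2.770, 2.974).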
If you like the F# style, you can use the %>% operator from the magrittr package. It has been announced that in the next version of dplyr, %.% will be deprecated and replaced with %>%. Please see the following thread for details.
https://groups.google.com/forum/#!msg/manipulatr/4EtIPVR3qEw/Xx4Vec7O0CQJ
library(magrittr)
iris %>% group_by(Species) %>% summarise(avg = mean(Sepal.Width))
## Source: local data frame [3 x 2]
##
## Species avg
## 1 setosa 3.428
## 2 versicolor 2.770
## 3 virginica 2.974
March 30, 2014
Try circlize package
Today I try the circlize package. This package enables us to make Circos plots. Circos plots are well known for describing transitions of population. In Japan, too, this kind of plot was used in a special TV program on the last election (below).
http://dl.dropboxusercontent.com/u/956851/test000002.jpg
The circlize package has a good vignette, so I highly recommend reading it.
In this post I try just one example. In future articles, I will describe how to plot in more detail.
library(circlize)
par(mar = c(1, 1, 1, 1))
factors = letters[1:8]
circos.par(points.overflow.warning = FALSE)
# initialize
circos.initialize(factors = factors, xlim = c(0, 10))
circos.trackPlotRegion(factors = factors, ylim = c(0, 1), bg.col = "grey",
bg.border = NA, track.height = 0.05)
# linking between elements
circos.link("a", 5, "c", 5)
circos.link("b", 5, "d", c(4, 6))
circos.link("a", c(2, 3), "f", c(4, 6))
# reset circos graphic parameters and internal variables before the next plot
circos.clear()