Sunday, December 29, 2013

CitiBike share--what are the chances?

I have been working with Joe Jansen on the Citibike data in the R language. Citibike is New York's bike-sharing program, which started in May and currently has more than 80,000 annual members. R is a freely available, object-oriented programming language for statistics. It is an implementation of the S language, which was originally developed at Bell Labs.

Joe has downloaded all the data and done an extensive analysis, which you can find here. I did a simpler analysis. I predicted the number of trips with a statistical regression model and graphed the results with the ggplot2 package in R. I found maximum temperature, humidity, wind, and amount of sunshine to be significant factors in predicting the number of trips taken on any given day. Rain was not a significant factor, but it is likely confounded with sunshine: rain stops being a factor only after the amount of sunshine is accounted for. Also, keep in mind that many days with rain, especially in the summer, are mostly sunny days with an hour or two of rain or thunderstorms. Surprisingly, the day of the week was not an important factor in the number of trips. The R-squared was more than 70%. R-squared is a standard measure of predictive power on a scale from 0 to 100%, and it gives the share of the day-to-day variation in trips that the model explains.
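The analysis above was done in R. For readers who want to see what the regression and its R-squared boil down to, here is a minimal sketch in Python with NumPy. It assumes an ordinary least-squares fit with an intercept, which is what R's `lm()` does by default. The function name `fit_ols` is hypothetical, and the inputs stand in for one row per day of weather variables (maximum temperature, humidity, wind, sunshine) and one trip count per day.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefs, r_squared).

    X: one row per day, one column per predictor (e.g. max temperature,
       humidity, wind, hours of sunshine); hypothetical layout.
    y: trips for each day (e.g. per 1,000 members).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # allow a single predictor passed as a flat list
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    ss_res = np.sum((y - fitted) ** 2)    # variation the model leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return coefs, 1.0 - ss_res / ss_tot   # R-squared as a fraction (0.7 = 70%)
```

An R-squared above 0.7 from this function would correspond to the "more than 70%" reported above. With four weather predictors, `coefs` holds the intercept followed by one slope per predictor.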

Here is a graph of the results that shows the predicted number of trips per 1,000 members versus the actual number of trips.  The day of the week is indicated by the color of the point.
I am an amateur with ggplot, so the legend for day of the week lists the days in alphabetical order rather than Monday, Tuesday, and so on. Help with that, or with any other aspect of ggplot for this graph, would be welcome; please leave a comment.
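ggplot draws a discrete legend in the order of the column's factor levels. Character columns default to alphabetical levels, so the usual R fix is to convert the day column with `factor(day, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))` before plotting. The same idea is sketched below in Python: sort the day names by their position in an explicit weekday list instead of alphabetically. It also shows the per-1,000-members scaling. That scaling assumes the normalization is simply daily trips divided by annual members, times 1,000. The names `WEEKDAY_ORDER`, `legend_order`, and `trips_per_1000` are hypothetical.

```python
# Explicit calendar order; sorting by this avoids the default A-Z ordering.
WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def legend_order(day_names):
    """Return the distinct day names present, ordered Monday first."""
    rank = {day: i for i, day in enumerate(WEEKDAY_ORDER)}
    return sorted(set(day_names), key=rank.__getitem__)

def trips_per_1000(trips, members):
    """Scale one day's trip count to trips per 1,000 annual members (assumed definition)."""
    return trips / members * 1000
```

For example, `legend_order(["Sunday", "Friday", "Monday"])` returns `["Monday", "Friday", "Sunday"]`, whereas plain `sorted()` would put Friday first.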

If day of the week made a difference, then at any given point on the x-axis (predicted trips), points of one color would tend to sit higher on the y-axis than points of the other colors. For example, if more trips occurred on weekends, the green points (Saturday and Sunday) would cluster toward the top. However, no such effect seems to exist. I guess people are enjoying Citibike every day of the week, or casual riders on the weekends are roughly making up for the weekday commuters.
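The visual check described above can also be done numerically. The sketch below averages the residuals (actual minus predicted trips) for each day of the week. This is a simple stand-in for eyeballing the colors, not a formal test such as adding day of week to the regression. If day of the week has no effect beyond the weather variables, each day's average residual should be close to zero. A weekend effect would show up as clearly positive averages for Saturday and Sunday. It is written in Python, and the function name `mean_residual_by_day` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_residual_by_day(days, actual, predicted):
    """Average (actual - predicted) trips for each day of the week.

    Points above the line where actual equals predicted have positive
    residuals, so a day whose points sit systematically high on the graph
    gets a positive mean here.
    """
    residuals = defaultdict(list)
    for day, a, p in zip(days, actual, predicted):
        residuals[day].append(a - p)
    return {day: mean(r) for day, r in residuals.items()}
```

The function would be called with the day-of-week label, the actual trips, and the model's predicted trips for each day, for example from the fit in the regression sketch.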