Showing posts with label opts axes. Show all posts

Monday, June 4, 2012

Boxplot with Means

boxplots as alternative to barplot with error bars

  • a more informative alternative to the barplot with error bars
  • uses the movies example data set (part of the ggplot2 package)
  • cutting the year variable into 15 equal-width intervals gives the new variable years
  • map this new variable to x and to colour, and budget to y (aes(x=years,y=budget,colour=years))
  • add a boxplot layer and set the fill colour of the boxes to black
  • format the y axis (use dollar formatting)
  • add the mean of each class as a point (stat_summary)
  • change the x axis tick labels with a text theme: rotate them (angle=270) and adjust their position with hjust and vjust
  • we do not need a legend for the colours because the information is already on the x axis, so get rid of it: opts(legend.position="none")
library(ggplot2)
library(scales)  ## provides the dollar formatter used below
movies$years <- cut(movies$year,breaks=15)
ggplot(movies, aes(x=years,y=budget,colour=years)) +
  geom_boxplot(fill="black") +
  scale_y_continuous(labels=dollar) +
  stat_summary(fun.y="mean",geom="point") +
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
  opts(legend.position="none")
Warning messages:
1: Removed 53573 rows containing non-finite values (stat_boxplot). 
2: Removed 53573 rows containing missing values (stat_summary).
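
The code above uses the ggplot2 API as it was in 2012. As a rough sketch of the same plot in a current ggplot2 (assuming version 3.3.0 or later, where theme() and element_text() replace opts() and theme_text(), and the fun argument replaces fun.y), something like the following should work. The wrapper name boxplot_with_means and its n_breaks argument are made up for this illustration. Rows with a missing budget are dropped before plotting, which avoids the two warnings shown above.

```r
library(ggplot2)
library(scales)   # dollar formatter for the y axis

# Hypothetical wrapper, not part of ggplot2.
# `df` needs a numeric `year` column and a numeric `budget` column.
boxplot_with_means <- function(df, n_breaks = 15) {
  # cut first, so the intervals are based on all years (as in the post)
  df$years <- cut(df$year, breaks = n_breaks)
  # then drop rows without a budget instead of letting the stats warn
  df <- df[!is.na(df$budget), ]

  ggplot(df, aes(x = years, y = budget, colour = years)) +
    geom_boxplot(fill = "black") +
    scale_y_continuous(labels = dollar) +
    stat_summary(fun = mean, geom = "point") +   # mean of each class
    theme(
      axis.text.x = element_text(angle = 270, hjust = 0, vjust = 1),
      legend.position = "none"
    )
}

# Usage, assuming the ggplot2movies package is installed
# (the movies data now lives there rather than in ggplot2 itself):
# boxplot_with_means(ggplot2movies::movies)
```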

Sunday, June 3, 2012

Barplot with Errorbars

I think boxplots are the better alternative
  • load the summarySE function
  • and cut the year variable to create reasonable categories
library(ggplot2)
data(movies)
source("helpers.r")
movies$years <- cut(movies$year,breaks=15)
table(movies$years)

(1893,1900] (1900,1908] (1908,1915] (1915,1923] (1923,1930] (1930,1938] 
         65         162         276         327         935        3093 
(1938,1945] (1945,1953] (1953,1960] (1960,1968] (1968,1975] (1975,1983] 
       3725        3357        4116        3702        5023        4554 
(1983,1990] (1990,1998] (1998,2005] 
       6775        8257       14421
  • extract the information
moviesSE <- summarySE(movies,measurevar="budget",groupvars="years",na.rm=TRUE)
  • this gives a warning message because one class, (1893,1900], has no non-missing values in budget
Warning message:
In qt(p, df, lower.tail, log.p) : NaNs produced
  • a look at the newly created data frame containing the mean, sd, se and ci of the budget variable for each years class
head(moviesSE)
        years   N    budget        sd       se        ci
1 (1893,1900]   0       NaN        NA       NA        NA
2 (1900,1908]   2   1125.00   1590.99  1125.00  14294.48
3 (1908,1915]  11  44023.73  63415.40 19120.46  42603.05
4 (1915,1923]  25 314215.76 298129.12 59625.82 123061.65
5 (1923,1930]  76 703895.24 761287.07 87325.62 173961.55
6 (1930,1938] 205 588688.42 553872.30 38684.12  76271.96
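
summarySE comes from helpers.r, which is not shown in this post. As a minimal base R sketch of how columns like these can be computed (the function name summary_se_sketch and its arguments are hypothetical, and this is not the code in helpers.r), assuming se is sd/sqrt(N) and ci is the half-width of a t-based 95% confidence interval:

```r
# Hypothetical stand-in for summarySE(); NOT the code from helpers.r.
summary_se_sketch <- function(data, measurevar, groupvar, conf = 0.95) {
  # one numeric vector per group; empty factor levels are kept
  groups <- split(data[[measurevar]], data[[groupvar]])
  rows <- lapply(names(groups), function(g) {
    x <- groups[[g]]
    x <- x[!is.na(x)]                    # same effect as na.rm = TRUE
    n <- length(x)
    m <- if (n > 0) mean(x) else NaN     # no data: mean is undefined
    s <- if (n > 1) sd(x) else NA_real_  # sd needs at least two values
    se <- s / sqrt(n)                    # standard error of the mean
    # half-width of the t-based confidence interval, df = n - 1
    ci <- if (n > 1) se * qt(1 - (1 - conf) / 2, df = n - 1) else NA_real_
    data.frame(group = g, N = n, mean = m, sd = s, se = se, ci = ci)
  })
  do.call(rbind, rows)
}
```

These formulas are consistent with the output printed above: for the class (1900,1908], 1590.99/sqrt(2) gives the se of 1125.00, and 1125.00 times qt(0.975, 1) gives the ci of about 14294.48. The sketch guards the N < 2 case explicitly, so it does not call qt() with invalid degrees of freedom.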
  • build the plot using this new data frame
p <- ggplot(moviesSE,aes(x=years,y=budget,colour=years)) 
p +
  geom_bar(stat="identity") +  ## add bars; heights are taken from the budget column
  geom_errorbar(aes(ymin=budget-se,ymax=budget+se),width=0.5) + ## add errorbars
  opts(axis.text.x = theme_text(angle=315,hjust=0,vjust=1)) ## customize tick labels
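
For readers on a current ggplot2, here is a hedged sketch of the same barplot. It assumes ggplot2 2.2.0 or later, where geom_col() draws bars whose heights come from the data, and uses theme()/element_text() in place of opts()/theme_text(). To keep it self-contained, the data frame is typed in from rows 2 to 6 of the head(moviesSE) output above; the name moviesSE_head is made up for this example.

```r
library(ggplot2)

# Rows 2-6 of head(moviesSE) as printed above
# (row 1 is left out because it has no budget data).
year_classes <- c("(1900,1908]", "(1908,1915]", "(1915,1923]",
                  "(1923,1930]", "(1930,1938]")
moviesSE_head <- data.frame(
  years  = factor(year_classes, levels = year_classes),
  budget = c(1125.00, 44023.73, 314215.76, 703895.24, 588688.42),
  se     = c(1125.00, 19120.46, 59625.82, 87325.62, 38684.12)
)

p <- ggplot(moviesSE_head, aes(x = years, y = budget, colour = years)) +
  geom_col() +                                    # bar height = budget
  geom_errorbar(aes(ymin = budget - se, ymax = budget + se),
                width = 0.5) +                    # mean +/- one se
  theme(axis.text.x = element_text(angle = 315, hjust = 0, vjust = 1))

p
```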