Showing posts with label ggplot2. Show all posts
Showing posts with label ggplot2. Show all posts

Wednesday, August 8, 2012

rebuild cdc obesity maps

you find a short script here

or you can download the images

Monday, June 4, 2012

another map style

ggmap

  • just another (more fancy) map style
  • for the preliminaries look here
map <- get_cloudmademap(bbox=c(left=12.25,bottom=51.285,
                        right=12.5,top=51.47),
                        maptype = 58461,
                        api_key = api_key,
                        zoom=13)
ggmap(map) + geom_point(aes(x=lon,y=lat),colour="red",data=gc) 

plot addresses as points on map using ggmap

ggmap

  • given addresses
  • find long and lat
  • get a map of the city
  • plot point at location of the addresses
library(ggmap)
library(stringr)
mystreets <- read.table("streets.txt",header=F,sep="|")
head(mystreets) ## addresses of physicians in Leipzig
                          V1
1  Rosa-Luxemburg-Str. 20-30
2            Marktstraße 2-6
3       Engelsdorfer Str. 21
4               Jahnallee 59
5            Johannisplatz 1
6         Hugo-Krone-Platz 9
## adds city and country to get complete addresses
ads <- paste(mystreets$V1, ', Leipzig, Germany',sep='')

## get rid of space at begin and end of the strings
ads <- str_trim(ads)


## get long and lat from google so you 
## need to have access to the internet
gc <- geocode(ads)

## get data of leipzig (limits of the map)
leipzig <- geocode("leipzig",output="more")


## get map from cloudmademap, also possible googlemaps etc
## for cloudemademap you need additionaly a api key
map <- get_cloudmademap(bbox=c(left=leipzig$west,
                          bottom=leipzig$south,
                          right=leipzig$east,
                          top=leipzig$north),
                        maptype = 997,api_key = api_key,zoom=13)

## plot all together
ggmap(map) + geom_point(aes(x=lon,y=lat),colour="red",data=gc)  

axis tick labels with comma as decimal point (german style)

german commas - commas as decimal point

  • how to format the axis tick label, as example we use formatting numbers in german style (comma as dec)
  • first we have to write a little function using the format function from the base package (which makes it very easy)
## the formatter function
germ_commas <- function(x,...){
  format(x, dec="," , trim = TRUE, scientific = FALSE, ...)
}

## create example data
x <- y <- rnorm(10)
z <- data.frame(x=x,y=y)

## create the plot using the function above inside the scales
ggplot(z,aes(x=x,y=y)) +
  geom_point() +
  scale_x_continuous(labels=germ_commas,breaks=seq(-2.5,2.5,by=0.25)) +
  scale_y_continuous(labels=germ_commas,breaks=seq(-2.5,2.5,by=0.25)) +
  opts(axis.text.x = theme_text(size=13,angle=90)) +
  opts(axis.text.y = theme_text(size=13))

Transformations axis vs values

log transformation: axis vs values

  • sometimes it is useful to transform the values
  • in ggplot you have the choice between transforming the values or the axis
  • here is an example to compare both
  • the data set movie is part of the ggplot package, additional used are the gridExtra package (grid.arrange), the scales package (labels formatter "dollar")
  • in the beginning a cut the year var in 15 even spaced intervals
  • on the lefthandside is the plot with the transformation of the axis, on the righthandside with transformation of values
movies$years <- cut(movies$year,breaks=15)

p <- ggplot(movies, aes(x=years,y=budget,colour=years)) +
## bar filling
  geom_boxplot(fill="black") + 
## add mean points
  stat_summary(fun.y="mean",geom="point") + 
 ## customize tick labels
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
## get rid of the legend
  opts(legend.position="none") 

## log transformation of the y axis
p1 <- p + scale_y_continuous(labels=dollar,trans="log")

## compare with
## here we do the transformation while mapping:
p2 <- ggplot(movies, aes(x=years,y=log(budget),colour=years)) +
  geom_boxplot(fill="black") +
  stat_summary(fun.y="mean",geom="point") +
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
  opts(legend.position="none")

## plot both graphs on one page
grid.arrange(p1,p2,ncol=2)

transformvalvsscale.png

Dotplot with facets

  • x contains variables riskgroup, aod (for age of death) and sex
ggplot(x,aes(x=riskgroup,y=aod,colour=riskgroup)) + 
  geom_dotplot() + 
  facet(~sex) +
  xlab("Risk Group") +
  ylab("Age of Death")

Boxplot in facets with different background colours

boxplots with facets and different background colours

  • x contains the vars sex, heightsds and riskgroup
  • rg contains a vars riskgroup, min (which is -inf in all cases) and max (which is inf in all cases); these define the backgroup colors for the different facets
p <- ggplot(x)
p +
  geom_boxplot(aes(x=sex,y=heightsds,color=sex),fill="grey",notch=T,alpha=1) +
  geom_rect(data=rg,aes(fill=riskgroup,xmin=min,ymin=min,xmax=max,ymax=max),alpha=0.15) +
  facet_wrap(~riskgroup) +
  scale_fill_manual(values=c("darkseagreen1","lightgreen","gold","firebrick")) +
  ylab("Adult Height - Height SDS") + xlab("Sex") +
  opts(panel.background=theme_rect(colour=NA),legend.position="none")

Boxplot with Means

boxplots as alternative to barplot with error bars

  • a more informative alternative to this barplot with errorbars
  • using the movies example data set (part of the ggplot2 package)
  • cut the year variable in 15 equal intervals gives the new variable years
  • map this new year var to x, budget to y and colour also to years (aes(x=years,y=budget,colour=years))
  • add a layer boxplot, set the fill colour for the bars to black
  • format the y axis (use dollar formatting)
  • change the x axis tick labels with a text theme: rotate the tick labels (angle=270), position adjustment with hjust and vjust
  • we do not need a legend for he colours because the information is contained in the x axis, so get rid of it: opts(legend.position="none")
movies$years <- cut(movies$year,breaks=15)
ggplot(movies, aes(x=years,y=budget,colour=years)) +
  geom_boxplot(fill="black") +
  scale_y_continuous(labels=dollar) +
  stat_summary(fun.y="mean",geom="point") +
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
  opts(legend.position="none")
Warnmeldungen:
1: Removed 53573 rows containing non-finite values (stat_boxplot). 
2: Removed 53573 rows containing missing values (stat_summary).

Sunday, June 3, 2012

Scatter Plot with Smoother

scatter plot with smoother

  • load data (part of the lme4 package)
data(sleepstudy)
head(sleepstudy)
  Reaction Days Subject
1 249.5600    0     308
2 258.7047    1     308
3 250.8006    2     308
4 321.4398    3     308
5 356.8519    4     308
6 414.6901    5     308
  • and build the graph using a linear model (method="lm" in the geom)
ggplot(sleepstudy,aes(x=Days,y=Reaction)) +
  geom_point() +
  geom_smooth(method="lm")

Simple Histogram

Simple Histogram

library(ggplot2) ## load the package
data(movies)     ## load the data
## build the plot
ggplot(movies,aes(x=rating)) + 
  geom_histogram(aes(y=..density..,fill=..density..))
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Barplot with Errorbars

I think boxplots are the better alternative
  • load the summarySE function
  • and cut the year variable to create reasonable categories
library(ggplot2)
data(movies)
source("helpers.r")
movies$years <- cut(movies$year,breaks=15)
table(movies$years)

(1893,1900] (1900,1908] (1908,1915] (1915,1923] (1923,1930] (1930,1938] 
         65         162         276         327         935        3093 
(1938,1945] (1945,1953] (1953,1960] (1960,1968] (1968,1975] (1975,1983] 
       3725        3357        4116        3702        5023        4554 
(1983,1990] (1990,1998] (1998,2005] 
       6775        8257       14421
  • extract the information
moviesSE <- summarySE(movies,measurevar="budget",groupvars="years",na.rm=T)
- a warning message because the one class without non-missing values in budget
Warnmeldung:
In qt(p, df, lower.tail, log.p) : NaNs wurden erzeugt
  • a look at the new created data frame containing the means, se, sd and ci of the budget variable according to the years classes
head(moviesSE)
        years   N    budget        sd       se        ci
1 (1893,1900]   0       NaN        NA       NA        NA
2 (1900,1908]   2   1125.00   1590.99  1125.00  14294.48
3 (1908,1915]  11  44023.73  63415.40 19120.46  42603.05
4 (1915,1923]  25 314215.76 298129.12 59625.82 123061.65
5 (1923,1930]  76 703895.24 761287.07 87325.62 173961.55
6 (1930,1938] 205 588688.42 553872.30 38684.12  76271.96
  • build the plot using this new data frame
p <- ggplot(moviesSE,aes(x=years,y=budget,colour=years)) 
p +
  geom_bar() +  ## add bars
  geom_errorbar(aes(ymin=budget-se,ymax=budget+se),width=0.5) + ## add errorbars
  opts(axis.text.x = theme_text(angle=315,hjust=0,vjust=1)) ## customize tick labels