Wednesday, August 8, 2012

rebuild cdc obesity maps

you find a short script here

or you can download the images

Saturday, June 16, 2012

Visualizing fMRI

Analyze

  • first read data (download from R for medicine and biology companion website: http://www.jblearning.com/catalog/9780763758080/)
  • extract one point in time
  • here you see two snapshots from the resulting 3d object (opengl has to be installed)
  • plot using rgl
library("oro.dicom")
library(rgl)
library("misc3d")
library("AnalyzeFMRI")
x <- f.read.analyze.volume("finger.img")
volume <- x[ , , , 1]
contour3d(volume, 1:64, 1:64,1.5*(1:25), lev=c(500,700,800),alpha=c(0.2,0.5,1),color=c("white","red","green"))

Monday, June 4, 2012

another map style

ggmap

  • just another (more fancy) map style
  • for the preliminaries look here
map <- get_cloudmademap(bbox=c(left=12.25,bottom=51.285,
                        right=12.5,top=51.47),
                        maptype = 58461,
                        api_key = api_key,
                        zoom=13)
ggmap(map) + geom_point(aes(x=lon,y=lat),colour="red",data=gc) 

plot addresses as points on map using ggmap

ggmap

  • given addresses
  • find long and lat
  • get a map of the city
  • plot point at location of the addresses
library(ggmap)
library(stringr)
mystreets <- read.table("streets.txt",header=F,sep="|")
head(mystreets) ## addresses of physicians in Leipzig
                          V1
1  Rosa-Luxemburg-Str. 20-30
2            Marktstraße 2-6
3       Engelsdorfer Str. 21
4               Jahnallee 59
5            Johannisplatz 1
6         Hugo-Krone-Platz 9
## adds city and country to get complete addresses
ads <- paste(mystreets$V1, ', Leipzig, Germany',sep='')

## get rid of space at begin and end of the strings
ads <- str_trim(ads)


## get long and lat from google so you 
## need to have access to the internet
gc <- geocode(ads)

## get data of leipzig (limits of the map)
leipzig <- geocode("leipzig",output="more")


## get map from cloudmademap, also possible googlemaps etc
## for cloudemademap you need additionaly a api key
map <- get_cloudmademap(bbox=c(left=leipzig$west,
                          bottom=leipzig$south,
                          right=leipzig$east,
                          top=leipzig$north),
                        maptype = 997,api_key = api_key,zoom=13)

## plot all together
ggmap(map) + geom_point(aes(x=lon,y=lat),colour="red",data=gc)  

axis tick labels with comma as decimal point (german style)

german commas - commas as decimal point

  • how to format the axis tick label, as example we use formatting numbers in german style (comma as dec)
  • first we have to write a little function using the format function from the base package (which makes it very easy)
## the formatter function
germ_commas <- function(x,...){
  format(x, dec="," , trim = TRUE, scientific = FALSE, ...)
}

## create example data
x <- y <- rnorm(10)
z <- data.frame(x=x,y=y)

## create the plot using the function above inside the scales
ggplot(z,aes(x=x,y=y)) +
  geom_point() +
  scale_x_continuous(labels=germ_commas,breaks=seq(-2.5,2.5,by=0.25)) +
  scale_y_continuous(labels=germ_commas,breaks=seq(-2.5,2.5,by=0.25)) +
  opts(axis.text.x = theme_text(size=13,angle=90)) +
  opts(axis.text.y = theme_text(size=13))

Transformations axis vs values

log transformation: axis vs values

  • sometimes it is useful to transform the values
  • in ggplot you have the choice between transforming the values or the axis
  • here is an example to compare both
  • the data set movie is part of the ggplot package, additional used are the gridExtra package (grid.arrange), the scales package (labels formatter "dollar")
  • in the beginning a cut the year var in 15 even spaced intervals
  • on the lefthandside is the plot with the transformation of the axis, on the righthandside with transformation of values
movies$years <- cut(movies$year,breaks=15)

p <- ggplot(movies, aes(x=years,y=budget,colour=years)) +
## bar filling
  geom_boxplot(fill="black") + 
## add mean points
  stat_summary(fun.y="mean",geom="point") + 
 ## customize tick labels
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
## get rid of the legend
  opts(legend.position="none") 

## log transformation of the y axis
p1 <- p + scale_y_continuous(labels=dollar,trans="log")

## compare with
## here we do the transformation while mapping:
p2 <- ggplot(movies, aes(x=years,y=log(budget),colour=years)) +
  geom_boxplot(fill="black") +
  stat_summary(fun.y="mean",geom="point") +
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
  opts(legend.position="none")

## plot both graphs on one page
grid.arrange(p1,p2,ncol=2)

transformvalvsscale.png

Dotplot with facets

  • x contains variables riskgroup, aod (for age of death) and sex
ggplot(x,aes(x=riskgroup,y=aod,colour=riskgroup)) + 
  geom_dotplot() + 
  facet(~sex) +
  xlab("Risk Group") +
  ylab("Age of Death")

Boxplot in facets with different background colours

boxplots with facets and different background colours

  • x contains the vars sex, heightsds and riskgroup
  • rg contains a vars riskgroup, min (which is -inf in all cases) and max (which is inf in all cases); these define the backgroup colors for the different facets
p <- ggplot(x)
p +
  geom_boxplot(aes(x=sex,y=heightsds,color=sex),fill="grey",notch=T,alpha=1) +
  geom_rect(data=rg,aes(fill=riskgroup,xmin=min,ymin=min,xmax=max,ymax=max),alpha=0.15) +
  facet_wrap(~riskgroup) +
  scale_fill_manual(values=c("darkseagreen1","lightgreen","gold","firebrick")) +
  ylab("Adult Height - Height SDS") + xlab("Sex") +
  opts(panel.background=theme_rect(colour=NA),legend.position="none")

Lexis Plot

boxplots as alternative to barplot with error bars

  • thX is a Lexis object (Epi package)
  • in the dead variable is coded wether the subjects died or not
  • in the injstat variable is coded which status the subject had when it entered a specific period (defined by age and time)
  • in lex.Xst the exit status is coded
## plot lines for all subjects in a light colour
plot(thX,2:1,col=c("azure2","red")[factor(thX$dead)],xlab="year")
## for the dead subjects we set the line width 2, for all other subject 0 so we do not see them
## we set color for the treatment time red and time after treatment blue
lines(thX,2:1,col=c("red","blue")[factor(thX$injstat)],
      lwd=c(0,2)[factor(thX$dead)])

## we add big white crosses at the point of dead
points(thX,2:1,pch=c(NA,NA,NA,3)[factor(thX$lex.Xst)],
       col="white",lwd=3,cex=1.2)
## within these big white crosses we plot coloured crosses which indicates the risk group of the subjects
points(thX,2:1,pch=c(NA,NA,NA,3)[factor(thX$lex.Xst)],
       col=c("green","green","blue","red")[factor(thX$riskgroup)],lwd=1,cex=1)

## add a legend
text(rep(2016,2),c(27.5,25.5),c("treatment","after treatment"),
     col=c("red","blue"),pos=4)

points(rep(2015,3),c(34.5,32.5,30.5),
       pch=3,lwd=3,cex=1.2,col="lightgrey")
points(rep(2015,3),c(34.5,32.5,30.5),pch=3,
       lwd=1,cex=1,col=c("green","blue","red"))
text(rep(2016,3),c(34.5,32.5,30.5),
     c("riskgroup 1a, 1b","riskgroup 2","riskgroup 3"),
     col=c("green","blue","red"),pos=4)

Boxplot with Means

boxplots as alternative to barplot with error bars

  • a more informative alternative to this barplot with errorbars
  • using the movies example data set (part of the ggplot2 package)
  • cut the year variable in 15 equal intervals gives the new variable years
  • map this new year var to x, budget to y and colour also to years (aes(x=years,y=budget,colour=years))
  • add a layer boxplot, set the fill colour for the bars to black
  • format the y axis (use dollar formatting)
  • change the x axis tick labels with a text theme: rotate the tick labels (angle=270), position adjustment with hjust and vjust
  • we do not need a legend for he colours because the information is contained in the x axis, so get rid of it: opts(legend.position="none")
movies$years <- cut(movies$year,breaks=15)
ggplot(movies, aes(x=years,y=budget,colour=years)) +
  geom_boxplot(fill="black") +
  scale_y_continuous(labels=dollar) +
  stat_summary(fun.y="mean",geom="point") +
  opts(axis.text.x = theme_text(angle=270,hjust=0,vjust=1)) +
  opts(legend.position="none")
Warnmeldungen:
1: Removed 53573 rows containing non-finite values (stat_boxplot). 
2: Removed 53573 rows containing missing values (stat_summary).

Diagnostic Plot no1: Random Example: non constant variances

diagnostic plot 1: non constant variance

  • example how a the diagnostic plot no 1 looks if the error variance is not constant (strong non-constant variance)
par(mfrow=c(3,3)) ## 9 plots on one page
x <- runif(100,0,20)
for(i in 1:9){
  y <- x + x * rnorm(100)
  m <- lm(y ~ x)
  plot(m,which=1)
}

Diagnost Plot No. One, Random Examples

diagnostic plot no1: normal distributed errors

  • example how a the diagnostic plot 1 (residuals vs. fitted) should look like: 9 random number examples
par(mfrow=c(3,3),mar=c(3,2,2,1)) ## 9 plots on one page
x <- runif(100,0,20)
for(i in 1:9){
  y <- x + rnorm(100)
  m <- lm(y ~ x)
  plot(m,which=1)
}

Sunday, June 3, 2012

Scatter Plot with Smoother

scatter plot with smoother

  • load data (part of the lme4 package)
data(sleepstudy)
head(sleepstudy)
  Reaction Days Subject
1 249.5600    0     308
2 258.7047    1     308
3 250.8006    2     308
4 321.4398    3     308
5 356.8519    4     308
6 414.6901    5     308
  • and build the graph using a linear model (method="lm" in the geom)
ggplot(sleepstudy,aes(x=Days,y=Reaction)) +
  geom_point() +
  geom_smooth(method="lm")

Simple Histogram

Simple Histogram

library(ggplot2) ## load the package
data(movies)     ## load the data
## build the plot
ggplot(movies,aes(x=rating)) + 
  geom_histogram(aes(y=..density..,fill=..density..))
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Barplot with Errorbars

I think boxplots are the better alternative
  • load the summarySE function
  • and cut the year variable to create reasonable categories
library(ggplot2)
data(movies)
source("helpers.r")
movies$years <- cut(movies$year,breaks=15)
table(movies$years)

(1893,1900] (1900,1908] (1908,1915] (1915,1923] (1923,1930] (1930,1938] 
         65         162         276         327         935        3093 
(1938,1945] (1945,1953] (1953,1960] (1960,1968] (1968,1975] (1975,1983] 
       3725        3357        4116        3702        5023        4554 
(1983,1990] (1990,1998] (1998,2005] 
       6775        8257       14421
  • extract the information
moviesSE <- summarySE(movies,measurevar="budget",groupvars="years",na.rm=T)
- a warning message because the one class without non-missing values in budget
Warnmeldung:
In qt(p, df, lower.tail, log.p) : NaNs wurden erzeugt
  • a look at the new created data frame containing the means, se, sd and ci of the budget variable according to the years classes
head(moviesSE)
        years   N    budget        sd       se        ci
1 (1893,1900]   0       NaN        NA       NA        NA
2 (1900,1908]   2   1125.00   1590.99  1125.00  14294.48
3 (1908,1915]  11  44023.73  63415.40 19120.46  42603.05
4 (1915,1923]  25 314215.76 298129.12 59625.82 123061.65
5 (1923,1930]  76 703895.24 761287.07 87325.62 173961.55
6 (1930,1938] 205 588688.42 553872.30 38684.12  76271.96
  • build the plot using this new data frame
p <- ggplot(moviesSE,aes(x=years,y=budget,colour=years)) 
p +
  geom_bar() +  ## add bars
  geom_errorbar(aes(ymin=budget-se,ymax=budget+se),width=0.5) + ## add errorbars
  opts(axis.text.x = theme_text(angle=315,hjust=0,vjust=1)) ## customize tick labels 

Saturday, June 2, 2012

Filled Countour 1

  • if you have a matrix with nxm dimensions and every entry represents a value of interest and the indices of the entry describe the position, then you can use filled.contour() to visualize the matrix
  • first we create a simple matrix to demonstrate how it works:
x <- y <- 1:10 # create two vectors with the integers from 1 to 10
z <- outer(x,y) # create a matrix as the outer product of the two vectors
z # show the matrix
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    2    3    4    5    6    7    8    9    10
 [2,]    2    4    6    8   10   12   14   16   18    20
 [3,]    3    6    9   12   15   18   21   24   27    30
 [4,]    4    8   12   16   20   24   28   32   36    40
 [5,]    5   10   15   20   25   30   35   40   45    50
 [6,]    6   12   18   24   30   36   42   48   54    60
 [7,]    7   14   21   28   35   42   49   56   63    70
 [8,]    8   16   24   32   40   48   56   64   72    80
 [9,]    9   18   27   36   45   54   63   72   81    90
[10,]   10   20   30   40   50   60   70   80   90   100
  • now let us plot the matrix:
filled.contour(z)