Emulating the Canadian climate model CanAM4

In this example we use the well-known R-package DiceKriging to emulate output from the Canadian climate model CanAM4.

First we load the necessary packages, and the data file globalCanAM4W1.RData. The emulator inputs are saved in columns 2 to 14 of the object tData:

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(DiceKriging))

load('../Demonstrations/globalCanAM4W1.RData')

design = tData[, 2:14] %>% as_data_frame
design

## # A tibble: 60 x 13
##         FACAUT     UICEFAC          CT       CBMF      WEIGHT     DRNGAT
##  *       <dbl>       <dbl>       <dbl>      <dbl>       <dbl>      <dbl>
##  1 -0.11111111  0.21739130 -0.09090909 -0.2000000  0.44444444 -0.9000000
##  2  0.93964656  0.21272724  0.44617789 -0.6328893 -0.12804819  0.5548678
##  3 -0.49607751  0.49866598  0.66486451 -0.4184081 -0.57449420 -0.7147499
##  4 -0.01823092  0.87500017 -0.78226390  0.7481327  0.20387546  0.7659221
##  5  0.18724746 -0.24641582  0.37947947 -0.9198253 -0.03209664  0.8924434
##  6 -0.33415933 -0.04815128 -0.96533195 -0.2130182 -0.93268338 -0.2679695
##  7  0.61434413 -0.67240104  0.06517013  0.9333383 -0.83407115  0.2400356
##  8 -0.11223438  0.80259298  0.08011285  0.3629515  0.92100957  0.1179780
##  9 -0.85252199 -0.12957552  0.89944743  0.5471016 -0.51206544  0.4832247
## 10 -0.68412334 -0.81095030 -0.12232986  0.4093460  0.76273577 -0.4389870
## # ... with 50 more rows, and 7 more variables: RKXMIN <dbl>, XINGLE <dbl>,
## #   ALMC <dbl>, ALF <dbl>, TAUS1 <dbl>, CSIGMA <dbl>, CVSG <dbl>

DiceKriging emulator for BALX

Build the emulator

We build the emulator for the output variable BALX. We use the built-in R function step to choose the emulator mean function by stepwise regression. step returns an object of class lm, whose list element terms contains the R formula for the mean function. That formula is passed to the DiceKriging function km, along with the design matrix and the response, to fit the Gaussian process emulator. See ?km for options to control the emulator; here we only specify to use a Matern covariance matrix (with nu=5/2) and let all parameters be estimated automatically:

# define response
response = tData$BALX

# stepwise regression to fit trend model
em_lm = step(lm(response ~ ., data=design), 
             direction='both', trace=0)

# fit GP using DiceKriging function `km`
em_km = km(formula  = em_lm$terms, 
           design   = design, 
           response = response, 
           covtype  = 'matern5_2',
           control  = list(trace=FALSE))

The object em_km contains the emulator. We can use predict to evaluate the emulator at new inputs, for example at the mean of all input points:

pred = predict(em_km, newdata=map_df(design, mean), type='UK')
pred[c('mean', 'sd', 'lower95', 'upper95')] %>% as.data.frame

##       mean       sd  lower95  upper95
## 1 6.230499 1.142164 3.935236 8.525763

Validate emulator

The emulator can be validated by leave-one-out cross validation, using the DiceKriging function leaveOneOut.km:

em_km_loo = leaveOneOut.km(em_km, type='UK') %>% as_data_frame
em_km_loo

## # A tibble: 60 x 2
##          mean       sd
##         <dbl>    <dbl>
##  1 -1.0139507 1.731929
##  2  5.0540275 1.482308
##  3  0.4496219 1.619887
##  4  6.1947459 1.613111
##  5 -5.3986640 1.785493
##  6  6.6694612 1.610073
##  7 10.4919816 1.572071
##  8 -1.7387301 1.383712
##  9 -5.4089395 1.652246
## 10  7.7264413 1.680943
## # ... with 50 more rows

em_km_loo is a list with two vectors mean and sd of equal length, which is why it can be transformed to a data frame. To plot the cross validation results in a faceted panel plot with ggplot2, we have to set up the appropriate data frame with columns for the response, and lower and upper limit of the prediction interval. We also add a boolean variable inside and a corresponding colr variable to change the colour of the intervals if the response falls outside. The variable pit is the probability integral transform, which should be uniformly distributed if the emulator is reliable. At last we append the design matrix, and use gather to turn the data frame from wide into long form appropriate for facet plotting:

# dataframe for plotting
plot_df = em_km_loo %>% 
  mutate(response = response,
         index    = seq_along(response),
         lwr      = mean - 2 * sd, 
         upr      = mean + 2 * sd,
         pit      = pnorm(response, mean, sd),
         inside   = (response > lwr & response < upr),
         colr     = ifelse(inside, 'deepskyblue3', 'darkorange3')) %>%
  bind_cols(design) %>%
  gather('variable', 'value', names(design))

The following plots the emulator output and the actual value of the response over the value of the input in a separate panel for each input.

ggplot(plot_df, aes(x=value)) + facet_wrap( ~ variable, ncol=2) +
  geom_point(aes(y=mean), colour=plot_df$colr) + 
  geom_segment(aes(xend=value, y=lwr, yend=upr), colour=plot_df$colr) +
  geom_point(aes(y=response), cex=1.5)

The histogram over the PIT values should be approximately flat:

ggplot(plot_df, aes(x=pit)) + 
  geom_histogram(breaks=seq(0,1,0.2), colour='black', fill='deepskyblue3')

Repeat everyting with PCP as a response

Build emulator

response = tData$PCP
em_lm = step(lm(response ~ ., data=design), direction='both', trace=0)
em_km = km(formula=em_lm$terms, design=design, response=response, covtype='matern5_2', control=list(trace=FALSE))

Validate emulator

plot_df = leaveOneOut.km(em_km, type='UK') %>% as_data_frame %>% mutate(response = response, index = seq_along(response), lwr = mean - 2 * sd, upr = mean + 2 * sd, pit = pnorm(response, mean, sd), inside = (response > lwr & response < upr), colr = ifelse(inside, 'deepskyblue3', 'darkorange3')) %>% bind_cols(design) %>% gather('variable', 'value', names(design))

ggplot(plot_df, aes(x=value)) + facet_wrap(~variable, ncol=2) + geom_point(aes(y=mean), colour=plot_df$colr) + geom_segment(aes(xend=value, y=lwr, yend=upr), colour=plot_df$colr) + geom_point(aes(y=response), cex=1.5)

ggplot(plot_df, aes(x=pit)) + geom_histogram(breaks=seq(0,1,0.2), colour='black', fill='deepskyblue3')