# Emulating the Canadian climate model CanAM4 ```{r, echo=FALSE} knitr::opts_chunk$set(fig.path='figure/emulate-canam4/', fig.height=6, fig.width=10, cache.path='_knitr_cache/emulate-canam4/') set.seed(14112017) ``` In this example we use the well-known R-package [`DiceKriging`](https://cran.r-project.org/web/packages/DiceKriging/index.html) to emulate output from the Canadian climate model CanAM4. First we load the necessary packages, and the data file `globalCanAM4W1.RData`. The emulator inputs are saved in columns 2 to 14 of the object `tData`: ```{r} suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(DiceKriging)) load('../Demonstrations/globalCanAM4W1.RData') design = tData[, 2:14] %>% as_data_frame design ``` # DiceKriging emulator for BALX ## Build the emulator We build the emulator for the output variable `BALX`. We use the built-in R function `step` to choose the emulator mean function by stepwise regression. `step` returns an object of class `lm`, whose list element `terms` contains the R formula for the mean function. That formula is passed to the `DiceKriging` function `km`, along with the design matrix and the response, to fit the Gaussian process emulator. See `?km` for options to control the emulator; here we only specify to use a Matern covariance matrix (with `nu=5/2`) and let all parameters be estimated automatically: ```{r emulate-balx} # define response response = tData$BALX # stepwise regression to fit trend model em_lm = step(lm(response ~ ., data=design), direction='both', trace=0) # fit GP using DiceKriging function `km` em_km = km(formula = em_lm$terms, design = design, response = response, covtype = 'matern5_2', control = list(trace=FALSE)) ``` The object `em_km` contains the emulator. We can use `predict` to evaluate the emulator at new inputs, for example at the mean of all input points: ```{r} pred = predict(em_km, newdata=map_df(design, mean), type='UK') pred[c('mean', 'sd', 'lower95', 'upper95')] %>% as.data.frame ``` ## Validate emulator The emulator can be validated by leave-one-out cross validation, using the `DiceKriging` function `leaveOneOut.km`: ```{r} em_km_loo = leaveOneOut.km(em_km, type='UK') %>% as_data_frame em_km_loo ``` `em_km_loo` is a list with two vectors `mean` and `sd` of equal length, which is why it can be transformed to a data frame. To plot the cross validation results in a faceted panel plot with `ggplot2`, we have to set up the appropriate data frame with columns for the response, and lower and upper limit of the prediction interval. We also add a boolean variable `inside` and a corresponding `colr` variable to change the colour of the intervals if the response falls outside. The variable `pit` is the probability integral transform, which should be uniformly distributed if the emulator is reliable. At last we append the design matrix, and use `gather` to turn the data frame from wide into long form appropriate for facet plotting: ```{r} # dataframe for plotting plot_df = em_km_loo %>% mutate(response = response, index = seq_along(response), lwr = mean - 2 * sd, upr = mean + 2 * sd, pit = pnorm(response, mean, sd), inside = (response > lwr & response < upr), colr = ifelse(inside, 'deepskyblue3', 'darkorange3')) %>% bind_cols(design) %>% gather('variable', 'value', names(design)) ``` The following plots the emulator output and the actual value of the response over the value of the input in a separate panel for each input. ```{r plot-balx-loo, fig.height=15} ggplot(plot_df, aes(x=value)) + facet_wrap( ~ variable, ncol=2) + geom_point(aes(y=mean), colour=plot_df$colr) + geom_segment(aes(xend=value, y=lwr, yend=upr), colour=plot_df$colr) + geom_point(aes(y=response), cex=1.5) ``` The histogram over the PIT values should be approximately flat: ```{r plot-balx-pit} ggplot(plot_df, aes(x=pit)) + geom_histogram(breaks=seq(0,1,0.2), colour='black', fill='deepskyblue3') ``` # Repeat everyting with PCP as a response ## Build emulator ```{r} response = tData$PCP em_lm = step(lm(response ~ ., data=design), direction='both', trace=0) em_km = km(formula=em_lm$terms, design=design, response=response, covtype='matern5_2', control=list(trace=FALSE)) ``` ## Validate emulator ```{r} plot_df = leaveOneOut.km(em_km, type='UK') %>% as_data_frame %>% mutate(response = response, index = seq_along(response), lwr = mean - 2 * sd, upr = mean + 2 * sd, pit = pnorm(response, mean, sd), inside = (response > lwr & response < upr), colr = ifelse(inside, 'deepskyblue3', 'darkorange3')) %>% bind_cols(design) %>% gather('variable', 'value', names(design)) ``` ```{r plot-pcp-loo, fig.height=15} ggplot(plot_df, aes(x=value)) + facet_wrap(~variable, ncol=2) + geom_point(aes(y=mean), colour=plot_df$colr) + geom_segment(aes(xend=value, y=lwr, yend=upr), colour=plot_df$colr) + geom_point(aes(y=response), cex=1.5) ``` ```{r plot-pcp-pit} ggplot(plot_df, aes(x=pit)) + geom_histogram(breaks=seq(0,1,0.2), colour='black', fill='deepskyblue3') ```