Building a simple emulator

In this example, we emulate a simple 1d function. We use the function EMULATE.lm to fit the mean function and the function EMULATE.gpstan to emulate the residuals. Then we use stanLOOplot for to produce a diagnostic plot, EMULATOR.gpstan to predict new values of the simulator. Some comments in the end might help to improve the code.

suppressPackageStartupMessages(library(tidyverse))
knitr::opts_chunk$set(fig.path='figure/simple-emulator/', fig.height=4, 
                      cache.path='_knitr_cache/simple-emulator/')

The simulator

We define the following function to be the simulator that we want to emulate:

simulator = function(x) {
  sin(2*x) * exp(-0.5 * x^2)
}
df_simulator = data_frame(x=seq(-5, 5, .01), y=simulator(x))
ggplot(df_simulator, aes(x=x, y=y)) + geom_line()
plot of chunk simulator-function

plot of chunk simulator-function

The sample points

The preliminary step is that we need to identify the possible range of input x. In our example, a reasonable range is [-3, 3]. For emulation, we have to normalize our input x to [-1, 1]. We decided to choose 15 equidistant design points and evaluate our simulator at these design points in order to construct GP emulator. We also added a Noise term to avoid overfitting our mean function component.

# define tData data frame
tData = data_frame(x = seq(-3, 3, length.out = 15), 
                   y = simulator(x),
                   Noise = rnorm(15, 0, 0.4)) %>%
        as.data.frame

# Normalize x to [-1, 1]
tData = mutate(tData, x = (x - min(x)) / (max(x) - min(x)) * 2 - 1)

# show and plot
head(tData)
##            x            y       Noise
## 1 -1.0000000  0.003104026 -0.19480404
## 2 -0.8571429  0.033313747 -0.06518729
## 3 -0.7142857  0.091643625 -0.65654230
## 4 -0.5714286  0.065121613 -0.11329425
## 5 -0.4285714 -0.236184398  0.01501933
## 6 -0.2857143 -0.685451823  0.28742009
ggplot(tData, aes(x = x, y=y)) + geom_point()
plot of chunk design-points

plot of chunk design-points

Loading the code

The emulator code is currently located in the file BuildEmulator/BuildEmulator.R.

old_wd = getwd() 
setwd('..')
source('BuildEmulator/BuildEmulator.R')
## Loading required package: GenSA
## Loading required package: far
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## far library : Modelization for Functional AutoRegressive processes
## version 0.6-4 (2014-12-07)
## Loading required package: fields
## Loading required package: methods
## Loading required package: spam
## Loading required package: grid
## Spam version 1.4-0 (2016-08-29) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## Loading required package: maps
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
## Loading required package: lhs
## Loading required package: mco
## Loading required package: mvtnorm
## Loading required package: ncdf4
## Loading required package: parallel
## Loading required package: rstan
## Loading required package: StanHeaders
## rstan (Version 2.14.2, packaged: 2017-03-19 00:42:29 UTC, GitRev: 5fa1e80eb817)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
## 
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
## 
##     extract
## Loading required package: shape
## Loading required package: tensor
setwd(old_wd)

Building the emulator

Firstly, we start with fitting the emulator mean function. The function EMULATE.lm uses the stepwise regression to find the best regressors for the mean function. The terms we consider for addition to the mean function are linear, quadratic, cubic, and Fouriers transformations with intercations up to third order between all parameters. (For this simple example we do not need such complicated terms, but in more complicated settings we might.)

myem_lm = EMULATE.lm(Response='y', tData=tData, tcands='x', tcanfacs=NULL, maxOrder=2, maxdf = 2)
## [1] "No permitted terms improve the fit"
## [1] "No permitted terms improve the fit"
## 
## Call:
## lm(formula = y ~ 1, data = tData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68964 -0.07838  0.00000  0.07838  0.68964 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.294e-18  9.832e-02       0        1
## 
## Residual standard error: 0.3808 on 14 degrees of freedom

Secondly, we construct the Gaussian process emulator using the function EMULATE.gpstan. We have to specify the nugget term, which we set to 0.0001 for this particular example, to ensure numerical stability of the covariance matrix. Another important function argument is FastVersion: When the fully Bayesian analysis is too expensive, it allows us to fix parameters at their posterior mode values.

myem_gp = EMULATE.gpstan(meanResponse=myem_lm, nugget=0.0001, tData=tData, 
                         additionalVariables='x', FastVersion=TRUE)
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file35ab25580c70.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder2nd<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file35ab25580c70.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
##      class binder2nd
##            ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file35ab25580c70.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder1st<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file35ab25580c70.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
##      class binder1st
##            ^~~~~~~~~

The calls to EMULATE.lm and EMULATE.gpstan are two main steps in constructing Bayesian Gaussian process emulator.

Validation of the emulator: Leave-one-out analysis

To validate our GP emulator we produce Leave One Out plots against inputs. The function stanLOOplot calculates leave-one-out posterior means at the design points:

tLOOs = stanLOOplot(StanEmulator = myem_gp, ParamNames='x')
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file3680522ad084.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder2nd<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file3680522ad084.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
##      class binder2nd
##            ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file3680522ad084.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder1st<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file3680522ad084.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
##      class binder1st
##            ^~~~~~~~~
## 
## SAMPLING FOR MODEL 'PredictLOO' NOW (CHAIN 1).
## 
## Chain 1, Iteration: 1 / 1 [100%]  (Sampling)
##  Elapsed Time: 3e-06 seconds (Warm-up)
##                0.298515 seconds (Sampling)
##                0.298518 seconds (Total)
plot of chunk plot-stan-loo

plot of chunk plot-stan-loo

Leave One Out plot against the input data. The predictions and two standard deviation prediction intervals for the left out points are in black. The true values are in either green, if they are within two standard deviations of the prediction, or red otherwise.

Emulator predictions

The function EMULATOR.gpstan allows to produce emulator predictions for unseen point. It returns a list with two vectors: Expectation and Variance at the prediction points. An important point is that FastVersion = FALSE uses the posterior samples to generate predictions at unseen points, while FastVersion = TRUE uses the parameters values fixed at the posterior mode.

new_data_x = data.frame(x = seq(-1, 1, len=200))
new_data_y = EMULATOR.gpstan(new_data_x, Emulator = myem_gp, FastVersion = FALSE)
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file383343cc8787.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder2nd<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file383343cc8787.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
##      class binder2nd
##            ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
##                  from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file383343cc8787.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
##  struct functor_traits<std::binder1st<T> >
##                             ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
##                  from /usr/include/c++/7.2.0/string:48,
##                  from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
##                  from /usr/include/c++/7.2.0/bits/ios_base.h:41,
##                  from /usr/include/c++/7.2.0/ios:42,
##                  from /usr/include/c++/7.2.0/istream:38,
##                  from /usr/include/c++/7.2.0/sstream:38,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
##                  from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
##                  from file383343cc8787.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
##      class binder1st
##            ^~~~~~~~~
## 
## SAMPLING FOR MODEL 'PredictGen' NOW (CHAIN 1).
## 
## Chain 1, Iteration: 1 / 1 [100%]  (Sampling)
##  Elapsed Time: 2e-06 seconds (Warm-up)
##                0.526429 seconds (Sampling)
##                0.526431 seconds (Total)
new_data = new_data_x %>% 
  bind_cols(as_data_frame(new_data_y)) %>%
  mutate(lower= Expectation - 2 * sqrt(Variance),
         upper= Expectation + 2 * sqrt(Variance))
ggplot(data=new_data, aes(x=x)) + 
  geom_ribbon(aes(ymin=lower, ymax=upper), fill='magenta') +
  geom_point(aes(y=Expectation), cex=.8)
plot of chunk plot-predictions

plot of chunk plot-predictions

Comments:

  • What exactly does the Noise term do in EMULATE.lm?
  • Is normalisation of the inputs really necessary? Or would it be better to define the length scales for the covariance matrix of the GP?
  • Can the stan warnings be suppressed?
  • The return value of EMULATOR.gpstan could be a data frame instead of a list, including the input values?
  • The column names in tLOOs are upper quartile and lower quartile but the text description says “two standard deviation” – Which one is it? (also variable names shouldn’t contain spaces)
  • I had problems using the ValidPlot function because the x values have to be provided unnormalised to the simulator function but in normalised form to the emulator.
  • In general, the code could be consolidated by making the input/output data format more consistent and more predictable. I would suggest using data_frames from the tidyverse.