In this example, we emulate a simple 1d function. We use the function EMULATE.lm
to fit the mean function and the function EMULATE.gpstan
to emulate the residuals. Then we use stanLOOplot
for to produce a diagnostic plot, EMULATOR.gpstan
to predict new values of the simulator. Some comments in the end might help to improve the code.
suppressPackageStartupMessages(library(tidyverse))
knitr::opts_chunk$set(fig.path='figure/simple-emulator/', fig.height=4,
cache.path='_knitr_cache/simple-emulator/')
We define the following function to be the simulator that we want to emulate:
simulator = function(x) {
sin(2*x) * exp(-0.5 * x^2)
}
df_simulator = data_frame(x=seq(-5, 5, .01), y=simulator(x))
ggplot(df_simulator, aes(x=x, y=y)) + geom_line()
plot of chunk simulator-function
The preliminary step is that we need to identify the possible range of input x
. In our example, a reasonable range is [-3, 3]
. For emulation, we have to normalize our input x
to [-1, 1]
. We decided to choose 15 equidistant design points and evaluate our simulator at these design points in order to construct GP emulator. We also added a Noise
term to avoid overfitting our mean function component.
# define tData data frame
tData = data_frame(x = seq(-3, 3, length.out = 15),
y = simulator(x),
Noise = rnorm(15, 0, 0.4)) %>%
as.data.frame
# Normalize x to [-1, 1]
tData = mutate(tData, x = (x - min(x)) / (max(x) - min(x)) * 2 - 1)
# show and plot
head(tData)
## x y Noise
## 1 -1.0000000 0.003104026 -0.19480404
## 2 -0.8571429 0.033313747 -0.06518729
## 3 -0.7142857 0.091643625 -0.65654230
## 4 -0.5714286 0.065121613 -0.11329425
## 5 -0.4285714 -0.236184398 0.01501933
## 6 -0.2857143 -0.685451823 0.28742009
ggplot(tData, aes(x = x, y=y)) + geom_point()
plot of chunk design-points
The emulator code is currently located in the file BuildEmulator/BuildEmulator.R
.
old_wd = getwd()
setwd('..')
source('BuildEmulator/BuildEmulator.R')
## Loading required package: GenSA
## Loading required package: far
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## far library : Modelization for Functional AutoRegressive processes
## version 0.6-4 (2014-12-07)
## Loading required package: fields
## Loading required package: methods
## Loading required package: spam
## Loading required package: grid
## Spam version 1.4-0 (2016-08-29) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## Loading required package: maps
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
## Loading required package: lhs
## Loading required package: mco
## Loading required package: mvtnorm
## Loading required package: ncdf4
## Loading required package: parallel
## Loading required package: rstan
## Loading required package: StanHeaders
## rstan (Version 2.14.2, packaged: 2017-03-19 00:42:29 UTC, GitRev: 5fa1e80eb817)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
##
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: shape
## Loading required package: tensor
setwd(old_wd)
Firstly, we start with fitting the emulator mean function. The function EMULATE.lm
uses the stepwise regression to find the best regressors for the mean function. The terms we consider for addition to the mean function are linear, quadratic, cubic, and Fouriers transformations with intercations up to third order between all parameters. (For this simple example we do not need such complicated terms, but in more complicated settings we might.)
myem_lm = EMULATE.lm(Response='y', tData=tData, tcands='x', tcanfacs=NULL, maxOrder=2, maxdf = 2)
## [1] "No permitted terms improve the fit"
## [1] "No permitted terms improve the fit"
##
## Call:
## lm(formula = y ~ 1, data = tData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68964 -0.07838 0.00000 0.07838 0.68964
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.294e-18 9.832e-02 0 1
##
## Residual standard error: 0.3808 on 14 degrees of freedom
Secondly, we construct the Gaussian process emulator using the function EMULATE.gpstan
. We have to specify the nugget
term, which we set to 0.0001 for this particular example, to ensure numerical stability of the covariance matrix. Another important function argument is FastVersion
: When the fully Bayesian analysis is too expensive, it allows us to fix parameters at their posterior mode values.
myem_gp = EMULATE.gpstan(meanResponse=myem_lm, nugget=0.0001, tData=tData,
additionalVariables='x', FastVersion=TRUE)
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file35ab25580c70.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder2nd<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file35ab25580c70.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
## class binder2nd
## ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file35ab25580c70.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder1st<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file35ab25580c70.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
## class binder1st
## ^~~~~~~~~
The calls to EMULATE.lm
and EMULATE.gpstan
are two main steps in constructing Bayesian Gaussian process emulator.
To validate our GP emulator we produce Leave One Out plots against inputs. The function stanLOOplot
calculates leave-one-out posterior means at the design points:
tLOOs = stanLOOplot(StanEmulator = myem_gp, ParamNames='x')
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file3680522ad084.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder2nd<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file3680522ad084.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
## class binder2nd
## ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file3680522ad084.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder1st<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file3680522ad084.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
## class binder1st
## ^~~~~~~~~
##
## SAMPLING FOR MODEL 'PredictLOO' NOW (CHAIN 1).
##
## Chain 1, Iteration: 1 / 1 [100%] (Sampling)
## Elapsed Time: 3e-06 seconds (Warm-up)
## 0.298515 seconds (Sampling)
## 0.298518 seconds (Total)
plot of chunk plot-stan-loo
Leave One Out plot against the input data. The predictions and two standard deviation prediction intervals for the left out points are in black. The true values are in either green, if they are within two standard deviations of the prediction, or red otherwise.
The function EMULATOR.gpstan
allows to produce emulator predictions for unseen point. It returns a list with two vectors: Expectation
and Variance
at the prediction points. An important point is that FastVersion = FALSE
uses the posterior samples to generate predictions at unseen points, while FastVersion = TRUE
uses the parameters values fixed at the posterior mode.
new_data_x = data.frame(x = seq(-1, 1, len=200))
new_data_y = EMULATOR.gpstan(new_data_x, Emulator = myem_gp, FastVersion = FALSE)
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file383343cc8787.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:973:28: warning: ‘template<class _Operation> class std::binder2nd’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder2nd<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file383343cc8787.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:143:11: note: declared here
## class binder2nd
## ^~~~~~~~~
## In file included from /home/stefan/lib/R/RcppEigen/include/Eigen/Core:276:0,
## from /home/stefan/lib/R/RcppEigen/include/Eigen/Dense:1,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/prim/mat/fun/Eigen.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat/fun/Eigen_NumTraits.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/matrix_vari.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:14,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file383343cc8787.cpp:8:
## /home/stefan/lib/R/RcppEigen/include/Eigen/src/Core/Functors.h:977:28: warning: ‘template<class _Operation> class std::binder1st’ is deprecated [-Wdeprecated-declarations]
## struct functor_traits<std::binder1st<T> >
## ^~~~~~~~~
## In file included from /usr/include/c++/7.2.0/bits/stl_function.h:1127:0,
## from /usr/include/c++/7.2.0/string:48,
## from /usr/include/c++/7.2.0/bits/locale_classes.h:40,
## from /usr/include/c++/7.2.0/bits/ios_base.h:41,
## from /usr/include/c++/7.2.0/ios:42,
## from /usr/include/c++/7.2.0/istream:38,
## from /usr/include/c++/7.2.0/sstream:38,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/memory/stack_alloc.hpp:10,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core/autodiffstackstorage.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/core.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math/rev/mat.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/stan/math.hpp:4,
## from /home/stefan/lib/R/StanHeaders/include/src/stan/model/model_header.hpp:4,
## from file383343cc8787.cpp:8:
## /usr/include/c++/7.2.0/backward/binders.h:108:11: note: declared here
## class binder1st
## ^~~~~~~~~
##
## SAMPLING FOR MODEL 'PredictGen' NOW (CHAIN 1).
##
## Chain 1, Iteration: 1 / 1 [100%] (Sampling)
## Elapsed Time: 2e-06 seconds (Warm-up)
## 0.526429 seconds (Sampling)
## 0.526431 seconds (Total)
new_data = new_data_x %>%
bind_cols(as_data_frame(new_data_y)) %>%
mutate(lower= Expectation - 2 * sqrt(Variance),
upper= Expectation + 2 * sqrt(Variance))
ggplot(data=new_data, aes(x=x)) +
geom_ribbon(aes(ymin=lower, ymax=upper), fill='magenta') +
geom_point(aes(y=Expectation), cex=.8)
plot of chunk plot-predictions
Comments:
Noise
term do inEMULATE.lm
?EMULATOR.gpstan
could be a data frame instead of a list, including the input values?tLOOs
areupper quartile
andlower quartile
but the text description says “two standard deviation” – Which one is it? (also variable names shouldn’t contain spaces)ValidPlot
function because thex
values have to be provided unnormalised to thesimulator
function but in normalised form to the emulator.data_frame
s from thetidyverse
.