# Copyright (C) 2024 CNRS, Météo-France, Sorbonne Université, Exeter Univ.
#
# htexplo is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# htexplo is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with htexplo. If not, see <http://www.gnu.org/licenses/>.

# 15/10/2024: Now distributed under GPL-v3 license

September 2020: first open-source version
= = = = = = = = = = = = = = = = = = = = = 
Released alongside the submission of two reference papers:
Process-based climate model development harnessing machine learning: I. a calibration tool for parameterization improvement, Couvreux et al., in revision for JAMES
Process-based climate model development harnessing machine learning: II. model calibration from single column to global, Hourdin et al., in revision for JAMES


                          - svn -

Maintained under Subversion (svn).
The files ignored by svn are listed in .ignorelist, which can be modified.
After modifying it, run
svn propset svn:ignore -F .ignorelist .
svn commit . .ignorelist


                         - content -

Code in the src directory:
= = = = = = = =  = = = = =
.ignorelist: list of files ignored by svn
compute_metrics_csv.sh             htune_EOF.R
expe_setup.R                       htune_metric.R
extract_onemetric_csv.sh           htune_netcdf2csvMetrics.R
htune_case_setup.R                 htune_plot.R
htune_convertDesign.R              htune_test_plot.R
htune_convert.R                    kLHC.R
htune_csv2Rdata.R                  param2R.sh
htune_EmulatingMultiMetric.R       param2Rwave.sh
post_scores.sh  post_plots.sh scatter_plot.py

Successive versions of the main R script:
htune_EmulatingMultiMetric.R
htune_Emulating_Multi_Metric.R
htune_Emulating_Multi_Metric_Multi_LHS.R
htune_Emulating_Multi_Metric_Multi_LHS_new.R


               - Description -

BEFORE STARTING: 

The model will be installed one level higher in the directory tree.
For example, if you are in DIR1/Hightune, the models will be installed in DIR1.
As long as you keep working from the DIR1 directory, the model therefore does
not need to be reinstalled.

To run other models, MUSC must be installed on your machine: http://confluence/pages/viewpage.action?pageId=248758682

Main programs :
===============

1. bench.sh is the script that launches an htexplo experiment: it automatically
   runs the SCM simulations and evaluates the metrics on them.
   The bench can be used with any of the supported models by running bench.sh
   with the option -model $MODEL, where MODEL is one of LMDZ, AROME,
   ARPCLIMAT, ARPEGE or ECRAD.
   To see all the bench.sh options: ./bench.sh --help

2. setup.sh installs all the packages needed to run htexplo. You can start
   by running it without arguments; it also runs a small example (in
   WORK/EXEMPLE). It is also run automatically by bench.sh.

3. exemple.sh (in the src directory) runs the waves one by one without
   managing SCM simulations. It is another way of launching htexplo when no
   SCM simulations need to be run.


This bench runs the following steps:
= = =  = = = = =  = = = =  = = = = =
Step 0 : Experiment setup and controls
------------------------------------------------------------------
   For the first wave, sets up the experiment by running
   setup.sh $MODEL $workdir and setup_$MODEL.sh.
   Also handles the continuation of an existing experiment.
   Before starting the experiment, you can change some model-dependent
   settings in models/$MODEL/setup_$MODEL.sh (typically the model version).

Step 1 : Parameter definition and generation of parametric ensemble
-------------------------------------------------------------------
   param2R.sh: defines the list of parameters and their ranges and
             creates the R script ModelParam.R

   Usage: ./param2R.sh LHCSIZE NLHC PARAM_FILE
   Ex: ./param2R.sh 30 3 LMDZ/param_cld
   (for a second wave, use param2Rwave.sh instead)
   NLHC: if NLHC=1, the maximin LHS of size LHCSIZE is generated.
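   As an illustration of what a maximin Latin hypercube is, here is a minimal
   Python sketch. It is not the R code behind param2R.sh (which relies on
   kLHC.R): it draws random Latin hypercubes in [0, 1]^d, where each of the
   LHCSIZE strata of every parameter holds exactly one point, and keeps the one
   whose smallest pairwise distance is largest. The random-search strategy and
   all function names are assumptions made for the example.

```python
import itertools
import math
import random

def random_lhs(n, d, rng):
    """One Latin hypercube of n points in [0, 1]^d: along every dimension,
    each of the n equal-width strata contains exactly one point."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

def min_distance(points):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def maximin_lhs(n, d, tries=200, seed=0):
    """Random search: keep the candidate LHS with the largest minimum distance."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(tries):
        candidate = random_lhs(n, d, rng)
        score = min_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# LHCSIZE = 30 points for a hypothetical 4-parameter problem
design = maximin_lhs(30, 4)
```

   Dedicated optimizers (such as the ones used in the R tooling) search the
   design space more efficiently than this brute-force loop, but the selection
   criterion is the same.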

   htune_convertDesign.R (run automatically by param2R.sh from version 9 on)
   creates the design for the emulator using ModelParam.R.
   Outputs: Par1D_Wave1.asc containing the parameter values
                 for the SCM simulations
             Wave1.RData containing the normalized parameter values for
                 the SCM
   It calls kLHC.R and htune_convert.R:
   kLHC.R produces the k-extended Latin hypercube sampling;
   htune_convert.R contains the functions that convert the parameter values
   from normalized to non-normalized values and vice versa.
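   The normalized/non-normalized conversion can be pictured with the Python
   sketch below. It assumes a linear map of each parameter range [vmin, vmax]
   onto [-1, 1]; the actual interval, and any non-linear treatment of some
   parameters, should be checked in htune_convert.R. The function names are
   hypothetical.

```python
def to_normalized(value, vmin, vmax):
    """Linearly map a physical parameter value in [vmin, vmax] onto [-1, 1]."""
    return 2.0 * (value - vmin) / (vmax - vmin) - 1.0

def to_physical(x, vmin, vmax):
    """Inverse map: a normalized value x in [-1, 1] back to [vmin, vmax]."""
    return vmin + (x + 1.0) / 2.0 * (vmax - vmin)

# Hypothetical parameter with range [0.5, 2.0]
print(to_normalized(1.25, 0.5, 2.0))  # centre of the range -> 0.0
print(to_physical(-1.0, 0.5, 2.0))    # lower bound -> 0.5
```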

Step 2 : serie_[MODEL].sh 
-------------------------
   The scripts for each model are in the corresponding MODEL directory.
   This is the only model-dependent script.
   Usage: ./serie_$MODEL.sh $opt_model $cas $NWAVE
   - creates the def files from Par1D_Wave1.asc, stored in WAVE$NWAVE/DEF,
             and/or the ECRAD namelists, stored in WAVE$NWAVE/NAMECRAD
   - automatically runs the corresponding simulations,
             stored in WAVE$NWAVE/CAS/SUBCASE
             and named SCM-$NWAVE-$NSIMU.nc
             (or RAD$HOUR-$NWAVE-$NSIMU.nc for ECRAD simulations)
   - converts the SCM outputs to the DEPHY format
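   To make the first bullet concrete, the following Python sketch turns a
   parameter design table into one KEY=value def text per simulation and can
   write them under WAVE$NWAVE/DEF. The layout assumed for Par1D_Wave1.asc (a
   header line with a simulation-name column followed by parameter names, then
   one whitespace-separated row per run), the .def file naming, the parameter
   names and the function names are all assumptions; the real serie_$MODEL.sh
   scripts are model-specific.

```python
from pathlib import Path

def read_design(text):
    """Parse a whitespace-separated design table (assumed layout):
    first line = simulation-name column + parameter names, then one row per run."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, runs = rows[0], rows[1:]
    return [(run[0], dict(zip(header[1:], map(float, run[1:])))) for run in runs]

def render_def(params):
    """Render one simulation's parameters as KEY=value lines."""
    return "".join(f"{name}={value}\n" for name, value in params.items())

def write_def_files(design, nwave, root="."):
    """Write WAVE<nwave>/DEF/<simulation>.def for every run (hypothetical naming)."""
    outdir = Path(root) / f"WAVE{nwave}" / "DEF"
    outdir.mkdir(parents=True, exist_ok=True)
    for sim, params in design:
        (outdir / f"{sim}.def").write_text(render_def(params))
    return outdir

design = read_design("""
SIMU       PARAM_A  PARAM_B
SCM-1-001  0.5      12
SCM-1-002  1.5      8
""")
print(render_def(design[0][1]), end="")  # PARAM_A=0.5 and PARAM_B=12.0 on two lines
```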

Step 3 : Compute Metrics and convert to Rdata
---------------------------------------------
   Use compute_metrics.sh $metrics -wave $nwave, which
   - calls get_one_metric_target_and_var.sh $metric_name
     to evaluate the target and the tolerance to error
     (you can also provide a csv file with metrics and tolerances)

   - and calls get_one_metric_from_file.sh $fil $short_metric_name to
     compute the metrics on the SCM simulations
   NB: metric names follow the syntax:
   - for atmospheric metrics: CASE_SUBCASE_METRICS_T1_T2, with T1 and T2
     the initial and final time steps of the time average
   - for radiative metrics: RAD_CASE_SUBCASETIME_METRICS_SZA1_SZA2
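   The naming convention above can be checked with a small Python parser. It
   assumes that no field contains an underscore (hyphens, as in
   zav-400-600-theta, are fine) and only splits atmospheric names; for
   radiative names it just tests the RAD_ prefix, since SUBCASETIME cannot be
   split without knowing the subcase names. The function and type names are
   hypothetical.

```python
from typing import NamedTuple

class AtmMetric(NamedTuple):
    case: str      # e.g. ARMCU
    subcase: str   # e.g. REF
    metric: str    # e.g. zav-400-600-theta
    t1: int        # initial time step of the time average
    t2: int        # final time step of the time average

def is_radiative(name: str) -> bool:
    """Radiative metrics follow RAD_CASE_SUBCASETIME_METRICS_SZA1_SZA2."""
    return name.startswith("RAD_")

def parse_atm_metric(name: str) -> AtmMetric:
    """Split CASE_SUBCASE_METRICS_T1_T2 into its five fields."""
    if is_radiative(name):
        raise ValueError(f"{name!r} is a radiative metric")
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected CASE_SUBCASE_METRICS_T1_T2, got {name!r}")
    case, subcase, metric, t1, t2 = fields
    return AtmMetric(case, subcase, metric, int(t1), int(t2))

print(parse_atm_metric("ARMCU_REF_zav-400-600-theta_9_9"))
# AtmMetric(case='ARMCU', subcase='REF', metric='zav-400-600-theta', t1=9, t2=9)
```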

  You can change the options used to evaluate the references in
  get_one_metric_target_and_var.sh.
  To add a new metric, define it in get_one_metric_from_file.sh and
  specify how its tolerance is evaluated in
  get_one_metric_target_and_var.sh.


  ### Old way (before release ~490) ###
  Use compute_metrics_csv.sh (which calls extract_onemetric_csv.sh for both LES and SCM and computes the metrics through htune_netcdf2csvMetrics.R)
  Syntax: compute_metrics_csv.sh ARMCU_REF_Ay-theta_8_9 ARMCU_REF_zav-400-600-theta_9_9 ...
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!! The wave number must be changed manually in this script file !!!!!
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  -> calls extract_onemetric_csv.sh for both the LES and the directory containing the simulations, which
       - either uses cdo to compute the metrics when averaging is requested
       - or calls htune_netcdf2csvMetrics.R (which calls htune_metric.R) to compute other metrics (e.g. lwp, neb metrics, Ayotte metrics)
       Example: src/extract_onemetric_csv.sh ARMCU_REF_nebmax_7_9 LES/
       computes the cloud fraction metric (nebmax) for the simulations in LES/ARMCU/REF between time steps 7 and 9

  -> calls htune_csv2Rdata.R to convert the metrics to Rdata:
           Wave1_LES.Rdata: metrics computed on the LES
           Wave1_SCM.Rdata: metrics computed on the SCM
           All the files are assumed to be at an hourly time frequency.

   Metrics already available:
     targetvar = average of any variable between two vertical levels [zav]
     targetvar = lwp, zhneb, Ay-theta (or any integral of the positive/negative differences of a variable, e.g. theta, with respect to the first hour)
     TBD: averaging in time, relevant for stationary cases
     TBD: replace the first hour by the initial time
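   The zav metric combined with the T1_T2 time average can be sketched in
   Python as follows, e.g. for ARMCU_REF_zav-400-600-theta_9_9. This sketch
   assumes hourly outputs in which time index t corresponds to hour t, and
   takes a plain mean over the model levels lying between the two heights (no
   layer-thickness weighting); the actual computation (cdo or htune_metric.R)
   may differ. The function name and the toy values are hypothetical.

```python
def zav_metric(z, field, zmin, zmax, t1, t2):
    """Mean over hours t1..t2 (inclusive) of the mean of field over the levels
    with zmin <= z <= zmax.

    z     : level heights in metres, z[k]
    field : field[t][k], hourly values (time index t taken as hour t)
    """
    levels = [k for k, zk in enumerate(z) if zmin <= zk <= zmax]
    if not levels:
        raise ValueError(f"no model level between {zmin} and {zmax} m")
    hourly = [sum(field[t][k] for k in levels) / len(levels) for t in range(t1, t2 + 1)]
    return sum(hourly) / len(hourly)

# Toy profile: theta increasing with time and level (hypothetical values)
z = [100.0, 300.0, 450.0, 550.0, 700.0]
theta = [[300.0 + t + k for k in range(len(z))] for t in range(12)]
print(zav_metric(z, theta, 400.0, 600.0, 9, 9))  # levels 450 and 550 m at hour 9 -> 311.5
```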

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Small post-processing run by bench.sh:
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   post_processing_1D.sh is run automatically at each wave and computes the
   ensemble minimum, maximum and average for each wave and each case.
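   post_processing_1D.sh works on the model output files; the Python sketch
   below only illustrates the statistics it computes, applied level by level to
   in-memory profiles of one variable for one wave and one case. The function
   name and the data layout (one list of values per simulation) are
   assumptions.

```python
def ensemble_stats(profiles):
    """profiles: one list of values per simulation, all on the same levels.
    Returns (minimum, maximum, mean) per level across the ensemble."""
    levels = list(zip(*profiles))
    return ([min(v) for v in levels],
            [max(v) for v in levels],
            [sum(v) / len(v) for v in levels])

# Three simulations, two levels (toy values)
lo, hi, mean = ensemble_stats([[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]])
print(lo, hi, mean)  # [1.0, 2.0] [3.0, 6.0] [2.0, 4.0]
```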

### Obsolete ? ###
For the CNRM SCM, the NaN values must first be removed using script_modif_fillvalue.sh (then run the previous cdo command on new_SCM_1*nc instead of SCM_1*nc)
### Obsolete ? ###

TBD: could this be included in compute_metrics_csv.sh, with automatic
plotting of the profiles of the variables used to compute the metrics at
the given time?


Step 4 : htune_EmulatingMultiMetric.R
-------------------------------------
    Successive versions of this script:
        htune_EmulatingMultiMetric.R
        htune_Emulating_Multi_Metric.R
        htune_Emulating_Multi_Metric_Multi_LHS.R
        htune_Emulating_Multi_Metric_Multi_LHS_new.R
    Builds the emulators by reading Wave1_LES.Rdata and Wave1_SCM.Rdata,
    then applies the history matching procedure.
    Reads the emulators from previous waves when they exist.
    Defines and plots the NROY (Not Ruled Out Yet) spaces.
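    For readers new to history matching, the Python sketch below shows the
    standard implausibility measure and the resulting NROY test. It assumes
    that the tolerance to error acts as a standard deviation added in
    quadrature to the emulator variance, and that a point is kept when its
    largest implausibility over all metrics does not exceed the cutoff (3 by
    default, as noted at the end of this file). The optional n_allowed
    argument, which ignores the n_allowed worst metrics, is a common variant
    and may or may not match the tau option of the R scripts. All names are
    hypothetical; this is not the ExeterUQ code.

```python
import math

def implausibility(target, tolerance, em_mean, em_var):
    """|target - emulator mean| / sqrt(tolerance^2 + emulator variance)."""
    return abs(target - em_mean) / math.sqrt(tolerance ** 2 + em_var)

def is_nroy(impls, cutoff=3.0, n_allowed=0):
    """Not Ruled Out Yet if the largest implausibility, after discarding the
    n_allowed largest ones, does not exceed the cutoff."""
    if n_allowed >= len(impls):
        return True
    return sorted(impls, reverse=True)[n_allowed] <= cutoff

# One parameter point, two metrics (toy numbers)
impls = [implausibility(10.0, 3.0, 0.0, 16.0),   # |10 - 0| / 5 = 2.0
         implausibility(4.0, 1.0, 0.0, 0.0)]     # |4 - 0| / 1 = 4.0
print(is_nroy(impls))               # False: 4.0 > 3
print(is_nroy(impls, n_allowed=1))  # True once the worst metric is ignored
```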


Step 5: Run a second wave
-------------------------
param2Rwave.sh: uses the RData file generated by htune_EmulatingMultiMetric.R
    after history matching on the previous waves.
    Usage: ./param2Rwave.sh WAVEN RDATA_FILE
    Ex: ./param2Rwave.sh 2 Wave2.RData
    WAVEN should be >= 2
Alternatively, use bench.sh -wave 2 -model MODEL
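Conceptually, a new wave samples the parameter space only inside the NROY
space left by the previous waves. The Python sketch below illustrates this
with simple rejection sampling in the normalized space [-1, 1]^d, with a toy
NROY test standing in for the emulator-based one. How param2Rwave.sh actually
draws its points from the RData file is not reproduced here, and the function
names are hypothetical.

```python
import random

def sample_next_wave(n_points, n_params, in_nroy, seed=0, max_draws=100_000):
    """Draw points uniformly in the normalized space [-1, 1]^n_params and keep
    those accepted by in_nroy, until n_points have been collected."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_draws):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        if in_nroy(x):
            kept.append(x)
            if len(kept) == n_points:
                return kept
    raise RuntimeError("NROY space too small for the allowed number of draws")

# Toy NROY space (hypothetical): the ball of radius 0.8 around the origin
def toy_nroy(x):
    return sum(v * v for v in x) <= 0.8 ** 2

wave2 = sample_next_wave(20, 3, toy_nroy)
```

Rejection sampling becomes very inefficient once the NROY space is a small
fraction of the initial space, which is why a maximum number of draws is
enforced here.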

Specific uses of bench.sh:
------------------------------
1. To use metrics that are evaluated not on the SCM but on a more costly
   configuration (e.g. the 3D GCM)
   -> run bench.sh with the option -GCM pre, with or without 1D metrics.
      It stops just before running the SCM simulations if there are no 1D
      metrics, or just after computing the 1D metrics otherwise.
      You can retrieve the def files in WAVE$nwave/DEF (or NAMELIST/ for
      ECRAD) to run your simulations elsewhere.
   -> Evaluate the metrics you want on the simulation ensemble
      and write them in the same format as metrics_REF_$NWAVE.csv and
      metrics_WAVE$NWAVE_$NWAVE.csv (some explanations are given in
      models/LMDZ/Readme_scripts-Tuning-3D/, along with scripts to paste the
      1D metrics csv file into the new metrics csv file).
   -> Then run bench.sh with the option -GCM post:
      it executes the rest of the script, i.e. the history matching, and
      stores the free parameter values for the next wave.
   -> Note that you can run several waves this way, but they will not be
      iterated automatically.

2. Computing ECRAD offline on LMDZ SCM outputs
   bench.sh can now run the LMDZ SCM and compute the radiative transfer
   with ECRAD almost automatically. You have to specify -model LMDZ;
   bench.sh recognizes that ECRAD must be run on the SCM profiles from the
   names of the metrics, which start with "RAD".
   Some examples are available in models/ECRAD/bench_ecrad.sh

    
Post-processing :
-----------------
- post_scores.py: computes the score (error/tolerance) for all metrics and
  waves and makes some graphs.
- post_plots.sh: draws the envelope of the SCM runs and the best simulations
   (the waves to plot can be changed at the beginning of the script).
   post_plots.sh calls post.sh, where you can add some options to also plot
   time series.
   To modify the plot axes, edit param_$CASE_$SUBCASE.py.
   The script trace_sens_LES.py $var $CASE $SUBCASE makes the plot.
   Plots are stored in PROFILES_$wavemax/BEST$nbest/
- scatter_plot.py: a summary of the scores for all waves and the best simulations.
- plot_NROY.py: plots the evolution of the remaining (NROY) space across waves
A French readme is available in models/LMDZ/Readme_plots
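The score used by post_scores.py is described as error/tolerance. A minimal
Python sketch of that score, assuming the error is the absolute difference
between a simulated metric and its target, is given below, together with one
possible way of ranking the best simulations (by their worst score over all
metrics). That ranking criterion, the metric names and all function names are
assumptions, not necessarily what post_plots.sh uses for BEST$nbest.

```python
def scores(metrics, targets, tolerances):
    """score[sim][metric] = |simulated value - target| / tolerance."""
    return {
        sim: {m: abs(value - targets[m]) / tolerances[m] for m, value in values.items()}
        for sim, values in metrics.items()
    }

def best_simulations(score_table, nbest):
    """Return the nbest simulations with the smallest worst-case score."""
    return sorted(score_table, key=lambda sim: max(score_table[sim].values()))[:nbest]

# Toy example: three simulations, two metrics (hypothetical names and values)
metrics = {
    "SCM-1-001": {"m1": 1.0, "m2": 5.0},
    "SCM-1-002": {"m1": 2.0, "m2": 4.0},
    "SCM-1-003": {"m1": 0.0, "m2": 7.0},
}
table = scores(metrics, targets={"m1": 1.0, "m2": 4.0}, tolerances={"m1": 0.5, "m2": 2.0})
print(best_simulations(table, 2))  # ['SCM-1-001', 'SCM-1-002']
```

A score below 1 means the simulation lies within the tolerance to error for
that metric.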

Functions :
===========
htune_case_setup.R: characteristics of some cases, used for plots
htune_metric.R : metrics computation
htune_plot.R :  plots


Input from Exeter:
===================
StanEmulateCodeR.R, which requires:
AutoLMcode.R
CustomPredict.R
impLayoutplot.R
JamesDevelopment.R
DannyDevelopment.R
MultiWaveHM.R
MySpeed1const.stan
PredictSpeed1const.stan
PredictSpeed2DWconst.stan
MySpeed1.stan
PredictSpeed1.stan
PredictSpeed2DW.stan
kLHC.R: k-extended Latin hypercube sampling


Discussion / conventions :
==========================
I propose to use hourly averaged outputs.
This should work for all the available cases.


RStudio installation:
======================

Here is how to install RStudio on Ubuntu 16.04

sudo apt-get install r-base
wget https://download1.rstudio.org/rstudio-xenial-1.0.153-amd64.deb
sudo apt-get install gdebi
sudo gdebi rstudio-xenial-1.0.153-amd64.deb

Then you should be able to open RStudio with the command:
rstudio

When installing the additional R libraries on Ubuntu, you will need to install netcdf-bin and libnetcdf-dev first; otherwise
the ncdf4 library will not install correctly.

You will also need to install these packages in RStudio:

# Two packages were not available : "dicekriging" and "mvtnorm"
#install.packages(c("ncdf4","rstan","tensor","Hmisc","lhs","fields","rgl","shape","mco","far","dicekriging","GenSA","mvtnorm","loo"))
install.packages(c("ncdf4","tensor","Hmisc","lhs","fields","rgl","shape","mco","far","GenSA","loo"))
For ExeterUQ:
install.packages("rstan")
For ExeterUQ_MOGP:
install.packages(c("bayesplot","pracma","invgamma"))
(The package names must be passed as a single vector: with install.packages("bayesplot","pracma","invgamma"),
"pracma" and "invgamma" are taken as the lib and repos arguments, and loading the packages later fails
with errors such as "there is no package called ‘invgamma’".)

# Problem with rstan install
Found on https://github.com/stan-dev/rstan/issues/566 :
packageurl <- "http://cran.r-project.org/src/contrib/Archive/StanHeaders/StanHeaders_2.19.0.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

Window display crash with RStudio on Ubuntu 18.04.3:
export RSTUDIO_CHROMIUM_ARGUMENTS="--disable-gpu"
before launching rstudio.


You might also need:
r-cran-rgl
libx11-dev
libglu1-mesa-dev

Installing mogp_emulator
========================
pip3 is required.

Tool cleanup:
= = = = = = = = = = = 
- expe_setup.sh
- bench2waves.sh
- bench2wavesmMetric.sh

=> bench and htune_EmulatingMultiMetric.R modified to take the wave number, tau and cutoff as optional arguments (defaults: 1, 0, 3)
=> extract_onemetric.sh modified to maximize the reference error (this is how the tolerance to error is currently included)