# Copyright (C) 2024 CNRS, Météo-France, Sorbonne Université, Exeter Univ.
#
# htexplo is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published
# by the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# htexplo is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with htexplo. If not, see .
#

15/10/2024: Now distributed under GPL-v3 license
September 2020: first open source version
= = = = = = = = = = = = = = = = = = = = =

Linked to the submission of two reference papers:

Process-based climate model development harnessing machine learning:
I. a calibration tool for parameterization improvement,
Couvreux et al, in revision for JAMES

Process-based climate model development harnessing machine learning:
II. model calibration from single column to global,
Hourdin et al, in revision for JAMES

- svn -

Maintained under subversion (svn).
A .ignorelist file can be modified. Then run
   svn propset svn:ignore -F .ignorelist .
   svn commit .
.ignorelist

- content -

Codes under src directory:
= = = = = = = = = = = = =

.ignorelist : list of files not handled by svn

compute_metrics_csv.sh         htune_EOF.R
expe_setup.R                   htune_metric.R
extract_onemetric_csv.sh       htune_netcdf2csvMetrics.R
htune_case_setup.R             htune_plot.R
htune_convertDesign.R          htune_test_plot.R
htune_convert.R                kLHC.R
htune_csv2Rdata.R              param2R.sh
htune_EmulatingMultiMetric.R   param2Rwave.sh
post_scores.sh                 post_plots.sh
scatter_plot.py

Successive versions of the main R script:
   htune_EmulatingMultiMetric.R
   htune_Emulating_Multi_Metric.R
   htune_Emulating_Multi_Metric_Multi_LHS.R
   htune_Emulating_Multi_Metric_Multi_LHS_new.R

- Description -

BEFORE STARTING:
The model will be installed one level higher in the tree.
If you are in DIR1/Hightune right now, the models will be installed in DIR1.
This allows you to avoid reinstalling the model by staying in the DIR1 directory.
To run other models, you need to have MUSC installed on your machine:
http://confluence/pages/viewpage.action?pageId=248758682

Main programs :
===============

1. bench.sh is the script that launches the htexplo design to automatically run
   the SCM and evaluate metrics on it.
   This bench can be used with any model by running bench.sh with the -model $MODEL option
   MODEL = [LMDZ AROME ARPCLIMAT ARPEGE ECRAD]
   To see all the bench.sh options : ./bench.sh --help

2. setup.sh is the script that installs all the packages you need to run htexplo.
   You can start by running it without arguments; it also runs a small example
   (in WORK/EXEMPLE). It is also run automatically by bench.sh.

3. exemple.sh (in src directory) is a script to run waves one by one without managing SCM simulations.
   It is another way of launching htexplo when no SCM simulations need to be run.

This bench runs the following steps:
= = = = = = = = = = = = = = = = =

Step 0 : Experiment setup and controls
------------------------------------------------------------------
Manages the setup of the experiment, if it is the first wave, by running
   setup.sh $MODEL $workdir and setup_$MODEL.sh
Manages the continuation of an existing experiment.
Before starting the experiment, you can change some model-dependent specifications in
models/$MODEL/setup_$MODEL.sh (typically the version of the model).

Step 1 : Parameter definition and generation of parametric ensemble
-------------------------------------------------------------------
param2R.sh : defines the list of parameters and their ranges,
             and creates the R script ModelParam.R
   Usage : ./param2R.sh LHCSIZE NLHC PARAM_FILE
   Ex : ./param2R.sh 30 3 LMDZ/param_cld
   (for a second wave, use param2Rwave.sh instead)
   NLHC: if NLHC=1, then generate the maximinLHS of size LHCSIZE.
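As an illustration of what a maximin Latin hypercube of size LHCSIZE looks like, here is a minimal Python sketch. It is not the code run by param2R.sh: the function names, the number of parameters (n_dims) and the strategy of keeping the best of a few random candidates are assumptions made for this example. The R tool chain presumably relies on the maximinLHS function of the lhs package, which optimises the design differently.

```python
import random
from itertools import combinations


def latin_hypercube(n_points, n_dims, rng):
    """Return n_points samples in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)
        # One point drawn uniformly inside each of the n_points equal-width strata.
        columns.append([(s + rng.random()) / n_points for s in strata])
    # Transpose the per-dimension columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two points of the design."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in combinations(design, 2)
    )


def maximin_lhs(n_points, n_dims, n_candidates=50, seed=0):
    """Keep, among random hypercubes, the one with the largest minimum distance.

    This candidate search is an assumption of the sketch, not the lhs package algorithm.
    """
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_points, n_dims, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)


# n_points=30 mirrors LHCSIZE in "./param2R.sh 30 3 LMDZ/param_cld";
# n_dims=3 is a hypothetical number of free parameters.
design = maximin_lhs(n_points=30, n_dims=3)
```

Each row of design is a point in normalized [0, 1) coordinates; converting it to physical parameter values requires the ranges given in the parameter file.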
htune_convertDesign.R : automatically run by param2R.sh from version 9.
   Creates the design for the emulator using ModelParam.R.
   Outputs :
      Par1D_Wave1.asc containing the parameter values for SCM simulations
      Wave1.RData containing normalized parameter values for the SCM
   Calls kLHC.R and htune_convert.R :
      kLHC.R produces the k-extended Latin hypercube sampling
      htune_convert.R contains the functions that convert parameter values
         from normalized to non-normalized and vice versa

Step 2 : serie_[MODEL].sh
-------------------------
The different scripts are available in the MODEL directory.
This is the only model-dependent script.
Use : ./serie_$MODEL.sh $opt_model $cas $NWAVE
- creates the def files from Par1D_Wave1.asc, stored in WAVE$NWAVE/DEF,
  and/or ECRAD namelists in WAVE$NWAVE/NAMECRAD
- automatically runs the corresponding simulations, stored in WAVE$NWAVE/CAS/SUBCASE and named
  SCM-$NWAVE-$NSIMU.nc, or RAD$HOUR-$NWAVE-$NSIMU.nc for ECRAD simulations
- converts the SCM outputs to dephy format

Step 3 : Compute Metrics and convert to Rdata
---------------------------------------------
Use compute_metrics.sh $metrics -wave $nwave
- which calls get_one_metric_target_and_var.sh $metric_name
  to evaluate the target and the tolerance to error
  (you can also provide a csv file with metrics and tolerances)
- and calls get_one_metric_from_file.sh $fil $short_metric_name
  to compute metrics on the SCM simulations
NB : metric names follow the syntax :
- for atmospheric : CASE_SUBCASE_METRICS_T1_T2
  with T1 and T2 the initial and final time steps for time averages
- for radiative : RAD_CASE_SUBCASETIME_METRICS_SZA1_SZA2
You can change options on the evaluation of references in get_one_metric_target_and_var.sh.
To add a new metric, define it in get_one_metric_from_file.sh
and specify the tolerance evaluation in get_one_metric_target_and_var.sh.

### Old way (before release ~490) ###
use compute_metrics_csv.sh (calls extract_onemetric_csv.sh for both LES and SCM and computes
metrics through htune_netcdf2csvMetrics.R)
Syntax: compute_metrics_csv.sh ARMCU_REF_Ay-theta_8_9 ARMCU_REF_zav-400-600-theta_9_9 ...

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!! Need to change manually the number of the wave in this script file!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

-> calls extract_onemetric_csv.sh for both the LES and the directory containing the simulations
   - either uses cdo to compute metrics when averaging is requested
   - or calls htune_netcdf2csvMetrics.R (which calls htune_metric.R) to compute other metrics
     (e.g. lwp, neb metrics, Ayotte metrics)
   Example : src/extract_onemetric_csv.sh ARMCU_REF_nebmax_7_9 LES/
   Will compute the cloud fraction for the simulations in LES/ARMCU/REF between time 7 and 9
-> calls htune_csv2Rdata.R to convert to Rdata
   Wave1_LES.Rdata : metrics computed on LES
   Wave1_SCM.Rdata : metrics computed on SCM

Assumes that all the files are at an hourly time frequency.

Metrics already available :
   targetvar=averaging of any variable between two different vertical levels [zav]
   targetvar=lwp, zhneb, Ay-theta (or any integral of positive/negative (theta) differences to the 1st time)
TBD: averaging in time, relevant for stationary cases
TBD: replace 1st hour by initial time

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Small post processing run by the bench.sh :
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
post_processing_1D.sh is run automatically at each wave and computes the ensemble
min, max and average for each wave and each case.

### Obsolete ? ###
For the CNRM SCM, you first need to remove the NaN values using script_modif_fillvalue.sh
(then you should run the previous cdo command with new_SCM_1*nc instead of SCM_1*nc)
### Obsolete ? ###

TBD : could be included in the script compute_metrics_csv.sh with an automatic drawing of the profile
for the variables used to compute the metrics at the given time ?
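To make the atmospheric metric naming convention of Step 3 (CASE_SUBCASE_METRICS_T1_T2) concrete, here is a minimal Python sketch that splits such a name into its fields. It is not part of htexplo: the class and function names are hypothetical, and the sketch assumes that the case, subcase and metric fields contain no underscore and that T1 and T2 are integers, as in the examples given above.

```python
from typing import NamedTuple


class AtmosphericMetric(NamedTuple):
    """Fields of an atmospheric metric name (hypothetical helper type)."""
    case: str
    subcase: str
    metric: str
    t1: int
    t2: int


def parse_atmospheric_metric(name: str) -> AtmosphericMetric:
    """Split CASE_SUBCASE_METRICS_T1_T2 into its five fields.

    Assumes no field contains an underscore; radiative names
    (RAD_CASE_SUBCASETIME_METRICS_SZA1_SZA2) have six fields and are rejected.
    """
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(
            f"expected 5 underscore-separated fields, got {len(fields)}: {name!r}"
        )
    case, subcase, metric, t1, t2 = fields
    return AtmosphericMetric(case, subcase, metric, int(t1), int(t2))


parsed = parse_atmospheric_metric("ARMCU_REF_zav-400-600-theta_9_9")
print(parsed)
```

The metric field itself (here zav-400-600-theta) is left unparsed, since its internal hyphen-separated structure depends on the metric.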
Step 4 : htune_EmulatingMultiMetric.R
-------------------------------------
replaced successively by
   htune_EmulatingMultiMetric.R
   htune_Emulating_Multi_Metric.R
   htune_Emulating_Multi_Metric_Multi_LHS.R
   htune_Emulating_Multi_Metric_Multi_LHS_new.R
Builds the emulators from Wave1_LES.Rdata and Wave1_SCM.Rdata,
then applies the history matching procedure.
Reads the emulators from previous waves when they exist.
Defines and plots the NROY spaces.

Step 6: Run a second Wave:
------------------------
param2Rwave.sh : uses the RData file generated in htune_EmulatingMultiMetric.R
                 after history matching for previous waves.
   Usage : ./param2Rwave.sh WAVEN RDATA_FILE
   Ex : ./param2Rwave.sh 2 Wave2.RData
   WAVEN should be >= 2
or use bench.sh -wave 2 -model MODEL

Specific use of the bench.sh :
------------------------------

1. To use metrics that are not evaluated on the SCM but on a more costly configuration
   -> run bench.sh with option -GCM pre, with or without 1D metrics.
      It stops just before running the SCM simulations if there are no 1D metrics,
      or just after computing the 1D metrics otherwise.
      You can get the def files in WAVE$nwave/DEF (or NAMELIST/ for ECRAD) to run your
      simulations elsewhere.
   -> Evaluate the metrics you want on the ensemble of simulations and put them in the same format
      as metrics_REF_$NWAVE.csv and metrics_WAVE$NWAVE_$NWAVE.csv
      (some explanations are in models/LMDZ/Readme_scripts-Tuning-3D/ along with some scripts
      to paste the 1D metrics csv file into the new metrics csv file)
   -> then run bench.sh with option -GCM post.
      It executes the rest of the script, that is, it does the history matching part and
      stores the free parameter values for the next wave.
   -> Note that you can do several waves this way, but it will not iterate automatically.

2. Compute ECRAD offline on LMDZ SCM
   bench.sh can now run the SCM with LMDZ and compute radiative transfer with ECRAD,
   fairly automatically. You have to specify -model LMDZ.
   bench.sh recognizes that you want to run ECRAD on the SCM profiles from the metric names
   that start with "RAD".
   Some examples are available in models/ECRAD/bench_ecrad.sh

Post-processing :
-----------------
- post_scores.py : computes the score (error/tolerance) for all metrics and waves and makes some graphs.
- post_plots.sh : draws the envelope of SCM runs and the best simulations
  (you can change the waves to plot at the beginning of the script).
  post_plots.sh calls post.sh, where you can add some options to also plot time series.
  To modify the axes of the plots, edit param_$CASE_$SUBCASE.py
  The script trace_sens_LES.py $var $CASE $SUBCASE makes the plot.
  Plots are stored in PROFILES_$wavemax/BEST$nbest/
- scatter_plot.py : a summary of the scores for all waves and best simulations.
- plot_NROY.py : plots the evolution of the remaining space across waves.
A French readme is available in models/LMDZ/Readme_plots

Functions :
===========
htune_case_setup.R : some case characteristics for plots
htune_metric.R : metrics computation
htune_plot.R : plots

Input from Exeter :
===================
StanEmulateCodeR.R, which requires :
   AutoLMcode.R
   CustomPredict.R
   impLayoutplot.R
   JamesDevelopment.R
   DannyDevelopment.R
   MultiWaveHM.R
   MySpeed1const.stan
   PredictSpeed1const.stan
   PredictSpeed2DWconst.stan
   MySpeed1.stan
   PredictSpeed1.stan
   PredictSpeed2DW.stan
kLHC.R : LHS clever sampling

Discussion / conventions :
==========================
I propose to use hourly averaged outputs. This should work for all the available cases.
Installation rstudio :
======================
Here is how to install RStudio on Ubuntu 16.04:
   sudo apt-get install r-base
   wget https://download1.rstudio.org/rstudio-xenial-1.0.153-amd64.deb
   sudo apt-get install gdebi
   sudo gdebi rstudio-xenial-1.0.153-amd64.deb
Then you should be able to open RStudio by simply using the command:
   rstudio

When you install the supplementary libraries for RStudio on Ubuntu, you will need to install
netcdf-bin and libnetcdf-dev, otherwise the ncdf4 library won't install correctly.

You will also need to install these packages in RStudio:
# Two packages were not available : "dicekriging" and "mvtnorm"
#install.packages(c("ncdf4","rstan","tensor","Hmisc","lhs","fields","rgl","shape","mco","far","dicekriging","GenSA","mvtnorm","loo"))
install.packages(c("ncdf4","tensor","Hmisc","lhs","fields","rgl","shape","mco","far","GenSA","loo"))
For the ExeterUQ :
install.packages("rstan")
For the ExeterUQ_MOGP :
install.packages(c("bayesplot","pracma","invgamma"))
   Errors encountered:
   'pracma' not found
   2: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
      there is no package called 'invgamma'

# Problem with rstan install
Found on https://github.com/stan-dev/rstan/issues/566 :
packageurl <- "http://cran.r-project.org/src/contrib/Archive/StanHeaders/StanHeaders_2.19.0.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

Window crash with RStudio on Ubuntu 18.04.3 :
   export RSTUDIO_CHROMIUM_ARGUMENTS="--disable-gpu"
before launching rstudio
...
You might also need: r-cran-rgl libx11-dev libglu1-mesa-dev

Installing mogp_emulator
========================
pip3 required

Cleaning of the tool:
= = = = = = = = = = =
- expe_setup.sh bench2waves.sh bench2wavesmMetric.sh
=> modification of bench and htune_EmulatingMultiMetric.R to get the wave number, tau and cutoff
   as optional arguments: default = 1,0,3
=> modification of extract_onemetric.sh to maximise the reference error
   (this is the way the tolerance to error is included right now)