
JEDI at NCI

In early 2020 UKMO formally commenced, and committed substantial resources to, the Next-Generation OPS (NG-OPS) and Next-Generation DA (NG-DA) projects. Both projects rely heavily on JEDI, with UKMO concentrating development on the components it requires to implement its specific data assimilation schemes and observation processing. Their aim is to have a JEDI-based DA system comparable to the current operational system ready by mid-2023 (the Bureau has not had access to the final project plans as of May 2020).

At the time of writing (May 2020) the Bureau's position on how to respond to this development is uncertain. Anticipating that the Bureau will commit commensurate resources to JEDI-based NG-OPS and NG-DA projects, the Data Assimilation Team has begun familiarising itself with various components of JEDI. This will enable early adopters to explore opportunities to collaborate with their UKMO colleagues, so that when the Bureau formally decides to participate in the NG projects as part of a partnership-wide effort we are ready to contribute with minimal spin-up.

In this wiki page and the pages linked from it we attempt to record the various JEDI-related work undertaken by DA Team members.

Building and installing software stack for running JEDI applications

This section is for the scientific computing support staff who will maintain the software stack used to run JEDI. Developers and users of JEDI subsystems and applications can skip this section.

JEDI applications depend on a large number of software packages. The group of packages needed to run various JEDI applications is referred to as jedi-stack. The scripts used to build the packages are hosted in one of the JCSDA github repositories (https://github.com/JCSDA/jedi-stack).

Jin LEE and Wenming LU completed the initial build and installation of jedi-stack on Gadi. The GitHub branch used to build the software packages is feature/gadi. The site-specific instructions for building jedi-stack on Gadi can be found in a markdown file. The initial installation is documented in jedi-stack GitHub issue #93. Some of the decisions made at the time of the initial build/installation were provisional and there is ample room for improvement. We list below some of these decisions which are not documented in the GitHub issue.

  • The environment variable OPT determines the top level of the installation directory. We decided to use,
    export OPT=/projects/access/da/jedi/jedi-stack
    
  • The official releases of jedi-stack are now tagged. However, we decided not to affix the tag to our modulefiles, as our jedi-stack pre-dates the decision to start tagging. We may need to use tags later to improve traceability (see further discussion in GitHub issue #93).
    • Mark Miesch (core development team, JCSDA, miesch@ucar.edu) suggested that a minor change to one or more packages of jedi-stack may not warrant a new release (see GitHub issue #93). Hence he recommended upgrading those specific packages manually and then creating a meta-module with a different name. We see a potential problem with this approach: deciding whether an upgrade to a package is minor may not be straightforward, as the change may affect other packages. We will have to see how well this initial jedi-stack installation supports the running of JEDI applications, and be prepared to revise the way our installations are managed and named.
  • We recommend that the scientific computing staff who will maintain jedi-stack keep a close eye on the JEDI Teams page (https://github.com/orgs/JCSDA/teams/jedi) for any announcements related to jedi-stack. The official jedi-stack releases are on the GitHub jedi-stack releases page.
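If we do later move to tagged installations, the meta-module name could embed the jedi-stack release tag. A minimal sketch of one possible naming scheme (purely illustrative; the tag value and the scheme itself are assumptions, not a decision that has been made):

```shell
# Illustrative only: derive a meta-module name that embeds a jedi-stack tag.
compiler="gcc-system"      # compiler part of the current meta-module name
mpi="openmpi-3.1.4"        # MPI part of the current meta-module name
tag="v0.3"                 # hypothetical jedi-stack release tag
meta_module="jedi/${compiler}_${mpi}-${tag}"
echo "$meta_module"
```

A name of this form would make it obvious which jedi-stack release an installation was built from, at the cost of more modulefiles to maintain.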

ToDo:

  • In the jedi-stack meta modulefile or another modulefile add,
    module use ~access/modules
    
    Most people have this in their startup script, but not all. If the path is not present then module load zlib, etc. will fail.
  • The netcdf4 build may not be correct: see here for details.
  • The latest jedi-stack is version 0.3 (as of late June 2020). This includes updates to ecbuild and eckit (it also removes Python 2 support from build_pyjedi). We should try to update jedi-stack on Gadi to prevent anything breaking.
  • The current Gadi jedi-stack, which is older than v0.3, does not include ODC. ODC is needed by IODA, so we should try to build and install it.
  • LaTeX or TeX Live should be installed properly so that we can build all the documents used in JEDI.
  • We should have the JEDI container available on Gadi
    • Code in the JEDI project changes so quickly that we are not able to update its software stack (jedi-stack) on Gadi fast enough to prevent components of JEDI breaking. Updating a container is likely to be easier than updating jedi-stack, so a container environment would give developers a reliable environment that is always available on Gadi. This continued availability of a development environment is essential if the Bureau is going to take part in NG-OPS or NG-DA.
    • For individual developers a Linux desktop would be an ideal environment, but this is not feasible for people in the Bureau.
    • Once a container is built, updating it should be relatively straightforward (?)
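The first ToDo item could be made idempotent, so that sourcing it twice never duplicates the entry. A minimal sketch in plain shell (the ACCESS_MODULES path below is a placeholder standing in for the expansion of ~access/modules):

```shell
# Sketch: prepend a directory to MODULEPATH only if it is not already present.
ACCESS_MODULES="/home/access/modules"    # placeholder for ~access/modules
MODULEPATH="/apps/Modules/modulefiles"   # example starting value
case ":${MODULEPATH}:" in
  *":${ACCESS_MODULES}:"*) ;;                          # already there, do nothing
  *) MODULEPATH="${ACCESS_MODULES}:${MODULEPATH}" ;;   # otherwise prepend
esac
echo "$MODULEPATH"
```

In practice `module use` may already de-duplicate paths on recent versions of Environment Modules, so this guard is belt-and-braces.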

Running JEDI subsystems and applications

Currently the software stack for running JEDI applications is installed only on Gadi.

Until JCSDA makes a public release of JEDI most of us will not be able to access JCSDA's private GitHub repositories where the JEDI source code is hosted; this applies even if you already have a GitHub user account. However, JCSDA has kindly provided a temporary GitHub user account for us to use. Follow the steps below after logging onto a Gadi login node,

  1. Make sure you are using a recent version of Git,
module load git/2.24.1
  2. Follow this link in the main JEDI user documentation to configure Git: follow the steps up to, but not including, git lfs install (you need to run git lfs install within a working copy).
  3. Delete cached credentials,
cd                   # change to your home directory
rm .git-credentials  # this will be regenerated next time when you connect to a repository

If you already have a GitHub account and want to access some of your favourite non-JEDI repositories using your own account, you will be prompted for your own GitHub username and password.

  4. Run the following commands to check that you can connect to GitHub,
cd <location where there is plenty of disk space>
git clone https://github.com/JCSDA/ufo-bundle.git
# Username: jedi-ac06
# Password: UKxq243fKm4nwgs
chmod go-rwx ~/.git-credentials  # protect your plaintext Git credentials file - JCSDA may get annoyed if someone unauthorised uses the account

In Step 2 you should have configured Git to cache your credentials on disk. Try cloning the repository again (after deleting the working copy you cloned earlier) and make sure that Git doesn't ask for your username and password again.

  5. Set up Git LFS,
cd ufo-bundle    # make sure you're at the top of the working copy
git lfs install  # set up Git LFS
  6. Set up your user environment on Gadi by starting jedi-stack,
export OPT=/projects/access/da/jedi/jedi-stack
unset PYTHONPATH
module purge
module use ~access/modules  # do this if you don't already have "~access/modules" in $MODULEPATH
module use ${OPT}/modulefiles/apps
module load jedi/gcc-system_openmpi-3.1.4

Note that jedi-stack has its own set of Python packages (exclusively Python 3). Note also that the jedi-stack meta-module (jedi/gcc-system_openmpi-3.1.4) itself loads a number of modulefiles that make available the many software packages JEDI relies on.
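The environment setup above can be collected into a small sourceable script. This is just a convenience sketch (the script name and location are up to you; the module names are those of the current installation):

```shell
# jedi-env.sh -- source this on Gadi to set up the jedi-stack environment.
# Hypothetical helper script; commands are exactly those from the steps above.
export OPT=/projects/access/da/jedi/jedi-stack
unset PYTHONPATH
module purge
module use ~access/modules           # harmless if already in $MODULEPATH
module use ${OPT}/modulefiles/apps
module load jedi/gcc-system_openmpi-3.1.4
```

Then `. /path/to/jedi-env.sh` from a login or compute node gives everyone the same environment with one command.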

  7. Try building ufo-bundle and running the unit tests in it. These are described in this section of the main JEDI user document. However, we need to make some changes when using Gadi, as running MPI jobs on Gadi login nodes is frowned upon by the sysadmins. Here's a workaround,
    1. Clone the ufo-bundle project,
      cd <src-directory>
      git clone https://github.com/JCSDA/ufo-bundle.git
      
    2. Run ecbuild on a login node (use local test data files),
      ecbuild -DLOCAL_PATH_TESTFILES_SABER=/home/548/jtl548/academy/public/data/ufo-bundle/test_data/saber \
              -DLOCAL_PATH_TESTFILES_IODA=/home/548/jtl548/academy/public/data/ufo-bundle/test_data/ioda <src-directory>
      
    3. Request an interactive session to allow the build to take place on a compute node,
      qsub -q normal -l walltime=01:30:00,mem=32G,ncpus=20 -l storage=scratch/access+scratch/dp9+gdata/access+gdata/dp9+gdata/hh5 -I -X
      
    4. Once you have an interactive session on a compute node start jedi-stack,
      export OPT=/projects/access/da/jedi/jedi-stack
      unset PYTHONPATH
      module purge
      module use ~access/modules
      module use ${OPT}/modulefiles/apps
      module load jedi/gcc-system_openmpi-3.1.4
      
    5. Run make
      cd <your build directory>
      make -j4
      
    6. Run ctest, excluding some tests which rely on missing data files,
      ctest -E "(bump|ioda)"  # '-E' is used with a regular expression to exclude tests with names containing 'bump' or 'ioda'
      
      You will find there are still some tests that fail. I'm investigating. But in the meantime you should be able to start on the tutorials.
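As an alternative to the interactive session, steps 4-6 can be run as a single batch job. A hypothetical PBS script (the resource requests are copied from the qsub line above; <build-directory> is a placeholder for your build directory):

```shell
#!/bin/bash
#PBS -q normal
#PBS -l walltime=01:30:00,mem=32G,ncpus=20
#PBS -l storage=scratch/access+scratch/dp9+gdata/access+gdata/dp9+gdata/hh5
# Hypothetical batch alternative to the interactive build: start jedi-stack,
# then build ufo-bundle and run the reduced test set.
export OPT=/projects/access/da/jedi/jedi-stack
unset PYTHONPATH
module purge
module use ~access/modules
module use ${OPT}/modulefiles/apps
module load jedi/gcc-system_openmpi-3.1.4
cd <build-directory>
make -j${PBS_NCPUS}
ctest -E "(bump|ioda)"
```

Note that the ecbuild configure step (step 2) still has to be done on a login node first, since this script only runs make and ctest.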

N.B. There are bugs in SABER, IODA and UFO which prevent the local-test-file functionality from working fully. These bugs have been reported and fixed in bugfix branches; pull requests are pending.
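The pattern given to ctest -E in step 6 above is a regular expression matched against test names. A quick way to preview what a pattern would exclude, using made-up test names purely for illustration:

```shell
# Preview which (hypothetical) test names the pattern '(bump|ioda)' matches.
matched=$(printf '%s\n' \
    test_ufo_radiance test_bump_nicas test_ioda_obsspace test_saber_blocks \
  | grep -E '(bump|ioda)')
echo "$matched"
```

Against a real build directory, `ctest -N -E "(bump|ioda)"` lists the tests that would actually run without executing them.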

2020 JEDI Academy tutorial

Now, you are ready to get your hands dirty! Try the tutorial exercises from the JEDI Academy held in Monterey in 2020.

Warning. The academy activities were designed to be done interactively using the code and computing infrastructure that JCSDA set up in Monterey in February. The code has changed substantially since then, and Gadi differs from the AWS instances used there, so the Monterey tutorials do not work exactly as described in the activities. I will point out the parts of the original tutorial slides which will not work and then describe workarounds.

Dive into the Mini JEDI Academy - June 2020

Documentation

There are various tools used in JEDI and many of them are third-party software. For most JEDI subsystems documentation can be built by supplying suitable variables to ecbuild and targets to make. Provisionally we will store those documents under the following directory on accessdev,

accessdev:/home/httpd_home/jedidocs

To be able to browse the documents using your browser go to the URL,

https://accessdev.nci.org.au/jedidocs/

and then look for the relevant JEDI tools and components.

Note. If you need a particular document which is not yet available from the webpage above, I encourage you to build the missing document and copy it to /home/httpd_home/jedidocs. Afterwards, please update /home/httpd_home/jedidocs/index.html by inserting a hyperlink to the new document. If you need any assistance in building documents please come and see Jin LEE.
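For the index.html update, assuming the index uses a simple <ul> link list (check the actual file first; the markup and the link targets below are assumptions for illustration), the new entry can be inserted before the closing </ul>, e.g.:

```shell
# Demonstration in a scratch directory; on accessdev you would edit
# /home/httpd_home/jedidocs/index.html directly.
tmp=$(mktemp -d)
index="$tmp/index.html"
cat > "$index" <<'EOF'
<ul>
<li><a href="oops/">OOPS documentation</a></li>
</ul>
EOF
# Insert a hyperlink for a newly copied document before the closing </ul>.
sed -i 's|</ul>|<li><a href="ufo/">UFO documentation</a></li>\n</ul>|' "$index"
cat "$index"
```

Editing the file by hand in a text editor is just as good; the point is only that the new <li> entry goes inside the existing list.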

Next Step: lfric-bundle

Marek Wlasak (UKMO) put together a rose-stem test suite for running LFRic in JEDI. The suite also runs 4DEnVar, 3DVar etc. The suite is hosted on one of the private JCSDA repositories and is named lfric-bundle.

Most likely our involvement with JEDI will be through UKMO's NG-OPS and NG-DA projects, so we will need to become familiar with lfric-bundle.

On 29 May, 2020 BoM and MO held an online meeting. A summary of the discussion is here.

The JEDI core development team (JCSDA)

Name              Role
Yannick Tremolet  JEDI project lead

People in UKMO working on NG-OPS and NG-DA

Name                 Role
Chiara Piccolo       project executive
Jonathan Flowerdew   head of DA Methods
David Simonin        NG-OPS project manager
Stefano Migliorini   NG-DA project manager
Marek Wlasak         lfric-bundle, um-jedi
Wojciech Smigaj      C++ expert
David Davies         maintains UKMO JEDI software stack to run lfric-bundle
Mike Cooks (?)       ?
Neill Bowler         works on GPSRO
Steven Sandbach      ????
KrissyRayD242 (????) lfric-bundle (?)
Chris Thomas         conventional observations (?)
Last modified on Jun 24, 2020 9:11:53 AM