
ACCESS Gadi Transition

Gadi is NCI's new high performance computer, expected to come online in November-December 2019.

Information from NCI

  • NCI Information and Transition Timeline
  • Gadi Software Stack - /apps
  • Gadi Installation Progress

Important Changes from Raijin

  • New node size - each node has 48 CPUs; jobs larger than one node must request a CPU count that is a multiple of this (see the example job header after this list)
  • No /short space - Persistent storage is only available on /g/data
  • New /scratch space - Very large (!) quotas, but files will be automatically deleted after 90 days
  • Only latest /apps modules - Old versions of modules currently on Raijin will not be moved to Gadi, see above link for list
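
For illustration, a minimal PBS header for a two-node job on Gadi might look something like the sketch below. The project code 'a12', the memory request, and the executable name are placeholders, not part of any ACCESS configuration; check NCI's documentation for current queue limits.

#!/bin/bash
# Illustrative Gadi job script - 'a12' and model.exe are placeholders
#PBS -P a12
#PBS -q normal
#PBS -l ncpus=96                        # a multiple of 48: two full nodes
#PBS -l mem=380gb
#PBS -l walltime=02:00:00
#PBS -l storage=scratch/a12+gdata/a12   # every filesystem the job touches
#PBS -l wd

mpirun -np $PBS_NCPUS ./model.exe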

Configuring ACCESS Jobs for Gadi

The ACCESS support team will be providing instructions for running ACCESS on Gadi once we have more information. You can contact us by emailing cws_help@…

Rose/Cylc (UM vn10 / ACCESS 2 or later)

The exact settings required to run a Rose/Cylc suite on Gadi will depend on how the suite was originally set up. There are two things you must do: first, configure Rose to run suites on Gadi's /scratch disk (this only needs to be done once); second, configure each suite's Cylc settings to talk to Gadi.

In time we should have a list of configurations pre-configured to run on Gadi.

  1. Set up Rose to run jobs from /scratch/$PROJECT/$USER/cylc-run by default:

Add to ~/.metomi/rose.conf:

[rose-suite-run]
# Replace 'a12' and 'abc123' with your project / user id
root-dir=gadi*=/scratch/a12/abc123
root-dir{share/cycle}=gadi*=/scratch/a12/abc123
root-dir{share}=gadi*=/scratch/a12/abc123
root-dir{work}=gadi*=/scratch/a12/abc123
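
With this in place, rose suite-run should create the suite run directory on /scratch and leave a symlink behind in the usual place under ~/cylc-run, along these lines (the suite name and ids here are placeholders):

$ ls -l ~/cylc-run
... my-suite -> /scratch/a12/abc123/cylc-run/my-suite
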
  2. Set up the HPC task in your Rose suite. This will vary depending on the suite you're using, but for a GA (Global Atmosphere) job it should look something like the following.

Copy site/nci_raijin.rc to site/nci_gadi.rc, then edit it to set up the HPC and UMBUILD_RESOURCE runtime families.
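
For example, from a working copy of the suite (assuming the usual UM suite layout, where the site file is selected by a SITE variable in rose-suite.conf - check your own suite's conventions):

cp site/nci_raijin.rc site/nci_gadi.rc
# edit site/nci_gadi.rc, then point the suite at it,
# e.g. by setting SITE='nci_gadi' in rose-suite.conf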

Note that you must request storage access for every project your jobs use. On Gadi this is done with the PBS storage directive, generated here using Cylc's Jinja2 templating (the rendered result is shown after the listing).

# Add projects you need storage access to in this list
{% set storage_projects = ['access', 'w35'] %}

[ runtime ]
    [[ HPC ]]
         init-script = """
             module purge
             export PATH=~access/bin:$PATH
             module use ~access/modules

             module load openmpi/3.1.4
             ulimit -s unlimited
         """
         [[[ remote ]]]
             host = gadi
         [[[ job submission ]]]
             method = pbs
         [[[ directives ]]]
             -q = normal
             -l ncpus = 1
             -l walltime = 1:00:00
             -l mem = 4gb
             -l storage = {{ '+'.join(['scratch/'+p+'+gdata/'+p for p in storage_projects]) }}
             -W umask = 0022
         [[[ environment ]]]
             UMDIR = ~access/umdir
             ROSE_TASK_N_JOBS = ${PBS_NCPUS:-1}

    [[ UMBUILD_RESOURCE ]]
         inherit = HPC
         init-script = """
             module purge
             export PATH=~access/bin:$PATH
             module use ~access/modules

             module load intel-{cc,fc,mkl}/2019.3.199
             module load openmpi/3.1.4
             module load gcom/6.2_ompi.3.1.4 # NOTE! Check GCOM version for your UM version
             module load fcm
             module load netcdf
             module load drhook
             module load grib-api
             module load libpng
             module load openjpeg
             module load zlib
             ulimit -s unlimited
         """
         [[[ directives ]]]
             -q = express
             -l ncpus = 6
             -l mem = 12gb
             -l software = intel-compiler
         [[[ environment ]]]
             ROSE_TASK_OPTIONS = -f fcm-make2.cfg
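
For reference, with the storage_projects list defined at the top of the example, the Jinja2 expression renders the storage directive as:

-l storage = scratch/access+gdata/access+scratch/w35+gdata/w35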

UMUI (UM vn7 / ACCESS 1)

A hand edit script will apply most of the changes to central data paths and modules needed to run a UMUI job on Gadi; however, you will need to change the run output paths manually.

Configuration Modifications:

Under User Information and Target Machine -> Target Machine, set:

  • Number of processors E-W and N-S so that their product is a multiple of 48 (the number of cores per node on Gadi)
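
For example, a decomposition of 8 processors E-W by 12 N-S gives 96 CPUs, which fills exactly two 48-core nodes.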

Keep Machine Name as 'raijin'; this is needed to select the correct build configuration.

Under Input/Output Control and Resources -> Time Convention and SCRIPT Environment Variables, set:

  • DATAM to the output directory for model outputs (e.g. '/g/data/$PROJECT/$USER/umui/$RUNID')
  • DATAW to the output directory for log files & namelists (e.g. '/scratch/$PROJECT/$USER/umui/$RUNID')

These can be set to the same path if desired.

Under Input/Output Control and Resources -> User hand edit files, add a new entry '~access/gadi/handedits/um7.3' at the end of the list and put a 'Y' in the second column to enable it.

Under FCM Configuration -> FCM Extract and Build directories and Output levels, set:

  • Target machine root extract directory (UM_ROUTDIR) to a path on /scratch (e.g. '/scratch/$PROJECT/$USER/um_builds'). Note that the build system automatically appends '$USER/$RUNID' to this path.
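
For example, with UM_ROUTDIR set to '/scratch/a12/abc123/um_builds', a run with ID xaaaa would build under '/scratch/a12/abc123/um_builds/abc123/xaaaa' (using the placeholder project and user ids from above).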

Transition work progress

  • gadi/transition/access modules
  • gadi/transition/Nwp
