Version 11 (modified by Jin Lee, 6 weeks ago)

ACCESS Gadi Transition

Gadi is NCI's new high performance computer, expected to come online in November-December 2019.

Information from NCI

NCI Information and Transition Timeline

Gadi Software Stack - /apps

Gadi Installation Progress

Important Changes from Raijin

  • New node size - each node has 48 CPUs; large (multi-node) jobs must request a CPU count that is a multiple of this
  • No /short space - persistent storage is only available on /g/data
  • New /scratch space - very large (!) quotas, but files will be automatically deleted after 90 days
  • Only latest /apps modules - old versions of modules currently on Raijin will not be moved to Gadi; see the link above for the list
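Because of the new node size, multi-node PBS requests should be sized in whole nodes. A minimal sketch of that calculation (the helper name is hypothetical; the 48-CPU node size is from the note above):

```python
NODE_SIZE = 48  # CPUs per Gadi node, per the note above

def round_up_ncpus(requested: int) -> int:
    """Round a CPU request up to a whole number of 48-CPU nodes.

    Hypothetical helper for sizing multi-node jobs; jobs smaller
    than one node can still request fewer CPUs.
    """
    nodes = -(-requested // NODE_SIZE)  # ceiling division
    return nodes * NODE_SIZE

print(round_up_ncpus(100))  # 100 CPUs -> 3 nodes -> 144
```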

Configuring ACCESS Jobs for Gadi

The ACCESS support team will provide instructions for running ACCESS on Gadi once more information is available. You can contact us by emailing cws_help@…

Rose/Cylc (UM vn10 / ACCESS 2 or later)

The exact settings required to run a Rose/Cylc suite on Gadi depend on how the suite was originally set up. There are two steps: first, configure Rose to run suites on Gadi's /scratch disk (this only needs to be done once); second, configure Cylc in each suite you want to run so that it can talk to Gadi.

In time we aim to provide a list of suite configurations pre-configured to run on Gadi.

  1. Set up Rose to run jobs from /scratch/$PROJECT/$USER/cylc-run by default:

Add to ~/.metomi/rose.conf:

[rose-suite-run]
# Replace 'a12/abc123' with your own project / user id
root-dir=gadi*=/scratch/a12/abc123
root-dir{share/cycle}=gadi*=/scratch/a12/abc123
root-dir{share}=gadi*=/scratch/a12/abc123
root-dir{work}=gadi*=/scratch/a12/abc123
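If you maintain Rose configuration for several users or projects, the stanza above can also be generated programmatically. A small sketch (the `gadi_rose_conf` helper is hypothetical, not part of Rose):

```python
def gadi_rose_conf(project: str, user: str) -> str:
    """Build the [rose-suite-run] stanza shown above for a given
    project and user (hypothetical helper, not part of Rose)."""
    path = f"/scratch/{project}/{user}"
    lines = ["[rose-suite-run]", f"root-dir=gadi*={path}"]
    # The share/cycle, share and work directories all point at /scratch
    for key in ("share/cycle", "share", "work"):
        lines.append(f"root-dir{{{key}}}=gadi*={path}")
    return "\n".join(lines)

print(gadi_rose_conf("a12", "abc123"))
```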
  2. Set up the HPC task in your Rose suite. This will vary depending on the suite you're using, but should look something like the following for a GA job:

Copy site/nci_raijin.rc to site/nci_gadi.rc, and edit to set up HPC and UMBUILD_RESOURCE

Note that on Gadi you must request storage access for every project whose filesystems your job uses, via the PBS -l storage directive; in the example below this is generated with Cylc's Jinja2 templating.

# Add projects you need storage access to in this list
{% set storage_projects = ['access', 'w35'] %}

[ runtime ]
    [[ HPC ]]
         init-script = """
             module purge
             export PATH=~access/bin:$PATH
             module use ~access/modules

             module load openmpi/3.1.4
             ulimit -s unlimited
         """
         [[[ remote ]]]
             host = gadi
         [[[ job submission ]]]
             method = pbs
         [[[ directives ]]]
             -q = normal
             -l ncpus = 1
             -l walltime = 1:00:00
             -l mem = 4gb
             -l storage = {{ '+'.join(['scratch/'+p+'+gdata/'+p for p in storage_projects]) }}
             -W umask = 0022
         [[[ environment ]]]
             UMDIR = ~access/umdir
             ROSE_TASK_N_JOBS = ${PBS_NCPUS:-1}

    [[ UMBUILD_RESOURCE ]]
         inherit = HPC
         init-script = """
             module purge
             export PATH=~access/bin:$PATH
             module use ~access/modules

             module load intel-{cc,fc,mkl}/2019.3.199
             module load openmpi/3.1.4
             module load gcom/6.2_ompi.3.1.4 # NOTE! Check GCOM version for your UM version
             module load fcm
             module load netcdf
             module load drhook
             module load grib-api
             module load libpng
             module load openjpeg
             module load zlib
             ulimit -s unlimited
         """
        [[[ directives ]]]
            -q = express
            -l ncpus    = 6
            -l mem      = 12gb
            -l software = intel-compiler
        [[[ environment ]]]
            ROSE_TASK_OPTIONS = -f fcm-make2.cfg
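The Jinja2 expression in the -l storage directive expands to a `+`-separated list of scratch and gdata entries. The same logic in plain Python, to show the string PBS actually receives for the example project list:

```python
storage_projects = ['access', 'w35']

# Same expression as the Jinja2 template in the -l storage directive above
storage = '+'.join(['scratch/' + p + '+gdata/' + p for p in storage_projects])
print(storage)  # scratch/access+gdata/access+scratch/w35+gdata/w35
```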

UMUI (UM vn7 / ACCESS 1)

Instructions to be provided

Transition work progress

gadi/transition/access modules

gadi/transition/Nwp