NWP Suites input data on NCI

Available projects at NCI and their data policy

  • lb4 and wr45
    • Funded by NCI for data useful for others (CSIRO/universities)
    • We provide the data as part of BoM contribution to NCI
    • Not for general use in running NWP experiments
  • ig2
    • Data for Bureau's access only; it is not visible to others
    • 50 TB in total storage space (39 TB are used as of 13 Nov, 2020)
    • As of late 2020 it is used for archiving data from operational NWP models
      • Newer data are copied in and, as of late 2020, there is no auto-purge
    • Data holdings:
      • bufr/ - BUFR files from operational ACCESS-G2 and ACCESS-R
      • cmcgem/, jmagsm/, ops_aps3/, ukgc/, usavm/ - copies of MARS
      • ACCESS_prod/
      • dev_aps4/
      • ei/
      • mnth/
      • UKDAILY/, UKDAILY_PS/



Data inventory

NWP input data (currently under nas_data)

Data | From | Size per day | Size per year | C3 CE3 NAS G3 ADEPT obsmon
bgerr | G3 | 5.7MB | 2GB | x x
glu_smc.gz | G3 | 159MB | 58GB | x x c
glu_varbc.gz | G3 | 1.5MB | 0.55GB | x x c
obs (includes RAMMSA SST) | C3 getobs | 23GB | 8400GB | x x
obs15 | NAS getobs | 17GB | 6200GB | x
gl t+3 dump | G3 | 160GB | 58400GB | c c c x
odb2s | C3 | 3.5GB | 1300GB | x
Frames (C3, NAS, ADEPT) | G3 | 11GB | 4000GB | x x x
Total 79TB 12TB

*x means used daily or each cycle. c means used for cold starting
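
The per-year column appears to be the per-day column scaled to a full year. A minimal sketch of that arithmetic follows, assuming 365 days per year and 1000 MB per GB (neither is stated in the source); the dictionary name and helper function are illustrative only.

```python
# Daily input volumes in GB, copied from the table above.
# Assumption: MB values are converted at 1000 MB = 1 GB.
DAILY_GB = {
    "bgerr": 0.0057,
    "glu_smc.gz": 0.159,
    "glu_varbc.gz": 0.0015,
    "obs": 23.0,
    "obs15": 17.0,
    "gl t+3 dump": 160.0,
    "odb2s": 3.5,
    "Frames": 11.0,
}

DAYS_PER_YEAR = 365  # assumption: non-leap year


def per_year_gb(daily_gb):
    """Scale a daily volume (GB) to a yearly volume (GB)."""
    return daily_gb * DAYS_PER_YEAR


yearly_gb = {name: per_year_gb(size) for name, size in DAILY_GB.items()}
total_tb = sum(yearly_gb.values()) / 1000

for name, size in yearly_gb.items():
    print(f"{name:14s} {size:10.1f} GB/year")
print(f"{'all rows':14s} {total_tb:10.1f} TB/year")
```

Under these assumptions the individual rows reproduce the rounded per-year figures in the table, and the sum over all rows comes to a little over 78 TB per year.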

Alternate plan for G3 dumps:

  • 6 months of 4/day = 29TB
  • 30 months of 1/day = 36TB

Total = 65 TB

3 years is then 100TB
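
The two tiers above can be checked with a short calculation. This sketch assumes one G3 dump is 40GB (160GB per day over 4 cycles, consistent with the ADEPT start dump size quoted on this page) and an average month of 365/12 days; the function and variable names are illustrative.

```python
DUMP_GB = 40.0             # assumption: 160 GB/day divided by 4 cycles/day
DAYS_PER_MONTH = 365 / 12  # assumption: average month length


def tier_tb(months, dumps_per_day):
    """Storage in TB for keeping `dumps_per_day` dumps over `months` months."""
    return months * DAYS_PER_MONTH * dumps_per_day * DUMP_GB / 1000


recent_tb = tier_tb(6, 4)      # most recent 6 months, every cycle kept
older_tb = tier_tb(30, 1)      # preceding 30 months, one dump per day kept
full_rate_tb = tier_tb(36, 4)  # for comparison: 3 years with every cycle kept

print(f"6 months at 4/day : {recent_tb:.1f} TB")
print(f"30 months at 1/day: {older_tb:.1f} TB")
print(f"tiered total      : {recent_tb + older_tb:.1f} TB")
print(f"3 years at 4/day  : {full_rate_tb:.1f} TB")
```

With these inputs the tiers come to about 29.2 TB and 36.5 TB, matching the 29TB and 36TB quoted above, against roughly 175 TB if every cycle were kept for the full three years.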

Alternate (non-GetObs) obs archive

  • /g/data/dp9/da/access-c/obs/radar ~130GB per month
  • /g/data/dp9/da/access-c/obs/rainfall ~ 250MB per month
  • /g/data/dp9/da/access-c/surf/ascat ~ 250MB per month

The radar data archived here are now static, since the C3 getobs (above) now includes radar data. They should be excluded from ongoing estimates and left where they are. The ascat data might need to be added to the estimates if we start to do soil moisture analysis for high-res systems (likely): allow about 9GB for 3 years (250MB per month over 36 months).

Static data

  • Ancillaries (/g/data/access/ANCIL/APS3 or APS4) 12TB
  • Local control files
    • (/g/data/access/OPS/[control|Data]) 2.5GB
    • (/g/data/access/VAR/[data_64/CovStats|ext/VAR]) 6GB

These data should be excluded from the estimates because they are small and/or static: they are not part of a "3 year archive", but must be kept permanently.


  • /g/data/access/VAR/ (2.3T)
  • /g/data/access/OPS/ (330G)
  • /g/data/access/ANCIL (12T)
  • /projects/access/umdir/ancil/data/ (48M)
  • /projects/access/umdir/vn10.8/ctldata (289M)
  • /g/data/dp9/reana/anc_data/ukmo/ (permission denied)
  • /scratch/dp9/ttl548/nas_data/2020/02/ (9.4T)
  • /g/data/dp9/da/access-c/obs/radar/202002/ (131G)
  • /g/data/dp9/da/access-c/obs/rainfall/202002/ (300M)

Total: ~24.54T

City Suite (u-bn286)

Including everything in the table above, inputs total 15.7TB per year, excluding cold-start dumps.

ADEPT

  • For 3 ADEPT domains MCM / SCS / ADS
  • External Inputs (per domain per cycle)
  • G3 start dump: 40GB used by all 3 domains
  • G3 Frames: ~6GB total for 3 domains
  • Ancils: ~1TB total for 3 domains

Global Trials

The following is an estimate of the input data needed for global suites (assuming 3 years' worth of data is stored on disk):

  • BUFR observational data: ~2T
  • OSTIA SST and sea ice files: ~1T
  • Error modes for global hybrid non-coupled trialling: 31G/cycle, 124G/day; total of ~136T
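
The error-mode figure follows directly from the per-cycle size. A minimal sketch, assuming 4 cycles per day (implied by 124G/day at 31G/cycle), 365-day years and 1000 GB per TB; the variable names are illustrative.

```python
ERROR_MODES_GB_PER_CYCLE = 31
CYCLES_PER_DAY = 4    # implied by 124G/day divided by 31G/cycle
DAYS_PER_YEAR = 365   # assumption: non-leap years
YEARS_ON_DISK = 3

error_modes_gb_per_day = ERROR_MODES_GB_PER_CYCLE * CYCLES_PER_DAY
error_modes_tb = error_modes_gb_per_day * DAYS_PER_YEAR * YEARS_ON_DISK / 1000

# The other two items are quoted directly in the list above.
bufr_tb = 2
ostia_tb = 1
global_total_tb = bufr_tb + ostia_tb + error_modes_tb

print(f"error modes per day : {error_modes_gb_per_day} GB")
print(f"error modes, 3 years: {error_modes_tb:.1f} TB")
print(f"all global inputs   : {global_total_tb:.1f} TB")
```

The error modes dominate: under these assumptions they account for roughly 136 TB of an approximately 139 TB total.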

UKMO start dumps

3 months' worth of UKMO IC files take up approx 9 TB (Wenming)

Others

NAS, CE, BARRA and BARPA are outside the scope of the current data consolidation work, but their data requirements are listed here.



Consolidated and final estimate of all input data required for NWP trials

The ig2 allocation, currently 50 TB, is insufficient for the long trial periods needed by various NWP models. Because of this constraint we decided to store only a limited amount of input data under ig2. Here is a summary of how we will manage ig2:

  • For each model we will store enough input data to allow the completion of 2 standard trial periods: one summer and one winter with each lasting 2 1/2 to 3 months
  • Any other data outside the standard trial periods will need to be stored elsewhere - e.g. MDSS
  • At any given development cycle (e.g. APS4) the standard trial periods will be fixed but can be changed by consensus. When the standard trial periods change the older input data will migrate to MDSS and the required newer data will be stored in ig2
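
The policy above amounts to a simple rule: input data for a cycle inside a standard trial period lives in ig2, and anything else goes elsewhere (e.g. MDSS). The sketch below illustrates that rule using the Global trial periods quoted in the estimate table on this page; the function names, the "summer"/"winter" labels and the returned strings are hypothetical and not part of any existing script.

```python
from datetime import datetime

CYCLE_FORMAT = "%Y%m%dT%H"  # e.g. 20171201T06

# Global standard trial periods as quoted on this page (start, end), inclusive.
STANDARD_TRIALS = {
    "summer": ("20171201T06", "20180228T00"),
    "winter": ("20170620T06", "20170930T12"),
}


def parse_cycle(stamp):
    """Convert a cycle stamp such as 20171201T06 to a datetime."""
    return datetime.strptime(stamp, CYCLE_FORMAT)


def storage_location(stamp):
    """Return 'ig2' if the cycle falls in a standard trial period, else 'MDSS'."""
    cycle = parse_cycle(stamp)
    for start, end in STANDARD_TRIALS.values():
        if parse_cycle(start) <= cycle <= parse_cycle(end):
            return "ig2"
    return "MDSS"


for stamp in ("20180101T00", "20170801T12", "20171015T00"):
    print(stamp, "->", storage_location(stamp))
```

Keeping the periods in one table means that when the standard trial periods change by consensus, only `STANDARD_TRIALS` needs updating.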

Here is the final estimate of the data requirement for the various trials:

Model | Trial period | File type | Input data requirement (in TB) | Comment
Global | 20171201T06 - 20180228T00 and 20170620T06 - 20170930T12 | | 0.5 | This estimate does not include error modes, which are needed for uncoupled hybrid 4DVar
Global | 1 year | glu_smc.gz | 0.058 | From Susan's table (above)
Global | 1 year | glu_varbc.gz | 0.00055 | From Susan's table (above)
ACCESS-C | 1 year | obs | 8.4 | From Susan's table (above). Includes RAMMSA SST. From C3 getobs
ACCESS-C | 20200201T06 - 20200415T21 and 75 day winter period | frames | 0.9 | This estimate is for the frames files for two (2) city regions, for two (2) sets of 75 day periods (one summer, one winter)
ADEPT | 28 day winter period in August | frames | 0.544 | Three regions, SCS, MCM, ADS, 4 times/day. This period should be within the Global model winter trial period so that the Global files can be used here too
ADEPT | 28 day winter period in August | t+3 | 1.12 |
Total | | | 11.532 |
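
As a cross-check, the rows can be summed and compared with the ig2 figures quoted earlier on this page (50 TB allocation, 39 TB used as of 13 Nov 2020). This is only a sketch: the variable names are illustrative, and it assumes nothing about how much of the 39 TB already consists of these inputs.

```python
# (model, file type, requirement in TB), copied from the table above.
ROWS_TB = [
    ("Global", "trial-period inputs", 0.5),
    ("Global", "glu_smc.gz", 0.058),
    ("Global", "glu_varbc.gz", 0.00055),
    ("ACCESS-C", "obs", 8.4),
    ("ACCESS-C", "frames", 0.9),
    ("ADEPT", "frames", 0.544),
    ("ADEPT", "t+3", 1.12),
]

IG2_ALLOCATION_TB = 50  # quoted earlier on this page
IG2_USED_TB = 39        # as of 13 Nov 2020

required_tb = sum(size for _, _, size in ROWS_TB)
headroom_tb = IG2_ALLOCATION_TB - IG2_USED_TB

print(f"sum of listed rows: {required_tb:.3f} TB")
print(f"ig2 headroom      : {headroom_tb} TB")
print(f"share of ig2      : {required_tb / IG2_ALLOCATION_TB:.0%}")
```

The listed rows sum to about 11.52 TB, slightly below the 11.532 TB total shown; the table does not itemise the small remainder. Either figure is close to a quarter of the ig2 allocation and slightly more than the 11 TB that was unused on 13 Nov 2020.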



Directory structure

The directory tree structure used for ig2 is identical to that on sam, which makes data files on ig2 easy to locate.



Scripts used in transferring NWP input data from Bureau to NCI

Tan's scripts for transferring NWP input data to NCI are under logan:/home/ttl/cron (some are copied to gadi:/scratch/dp9/ttl548/DL):

  • nas_obs.sh and nas_frame.sh

Some of Milton's scripts are under gadi:/scratch/dp9/ttl548/DL:

  • lftp.sh and mirror_nas.sh



ToDo

Last modified on Nov 16, 2020 12:23:55 PM
