
Status of local implementation of UM versions

Testing UKMO PS34 Global N768L70 ENDGame Build and Run jobs

UNDER CONSTRUCTION: testing of PS34 and the writing of this documentation are both ongoing.




UKMO Documentation on Parallel Suite 34 (PS34)



  • Documentation on PS34 is published on the collab wiki at: http://collab.metoffice.gov.uk/twiki/bin/view/Support/ParallelSuite34

    • Note: a login to the collab wiki ( http://collab.metoffice.gov.uk ) is required to access the PS34 page.

    • Files associated with "Global N768L70 (ENDGame)" have been downloaded and are available on raijin and ngamai in:
      • ~access/downloads/



Building PS34 Global executable

The build requires patching standard UM VN8.5 with the UM PS34 patch and the JULES PS34 patch.

azs's local um ps34 branch

  • Apply patches from vn8.5_PS34_Global_and_EG_Configuration_patch.tgz
    • Create branch from trunk at vn8.5 and apply patch.
      • Branch URL=https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34
  • In finalising my local ps34 branch, the following branches also need to be considered:
    • https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/local_changes
    • https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/metoffice_patches


  • List of changes in ...vn8.5/local_changes compared to trunk@vn8.5
    Modified files (14):
        fcm-make/meto-x86-ifort/inc/um-atmos.cfg
        fcm-make/meto-x86-ifort/inc/x86-ifort-mpich.cfg
        src/script/control/qsatmos
        src/script/control/make_parexe.pl
        src/script/control/qsresubmit
        src/script/control/qsoasissetup
    
        src/atmosphere/dynamics_advection/set_halos.F90
        src/atmosphere/convection/shallow_conv-shconv5a.F90
        src/atmosphere/convection/deep_conv-dpconv5a.F90
    
        src/configs/machines/linux-ifort-nci/ext_libs/gcom_mpp.cfg
        src/configs/machines/linux-ifort-nci/ext_libs/netcdf.cfg    
        src/configs/machines/linux-ifort-nci/ext_libs/gcom_serial.cfg
        src/configs/machines/linux-ifort-nci/ext_libs/drhook.cfg    
        src/configs/machines/linux-ifort-nci/machine.cfg
    
    New files (11):
        fcm-make/linux-ifort-nci/inc/um-scm.cfg      
        fcm-make/linux-ifort-nci/inc/um-atmos.cfg
        fcm-make/linux-ifort-nci/inc/ifort-nci.cfg   
        fcm-make/linux-ifort-nci/inc/um-utils.cfg
        fcm-make/linux-ifort-nci/um-scm-debug.cfg    
        fcm-make/linux-ifort-nci/um-atmos-debug.cfg
        fcm-make/linux-ifort-nci/um-utils-safe.cfg
        fcm-make/linux-ifort-nci/um-scm-safe.cfg
        fcm-make/linux-ifort-nci/um-atmos-safe.cfg   
        fcm-make/linux-ifort-nci/um-scm-high.cfg
        fcm-make/linux-ifort-nci/um-atmos-high.cfg
    
    
    
    • Merge local_changes into my working copy of https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34 and commit the result:
       cd /g/sc/data/azs/ps34/um8.5_ps34
       svn merge  https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/local_changes
       svn commit
      
    • Inspect https://access-svn.nci.org.au/svn/um/branches/dev/vn8.5/metoffice_patches
      • This branch contains 2 modified source files:
        • src/script/control/qsoasissetup
        • src/atmosphere/dynamics_advection/set_halos.F90
      • Both are already included in "local_changes"
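
The check above (that every file touched in metoffice_patches is already covered by local_changes) could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used: it assumes two plain-text lists of repository-relative paths, one per line, and the function name not_covered is hypothetical.

```shell
# Hypothetical helper: print the paths listed in file $1 that do NOT appear
# in file $2. Both files hold one repository-relative path per line.
not_covered() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when it prints nothing; treat that case as success.
    grep -Fxv -f "$2" "$1" || [ $? -eq 1 ]
}
```

An empty result means the first list is fully covered by the second. The two lists could be prepared, for example, from `svn diff --summarize` output with the status column and URL prefix stripped.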


azs's local jules ps34 branch

  • Apply patch from JULES_um8.5_PS34_Global_Configuration_patch.tgz
    • Create branch from trunk (tagged at vn8.5) and apply patch. URL=https://access-svn.nci.org.au/svn/jules/branches/dev/axs599/jules8.5b_ps34


Build job vajda

  • Upload the UKMO basis_dljub job into vajda in accessdev's UMUI

  • Apply local customisations (UMUI job difference between the customised job vajd.a and the original vajd.x):
    ----------------------------------------------------------------------------------------------------------------
    Job 1: Accessdev-vajd.a		 "ps34_Build_and_forecast_job (from basis_dljub)"
    Job 2: Accessdev-vajd.x		 "ps34_Build_and_forecast_job (from basis_dljub) Orig"
    Date: 20150216			 LONG COMPARISON
    ----------------------------------------------------------------------------------------------------------------
    	
    00007:		Entry box: Mail-id for notification of end-of-run
    00008:		   Job vajd.a: Entry is set to 'axs599@accessdev.nci.org.au'
    00009:		   Job vajd.x: Entry is set to 'nomail'
    00010:		
    00012:		Entry box: Specify alternative name
    00013:		   Job vajd.a: Entry is set to 'vajd'
    00014:		   Job vajd.x: Entry is set to 'umgl'
    00015:	
    00017:		Entry box: Target Machine user-id:
    00018:		   Job vajd.a: Entry is set to '$USER'
    00019:		   Job vajd.x: Entry is set to 'frpe'
    	    
    00026:	
    00027:		Check box: Change machine config file ($UM_MACHINE)
    00028:		   Job vajd.a: Entry is set to 'ON'
    00029:		   Job vajd.x: Entry is set to 'OFF'
    00030:	
    00031:	
    00032:		Check box: Change target machine name ($TARGET_MC)
    00033:		   Job vajd.a: Entry is set to 'ON'
    00034:		   Job vajd.x: Entry is set to 'OFF'
    00035:	
    00036:	
    00037:		Entry box: Repository directory containing FCM machine.cfg file
    00038:		   Job vajd.a: Entry is set to 'linux-ifort-nci'
    00039:		   Job vajd.x: Entry is inactive
    00040:	
    00041:	
    00042:		Entry box: Host name
    00043:		   Job vajd.a: Entry is set to 'raijin.nci.org.au'
    00044:		   Job vajd.x: Entry is set to 'hpc2e'
    00045:	
    00046:	
    00047:		Radio button: Define submission method
    00048:		   Job vajd.a: Entry is set to 'PBS Pro (Raijin)'
    00049:		   Job vajd.x: Entry is set to 'LoadLeveler'
    00050:	
    00051:	
    00052:		Entry box: Target machine name
    00053:		   Job vajd.a: Entry is set to 'linux'
    00054:		   Job vajd.x: Entry is inactive
    
    00062:		Entry box: DATAM            : Define the directory for written output with time-stamped names
    00063:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/85/$RUNID'
    00064:		   Job vajd.x: Entry is set to '$DATADIR/$RUNID'
    00065:	
    00066:	
    00067:		Entry box: DATAW            : Define the directory for other output file
    00068:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/85/$RUNID'
    00069:		   Job vajd.x: Entry is set to '$DATADIR/$RUNID'
    
    00077:		Differences in Table Hand edits
    00078:	 	1,10c1,10
    00079:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_stashc_DUSTPS32 Y
    00080:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_foamblk Y
    00081:		<  /g/data1/dp9/axs599/ps34/hand_edits/GL_HANDEDITS_8.5_SMNSout_7p5minTS Y
    00082:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_p2t_weight_fix.pl Y
    00083:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_eta_s_0.5.pl Y
    00084:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_sc_1361.pl Y
    00085:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_filter_cloud_tau0.01 Y
    00086:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_srf_agg.ed Y
    00087:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_emis_ssi_full.pl Y
    00088:		<  /g/data1/dp9/axs599/ps34/hand_edits/vn8.5_EG_package_hack.ed Y
    00089:		---
    00090:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_stashc_DUSTPS32 Y
    00091:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_foamblk Y
    00092:		>  ~gmdd/um/handedits/vn8.5/GL_HANDEDITS_8.5_SMNSout_7p5minTS Y
    00093:		>  ~gmdd/um/handedits/vn8.5/vn8.5_p2t_weight_fix.pl Y
    00094:		>  ~gmdd/um/handedits/vn8.5/vn8.5_eta_s_0.5.pl Y
    00095:		>  ~gmdd/um/handedits/vn8.5/vn8.5_sc_1361.pl Y
    00096:		>  ~gmdd/um/handedits/vn8.5/vn8.5_filter_cloud_tau0.01 Y
    00097:		>  ~gmdd/um/handedits/vn8.5/vn8.5_srf_agg.ed Y
    00098:		>  ~gmdd/um/handedits/vn8.5/vn8.5_emis_ssi_full.pl Y
    00099:		>  ~gmdd/um/handedits/vn8.5/vn8.5_EG_package_hack.ed Y
    
    00108:		Entry box: Local machine root extract directory (UM_OUTDIR)
    00109:		   Job vajd.a: Entry is set to '$HOME/UM_OUTDIR'
    00110:		   Job vajd.x: Entry is set to '$HOME/um_extracts'
    00111:	
    00112:	
    00113:		Entry box: Target machine root extract directory (UM_ROUTDIR)
    00114:		   Job vajd.a: Entry is set to '/short/$PROJECT/$USER/UM_ROUTDIR'
    00115:		   Job vajd.x: Entry is set to '/data/nwp/nm'
    
    00123:		Entry box: Specify revision number or keyword of code base to use
    00124:		   Job vajd.a: Entry is set to 'HEAD'
    00125:		   Job vajd.x: Entry is inactive
    00126:	
    00127:	
    00128:		Check box: Use precompiled build
    00129:		   Job vajd.a: Entry is set to 'OFF'
    00130:		   Job vajd.x: Entry is set to 'ON'
    00131:	
    00132:	
    00133:		Check box: Include modifications from branches
    00134:		   Job vajd.a: Entry is set to 'OFF'
    00135:		   Job vajd.x: Entry is set to 'ON'
    00136:	
    00137:	
    00138:		Check box: Use different version of the UM code base from the default for this UMUI version
    00139:		   Job vajd.a: Entry is set to 'ON'
    00140:		   Job vajd.x: Entry is set to 'OFF'
    00141:	
    00142:	
    00143:		Entry box: The Subversion URL (UM_SVN_URL)
    00144:		   Job vajd.a: Entry is set to 'https://access-svn.nci.org.au/svn/um/branches/dev/axs599/um8.5_ps34'
    00145:		   Job vajd.x: Entry is set to 'fcm:um-tr'
    
    00153:		Entry box: Specify revision number or keyword of JULES code base
    00154:		   Job vajd.a: Entry is set to 'HEAD'
    00155:		   Job vajd.x: Entry is set to 'um8.5'
    00156:	
    00157:	
    00158:		Entry box: The Subversion URL (JULES_SVN_URL)
    00159:		   Job vajd.a: Entry is set to 'https://access-svn.nci.org.au/svn/jules/branches/dev/axs599/jules8.5b_ps34'
    00160:		   Job vajd.x: Entry is set to 'fcm:jules-tr'
    00161:	
    00162:	
    00163:		Check box: Include modifications from branches
    00164:		   Job vajd.a: Entry is set to 'OFF'
    00165:		   Job vajd.x: Entry is set to 'ON'
    
    00173:		Entry box: Filename for the Model executable
    00174:		   Job vajd.a: Entry is set to '${RUNID}_um-atmos.exe'
    00175:		   Job vajd.x: Entry is set to 'um-atmos.exe'
    00176:	
    00177:	
    00178:		Entry box: Filename for the Reconfiguration executable
    00179:		   Job vajd.a: Entry is set to '${RUNID}_um-recon.exe'
    00180:		   Job vajd.x: Entry is set to 'um-recon.exe'
    
    00188:		Check box: Including the following list of user file overrides
    00189:		   Job vajd.a: Entry is set to 'OFF'
    00190:		   Job vajd.x: Entry is set to 'ON'
    
    00199:		Differences in Table Specify the STASHmaster files
    00200:	 	1,4c1,4
    00201:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/st_0_246
    00202:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/tca_up_to_6km
    00203:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/STASHmaster_thermal
    00204:		<  /g/data1/dp9/axs599/ps34/user_stashmaster/eg_test_stmaster
    00205:		---
    00206:		>  ~gmdd/um/userstash/vn8.5/st_0_246
    00207:		>  ~gmdd/um/userstash/vn8.5/tca_up_to_6km
    00208:		>  ~gmdd/um/userstash/vn8.5/STASHmaster_thermal
    00209:		>  ~gmdd/um/userstash/vn8.5/eg_test_stmaster
    00210:		
    
    


  • On "Submit", the submission failed because the qsub command was not found:
    Submitting umui_runs/vajda-047163645/stage_1_submit via 'qsub' on raijin.nci.org.au
    /bin/bash: qsub: command not found
    MAIN_SCR: Submit failed
    
    • Try adding "module load pbs" to .profile
    • For now, work around this by running qsub manually on raijin
    • Investigate whether the UMUIX setup on accessdev can be updated
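
A guard of the following kind could go into .profile to test the first suggestion. It is only a sketch: it assumes that "module load pbs" is what provides qsub on raijin (as proposed above, not yet confirmed), and the function names have_cmd and ensure_qsub are hypothetical.

```shell
# Return success if the named command (or shell function) is available.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Load the PBS module only when qsub is missing and `module` is defined.
# Returns success only if qsub is available afterwards.
ensure_qsub() {
    if have_cmd qsub; then
        return 0
    fi
    if have_cmd module; then
        module load pbs
    fi
    have_cmd qsub
}
```

Calling ensure_qsub from .profile and checking its return status would show whether a non-interactive login from accessdev ends up with qsub on its PATH.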


  • With manual qsub, the build job failed after exceeding its walltime limit (3647 s used against a limit of 3600 s):
    axs599@raijin4 5056>   tail -18  /home/599/axs599/output/vajda000.vajda.d15047.t163647.comp.leave
    mpif90 -o ni_conv_ctl.o -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umatmos/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/JULES/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/JULES/inc -I/short/dp9/axs599/UM_ROUTDIR/axs599/vajda/baserepos/UMATMOS/inc -O3 -xHost -fp-model precise -g -traceback -mcmodel=medium -g -i8 -r8      -openmp -c /short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umatmos/ppsrc/UM/atmosphere/convection/ni_conv_ctl.f90
    ifort: command line warning #10212: -fp-model precise evaluates in source precision with Fortran.
    ifort: command line remark #10010: option '-pthread' is deprecated and will be removed in a future release. See '-help deprecated'
    =>> PBS: job killed: walltime 3647 exceeded limit 3600
    make: *** [ni_conv_ctl.o] Terminated
    ======================================================================================
    			Resource Usage on 2015-02-17 15:00:51.891711:
    	JobId:  9268436.r-man2  
    	Project: dp9 
    	Exit Status: 271 (Linux Signal 15)
    	Service Units: 6.08
    	NCPUs Requested: 6				NCPUs Used: 6
    							CPU Time Used: 01:00:14
    	Memory Requested: 9000mb 			Memory Used: 664mb
    							Vmem Used: 818mb
    	Walltime requested: 01:00:00 			Walltime Used: 01:00:49
    	jobfs request: 100mb				jobfs used: 1mb
    ======================================================================================
    axs599@raijin4 5057>  
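
To see quickly by how much a killed job overran, the "job killed: walltime" line in a .leave file can be parsed. A minimal sketch, assuming the PBS message keeps the wording shown in the log above; the function name pbs_walltime_overrun is hypothetical.

```shell
# Hypothetical helper: report seconds used, the limit, and the overrun
# from a PBS "job killed: walltime" line. Reads the named files, or stdin.
pbs_walltime_overrun() {
    awk '/PBS: job killed: walltime/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "walltime") used = $(i + 1)   # number after "walltime"
            if ($i == "limit") limit = $(i + 1)     # number after "limit"
        }
        printf "used=%d limit=%d over=%d\n", used, limit, used - limit
    }' "$@"
}
```

For example, `pbs_walltime_overrun /home/599/axs599/output/vajda000.vajda.d15047.t163647.comp.leave` would report the overrun for the build log shown above.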
    


  • With the walltime increased significantly, the build job finally succeeded in building the um-atmos executable, but failed to build the qxreconf executable:
    /short/dp9/axs599/UM_ROUTDIR/axs599/vajda/umrecon/ppsrc/UM/control/misc/ukmo_grib_mod.f90(108): error #6404: This name does not have a type, and must have an explicit type.   [ZHOOK_OUT]
    IF (lhook) CALL dr_hook('DECODE',zhook_out,zhook_handle)
    ---------------------------------^
    
  • Seek advice from Scott Wales and Martin Dix
  • Try out standard um8.5 build job
    • This job (vajdy) built and ran successfully.

  • Use vajdy to build qxreconf executable using ps34 source (from my branch).
    • This also built successfully.




Reconfiguration job to re-instate ancil fields

This reconfiguration job restores the ancil fields that are stripped from the daily-downloaded UKMO initial conditions files qwqg00.reduced.YYYYMMDD400.T+3.gz

  • Set up job vajdf from UKMO's dljtc
    • from http://collab.metoffice.gov.uk/twiki/pub/Support/ParallelSuite34/basis_dljtc.rcf.um8.5.gz


  • Ran into a problem caused by unrecognised namelists:
    ????????????????????????????????????????????????????????????????????????????????
    ???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
    ? Error in routine: check_iostat
    ? Error Code:    19
    ? Error Message:  Error reading namelist temp_fixes. Please check input list against code.
    ? Error generated from processor:     0
    ? This run generated   0 warnings
    ????????????????????????????????????????????????????????????????????????????????
    
    
  • The above problem and similar namelist issues were solved by turning off all hand-edits in vajdf


  • The job then complained about the vertical levels file:
    Vertical Levels file: /projects/access/umdir/vn8.5/ctldata/vert/vertlevs_L70_50t_20s_80km
    ????????????????????????????????????????????????????????????????????????????????
    ???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
    ? Error in routine: Rcf_Read_Namelists
    ? Error Code:    80
    ? Error Message: Vertical Levels Namelist file does not exist!
    ? Error generated from processor:     0
    ? This run generated   1 warnings
    ????????????????????????????????????????????????????????????????????????????????

  • Replace reference to vertlevs_L70_50t_20s_80km with vertlevs_L70_80km
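
Before choosing a replacement, it can help to list which vertical levels files are actually installed. A minimal sketch; the function name list_vertlevs is hypothetical, and the directory in the usage note is taken from the error message above.

```shell
# Hypothetical helper: list files in directory $1 whose names start with
# prefix $2 (for example vertlevs_L70), one base name per line.
list_vertlevs() {
    for f in "$1"/"$2"*; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

For example, `list_vertlevs /projects/access/umdir/vn8.5/ctldata/vert vertlevs_L70` would show the available 70-level files.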


  • After that, the reconfiguration job went on to produce an astart file with 142 field types,
    • but it eventually aborted, complaining about the absence of Section 0, Item 418:

    ????????????????????????????????????????????????????????????????????????????????
    ???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
    ? Error in routine: Rcf_Set_Data_Source
    ? Error Code:    30
    ? Error Message: Section   0 Item   418 : Required field is not in input dump!
    ? Error generated from processor:     0
    ? This run generated   1 warnings
    ????????????????????????????????????????????????????????????????????????????????

  • According to STASHmaster, the field is "Dust parent soil clay fraction"
    >  grep 418   STASHmaster_A
    1|    1 |    0 |  418 |Dust parent soil clay fraction (anc)|
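
A plain "grep 418" can also match other fields that happen to contain 418, so a lookup keyed on section and item is more precise. The sketch below assumes the field layout visible in the grep output above (record type, model, section, item, name, separated by "|"); the function name stash_name is hypothetical.

```shell
# Hypothetical helper: print the name of STASH section $1, item $2,
# reading STASHmaster records from the remaining file arguments (or stdin).
stash_name() {
    sec=$1; item=$2; shift 2
    awk -F'|' -v sec="$sec" -v item="$item" '
        # Record type 1 lines carry model, section, item and name
        $1 + 0 == 1 && $3 + 0 == sec + 0 && $4 + 0 == item + 0 {
            name = $5
            sub(/[ \t]+$/, "", name)   # trim trailing padding
            print name
        }' "$@"
}
```

Usage would be, for example, `stash_name 0 418 STASHmaster_A`.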
    
    
  • Studied UMUI job vajdf again and found that the ancil setting to add "SOILDUST" is made through the Scientific section:
    • Model Selection
      • Atmosphere
        • Scientific Parameters and Sections
          • Section by section choices
            • -- Section 17: Aerosols
              • Follow-up panel "DUST"
  • Turn "dust" on. Enter $UM_ANCIL_SOILDUST_DIR & $UM_ANCIL_SOILDUST_FILE in the relevant boxes
  • The job ran much further but failed after exceeding its memory limit:
    /projects/access/umdir/vn8.5/linux/scripts/qsrecon: Executing dump reconfiguration program
    
    *********************************************************
    RCF Executable : /short/dp9/axs599/UKD/ps34/bin/vajdy_qxreconf
    *********************************************************
    
    
    =>> PBS: job killed: mem 22012688kb exceeded limit 8192000kb
    mpiexec: killing job...
    
    ======================================================================================
    			Resource Usage on 2015-02-25 15:40:07.653319:
    	JobId:  9408219.r-man2  
    	Project: dp9 
    	Exit Status: 271 (Linux Signal 15)
    	Service Units: 0.03
    	NCPUs Requested: 4				NCPUs Used: 4
    							CPU Time Used: 00:00:57
    	Memory Requested: 8000mb 			Memory Used: 21497mb
    							Vmem Used: 30892mb
    	Walltime requested: 00:10:00 			Walltime Used: 00:00:28
    	jobfs request: 100mb				jobfs used: 1mb
    ======================================================================================
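
The kill message reports memory in kb while the usage summary reports mb, so a small conversion helps when deciding on a new request. A minimal sketch, assuming the PBS message keeps the wording shown above and that 1 mb = 1024 kb (which agrees with the 8000mb request and 21497mb usage in the summary); the function name pbs_mem_overrun is hypothetical.

```shell
# Hypothetical helper: convert the kb figures in a PBS "job killed: mem"
# line to mb and report how many times the limit was exceeded.
pbs_mem_overrun() {
    awk '/PBS: job killed: mem/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "mem") used = $(i + 1) + 0      # "22012688kb" -> 22012688
            if ($i == "limit") limit = $(i + 1) + 0   # "8192000kb"  -> 8192000
        }
        printf "used_mb=%.0f limit_mb=%.0f ratio=%.1f\n", used / 1024, limit / 1024, used / limit
    }' "$@"
}
```

For the log above this gives a usage of roughly 2.7 times the limit at the moment the job was killed, which is a lower bound on what the reconfiguration actually needs.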
    


  • Even after a significant increase in the memory allocation, the memory problem persists

  • ... to be continued



UM model run job

  • TO--BE--ADDED



UKD Suite

  • TO--BE--ADDED




======================================================================================

Last modified on Feb 25, 2015 8:40:01 PM