
CAWCR-BoM ACCESS NWP Ngamai Migration Working Group




CAWCR-BoM ACCESS-NWP Ngamai Porting Working Group Meeting Notes

Meeting 9: Wednesday 4th September 2013, 9E Meeting Room
Present: Robin Bowen, Joerg Henrichs, Ed Habjan, Ilia Bermous, Jim Fraser, Joan Fernon, Wenming Lu, Martin Dix, Chris Tingwell, Michael Naughton, Asri Sulaiman, Yi Xiao
Apologies: Zhihong Li, Robert Jukic, Ivor Blockley, Peter Steinle


Agenda

  • Follows notes from previous meeting.

Suites

AG1

  • Disk usage
    • Current disk usage is close to the maximum.
    • Can stop the suite, which has run to mid-August (verification already done, with acceptable results).
    • Consider deleting pi files and re-running the suite to regenerate them as needs arise.
    • ACTION: On-going monitoring and management.
  • Plots
    • Verification essentially OK
  • Gary's Diagnostics
    • Exploratory task
    • ACTION: Gary to report.
  • NMOC AG1
    • Joan's suite has started cycling.
    • Started with input from 25/6, now up to 30/6. Running several model days per day.
    • MARS7 is not yet ready for archiving.
    • opdata capacity is 80 TB, sufficient for 2 months' worth of output.
    • pi files will be deleted once MARS archiving is done.
    • MARS7 is expected to be available by next week. It will be a disk-only system. With 40 TB capacity, and only a subset of the fields from the pi files to be archived, it should be sufficient for several months of archiving by all the suites.
  • New SAM capacity will be discussed with Richard Oxbrow.
  • LSDSS disks are already mounted on ngamai.
  • Verification of the NMOC AG1 suite requires MARS; it will need to wait until MARS is available.
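
The capacity figures quoted above can be turned into a rough output-rate estimate. The following is a minimal sketch only: the function names are hypothetical, "2 months" is assumed to mean 60 days, and the only input taken from the notes is the 80 TB opdata capacity.

```python
def implied_daily_output_tb(capacity_tb, days):
    """Average output rate (TB/day) that would fill capacity_tb in the given number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return capacity_tb / days


def days_until_full(capacity_tb, daily_tb, used_tb=0.0):
    """Days of output that still fit, given the current usage and a daily output rate."""
    if daily_tb <= 0:
        raise ValueError("daily_tb must be positive")
    return max(capacity_tb - used_tb, 0.0) / daily_tb


# Assumption: "2 months" is taken as 60 days; 80 TB is the opdata capacity from the notes.
rate = implied_daily_output_tb(80.0, 60)
print(f"Implied output rate: {rate:.2f} TB/day")
```

The same helper can be reused for the MARS7 archive once the size of the archived subset of fields is known; that figure is not given in these notes.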

AR1

  • Has now run from 25 Jun to 8 Aug. Will stop at 11 Aug.
  • Joan is close to starting NMOC's AR1
    • consolidating directories
    • save to svn
  • ciwt has done 1 month of preliminary verification
    • Virtually identical results to solar, apart from expected variation in wind biases.
    • MSLP SI absolutely identical
    • OPS and VAR performance also virtually identical.
    • Can declare the suite as OK.
  • No need to continue further with CAWCR's ACCESS-R on ngamai.
  • Gary's plots will also be useful on ACCESS-R results. Mike to find someone to take this up.
  • The suite has also incorporated speed-ups from Ilia and Joerg.
  • Speed of reconfiguration is still a concern, with large runtime variations
    • R12 dump files are much larger than global suite's.
    • Reconfiguration may take between 3 and 20 minutes.
    • This problem was not observed on solar.
  • Timing information on individual tasks from Joan's runs.
  • Frames are being used for LBC generation in Joan's suite.
  • No benefit has been observed from using half-hourly rather than hourly LBCs in the ACCESS-C suite
    • Should update run settings to use hourly output: significant saving of disk space and run time.
    • Joan to coordinate with Chris to implement the change
    • Preferably done before start of Joan's ACCESS-R suite.

Executables

  • Joerg has been investigating run time variations
    • Initial runs have variation up to 100%
    • Over hundreds of runs, variation is about 10%
    • There appears to be a messaging bottleneck
      • Modified messages to reduce size
        • Removed combining of several halo exchanges.
    • Problem does not occur on raijin; a difference in the IB network is suspected.
  • Ilia has tried runs with a "sleep" between runs
    • No improvement observed.
    • Second run still has up to 100% variation.
    • First run can have up to 20% variation.
  • Ed said Oracle will investigate several areas
    • source code
    • MPI library
    • System configurations
      • buffers
      • Infiniband drivers.
  • There are a great number of runtime settings possible with Intel MPI.
  • Open MPI appears to have fewer, but according to Joerg it also has a large number of changeable runtime settings.
  • It may be worthwhile to communicate with Paul Selwood of UKMO as well as MPI developers.
  • Joerg will also look at re-configuration issues next week.
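
As a rough illustration of how variation figures like those quoted above could be derived from a set of wall-clock timings, here is a minimal Python sketch. The notes do not say how "variation" was defined. The sketch assumes it means the spread between the slowest and fastest run, relative to the fastest. The function name and the sample timings are hypothetical, not measurements from ngamai.

```python
def runtime_spread_percent(times):
    """Spread between slowest and fastest run, as a percentage of the fastest run."""
    if not times:
        raise ValueError("need at least one timing")
    fastest = min(times)
    if fastest <= 0:
        raise ValueError("timings must be positive")
    return 100.0 * (max(times) - fastest) / fastest


# Hypothetical wall-clock times in seconds for repeated runs of one executable.
sample_times = [600.0, 660.0, 1200.0]
print(f"Spread: {runtime_spread_percent(sample_times):.0f}%")
```

Under this definition a run that takes twice as long as the fastest one gives a spread of 100%. Other definitions (for example, deviation from the mean) would give different numbers for the same timings.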

AC1

  • Now running July to present.
    • It is possible to compare to legacy runs, but not that straightforward to do.
  • Differences observed in trial verifications of March/April results
    • Due to ntile settings?
    • Convective settings should be identical, but differ in re-runs.
  • No difference apart from machine for July.
  • Holly doing rainval plots
  • Chris Bridge & Xiaoxi Wu to run obs verification on surface fields.
  • Run Gary's plots

UIs

  • Basically OK; minor issues are still being ironed out.
  • To start planning the migration of all the UIs from solar to ngamai, together with SVN repositories and Trac databases.

OTHER BUSINESS

  • This porting meeting will now be conducted every fortnight.
    • update and revisit issues list for next meeting
    • review status of project documentation
  • It is now appropriate to broaden the scope to cover migrating everything that currently runs on solar, such as research suites.
    • Look to reconvene UM Working Group meetings.
  • Find out MARS status from Tan Le
  • Follow up on "Build Process"



NEXT MEETING: Wed 18th September, 11am, 9E Meeting Room.






[ 8/9/2013 ] azs, first cut. [ 8/9/2013 ] rab, fix some typos. [ 8/9/2013 ] mjn, few changes.

Last modified on Oct 20, 2016 12:45:58 PM