wiki:access/NewSun_007

CAWCR-BoM ACCESS NWP Ngamai Migration Working Group


CAWCR-BoM ACCESS-NWP Ngamai Porting Working Group Meeting Notes

Meeting 7: Wednesday 21st August 2013, 9E Meeting Room
Present: Michael Naughton, Jim Fraser, Chris Tingwell, Robin Bowen, Ilia Bermous, Wenming Lu, Zhihong Li, Yi Xiao, Joerg Henrichs, Joan Fernon, Asri Sulaiman.
Apologies: Robert Jukic, Martin Dix, Ivor Blockley


Agenda

  • Review meeting notes
  • Project management
  • Main items in task list
  • Other items from Task List
  • Other business


Review meeting notes

  • New format of meeting notes
  • Task table will be updated at intervals, not necessarily every meeting

Project management

  • Robin reported progress to the Steering Committee after the previous meeting. In general, they are happy with current progress.
  • ACTION: Re: relative priorities of Solar switch-off, Project Planning, Configuration Management -- Solar switch-off is the #1 priority. The other items are also required within the project, but must not delay the Solar switch-off timing. The project will continue after Solar switch-off until all steps are completed. Still working on drafts of steps.

Item 17: Building executables

  • Documented executable builds
    • ACTION: Asri to organise a discussion with Ilia, Xiao and Martin to agree on and finalise the details, based on Ilia's email (subject: "UMUI7.5 Building procedure", 8/8/2013). Initially targeted for Thu 22/8; delayed, to be rescheduled.
  • Have a wiki page for each of the documented builds.

Item 26: APS1 NWP suites

AG1

  • Xiao's ngamai AG1 test NWP suite has caught up to the current date. Now running daily.
  • All verification so far is fine.
  • Disk usage issue: The suite is saving terabytes of px files, threatening to use up all of ngamai's disk space.
  • It is not practical to save px files to sam, but fields may be archivable to MARS.
  • An option is to save as "frame" files -- to be considered, but not yet.
  • op-research will have limits of 50 TB of FLUSH and 25 TB of DATADIR on ngamai, but temporarily (1-2 weeks) will be able to use more than that 75 TB total.
  • ACTION: Wenming to produce "ACCESS-GN" charts when chart plotting on ngamai is available. REPORT: Work in progress
  • ACTION: Look at running Gary Dietachmayer et al.'s prototype diagnostic tools on porting versions. REPORT:
  • Joan reported good progress in setting up NMOC's access-G suite. Going through the top-level scripts, with a few left -- a few more days? When ready, it will run starting from July data.
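
The FLUSH and DATADIR limits noted above (50 TB + 25 TB = 75 TB) can be checked against usage with a few lines of Python. This is only a minimal sketch: the function name `quota_report` and the usage figures in the example are hypothetical, and how the usage numbers are actually obtained on ngamai is not covered in these notes.

```python
# Limits quoted in the notes for op-research on ngamai, in terabytes.
LIMITS_TB = {"FLUSH": 50.0, "DATADIR": 25.0}


def quota_report(usage_tb):
    """Compare usage per area (TB) with the limits; missing areas count as 0."""
    report = {}
    for area, limit in LIMITS_TB.items():
        used = usage_tb.get(area, 0.0)
        report[area] = {"used": used, "limit": limit, "over": used > limit}
    # Totals are computed before the TOTAL entry is added to the report.
    total_used = sum(entry["used"] for entry in report.values())
    total_limit = sum(LIMITS_TB.values())
    report["TOTAL"] = {
        "used": total_used,
        "limit": total_limit,
        "over": total_used > total_limit,
    }
    return report


if __name__ == "__main__":
    # Hypothetical usage figures, for illustration only.
    for area, entry in quota_report({"FLUSH": 58.0, "DATADIR": 12.5}).items():
        flag = "OVER" if entry["over"] else "ok"
        print(f"{area:8s} {entry['used']:6.1f} / {entry['limit']:6.1f} TB  {flag}")
```

Reporting the total separately from the per-area figures reflects the note that the combined 75 TB may be exceeded temporarily, so an area can be over its own limit while the total is still within bounds.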

AR1

  • Xiao now has the test cycling running with separate run and fetch jobs.
  • 4-5 days done so far.
  • Use Research MARS for archiving
  • Load impact on MARS needs to be monitored
  • Run starts from end of June and continues to the current date
  • Verify plots to be included
  • Move archiving from MARS7 to MARS1
  • Can use up to a maximum of 37 cores N-S, with elapsed time < 1 hr.
  • The question of reducing from 2 runs to 1 run a day, extended by an extra 6 hours, is a discussion for the APS Working Group.
  • Domain decomposition cannot be as large as AG2.

AC1

  • Crash issue with I.C. from Joan -- probably due to an incompatible number of input fields.
  • This may be fixable using a Python utility. A reconfiguration step is another possible workaround.
  • "rainval" plot likely useful
  • ACTION: Wenming to produce "ACCESS-CN" charts when chart plotting on ngamai is available. REPORT: Work in progress
  • ACTION: Wenming to look at run elapsed times to study runtime variation.
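
One simple way to summarise run-to-run variation in elapsed times is sketched below. The function name `runtime_variation` and the choice of measuring the spread relative to the fastest run are assumptions made for illustration, not a method agreed in the meeting.

```python
from statistics import mean


def runtime_variation(elapsed_minutes):
    """Summarise a set of elapsed times (minutes) for repeated runs."""
    times = list(elapsed_minutes)
    if not times:
        raise ValueError("need at least one elapsed time")
    fastest, slowest = min(times), max(times)
    return {
        "mean": mean(times),
        "min": fastest,
        "max": slowest,
        # Spread of the slowest run relative to the fastest, in percent.
        "spread_pct": 100.0 * (slowest - fastest) / fastest,
    }
```

With this definition, a slowest run that takes twice as long as the fastest gives a spread of 100%.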

Item 18: MARS

  • Updates regarding MARS on ngamai were sent out by email.
  • MARS is meant for running on dm nodes, not compute nodes.
  • ACTION: Robin to document MARS aspects and status in email and on wiki. Work in Progress.
  • Dates regarding MARS7_dev and the status of tape drives for SDC need to wait until R. Oxboro gets back from leave (Sept)

Item 6: v11 software

  • No action due on this; however, keep this item in the list for now.

UM 10% timing variations

  • ACTION: Joerg to continue these investigations.
  • Variation in UM7.5 narrowed down to swap_bounds_mv
  • Up to 100% variation in APS-R between partial/fully committed nodes.
  • Elapsed times for VAR down to 17 minutes from 24 minutes. These new times appear solid.
  • Reconfiguration can take 2.5-20 minutes
  • Use of Lustre striping is recommended. Optimal striping configuration to be investigated.
  • Part of Xiao's suite requires running 7 qxreconf instances in parallel (7 different files for different times)
  • Joerg to investigate qxreconf performance.
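
For looking at how several reconfiguration jobs behave when run side by side, a generic harness like the one below could be used to launch commands in parallel and time each one. It is a sketch only: the helper names are hypothetical, the actual qxreconf command line is not given in these notes, and the example uses a do-nothing Python command as a stand-in.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command and return (return code, elapsed seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return result.returncode, time.perf_counter() - start


def run_in_parallel(commands, max_workers=7):
    """Run the commands concurrently; results keep the order of `commands`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed_run, commands))


if __name__ == "__main__":
    # Stand-in commands; replace with the real reconfiguration invocations.
    commands = [[sys.executable, "-c", "pass"] for _ in range(7)]
    for i, (rc, seconds) in enumerate(run_in_parallel(commands)):
        print(f"job {i}: rc={rc} elapsed={seconds:.2f}s")
```

Threads are sufficient here because each worker only waits on an external process; the default of 7 workers simply matches the 7 parallel jobs mentioned above.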

/apps

  • Still in progress - finishing up stage?
  • Perl issue related to SCS reported.
  • Consideration of putting MARS in /apps. (Currently MARS utils are in $CWSHARE)

Difference between plots created on raijin / ngamai

  • Different matplotlib version?
  • To be investigated further, but this issue is not critical
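
To check the matplotlib version question, the same small script could be run on raijin and ngamai and the outputs compared. The sketch below assumes Python 3.8 or later (for `importlib.metadata`), which may not match the Python installed on either machine; the function name `environment_summary` is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_summary(packages=("matplotlib",)):
    """Return host name, Python version and installed package versions."""
    summary = {"host": platform.node(), "python": platform.python_version()}
    for name in packages:
        try:
            summary[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            summary[name] = None
    return summary


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```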

Other business

  • Tech Talk ??
  • Open MPI icc -vs- gcc -- close and move to raijin porting issues.
  • Work on Verify in progress (rab)
  • Change resolution
  • Speed-up required from matplotlib
  • ADDITION to note in next meeting (#15 from task list): UI Bigfont issue resolved. Some tidy-ups remain. Migration plan to be drawn up.

Last modified on Feb 3, 2015 4:33:50 PM