
CAWCR-BoM ACCESS NWP Ngamai Migration Working Group


CAWCR-BoM ACCESS-NWP Ngamai Porting Working Group Meeting Notes

Meeting 6: Wednesday 14th August 2013, 9E Meeting Room
Present: Michael Naughton, Jim Fraser, Chris Tingwell, Robin Bowen, Ilia Bermous, Martin Dix, Wenming Lu, Zhihong Li, Yi Xiao, Robert Jukic, Joerg Henrichs, Joan Fernon (phone)
Apologies: Asri Sulaiman, Ivor Blockley


Agenda

  • Review meeting notes
  • Project management
  • Main items in task list
  • Other items from Task List
  • Other business


Review meeting notes

  • New format of meeting notes
  • Task table will be updated at intervals, not necessarily every meeting

Project management

  • ACTION: Robin will draw up draft of steps and resource reporting, discuss with individuals responsible for each of the work items. Status update -- in progress.
  • Steering Committee meeting early next week; Robin will include WG progress.
  • Mike reported on discussion with Tim Pugh about the relative priorities of solar switch-off, Project Planning and Configuration Management. Solar switch-off is the #1 priority; the other items are also required within the project, but must not delay solar switch-off timing. The project will continue after solar switch-off until all steps are completed.

Item 26: APS1 NWP suites

AG1

  • Xiao's ngamai AG1 test NWP suite running successfully.
  • Started Friday, runs quickly, faster than solar, 1/2 hr for 00 & 12, 15min for 06 & 18.
  • Slowest step is retrieving obs files from sam; this was avoided by using files already retrieved and saved up at NCI for N512 cycling.
  • One month run from 25-Jun to 23-Jul.
  • Sample difference chart for Z500 72hr forc has considerably smaller values/range than diff between two AGREPS members (courtesy of David Smith).
  • Verified; almost identical with operational AG1 scores; skill/rms/anom_corr same, bias similar.
  • 3 crashes on weekend due to "bad nodes" system problems; reruns OK; reported to Oracle.
  • ACTION: Continue running for time being -- catch up to current time, then run in current time.
  • Chris, Zhihong have looked at OPS numbers and VAR convergence -- both look as similar to AG1 as expected for parallel run.
  • Chris noted that very first cycle outputs can't be examined, since operational run files from 25-Jun are no longer available; this has been discussed with NMOC, who are now keeping output files longer. First cycles will be examined for operational instance, and AR1 testing.
  • Martin has produced RMS difference vs time plots for several runs compared to reference solar run; these show that compiler and machine differences are comparable to other non-science changes, e.g. changing number of processors. Results are in APS1G_rmsdifferences. Note that this uses "Martin's Worms" (Item 17i) diagnostic.
  • ACTION: Wenming to produce "ACCESS-GN" charts when chart plotting on ngamai is available.
  • ACTION: Look at running Gary Dietachmayer et al.'s prototype diagnostic tools on porting versions.
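
For readers unfamiliar with the RMS difference vs time comparison noted above for AG1, the following is a minimal sketch of the general idea only. It is not the "Martin's Worms" diagnostic itself; the function names, the dict-of-lists data layout and the sample values are all assumptions made for illustration.

```python
import math


def rms_difference(field_a, field_b):
    """Root-mean-square difference between two equally sized lists of values."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of points")
    if not field_a:
        raise ValueError("fields must not be empty")
    total = sum((a - b) ** 2 for a, b in zip(field_a, field_b))
    return math.sqrt(total / len(field_a))


def rms_series(test_run, reference_run):
    """RMS difference at each forecast hour present in both runs.

    Each run is assumed to be a dict mapping forecast hour to a flat list
    of grid-point values (a hypothetical layout, not an ACCESS file format).
    """
    common_hours = sorted(set(test_run) & set(reference_run))
    return [
        (hour, rms_difference(test_run[hour], reference_run[hour]))
        for hour in common_hours
    ]


if __name__ == "__main__":
    # Made-up values for illustration only; not output from any ACCESS run.
    reference = {0: [1.0, 2.0, 3.0], 24: [1.0, 2.0, 3.0]}
    test = {0: [1.0, 2.0, 3.0], 24: [1.1, 1.9, 3.2]}
    for hour, rms in rms_series(test, reference):
        print(f"T+{hour:02d}h  RMS difference = {rms:.4f}")
```

Computing one such series per test run against the same reference run gives curves that can be plotted together, which is one way to compare the size of compiler or machine differences with other non-science changes.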

AR1

  • Xiao setting up ngamai AR1 test NWP suite.
  • Hopes to be up to running cycles in next week.
  • Plans to pre-fetch obs files from sam to save time.
  • Need to have both early and late cutoff cycles to reproduce operational running; late cutoff are used for assimilation cycling, early cutoff for 3-day forecasts.

AC1

  • Wenming's ngamai AC1 SMS suite running successfully.
  • 00Z 7-Jul case run successfully for all 6 ACCESS-C domains.
  • Run times 18-28% faster than solar; 18% for largest domain (VT), 28% for smallest.
    • Openmpi 1.6.5 used.
    • No -xHost.
    • -xavx made only small difference.
    • Still need to tune for #cores per node.
    • Machine is currently much less busy than solar, especially re disk activity; could see slowdowns later as machine usage rises; noting that in operations some ACCESS-C domains run at same time as ACCESS-R is still running.
    • Note also that ngamai i/o speed is approx. double solar.
  • Forecast fields look OK in xconv.
  • ACTION: Wenming to produce "ACCESS-CN" charts when chart plotting on ngamai is available.
  • ACTION: Wenming to run 1-month trial for 20-Mar to 20-Apr period using IC & LBC files Joan has available on solar, to eliminate need for retrieving AR1 ic & pi files from sam and re-processing.
  • NOTE: Later on, will probably also want to run summer period for AR1 testing.

Item 18: MARS

  • new linux version not yet ready on ngamai; Arn's team working on this.
  • old solar version now working on ngamai (Tan Le verbal update before meeting).
  • ACTION: Robin to document MARS aspects and status in email and on wiki.

Item 6: v11 software

  • No further steps on v11 installation since last week; Justin still on leave.
  • Reviewed priority of this task, given successful progress being achieved with v12 build versions.
    • Decided v11 option is now fairly unlikely to be needed, can be re-activated and achieved fairly quickly if the need does arise.
  • ACTION: v11 compiler building request to be placed on hold.

Item 17: Building executables

  • Documented executable builds
    • Ilia's email after last meeting explained issues with UM & VAR builds.
    • UM: need to modify UMUI scripts to enable software versions to be varied as needed.
    • ACTION: Ilia, Asri, Xiao, Martin to work this out.
    • VAR: need to handle extraction of correct versions of all sources for VAR building.
    • ACTION: Ilia, Xiao, Asri.
    • umui will also be needed on ngamai for ngamai building.
  • UM 10% timing variations
    • Joerg is investigating using Ilia's run environment, has not found variations as large as Ilia so far; has found run variations are dominated by swap-bounds, i.e. MPI exchange of haloes.
    • ACTION: Joerg to continue these investigations.

Other items from Task List

  • Item 3: Robin reported /apps progress.
    • Perl should be available now, as needed by fcm.
    • Python should be available by end of the week, matching what's available on solar.
    • Items required by Tan for MARS/GRIB are also being worked on.

Other business

  • NMOC 31-Jul Tech Talk on Ngamai Porting
    • Jim will request Ivor to repeat this Tech Talk for CAWCR porting team and others at a later date.
  • Item 6: Openmpi is built with icc on ngamai, gcc at NCI. Bureau has tried both; icc version is faster on Bureau machines. Joerg has also tried both on raijin, didn't see the same difference. (Could create new item for this if required.)
  • Item 36: Verify.
    • Jim asked about status of verify.
    • Robin reported that verify porting needs some additional /apps items first, which are on the list for installation.
    • raijin verify installation will need the additional /apps items to go into /access/apps.
      • Same goes for /apps items required for GRIB.

Last modified on Feb 3, 2015 4:33:49 PM