Opened 2 years ago

Last modified 19 months ago

#323

Recommended build configuration

Reported by: Martin Dix
Component: ACCESS-CM2
Description

Page to track changes to the UM build configuration in the coupled model.

Should we set up a branch that merges all these?

comment:1 Changed 2 years ago by Martin Dix

Main branch for ACCESS-CM2 coupling code. Includes recent fixes for rdump/sdump.
Supports use of a flexible calendar in UKCA emissions.
10-15% time saving by avoiding an unnecessary gather to PE0 for partial mean files.
Fixes an OpenMP bug in the loop over ice categories. Required when using more than 1 thread.
Reduces DRHOOK overhead. Very little impact on runs without DRHOOK.

comment:2 Changed 2 years ago by rb4844

Merging would be good.

Also, Arnold is looking at a new suite which combines some other recent changes including openmpi/1.10.2 and reduced model output.

See also a new ticket regarding MOM compiler flags.

So, maybe establish a new 'reference' suite we can all work from.

And if this could have the CICE build in the right place.

My 'wish' list would also include removing many/all of the extra compilers etc. that we don't use, and perhaps standardising some of the naming so it's a bit clearer what some bits are for. For example, name the CICE build/compiler appropriately, perhaps with reference to the component version number (e.g. MOM5_1, CICE5), the fact that it's for CM2, and its own version number?

comment:3 Changed 22 months ago by Martin Dix

Experiments with UM compiler options. Each case is 10 x 1-day runs on the Broadwell nodes with a 14x20 decomposition, intel-fc/ No STASH output and no dump written.

Times are for atm_step as reported by the internal timer. This excludes the model startup and so should be more representative of long runs.

Case | Best (s) | Mean (s) | Notes
safe: -O2 -fp-model precise | 89.34 | 90.99 |
high: -xavx -O3 -fp-model precise | 91.14 | 93.82 |
-xavx2 -O2 -fp-model precise | 90.48 | 93.57 | Not reproducible across decompositions
-O2 -fp-model precise -align array64byte | 88.07 | 90.88 |
-O2 -fp-model precise -align array64byte -fast-transcendentals -fimf-precision=high | 86.70 | 88.36 | Not reproducible across decompositions
-align array64byte -O3 -fp-model fast | 82.02 | 85.29 | Not reproducible across decompositions

If we want to maintain reproducibility across different decompositions then the only option that helps is -align array64byte.
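
To make the comparison easier to repeat, here is a rough Python sketch that expresses each case as a percentage saving over the "safe" configuration and picks out the cases that are both faster and reproducible. The timings are copied from the table in this comment. The names (CASES, percent_saving, summarise) are just illustrative, and the reproducibility flag assumes that an empty Notes column means the case did reproduce across decompositions.

```python
# Timings in seconds for atm_step, copied from the table in comment:3.
# Each entry: (compiler flags, best, mean, reproducible across decompositions).
# ASSUMPTION: reproducible is True wherever the Notes column is empty.
CASES = [
    ("-O2 -fp-model precise", 89.34, 90.99, True),
    ("-xavx -O3 -fp-model precise", 91.14, 93.82, True),
    ("-xavx2 -O2 -fp-model precise", 90.48, 93.57, False),
    ("-O2 -fp-model precise -align array64byte", 88.07, 90.88, True),
    ("-O2 -fp-model precise -align array64byte "
     "-fast-transcendentals -fimf-precision=high", 86.70, 88.36, False),
    ("-align array64byte -O3 -fp-model fast", 82.02, 85.29, False),
]

BASELINE = CASES[0]  # the "safe" configuration


def percent_saving(baseline_s, case_s):
    """Return the time saved relative to the baseline, as a percentage."""
    return 100.0 * (baseline_s - case_s) / baseline_s


def summarise(cases, baseline):
    """Return (flags, best saving %, mean saving %, reproducible) per case."""
    rows = []
    for flags, best, mean, reproducible in cases:
        rows.append((
            flags,
            percent_saving(baseline[1], best),
            percent_saving(baseline[2], mean),
            reproducible,
        ))
    return rows


SUMMARY = summarise(CASES, BASELINE)

# Cases that beat the baseline on both best and mean time and stay reproducible.
HELPFUL_AND_REPRODUCIBLE = [
    flags
    for flags, best_pct, mean_pct, reproducible in SUMMARY
    if reproducible and best_pct > 0 and mean_pct > 0
]

if __name__ == "__main__":
    for flags, best_pct, mean_pct, reproducible in SUMMARY:
        tag = "reproducible" if reproducible else "NOT reproducible"
        print(f"{best_pct:+6.2f}% best  {mean_pct:+6.2f}% mean  {tag}  {flags}")
    print("Faster and reproducible:", HELPFUL_AND_REPRODUCIBLE)
```

With only 10 runs per case the small differences are probably within the noise, so treat the percentages as indicative only.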

It's possible that -xavx2 would help on some routines.

comment:4 Changed 21 months ago by Martin Dix

branches/dev/martindix/vn10.6_gregorian_climate_means@42947 fixes a bug when using >= 1000 MPI processes in the UM.

comment:5 Changed 19 months ago by Martin Dix

Owner: Martin Dix
Status: assigned
