Opened 18 months ago

Closed 8 weeks ago

#347 closed (fixed)

Error in retry of a failed run

Reported by: Martin Dix Owned by:
Priority: major Component: ACCESS-CM2
Keywords: Cc:

Description

Suites are configured to retry automatically on failure. This works for an MPI-related failure right at the start of a run.

However, if the model runs for a few days and writes the partial-sum files before failing, the retry will fail because these partial sums are later than the restarted model time.
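The failure condition can be sketched as a simple comparison. This is a minimal illustration only: it assumes model time is expressed as an integer day count (which sidesteps calendar differences), and the helper names `stale_partial_sums` and `retry_is_safe` are hypothetical, not part of the UM or the suite.

```python
def stale_partial_sums(partial_sum_times, restart_time):
    """Return partial-sum model times later than the restart model time.

    Hypothetical helper: times are integer model days (an assumption).
    """
    return [t for t in partial_sum_times if t > restart_time]


def retry_is_safe(partial_sum_times, restart_time):
    """A retry from the restart is consistent only if no partial sum is newer."""
    return not stale_partial_sums(partial_sum_times, restart_time)


# Early MPI failure: no partial sums written yet, so the retry works.
print(retry_is_safe([], 0))         # True
# Model ran 3 days writing daily partial sums, then failed: retry breaks.
print(retry_is_safe([1, 2, 3], 0))  # False
```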

Change History (2)

comment:1 Changed 18 months ago by Martin Dix

Not an issue for the Met Office because they write both restarts and partial sums every 10 days.

The Gregorian-calendar run writes the partial sums every day.

Removing them before the retry should be OK as long as we only save monthly means from the UM, not seasonal means.
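A pre-retry cleanup step along these lines could look like the sketch below. It is hypothetical: the directory and the `*.psum*` filename pattern are placeholders, since the real location and naming of the UM partial-sum files depend on the suite configuration, and per the caveat above this is only appropriate when only monthly means are saved.

```python
import glob
import os
import tempfile


def remove_partial_sums(data_dir, pattern):
    """Delete partial-sum files matching `pattern` in `data_dir`.

    Both arguments are hypothetical placeholders for the suite's real
    partial-sum directory and file naming.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(data_dir, pattern))):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed


# Demonstration in a scratch directory with made-up file names.
with tempfile.TemporaryDirectory() as d:
    for name in ("run.psum1", "run.psum2", "run.restart"):
        open(os.path.join(d, name), "w").close()
    removed = remove_partial_sums(d, "*.psum*")
    remaining = sorted(os.listdir(d))

print(removed)    # ['run.psum1', 'run.psum2']
print(remaining)  # ['run.restart']
```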

comment:2 Changed 8 weeks ago by Martin Dix

Resolution: fixed
Status: assigned → closed

Writing the partial sums to /jobfs gets around this problem because /jobfs is not persistent, so a retried job starts without them. It should also be more efficient than using /short for these files.

https://code.metoffice.gov.uk/trac/roses-u/changeset/106959/b/f/4/8/1/trunk.
