Opened 20 months ago

Last modified 20 months ago

#345 new

Cylc site DB errors

Reported by: Martin Dix
Owned by:
Priority: major
Component: Accessdev Server
Keywords:
Cc: Jin Lee

Description

Jin has had several instances of a suite failing with the following error message in the suite log:

2017-10-18T14:02:31Z INFO - Suite shutting down - ERROR: database disk image is malformed

The "database disk image is malformed" message comes from sqlite3 trying to write to log/suite/db on accessdev.

However, the suite can be restarted successfully, which suggests that the DB isn't actually corrupted. This is unlike the problems we had with the rose-ana DB on raijin: https://github.com/metomi/rose/issues/1897
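A minimal sketch for confirming that a suite DB passes SQLite's own consistency check, using Python's standard `sqlite3` module and `PRAGMA integrity_check`. The function name `check_suite_db` is hypothetical. Point it at `log/suite/db`, ideally with the suite stopped or on a copy. A result of `["ok"]` means SQLite found no corruption.

```python
import sqlite3
from pathlib import Path


def check_suite_db(path):
    """Run SQLite's built-in consistency check on a database file.

    Hypothetical helper. Returns the messages produced by
    PRAGMA integrity_check; a database with no detected problems
    yields exactly ["ok"].
    """
    # Open via a read-only file: URI so the check cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, call `check_suite_db("log/suite/db")` from the suite's run directory. Opening the file read-only keeps the check from adding any writes of its own to a DB that is already under suspicion.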

Change History (2)

comment:1 Changed 20 months ago by Martin Dix

Could this be related to the occasional very slow disk access on accessdev, perhaps a misreported timeout problem?

However, the error occurred at about midnight local time. The NCI dashboard shows that the load average was < 5 then, cf. a peak of 25 in the middle of the day, which suggests the problem is not load related.

Last edited 20 months ago by Martin Dix
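One way to partly test the "misreported timeout" idea is to see what Python's `sqlite3` module reports when a write times out waiting for a lock. The sketch below uses a throwaway scratch database, not the suite DB. One connection holds an EXCLUSIVE lock while a second connection, with a short busy timeout, tries to insert a row. The function name `lock_timeout_message` is hypothetical.

```python
import os
import sqlite3
import tempfile


def lock_timeout_message(timeout=0.1):
    """Return the error text sqlite3 raises when a write times out on a lock.

    Hypothetical helper working on a scratch database in a temp directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.db")
        # isolation_level=None: this connection issues BEGIN/ROLLBACK itself.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE t (x INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")  # hold an exclusive lock
        # Second connection gives up after `timeout` seconds of waiting.
        writer = sqlite3.connect(path, timeout=timeout)
        try:
            writer.execute("INSERT INTO t VALUES (1)")
            return None  # not expected: the lock should block this write
        except sqlite3.OperationalError as exc:
            return str(exc)
        finally:
            writer.close()
            holder.execute("ROLLBACK")  # release the lock
            holder.close()
```

In this sketch a lock timeout surfaces as "database is locked", not "database disk image is malformed". A plain busy timeout therefore seems unlikely to be reported as the malformed error. The sketch says nothing about other slow-disk failure modes, though.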

comment:2 Changed 20 months ago by Martin Dix

cylc has profiling tests. To run them, copy the cylc directory and modify dev/profile-experiments/complex.json to use batch-system=background.

Run with

cylc profile-battery --experiments complex

On accessdev-test:

Version  Run            Elapsed Time (s)  CPU Time - Total (s)  Max Memory (kb)
HEAD     complex suite  3275.6            279.3                 83380.0       

Elapsed time is too large by a factor of 2 (an effect of the dual-CPU machine?).

The DB reached about 6 MB, cf. Jin's suite, which reached 10 MB over a much longer period.

No DB problems on accessdev-test.
