Opened 3 years ago

Closed 3 years ago

#262 closed (fixed)

Upgrade to cylc 6.8.1, rose 2016.02.0

Reported by: Martin Dix
Owned by: Martin Dix
Priority: minor
Component: Accessdev Server
Keywords:
Cc:

Description

Upgrade cylc, rose and fcm to latest versions

Change History (5)

comment:1 Changed 3 years ago by Martin Dix

Owner: set to Martin Dix
Status: new → assigned

Modify cylc on raijin as described in https://accessdev.nci.org.au/trac/ticket/199#comment:7

Cylc can now retrieve log files, replacing the functionality of our rose-task-hook2. To automatically fix the permissions so that rose bush can see them, add

        retrieve job logs command = rsync -a --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r

to the localhost section of the cylc global.rc file.
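As a rough check of what that --chmod string asks for, the sketch below translates it into octal modes. It is an illustration only: modes_from_spec is a hypothetical helper written for this note, not part of cylc or rsync, and it handles only the "=" form with D/F prefixes used above.

```python
# Sketch: translate the rsync --chmod spec from this ticket into octal modes.
# modes_from_spec is a hypothetical helper, not a cylc or rsync API, and it
# only understands clauses of the form <D|F><who>=<perms>.
CHMOD_SPEC = "Du=rwx,Dgo=rx,Fu=rw,Fgo=r"

PERM_BITS = {"r": 4, "w": 2, "x": 1}
WHO_SHIFT = {"u": 6, "g": 3, "o": 0}


def modes_from_spec(spec):
    """Return (directory_mode, file_mode) implied by an '='-only chmod spec."""
    modes = {"D": 0, "F": 0}
    for clause in spec.split(","):
        kind, rest = clause[0], clause[1:]  # D = directories, F = files
        who, perms = rest.split("=")
        bits = sum(PERM_BITS[p] for p in perms)
        for w in who:
            # '=' sets the bits for this class exactly, so clear them first.
            modes[kind] &= ~(7 << WHO_SHIFT[w])
            modes[kind] |= bits << WHO_SHIFT[w]
    return modes["D"], modes["F"]


dir_mode, file_mode = modes_from_spec(CHMOD_SPEC)
print(oct(dir_mode), oct(file_mode))  # 0o755 0o644
```

So retrieved directories end up world-readable and searchable (755) and retrieved files world-readable (644), which is what rose bush needs in order to display them.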

If this change is made to the cylc 6.7.2 config file, cylc seems to hang on job submission, so the puppet configuration needs to keep separate old and new versions of the file.

It's also possible to configure cylc to retrieve the files automatically and to set retry delays (the alternative is to set something like this in every suite):

     retrieve job logs = True
     retrieve job logs retry delays = PT10S,  PT30S, PT1M, PT3M

Making it a default would make it more likely that suites from the Met Office would do the right thing without change. It could still be turned off on a per suite basis (or in a user site config file).
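To get a feel for how long retrieval would keep retrying with the delays listed above, the sketch below sums them. It assumes the delays are simple ISO 8601 durations of the form PT..H..M..S. duration_seconds is a hypothetical helper for this note, not a cylc function, and the total ignores the time each retrieval attempt itself takes.

```python
import re

# Delay list copied from the configuration above.
RETRY_DELAYS = "PT10S,  PT30S, PT1M, PT3M"

# Only the time-of-day part of ISO 8601 durations (hours, minutes, seconds).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(text):
    """Convert a duration such as 'PT1M' or 'PT30S' to seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError("unsupported duration: %r" % text)
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


delays = [duration_seconds(d) for d in RETRY_DELAYS.split(",")]
print(delays, sum(delays))  # [10, 30, 60, 180] 280
```

With these values the delays add up to 280 seconds, a little under five minutes.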

Last edited 3 years ago by Martin Dix

comment:2 Changed 3 years ago by Martin Dix

Now using rose 2016.02.1

comment:3 Changed 3 years ago by Martin Dix

Setting retrieve job logs = False in a suite doesn't work properly at the moment. This will be fixed in the next release: https://github.com/cylc/cylc/pull/1728

Setting a limit with retrieve job logs max size doesn't behave as expected. If the limit means that the job.out or job.err file isn't retrieved, cylc will keep trying until the retry delays are exhausted. See
https://groups.google.com/forum/#!topic/cylc/CNIdwwOJo9g

https://github.com/cylc/cylc/pull/1747 changes this to test either job.out or job.err, which should fix things in most cases.

The next release will also fix the high CPU load issue:
https://github.com/cylc/cylc/issues/1744

The next release is expected on March 9, so it is better to wait.

Last edited 3 years ago by Martin Dix

comment:4 Changed 3 years ago by Martin Dix

Now using cylc 6.9.1 and rose 2016.03.0

Setting retrieve job logs = False in either the root section of a suite or for a particular task works.

Setting a log file size limit now works as expected, with no retry delay. The size limit is set to 1M initially.

comment:5 Changed 3 years ago by Martin Dix

Resolution: fixed
Status: assigned → closed