Opened 5 years ago

Closed 4 years ago

#148 closed (fixed)

Support multiple versions of cylc

Reported by: Scott Wales
Owned by: Martin Dix
Priority: major
Component: Accessdev Server
Keywords: TIWG
Cc:

Description

Cylc can support multiple versions side by side, by installing e.g. cylc-5 and cylc-6 as adjacent folders.

Perhaps something in puppet to support this like

class {'cylc':
    ...
    supported_versions => ['5.1.3', '5.3.6', '6.0.0', 'latest'],
    ...
}

Change History (14)

comment:1 Changed 5 years ago by Martin Dix

Owner: set to Martin Dix
Status: changed from new to accepted

Branch mrd599/cylc6 implements this, https://accessdev.nci.org.au/trac/log/puppet?rev=9403442d8f9b1e8c5942da98049758a2e9ba2c16

The cylc documentation recommends installing versions as /usr/local/cylc/cylc-5.4.14, /usr/local/cylc/cylc-6.0.0 and using cylc-wrapper as /usr/local/cylc/bin/cylc. The wrapper uses the CYLC_VERSION environment variable to select a particular version, defaulting to /usr/local/cylc/cylc (linked to 5.4.14 here).

At the moment modules/cylc/manifests/init.pp has been modified to add a cylc6 installation rather than using the more elegant list idea above. This should be cleaned up in the future.

The CYLC_VERSION variable gets passed to cylc on raijin. To get the correct version the wrapper script /projects/access/bin/rose needs to be modified to use

module load rose/$ROSE_VERSION
module load cylc/wrapper

This should work independently of the rose/cylc upgrade.

Cylc on raijin communicates back to accessdev via ssh, effectively doing

ssh accessdev /usr/local/cylc/bin/cylc ...

or

ssh accessdev /usr/local/cylc/cylc-6.0.0/bin/cylc ...

This is intercepted by the remote-job-submission script which sanitises it to

bash ... cylc

with a restricted path. This means that the version number is lost and all cylc versions on raijin connect to the same default version on accessdev. In practice this seems to work ok, perhaps because it's only doing simple status updates.

The only problem I’ve found is that the log file job-activity.log (new with cylc6) doesn’t get updated. This is just a diagnostic so doesn’t affect the running of suites.

It’d be nice to make it connect to the correct version but it would require some work on the remote-job-submission script which I don’t fully understand.
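
A minimal sketch of one way the version could be recovered before the command is sanitised, assuming the requested executable path has the form /usr/local/cylc/cylc-VERSION/bin/cylc as in the examples above. The function name cylc_version_from_path is hypothetical and is not part of the existing remote-job-submission script.

```sh
# Hypothetical helper: print the version embedded in a cylc executable path.
# Prints nothing when the path is the unversioned wrapper.
cylc_version_from_path() {
    case "$1" in
        */cylc-*/bin/cylc)
            version=${1%/bin/cylc}       # strip the trailing /bin/cylc
            version=${version##*/cylc-}  # strip everything up to "cylc-"
            printf '%s\n' "$version"
            ;;
        *)
            :                            # unversioned path: no output
            ;;
    esac
}
```

The result could then be exported as CYLC_VERSION before the sanitised command runs, so that the cylc wrapper picks the matching installation. Whether that fits the script's security model would still need checking.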

The new version of rose seems to run existing suites with cylc5 properly but rose-suite-hook (as used by rose-task-hook2) doesn’t bring the cylc5 style log files back to accessdev. Therefore it seems we need both versions of rose. Fortunately the new rose-bush seems to work ok with both log directory structures so we only need one instance of that.

I’ve created a wrapper /usr/local/rose/bin/rose that uses ROSE_VERSION to select /usr/local/rose/2014-05 (default) or /usr/local/rose/2014.09.0. When a suite is started with a particular version of rose, cylc keeps track of ROSE_VERSION so the appropriate version gets used.
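
A minimal sketch of that selection logic, written as a function so it can be tried outside the wrapper. The real wrapper is not shown in this ticket, so the function name, the argument order and the fallback to the default when the requested directory does not exist are all assumptions. Only the default of 2014-05 comes from the text above.

```sh
# Hypothetical helper: print the rose installation directory for a requested
# version, falling back to the default when the version is unset or missing.
# $1 = installation root (e.g. /usr/local/rose), $2 = requested ROSE_VERSION
rose_home_for() {
    root=$1
    requested=${2:-}
    default_version=2014-05   # default named in this comment
    if [ -n "$requested" ] && [ -d "$root/$requested" ]; then
        printf '%s\n' "$root/$requested"
    else
        printf '%s\n' "$root/$default_version"
    fi
}
```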

There are separate puppet modules for rose (installs 2014.09.0) and rose_cylc5 (installs 2014-05). Only the first one installs the server for rose-bush etc. Unfortunately version numbers have to be hardcoded in the paths in modules/rose/templates/rose-wsgi.conf. Using a variable $revision didn't work. Using two separate rose modules is inconsistent with the way the two cylc versions are installed but was simpler.

My test suite au-aa123 kept running ok across a system update from master to mrd599/cylc6 using puppet apply. Updating like this leaves extra files in /usr/local/rose, /usr/local/cylc and /usr/local/cylc/bin that aren't there in a clean reboot.

Last edited 5 years ago by Martin Dix

comment:2 Changed 4 years ago by Martin Dix

Branch now installs cylc 6.1.1 and rose 2014.11.1

comment:3 Changed 4 years ago by Scott Wales

Martin,

I've put together some changes in the branch 'saw562/cylc6' to improve the support for multiple Cylc & Rose versions.

These use the method I described of having an array specifying the version numbers to install, which can be specified in hieradata/project.yaml along with the default version to use.

At the moment my branch deletes any versions that aren't explicitly listed in the project.yaml file, but that's easy enough to change if needed. The rose-wsgi.conf file has also been updated to read the version number from Puppet.
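
The pruning behaviour can be illustrated outside Puppet with a short shell sketch. This is not the code in the branch: the function name, the cylc-VERSION directory naming and the space-separated version list are assumptions for illustration only.

```sh
# Hypothetical helper: list installed cylc-VERSION directories under $1 whose
# version is not in the space-separated list $2 (candidates for deletion).
unlisted_cylc_versions() {
    root=$1
    wanted=" $2 "
    for dir in "$root"/cylc-*; do
        [ -d "$dir" ] || continue           # skip when the glob matches nothing
        version=${dir##*/cylc-}
        case "$wanted" in
            *" $version "*) ;;              # listed: keep
            *) printf '%s\n' "$version" ;;  # not listed: report
        esac
    done
}
```

The function only reports the versions it would remove, which makes it easy to review the list before anything is deleted.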

I've tested that Rose & Cylc run ok in a test VM - although it looks like the latest version of Rosie requires Python 2.7. I've not tested the server stuff thoroughly yet.

Would you or Wenming like to try putting the branch onto accessdev-test and trying it out?

Cheers, Scott

comment:4 Changed 4 years ago by Scott Wales

rosie ls with the new Rose version and Python 2.6 is crashing with the message

Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 71, in worker
    put((job, i, result))
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 366, in put
    return send(obj)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

It looks like updating the 'requests' package fixes this. I'll add a fix to my branch.

comment:5 Changed 4 years ago by Martin Dix

A few configuration problems on accessdev-test

% cylc -h
/usr/local/cylc/bin/cylc: line 55: /usr/local/cylc/cylc/bin/cylc: No such file or directory

The wrapper should use

if [[ -z ${CYLC_HOME:-} ]]; then
    if [[ -n ${CYLC_VERSION:-} && -d $CYLC_HOME_ROOT/cylc-$CYLC_VERSION ]]; then
        CYLC_HOME=$CYLC_HOME_ROOT/cylc-$CYLC_VERSION
    else
        CYLC_HOME=$CYLC_HOME_ROOT/default
    fi
fi

Cylc site.rc configuration file is in /usr/local/cylc/cylc-5.4.14/siterc. It should be /usr/local/cylc/cylc-5.4.14/conf/siterc. Same for cylc6.

Rose stem expects a keyword um_aux.xm. keyword.cfg should have

location{primary}[um_aux.x]  = https://code.metoffice.gov.uk/svn/um/aux
location{primary}[um_aux.xm] = https://130.56.244.76/svn/um/aux

This is just a different subdirectory of the UM repo, so it's already getting mirrored.

With these changes, simple tests and a full vn10.0 rose stem suite run ok.

Simple suites that just run scripts on raijin are ok with cylc5 and cylc6, running as accesstester.

https://accessdev-test.nci.org.au/rose-bush is using 2014-05. This doesn't show output from cylc6 properly because of the change of log directory structure, e.g. compare
https://accessdev-test.nci.org.au/rose-bush/cycles/mrd599/simple_cycle and https://accessdev-test.nci.org.au/rose-bush/cycles/mrd599/simple_cycle6.

rose-bush with newer versions works with both structures. This is an unfortunate complication because it means the version here has to be different to the default rose version.

Last edited 4 years ago by Martin Dix

comment:6 Changed 4 years ago by Scott Wales

I've put in fixes for these issues and updated accessdev-test. rosie go now works correctly, although users will get a request to login to code.metoffice.gov.uk when they start it up since that's now one of the default Rose databases.

Not sure about the best way to handle this - if you can't log in with a code.metoffice account then rosie just quits. I assume not everyone has an account yet.

For the rose-bush issue it's probably easiest just to wait until we can update the default to the newer Rose version.

comment:7 Changed 4 years ago by Scott Wales

The Met Office have added a fix for the rosie crash in version 2015.02.0. It won't help for people using the old version though, so I think for the moment we should disable the 'u' repository until everyone's on the new version.

I've updated accessdev-test to the latest rose & cylc versions and merged all the recent changes from the master branch. If this works for you I'll put it onto accessdev proper.

comment:8 Changed 4 years ago by Martin Dix

Worked ok as me and as accesstester after I added the matching rose and cylc modules on raijin.

It's now possible to specify a cylc wrapper in the configuration (#101) so can you change the cylc configuration file to

[hosts]
    [[localhost]]
        task communication method = ssh
        use login shell = False
        remote shell template = ssh -Y -oBatchMode=yes %s
    [[raijin.*]]
        cylc executable = /projects/access/bin/cylc

I've created this wrapper on raijin.

At the same time as this all gets installed on accessdev, we should update raijin:~access/bin/rose to use

module load rose/$ROSE_VERSION
module load cylc/wrapper

Some usage information at https://accessdev.nci.org.au/trac/wiki/access/CylcSix

comment:9 Changed 4 years ago by Martin Dix

Cylc 6.3.1 changes the way CYLC_VERSION is propagated to the remote system. I didn't find any problems with 6.3.0, but we should probably use the newer one just in case.

comment:10 Changed 4 years ago by Martin Dix

Cylc6 runs tasks on raijin with a command like

ssh -Y -oBatchMode=yes raijin.nci.org.au 'CYLC_VERSION='"'"'6.3.0'"'"' "/projects/access/bin/cylc" '"'"'job-submit'"'"' --remote-mode "$HOME/cylc-run/simple_cycle6/log/job/20020101T0000Z/postproc/01/job"'
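
The nested quoting in that command is the usual shell idiom for embedding a single quote inside a single-quoted string: close the quote, add a double-quoted ', and reopen. A small sketch that can be run locally (no ssh involved; the variable name remote_cmd is only for illustration) shows what the remote shell receives:

```sh
# The argument ssh passes to the remote shell, built with the same '"'"' idiom
# as the command above but shortened to the first two words.
remote_cmd='CYLC_VERSION='"'"'6.3.0'"'"' "/projects/access/bin/cylc"'
printf '%s\n' "$remote_cmd"
```

This prints CYLC_VERSION='6.3.0' "/projects/access/bin/cylc", so the remote shell sees CYLC_VERSION as an environment assignment prefixed to the cylc wrapper call.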

CYLC_VERSION is passed to raijin but ROSE_VERSION is not. Somehow I originally misled myself that it was getting passed and that doing

module load rose/$ROSE_VERSION

in the wrapper would work.

Instead the wrappers now choose an appropriate rose module based on CYLC_VERSION. Both ~access/bin/rose and ~access/bin/cylc do

module load cylc/wrapper
if [[ $CYLC_VERSION = 6* ]]; then
    module load rose/2015.02.0
else
    module load rose/2014-04
fi

Might be better to set up modules like rose_cylc5 and rose_cylc6 so these versions weren't hardwired in the scripts.
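
A minimal sketch of that idea, assuming modules named rose_cylc5 and rose_cylc6 were created (they do not exist yet, and the function name is also hypothetical). The wrapper would map the major cylc version to a module name and leave the actual rose version to the module:

```sh
# Hypothetical helper: map a cylc version string to the rose module to load.
# Anything that is not cylc 6 falls back to the cylc 5 module, matching the
# else branch of the wrappers above.
rose_module_for_cylc() {
    case "${1:-}" in
        6*) printf '%s\n' rose_cylc6 ;;
        *)  printf '%s\n' rose_cylc5 ;;
    esac
}
```

In the wrappers this could replace the if/else with module load "$(rose_module_for_cylc "$CYLC_VERSION")", so a rose upgrade would only touch the module files.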

comment:11 Changed 4 years ago by Scott Wales

As part of its pre-processing Rose passes the Rose version to Cylc as a Jinja variable

{% set ROSE_ORIG_HOST="accessdev.nci.org.au" %}
{% set ROSE_VERSION="2014-05" %}
{% set RUN_NAMES=['nci_n48_noomp'] %}

The Rose stem suite.rc adds the version as an environment variable in the initial scripting

    [[root]]
        initial scripting = """
export CYLC_VERSION={{CYLC_VERSION}}
export ROSE_VERSION={{ROSE_VERSION}}
export FCM_VERSION={{FCM_VERSION}}
"""

comment:12 Changed 4 years ago by Michael Naughton

How do we support updated versions? Perhaps CYLC_VERSION=latest?

How long do we support installed versions for?

Update the default to the current latest version (cylc 6). Give users a few days' warning.

comment:13 Changed 4 years ago by Scott Wales

Keywords: TIWG added

comment:14 Changed 4 years ago by Scott Wales

Resolution: fixed
Status: changed from accepted to closed