
Shared suite control

Note that this description only applies to cylc 7 and later versions. Earlier versions used different directories for some of the critical files.

The description assumes a suite owner user1 who wants to give user2 control of the suite. The shell prompt before each command (e.g. user1@accessdev% or user2@accessdev%) shows which user runs it and on which host.

The cylc documentation describes how the suite server authenticates client connections at http://cylc.github.io/cylc/html/multi/cug-htmlse13.html#13.6

The suite itself can define several levels of access,

  • identity - only suite and owner names revealed
  • description - identity plus suite title and description
  • state-totals - identity, description, and task state totals
  • full-read - full read-only access for monitor and GUI
  • shutdown - full read access plus shutdown, but no other control.

with the default being state-totals. This means that anyone can run cylc scan and see information about your suite, for example:

user2@accessdev% cylc scan -o user1 -f

u-aj458 user1@localhost:43076
   Title:
      (no title)
   Description:
      (no description)
   Task state totals:
      held:5 succeeded:2
      19880901T0000Z held:4 succeeded:2
      19881001T0000Z held:1

These access levels apply equally to every user; there is no way to grant different permissions to different users.

Every suite has a passphrase in $HOME/cylc-run/SUITE/.service/passphrase, normally readable only by the suite owner. Anyone holding this passphrase has full read and control access to the suite. File access control lists (ACLs) can be used to share it with selected users only.

For example, to give user2 full control over the suite au-aa398, user1 starts the suite (or just installs it with rose suite-run -i) and then grants read access to the passphrase:

user1@accessdev% setfacl -m u:user2:r ~/cylc-run/au-aa398/.service/passphrase 

user2 then creates a directory under ~/.cylc/auth named after the suite owner, host and suite, and copies the passphrase into it:

user2@accessdev% mkdir -p $HOME/.cylc/auth/user1@accessdev.nci.org.au/au-aa398
user2@accessdev% cp ~user1/cylc-run/au-aa398/.service/passphrase $HOME/.cylc/auth/user1@accessdev.nci.org.au/au-aa398
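The two commands above can be wrapped in a small shell function. This is only a sketch: copy_suite_passphrase is a hypothetical name (not a cylc or rose command), and the readability check and the chmod 600 are additions beyond the original steps. It assumes the $HOME/.cylc/auth/OWNER@HOST/SUITE layout used above.

```shell
# Hypothetical helper (not part of cylc or rose): copy a passphrase that the
# suite owner has shared via an ACL into $HOME/.cylc/auth/OWNER@HOST/SUITE.
copy_suite_passphrase() {
    local src=$1 owner_host=$2 suite=$3
    local dest="$HOME/.cylc/auth/$owner_host/$suite"
    # Fail early if the owner has not granted read access yet.
    if [ ! -r "$src" ]; then
        echo "cannot read $src (has the owner run setfacl?)" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # Copy the passphrase and keep the local copy private to this user.
    cp "$src" "$dest/passphrase" && chmod 600 "$dest/passphrase"
}
```

For example: copy_suite_passphrase ~user1/cylc-run/au-aa398/.service/passphrase user1@accessdev.nci.org.au au-aa398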

Interacting with the suite also requires its port number, which can be obtained with cylc scan, e.g.

user2@accessdev% cylc scan -o user1 -n au-aa398
au-aa398 user1@localhost:43051
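To script this, the port can be cut out of the scan output. A minimal sketch, assuming the one-line SUITE OWNER@HOST:PORT format shown above; suite_port is a hypothetical helper name, not a cylc command.

```shell
# Hypothetical helper: read `cylc scan -o OWNER -n SUITE` output on stdin and
# print the port from the first line, e.g. "au-aa398 user1@localhost:43051".
suite_port() {
    local first port
    # Accept a final line even if it lacks a trailing newline.
    IFS= read -r first || [ -n "$first" ] || return 1
    port=${first##*:}              # text after the last colon
    case $port in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: no match or other format
    esac
    printf '%s\n' "$port"
}
```

For example: port=$(cylc scan -o user1 -n au-aa398 | suite_port) && cylc monitor --user=user1 --port="$port" au-aa398. Only the first line of output is used, so this assumes the name given to -n matches a single suite.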

user2 can then run the usual cylc commands on this suite by specifying the owner and port, e.g.

user2@accessdev% cylc monitor --user=user1 --port=43051 au-aa398
user2@accessdev% gcylc --user=user1 --port=43051 au-aa398
user2@accessdev% cylc stop --user=user1 --port=43051 au-aa398

This only gives control access to the cylc suite server. Jobs are still submitted to raijin as the original owner, log files still go to the owner's directories, and so on.

If the suite is stopped, the port is closed and only the original owner can restart it.

Note that the passphrase changes when the suite is reprocessed by rose (or re-registered with cylc), so user2 then has to copy it again. Stopping and restarting the suite doesn't change it.
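Since a reinstall silently invalidates user2's copy, a quick comparison shows whether it needs copying again. A sketch only: check_passphrase is a hypothetical name, and it simply compares the owner's file with the local copy using cmp -s.

```shell
# Hypothetical check (not a cylc command). Exit status 0 if the two files
# are identical, 1 if they differ or either one cannot be read.
check_passphrase() {
    local owner_copy=$1 my_copy=$2
    # cmp -s prints nothing; it exits 0 if identical, 1 if different, 2 on error.
    if cmp -s "$owner_copy" "$my_copy"; then
        echo "passphrase up to date"
    else
        echo "passphrase differs or is unreadable: copy it again" >&2
        return 1
    fi
}
```

For example: check_passphrase ~user1/cylc-run/au-aa398/.service/passphrase ~/.cylc/auth/user1@accessdev.nci.org.au/au-aa398/passphrase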

Even more control

For some long-running suites it may be necessary to allow several users to stop the suite, modify its configuration and restart it.

This is possible by putting the suite's work and share directories in a shared location. For example, in suite au-aa585, rose.conf sets these to a fixed directory rather than the default /short/$PROJECT/$USER:

[rose-suite-run]
root-dir{share}=raijin*=/short/p66/user1
root-dir{share/cycle}=raijin*=/short/p66/user1
root-dir{work}=raijin*=/short/p66/user1

Before running anything, create the shared suite directory on raijin and set the ACLs:

user1@raijin% mkdir -p /short/p66/user1/cylc-run/au-aa585
user1@raijin% setfacl -m u:user2:rwx -m d:user2:rwx /short/p66/user1/cylc-run/au-aa585
user1@raijin% setfacl -m u:user1:rwx -m d:user1:rwx /short/p66/user1/cylc-run/au-aa585

Note that the owner also needs explicit ACL entries; otherwise files and directories created while user2 runs the suite won't be writable by the owner when they run it again.
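The mkdir and setfacl steps above can be wrapped so that the owner and any collaborators are handled in one call. A sketch, assuming setfacl is available on raijin; share_suite_dir is a hypothetical name. The d:u: prefix used here is the explicit form of the d: default entries shown above.

```shell
# Hypothetical helper for the owner: create a shared suite directory and give
# each listed user rwx access plus a default ACL entry, so that files and
# directories created inside it later inherit the same access.
share_suite_dir() {
    local dir=$1 u
    shift
    mkdir -p "$dir" || return 1
    for u in "$@"; do
        # Access entry for the directory itself, default entry for new children.
        setfacl -m "u:$u:rwx" -m "d:u:$u:rwx" "$dir" || return 1
    done
}
```

For example: share_suite_dir /short/p66/user1/cylc-run/au-aa585 user1 user2. Listing the owner explicitly covers the point about owner permissions above.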

The owner runs the suite as normal. Once it has started, give user2 read permission on the files in the suite's .service directory:

user1@accessdev% setfacl -m u:user2:r ~/cylc-run/au-aa585/.service/*

Note that this is different from the command in the first section because user2 now needs access to both the passphrase and the private copy of the suite status database (which is only created at runtime).
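The same grant is needed in both directions (user1 to user2 here, and later user2 back to user1). A sketch of a helper that applies it to every regular file in .service; grant_service_read is a hypothetical name, and unlike the plain glob it deliberately skips directories and symlinks to directories.

```shell
# Hypothetical helper: give each listed user read access to every regular
# file in $HOME/cylc-run/SUITE/.service (passphrase, db, ...).
# Run it after the suite has started, since the db only exists at runtime.
grant_service_read() {
    local suite=$1 u f
    shift
    for f in "$HOME/cylc-run/$suite/.service"/*; do
        [ -f "$f" ] || continue   # skip directories and an unmatched glob
        for u in "$@"; do
            setfacl -m "u:$u:r" "$f" || return 1
        done
    done
}
```

For example, user1 would run grant_service_read au-aa585 user2, and user2 after a takeover would run grant_service_read au-aa585 user1.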

If the suite needs to be modified, user2 copies the passphrase and stops the suite as in the first section, then checks out the suite and changes it as required. After installing the suite and before starting it, user2 copies the private db file so that the new run knows the current state. cylc restart is shown below; another start command can be used instead.

user2@accessdev% rose suite-run -i
user2@accessdev% cp ~user1/cylc-run/au-aa585/.service/db ~/cylc-run/au-aa585/.service
user2@accessdev% cylc restart au-aa585

Note that user2 is now running their own copy of the suite, so the --user and --port options are no longer needed with cylc commands.
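The install, copy and restart steps can be sketched as one function for user2, run from the checked-out suite directory after the owner's suite has been stopped. restart_as_new_owner is a hypothetical name; it assumes the owner's home directory is visible from the current host (as ~user1 is on accessdev) and that the owner has already granted read access to .service/db.

```shell
# Hypothetical wrapper for user2 (not a rose or cylc command).
# $1: suite name, $2: the owner's home directory, e.g. ~user1
restart_as_new_owner() {
    local suite=$1 owner_home=$2
    local owner_db="$owner_home/cylc-run/$suite/.service/db"
    # The owner must already have shared the private db via an ACL.
    [ -r "$owner_db" ] || { echo "cannot read $owner_db" >&2; return 1; }
    # Install only (no start); this sets up ~/cylc-run/$suite for this user.
    rose suite-run -i || return 1
    # Reuse the owner's runtime state so the restart resumes where it stopped.
    cp "$owner_db" "$HOME/cylc-run/$suite/.service/" || return 1
    # This is now our own copy of the suite, so no --user/--port options.
    cylc restart "$suite"
}
```

For example: restart_as_new_owner au-aa585 ~user1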

After the suite has started again, user2 runs

user2@accessdev% setfacl -m u:user1:r ~/cylc-run/au-aa585/.service/*

so that the original owner can take control back.

Copying the db is only necessary for an exact restart from the middle of a cycle. A warm start from a specific date works without it.

There is one further complication. The mkdir man page on raijin says:

              When  COREUTILS_CHILD_DEFAULT_ACLS  environment variable is set,
              -p/--parents option respects default umask and ACLs, as it  does
              in Red Hat Enterprise Linux 7 by default

Without this variable, directories created with mkdir -p in the share and work directories don't inherit the default ACLs properly. It should be set in the root section of the suite's runtime configuration, e.g.

[runtime]
    [[root]]
        [[[environment]]]
            COREUTILS_CHILD_DEFAULT_ACLS = true

Last modified on Mar 8, 2017 2:01:48 PM