Changes between Version 19 and Version 20 of access/TotalviewCylc


Timestamp: Feb 16, 2021 12:01:30 PM
Author: Susan Rennie

A workaround is to shut down the suite and restart it. This seems to fix the problem.

----
= Using TotalView on Gadi

== Job configuration

In your suite or job, avoid using rose mpi-launch and use an explicit launcher instead. Most calling scripts (in UM, OPS, VAR and SURF) have a launcher variable, e.g. RECON_LAUNCHER or OPS_LAUNCHER, which can be set as an alternative to rose mpi-launch.

Make sure that anything rose mpi-launch would normally do based on its environment variables (e.g. setting ulimits) is instead done some other way.

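For example, if your suite normally relies on rose mpi-launch to adjust limits (e.g. because it sets ROSE_LAUNCHER_ULIMIT_OPTS), a minimal sketch of doing this directly in the job script, before the launcher runs, might look like the following. It assumes that raising the soft stack limit to the hard limit is what your run needs; check which limits your normal launch actually applies.

```shell
#!/usr/bin/env bash
# Sketch: apply a ulimit directly instead of relying on rose mpi-launch.
# Assumption: the run needs a large stack, so raise the soft stack limit
# as far as the hard limit allows (this never needs extra privileges).
hard_stack=$(ulimit -H -s)
ulimit -s "$hard_stack"
echo "soft stack limit is now: $(ulimit -s)"
```

Limits set this way are inherited by the processes started from the same script, including the launcher command.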
For Intel MPI, set the launcher variable to e.g. {{tvconnect $(which mpiexec.hydra) --tv --debug -n $NPROC}}, or the equivalent using whatever options are normally in your $ROSE_LAUNCHER_PREOPTS.
{{tvconnect}} sets up a reverse connection, so that the job connects back to your running TotalView session once it begins.

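The Intel MPI and Open MPI cases can be sketched as a small shell function that prints a launcher string for the calling script's launcher variable. The name tv_launcher, its arguments and the default NPROC value are hypothetical. The Intel MPI string follows the example above; the Open MPI string (a plain mpirun -n wrapped by tvconnect) is an assumption based on the Open MPI note below, not a tested command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a TotalView-enabled launcher string for a
# calling script's launcher variable (e.g. RECON_LAUNCHER, OPS_LAUNCHER).
# Usage: tv_launcher <intel|openmpi> <nproc>
tv_launcher() {
  local flavour=$1 nproc=$2
  case "$flavour" in
    intel)
      # mpirun may not pass --tv through, so name mpiexec.hydra explicitly;
      # fall back to the bare name if it is not on PATH.
      local hydra
      hydra=$(command -v mpiexec.hydra || echo mpiexec.hydra)
      echo "tvconnect $hydra --tv --debug -n $nproc"
      ;;
    openmpi)
      # --tv is not needed here; assume the mpirun wrapper works as-is.
      echo "tvconnect mpirun -n $nproc"
      ;;
    *)
      echo "tv_launcher: unknown MPI flavour '$flavour'" >&2
      return 1
      ;;
  esac
}

# Example: build the launcher for a reconfiguration step.
# NPROC is normally set by the suite; 4 is only a placeholder default.
NPROC=${NPROC:-4}
RECON_LAUNCHER=$(tv_launcher intel "$NPROC")
export RECON_LAUNCHER
echo "$RECON_LAUNCHER"
```

In a Cylc suite the variable would more usually be set in the task's environment section; the function form just makes the difference between the two MPI flavours explicit.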
Note that for Intel MPI the mpirun wrapper may not pass {{--tv}} through correctly, so mpiexec.hydra must be specified explicitly.

For Open MPI, {{--tv}} is not needed, and the mpirun wrapper should work.

Also ensure the totalview module is loaded in your PBS job, and give the job a longer walltime to leave time for interactive debugging.

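A minimal sketch of the corresponding lines in a PBS job script. The walltime is an arbitrary placeholder, and the guard around module load only exists so the fragment also runs where environment modules are unavailable. In a Cylc suite these settings would normally go in the task's directives and pre-script instead.

```shell
#!/usr/bin/env bash
#PBS -l walltime=04:00:00
# ^ placeholder: choose a walltime that leaves room for a debugging session

# Load TotalView inside the job itself so that tvconnect is available to it.
if type module >/dev/null 2>&1; then
  module load totalview
else
  echo "environment modules not available; skipping 'module load totalview'" >&2
fi
```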
== Running TotalView
On Gadi, load the totalview module and launch TotalView. Check under the File menu that it is listening for reverse connections.

Once your job begins on Gadi, TotalView should prompt you to connect to the job. Once you have done so, press "Go" (the green play button). TotalView should then ask what you want to do about starting the parallel job.

Useful pages:
- https://wikis.uni-paderborn.de/pc2doc/Noctua-Software-TotalView (someone else's wiki on using TotalView)
- [https://help.totalview.io/current/HTML/index.html#page/TotalView/totalviewlhug-reverse-connect.16.01.html# TotalView help on reverse connections]
- https://opus.nci.org.au/display/Help/Totalview (NCI help on TotalView)