Opened 5 years ago

Last modified 4 years ago

#93 new

HPC usage analytics

Reported by: Scott Wales
Owned by:
Priority: major
Component: Accessdev Server
Keywords:
Cc: Martin Dix, Michael Naughton


How can we monitor usage of the HPC using the ACCESS system?

Things we might want to collect:

  • Who is using the system? (CoE, BoM, CSIRO)
  • What are people running? (UM version, global vs. LAM)
  • What resources are being asked for / used?

How we might go about it:

  • Log job information within Rose/UMUI when a job is submitted
  • Use statistics from PBS by adding a tag to ACCESS jobs

Change History (9)

comment:1 Changed 5 years ago by Scott Wales


Use psacct to summarise which users connect to the system and what commands they run

logger command: write to syslog, e.g.

logger -t UMUI "${USER} ${PROJECT} submitted $RUNID"

creates a line in /var/log/messages like:

Apr  9 11:28:57 accessdev UMUI: saw562 w35 submitted vabcd

Querying the logs may require some tooling
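As a rough illustration of the kind of tooling this might need, the sketch below parses lines in the format shown above and counts submissions by user or project. The function names and the regular expression are assumptions based only on the single example line; real syslog output may differ.

```python
import re
from collections import Counter

# Matches lines such as:
#   Apr  9 11:28:57 accessdev UMUI: saw562 w35 submitted vabcd
# The pattern is an assumption based on the single example line above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+UMUI:\s+"
    r"(?P<user>\S+)\s+(?P<project>\S+)\s+submitted\s+(?P<runid>\S+)\s*$"
)


def parse_line(line):
    """Return a dict of fields for a UMUI submission line, or None."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def count_by(lines, field):
    """Count submissions grouped by one field, e.g. 'user' or 'project'."""
    records = (parse_line(line) for line in lines)
    return Counter(rec[field] for rec in records if rec is not None)


sample = [
    "Apr  9 11:28:57 accessdev UMUI: saw562 w35 submitted vabcd",
    "Apr  9 11:30:02 accessdev sshd[123]: unrelated message",
]
print(count_by(sample, "project"))
```

Lines that do not match the UMUI pattern are skipped, so the same function could be pointed at a whole log file.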

comment:2 Changed 5 years ago by Scott Wales

This looks interesting - open source log monitoring & analytics

comment:3 Changed 5 years ago by Scott Wales

Demo analytics setup for UMUI: the UMUI POSTs a JSON document containing job info to an elasticsearch instance when a job is processed. Kibana then produces analytics from the data in elasticsearch.

Requires the following UMUI patch to send basis info (at the moment this is only being done on a test instance, not on accessdev proper):

  • UM/um_nav_actions.tcl

         unset processing_in_progress
         set processing_done 1

    +    # JSON document for elasticsearch
    +    set json "{"
    +    append json "\"user\":\"$::env(USER)\""
    +    append json ",\"project\":\"$::env(PROJECT)\""
    +    append json ",\"basis\":[ jsonify_basis ]"
    +    append json ",\"timestamp\":\"[ clock format [ clock seconds ] -gmt true -format "%Y-%m-%dT%H:%M:%SZ" ]\""
    +    append json "}"
    +
    +    # POST the JSON document to the elasticsearch server
    +    # Just use CURL as I don't understand TCL's http library
    +    set url "http://localhost:9200/umui/process"
    +    exec -ignorestderr curl -XPOST $url -d @- << $json
    +
    +    # Also write the document to ~/tmp (test output)
    +    set testfile [open ~/tmp w]
    +    puts $testfile $json
    +    close $testfile
     }
    +
    +# Turn the basis file into json format
    +# Result looks like {"key":"value","key":["a","b","c"]}
    +proc jsonify_basis {} {
    +    # Json doesn't allow for a trailing comma, so we'll play a trick here
    +    set delim ""
    +
    +    set result "{"
    +    set variableNames [ get_allVariableName ]
    +    foreach key $variableNames {
    +        # Is this variable an array?
    +        if [ regexp {\(.*\)} $key ] {
    +            # Strip array size
    +            set key [ regsub {\(.*\)} $key ""]
    +            set value [ jsonify_list [ get_variable_array $key ] ]
    +        } else {
    +            set value "\"[ get_variable_value $key ]\""
    +        }
    +
    +        append result $delim "\"$key\": $value"
    +        set delim ","
    +    }
    +    append result "}"
    +
    +    return $result
    +}
    +
    +# Turn a TCL list into json
    +# Result looks like '[ "a", "b", "c" ]'
    +proc jsonify_list {a} {
    +    set delim ""
    +
    +    set result "\["
    +    foreach value $a {
    +        append result $delim "\"$value\""
    +        set delim ","
    +    }
    +    append result "\]"
    +
    +    return $result
    +}
    +
     # um_nav_submit
     #  Called when Submit button pressed
     # Comments
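The hand-built JSON in the patch does not escape quotes or backslashes inside values, so a basis value containing a double quote would produce an invalid document. As an illustration only (this is not part of the UMUI patch), the same document shape could be produced with a JSON library, sketched here in Python. The `build_document` name and the sample basis values are hypothetical.

```python
import json
import time


def build_document(user, project, basis, now=None):
    """Build a job document with the same keys as the patch above."""
    if now is None:
        now = time.time()
    return {
        "user": user,
        "project": project,
        "basis": basis,
        # Same UTC format as the Tcl 'clock format' call
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
    }


# Hypothetical basis values; scalars map to strings and arrays to lists
basis = {"RUNID": "vabcd", "TITLE": 'a "quoted" title', "LEVELS": ["a", "b", "c"]}
doc = build_document("saw562", "w35", basis, now=0)
print(json.dumps(doc))
```

Letting the library serialise the dictionary also removes the need for the delimiter trick used to avoid trailing commas.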

comment:4 Changed 5 years ago by ibc599

Can we set this up for Rose?

Looks like you are using** for umui submission?

Can we add another url for rose?

comment:5 Changed 5 years ago by Scott Wales

Certainly, I've added indexes called 'rose' and 'rose-test' that can be accessed from accessdev.

To add entries, POST JSON documents to a subdirectory of the index, e.g.:

curl -XPOST -d '{
     "user":      "ibc599",
     "runid":     "abc00",
     "timestamp": "1900-01-01"
}'

Each type subdirectory corresponds to a database table, so all documents of a given type should have the same set of fields. For the UMUI data I've put each UM version into a different type, since the basis variable list changes between versions.
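A minimal sketch of the layout described above: documents are POSTed to `<index>/<type>`, with one type per UM version for the UMUI data. The base URL reuses the localhost address from the UMUI patch as a placeholder; the helper name, the version string and the `um-<version>` naming scheme are assumptions, not the names actually used on the server. The request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # placeholder; the real server address is not given here


def build_post(index, doc_type, document):
    """Build (but do not send) a POST request for one JSON document."""
    url = "{}/{}/{}".format(BASE_URL, index, doc_type)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical naming: one type per UM version so each type keeps a consistent field list
um_version = "8.5"
request = build_post("umui", "um-" + um_version, {"user": "saw562", "runid": "vabcd"})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it, assuming the server is reachable and accepts unauthenticated requests.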

The documentation on the elasticsearch website is pretty good for getting a handle on how to use it for submitting and searching through documents -

comment:6 Changed 5 years ago by Scott Wales

milestone: Future

Milestone Future deleted

comment:7 Changed 4 years ago by Martin Dix

I'd like to experiment with this to track which cylc versions are being used.

However, trying

curl -XPOST -d '{ ... }'

from the command line gave an authentication error.

Adding -u mrd599 prompted for a password and then failed with

The requested URL /rose-test/my_type was not found on this server.

comment:8 Changed 4 years ago by Scott Wales

'rose-test' hadn't been set up in Apache; it's fixed now.

comment:9 Changed 4 years ago by Martin Dix

Uploading works ok now.

However the timestamp isn't behaving. If I do

curl -XGET ''

I get entries like

{"_index":"rose-test","_type":"my_type","_id":"TweMNn4BTI2gA2nvumqJkw","_score":1.0, "_source" : {
 "user" : "mrd599",
 "runid" : "test",
 "project" : "p66",
 "hostname" : ""

The timestamps look the same as those from the umui index.

However, they don't seem to be interpreted correctly. Using a cut-down example from one of the charts

curl -XGET '' -d '{
  "facets": {
    "terms": {
      "terms": {
        "field": "timestamp"
      }
    }
  },
  "size": 0
}'

gives split up parts of the time string like

      "terms" : [ {
        "term" : "2015",
        "count" : 6
      }, {
        "term" : "02",
        "count" : 6
      }, {
        "term" : "30z",
        "count" : 3

rather than the proper timestamps I get using the umui index.

Is there some extra step to tell it that timestamp should be interpreted as a time?
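One possible explanation, offered as an assumption rather than something confirmed in this ticket: if the `timestamp` field in `rose-test` was mapped as an analysed string, a terms facet would return the individual tokens of that string, which would match the output above. Elasticsearch allows a field's type to be declared explicitly through a mapping. The sketch below only builds such a mapping body for the `my_type` type; the endpoint to send it to, and whether existing documents would need to be reindexed, depend on the Elasticsearch version in use, which is not stated here.

```python
import json


def date_mapping(doc_type, field="timestamp"):
    """Return a mapping body that declares one field as a date."""
    return {doc_type: {"properties": {field: {"type": "date"}}}}


body = json.dumps(date_mapping("my_type"), indent=2)
print(body)
```

Comparing this with the mapping that the umui index already has would show whether the two indexes really treat `timestamp` differently.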
