Changes between Version 12 and Version 13 of ticket/370/ticket/370/TicketDetails/OpsReadFromObstore


Timestamp: Oct 4, 2019 11:40:29 AM
Author: Jin Lee

There are two ways to read data from obstores. One is to let Ops_ExtractAndProcess work out all settings from the obstore and read the observations directly. The other is to let Ops_CreateODB read the obstore and write out ODB1, and then let Ops_ExtractAndProcess read ODB1 and write back to ODB1.

=== Run OpsProg_ExtractAndProcess.exe to read obstore ===

Here's a list of things to keep in mind when running OpsProg_ExtractAndProcess.exe this way:

  * It's safer to let OPS determine various parameters (e.g. batch numbers and buffer sizes) than to set them in the `extractcontrolnl` namelist. In practice, this means removing the entire `extractcontrolnl` namelist from the OPS app config file, along with the file that normally holds the namelist.
     
  * Estimate the amount of memory required to read the observations and allocate space within the OPS program, then make a PBS resource request that is just enough to finish processing. This is based on my hunch that the failure stems from some PEs having too few observations, so that OPS allocates memory during CX creation to certain variables that may be empty (again, this is only a hunch).
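A minimal sketch of the config change in the first point, assuming the OPS app config is a Rose-style `rose-app.conf` in which the namelist lives in a `[namelist:extractcontrolnl]` section and the file that holds it is a `[file:...]` section whose `source` references that namelist. This layout is an assumption, not something stated on this page; the script drops both sections and leaves everything else untouched.

```python
import re

# Matches a section header line such as "[namelist:extractcontrolnl]".
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")


def split_sections(text):
    """Split config text into (section_name, lines) chunks; the preamble's name is None."""
    chunks = [(None, [])]
    for line in text.splitlines(keepends=True):
        match = SECTION_RE.match(line)
        if match:
            chunks.append((match.group("name").strip(), [line]))
        else:
            chunks[-1][1].append(line)
    return chunks


def remove_namelist(text, namelist="extractcontrolnl"):
    """Drop the namelist section and any file section that sources it.

    Assumes the Rose-style layout described above; a file section is dropped
    whole, so it is assumed to hold nothing but this namelist.
    """
    target = f"namelist:{namelist}"
    source_re = re.compile(rf"^\s*source\s*=.*\b{re.escape(target)}\b", re.MULTILINE)
    kept = []
    for name, lines in split_sections(text):
        if name == target:
            continue  # the namelist itself
        if name is not None and name.startswith("file:") and source_re.search("".join(lines[1:])):
            continue  # the file that normally holds the namelist
        kept.extend(lines)
    return "".join(kept)
```

Usage would be along the lines of reading the app config, passing its text through `remove_namelist`, and writing the result back; editing the file by hand achieves the same thing.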
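The memory estimate in the last point can be sketched as back-of-the-envelope arithmetic. Every number here is a hypothetical placeholder: the bytes per observation, the overhead factor for extra allocations such as CX creation, and the usable fraction of node memory would need to be measured for your own obstores. The output is a PBS Pro `select` chunk request.

```python
import math


def estimate_nodes(n_obs, bytes_per_obs, mem_per_node_gb,
                   overhead=1.5, usable_fraction=0.9):
    """Return the smallest node count whose usable memory covers the estimate.

    All inputs are hypothetical placeholders:
      n_obs           -- number of observations to read
      bytes_per_obs   -- measured memory footprint per observation
      overhead        -- multiplier for extra allocations (e.g. CX creation)
      usable_fraction -- share of node memory left after OS/MPI overheads
    """
    needed_bytes = n_obs * bytes_per_obs * overhead
    usable_per_node = mem_per_node_gb * 1024**3 * usable_fraction
    return max(1, math.ceil(needed_bytes / usable_per_node))


def pbs_select(nodes, ncpus_per_node, mem_per_node_gb):
    """Format a PBS Pro chunk request for use with `#PBS -l`."""
    return f"select={nodes}:ncpus={ncpus_per_node}:mem={mem_per_node_gb}gb"


if __name__ == "__main__":
    nodes = estimate_nodes(n_obs=100_000_000, bytes_per_obs=2000, mem_per_node_gb=64)
    print(pbs_select(nodes, ncpus_per_node=36, mem_per_node_gb=64))  # prints select=5:ncpus=36:mem=64gb
```

Rounding up to whole nodes keeps the request "just enough" while still leaving the headroom set by `usable_fraction`.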
=== Run OpsProg_CreateODB.exe and OpsProg_ExtractAndProcess.exe to read obstores and write out ODB1 ===

Outside of UKMO this method should be used, as it produces an updated ODB1 that can be used by VER. Here's a list of things to keep in mind when running OpsProg_CreateODB.exe and OpsProg_ExtractAndProcess.exe this way:

  * Make sure `maxbatchessubtype` is set high enough to read all the batches in an obstore file. If not all the data are read in, you will see a message like this in stdout/stderr from OpsProg_CreateODB.exe:

  {{{
     
    * For a certain type of memory error, the failure appears to occur while the program is distributing observations to other PEs; the fix for this type of error is to increase the PBS core request.

  * For obstore files with large numbers of batches, failures can occur with either OpsProg_CreateODB.exe or OpsProg_ExtractAndProcess.exe:
    * OpsProg_CreateODB.exe fails: decrease buffersize (roughly the number of observations in each batch) in inverse proportion to the number of batches, e.g. twice as many batches means roughly half the buffersize.
    * OpsProg_ExtractAndProcess.exe fails: if the failure happens towards the end of processing, where ODB1 is updated, increasing the number of nodes and the memory can fix the problem.
    * For some obsgroups (e.g. sonde), the number of batches in the obstore may be unusually large. This is fixed by using nodes with more memory.
    * For the satwind and surface obstypes, no amount of fine-tuning allows the tasks to read all observations. It's possible that the number of observations reported by print-obstore is incorrect.
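The two batch-related rules in the list above (`maxbatchessubtype` must cover every batch, and buffersize shrinks in inverse proportion to the batch count) can be sketched as simple checks. The reference buffersize and batch count are hypothetical values from a configuration assumed to work; the batch count for a new obstore would have to come from inspecting the file itself.

```python
def scaled_buffersize(ref_buffersize, ref_batches, actual_batches):
    """Shrink buffersize in inverse proportion to growth in the batch count.

    ref_buffersize and ref_batches describe a hypothetical configuration that
    is known to work; if the new obstore has k times as many batches, the
    buffersize is divided by k (integer division, never below 1).
    """
    if actual_batches <= ref_batches:
        return ref_buffersize
    return max(1, ref_buffersize * ref_batches // actual_batches)


def maxbatches_ok(n_batches, maxbatchessubtype):
    """True if maxbatchessubtype is large enough to read every batch."""
    return maxbatchessubtype >= n_batches


if __name__ == "__main__":
    n_batches = 400  # hypothetical batch count for a large obstore
    print(scaled_buffersize(10_000, 100, n_batches))  # prints 2500
    print(maxbatches_ok(n_batches, 300))  # prints False
```

Four times as many batches as the reference gives a quarter of the buffersize, which is the inverse-proportion rule stated for OpsProg_CreateODB.exe.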