Changes between Version 7 and Version 8 of ticket/370/ticket/370/TicketDetails/OpsReadFromObstore


Timestamp: Sep 9, 2019 12:00:50 PM (3 months ago)
Author: Jin Lee


== Problems encountered when OPS reads observational data from obstore files ==

There are 2 ways to read data from obstores: one way is to let Ops_ExtractAndProcess work out all settings from the obstore and read the obs; another way is to let Ops_CreateODB read the obstore and write out ODB1, then let Ops_ExtractAndProcess read ODB1 and write back to ODB1.

=== Run Ops_ExtractAndProcess to read obstore ===

Here's a list of things to keep in mind when running Ops_ExtractAndProcess this way:

   * It's safer to let OPS determine various parameters - e.g. batch numbers, buffer sizes, etc. - rather than setting them in the `extractcontrolnl` namelist. This means removing the entire `extractcontrolnl` namelist from the OPS app config file, as well as the file that normally holds the namelist (see the config sketch after this list)
     
   * Estimate the amount of memory required to read the observations and allocate space within the OPS program, then make the PBS resource request just large enough to finish processing (see the PBS sketch after this list). This is based on my hunch that the failure stems from some PEs getting too few observations, so that during CX creation OPS allocates memory for variables which may be empty (again, this is only my hunch)
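To illustrate the `extractcontrolnl` point above, here is a minimal sketch of what the change might look like in a Rose app configuration for the OPS extract task. The file path, the example entry and the layout are assumptions for illustration only, not taken from any particular suite; the idea is simply that the whole namelist section and the `[file:...]` entry that writes it out are deleted, so OPS works the settings out from the obstore itself.

{{{
# rose-app.conf fragment (illustrative layout only).
# To let OPS determine the extract settings itself, delete this whole
# namelist section AND the [file:...] entry that writes it to disk:

[namelist:extractcontrolnl]
maxbatchessubtype=20

[file:$ROSE_DATA/extractcontrolnl.nl]
source=namelist:extractcontrolnl
}}}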
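The memory bullet above comes down to an ordinary PBS resource request. The select string and numbers below are purely illustrative and depend on the site's PBS configuration; these are also the directives you would scale up (more cores, i.e. more PEs, or more memory) if the observation-distribution memory error described under the second method appears.

{{{
# Illustrative PBS directives for the OPS extract job - values are examples only
#PBS -l select=2:ncpus=36:mem=120gb
#PBS -l walltime=00:30:00
}}}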
=== Run Ops_CreateODB and Ops_ExtractAndProcess to read obstores and write out ODB1 ===

Outside of UKMO this method should be used as it produces an updated ODB1 which can be used by VER. Here's a list of things to keep in mind when running Ops_CreateODB and Ops_ExtractAndProcess this way:

   * make sure `maxbatchessubtype` is set to a high enough number to be able to read all the batches in an obstore file (see the namelist sketch after this list). If all the data are not read in then you will see in stdout/stderr a message like,

   {{{
   More batches of AMSR2 data are available but batch 21 is the Final batch.
   }}}

   One difficulty when using the second method is that it's hard to know whether OPS retrieved all the data from an obstore file correctly. To make sure that all data are retrieved, use the first method and then put together the app config file using the second method while comparing the log output (see the log-comparison sketch after this list).

   * Depending on the size of each batch in the obstore the job may run out of memory,
     * this does not depend on the number assigned to `maxbatchessubtype`
     * increasing `maxbatchessubtype` beyond the number of batches in the obstore doesn't seem to have any effect
     * it appears that for a certain type of memory error the failure occurs when the program is trying to distribute observations to the other PE's; the solution for this type of error is to increase the PBS core request
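As a rough sketch of the `maxbatchessubtype` bullet above: it is a single namelist entry, and the value only needs to exceed the number of batches per subtype in the obstore. Which namelist group and file it actually lives in depends on how the OPS app is configured, so the group name below is illustrative.

{{{
! Illustrative Fortran namelist fragment - the enclosing group/file is app-dependent
&extractcontrolnl
  maxbatchessubtype = 50   ! must be at least the number of batches per subtype
/
}}}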
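For the cross-check between the two methods, the comparison is simply between the two jobs' stdout. The paths below are hypothetical; the point is to compare the per-subtype batch and observation counts reported by each run.

{{{
# Hypothetical log locations - compare the counts each method reports in stdout
grep -i "batch" run_method1/ops_extract/job.out
grep -i "batch" run_method2/ops_extract/job.out
}}}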
=== Resources ===