
Policy: dp9 Resource Management



General

  • There is no individual soft quota for core users of dp9.
  • Maximum share of a core user on /scratch/dp9 disk space is 30% of total dp9 quota; users with higher usage will be notified daily by email to reduce their usage to under 30%.
  • Maximum share of a core user on /g/data/dp9 disk space is 15% of total dp9 quota; users with higher usage will be notified daily by email to reduce their usage to under 15%.
  • Maximum share of a core user on /g/data/dp9 inodes is 15% of total dp9 quota; users with an excess number of inodes will be notified daily by email to reduce their usage to under 15%.
  • Maximum share of a core user on mdss disk space is 10% of total dp9 mdss quota; users with higher usage will be notified daily by email to reduce their usage to under 10%.
  • These maximum core user shares are subject to change if the allocated resource for dp9 is updated.
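
As a rough illustration of the share limits above, the following Python sketch flags core users whose usage exceeds the maximum share of a resource. The total quotas (220T, 45M inodes, 1700T, 800T) are the figures quoted in the Disaster section of this policy. The function name, the dictionary keys, and the sample usage numbers are hypothetical, and the sketch says nothing about how usage is actually collected or how notifications are sent.

```python
# Maximum core-user share per resource, in percent (General section).
MAX_SHARE_PERCENT = {
    "scratch_space": 30,
    "gdata_space": 15,
    "gdata_inodes": 15,
    "mdss_space": 10,
}

# Total dp9 quotas as quoted in the Disaster section.
# Space is in T; inodes are in millions.
TOTAL_QUOTA = {
    "scratch_space": 800.0,
    "gdata_space": 220.0,
    "gdata_inodes": 45.0,
    "mdss_space": 1700.0,
}


def users_over_share(resource, usage_by_user):
    """Return {user: usage} for users above the maximum share of `resource`."""
    limit = MAX_SHARE_PERCENT[resource] * TOTAL_QUOTA[resource] / 100
    return {u: v for u, v in usage_by_user.items() if v > limit}


# Hypothetical usage figures in T on /g/data/dp9 (limit is 33T).
sample = {"alice": 40.0, "bob": 12.5, "carol": 20.0}
print(users_over_share("gdata_space", sample))  # {'alice': 40.0}
```

If the allocated resource for dp9 changes, only the two tables at the top of the sketch would need updating.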


Emergency

  • An emergency is a situation with any one, or any combination, of the following scenarios:
    • Total /g/data/dp9 disk space usage exceeds 85%.
    • Total /g/data/dp9 inode usage exceeds 85%.
    • Total mdss tape drive usage exceeds 90%.
    • Total /scratch/dp9 disk space usage exceeds 90%.
  • When the total usage of /g/data/dp9 disk space exceeds 85%, all users with more than 5T of usage will be notified daily to reduce their usage until the total usage is down below 85%.
  • When the total usage of /g/data/dp9 inodes exceeds 85%, all users with more than 2M inodes will be notified daily to reduce their inodes until the total usage is down below 85%.
  • When the total usage of mdss tape drive exceeds 90%, all users with more than 50T of usage will be notified daily to reduce their usage until the total usage is down below 90%.
  • When the total usage of /scratch/dp9 disk space exceeds 90%, all users with more than 200T of usage will be notified daily to reduce their usage until the total usage is down below 90%.
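
The emergency rules above can be sketched as a small lookup table plus one check. This is only an illustration: it assumes that total usage is the sum of the per-user figures, and the function name, resource keys, and sample numbers are hypothetical. The quota passed in the example (220T for /g/data/dp9) is the figure quoted in the Disaster section.

```python
# Emergency rules from the policy:
# (total-usage trigger in percent, per-user notification floor).
# Space is in T; inodes are in millions.
EMERGENCY_RULES = {
    "gdata_space": (85, 5.0),
    "gdata_inodes": (85, 2.0),
    "mdss_space": (90, 50.0),
    "scratch_space": (90, 200.0),
}


def emergency_notifications(resource, quota, usage_by_user):
    """Return the sorted users to notify, or [] if there is no emergency."""
    trigger_percent, user_floor = EMERGENCY_RULES[resource]
    total = sum(usage_by_user.values())
    # Compare total/quota with the trigger without dividing.
    if total * 100 <= trigger_percent * quota:
        return []
    return sorted(u for u, v in usage_by_user.items() if v > user_floor)


# Hypothetical usage in T: 194T of 220T is about 88%, above the 85% trigger.
usage = {"alice": 120.0, "bob": 70.0, "carol": 4.0}
print(emergency_notifications("gdata_space", 220.0, usage))  # ['alice', 'bob']
```

Running the same check daily matches the policy: the list becomes empty again once total usage drops back to the trigger level or below.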


Disaster

  • A disaster is a situation with any one, or any combination, of the following scenarios:
    • Total /g/data/dp9 disk space usage exceeds 100%.
    • Total /g/data/dp9 inode usage exceeds 100%.
    • Total mdss tape drive usage exceeds 100%.
    • Total /scratch/dp9 disk space usage exceeds 100%.
  • In a disaster, all job submissions for project dp9 will be suspended so that immediate management action can be taken to address the issue.
  • The goal is to reduce the usage of the over-quota resource to below 95% of its quota.
  • This work will be executed by NCI administrators at the request of the dp9 resource manager.
  • For /g/data/dp9 disk space, the following steps will be taken sequentially until the goal is reached
    • Step 1: Reduce all users with usage more than 15% of total /g/data/dp9 quota (33T of 220T) to 15% of the total quota. At this time, no set algorithm is specified for the order of files to be deleted for each user for disaster deletion; i.e. users should assume that any files may be deleted.
    • Step 2: Set the threshold to 30T, and reduce all users with usage more than 30T on /g/data/dp9 to 30T.
    • Step 3: Lower the threshold by 1T and reduce every user above the new threshold to that threshold; repeat until the goal is reached.
  • For /g/data/dp9 inodes, the following steps will be taken until the goal is reached
    • Step 1: Reduce all users with inodes more than 15% of total /g/data/dp9 inode quota (6.75M of 45M) to 15% of the total inode quota.
    • Step 2: Set the threshold to 6M, and reduce all users with usage more than 6M on /g/data/dp9 to 6M.
    • Step 3: Lower the threshold by 0.5M and reduce every user above the new threshold to that threshold; repeat until the goal is reached.
  • For mdss tape drive, the following steps will be taken until the goal is reached
    • Step 1: Reduce all users with usage more than 10% of total mdss quota (170T of 1700T) to 10% of the total quota.
    • Step 2: Set the threshold to 160T, and reduce all users with usage more than 160T on mdss to 160T.
    • Step 3: Lower the threshold by 10T and reduce every user above the new threshold to that threshold; repeat until the goal is reached.
  • For /scratch/dp9 disk space, the following steps will be taken sequentially until the goal is reached
    • Step 1: Reduce all users with usage more than 30% of total /scratch/dp9 quota (240T of 800T) to 30% of the total quota. At this time, no set algorithm is specified for the order of files to be deleted for each user for disaster deletion; i.e. users should assume that any files may be deleted.
    • Step 2: Set the threshold to 200T, and reduce all users with usage more than 200T on /scratch/dp9 to 200T.
    • Step 3: Lower the threshold by 50T and reduce every user above the new threshold to that threshold; repeat until the goal is reached.
  • If a user is not available, that user's files will be removed at random from disk and/or tape to meet the target.
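
The three-step reduction described above can be sketched in Python as follows. The sketch only computes the per-user target levels; it does not choose or delete files, since the policy specifies no deletion order. It assumes that total usage is the sum of per-user usage and reads the goal as "strictly below 95% of quota". The function names, resource keys, and sample figures are hypothetical; the quotas, caps, thresholds, and decrements are the ones listed in the steps.

```python
# Disaster parameters per resource, taken from the steps in the policy:
# (total quota, step-1 cap, step-2 threshold, step-3 decrement).
# Space is in T; inodes are in millions.
DISASTER_PARAMS = {
    "gdata_space": (220.0, 33.0, 30.0, 1.0),
    "gdata_inodes": (45.0, 6.75, 6.0, 0.5),
    "mdss_space": (1700.0, 170.0, 160.0, 10.0),
    "scratch_space": (800.0, 240.0, 200.0, 50.0),
}


def cap_users(usage_by_user, threshold):
    """Reduce every user above `threshold` to exactly `threshold`."""
    return {u: min(v, threshold) for u, v in usage_by_user.items()}


def disaster_targets(resource, usage_by_user):
    """Return (final threshold, per-user target usage) once the goal is met."""
    quota, cap, threshold, decrement = DISASTER_PARAMS[resource]
    goal = 95 * quota / 100  # total usage must end up below this value

    # Step 1: cap every user at the maximum core-user share.
    targets = cap_users(usage_by_user, cap)
    if sum(targets.values()) < goal:
        return cap, targets

    # Step 2 starts at the fixed threshold; step 3 keeps lowering it.
    while True:
        targets = cap_users(usage_by_user, threshold)
        if sum(targets.values()) < goal:
            return threshold, targets
        # A threshold of 0 always meets the goal, so the loop terminates.
        threshold = max(threshold - decrement, 0.0)


# Hypothetical case: eight users with 30T each on /g/data/dp9 (240T of 220T).
usage = {f"user{i}": 30.0 for i in range(8)}
threshold, targets = disaster_targets("gdata_space", usage)
print(threshold, sum(targets.values()))  # 26.0 208.0
```

In this example steps 1 and 2 change nothing, and step 3 lowers the threshold from 30T to 26T before total usage (208T) falls below the 209T goal.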


Guest users

  • Maximum disk space of 500G on /g/data/dp9 can be used by a guest user for collaboration work with dp9 core users for up to 6 months.
  • Maximum disk space of 5000G on /scratch/dp9 can be used by a guest user for collaboration work with dp9 core users.
  • Guest users should not use dp9 group resources on mdss tape drive.
  • Guest users are reviewed for dp9 membership annually, in April/May.


Aged file management

  • Aged files are those that have not been accessed within the past year. Aged file management takes place at most once a year on /g/data/dp9; there is currently no aged file management on /scratch/dp9 or on mdss for dp9.
  • A profile of aged files on /g/data/dp9 is requested from NCI administrators in October each year.
  • If the total size of aged files is less than 20T, the aged file management on /g/data/dp9 will be skipped for that year.
  • If the total size of aged files is greater than 20T, users with more than 1T of aged files will be asked daily by email to reduce their aged files to below 1T.
  • If a user fails to reduce their aged files to the required volume within a month, the NCI administrators and the resource manager will remove that user's aged files, from oldest to newest, until they occupy less than 1T on /g/data/dp9.
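
The aged file rules above can be illustrated with two small Python functions: one decides who is asked to clean up, the other picks a non-compliant user's files from oldest to newest. This is a sketch under assumptions. The function names, the tuple layout of the file list, and the sample data are hypothetical, and a total of exactly 20T is treated as "skip" because the policy does not cover that case.

```python
def users_to_notify(aged_by_user, total_floor=20.0, user_floor=1.0):
    """Return sorted users asked to reduce aged files, or [] if skipped.

    Sizes are in T. A total of exactly `total_floor` counts as skipped
    (an assumption; the policy leaves this case open).
    """
    if sum(aged_by_user.values()) <= total_floor:
        return []
    return sorted(u for u, size in aged_by_user.items() if size > user_floor)


def files_to_remove(aged_files, user_floor=1.0):
    """Pick files oldest-first until the remainder is below `user_floor`.

    `aged_files` is a list of (last_access_time, size_in_T, path) tuples.
    """
    remaining = sum(size for _, size, _ in aged_files)
    selected = []
    for _, size, path in sorted(aged_files):  # oldest access time first
        if remaining < user_floor:
            break
        selected.append(path)
        remaining -= size
    return selected


# Hypothetical profile: 21.5T of aged files in total.
print(users_to_notify({"alice": 15.0, "bob": 0.5, "carol": 6.0}))
# ['alice', 'carol']
```

Sorting by last access time mirrors the "oldest to newest" order in the policy; the sketch returns paths only and leaves the actual removal to the administrators.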
Last modified on Jun 23, 2020 3:53:35 PM