PBS Torque: how to avoid wasting cores when parallel tasks take very different amounts of time

I'm running parallel MATLAB or Python tasks on a cluster managed by PBS Torque. The awkward situation is that PBS thinks I'm using 56 cores. That was true at the start, but eventually only the 7 hardest tasks are still running, so 49 cores are wasted.



My parallel tasks take very different amounts of time because each one searches a different set of model parameters, and I cannot know in advance how long a given task will take. At the start all cores were busy, but soon only the hardest tasks were still running. Because the job as a whole has not finished, PBS Torque still counts all 56 cores as in use and will not start new tasks on them, even though most of those cores are idle. I want PBS to detect this and use the idle cores to run new tasks.



So my question is: are there settings in PBS Torque that can automatically detect how many cores a job is really using, and allocate the idle ones to new tasks?



#PBS -S /bin/sh
#PBS -N alps_task
#PBS -o stdout
#PBS -e stderr
#PBS -l nodes=1:ppn=56
#PBS -q batch
#PBS -l walltime=1000:00:00
#HPC -x local
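cd "/tmp/$PBS_O_WORKDIR"
# Point stdout at the log first, then duplicate stderr onto it, so both streams end up in tasklog.log
alpspython spin_half_correlation.py > tasklog.log 2>&1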

      parallel-processing pbs torque

      edited Nov 16 '18 at 3:16







      demxiya1970

















      asked Nov 16 '18 at 3:10









      demxiya1970

          The short answer to your question is no: PBS has no way to reclaim unused resources that have been allocated to a job.



          Since your computation is essentially a set of independent tasks, what you could, and probably should, do is split your job into 56 independent jobs, each running a single combination of model parameters. When all of those jobs have finished, you can run one additional job to collect and summarize the results. This is a well-supported way of working, and PBS has useful features for this kind of workload, such as array jobs and job dependencies.
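A minimal sketch of the array-job approach, written for Torque (which reads `#PBS -t` and sets `PBS_ARRAYID` in each sub-job). It assumes a hypothetical file `params.txt` with one parameter combination per line, and assumes `spin_half_correlation.py` has been changed to take that combination as command-line arguments; neither is part of the original setup. Each sub-job requests a single core, so its core goes back to the scheduler as soon as that sub-job exits, instead of staying reserved until the slowest task finishes.

```shell
#!/bin/sh
#PBS -S /bin/sh
#PBS -N alps_sweep
#PBS -l nodes=1:ppn=1
#PBS -q batch
#PBS -l walltime=1000:00:00
#PBS -t 0-55
# Torque array job: 56 sub-jobs (indices 0..55), one core each,
# one parameter combination per sub-job.
# ASSUMPTION: params.txt is a hypothetical file with one parameter
# combination per line, and spin_half_correlation.py is assumed to
# accept that combination as command-line arguments.

# Print line INDEX (0-based) of FILE; prints nothing if FILE is too short.
param_for_index() {
    sed -n "$(( $1 + 1 ))p" "$2"
}

# Torque sets PBS_ARRAYID only inside an array sub-job, so this part is
# skipped when the script is run or sourced outside PBS.
if [ -n "${PBS_ARRAYID:-}" ]; then
    # Adjust to your site's staging convention (the original job script
    # uses /tmp/$PBS_O_WORKDIR together with a site-specific #HPC line).
    cd "$PBS_O_WORKDIR" || exit 1
    params=$(param_for_index "$PBS_ARRAYID" params.txt)
    # $params is left unquoted on purpose so each value becomes a separate argument.
    alpspython spin_half_correlation.py $params > "task_${PBS_ARRAYID}.log" 2>&1
fi
```

Submitted with `qsub`, this queues 56 one-core sub-jobs that the scheduler can start independently. The summarizing step can then be queued as its own job that waits for the whole array, for example `qsub -W depend=afterokarray:1234[] summarize.pbs`, where `1234` stands for the array's job number and `summarize.pbs` is a hypothetical script. Array and dependency syntax differ between Torque versions and PBS Pro (which uses `-J` and `PBS_ARRAY_INDEX`), so check `man qsub` on your cluster.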

          answered Nov 23 '18 at 13:45









          Dmitri Chubarov

          • Thank you. Very concise and helpful answer!

            – demxiya1970
            Nov 27 '18 at 0:57