Qsub to any node with more than n cores available

Question

I have a program that is parallelized using MPI. It is supposed to be able to run across multiple nodes on our CentOS 6.6-based HPC grid, but in practice it only runs successfully on multiple cores of the same compute node.

For example, if I qsub a job to the grid asking for 20 cores and Grid Engine decides to split it over two different nodes, the program fails. However, if there is a node with 20 cores available and Grid Engine places the whole job on that node, the program runs successfully. The qsub script contains the directive #$ -pe mpi 20 to select the number of cores.

So at the moment, I do a qstat -f -u "*" to manually identify a compute node with 20 available cores, and submit to that node with qsub -q general.q@node-X-X

What I am looking for is a way to tell Grid Engine to wait and only submit the job to a single compute node that has the required number of available cores. This will allow me to automate my job submission.

I am considering writing a bash script to parse the qstat -f -u "*" command, but there must be a more elegant solution. I have looked through the qsub manual but am unable to find a suitable flag or command line argument.

I'm not able to modify the program itself at this time and I am not a system administrator.

Here is some information on the different software versions I have available:

MPI/gridengine info:

> ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)

Grid engine version is: OGS/GE 2011.11p1

score 1 · Answer 1 · answered Aug 05 '17 at 11:45

To make Grid Engine schedule your 20-core job on a single node, you would have to create a new parallel environment (PE) or adjust the one you are using. The setting you need is

allocation_rule    $pe_slots

From man sge_pe:

If the special denominator $pe_slots is used, the full range of processes as specified with the qsub(1) -pe switch has to be allocated on a single host.

Do not forget to attach the new PE to your queue.
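
As a rough sketch of what that could look like: the function below prints a PE definition with the allocation rule set to $pe_slots. The PE name smp_single is made up, general.q is the queue name from the question, and every field other than allocation_rule is a placeholder that an administrator would adapt to the site.

```shell
# Print a PE definition that keeps all slots of a job on one host.
# $1 is the PE name; "smp_single" in the usage notes is a hypothetical name.
# All values except allocation_rule are placeholders, not recommendations.
pe_definition() {
    cat <<EOF
pe_name            $1
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
EOF
}

# An administrator could then save the definition, register it and
# attach it to the queue (these steps need manager rights):
#   pe_definition smp_single > smp_single.pe
#   qconf -Ap smp_single.pe
#   qconf -aattr queue pe_list smp_single general.q
```

A job would then request it with #$ -pe smp_single 20 in the qsub script.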

To troubleshoot your original problem, running the MPI job over more than one node, you could open a second question with more details on that.

Thank you, @Thomas. Unfortunately since I am not an admin on the system in question I am unable to verify that this works. It looks like a good answer. — feedMe, Jan 03 '18 at 10:38

score 1 · Accepted Answer · answered Oct 22 '18 at 09:52

If you use -pe smp 20 instead of -pe mpi 20 you will be using the SMP ("Shared memory parallelism") environment instead of MPI.

SMP is a simpler approach to parallelism and runs on a single computer, sharing the local system memory across threads. Therefore it places all of the requested slots on an individual node (if available), instead of splitting them over multiple compute nodes.

For me this seems to have solved the problem.
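
Whether a PE called smp exists, and whether it keeps all slots on one host, depends on how the site is configured. The sketch below is one way to check this without admin rights, assuming ordinary users are allowed to run the qconf show options (-spl and -sp), which is usually the case. The helper name allocation_rule_of is made up for this example.

```shell
# Read `qconf -sp <pe>` output on stdin and print the allocation_rule value.
allocation_rule_of() {
    awk '$1 == "allocation_rule" { print $2 }'
}

# List every configured PE with its allocation rule.
# Skipped silently on machines where qconf is not installed.
if command -v qconf >/dev/null 2>&1; then
    for pe in $(qconf -spl); do
        printf '%s\t%s\n' "$pe" "$(qconf -sp "$pe" | allocation_rule_of)"
    done
fi
```

Any PE listed with $pe_slots places all requested slots on a single host, so it can be used with -pe in the same way as smp above.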
