3

I am working on a cluster machine that uses the Slurm job manager. I just started a multithreaded program, and I would like to check its core and thread usage for a given node ID. For example:

scoreusage -N 92512

where "scoreusage" is the command that I am unsure of.

Austin Downey

2 Answers

4

I find the built-in SLURM tools very basic. Instead, you can use something like htop to monitor the (running) job in real time.

  1. Find which node the job is running on:
     $ scontrol show job $JOB_ID | grep ' NodeList'
        NodeList=<HOSTNAME>
  2. ssh into the node: $ ssh <HOSTNAME>
  3. Run the monitoring program as required, e.g. $ htop
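
The three steps above could be strung together roughly as follows. This is only a sketch: parse_nodelist is a hypothetical helper name (not a Slurm command), and it assumes that scontrol prints NodeList= at the start of its own line and that the job runs on a single node.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin and
# print the value of the NodeList field.
parse_nodelist() {
    # Anchor at the start of the line so that ReqNodeList= and
    # ExcNodeList= (which share a line) are not matched.
    sed -n 's/^[[:space:]]*NodeList=\(.*\)$/\1/p' | head -n 1
}

# Possible usage on the cluster (assumes JOB_ID is set and that you are
# allowed to ssh into the compute node):
#   node=$(scontrol show job "$JOB_ID" | parse_nodelist)
#   ssh -t "$node" htop -u "$USER"
```

For a multi-node job, NodeList is a host list expression such as node[01-04]; scontrol show hostnames can expand it into individual host names before choosing one to ssh into.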
Sparhawk
  • The problem here is that you can't monitor someone else's job, since you usually can't access a node unless you have a job running on it. – rsaavedra Jan 29 '20 at 14:53
1

It's been a few years since I ran a slurm cluster, but squeue should give you what you want. Try:

squeue --nodelist 92512 -o "%A %j %C %J"

(That should give the job ID, job name, CPU count, and requested threads per core for the jobs on node 92512.)

BTW, unless you specifically only want details from one particular node, you might be better off searching by job id rather than node id.
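
Searching by job ID, as suggested above, might look like the sketch below. It reuses the same format string; job_usage is a hypothetical wrapper name, and the SQUEUE variable is an assumption added here so that the command line can be checked on a machine without Slurm.

```shell
# Hypothetical wrapper: show job ID, job name, CPU count and requested
# threads per core for one job. Set SQUEUE to substitute another command
# for squeue (for example, echo for a dry run).
job_usage() {
    "${SQUEUE:-squeue}" --jobs="$1" --format="%A %j %C %J"
}

# Possible usage on the cluster:
#   job_usage 92512
```

Running SQUEUE=echo job_usage 92512 prints the arguments that would be passed to squeue instead of querying the scheduler.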

There is a lot of good documentation on using Slurm available on the web, easily found via Google. Most universities and other sites running an HPC cluster write their own docs, help pages and "cheat-sheets", customised to the details of their specific cluster(s), so take that into account and adapt any examples to YOUR cluster. There's also good generic documentation on using Slurm at https://slurm.schedmd.com/documentation.html

cas
  • Thanks, I ran the squeue command and received ` 12 *`, where I removed the job ID and file name for clarity. I will have to dig deeper into the documentation to see what I can find. – Austin Downey Jul 27 '17 at 11:00