Questions tagged [slurm]

SLURM is a workload manager for Linux clusters

SLURM (Simple Linux Utility for Resource Management) is a workload manager and job scheduler for Linux clusters and supercomputers.

Not to be confused with slurm, the network load monitor.

77 questions
34
votes
1 answer

How to submit a job to a specific node using Slurm's sbatch command?

The nodes in our cluster are named node001 ... node0xx. Is it possible to submit a job to a specific node using Slurm's sbatch command? If so, could someone post example code for that?
Amir
  • 949
  • 1
  • 7
  • 12
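
One way to pin a job to a named node is sbatch's --nodelist option (short form -w). The following is a minimal sketch, assuming a node called node001 exists and belongs to the partition in use; the job name and the script body are placeholders, not taken from the question.

```shell
#!/bin/bash
#SBATCH --job-name=pinned-job   # placeholder name
#SBATCH --nodelist=node001      # request this specific node (same as -w node001)
#SBATCH --ntasks=1

# SLURMD_NODENAME is set by Slurm inside a job; fall back to uname -n elsewhere.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on ${node}"
```

The same request can be made at submission time with sbatch --nodelist=node001 job.sh, where job.sh is a hypothetical script name.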
29
votes
5 answers

Best way to cancel all the SLURM jobs from shell command output

I submitted a lot of SLURM job scripts with the debug time limit (I forgot to change the time limit for the actual run). They were all submitted at the same time, so their job IDs all start with 197xxxxx. Now, I can do squeue -u $USER | grep 197 | awk '{print…
Osman Mamun
  • 443
  • 1
  • 4
  • 8
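
A possible alternative to grepping the full squeue table is to ask squeue for bare job IDs and filter those. This is a sketch rather than a definitive recipe: ids_with_prefix is a hypothetical helper name, and the prefix 197 comes from the question. Anchoring the match at the start of the ID avoids hits on 197 elsewhere in the line.

```shell
# Hypothetical helper: read job IDs on stdin, keep those starting with a prefix.
ids_with_prefix() {
    grep -E "^${1}" || true   # "|| true": finding no match is not an error here
}

# On the cluster (requires squeue and scancel; xargs -r skips empty input):
#   squeue -u "$USER" -h -o "%i" | ids_with_prefix 197 | xargs -r scancel
```

Here -h suppresses the squeue header and -o "%i" prints only the job ID column. Replacing scancel with echo first is a cheap way to review the list before cancelling anything.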
24
votes
4 answers

SLURM: Custom standard output name

When running a SLURM job using sbatch, Slurm produces a standard output file named like slurm-102432.out (slurm-jobid.out). I would like to customise this to yyyymmddhhmmss-jobid-jobname.txt. How do I go about doing this? Or more generally,…
mindlessgreen
  • 1,229
  • 4
  • 12
  • 21
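
sbatch's --output option accepts filename patterns such as %j (job ID) and %x (job name, which may not be available on older Slurm releases). As far as I know there is no pattern for a timestamp, so the sketch below builds that part in the shell at submission time. output_pattern is a hypothetical helper, and the timestamp reflects when the job was submitted, not when it started.

```shell
# Hypothetical helper: print an --output pattern prefixed with the current time.
# %j and %x are left untouched for Slurm to expand (job ID and job name).
output_pattern() {
    printf '%s-%%j-%%x.txt\n' "$(date +%Y%m%d%H%M%S)"
}

# On the cluster (job.sh is a placeholder script name):
#   sbatch --output="$(output_pattern)" job.sh
```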
16
votes
3 answers

How to cancel jobs on Slurm with a job ID (job number) greater than a certain number?

I have submitted 800 jobs on Slurm. I want to cancel the jobs whose job ID/number is greater than a certain number (since there is a mistake in them). I don't want to cancel all my jobs, because some are running and some of those in the queue are correct.
Mona Jalilvand
  • 171
  • 1
  • 1
  • 3
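
One way to select jobs above a threshold is a numeric comparison on the IDs that squeue prints. A minimal sketch, assuming ids_above as a hypothetical helper name and 1234567 as a made-up threshold; array elements such as 300_2 are compared by their base job ID.

```shell
# Hypothetical helper: read job IDs on stdin, print those greater than $1.
# "$1 + 0" forces a numeric comparison, so "300_2" is compared as 300.
ids_above() {
    awk -v min="$1" '$1 + 0 > min + 0 { print $1 }'
}

# On the cluster (requires squeue and scancel; xargs -r skips empty input):
#   squeue -u "$USER" -h -o "%i" | ids_above 1234567 | xargs -r scancel
```

Running and pending jobs at or below the threshold are left alone, because only the printed IDs are handed to scancel.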
10
votes
1 answer

How to check SLURM environmental variables programmatically?

How can I programmatically access SLURM environmental variables, like MaxArraySize or MaxJobCount? I would like to partition my job arrays into chunks of the allowed maximum size. Can this information be queried with any of SLURM's commands? So far,…
István Zachar
  • 203
  • 2
  • 6
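
MaxArraySize and MaxJobCount are configuration parameters rather than environment variables, and scontrol show config lists them as "Name = value" lines. The sketch below pulls one value out of that listing; config_value is a hypothetical helper name.

```shell
# Hypothetical helper: read "Name = value" lines on stdin and print the
# value whose name equals $1.
config_value() {
    awk -F'=' -v key="$1" '
        {
            name = $1
            gsub(/[ \t]+/, "", name)            # strip padding around the name
        }
        name == key {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim the value
            print value
        }'
}

# On the cluster (requires scontrol):
#   max_array=$(scontrol show config | config_value MaxArraySize)
```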
5
votes
1 answer

Is munge required for a single node slurm setup?

I'm installing slurm on a single server to be used for scheduling purposes among a small group of people. There is not now, nor will there ever be, an intent to scale beyond this single node. Is munge still a requirement for security in this case or…
drjrm3
  • 1,885
  • 4
  • 16
  • 17
5
votes
2 answers

Running GNU Parallel on 2 or more nodes with Slurm scheduler

I am trying to distribute independent runs of a process using GNU Parallel on an HPC cluster that uses the Slurm workload manager. Briefly, here is the data analysis setup: Script#1: myCommands ./myscript --input infile.txt --setting 1 --output out1 ./myscript…
cryptic0
  • 191
  • 6
4
votes
2 answers

`watch` command with piping `|`

I want to keep monitoring a specific job on a cluster with a Slurm-like workload manager. I tried to use the watch command and grep for the specific ID. If the job ID is 4138, I tried $> watch squeue -u mnyber004 | grep 4138 $> squeue -u mnyber004 | watch grep…
Many
  • 257
  • 1
  • 6
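
In the first attempt the shell pipes the output of watch itself into grep, so nothing is refreshed on screen. A common workaround is to quote the whole pipeline so that watch receives it as a single string and reruns all of it. A minimal sketch using the user name and job ID from the question; the 10-second interval is an arbitrary choice.

```shell
# Quote the whole pipeline so the calling shell does not split it at "|";
# watch then hands the string to "sh -c" on every refresh.
pipeline='squeue -u mnyber004 | grep 4138'

# On the cluster (requires watch and squeue):
#   watch -n 10 "$pipeline"
# Asking squeue for the job directly avoids the pipe altogether:
#   watch -n 10 squeue -j 4138
```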
4
votes
1 answer

Possible effects of slurmstepd: error: Exceeded step memory limit at some point?

I have a question for those of you familiar with the scheduler Slurm. Sometimes I get the following error message slurmstepd: error: Exceeded step memory limit at some point. I know it means the memory allocated to my process wasn't enough.…
j91
  • 161
  • 3
3
votes
2 answers

Passing Argument to Comment in .sh script

I am a beginner in the use of .sh scripts so please excuse my ignorance. This is my problem: To submit my jobs to our cluster the corresponding submit file has to contain a "slurm header" and looks something like this. #!/bin/sh # ########## Begin…
stollenm
  • 133
  • 1
  • 5
3
votes
1 answer

restore $0 or $BASH_SOURCE after it is modified by the cluster

I am using a shared SLURM cluster. I am trying to get the path of the bash script from inside the script itself. There is already an excellent thread here:…
burger
  • 209
  • 1
  • 6
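
Slurm runs a copy of the batch script from its spool directory, which is why $0 no longer points at the original file. One approach that may help is to read the Command= field from scontrol show job. The sketch below assumes that field is present in the output; submitted_command is a hypothetical helper name, and the field can also contain script arguments if the job was submitted with any.

```shell
# Hypothetical helper: extract the Command= field from "scontrol show job" output.
submitted_command() {
    sed -n 's/^[[:space:]]*Command=//p'
}

# Inside a running job (SLURM_JOB_ID is set by Slurm):
#   script_path=$(scontrol show job "$SLURM_JOB_ID" | submitted_command)
```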
3
votes
1 answer

Watch-command-alias-expansion AND need to use quotes

My question is similar to the watch question here, but with a twist. I need to use quotes, which seem to be stripped by an aliased watch. I want to run watch on a custom Slurm squeue command: $ alias squeue_personal='squeue -o "%.18i %.9P %.8j %.8u…
3
votes
0 answers

SLURM: restrict GPU access only to SLURM

I have a single machine (Ubuntu 16.04 Server) with 4 TitanX GPUs. This will be a lab machine on which students will learn about CUDA and stuff. I installed SLURM because I want a tool to schedule and enqueue jobs automatically based on GPU…
Kamil
  • 1,311
  • 2
  • 14
  • 31
3
votes
2 answers

Check CPU/thread usage for a node in the Slurm job manager

I am working on a cluster machine that uses the Slurm job manager. I just started a multithreaded code and I would like to check the core and thread usage for a given node ID. For example, scoreusage -N 92512, where "scoreusage" is the command that I…
Austin Downey
  • 165
  • 1
  • 7
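
For a per-node view of CPU allocation, sinfo's %C format field prints CPU counts as allocated/idle/other/total. The sketch below only relabels that string; cpu_summary is a hypothetical helper and node001 a made-up node name. Note that this reports what the scheduler has allocated, not what the threads are actually using.

```shell
# Hypothetical helper: relabel sinfo's "allocated/idle/other/total" CPU field.
cpu_summary() {
    awk -F'/' '{ printf "allocated=%s idle=%s other=%s total=%s\n", $1, $2, $3, $4 }'
}

# On the cluster (requires sinfo; -h drops the header line):
#   sinfo -n node001 -h -o "%C" | cpu_summary
```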
3
votes
1 answer

Using SLURM without a feature

Suppose my supercomputer has the following NODELIST entries with the listed features: NodeA: (none); NodeB: specialfeature. I am trying to benchmark performance with and without the specialfeature feature.…
chessofnerd
  • 115
  • 6