I submitted a job to a Linux cluster that uses the SGE job scheduler. The job has been in the qw state for a long time, so I inspected the state of the compute nodes using "qstat -f".

I found that many nodes were labelled with the states "d", "adu" and "E", and I wonder what they mean. The Grid Engine man pages list these states for filtering queue instances ( -qs {a|c|d|o|s|u|A|C|D|E|S} ), but give no further explanation of their meaning.

What do the states mean?

slm
Dejian

1 Answer

I know from experience that:

  • qw - queued waiting
  • E - error
  • a - denotes an alarm state
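  • d - disabled
  • u - unknown (the execution daemon on that node cannot be contacted)

The letters combine, so "adu" is a queue instance that is in alarm, disabled and unknown all at once.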
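
Because the state column of "qstat -f" concatenates one-letter flags, a string such as "adu" can be read letter by letter. Here is a minimal sketch of that lookup, assuming the queue-instance letter meanings given in the qstat man page. The helper name and the table are mine for illustration and are not part of Grid Engine:

```python
# Queue-instance state letters, following the qstat man page.
QUEUE_STATE_FLAGS = {
    "a": "load alarm",
    "A": "suspend alarm",
    "c": "configuration ambiguous",
    "C": "suspended by calendar",
    "d": "disabled",
    "D": "disabled by calendar",
    "E": "error",
    "o": "orphaned",
    "s": "suspended",
    "S": "suspended by subordination",
    "u": "unknown (execution daemon unreachable)",
}


def decode_queue_state(state):
    """Hypothetical helper: expand a state string such as 'adu' letter by letter."""
    return [
        QUEUE_STATE_FLAGS.get(flag, "unrecognized flag " + repr(flag))
        for flag in state
    ]


if __name__ == "__main__":
    # The three states mentioned in the question.
    for state in ("d", "adu", "E"):
        print(state, "->", ", ".join(decode_queue_state(state)))
```

Note that this table only covers queue instances. Job states such as qw use their own set of letters.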

There's a table here:

Also you can use the -explain switch to qstat to find out more info:

 -explain a|A|c|E
      'c' displays the reason for the c(onfiguration ambiguous)
      state of a queue instance. 'a' shows the reason for the
      alarm state. Suspend alarm state reasons will be displayed
      by 'A'. 'E' displays the reason for a queue instance error
      state.

      The output format for the alarm reasons is one line per
      reason containing the resource value and threshold. For
      details about the resource value please refer to the
      description of the Full Format in section OUTPUT FORMATS
      below.
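
In practice this means running something like "qstat -f -explain E" on a submit host. If you want to script it, a small wrapper could look like the sketch below. The function name is hypothetical, and it assumes qstat is on the PATH; it only prints the command when qstat is not installed:

```python
import shutil
import subprocess


def explain_command(reason="E"):
    """Build the argv for `qstat -f -explain <reason>` (reason: a, A, c or E)."""
    if reason not in ("a", "A", "c", "E"):
        raise ValueError("-explain accepts a, A, c or E")
    return ["qstat", "-f", "-explain", reason]


if __name__ == "__main__":
    cmd = explain_command("E")
    if shutil.which("qstat"):
        # Output goes straight to the terminal, as if typed by hand.
        subprocess.run(cmd, check=False)
    else:
        print("qstat not found; would run:", " ".join(cmd))
```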

slm