
I'm trying to submit a job to a school server (HPC) with:

#!/bin/bash

#$ -S /bin/bash
#$ -cwd
#$ -o ./out_$JOB_ID.txt
#$ -e ./err_$JOB_ID.txt
#$ -notify

#$ -pe orte 1

date
pwd

##################################
RESULT_DIR=~/Results
SCRIPT_FILE=sample_job
##################################

. /etc/profile
. /etc/bashrc

module load packages/comsol/4.4
module load packages/matlab/r2012b

comsol server matlab "sample_job, exit" -nodesktop -mlnosplash

/bin/uname -a

mkdir $RESULT_DIR/$name
cp *.csv $RESULT_DIR/$name

The job aborts saying:

Sun Jun  8 14:20:21 EDT 2014
COMSOL 4.4 (Build: 150) started listening on port 2036
Use the console command 'close' to exit the program
/usr/bin/xterm Xt error: Can't open display: 
/usr/bin/xterm:  DISPLAY is not set
Program_did_not_exit_normally
Exception:
    com.comsol.util.exceptions.FlException: Program did not exit normally
Messages:
    Program did not exit normally

Stack trace:
    at com.comsol.mli.application.a.a(Unknown Source)
    at com.comsol.mli.application.MatlabApplication.doStart(Unknown Source)
    at com.comsol.util.application.ComsolApplication.doStart(Unknown Source)
    at com.comsol.util.application.ComsolApplication.doRun(Unknown Source)
    at com.comsol.bridge.Bridge$2.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

ERROR: Could not start COMSOL Application. See log file: /home/.comsol/v44/logs/server2.log
java.lang.IllegalStateException: Shutdown in progress
    at java.lang.ApplicationShutdownHooks.add(Unknown Source)
    at java.lang.Runtime.addShutdownHook(Unknown Source)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:699)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:451)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.a(Unknown Source)
    at com.comsol.util.application.ServerApplication.main(Unknown Source)

What might be the reason and how should I fix it?

Gilles 'SO- stop being evil'
Sibbs Gambling

1 Answer

I'm assuming that you're using GridEngine as the clustering software and that you submit this script with something like this:

$ qsub myscript.sh

You can pass qsub any environment variables that you want set in the shells that get spawned on the HPC cluster nodes, like so:

$ qsub -v DISPLAY=$(hostname):0.0 myscript.sh

This should "inject" the hostname of the system that you're submitting from as the system that you'd like any GUIs to be remotely displayed to.
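To confirm that the variable actually reached the batch job, a small check near the top of the job script can help. This is only a sketch: `check_display` is a hypothetical helper name, not part of GridEngine or COMSOL, and it only reports what the job's shell sees.

```shell
#!/bin/bash
# Hypothetical helper (not part of GridEngine): report whether DISPLAY
# was passed into the job's environment, e.g. via `qsub -v DISPLAY=...`.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to ${DISPLAY}"
    else
        echo "DISPLAY is not set"
        return 1
    fi
}

# Log the result to the job's output file without aborting the job.
check_display || true
```

The message ends up in the file named by the `-o` directive, so you can see what the node received before COMSOL starts.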

You may also need to allow your local system to "receive" this remotely displayed window. The easiest and least secure way to do this is like so:

$ xhost +

If this works and you're concerned about making this "more secure", you can be more explicit by naming the allowed host (xhost +hostname) instead of using a bare xhost +, but it's likely not necessary. Let us know how you make out and we can adjust this further, if needed.

What if the above doesn't work?

Newer versions of qsub include a switch, -X, which is purported to pass the $DISPLAY environment variable along correctly, like so:

$ qsub -X myscript.sh

You could also try using the submitting host's IP address instead of the hostname. It may be the case that the HPC nodes do not have DNS set up properly.

$ qsub -v DISPLAY="$(hostname -i):0.0" myscript.sh
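Both variants build the same kind of value, `host:display.screen`, and differ only in the host part. A minimal sketch of that, where `display_for` is a hypothetical helper and the address `192.0.2.10` is a placeholder rather than a real host:

```shell
#!/bin/bash
# Hypothetical helper: build an X11 display string "host:display.screen".
#   $1  host name or IP address of the machine running the X server
#   $2  display number (defaults to 0)
#   $3  screen number (defaults to 0)
display_for() {
    printf '%s:%s.%s\n' "$1" "${2:-0}" "${3:-0}"
}

# Print (rather than run) the submission command so it can be inspected first.
# 192.0.2.10 is a placeholder; substitute the output of `hostname -i`.
echo "qsub -v DISPLAY=$(display_for 192.0.2.10) myscript.sh"
```

Printing the command first makes it easy to check that the host part is an address the compute nodes can actually reach.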

slm
  • Hi, so it comes from the no-display feature of the HPC node, right? I did `xhost +` and `qsub -v DISPLAY=$(hostname):0.0 run.sh` just now, but the error persists. – Sibbs Gambling Jun 08 '14 at 20:35
  • @FarticlePilter The HPC node should be able to remote display GUIs, so you'll have to work this out if you really want to get a GUI from it. Can you use `qsh`? This should return an `xterm` GUI. – slm Jun 08 '14 at 20:39
  • Yes, I can do `qsh` to get the GUI. Sorry for misleading you. I do not really need the GUI. It can be completely suppressed! The thing is, I just want the software to start. Because of the GUI thing, it cannot be started. – Sibbs Gambling Jun 08 '14 at 20:42
  • @FarticlePilter - NP, I just wanted to confirm that the GUI could be displayed from the HPC node. So getting the display set will resolve your issue. When you run your command try `qsub -V myscript.sh`. This will pass ALL the environment variables to the batch job's shell. Also you can give the commands like this: `echo env | qsub -V`. The resulting `-e` and `-o` files should contain the env. vars. that are available. – slm Jun 08 '14 at 21:35
  • @FarticlePilter - you might want to try the `-X` switch to `qsub` too. See here: http://hpc.uark.edu/hpc/support/interactive.html. – slm Jun 08 '14 at 21:46
  • @FarticlePilter - also see here: Using a GUI display in a batch job (https://wiki.csiro.au/display/ASC/Using+a+GUI+display+in+a+batch+job). I'd get the IP address of your submitting host and try using that instead of the host's name. – slm Jun 08 '14 at 21:50