I am trying to configure mpirun and mpiexec to run software called Materials Studio on a cluster with 1 node, 2 processors, and 12 cores. Jobs are submitted through PBS. With some help I had everything set up properly, and submitted jobs ran well, but after a few days I started getting errors like this:

mpiexec_server.org: cannot connect to local mpd (/tmp/mpd2.console_user); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option)

It seemed like the daemon for mpd was somehow set up but eventually terminated. I had luck adding this (bold part) to my submission script:

export PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/bin:$PATH

export LD_LIBRARY_PATH=/data1/opt/MD/Linux-x86_64/IntelMPI/lib:/data1/opt/MD/Linux-x86_64/IntelMPI/bin:/data1/opt/MD/Linux-x86_64/IntelMKL/lib

**mpdboot -n 1 -f ~/mpd.hosts**

nohup mpd &

/data1/opt/MD/Linux-x86_64/IntelMPI/bin/mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel

The job now submits and runs properly but times out after 30 minutes or so. I tried adding '-r ssh' (without the quotes) to the end of the mpdboot line, but I am not sure that is the right strategy. I am also a little confused about why I need to run this daemon in the script and why I need to pass a hosts file when I run it; I thought that PBS creates one when the job starts. Could anyone please give me some advice on where to go next? Basically, how can I prevent a running job from quitting because of something to do with the MPI daemon?

EDIT: Could anyone shed any light on what is involved in running the mpiexec command on the last line? If I point to the folder where it lives, do I still need to run a boot command? I must admit I am confused about why I need to run mpdboot/mpd when the whole point of mpiexec is to eliminate the need for mpd (at least according to the mpiexec website).
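
For reference, here is a minimal sketch of how the hosts file could be derived from the node file that PBS provides (the PBS_NODEFILE environment variable) rather than from ~/mpd.hosts. The helper name make_mpd_hosts is hypothetical, and the mpdboot, mpdallexit, and mpiexec calls assume the MPD-based Intel MPI tools from the script above are on PATH. Since mpdboot starts the mpd daemons itself, the separate "nohup mpd &" line is likely redundant and is left out. This is an illustration under those assumptions, not a confirmed fix for the timeout.

```shell
#!/bin/bash
# Sketch: size the mpd ring from the PBS node file instead of ~/mpd.hosts.

# Hypothetical helper: write the unique host names found in a PBS-style
# node file to a hosts file and print how many hosts there are.
make_mpd_hosts() {
    local nodefile=$1 hostsfile=$2
    sort -u "$nodefile" > "$hostsfile"
    wc -l < "$hostsfile" | tr -d ' '
}

# Only act inside a PBS job on a machine that actually has mpdboot.
if [ -n "${PBS_NODEFILE:-}" ] && command -v mpdboot >/dev/null 2>&1; then
    hosts=$(mktemp)
    nhosts=$(make_mpd_hosts "$PBS_NODEFILE" "$hosts")

    # Start the ring once; mpdboot launches mpd on every listed host.
    mpdboot -n "$nhosts" -f "$hosts" -r ssh

    # Shut the ring down and remove the temporary file when the job ends.
    trap 'mpdallexit; rm -f "$hosts"' EXIT

    mpiexec -n 6 /data1/opt/MD/2.0/TaskServer/Tools/vasp5.3.3/Linux-x86_64/vasp_parallel
fi
```

PBS lists a host once per allocated slot, so on a single node the node file repeats the same name; the sort -u step collapses it to one entry, which matches the "-n 1" used in the original script.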

Rui F Ribeiro
sjensen
  • I guess I am a little confused about why I need to run mpdboot and mpd in the first place. It seems like only the latest and greatest Intel compiler suggests doing this. Is there a way to revert to the previous behavior that would be present in, say, MPI 3.2, which I am told this code was compiled against? Thanks again! – sjensen Jun 10 '13 at 00:44

1 Answer

I'm running an MD simulation, but when I try to run it in DL_POLY the simulation does not start. I used these commands:

$ ps aux | grep mpd 

$ nohup mpd > mpd.out 2> mpd.err < /dev/null/ &

$ mpiexec -n 4 DLPOLY.X >> job.out 2> job.err < /dev/null &

$ top

When I use the last command to look at the running processes, DL_POLY does not appear. Meanwhile, the ll command shows that mpd.out has a size of zero. I don't know why.
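
One thing worth checking in the mpd line above is the trailing slash in "< /dev/null/". Because /dev/null is a device file and not a directory, that redirection fails, and the shell does not run the command at all, although it has already created the output file. The sketch below reproduces the effect with echo standing in for mpd so it can be run on any machine; the file names bad.out and good.out are made up for the illustration, and whether this is the only reason DL_POLY does not start is an assumption.

```shell
#!/bin/bash
# `echo` stands in for mpd so the sketch runs without MPI installed.
workdir=$(mktemp -d)

# Trailing slash: /dev/null is a device file, not a directory, so this
# input redirection fails and the shell never runs the command.
# bad.out still exists because "> bad.out" is processed first.
if echo started > "$workdir/bad.out" 2> "$workdir/bad.err" < /dev/null/; then
    bad_status=ran
else
    bad_status=not_run
fi

# Without the trailing slash the command runs and writes its output.
if echo started > "$workdir/good.out" 2> "$workdir/good.err" < /dev/null; then
    good_status=ran
else
    good_status=not_run
fi

echo "with /dev/null/ : $bad_status, $(wc -c < "$workdir/bad.out" | tr -d ' ') bytes of output"
echo "with /dev/null  : $good_status, $(wc -c < "$workdir/good.out" | tr -d ' ') bytes of output"
```

The first case leaves an empty output file behind, which would be consistent with an mpd.out of size zero. If mpd never started, the later mpiexec call would have no daemon to connect to.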

slm
Majid