I am trying to set up a cluster of four nodes (all running Fedora 22) with OpenMPI.
On the master node, I've created a password-less key (~/.ssh/id_dsa) and copied ~/.ssh/id_dsa.pub to each of the three slave nodes' ~/.ssh/authorized_keys. So, from the master node, I can run ssh slave1, ssh slave2, or ssh slave3 and successfully get into the corresponding node, without being asked for a password. Same goes for ssh master.
However, I run into permission problems when I try to use mpirun. Here is the command I run:
/usr/lib64/openmpi/bin/mpirun -np 32 --hostfile .mpi_hostfile ./testprogram
and here is the first bit of the output:
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
ORTE was unable to reliably start one or more daemons.
When I subsequently run ssh slave3, I see the message "There were 2 failed login attempts since the last successful login." So it looks like the ssh authentication that mpirun is trying to do is failing for some reason.
Any ideas why I can do my password-less, key-based authentication just fine with ssh, but not with mpirun?
For the record, here are the contents of .mpi_hostfile:
# Host file for OpenMPI
# Master node, slots = num cores
localhost slots=8
# Slaves
slave1 slots=8
slave2 slots=8
slave3 slots=8
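
As a sanity check on the hostfile itself (not on the ssh problem), here is a minimal Python sketch that parses the contents above and totals the slots, to confirm they match the -np 32 passed to mpirun. The names parse_hostfile and HOSTFILE are hypothetical, and the fallback of 1 slot for a host with no slots= entry is an assumption made for this sketch, not something taken from the OpenMPI documentation.

```python
def parse_hostfile(text):
    """Return {host: slots} from hostfile text; '#' starts a comment."""
    slots = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        count = 1  # assumption for this sketch: 1 slot if none is given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                count = int(value)
        slots[host] = count
    return slots


# Contents of .mpi_hostfile, copied from the question.
HOSTFILE = """\
# Host file for OpenMPI
# Master node, slots = num cores
localhost slots=8
# Slaves
slave1 slots=8
slave2 slots=8
slave3 slots=8
"""

slots = parse_hostfile(HOSTFILE)
total = sum(slots.values())
print(slots)
print("total slots:", total)
```

The four hosts contribute 8 slots each, so the total is 32, which is consistent with -np 32. The hostfile therefore does not look like the source of the "Permission denied" errors.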