
There are two computers with 12 physical cores each.

Computer A should accept jobs and distribute them among A and B

I want to set up computers A and B such that:

  • A will accept jobs (via SSH) and distribute them among A and B (more or less intelligently)
  • if possible, 4 cores on each computer are blocked off as "personal requirements", i.e. never used for jobs
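To make the "more or less intelligently" part concrete, here is a minimal sketch of the bookkeeping I have in mind. It is only an illustration under assumptions: the names `Node`, `pick_node` and `assign` are hypothetical, and "intelligent" is reduced to "pick the machine with the most free job cores". With 12 physical cores and 4 reserved, each machine offers 8 job cores.

```python
from dataclasses import dataclass

TOTAL_CORES = 12     # physical cores per machine (from the question)
RESERVED_CORES = 4   # cores kept free for personal use (from the question)


@dataclass
class Node:
    """Hypothetical record of one machine and its currently used job cores."""
    name: str
    busy: int = 0  # cores occupied by jobs that were already dispatched

    @property
    def free(self) -> int:
        # Cores still available to jobs after honouring the reservation.
        return TOTAL_CORES - RESERVED_CORES - self.busy


def pick_node(nodes, cores_needed):
    """Return the node with the most free job cores, or None if nothing fits."""
    candidates = [n for n in nodes if n.free >= cores_needed]
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate, so A is preferred over B.
    return max(candidates, key=lambda n: n.free)


def assign(nodes, cores_needed):
    """Book cores on the chosen node and return its name (None = must wait)."""
    node = pick_node(nodes, cores_needed)
    if node is None:
        return None
    node.busy += cores_needed
    return node.name
```

For example, starting from `[Node("A"), Node("B")]`, two 6-core jobs land on A and then B, and a following 4-core job has to wait because each machine has only 2 job cores left.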

Jobs are expected to be either Python scripts or executables written in C++ (which may involve MPI code).
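As a rough sketch of how the three job types could be launched over SSH, the hypothetical helper below builds the command line. The assumptions are mine: the function name `build_remote_command`, the use of `taskset -c 4-11` to keep jobs off logical CPUs 0-3 (which only matches "4 reserved cores" if CPU numbering is that simple, e.g. no hyper-threading), and that `mpirun` respects the inherited affinity, which depends on the MPI implementation.

```python
import shlex


def build_remote_command(host, job, nprocs=1, cpu_list="4-11"):
    """Build an ssh command (as an argument list) that runs one job on `host`.

    host     -- hypothetical host name, e.g. "A" or "B"
    job      -- path to a Python script or a compiled executable
    nprocs   -- number of MPI ranks; values > 1 wrap the executable in mpirun
    cpu_list -- CPUs the job may use (assumption: CPUs 0-3 are the reserved ones)
    """
    if job.endswith(".py"):
        run = ["python3", job]
    elif nprocs > 1:
        run = ["mpirun", "-np", str(nprocs), job]
    else:
        run = [job]
    # taskset restricts the job (and its children) to the given CPU list.
    pinned = ["taskset", "-c", cpu_list] + run
    # ssh passes a single command string to the remote shell, so quote it.
    return ["ssh", host, shlex.join(pinned)]
```

The returned list could be handed to `subprocess.run`; getting results back (shared storage or copying files) is not covered here.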

I have read about Slurm and the Sun Grid Engine, but they seem too powerful and complicated for this use case (I don't want to spend a week reading up on them and troubleshooting). Is there an easier solution that satisfies these requirements?

infinitezero
  • if A can compute tasks to B, then B is the server, and A is the client, not the other way around. – Marcus Müller Mar 14 '22 at 16:00
  • Should I have chosen master(A)/slave(A+B) or host(A)/recipient(A+B) instead? – infinitezero Mar 14 '22 at 16:34
  • I'd say A is a client and A and B are compute nodes. But doesn't really matter. Can you tell us a bit about what "a job" is? – Marcus Müller Mar 14 '22 at 16:36
  • @MarcusMüller I have updated the post accordingly and hopefully answered your question :) – infinitezero Mar 14 '22 at 16:39
  • Thanks! Do your machines have a shared piece of storage, or how do you plan on getting results of the jobs back? – Marcus Müller Mar 14 '22 at 17:23
  • @MarcusMüller That is up to me. Currently there is only A. Machine B will be available in a couple of months. I have basically no idea how to set it up, so I cannot answer this any better. – infinitezero Mar 14 '22 at 19:24
  • Maybe we need to talk in more depth about what your Python tasks are, what kind and amount of data you need, and so on. – Marcus Müller Mar 14 '22 at 19:43
  • You'd be building a 2-node cluster. Slurm or a similar cluster admin tool (e.g. PBS/Torque) is what you need: one of the machines, i.e. computer "A", would need to be both the Slurm controller node AND a compute node. The other, computer "B", only needs to be a compute node. You also need some form of shared storage between the two machines so that jobs and data can be shared between them. – cas Mar 15 '22 at 04:22
  • @MarcusMüller This is mainly intended as a shared workstation for different people who want to run scripts that last a couple of hours or days. MPI programs are likely to be run only seldom, but I want to keep the option available. They will mainly perform mathematical tasks like solving PDEs. – infinitezero Mar 15 '22 at 09:14

0 Answers