2

I am looking at this example from the apache spark documentation. It seems to pass a variable called local[4] into the --master variable. I have not seen that before. What does it mean? I am using bash on OS X. Is there a "local" array? Is it a maven construct?

# Package a jar containing your application
$ mvn package
...
[INFO] Building jar: {..}/{..}/target/simple-project-1.0.jar

# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  target/simple-project-1.0.jar
...
Lines with a: 46, Lines with b: 23
Jeff Schaller
  • 66,199
  • 35
  • 114
  • 250
bernie2436
  • 6,505
  • 22
  • 58
  • 69
  • 2
    If that's a command to be run in a shell, then `local[4]` is of no significance to the shell. The application may care about it. – muru Feb 10 '15 at 03:31

2 Answers2

2

This is a Spark question, not a bash or Maven issue. For Spark, the master name can be in the form:

local[K]: Run Spark locally with K worker threads (which should be set to the number of cores on your machine).

Check out https://github.com/mesos/spark/wiki/Spark-Programming-Guide#master-names for more information.

shimizu
  • 121
  • 4
0

Square brackets do not have a special meaning in this context are a globbing feature ("character range") in this context, so with default configuration (nullglob off) it will pass literal string local4 if there is a file local4 in the working directory, and local[4] otherwise (thanks @Celada).

This is not a variable reference (${local[4]} would be an array index substitution).

l0b0
  • 50,672
  • 41
  • 197
  • 360