This post extends “Remote debugging in Java with Java Debug Wire Protocol (JDWP)” to Spark jobs written in Java. Both the “Driver” and the “Executor” need to be debugged.
Debugging the Spark Driver in Java
Step 1: Run the Spark submit job on the remote machine. The driver JVM waits on port “7777” for the Eclipse debugger to connect.
    spark-submit --master yarn-client --driver-memory 5G --executor-memory 6G --num-executors 1 --executor-cores 1 \
      --jars jars/myapp-common-1.1.0.jar,jars/myapp-shared-1.1.1.jar \
      --conf spark.executor.extraJavaOptions="-Dconfig.resource=myapp.conf" \
      --conf spark.driver.extraJavaOptions="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777" \
      --class com.myapp.SparkSimpleJob \
      --files conf/myapp.conf jars/myapp-spark-1.0.0.jar /some/input/path
Step 2: Add breakpoints to your code in the “myapp” project.
Step 3: Right-click the “myapp” project in Eclipse and select “Debug As” -> “Debug Configurations” -> “Remote Java Application”. Right-click it and create a new configuration:
    Name: myapp driver debug
    Project: myapp
    Connection Type: Standard (Socket Attach)
    Host: <Remote IP address where Spark job is listening on 7777>
    Port: 7777
Click “Apply” and then “Debug”. The suspended Spark submit job resumes and then pauses at the first breakpoint in your code.
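The “-agentlib:jdwp” string from Step 1 is easy to mistype, so it can help to generate it. The following is a minimal sketch assuming a POSIX shell; “jdwp_listen_opts” is a hypothetical helper name and is not something Spark provides. With “server=y” the driver JVM opens the port itself, and with “suspend=y” it waits for a debugger to attach before running the application.

```shell
# Hypothetical helper (not part of Spark): builds the JDWP agent option for a
# JVM that listens for the debugger.
#   $1 = port the JVM should listen on
#   $2 = suspend flag, "y" or "n" (defaults to "y")
jdwp_listen_opts() {
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${2:-y},address=$1"
}

# Build the driver-side option for port 7777 and show it.
DRIVER_DEBUG_OPTS="$(jdwp_listen_opts 7777)"
echo "$DRIVER_DEBUG_OPTS"
```

The result could then be spliced into the submit command, for example as --conf spark.driver.extraJavaOptions="-Dconfig.resource=myapp.conf $DRIVER_DEBUG_OPTS". Passing “n” as the second argument lets the driver start immediately and accept a debugger later.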
Debugging the Spark Executor in Java
Step 1: Right-click the “myapp” project in Eclipse and select “Debug As” -> “Debug Configurations” -> “Remote Java Application”. Right-click it and create a new configuration:
    Name: myapp executor debug
    Project: myapp
    Connection Type: Standard (Socket Listen)
    Port: 7777
Step 2: Start the “myapp executor debug” configuration so that Eclipse is listening on port 7777, then run the Spark submit job on the remote machine as shown below. Because the executor uses “server=n”, it connects out to Eclipse, so the listener has to be up first.

    spark-submit --master yarn-client --driver-memory 5G --executor-memory 6G --num-executors 1 --executor-cores 1 \
      --jars jars/myapp-common-1.1.0.jar,jars/myapp-shared-1.1.1.jar \
      --conf spark.executor.extraJavaOptions="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=<eclipse-workspace-ip>:7777" \
      --conf spark.driver.extraJavaOptions="-Dconfig.resource=myapp.conf" \
      --class com.myapp.SparkSimpleJob \
      --files conf/myapp.conf jars/myapp-spark-1.0.0.jar /some/input/path
The code will pause at the breakpoints within the executor code.
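On the executor side the roles are reversed: Eclipse listens and the executor JVM dials out, so the option needs a host as well as a port. The sketch below assumes a POSIX shell; “jdwp_attach_opts” is a hypothetical helper name, and 192.0.2.10 is a documentation-only example address standing in for the Eclipse workstation. It refuses to build the option when the host is empty or still looks like the “<eclipse-workspace-ip>” placeholder.

```shell
# Hypothetical helper (not part of Spark): builds the JDWP option for a JVM
# that connects out to a debugger which is already listening.
#   $1 = host where Eclipse listens (assumed reachable from the YARN nodes)
#   $2 = port of the Eclipse "Socket Listen" configuration
jdwp_attach_opts() {
  case "$1" in
    ''|*'<'*)
      # Empty host, or a placeholder such as <eclipse-workspace-ip> left in place.
      echo "debugger host is missing or still a placeholder" >&2
      return 1
      ;;
  esac
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=$1:$2"
}

# 192.0.2.10 is an example address only; substitute the workstation's real IP.
jdwp_attach_opts 192.0.2.10 7777
```

The printed string is what goes after “-Dconfig.resource=myapp.conf” inside spark.executor.extraJavaOptions.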
Debugging both the Spark Driver & the Executor in Java
Step 1: Add the required breakpoints to your “myapp” code in Eclipse.
Step 2: Run the configured executor debugger “myapp executor debug”.
Step 3: Run the Spark submit command as shown below with both Driver & Executor debugging turned on.
    spark-submit --master yarn-client --driver-memory 5G --executor-memory 6G --num-executors 1 --executor-cores 1 \
      --jars jars/myapp-common-1.1.0.jar,jars/myapp-shared-1.1.1.jar \
      --conf spark.executor.extraJavaOptions="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=<eclipse-workspace-ip>:7777" \
      --conf spark.driver.extraJavaOptions="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777" \
      --class com.myapp.SparkSimpleJob \
      --files conf/myapp.conf jars/myapp-spark-1.0.0.jar /some/input/path
Step 4: Run the configured driver debugger “myapp driver debug”.
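When both sides are debugged at once, the two option strings differ only in who listens. As a rough sketch (POSIX shell assumed; “EDITOR_HOST”, “DEBUG_PORT”, “EXECUTOR_OPTS” and “DRIVER_OPTS” are hypothetical variable names, and 192.0.2.10 is a documentation-only example address), the values can be assembled and printed for inspection before they are pasted into the submit command:

```shell
# Sketch only: variable names are illustrative, not Spark settings.
EDITOR_HOST="192.0.2.10"   # example address; the machine running Eclipse
DEBUG_PORT=7777

# Executor JVMs dial out to the Eclipse listener (server=n, suspend=n).
EXECUTOR_OPTS="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=${EDITOR_HOST}:${DEBUG_PORT}"

# The driver JVM listens and waits for Eclipse to attach (server=y, suspend=y).
DRIVER_OPTS="-Dconfig.resource=myapp.conf -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"

# Print the two --conf arguments so they can be checked before use.
printf '%s\n' "--conf spark.executor.extraJavaOptions=${EXECUTOR_OPTS}"
printf '%s\n' "--conf spark.driver.extraJavaOptions=${DRIVER_OPTS}"
```

Both sides use port 7777 here, but on different machines: the driver listens on the host that runs the submit command, while Eclipse listens on the workstation. If the two happen to be the same machine, one of the ports would presumably need to change.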
Debugging via spark2-shell in Scala
    $ spark2-shell --executor-cores 8 --driver-memory 16g --executor-memory 16g \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.yarn.maxAppAttempts=3 \
      --conf spark.hadoop.hive.exec.max.dynamic.partitions=2000 \
      --jars /path/to/jdbc/ojdbc8.jar,/path/to/hive/lib/hive-contrib.jar,/path/to/any/AnyOtherDependencies.jar