
Spark-submit With Specific Python Libraries

I have PySpark code that depends on third-party libraries, and I want to run it on my cluster, which runs under Mesos. I have a zipped version of my Python environment that I would like to ship with the job.

Solution 1:

To submit your zipped dependencies along with your PySpark job, pass the archive with the --py-files flag:

spark-submit --py-files your_zip your_code.py

Alternatively, you can add the archive from inside your code at runtime. Note that you then import the modules packaged inside the zip, not the zip file itself:

sc.addPyFile("your_zip")
import your_module  # a module contained in the zip

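Both approaches rely on the fact that Python can import modules directly from a zip archive once it is on sys.path, which is what Spark does on every executor. A minimal stand-alone sketch of that mechanism (mylib is a hypothetical module name, not from the original post):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing one module, mimicking what --py-files ships.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib.py", "def greet():\n    return 'hello from the zip'\n")

# Spark adds the shipped zip to sys.path on each executor; we do it manually here.
sys.path.insert(0, zip_path)
import mylib

print(mylib.greet())  # hello from the zip
```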

Hope this helps!

Solution 2:

This may be helpful if your job has third-party dependencies.

I found a way to properly ship a virtual environment to the master and all the worker nodes:

virtualenv venv --relocatable
cd venv 
zip -qr ../venv.zip *

PYSPARK_PYTHON=./SP/bin/python spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./SP/bin/python \
  --driver-memory 4G \
  --archives venv.zip#SP \
  filename.py
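Here venv.zip#SP tells YARN to unpack the archive on each node under the alias SP, which is why PYSPARK_PYTHON points at ./SP/bin/python. To confirm that executors actually picked up the shipped interpreter, the first lines of filename.py could log the interpreter path (a minimal diagnostic, not from the original answer):

```python
import sys

# On the cluster this should print a path ending in SP/bin/python;
# locally it prints whatever interpreter launched the script.
print("Running under:", sys.executable)
```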
