Running Python Package .egg In Azure Databricks Job
Using a build tool (setuptools), I packaged my Python code in .egg format. I want to run this package as a job in Azure Databricks. I am able to execute the package on my local machine.
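For reference, a setuptools project is usually packaged into an egg from the project root like this (a generic sketch; the asker's actual build command is not shown in the question):

# Build the egg from the project's setup.py; the output lands under .\dist\
python setup.py bdist_egg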
Solution 1:
The only way I've got this to work is by using the API to create a Python Job. The UI does not support this for some reason.
I use PowerShell to work with the API. Here is an example that creates a job from an egg, which works for me:
$Lib = '{"egg":"LOCATION"}'.Replace("LOCATION", "dbfs:$TargetDBFSFolderCode/pipelines.egg")
$ClusterId = "my-cluster-id"$j = "sample"$PythonParameters = "pipelines.jobs.cleansed.$j"$MainScript = "dbfs:" + $TargetDBFSFolderCode + "/main.py"
Add-DatabricksDBFSFile -BearerToken $BearerToken -Region $Region -LocalRootFolder "./bin/tmp" -FilePattern "*.*" -TargetLocation $TargetDBFSFolderCode -Verbose
Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region -JobName "$j-$Environment" -ClusterId $ClusterId `
-PythonPath $MainScript -PythonParameters $PythonParameters -Libraries $Lib -Verbose
That copies my main.py and pipelines.egg to DBFS, then creates a job pointed at them, passing in a parameter.
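If you prefer not to depend on the PowerShell module, the same thing can be done with a direct call to the Jobs REST API. Below is a minimal sketch, assuming the same variables as above plus a workspace URL built from $Region and a job name I made up for the example; the field names follow the Jobs 2.0 create endpoint:

$Uri  = "https://$Region.azuredatabricks.net/api/2.0/jobs/create"
$Body = @{
    name                = "sample-dev"    # assumed job name for this sketch
    existing_cluster_id = $ClusterId
    libraries           = @(@{ egg = "dbfs:$TargetDBFSFolderCode/pipelines.egg" })
    spark_python_task   = @{ python_file = $MainScript; parameters = @($PythonParameters) }
} | ConvertTo-Json -Depth 5

# Create the job; the response contains the new job_id
Invoke-RestMethod -Method Post -Uri $Uri -Headers @{ Authorization = "Bearer $BearerToken" } -Body $Body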
One annoying thing about eggs on Databricks: you must uninstall the egg and restart the cluster before it picks up any new version you deploy.
If you use an engineering cluster, this is not an issue.
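If you do deploy new egg versions to an interactive cluster, the uninstall-and-restart step can also be scripted against the Libraries and Clusters APIs. A rough sketch, again assuming $BearerToken, $Region, $ClusterId and the egg path from above:

$Headers = @{ Authorization = "Bearer $BearerToken" }
$BaseUri = "https://$Region.azuredatabricks.net/api/2.0"

# Mark the old egg for removal from the cluster...
$Uninstall = @{
    cluster_id = $ClusterId
    libraries  = @(@{ egg = "dbfs:$TargetDBFSFolderCode/pipelines.egg" })
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Method Post -Uri "$BaseUri/libraries/uninstall" -Headers $Headers -Body $Uninstall

# ...then restart the cluster so the next install picks up the new version
Invoke-RestMethod -Method Post -Uri "$BaseUri/clusters/restart" -Headers $Headers -Body (@{ cluster_id = $ClusterId } | ConvertTo-Json)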