
Running Python Package .egg In Azure Databricks Job

I packaged my Python code in .egg format using a build tool (setuptools). I want to run this package as a job in Azure Databricks. I can execute the package on my local machine.

Solution 1:

The only way I've got this to work is by using the API to create a Python Job. The UI does not support this for some reason.
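Under the hood this is just the Databricks Jobs API: a jobs/create request with the egg attached as a library and a spark_python_task pointing at the entry script. For reference, a rough equivalent in Python against the REST API (the workspace URL, token, and DBFS paths below are placeholders):

import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"  # placeholder
token = "<personal-access-token>"                               # placeholder

# Job definition: run main.py on an existing cluster with the egg attached
job_spec = {
    "name": "sample-dev",
    "existing_cluster_id": "my-cluster-id",
    "libraries": [{"egg": "dbfs:/pipelines/code/pipelines.egg"}],
    "spark_python_task": {
        "python_file": "dbfs:/pipelines/code/main.py",
        "parameters": ["pipelines.jobs.cleansed.sample"],
    },
}

resp = requests.post(
    f"{workspace_url}/api/2.0/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id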

I use PowerShell to work with the API - this is an example that creates a job using an egg which works for me:

$Lib = '{"egg":"LOCATION"}'.Replace("LOCATION", "dbfs:$TargetDBFSFolderCode/pipelines.egg")
$ClusterId = "my-cluster-id"$j = "sample"$PythonParameters = "pipelines.jobs.cleansed.$j"$MainScript = "dbfs:" + $TargetDBFSFolderCode + "/main.py"
Add-DatabricksDBFSFile -BearerToken $BearerToken -Region $Region -LocalRootFolder "./bin/tmp" -FilePattern "*.*"  -TargetLocation $TargetDBFSFolderCode -Verbose
Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region -JobName "$j-$Environment" -ClusterId $ClusterId `
    -PythonPath $MainScript -PythonParameters $PythonParameters -Libraries $Lib -Verbose

That copies my main.py and pipelines.egg to DBFS, then creates a job that points at them and passes in a parameter.
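The parameter ("pipelines.jobs.cleansed.sample") looks like a dotted module path inside the egg, so main.py presumably just imports that module from the attached egg and runs it. A minimal hypothetical main.py along those lines (the run() entry point is my assumption, not from the original post):

import importlib
import sys

if __name__ == "__main__":
    # e.g. "pipelines.jobs.cleansed.sample", passed as the job's PythonParameters
    module_path = sys.argv[1]
    # The egg is attached to the cluster as a library, so its packages are importable
    job_module = importlib.import_module(module_path)
    job_module.run()  # assumes each job module exposes a run() function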

One annoying thing about eggs on Databricks - you must uninstall and restart the cluster before it picks up any new versions that you deploy.

If you use an engineering cluster this is not an issue.
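If you want to script that cleanup rather than do it in the UI, the REST API has libraries/uninstall and clusters/restart endpoints. A sketch that clears a stale egg from an existing cluster (workspace URL, token, and DBFS path are placeholders again):

import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"  # placeholder
token = "<personal-access-token>"                               # placeholder
cluster_id = "my-cluster-id"
headers = {"Authorization": f"Bearer {token}"}

# Mark the old egg for removal; uninstalls only take effect after a restart
requests.post(
    f"{workspace_url}/api/2.0/libraries/uninstall",
    headers=headers,
    json={"cluster_id": cluster_id,
          "libraries": [{"egg": "dbfs:/pipelines/code/pipelines.egg"}]},
).raise_for_status()

# Restart the cluster so the next run picks up the newly deployed egg
requests.post(
    f"{workspace_url}/api/2.0/clusters/restart",
    headers=headers,
    json={"cluster_id": cluster_id},
).raise_for_status()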
