
How To Get Jobid That Was Submitted Using Dataproc Workflow Template

I have submitted a Hive job using Dataproc Workflow Template with the help of Airflow operator (DataprocWorkflowTemplateInstantiateInlineOperator) written in Python. Once the job i

Solution 1:

The JobId is available in the metadata field of the Operation object that is returned from the Instantiate operation. See this article [1] for how to work with the metadata.
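As a rough illustration of reading that metadata, here is a minimal sketch that assumes you already have the finished Operation as a plain dict (for example, the JSON body from the REST API). The field names `graph`, `nodes`, `stepId` and `jobId` follow the WorkflowMetadata shape described in [1]; the helper name `job_ids_by_step` and the sample values are made up for this example.

```python
from typing import Any, Dict


def job_ids_by_step(operation: Dict[str, Any]) -> Dict[str, str]:
    """Map each workflow step ID to the Dataproc job ID recorded for it.

    Assumes `operation` is the JSON form of a workflow Operation, whose
    `metadata` holds a WorkflowMetadata object with a `graph.nodes` list.
    """
    metadata = operation.get("metadata", {})
    nodes = metadata.get("graph", {}).get("nodes", [])
    # Nodes that never started have no jobId yet, so skip them.
    return {node["stepId"]: node["jobId"] for node in nodes if "jobId" in node}


# Hypothetical sample, trimmed to the fields used above.
sample_operation = {
    "name": "projects/my-project/regions/us-central1/operations/example",
    "done": True,
    "metadata": {
        "graph": {
            "nodes": [
                {"stepId": "hive-step", "jobId": "hive-step-abc123", "state": "COMPLETED"},
                {"stepId": "not-started-step"},
            ]
        }
    },
}

print(job_ids_by_step(sample_operation))
```

If the Operation comes from a client library rather than raw JSON, the metadata may need to be converted to a dict first; that step depends on the library version and is not shown here.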

The Airflow operator only polls [2] the Operation and does not return the final Operation object. You could try adding a return statement to the operator's execute method so that the result is passed back.

Another option would be to use the Dataproc REST API [3] after the workflow finishes. Any labels assigned to the workflow itself are propagated to its clusters and jobs, so you can make a list jobs call filtered by label. For example, the filter parameter could look like: filter = labels.my-label=12345
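The label approach could be sketched roughly as follows. The two helpers are hypothetical names for this example: one builds the filter string, the other pulls job IDs out of a jobs.list response that has already been parsed into a dict. It assumes the response carries a `jobs` list whose entries have `reference.jobId`, as described in [3], and it writes the filter with spaces around `=` and `AND` between terms, which is the form shown in that reference. The actual HTTP call and authentication are left out.

```python
from typing import Any, Dict, List


def label_filter(labels: Dict[str, str]) -> str:
    """Build a jobs.list filter string that matches all given labels."""
    # Sorted only to make the output deterministic.
    return " AND ".join(
        f"labels.{key} = {value}" for key, value in sorted(labels.items())
    )


def job_ids_from_list_response(response: Dict[str, Any]) -> List[str]:
    """Collect job IDs from a parsed jobs.list response body."""
    return [
        job["reference"]["jobId"]
        for job in response.get("jobs", [])
        if "jobId" in job.get("reference", {})
    ]


# Hypothetical label and response, for illustration only.
print(label_filter({"my-label": "12345"}))

sample_response = {
    "jobs": [
        {"reference": {"projectId": "my-project", "jobId": "hive-step-abc123"}},
        {"reference": {"projectId": "my-project", "jobId": "hive-step-def456"}},
    ]
}
print(job_ids_from_list_response(sample_response))
```

Using a label value that is unique per DAG run (for instance one derived from the run ID) keeps the list call from returning jobs that belong to earlier workflow runs.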

[1] https://cloud.google.com/dataproc/docs/concepts/workflows/debugging#using_workflowmetadata

[2] https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dataproc_operator.py#L1376

[3] https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs/list
