Showing posts with the label Apache Spark SQL

Spark - Set Null When Column Not Exist In Dataframe

I'm loading many versions of JSON files into a Spark DataFrame. Some of the files hold columns A,B…

Removing Duplicate Columns After A DF Join In Spark

When you join two DFs with similar column names: df = df1.join(df2, df1['id'] == df2['i…

Spark: How To Parse JSON String Of Nested Lists To Spark Data Frame?

How do I parse a JSON string of nested lists into a Spark DataFrame in PySpark? Input data frame: +------…

PySpark, Compare Two Rows In DataFrame

I'm attempting to compare one row in a dataframe with the next to see the difference in timesta…

Selecting Empty Array Values From A Spark Dataframe

Given a DataFrame with the following rows: rows = [ Row(col1='abc', col2=[8], col3=[18]…

Best Way To Get Null Counts, Min And Max Values Of Multiple (100+) Columns From A PySpark DataFrame

Say I have a list of column names and they all exist in the DataFrame: Cols = ['A', 'B…