scala - Is forcing an action on a Spark DataFrame required?


I have the code snippet below, and it is working fine.

So there are two actions in this code: updateDF.count() and mergedDF.save(). updateDF.count() is a dummy action; if I remove it, the job fails. Is it necessary to force such an action in the code? My feeling is that if I remove updateDF.count(), the first action encountered is mergedDF.save(), and computing mergedDF.save() then creates more intermediate DataFrames, which causes the job to fail. Please suggest a code change to make this better.

newDataDF.persist()

val historyDataDF = hiveContext.read.format("orc").load(stagingFullPath).persist()

val updateDF = historyDataDF
  .coalesce(5)
  .join(newDataDF, jobPrimaryKey)
  .select(historyDataDF.columns.map(historyDataDF(_)): _*)
  .persist()

println(updateDF.count())

val unchangedDF = historyDataDF.except(updateDF).persist()

val mergedDF = unchangedDF.unionAll(newDataDF).persist()

mergedDF.write
  .format("orc")
  .mode(org.apache.spark.sql.SaveMode.Overwrite)
  .save(stagingFullPath)
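For context, here is a minimal, self-contained sketch of the lazy-evaluation behaviour described above (it uses the Spark 2.x SparkSession API rather than the HiveContext from the job, and the tiny in-memory DataFrame is only illustrative): persist() merely marks a DataFrame for caching, and nothing is computed until the first action runs, so a dummy count() is what actually fills the cache before later operations reuse it.

import org.apache.spark.sql.SparkSession

object LazyPersistDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-persist-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A small illustrative DataFrame (stands in for updateDF above).
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value").persist()

    // persist() alone does nothing yet: it only marks df for caching.
    // count() is the first action, so this is where the rows are actually
    // computed and stored in the cache.
    df.count()

    // Later operations (except, unionAll/union, write) reuse the cached rows
    // instead of recomputing the lineage from the original source.
    df.show()

    spark.stop()
  }
}

In the job above the same mechanism applies: without the count(), the first action is the final save(), so the whole join/except/unionAll lineage is evaluated only at the point where stagingFullPath is being overwritten, even though historyDataDF still lazily refers to that same path.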

