Parallel writing from same DataFrame in Spark -
Suppose we have a DataFrame in Spark and need to write the results to two databases: one stores the original DataFrame, the other stores a modified version (e.g. with some columns dropped). Since both operations can take a few moments, is it possible/advisable to run them in parallel, or will that cause problems because Spark would be working on the same object from two threads?
import java.util.concurrent.Executors
import scala.concurrent._

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))

def write1(): Unit = {
  // your save statement for the first DataFrame
}

def write2(): Unit = {
  // your save statement for the second DataFrame
}

def writeAllTables(): Unit = {
  Future { write1() }
  Future { write2() }
}
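Spark DataFrames are immutable and the SparkSession is safe to use from multiple threads, so firing both writes concurrently is fine. One caveat with the snippet above: it returns immediately without waiting on the futures, so the driver may exit before the writes finish. A minimal sketch that blocks until both complete (the write bodies here are placeholders, not actual Spark calls):

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent._
import scala.concurrent.duration._

object ParallelWrites {
  // Dedicated pool so the two writes run concurrently.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))

  val completed = new AtomicInteger(0) // just to observe both writes ran

  def write1(): Unit = {
    // placeholder for e.g. df.write.jdbc(...)
    completed.incrementAndGet()
  }

  def write2(): Unit = {
    // placeholder for e.g. df.drop("someCol").write.jdbc(...)
    completed.incrementAndGet()
  }

  def writeAllTables(): Unit = {
    val f1 = Future(write1())
    val f2 = Future(write2())
    // Block until both writes finish; rethrows if either one failed.
    Await.result(Future.sequence(Seq(f1, f2)), 1.hour)
  }
}
```

`Await.result` on `Future.sequence` ensures failures in either write surface in the calling thread instead of being silently dropped.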