scala - How does the DataFrame API depend on RDDs in Spark?


Some sources, such as this keynote (the Spark 2.0 talk by Matei Zaharia), mention that Spark DataFrames are built on top of RDDs. I have found some mentions of RDDs in the DataFrame class (in Spark 2.0 I'd have to look at Dataset); still, I have a limited understanding of how these two APIs are bound behind the scenes.

Can someone explain how DataFrames extend RDDs, if they do?

According to the Databricks article Deep Dive into Spark SQL's Catalyst Optimizer (see the section Using Catalyst in Spark SQL), RDDs are elements of the physical plan built by Catalyst. So, even though you describe your queries in terms of DataFrames, in the end Spark operates on RDDs.
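You can also see the binding directly in the API: every DataFrame exposes the RDD it compiles down to, both as a public RDD[Row] and, at a lower level, as the RDD[InternalRow] produced by the compiled physical plan. A minimal sketch, assuming a local Spark 2.x session (the object name and toy data are mine):

import org.apache.spark.sql.SparkSession

object DataFrameRddBridge {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-rdd-bridge")
      .master("local[*]")   // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (2, "b")).toDF("id", "value")

    // Public bridge: Dataset.rdd converts the DataFrame back into an
    // RDD[Row]; the result is a plain RDD (e.g. a MapPartitionsRDD).
    val rowRdd = df.rdd
    println(rowRdd.toDebugString)

    // Internal bridge: queryExecution.toRdd is the RDD[InternalRow]
    // produced by the physical plan that Catalyst compiled for the query.
    val internalRdd = df.queryExecution.toRdd
    println(internalRdd.toDebugString)

    spark.stop()
  }
}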

[Figure: Catalyst workflow]

Also, you can view the physical plan of a query using the explain method.

// Prints the physical plan to the console, for debugging purposes
auction.select("auctionid").distinct.explain()

// == Physical Plan ==
// Distinct false
//  Exchange (HashPartitioning [auctionid#0], 200)
//   Distinct true
//    Project [auctionid#0]
//     PhysicalRDD [auctionid#0,bid#1,bidtime#2,bidder#3,bidderrate#4,openbid#5,price#6,item#7,daystolive#8], MapPartitionsRDD[11] at mapPartitions at ExistingRDD.scala:37
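The auction DataFrame isn't defined in the post, so here is a minimal, self-contained sketch that reproduces the call with made-up data. Note that on Spark 2.x the distinct compiles to HashAggregate/Exchange operators rather than the 1.x Distinct/Exchange nodes shown above, but the leaf of the plan is still a scan over the source RDD:

import org.apache.spark.sql.SparkSession

object ExplainDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explain-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the auction data; the real schema has more columns
    // (bid, bidtime, bidder, ...), but only auctionid matters for the query.
    // Built from an RDD so the plan bottoms out in a scan over an existing
    // RDD, mirroring the PhysicalRDD node in the 1.x output above (the
    // exact operator names vary across Spark versions).
    val auction = spark.sparkContext
      .parallelize(Seq(("a1", 10.0), ("a1", 12.5), ("a2", 7.0)))
      .toDF("auctionid", "bid")

    // Prints the physical plan to the console
    auction.select("auctionid").distinct.explain()

    spark.stop()
  }
}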
