DataFrames

A DataFrame is a distributed collection of data organized into named columns, similar to a table in a relational database. It is a higher-level abstraction than RDDs, and the queries written against it are optimized by Spark's Catalyst query optimizer.

data-pipelines batch spark dataframes