How do you rapidly derive complex insights from really big data sets in Cassandra?
This session draws upon Evan's experience building a distributed, interactive, columnar query engine on top of Cassandra and Spark.
We will start by surveying the existing query landscape of Cassandra and then discuss ways to integrate Cassandra and Spark.
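As a starting point, here is a minimal sketch of one common way to wire the two together, assuming the DataStax spark-cassandra-connector is on the classpath; the contact point, the `analytics` keyspace, the `events`/`event_counts` tables, and the column names are all placeholders, not anything prescribed by the session.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // DataStax spark-cassandra-connector

object CassandraSparkSketch {
  def main(args: Array[String]): Unit = {
    // Point Spark at the Cassandra cluster (host is a placeholder).
    val conf = new SparkConf()
      .setAppName("cassandra-spark-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read a Cassandra table as an RDD of CassandraRow and aggregate in Spark.
    val countsByType = sc.cassandraTable("analytics", "events")
      .map(row => (row.getString("event_type"), 1L))
      .reduceByKey(_ + _)

    // Write the aggregated results back to another Cassandra table.
    countsByType.saveToCassandra("analytics", "event_counts",
      SomeColumns("event_type", "event_count"))

    sc.stop()
  }
}
```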
We will dive into the design and architecture of a fast, column-oriented query architecture for Spark, and explain why columnar stores are so advantageous for OLAP workloads. We will also present a schema for Parquet-like storage of analytical datasets in Cassandra. Find out why Cassandra and Spark are a perfect match for fast, scalable, complex querying and storage of big analytical data.
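To make the "Parquet-like storage on Cassandra" idea concrete, here is one hypothetical layout, not necessarily the schema presented in the session: each physical row stores a compressed chunk of values for a single logical column, so an OLAP scan of one column never has to read the others. The keyspace, table, and column names below are illustrative only, and the snippet assumes the DataStax Java driver.

```scala
import com.datastax.driver.core.Cluster // DataStax Java driver

object ColumnarSchemaSketch {
  // Hypothetical columnar layout: one chunk of one column per physical row,
  // partitioned by (dataset, column_name) and ordered by chunk_id.
  val createTable: String =
    """CREATE TABLE IF NOT EXISTS analytics.columnar_chunks (
      |  dataset     text,   -- logical dataset / segment key
      |  column_name text,   -- which logical column this chunk belongs to
      |  chunk_id    int,    -- ordinal of the chunk within the segment
      |  chunk_data  blob,   -- serialized, compressed vector of column values
      |  PRIMARY KEY ((dataset, column_name), chunk_id)
      |)""".stripMargin

  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()
    session.execute(
      "CREATE KEYSPACE IF NOT EXISTS analytics WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute(createTable)
    cluster.close()
  }
}
```

With this kind of layout, reading a single column across many logical rows becomes a sequential scan of a few wide partitions, which is exactly the access pattern columnar OLAP engines are built around.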
This also gives us a way to deploy Big Data (see the posts in the previous link) alongside BI solutions such as Pentaho, for instance.