Cardinality Estimation through Histogram in Apache Spark 2.3Ron Hu (Huawei Technologies)Zhenhua Wang (Huawei Technologies) from change column length Watch Video
Preview(s):
Gallery
Play Video: (Note: The default playback of the video is HD VERSION. If your browser is buffering the video slowly, please play the REGULAR MP4 VERSION or Open The Video below for better experience. Thank you!)
⏲ Duration: 18 min 15 sec ✓ Published: 14-Jun-2018
Description: Apache Spark 2.2 shipped with a state-of-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, avg/max length, etc.) to improve the quality of query execution plans. Skewed data distributions are often inherent in many real world applications. In order to deal with skewed distributions effectively, we added equal-height histograms to Apache Spark 2.3. Leveraging reliable stati
Play Video: (Note: The default playback of the video is HD VERSION. If your browser is buffering the video slowly, please play the REGULAR MP4 VERSION or Open The Video below for better experience. Thank you!)