Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing! Holden Karau (Google) and Bryan Cutler (IBM)
⏲ Duration: 18 min 50 sec ✓ Published: 11-Jun-2018
Description: PySpark is getting awesomer in Spark 2.3 with vectorized UDFs, and there are even more wonderful things on the horizon (and currently available as WIP packages). This talk will start by illustrating how to use PySpark’s new vectorized UDFs to make ML pipeline stages. Since most of us use Python in part because of its wonderful libraries, like pandas, numpy, and antigravity*, it’s important to be able to make sure that our dependencies are available on our cluster. Historically there’s been