Spark Shuffle Partition Calculator

Calculating the correct number of Spark shuffle partitions is something I help a lot of customers with, so I decided to give you an easy way to do it: a calculator!

[Interactive calculator: enter the Shuffle Size (GB), the Number of cores in the cluster, and the Spark Partition Size (MB).]

Here are the suggested Spark configs that you may want to use:

spark.conf.set("spark.sql.shuffle.partitions", 4292)

spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728)



The code for this calculator is available here.