Spark Shuffle Partition Calculator

Calculating the right number of Spark shuffle partitions is something I help a lot of customers with, so I decided to give you an easy way to do it: a calculator!


The calculator takes three inputs:

- Shuffle Size (GB)
- Number of cores in the cluster
- Spark Partition Size (MB)

It then produces suggested Spark configs like these, which you can paste into your notebook or job:

spark.conf.set("spark.sql.shuffle.partitions", 4292)

spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728)
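The calculator's source isn't reproduced in this post, but the arithmetic behind numbers like these can be sketched in a few lines. The function name and the round-up-to-a-multiple-of-cores heuristic below are my assumptions, not necessarily the exact formula the calculator uses:

```python
import math

def suggest_shuffle_partitions(shuffle_size_gb: float, num_cores: int,
                               partition_size_mb: int = 128) -> int:
    """Hypothetical sketch of the calculator's arithmetic."""
    shuffle_size_mb = shuffle_size_gb * 1024
    # Enough partitions so each one holds roughly partition_size_mb of data.
    partitions = math.ceil(shuffle_size_mb / partition_size_mb)
    # Round up to a multiple of the core count so no core sits idle
    # during the last wave of tasks.
    partitions = math.ceil(partitions / num_cores) * num_cores
    return max(partitions, num_cores)

# Example: a 1 GB shuffle on a 4-core cluster, targeting 128 MB partitions.
partitions = suggest_shuffle_partitions(1, 4, 128)
print(f'spark.conf.set("spark.sql.shuffle.partitions", {partitions})')
# maxPartitionBytes is just the target partition size expressed in bytes:
print(f'spark.conf.set("spark.sql.files.maxPartitionBytes", {128 * 1024 * 1024})')
```

Note that 134217728 in the config above is simply 128 MB in bytes, so both settings come from the same partition-size input.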

The code for this calculator is available here.