Adding nodes at runtime (upscaling) to an already running Spark-on-YARN cluster is fairly easy. But taking these nodes away (downscaling) later, when the workload is low, is a difficult problem. To remove a node from a running cluster, we need to make sure that it is being used for neither compute nor storage.
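For context, YARN already exposes a graceful-decommission path for the compute side: add the host to the ResourceManager's exclude file and refresh the node list. A hedged sketch, where the hostname and exclude-file path are placeholders (the real path is whatever `yarn.resourcemanager.nodes.exclude-path` points to in `yarn-site.xml`):

```shell
# Hypothetical hostname and exclude-file path -- substitute your own.
echo "worker-42.example.com" >> /etc/hadoop/conf/yarn.exclude

# Hadoop 2.8+: gracefully decommission excluded nodes, waiting up to
# 3600 seconds for running containers to finish before forcing them out.
yarn rmadmin -refreshNodes -g 3600 -client
```

This handles only the compute side of the problem; as the points below show, shuffle data sitting on the node's local disk still prevents the node from actually being reclaimed.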
But in production workloads, we see that many of the nodes cannot be taken away, because:
1. Nodes are running a few containers even though they are not fully utilized; the containers are fragmented across many nodes. For example, each node may be running only 1–2 containers/executors even though it has the resources to run 4. Long-running Spark executors make this even harder, since they keep a node occupied indefinitely.
2. Nodes have shuffle data on their local disks that Spark applications running on the cluster will consume later. In this case, the ResourceManager will never decide to reclaim these nodes, because losing shuffle data could lead to costly recomputation of stages or tasks.
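As a sketch of the standard knobs that interact with both problems above: Spark's dynamic allocation can release idle executors, which mitigates fragmentation, while the external shuffle service keeps shuffle files readable after an executor exits. Note, however, that those files still live on the node's local disk, so the node itself remains pinned. A minimal `spark-defaults.conf` fragment, assuming a YARN cluster with the shuffle service deployed on each NodeManager:

```properties
# Release executors that have been idle, so containers do not linger
# on under-utilized nodes (mitigates reason 1).
spark.dynamicAllocation.enabled              true
spark.dynamicAllocation.executorIdleTimeout  60s

# Serve shuffle files from a node-level service so executors can be
# removed without losing shuffle data -- but the files stay on the
# node's local disk, so the node itself is still pinned (reason 2).
spark.shuffle.service.enabled                true
```

In other words, these settings shrink the number of executors, not the number of nodes: downscaling the nodes themselves requires dealing with the shuffle data left behind on local disks.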