Problem: Container killed by YARN for exceeding memory limits, or container hangs, in Spark on Amazon EMR
Resolution:
The solution varies according to your use case. In my case, I was not able to run multiple Spark jobs: the first job ate up all the memory and never released it unless we killed it, which is obviously not a real fix. Let's look at the possible resolutions:
Reduce the number of executor cores
This reduces the maximum number of tasks that the executor can run in parallel, which reduces the amount of memory required. Depending on whether it is the driver container or an executor container that is hitting this error, consider decreasing cores for the driver or for the executor accordingly.
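If you are not sure which container is being killed, the YARN application logs will usually tell you. A minimal sketch, assuming YARN log aggregation is enabled; the application ID below is a placeholder for your own:
# yarn logs -applicationId application_1234567890123_0001 | grep -i "memory limits"
Lines from the driver typically read "Container killed by YARN for exceeding memory limits", while the NodeManager message includes the container ID and how much physical memory it used versus its limit.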
On a running cluster:
Modify spark-defaults.conf on the master node.
Example:
#vi /etc/spark/conf/spark-defaults.conf
spark.driver.cores 1
spark.executor.cores 1 ---------------> I've decreased this value to 1; the default value can vary according to your instance type
On a new cluster:
Add a configuration object similar to the following when you launch a cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.cores": "1",
      "spark.executor.cores": "1"
    }
  }
]
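For reference, this is roughly how such a configuration object is passed when launching a cluster from the AWS CLI. It is only a sketch, not the exact command for your environment: the release label, instance details, and the spark-config.json file name are placeholder assumptions, with the JSON above saved to that file.
aws emr create-cluster --name "spark-tuned-cluster" \
  --release-label emr-6.9.0 --applications Name=Spark \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --configurations file://spark-config.json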
For a single job:
Use the --executor-cores option to reduce the number of executor cores when you run spark-submit.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-cores 1 --driver-cores 1 /usr/lib/spark/examples/jars/spark-examples.jar 100
Increase memory overhead
Memory overhead is the amount of off-heap memory allocated to each executor. By default, memory overhead is set to either 10% of executor memory or 384 MB, whichever is higher.
Consider making gradual increases in memory overhead, up to 25%. Be sure that the sum of the driver or executor memory plus the driver or executor memory overhead is always less than the value of yarn.nodemanager.resource.memory-mb for your Amazon Elastic Compute Cloud (Amazon EC2) instance type:
Formula:
spark.driver/executor.memory + spark.driver/executor.memoryOverhead < yarn.nodemanager.resource.memory-mb
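A quick worked example with illustrative numbers (not defaults): if yarn.nodemanager.resource.memory-mb is 12288 MB on your instance type, then spark.executor.memory=10g (10240 MB) plus spark.executor.memoryOverhead=1024 MB comes to 11264 MB, which stays under the limit; 11g of executor memory with the same overhead would not. To check the actual value on your cluster, look at yarn-site.xml on a core node:
# grep -A 1 "yarn.nodemanager.resource.memory-mb" /etc/hadoop/conf/yarn-site.xml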
If the error occurs in the driver container or executor container, consider increasing memory overhead for that container only. You can increase memory overhead while the cluster is running, when you launch a new cluster, or when you submit a job.
On a running cluster:
Modify spark-defaults.conf on the master node. Example:
#vi /etc/spark/conf/spark-defaults.conf
spark.driver.memoryOverhead 512
spark.executor.memoryOverhead 512
On a new cluster:
Add a configuration object similar to the following when you launch a cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.memoryOverhead": "512",
      "spark.executor.memoryOverhead": "512"
    }
  }
]
For a single job:
Use the --conf option to increase memory overhead when you run spark-submit.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf spark.driver.memoryOverhead=512 --conf spark.executor.memoryOverhead=512 /usr/lib/spark/examples/jars/spark-examples.jar 100
Increase driver and executor memory
If the error occurs in either a driver container or an executor container, consider increasing memory for either the driver or the executor, but not both. Be sure that the sum of driver or executor memory plus driver or executor memory overhead is always less than the value of yarn.nodemanager.resource.memory-mb for your EC2 instance type:
spark.driver/executor.memory + spark.driver/executor.memoryOverhead < yarn.nodemanager.resource.memory-mb
On a running cluster:
Modify spark-defaults.conf on the master node. Example:
#vi /etc/spark/conf/spark-defaults.conf
spark.executor.memory 1g ---------------> I've increased this value to 1g; the default value can vary according to your instance type.
spark.driver.memory 1g
On a new cluster:
Add a configuration object similar to the following when you launch a cluster:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "1g",
      "spark.driver.memory": "1g"
    }
  }
]
For a single job:
Use the --executor-memory and --driver-memory options to increase memory when you run spark-submit.
Example:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1g --driver-memory 1g /usr/lib/spark/examples/jars/spark-examples.jar 100
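Putting it all together: if a single adjustment is not enough, the three changes above can be combined in one spark-submit. The values below are only illustrative; tune them to your instance type and workload:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster \
  --driver-cores 1 --executor-cores 1 \
  --driver-memory 1g --executor-memory 1g \
  --conf spark.driver.memoryOverhead=512 \
  --conf spark.executor.memoryOverhead=512 \
  /usr/lib/spark/examples/jars/spark-examples.jar 100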