一、MR任务在Shuffle过程时报错
Ended Job = job_1556157715460_0029 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1556157715460_0029_m_000000 (and more) from job job_1556157715460_0029
Task with the most failures(4):
-----
Task ID:
task_1556157715460_0029_r_000000
URL:
http://account.jetbrains.com:8088/taskdetails.jsp?jobid=job_1556157715460_0029&tipid=task_1556157715460_0029_r_000000
-----
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:395)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:310)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:291)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 4489 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
node节点无法从远端的其他节点获取shuffle的数据,可能的原因有很多,建议检查DNS、hosts配置等。