zhongq@cma_gov_cn
New Member
Environment of the machine:
Red Hat Enterprise Linux 5.3, x86_64;
Intel compiler 11.1; ifort, mpif90;
Test 1: at resolution "-dyn fv -hgrid 10x15"
(1) serial: "configure -fc ifort -nosmp -nospmd"
(2) MPI only (SPMD, no SMP): "configure -fc mpif90 -nosmp -ntasks 6"
Both succeed.
Test 2: at resolution "-dyn fv -hgrid 1.9x2.5"
(1) serial: "configure -fc ifort -nosmp -nospmd"
and (2) MPI only (SPMD, no SMP): "configure -fc mpif90 -nosmp -ntasks 16" (other task counts, 2, 8, …, were also tried). In both cases the following error occurs:
forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
cam 00000000004044AC Unknown Unknown Unknown
libc.so.6 00002B0714E258A4 Unknown Unknown Unknown
cam 00000000004043B9 Unknown Unknown Unknown
yhrun: error: cn803: task 0: Exited with exit code 71
(3) hybrid mode: "configure -fc mpif90 -smp -spmd -ntasks 16 -nthreads 4". The error report is as follows:
yhrun: error: cn434: task 0: Aborted
yhrun: First task exited 60s ago
yhrun: tasks 1-15: running
yhrun: task 0: exited abnormally
yhrun: Terminating job step 997954.0
slurmd[cn434]: *** STEP 997954.0 KILLED AT 2012-11-20T23:33:01 WITH SIGNAL 9 ***
slurmd[cn435]: *** STEP 997954.0 KILLED AT 2012-11-20T23:33:01 WITH SIGNAL 9 ***
yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[cn434]: *** STEP 997954.0 KILLED AT 2012-11-20T23:33:01 WITH SIGNAL 9 ***
slurmd[cn435]: *** STEP 997954.0 KILLED AT 2012-11-20T23:33:01 WITH SIGNAL 9 ***
What can I conclude from this error information? Is the problem in the environment or in my procedure?
Any suggestions would be appreciated!