huazhen
Member
Hi there,

I am trying to run CESM2 on our supercomputer using the intel/2017.u2 compiler. I am validating a CESM port with prognostic components, following http://esmci.github.io/cime/users_guide/porting-cime.html, by running "./create_test SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min". The job fails, and I get the following messages in $CaseStatus:

---------------------------------------------------
2019-07-19 22:12:27: case.run starting
---------------------------------------------------
2019-07-19 22:14:04: model execution starting
---------------------------------------------------
2019-07-19 22:14:20: model execution success
---------------------------------------------------
2019-07-19 22:14:20: case.run error
ERROR: RUN FAIL: Command 'mpirun -n 90 /data/cephfs/punim0769/cesm/scratch/SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min.20190719_213423_3p62od/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /data/cephfs/punim0769/cesm/scratch/SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min.20190719_213423_3p62od/run/cesm.log.10252393.190719-221227

I still cannot work out a solution by checking the details in the log file (full log attached below). Based on the following messages in the log file (around line 332), I think the problem may be connected with the settings for "openmpi" or "mpirun", but I don't know how to fix it:

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Do you have any suggestions? Any help is much appreciated. Thanks a lot.

I will attach the following five files:
/data/cephfs/punim0769/cesm/scratch/SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min.20190719_213423_3p62od/CaseStatus
/data/cephfs/punim0769/cesm/scratch/SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min.20190719_213423_3p62od/run/cesm.log.10252393.190719-221227
/home/huazhenl/.cime/config_machines.xml
/home/huazhenl/.cime/config_compilers.xml
/home/huazhenl/.cime/config_batch.xml
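In case it helps, this is roughly how I have been looking through the run directory so far (just a minimal sketch; the paths are from my case above and the grep pattern is only my guess at which keywords matter):

# change to the run directory of the failed test
cd /data/cephfs/punim0769/cesm/scratch/SMS_D_Ld1.f09_g17.B1850cmip6.spartan_intel.allactive-defaultio_min.20190719_213423_3p62od/run

# show the first lines in the CESM log that mention an error or abort
grep -n -i -E "abort|error|fatal" cesm.log.10252393.190719-221227 | head -40

# list the component logs (atm, lnd, ocn, ...) in case they hold the underlying failure
ls *.log.*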