MPI_COMM_WORLD rank error with CESM1_1_2 with COSP1.4

yiyi063@...
Hello there,

I have set up a FAMIP simulation with CESM1_1_2 and COSP 1.4. As a sensitivity study, I added a constant to a specific variable in the CAM microphysics scheme, and that simulation ran successfully for 20 years in early December. This time I set up two more simulations, each modifying a different variable in the microphysics scheme. I created both cases by cloning the previous one, with only slight changes to the microphysics code. However, one case aborted after running for 1.5 years, and the other stopped at 12.5 years. Here is the error output from one of the cases:

tail /glade/work/yiyi063/CESMLE_cosp_bergtot01/run/cesm.log.190117-103920

180:MPT: (gdb) No stack.
180:MPT: (gdb)
0:MPT: /proc/1871/exe: Permission denied.
0:MPT: Attaching to process 1871
0:MPT: ptrace: Operation not permitted.
0:MPT: /gpfs/fs1/scratch/yiyi063/CESMLE_cosp_bergtot01/run/1871: No such file or directory.
0:MPT: (gdb) No stack.
0:MPT: (gdb)
-1:MPT ERROR: MPI_COMM_WORLD rank 121 has terminated without calling MPI_Finalize()
-1:     aborting job

The other case shows a similar error message, but with a different MPI_COMM_WORLD rank number and a segmentation fault reported on task 241:

vi /glade/scratch/yiyi063/CESMLE_cosp_bergsnow01/run/cesm.log.190116-163524

120:MPT: /proc/30343/exe: Permission denied.
120:MPT: Attaching to program: /proc/30343/exe, process 30343
120:MPT: ptrace: Operation not permitted.
120:MPT: /gpfs/fs1/scratch/yiyi063/CESMLE_cosp_bergsnow01/run/30343: No such file or directory.
120:MPT: (gdb) No stack.
120:MPT: (gdb)
0:MPT: /proc/68709/exe: Permission denied.
0:MPT: Attaching to process 68709
0:MPT: ptrace: Operation not permitted.
0:MPT: /gpfs/fs1/scratch/yiyi063/CESMLE_cosp_bergsnow01/run/68709: No such file or directory.
0:MPT: (gdb) No stack.
0:MPT: (gdb)
241:forrtl: severe (174): SIGSEGV, segmentation fault occurred
241:Image              PC                Routine            Line        Source
241:cesm.exe           0000000002252F35  Unknown               Unknown  Unknown
241:cesm.exe           0000000002250B57  Unknown               Unknown  Unknown
241:cesm.exe           000000000220A434  Unknown               Unknown  Unknown
241:cesm.exe           000000000220A246  Unknown               Unknown  Unknown
241:cesm.exe           000000000216FB09  Unknown               Unknown  Unknown
241:cesm.exe           0000000002179970  Unknown               Unknown  Unknown
241:libpthread.so.0    00002AAAAB7C8870  Unknown               Unknown  Unknown
…...

-1:MPT ERROR: MPI_COMM_WORLD rank 2 has terminated without calling MPI_Finalize()
-1:     aborting job

Does anyone have a clue about this issue? Or could this problem be related to the recent Cheyenne system update/maintenance? I would really appreciate your help.

Best,
Yiyi