nitkbhat@gmail_com
Member
I am getting a runtime error when I try to run a threaded model with just 2 threads per MPI task (the env_mach_pes.xml file is attached). The model runs for some time and then gives the following error in the log (the cesm log file is attached):

MCT::m_Router::initp_: RGSMap indices not increasing...Will correct
MCT::m_Router::initp_: GSMap indices not increasing...Will correct
(seq_domain_areafactinit) : min/max mdl2drv 0.999841513526222 1.00031732638246 areafact_a_ATM
(seq_domain_areafactinit) : min/max drv2mdl 0.999682774281628 1.00015851159572 areafact_a_ATM
(seq_domain_areafactinit) : min/max mdl2drv 0.999841513526350 1.00076245909423 areafact_l_LND
(seq_domain_areafactinit) : min/max drv2mdl 0.999238121806731 1.00015851159559 areafact_l_LND
(seq_domain_areafactinit) : min/max mdl2drv 0.999996826904345 0.999996826905162 areafact_r_ROF
(seq_domain_areafactinit) : min/max drv2mdl 1.00000317310491 1.00000317310572 areafact_r_ROF
(seq_domain_areafactinit) : min/max mdl2drv 0.999565456406962 1.00000000000000 areafact_o_OCN
(seq_domain_areafactinit) : min/max drv2mdl 1.00000000000000 1.00043473250326 areafact_o_OCN
(seq_domain_areafactinit) : min/max mdl2drv 0.999565456406962 1.00000000000000 areafact_i_ICE
(seq_domain_areafactinit) : min/max drv2mdl 1.00000000000000 1.00043473250326 areafact_i_ICE
(seq_mct_drv) : Initialize atm component phase 2 ATM
[48:node2] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 48
[56:node1] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
[52:node2] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 56
internal ABORT - process 52
[50:node2] unexpected disconnect completion event from [56:node1]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 50
[58:node1] unexpected disconnect completion event from [48:node2]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 58
[63:node1] unexpected disconnect completion event from [48:node2]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 63
[55:node2] unexpected disconnect completion event from [56:node1]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 55
[59:node1] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 59
[54:node2] unexpected disconnect completion event from [56:node1]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 54
[53:node2] unexpected disconnect completion event from [56:node1]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 53
[61:node1] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 61
[60:node1] unexpected disconnect completion event from [77:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 60
[57:node1] unexpected disconnect completion event from [48:node2]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 57
[51:node2] unexpected disconnect completion event from [56:node1]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 51
[62:node1] unexpected disconnect completion event from [78:node7]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 62
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

How do I solve this error? Is there anything incorrect in my PE layout? In this case, I have 2 threads for each component.

Additionally, I would like to know whether it is possible to run the CESM model with a different number of threads for each component. I see that OMP_NUM_THREADS is set in the $CASE.run file.

Thanks