aketh_tm@gmail_com
Member
Hi all, I am running CESM 1.2.2 with the PGI 17.4 compiler. A build with -O0 runs to completion. With -O2, however, the run either fails or hangs indefinitely and never proceeds; the job ran for an hour and then timed out. The excerpt below contains a Mellanox-related warning that can be ignored: it also appears with -O0, where the run completes. I believe it comes from mpirun using TCP instead of DAPL and the default fabric. Any help on how to run CESM with PGI at -O2 optimization?
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
  Local host: c18-02
  Local device: mlx5_0
--------------------------------------------------------------------------
[c18-04][[37092,1],2][btl_openib_component.c:1646:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
[c19-03][[37092,1],5][btl_openib_component.c:1646:init_one_device] [c19-01][[37092,1],3][btl_openib_component.c:1646:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
error obtaining device attributes for mlx5_0 errno says Success
7 more processes have sent help message help-mpi-btl-openib.txt / error messages.
Memory block size conversion in bytes is 1026.008 MB memory dealloc in MB is 0.00
Memory block size conversion in bytes is 1027.01
seq_flds_mod: read seq_cplflds_inparm namelist from: drv_in
seq_flds_mod: read seq_cplflds_userspec namelist from: drv_in
seq_flds_mod: seq_flds_a2x_states= Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:Sa_pslv
seq_flds_mod: seq_flds_a2x_fluxes= Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_swnet:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4
seq_flds_mod: seq_flds_x2a_states= Sf_lfrac:Sf_ifrac:Sf_ofrac:Sx_avsdr:Sx_anidr:Sx_avsdf:Sx_anidf:Sx_tref:Sx_qref:So_t:Sx_t:Sl_fv:Sl_ram1:Sl_snowh:Si_snowh:So_ssq:So_re:Sx_u10:So_ustar
seq_flds_mod: seq_flds_x2a_fluxes= Faxx_taux:Faxx_tauy:Faxx_lat:Faxx_sen:Faxx_lwup:Faxx_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4
seq_flds_mod: seq_flds_l2x_states= Sl_avsdr:Sl_anidr:Sl_avsdf:Sl_anidf:Sl_tref:Sl_qref:Sl_t:Sl_fv:Sl_ram1:Sl_snowh:Sl_u10
seq_flds_mod: seq_flds_l2x_fluxes= Fall_swnet:Fall_taux:Fall_tauy:Fall_lat:Fall_sen:Fall_lwup:Fall_evap:Fall_flxdst1:Fall_flxdst2:Fall_flxdst3:Fall_flxdst4:Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_x2l_states= Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Slrr_volr
seq_flds_mod: seq_flds_x2l_fluxes= Faxa_rainc:Faxa_rainl:Faxa_snowc:Faxa_snowl:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Flrr_flood
seq_flds_mod: seq_flds_i2x_states= Si_avsdr:Si_anidr:Si_avsdf:Si_anidf:Si_tref:Si_qref:Si_t:Si_snowh:Si_u10:Si_ifrac
seq_flds_mod: seq_flds_i2x_fluxes= Faii_swnet:Fioi_swpen:Faii_taux:Fioi_taux:Faii_tauy:Fioi_tauy:Faii_lat:Faii_sen:Faii_lwup:Faii_evap:Fioi_melth:Fioi_meltw:Fioi_salt
seq_flds_mod: seq_flds_x2i_states= Sa_z:Sa_u:Sa_v:Sa_tbot:Sa_ptem:Sa_shum:Sa_pbot:Sa_dens:So_t:So_s:So_u:So_v:So_dhdx:So_dhdy
seq_flds_mod: seq_flds_x2i_fluxes= Faxa_rain:Faxa_snow:Faxa_lwdn:Faxa_swndr:Faxa_swvdr:Faxa_swndf:Faxa_swvdf:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Fioo_q
seq_flds_mod: seq_flds_o2x_states= So_t:So_s:So_u:So_v:So_dhdx:So_dhdy:So_bldepth
seq_flds_mod: seq_flds_o2x_fluxes= Fioo_q
seq_flds_mod: seq_flds_x2o_states= Sa_pslv:So_duu10n:Si_ifrac:Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_x2o_fluxes= Faxa_rain:Faxa_snow:Faxa_prec:Faxa_lwdn:Foxx_swnet:Faxa_bcphidry:Faxa_bcphodry:Faxa_bcphiwet:Faxa_ocphidry:Faxa_ocphodry:Faxa_ocphiwet:Faxa_dstwet1:Faxa_dstwet2:Faxa_dstwet3:Faxa_dstwet4:Faxa_dstdry1:Faxa_dstdry2:Faxa_dstdry3:Faxa_dstdry4:Foxx_taux:Foxx_tauy:Foxx_lat:Foxx_sen:Foxx_lwup:Foxx_evap:Fioi_melth:Fioi_meltw:Fioi_salt:Forr_roff:Forr_ioff
seq_flds_mod: seq_flds_s2x_states=
seq_flds_mod: seq_flds_s2x_fluxes=
seq_flds_mod: seq_flds_x2s_states=
seq_flds_mod: seq_flds_x2s_fluxes=
seq_flds_mod: seq_flds_g2x_states=
seq_flds_mod: seq_flds_g2x_fluxes=
seq_flds_mod: seq_flds_x2g_states=
seq_flds_mod: seq_flds_x2g_fluxes=
seq_flds_mod: seq_flds_xao_states= So_tref:So_qref:So_ssq:So_re:So_u10:So_duu10n:So_ustar
seq_flds_mod: seq_flds_xao_albedo= So_avsdr:So_anidr:So_avsdf:So_anidf
seq_flds_mod: seq_flds_r2x_states= Slrr_volr
seq_flds_mod: seq_flds_r2x_fluxes= Forr_roff:Forr_ioff:Flrr_flood
seq_flds_mod: seq_flds_x2r_states=
seq_flds_mod: seq_flds_x2r_fluxes= Flrl_rofliq:Flrl_rofice
seq_flds_mod: seq_flds_w2x_states= Sw_lamult:Sw_ustokes:Sw_vstokes:Sw_hstokes
seq_flds_mod: seq_flds_w2x_fluxes=
seq_flds_mod: seq_flds_x2w_states= Sa_u:Sa_v:Sa_tbot:Si_ifrac:So_t:So_u:So_v:So_bldepth
seq_flds_mod: seq_flds_x2w_fluxes=
[c18-01:115959] 7 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[c18-01:115959] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
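Since the OpenFabrics warning appears in both the -O0 and -O2 runs, it may help to filter it out and look at the last line the model itself wrote before the -O2 run stalled. Below is a minimal sketch of such a filter, assuming the log has one message per line. The function names `strip_openib_noise` and `last_model_line` are hypothetical helpers, and the patterns are only the ones visible in the excerpt above, not an exhaustive list of Open MPI messages.

```python
import re

# Patterns for the Open MPI / openib messages seen in the excerpt.
# Assumption: these cover only the noise shown above, nothing more.
NOISE = (
    re.compile(r"btl_openib_component\.c"),
    re.compile(r"help-mpi-btl-openib\.txt"),
    re.compile(r"error initializing an OpenFabrics device"),
    re.compile(r"error obtaining device attributes"),
    re.compile(r"orte_base_help_aggregate"),
    re.compile(r"^\s*Local (host|device):"),
    re.compile(r"^-{20,}$"),  # the dashed separator lines
)


def strip_openib_noise(lines):
    """Return the lines that match none of the noise patterns."""
    return [ln for ln in lines if not any(p.search(ln) for p in NOISE)]


def last_model_line(lines):
    """Return the last non-empty line left after filtering, or None."""
    kept = [ln for ln in strip_openib_noise(lines) if ln.strip()]
    return kept[-1] if kept else None
```

Running `last_model_line` on the log of the -O0 run and on the log of the -O2 run would show whether the -O2 run stops at the same point as the excerpt (right after the `seq_flds_mod` field lists) or gets further.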