
Error running multi-instance single-point case

Hi, I was trying to run a multi-instance case for a single point on yellowstone. I created the case using PTCLM (so it creates surface data for the point). I added modified source mods to the case sourcemods folder, and changed user_nl_clm to include a few variables in the history tape. That run completed successfully.

Then I tried to change env_mach_pes.xml:

# MAX_TASKS_PER_NODE comes from $case/Tools/mkbatch.$machine
@ ptile = $MAX_TASKS_PER_NODE / 2
@ nthreads = 1
@ atm_tasks = $ptile * $num_instances * 2
@ lnd_tasks = $ptile * $num_instances * 2
@ ice_tasks = $ptile * $num_instances
@ ocn_tasks = $ptile * $num_instances
@ cpl_tasks = $ptile * $num_instances
@ glc_tasks = $ptile * $num_instances
@ rof_tasks = $ptile * $num_instances * 2
@ wav_tasks = $ptile * $num_instances

./xmlchange NTHRDS_ATM=$nthreads,NTASKS_ATM=$atm_tasks,NINST_ATM=$num_instances
./xmlchange NTHRDS_LND=$nthreads,NTASKS_LND=$lnd_tasks,NINST_LND=$num_instances
./xmlchange NTHRDS_ICE=$nthreads,NTASKS_ICE=$ice_tasks,NINST_ICE=1
./xmlchange NTHRDS_OCN=$nthreads,NTASKS_OCN=$ocn_tasks,NINST_OCN=1
./xmlchange NTHRDS_CPL=$nthreads,NTASKS_CPL=$cpl_tasks
./xmlchange NTHRDS_GLC=$nthreads,NTASKS_GLC=$glc_tasks,NINST_GLC=1
./xmlchange NTHRDS_ROF=$nthreads,NTASKS_ROF=$rof_tasks,NINST_ROF=$num_instances
./xmlchange NTHRDS_WAV=$nthreads,NTASKS_WAV=$wav_tasks,NINST_WAV=1
./xmlchange ROOTPE_ATM=0
./xmlchange ROOTPE_LND=0
./xmlchange ROOTPE_ICE=0
./xmlchange ROOTPE_OCN=0
./xmlchange ROOTPE_CPL=0
./xmlchange ROOTPE_GLC=0
./xmlchange ROOTPE_ROF=0
./xmlchange ROOTPE_WAV=0

I also made user_nl_clm and user_nl_datm for each case. Then I got these errors in cesm.log:

 14: NetCDF: Invalid dimension ID or name
 13: NetCDF: Variable not found
 13: NetCDF: Variable not found
 13: NetCDF: Invalid dimension ID or name
 13: NetCDF: Invalid dimension ID or name
 13: NetCDF: Invalid dimension ID or name
 13: NetCDF: Invalid dimension ID or name
 13: NetCDF: Invalid dimension ID or name
(There are A LOT of these warnings)
.....
 17:(seq_domain_areafactinit) : min/max mdl2drv   1.000000000000000      1.000000000000000    areafact_l_LND0018
 17:(seq_domain_areafactinit) : min/max drv2mdl   1.000000000000000      1.000000000000000    areafact_l_LND0018
 18:(seq_domain_areafactinit) : min/max mdl2drv   1.000000000000000      1.000000000000000    areafact_l_LND0019
 18:(seq_domain_areafactinit) : min/max drv2mdl   1.000000000000000      1.000000000000000    areafact_l_LND0019
 19:(seq_domain_areafactinit) : min/max mdl2drv   1.000000000000000      1.000000000000000    areafact_l_LND0020
 19:(seq_domain_areafactinit) : min/max drv2mdl   1.000000000000000      1.000000000000000    areafact_l_LND0020
 18:(seq_mct_drv) : Initialize atm component phase 2 ATM0019
 16:(seq_mct_drv) : Initialize atm component phase 2 ATM0017
 15:(seq_mct_drv) : Initialize atm component phase 2 ATM0016
 17:(seq_mct_drv) : Initialize atm component phase 2 ATM0018
 19:(seq_mct_drv) : Initialize atm component phase 2 ATM0020
....
 18:OMP: Warning #123: Ignoring invalid OS proc ID 3.
 18:OMP: Warning #124: No valid OS proc IDs specified - not using affinity.
 19:OMP: Warning #123: Ignoring invalid OS proc ID 4.
 19:OMP: Warning #124: No valid OS proc IDs specified - not using affinity.
 16:OMP: Warning #123: Ignoring invalid OS proc ID 1.
 16:OMP: Warning #124: No valid OS proc IDs specified - not using affinity.
 17:OMP: Warning #123: Ignoring invalid OS proc ID 2.
 17:OMP: Warning #124: No valid OS proc IDs specified - not using affinity.
INFO: 0031-251  task 15 exited: rc=-8
INFO: 0031-251  task 16 exited: rc=-8
INFO: 0031-251  task 17 exited: rc=-8
INFO: 0031-251  task 18 exited: rc=-8
INFO: 0031-251  task 19 exited: rc=-8
  5:forrtl: error (78): process killed (SIGTERM)
  5:Image              PC                Routine            Line        Source
  5:libpthread.so.0    00002B9B0225D2A5  Unknown               Unknown  Unknown
  5:libpoe.so          00002B9B06952AE2  Unknown               Unknown  Unknown
  5:libpthread.so.0    00002B9B02255851  Unknown               Unknown  Unknown
  5:libc.so.6          00002B9B0345C90D  Unknown               Unknown  Unknown

I am wondering:
1) What is the meaning of this rc=-8?
2) What are the NetCDF errors?

Thanks,
-Xi
 

santos

Member
I don't know much about PTCLM, but I can answer your two questions:

1) rc=-8 is most likely a floating point or other arithmetic exception. SIGFPE happens to be signal 8 on most Linux systems; I'm not sure why, but on yellowstone you tend to get a negative version of the usual error codes. You might have gotten core_lite files from this error in your run directory, but there's a known problem on Yellowstone where sometimes no files are produced, or they are empty, due to a race condition.

2) The netCDF "errors" are produced when the model is checking for a variable that's not on a given file. These warnings always appear, even in runs that are working fine, because some files may contain optional fields that the model can use, but are not required. (We need to figure out how to shut the messages off, since nothing is wrong in most cases when this is printed.)
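
For anyone triaging a similar cesm.log, here is a minimal Python sketch of the two points above. It counts the NetCDF messages (benign, per the reply) separately from the task exit codes, and looks up which signal the absolute value of each rc would correspond to on the machine where the script runs. Treating rc as a negated signal number is an assumption taken from the reply, not documented behavior. The helper names (`summarize`, `signal_name`) and the short log excerpt are made up for illustration; the excerpt only copies the line format quoted in the question.

```python
import re
import signal
from collections import Counter

# Excerpt in the format of the cesm.log lines quoted in the question.
LOG_EXCERPT = """\
 13: NetCDF: Variable not found
 13: NetCDF: Invalid dimension ID or name
 14: NetCDF: Invalid dimension ID or name
INFO: 0031-251  task 15 exited: rc=-8
INFO: 0031-251  task 16 exited: rc=-8
   5:forrtl: error (78): process killed (SIGTERM)
"""

# "<task>: NetCDF: <message>" lines, which the reply describes as harmless.
NETCDF_RE = re.compile(r"^\s*\d+:\s*NetCDF: (.+?)\s*$")
# "task <n> exited: rc=<code>" lines, which report the actual failure.
EXIT_RE = re.compile(r"task (\d+) exited: rc=(-?\d+)")


def signal_name(rc):
    """Interpret |rc| as a signal number (an assumption, per the reply)."""
    try:
        return signal.Signals(abs(rc)).name
    except ValueError:
        return "not a signal number"


def summarize(log_text):
    """Count NetCDF messages and collect the exit code reported per task."""
    netcdf_counts = Counter()
    exits = {}
    for line in log_text.splitlines():
        m = NETCDF_RE.match(line)
        if m:
            netcdf_counts[m.group(1)] += 1
            continue
        m = EXIT_RE.search(line)
        if m:
            exits[int(m.group(1))] = int(m.group(2))
    return netcdf_counts, exits


netcdf_counts, exits = summarize(LOG_EXCERPT)
for message, count in netcdf_counts.items():
    print(f"{count:4d} x NetCDF: {message}")
for task, rc in sorted(exits.items()):
    print(f"task {task}: rc={rc} -> {signal_name(rc)}")
```

To run it on a real log, replace `LOG_EXCERPT` with the contents of the file. The signal lookup reflects the local system's signal numbering, so it is only a hint about what happened on the compute nodes.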
 