
cesm1_2_2_CAMChem Running Issue - NetCDF: Invalid dimension ID or name

Hi jedwards,

From what I have observed, the "NetCDF: Invalid dimension ID or name" message does not seem to be the real problem. I set up a new case with the following compset and resolution:

export COMPSET=F1850CNCHM
export RES=f02_t12
export MACH=userdefined
export CCSMROOT=$CESM_PREFIX/cesm1_2_2_CAMChem
export CASE=SCASE1
export CASEROOT=$CESM_PREFIX/MYCASES
export EXEROOT=$CESM_PREFIX/MYCASES/$CASE/EXEC
export RUNDIR=$CESM_PREFIX/MYCASES/$CASE/EXEC/run

This is what I observed while monitoring the MPI processes with the "top" command:
1) The model runs for a while (approx. 3-5 minutes).
2) Then the memory of each process suddenly increases.
3) On my cluster it reached 41... GB per process, and the application crashed because that much memory is not available.

So the issue does not seem to be related to an incorrect installation of NetCDF or an incorrect build of the application.

One more thing: please suggest the smallest resolution and a matching compset so that I can get at least one successful run of the application. I am not a domain expert, so I am having trouble choosing the models; a suggestion would be very helpful.

Note: the cluster has 12 nodes with approx. 128 GB per node.

Thanks,
Vineet
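The memory figures in this post can be sanity-checked with simple arithmetic. The sketch below uses two hypothetical helpers (the names are illustrative and not part of CESM); it relies on integer shell arithmetic, so results are rounded down, and it takes 41 GB as a lower bound for the truncated per-process figure.

```shell
#!/bin/sh
# Hypothetical helpers for a rough per-node memory budget.
# Integer shell arithmetic only, so results are rounded down.

# max_ranks_per_node NODE_GB RANK_GB
#   How many processes of RANK_GB each fit in NODE_GB of node memory.
max_ranks_per_node() {
  echo $(( $1 / $2 ))
}

# node_mem_needed RANKS_PER_NODE RANK_GB
#   Memory (GB) one node would need to host RANKS_PER_NODE such processes.
node_mem_needed() {
  echo $(( $1 * $2 ))
}

# Figures from the post: 128 GB per node, at least 41 GB per process.
echo "ranks per node that fit: $(max_ranks_per_node 128 41)"
```

For example, `node_mem_needed 16 41` prints 656, roughly five times the 128 GB available on one node, which is consistent with the crash described above.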
 

jedwards

CSEG and Liaisons
Staff member
RES=f02_t12 is an extremely high-resolution grid and will not be suitable for your system. I would try f19_g16 first, and f09_g16 if that is successful.
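For reference, f02 is a 0.23x0.31 degree atmosphere grid, while f19 (1.9x2.5 degrees) and f09 (0.9x1.25 degrees) are much coarser. A minimal sketch of trying the suggested grids in order is shown below. It assumes the CESM 1.2 `create_newcase` script under `$CCSMROOT/scripts` and the variables exported in the first post; the helper names `newcase_cmd` and `try_order` and the case naming scheme are hypothetical, and the script only prints the commands rather than running them. A `userdefined` machine still needs its build and run settings filled in before `cesm_setup`.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the CESM 1.2 create_newcase
# command for one resolution. Assumes CCSMROOT, CASEROOT, COMPSET and
# MACH are set as in the original post; case naming is illustrative.
newcase_cmd() {
  res=$1
  echo "$CCSMROOT/scripts/create_newcase -case $CASEROOT/${COMPSET}_$res -compset $COMPSET -res $res -mach $MACH"
}

# Print the commands for the coarser grid first, then the finer one.
try_order() {
  for res in f19_g16 f09_g16; do
    newcase_cmd "$res"
  done
}
```

Run `try_order` after exporting the variables, review the output, and create only the f19_g16 case first; move on to f09_g16 only after that run completes.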
 

jedwards

CSEG and Liaisons
Staff member
RES=f02_t12 is an extremely high resolution grid and will not be suitable for your system.  I would try f19_g16 first and f09_g16 if that is successful.  
 

jedwards

CSEG and Liaisons
Staff member
RES=f02_t12 is an extremely high resolution grid and will not be suitable for your system.  I would try f19_g16 first and f09_g16 if that is successful.  
 

jedwards

CSEG and Liaisons
Staff member
RES=f02_t12 is an extremely high resolution grid and will not be suitable for your system.  I would try f19_g16 first and f09_g16 if that is successful.  
 

jedwards

CSEG and Liaisons
Staff member
RES=f02_t12 is an extremely high resolution grid and will not be suitable for your system.  I would try f19_g16 first and f09_g16 if that is successful.  
 
Hi jedwards,

This is what I have tried: I created a new case with COMPSET=F1850CNCHM and RES=f19_g16.

I ran it with "mpirun -np 64 -map-by ppr:16:node $EXEROOT/cesm.exe" interactively, with the correct files sourced: I get access to the nodes using srun, start csh, source env_mach_specific, and then source the .run script to observe what is happening. It did not create 16 processes per node; it created 28 per node. The same issue as before occurred: the processes try to use approx. 40 GB each, the nodes run out of RAM, and the application terminates.

Please let me know whether this COMPSET and RES are still too high (and suggest smaller versions of both), or whether I need to run the problem on a larger cluster.

I am also creating a case with COMPSET=F1850CNCHM and RES=f09_g16, but I am not sure whether it will run.

[Update: Tried COMPSET=F1850CNCHM with RES=f09_g16 and observed the same issue: out of RAM.]

Thanks,
Vineet
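One way to see where ranks actually land is to launch a trivial program such as `hostname` with the same mpirun mapping and count the lines per host. The sketch assumes Open MPI (which the `-map-by ppr:16:node` syntax suggests); `count_ranks_per_node` is a hypothetical helper that only summarizes that output.

```shell
#!/bin/sh
# Hypothetical helper: summarize MPI rank placement.
# Input: one hostname per line, e.g. from
#   mpirun -np 64 -map-by ppr:16:node hostname
# Output: "<host> <ranks on that host>", one line per node.
count_ranks_per_node() {
  sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `mpirun -np 64 -map-by ppr:16:node hostname | count_ranks_per_node`. If the mapping is honored, 64 ranks at 16 per node should appear as four hosts with 16 each.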
 
Sure, I will port CESM 2 to our system if that is an option. I just want to be sure: can I do everything in CESM 2 that was possible in cesm1_2_2_CAMChem?