
Questions on NTASKS, ROOTPE, and submission

xiangli

Xiang Li
Member
You need to check to see what your mpi library calls the mpi compiler wrappers and use those.
If both mpif90 and mpifort are available I think they are identical or nearly so - either should be fine.
Hi Jim,

I checked that for my case mpif90 should be good.

I can now run B1850 cases successfully, though there were still some errors after running ./scripts_regression_tests. Do I need to take them seriously?

Here are the errors I got:

1708634249548.png

1708634289625.png

Any suggestions would be appreciated!

Thanks,
Xiang
 

xiangli

Xiang Li
Member
I think that you have the significant tests working now and can set that aside now. You might want to consider running the ECT test and comparing to our baseline. CESM2 | Verification
Hi Jim,

Yes, I'm trying to do the validation now.

I downloaded input data for UF-CAM-ECT here:

1708981992523.png

And I tried running this:

1708982035338.png

There was an error when creating the case. Shall I edit the ensemble.py file?

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
This is another python porting issue. The variable rand_ints should be of type int but it's type float.
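A minimal sketch of the kind of Python 2 to Python 3 difference described here, not the actual code in ensemble.py. In Python 3 the `/` operator always returns a float, so a value computed with it can no longer be passed to the integer format code `d`. The function names and the arithmetic are hypothetical, chosen only to show the type change.

```python
def padded_index(i):
    """Return a zero-padded, three-digit string for a hypothetical index."""
    # Floor division (//) keeps the result an int in Python 2 and Python 3.
    ll = 101 + (i // 2) * 18
    return "{0:03d}".format(ll)


def padded_index_py2_style(i):
    """Same computation written with '/', which yields a float in Python 3."""
    ll = 101 + (i / 2) * 18
    # The 'd' format code rejects floats, so this raises ValueError in Python 3.
    return "{0:03d}".format(ll)


if __name__ == "__main__":
    print(padded_index(4))
    try:
        padded_index_py2_style(4)
    except ValueError as err:
        print("ValueError:", err)
```

Using `//` (or wrapping the result in `int(...)`) keeps the value an integer, so the original `d` format code can stay unchanged.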
 

xiangli

Xiang Li
Member
Hi Jim,

Yes, I changed line 30 to ippt = '{0:03f}'.format(ll) and the case could be created. However, the case failed after running for no more than 60 seconds, with the error messages:

1709139915394.png

1709139950476.png

Looking forward to your suggestions!

Thanks,
Xiang
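One thing that may be worth checking about the format change mentioned in this post: the `f` code accepts a float, but it does not produce a three-digit string, because its default precision is six decimal places. The sketch below compares the two format codes on a sample value. The value 137.0, the variable names, and the way the string is embedded in a Fortran-style real are illustrative assumptions; the thread does not establish whether this is what caused the failure.

```python
ll = 137.0  # illustrative float, as produced by '/' in Python 3

# Width 3 is only a minimum; with no precision given, 'f' prints 6 decimals.
as_float = "{0:03f}".format(ll)

# Converting to int first lets the integer format code be used as intended.
as_int = "{0:03d}".format(int(ll))

print(as_float)  # 137.000000
print(as_int)    # 137

# Hypothetical usage: embedding each string in a Fortran-style real literal.
print("0." + as_float + "d-13")  # 0.137.000000d-13 (two decimal points)
print("0." + as_int + "d-13")    # 0.137d-13
```

If the formatted string ends up in a namelist value, the first form would not be a valid real number, while the second would.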
 

jedwards

CSEG and Liaisons
Staff member
Look in the atm.log for clues about the namelist that it couldn't read. Is the filesystem with your run directory available on the compute nodes?
Try rebuilding with DEBUG=TRUE so that you get a traceback.
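As a rough aid for the log search suggested here, the following sketch lists the lines of a log file that mention a namelist or an error, together with their line numbers. The function name, the keyword list, and the path in the usage comment are assumptions for illustration; the actual log file name and location depend on the case.

```python
import re
import sys


def find_clues(path, keywords=("namelist", "error", "abort")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    with open(path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the path is whichever atm.log file the run produced):
    #   python find_clues.py /path/to/atm.log
    for number, text in find_clues(sys.argv[1]):
        print("{0}: {1}".format(number, text))
```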
 

xiangli

Xiang Li
Member
Hi Jim,

Thanks for your suggestions! I set DEBUG=TRUE and reran the case.

Here is the cesm.log:

1709150395125.png

Here is the atm.log:

1709150438351.png

Is this related to the line inithist = 'NONE' in user_nl_cam?

1709150506409.png

Shall we add the following files into user_nl_cam?

1709150531305.png

Any suggestions would be appreciated!

Thanks,
Xiang
 

jedwards

CSEG and Liaisons
Staff member
I don't think that's it. It acts like there is a formatting error in the namelist file, but I can't tell what it is. Can you post your CaseDocs/atm_in file? I'm disappointed that it's not giving source line numbers or sharing the exact problem it's finding.
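One way to hunt for a formatting error of this kind is to scan the namelist file for values that start like a number but are not well-formed numeric literals. The sketch below is a simple heuristic, not a Fortran namelist parser: it ignores quoted strings and logicals, allows repeat counts such as `3*0.0`, and would wrongly flag literals with kind suffixes. The function name and the sample lines are hypothetical and are not taken from the attached atm_in.

```python
import re

# Fortran-style numeric literal: optional sign, digits with at most one
# decimal point, optional exponent introduced by e, d, E, or D.
NUMERIC = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([edED][+-]?\d+)?$")
LOOKS_NUMERIC = re.compile(r"^[+-]?\.?\d")


def suspicious_values(lines):
    """Yield (line_number, token) for unquoted tokens that start like a
    number but are not a well-formed numeric literal."""
    for number, line in enumerate(lines, start=1):
        if "=" not in line:
            continue
        value = line.split("=", 1)[1]
        # Drop quoted strings so their contents are not inspected.
        value = re.sub(r"'[^']*'|\"[^\"]*\"", "", value)
        for token in re.split(r"[,\s]+", value.strip()):
            token = re.sub(r"^\d+\*", "", token)  # allow repeat counts
            if token and LOOKS_NUMERIC.match(token) and not NUMERIC.match(token):
                yield number, token


if __name__ == "__main__":
    # Hypothetical namelist lines for illustration only.
    sample = [
        "&example_nl",
        " example_real = 0.137.000000d-13",
        " example_int = 1800",
        " example_file = '/path/to/file.nc'",
        " example_flag = .true.",
        " example_list = 1.0, 2.5e-3, 3*0.0",
        "/",
    ]
    for number, token in suspicious_values(sample):
        print(number, token)
```

To check a real file, the lines can be read with `open(path).readlines()` and passed to `suspicious_values`.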
 

xiangli

Xiang Li
Member
Hi Jim,

Please see the atm_in file attached.

Thanks,
Xiang
 

Attachments

  • atm_in.txt
    16.8 KB · Views: 1

jedwards

CSEG and Liaisons
Staff member
I don't see what the issue is. You may need to rerun with a debugger or add some print statements to find the exact problem.
 

xiangli

Xiang Li
Member
Hi Jim,

Let me try rerunning with --debugger. I'd also like to ask another question.

I was trying to do a branch run based on the restart files from the FV2 BHIST 001 run. Before building the case, I copied the user_nl_* files to the case directory and modified env_run.xml as follows:

1709134766751.png



1709134789447.png



Here is where I put the restart files:

1709134859829.png



However, the case failed, with the following messages:

cesm.log:

1709153683569.png

atm.log:

1709153714007.png

Are these errors similar to the previous one?

Appreciate your help!

Thanks,
Xiang
 

xiangli

Xiang Li
Member
Hi Jim,

This bug was due to an incomplete input file and has now been solved.

For the ECT test, the cases could be submitted successfully:

1709485429970.png

However, there was an error that may be related to MPI:

1709485506930.png

cesm.log:

1709485539089.png

Since other cases such as B1850 and BHIST run without problems, I am wondering why this error happens.

Your suggestions would be greatly appreciated!

Thanks,
Xiang
 