Infinite hang occurs in CPL during initialization or model execution

Kihang Youn
New Member
Hi all,

My run intermittently gets stuck in an infinite hang.
Based on the model logs, the last output is in cpl.log. I don't know if it helps, but it stops at the (seq_mct_drv) : creating gsmap_ax / creating dom_ax step.
The problem is that if I run the same case over and over again, it sometimes completes normally.
Is there anything I can check to debug this further? Or has anyone seen a case like this?
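Because the hang is intermittent, one cheap check is to compare where each run stops. The minimal sketch below assumes the driver messages in cpl.log carry the (seq_mct_drv) tag quoted above; it prints the last such line from each log passed on the command line, so a hung run can be compared against a successful one. The script and the `last_driver_line` helper are hypothetical, not part of CESM.

```python
import sys
from typing import Optional


def last_driver_line(log_text: str, tag: str = "(seq_mct_drv)") -> Optional[str]:
    """Return the last line of log_text containing `tag`, or None if absent.

    Assumption: driver progress messages contain the tag quoted in the post.
    """
    last = None
    for line in log_text.splitlines():
        if tag in line:
            last = line.strip()
    return last


if __name__ == "__main__":
    # Hypothetical usage: python last_stage.py run1/cpl.log run2/cpl.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(f"{path}: {last_driver_line(fh.read())}")
```

If every hung run reports the same last line while successful runs go further, that narrows the search to the code between that message and the next one.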

Best Regards,
Kihang
 

fischer

CSEG and Liaisons
Staff member
Hi Kihang,

Since it sometimes runs, that leads me to believe that it's a system issue. You can turn on debugging by using ./xmlchange DEBUG=TRUE.
Then rebuild and run; you might get more information about the hang. But please read the following about what information to include so we
can better assist you.


Thanks
Chris
 

Kihang Youn
New Member
Hi Chris,

Here are the log files from my run, with more details below.

Error depending on process combination

When I ran with the following combination, it worked without a problem:
- NTASKS_OCN=1140, NTASKS_ATM=6460, NTASKS_CPL=6460
But when I changed the process combination, the problem occurred:
- NTASKS_OCN=2280, NTASKS_ATM=5320, NTASKS_CPL=5320
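Both layouts use the same total number of MPI tasks; only the split between OCN and ATM/CPL changes. The quick check below makes that explicit. The post does not give ROOTPE values, so it assumes OCN runs on its own ranks while ATM and CPL share theirs; `LAYOUTS` and `summarize` are illustrative names only.

```python
# Layouts copied from the post. ROOTPE values are not given there, so this
# assumes OCN runs on its own ranks and ATM/CPL share the remaining ones.
LAYOUTS = {
    "works": {"NTASKS_OCN": 1140, "NTASKS_ATM": 6460, "NTASKS_CPL": 6460},
    "hangs": {"NTASKS_OCN": 2280, "NTASKS_ATM": 5320, "NTASKS_CPL": 5320},
}


def summarize(layout):
    """Return (total MPI ranks, ATM:OCN task ratio) under the assumed placement."""
    # ATM and CPL overlap, so only one of them counts toward the total.
    total = layout["NTASKS_OCN"] + layout["NTASKS_ATM"]
    ratio = layout["NTASKS_ATM"] / layout["NTASKS_OCN"]
    return total, ratio


for name, layout in LAYOUTS.items():
    total, ratio = summarize(layout)
    print(f"{name}: {total} ranks, ATM:OCN = {ratio:.2f}")
# works: 7600 ranks, ATM:OCN = 5.67
# hangs: 7600 ranks, ATM:OCN = 2.33
```

Under that assumption, the failing case doubles the ocean task count while keeping 7600 ranks overall, which is consistent with a problem tied to the ocean/coupler decomposition rather than to the total core count.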

Source code location where the model stops

After adding a few print statements to the code, I found that the model stops inside the call seq_map_map(mapper_Co2x(eoi), dom_oo(eoi)%data, dom_ox%data, msgtag=~). Within seq_map_mod.F90, it stops at the call to mct_rearr_rearrange (line 483 in cesm1_2_2_1).
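A rearrange that never returns is consistent with some rank waiting on a message that no peer ever sends. The pure-Python toy below (no MPI, no MCT) assumes a rearrange can be modeled as a set of point-to-point messages between the source (ocean-side) and destination (coupler-side) ranks. Given which ranks send to whom and which ranks wait on whom, it lists the receives that would block forever, either because the two plans disagree or because a peer never reaches the call. `unmatched_receives`, `SENDS`, `RECVS`, and `entered` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple


def unmatched_receives(
    sends: Dict[int, Set[int]],
    recvs: Dict[int, Set[int]],
    entered: Optional[Set[int]] = None,
) -> List[Tuple[int, int]]:
    """Return (receiver, sender) pairs whose receive would never complete.

    sends:   {src_rank: {dst_rank, ...}}  messages each rank posts
    recvs:   {dst_rank: {src_rank, ...}}  receives each rank posts
    entered: ranks that actually reach the rearrange call (None = all of them)
    """
    def active(rank: int) -> bool:
        return entered is None or rank in entered

    missing = []
    for dst, srcs in recvs.items():
        if not active(dst):
            continue  # a rank that never enters the call posts no receives
        for src in srcs:
            # The receive is satisfied only if the sender runs and targets dst.
            if not (active(src) and dst in sends.get(src, set())):
                missing.append((dst, src))
    return sorted(missing)


# Toy plan: ranks 0-1 hold the source data, ranks 2-3 the destination.
SENDS = {0: {2}, 1: {2, 3}}
RECVS = {2: {0, 1}, 3: {1}}

print(unmatched_receives(SENDS, RECVS))                     # []
print(unmatched_receives(SENDS, RECVS, entered={0, 2, 3}))  # [(2, 1), (3, 1)]
```

The second call shows the intermittent-hang scenario in miniature: if rank 1 is delayed or stuck elsewhere and never enters the call, ranks 2 and 3 wait indefinitely even though the message plans themselves agree.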

Model log file

The model log file is also attached.


Best Regards,
Kihang
 

Attachments

  • cesm_forum_kyoun.zip (98.3 KB)

fischer

CSEG and Liaisons
Staff member
Hi Kihang,

If you don't mind me asking, why are you using cesm1_2_2_1 and not a newer cesm2 version?

Chris
 

Kihang Youn
New Member
Hi Chris,

The model version, compset, and resolution I want to optimize are fixed, so changing the version is not possible. :-(

I confirmed that it is stuck in MPI_WAITANY inside the rearrange function, and in my opinion it seems to be related to the component's comm_world.
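One way to see which peer a blocked MPI_WAITANY is waiting for is to temporarily replace the blocking wait with a polling loop (in MPI, MPI_TESTANY inside a loop) that gives up after a deadline and reports the requests still outstanding. The sketch below is only a Python analogue of that idea, assuming each pending request can be modeled as a `threading.Event` keyed by the peer rank; `wait_with_deadline` is a hypothetical helper, not part of MCT or CESM.

```python
import threading
import time
from typing import Dict, List


def wait_with_deadline(
    requests: Dict[int, threading.Event],
    timeout_s: float,
    poll_s: float = 0.01,
) -> List[int]:
    """Poll pending 'requests' until all complete or the deadline passes.

    Toy stand-in for a TESTANY-style polling loop: keys are peer ranks,
    values are events set when that peer's message has arrived.
    Returns the peer ranks still outstanding (empty list = all completed).
    """
    deadline = time.monotonic() + timeout_s
    pending = dict(requests)
    while pending and time.monotonic() < deadline:
        for rank in [r for r, ev in pending.items() if ev.is_set()]:
            del pending[rank]  # analogous to one request completing
        if pending:
            time.sleep(poll_s)
    # Final sweep so a message that arrived during the last sleep is not reported.
    return sorted(r for r, ev in pending.items() if not ev.is_set())


# Usage: peers 3 and 9 have delivered, peer 7 never does.
reqs = {r: threading.Event() for r in (3, 7, 9)}
reqs[3].set()
reqs[9].set()
print(wait_with_deadline(reqs, timeout_s=0.05))  # [7]
```

Mapping the reported ranks back onto the OCN and CPL task ranges would show whether the missing peer sits on the ocean side or the coupler side of the layout.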

I'll check this by adjusting the number of CPL processes.

Best Regards,
Kihang
 