Welcome to the new DiscussCESM forum!

Error in running OM4_025 test case

abhisekc

Abhisek Chatterjee
Hi Marshall and Russ,

This continues our previous discussion on the MOM6 Google Groups forum. To help others, I have appended our earlier conversation below my main message.

I am still unable to figure out the problem. Initially, the layout was not specified in SIS_input. As you suggested, I added the following line to both SIS_input and SIS_layout:

LAYOUT = 32,18 !


SIS_layout and MOM_layout are now identical, as given below:
LAYOUT = 32,18
IO_LAYOUT = 2,2
MASKTABLE = "mask_table.96.32x18" ! 32*18-96 = 480 PEs
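The PE count in the comment above comes from the mask table itself. A minimal Python sketch that reads a mask table and reports how many PEs a component needs, assuming the usual FMS mask-table layout (first line: number of masked domains; second line: the layout; then one `i,j` pair per masked domain, with `#` lines treated as comments). The helper names are hypothetical and the format should be checked against the FMS version in use:

```python
def parse_mask_table(text: str) -> tuple[int, tuple[int, int], list[tuple[int, int]]]:
    """Parse mask-table text (assumed format: nmask, layout, then i,j pairs)."""
    # Drop blank lines and lines starting with '#' (assumed comment marker).
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    nmask = int(lines[0])
    nx, ny = (int(v) for v in lines[1].replace(",", " ").split())
    masked: list[tuple[int, int]] = []
    for ln in lines[2:]:
        i, j = (int(v) for v in ln.replace(",", " ").split()[:2])
        masked.append((i, j))
    if len(masked) != nmask:
        raise ValueError(f"header says nmask={nmask} but {len(masked)} entries are listed")
    return nmask, (nx, ny), masked


def required_pes(nmask: int, layout: tuple[int, int]) -> int:
    """PEs the component must run on: layout(1)*layout(2) - nmask."""
    return layout[0] * layout[1] - nmask


# Toy table: a 2x2 layout with one masked domain needs 3 PEs.
nmask, layout, masked = parse_mask_table("1\n2,2\n1,1\n")
print(required_pes(nmask, layout))
print(required_pes(96, (32, 18)))
```

If INPUT/mask_table.96.32x18 follows the assumed format, parsing it should report nmask = 96 and layout (32, 18), matching the 480 PEs in the comment.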

Now, when running, I first get a warning:
WARNING: MOM_file_parser : LAYOUT occurs more times than is permitted. Line: 'LAYOUT = 32,18' in file SIS_layout is being ignored.

and finally, the model exits with the same error:
---------
NOTE: MOM_domains_init: reading maskmap information from INPUT/mask_table.96.32x18
parse_mask_table: Number of domain regions masked in ice model = 96

FATAL: fms_io(parse_mask_table_2d): mpp_npes() .NE. layout(1)*layout(2) - nmask for ice model
----------

So I have now modified fms_io.F90 (as Russ suggested) at lines 8444-8445, as given below:

-----
if( mpp_npes() .NE. layout(1)*layout(2) - nmask )then
   ! Report the values being compared before aborting
   write(stdoutunit,*)"npes=",mpp_npes(),",layout(1)=",layout(1),",layout(2)=",layout(2)
   call mpp_error(FATAL, &
      "fms_io(parse_mask_table_2d): mpp_npes() .NE. layout(1)*layout(2) - nmask for "//trim(modelname))
endif
-----

I hope this is correct. It is now compiling. I will update you with the outcome.

The fms.out file is attached for reference, in case it helps diagnose the problem.

Thanks,

Abhisek Chatterjee
INCOIS






**************************************************OLD Messages*********************************************
Hi Marshall,

The big problem with this FMS error message is that it only tells you that there is a problem, even though it knows what the full error is and why it occurred, and has all the information at hand, but would rather keep it a secret!

In fms_io.F90 (line 8451) we have:

!--- make sure mpp_npes() == layout(1)*layout(2) - nmask
if( mpp_npes() .NE. layout(1)*layout(2) - nmask ) call mpp_error(FATAL, &
"fms_io(parse_mask_table_2d): mpp_npes() .NE. layout(1)*layout(2) - nmask for "//trim(modelname))


The problem is detected, so why not print mpp_npes(), layout(1), layout(2), and nmask so the user can see the problem immediately rather than hunting for things all over the place? That is already done for the check just above it.

Cheers,
Russ
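The fix requested above amounts to putting every compared value into the error message. A small Python analogue of the FMS check illustrates the idea; it is not FMS code, and `check_mask_layout` is a hypothetical name:

```python
def check_mask_layout(npes: int, layout: tuple[int, int], nmask: int, modelname: str) -> None:
    """Raise if npes != layout[0]*layout[1] - nmask, reporting every input."""
    expected = layout[0] * layout[1] - nmask
    if npes != expected:
        raise RuntimeError(
            f"parse_mask_table_2d: npes={npes} but layout(1)*layout(2) - nmask = "
            f"{layout[0]}*{layout[1]} - {nmask} = {expected} for {modelname}"
        )


try:
    # 576 ranks launched, but a 32x18 layout with 96 masked domains needs 480.
    check_mask_layout(576, (32, 18), 96, "ice model")
except RuntimeError as err:
    print(err)
```

With the numbers in the message, the mismatch (576 versus 480) is visible without any further digging.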

marsha...@noaa.gov


to MOM Users Mailing List

480 CPUs is correct (32*18 - 96) when the mask table is present. Based on your error, I don't think anything else is required. It may be that your LAYOUT is set in MOM_input but not SIS_input. It could also be an issue with the MPI launcher flags, e.g. maybe you are specifying nodes and it is implicitly assigning more than 480 CPUs.

First I would suggest looking in `MOM_parameter_doc.layout` and `SIS_parameter_doc.layout` and confirming that LAYOUT is 32, 18 and that MASKTABLE points to the correct file in both MOM and SIS. (You could also explicitly set these in MOM_input and SIS_input).
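The comparison suggested above can be scripted. A minimal Python sketch, assuming the `*_parameter_doc.layout` files use `KEY = value ! comment` lines plus comment-only continuation lines; the helper names and the sample strings are hypothetical, not MOM6 tools:

```python
KEYS = ("LAYOUT", "IO_LAYOUT", "MASKTABLE")


def strip_comment(line: str) -> str:
    """Drop everything after the first '!' that is not inside double quotes."""
    in_quotes = False
    for pos, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "!" and not in_quotes:
            return line[:pos]
    return line


def read_params(text: str) -> dict[str, str]:
    """Collect KEY = value pairs, removing spaces and surrounding quotes from values."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if "=" not in line:
            continue  # blank, comment-only, or continuation line
        key, value = line.split("=", 1)
        params[key.strip().upper()] = value.replace(" ", "").strip('"')
    return params


def compare_layouts(mom_text: str, sis_text: str) -> list[str]:
    """Return one message per key whose value differs (or is missing) between files."""
    mom, sis = read_params(mom_text), read_params(sis_text)
    return [
        f"{key}: MOM={mom.get(key)!r} SIS={sis.get(key)!r}"
        for key in KEYS
        if mom.get(key) != sis.get(key)
    ]


# Hypothetical file contents where MASKTABLE differs between MOM and SIS.
mom_doc = 'LAYOUT = 32, 18 !\nMASKTABLE = "mask_table.96.32x18" ! default = "MOM_mask_table"\n'
sis_doc = 'LAYOUT = 32, 18 !\nMASKTABLE = "MOM_mask_table" ! default\n'
for problem in compare_layouts(mom_doc, sis_doc):
    print(problem)
```

To check real output, pass `Path("MOM_parameter_doc.layout").read_text()` and `Path("SIS_parameter_doc.layout").read_text()` (with `from pathlib import Path`); an empty result means the three keys agree.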

Next, try to confirm that you are launching MPI with 480 ranks.
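One way the rank count can drift from 480 is the launcher issue mentioned above: requesting whole nodes and letting the launcher fill them. A minimal sketch of that arithmetic, assuming one rank per allocated core and 32 cores per node (the node size described in the original post); the helper names are hypothetical:

```python
import math


def implicit_ranks(nodes: int, cores_per_node: int) -> int:
    """Ranks started if the launcher places one rank on every allocated core (assumed)."""
    return nodes * cores_per_node


def min_nodes(npes: int, cores_per_node: int) -> int:
    """Smallest number of whole nodes providing at least npes cores."""
    return math.ceil(npes / cores_per_node)


npes, cores_per_node = 480, 32
allocated = 18  # nodes allocated in the original post
started = implicit_ranks(allocated, cores_per_node)
if started != npes:
    print(f"{allocated} full nodes start {started} ranks, {started - npes} more than npes={npes}")
print(f"{min_nodes(npes, cores_per_node)} nodes of {cores_per_node} cores provide at least {npes} cores")
```

Most MPI launchers also accept an explicit rank count (for example `mpirun -np 480`), which avoids relying on the node allocation; the exact flags depend on the launcher and batch system.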

If none of that works, then we may need more information.

We are currently trying to migrate MOM6 support to the CESM forums, so you may want to ask your question over there: MOM6


On Monday, June 7, 2021 at 3:13:11 AM UTC-4 chatterj...@gmail.com wrote:
Dear MOM community,

I am new to MOM6 and am currently exploring the test cases to learn how to set up a regional model for my applications.

I have now encountered an error while running the OM4_025 test case in the ice_ocean_SIS2 experiments. The error seems to come from a mismatch between the layout and the number of processors allocated.

I have tried using npes=576 (18 nodes with 32 cores each), as the layout suggested. I also tried npes=480 with the same number of nodes allocated. Both times, the run failed with the same error.

---
NOTE: MOM_domains_init: reading maskmap information from INPUT/mask_table.96.32x18
parse_mask_table: Number of domain regions masked in ice model = 96

FATAL: fms_io(parse_mask_table_2d): mpp_npes() .NE. layout(1)*layout(2) - nmask for ice model
---

Can someone please tell me whether I need to make any other modifications to run this test case?

Thank you,

With best regards,
Abhisek
 

Attachment: fms.txt (400.3 KB)

abhisekc

Abhisek Chatterjee
Dear Marshall, Russ, and MOM6 friends,

I have now edited fms_io.F90 at lines 8444-8445 to write out the number of PEs and the layout, as given below:

if( mpp_npes() .NE. layout(1)*layout(2) - nmask )then
write(stdoutunit,*)"npes=",mpp_npes(),",layout(1)=",layout(1),",layout(2)=",layout(2)
call mpp_error(FATAL, &
"fms_io(parse_mask_table_2d): mpp_npes() .NE. layout(1)*layout(2) - nmask for "//trim(modelname))
endif


But it exits with the following error:

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor

Can you please suggest how to fix this?

Thanks,
Abhisek Chatterjee
INCOIS
 

adcroft

Alistair Adcroft
To clarify one issue above: the parameters can be read from more than one file. The list of parameter files to read is set in `input.nml`. In the case of OM4_025, we set things up with more input files than usual (see OM4_025/input.nml) to facilitate different layouts for testing and production. The layout parameters are in the *_layout files.

Documentation on the parameter files is on the "Run-time Parameter System" page of the MOM6 0.2a3 documentation.
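Reading parameters from several files also explains the warning in the first post: once LAYOUT appears in both SIS_input and SIS_layout, the parser sees it twice. The toy model below is not MOM6's MOM_file_parser; it only mimics the behaviour that warning describes, assuming the files are read in the order listed in input.nml, the first definition of a parameter is kept, and later ones are reported and ignored. The helper name and the file contents are illustrative:

```python
def merge_parameter_files(files: list[tuple[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Merge (filename, text) pairs in order; keep the first definition of each key."""
    params: dict[str, str] = {}
    source: dict[str, str] = {}
    duplicates: list[str] = []
    for filename, text in files:
        for raw in text.splitlines():
            line = raw.split("!", 1)[0].strip()  # drop trailing comments (quoted '!' not handled)
            if "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            if key in params:
                duplicates.append(
                    f"{key} already set in {source[key]}; line {raw.strip()!r} in {filename} ignored"
                )
                continue
            params[key], source[key] = value, filename
    return params, duplicates


params, duplicates = merge_parameter_files([
    ("SIS_input", "LAYOUT = 32,18 !\n"),
    ("SIS_layout", "LAYOUT = 32,18\nIO_LAYOUT = 2,2\n"),
])
print(params)
for msg in duplicates:
    print("WARNING:", msg)
```

In this toy model, defining LAYOUT in only one of the listed files removes the warning.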
 