problem reading new data file in CESM1.2.0

Hi,

I am trying to read a new input data file in CESM1.2.0 by adding my own subroutines, which work well on desktop computers. The subroutine is called by tphysac, and I modified it to use getunit and freeunit. When the subroutine is activated, the model reports a series of errors and runs very slowly, eventually exceeding the runtime limit and exiting. Any help will be appreciated.

Feng

The error message in the CESM log is the following:

0:forrtl: severe (29): file not found, unit 72, file /glade/scratch/tian/F_2000_WACCM_Basecase/run/fort.72
0:Image              PC                Routine            Line        Source
0:libirc.so          00002B5D6F628A1E  Unknown               Unknown  Unknown
0:libirc.so          00002B5D6F6274B6  Unknown               Unknown  Unknown
0:cesm.exe           0000000001A4C0F2  Unknown               Unknown  Unknown
0:cesm.exe           00000000019CCA2C  Unknown               Unknown  Unknown
0:cesm.exe           00000000019CBF4C  Unknown               Unknown  Unknown
0:cesm.exe           00000000019FCAE2  Unknown               Unknown  Unknown
0:cesm.exe           0000000000E39C8A  openread_                  79  tf_OpenRead.f90
0:cesm.exe           0000000000CE59EE  tf_photo_co2_             150  tf_photo_CO2.f90
0:cesm.exe           000000000084D2D3  tf_test_mp_tf_tes         278  tf_test.f90
0:cesm.exe           0000000000760AEA  physpkg_mp_tphysa        1452  physpkg.F90
0:cesm.exe           000000000075F6DA  physpkg_mp_phys_r        1136  physpkg.F90
0:libiomp5.so        00002B5D6FAED003  Unknown               Unknown  Unknown

The code that opens, reads, and closes the data file is the following (line 79 in tf_OpenRead.f90 is the READ(unit_2222..) statement):

    if (masterproc) then
      unit_2222  = getunit()
      OPEN(unit_2222,FILE=TRIM(DIRDATA)//'/H2SO4.DAT', &
     &                     status='UNKNOWN', iostat=IERR )
      if( IERR /= 0 ) then
       write(iulog,*) 'tf_open_init: failed to open unit_2222,&
     &   error=', IERR
       call endrun
      end if
!
      READ(unit_2222,8003) PresSatH2O,PresSatH2SO4,TTAB,FTAB
      CLOSE(unit_2222)
      call freeunit(unit_2222)
    endif
 8003 FORMAT(1P6E12.5)
 

santos

Member
I'm very sorry for the delay; I somehow missed this post earlier.
My best guess is that the file you are trying to open is in the wrong location, and/or DIRDATA is being set incorrectly. If you open the file with status="OLD", not status="UNKNOWN", it should give a more specific error if the file is missing. I would make sure that the file's location and name match the file that the code is trying to open; if necessary, you can print DIRDATA to the log to make sure that the location is correct.
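The check suggested above can be tried outside the model first. The following is a minimal standalone sketch, not CAM code: the program name and the value of dirdata are hypothetical, and it uses the Fortran 2008 newunit= specifier and the Fortran 2003 iomsg= specifier in place of getunit, and standard output in place of iulog.

```fortran
program open_old_check
  implicit none

  ! Hypothetical data directory; in the model this would be DIRDATA.
  character(len=*), parameter :: dirdata = '.'
  character(len=512) :: path, msg
  integer :: iunit, ios

  ! Build and print the full path so a wrong directory is visible in the log.
  path = trim(dirdata)//'/H2SO4.DAT'
  write(*,*) 'trying to open: ', trim(path)

  ! status='old' requires the file to exist, so a missing file is
  ! reported here at OPEN rather than later at the first READ.
  open(newunit=iunit, file=trim(path), status='old', action='read', &
       iostat=ios, iomsg=msg)
  if (ios /= 0) then
     write(*,*) 'open failed, iostat=', ios, ': ', trim(msg)
     stop 1
  end if

  close(iunit)
  write(*,*) 'file opened successfully'
end program open_old_check
```

With status='UNKNOWN' a missing file does not make OPEN fail (most compilers create an empty file instead), so the problem only shows up at the first READ, where it is harder to diagnose.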
 
Hi Santos,

I have been out of town for some time and just found your response to my question. Thank you. The problem is not where the files are: the files exist and can be read all right. That being said, there are some warning messages in the cesm log file. For example:

168:forrtl: severe (39): error during read, unit 0, file stderr

Sometimes I get a message saying:

1:forrtl: Bad file descriptor

Another related issue: should I read in my data only on the master processor? When I do this, can all processors get the input data? If I do not do this, the input data file will be accessed by many processors, and they have to wait in a queue, which takes time. Currently I am forcing the data reading to be done at WACCM step zero. Is this right?

Look forward to hearing from you soon!

regards,
Feng
 

santos

Member
This is very unclear. These messages are different from the ones you mentioned originally. Furthermore, the code from your first post had "if (masterproc)", so processor 168 shouldn't be running that part at all. Either there's some bug elsewhere in your code, or there's a problem with the system you're running on, or you've changed the code without mentioning it. Regardless, no process should ever read from unit 0.

Since these issues are in your own modifications, and not the released CESM code, I can only help to a limited extent. You will need to figure out where in your code these errors occur, and check that the open and read statements are receiving the arguments that you think they are getting (on the target system, *not* your desktop). As I said before, it would also help to make sure that you use status="OLD" for all required inputs, not status="UNKNOWN".

Also, if you have MPI on your desktop, you can try running with multiple processes on your desktop. If your code only works with one process, I suspect that your code is not correct for parallel execution.

If you find that one of the CAM routines is not working how you expect, please let us know. If you find a problem with your system or its MPI implementation, you will have to file a bug report with whoever administers your machine.

As for your question about the proper time to read an input file: you should read files during the "init" phase. That is, the code to read this file should be in its own subroutine that is called by "phys_init" in the "physpkg" module, and it should store the data in a variable that can be accessed by your other code later.

If the data is fairly small, you can just let all the processes read it. But if it is not small, you will have to have the master process read it and then broadcast the information to other processes using MPI. If you use a parallel I/O library, like the PIO library bundled with CESM, that will handle the communication for you, but you will have to learn how to use the library properly to get the information you need distributed in the right way.
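The "master reads, then broadcasts" pattern described above can be sketched as a standalone program. This is a minimal sketch using plain MPI calls rather than CAM's own wrappers; the program name, the file name demo_table.dat, and the table length ntab are hypothetical, and the program writes the file itself only so that the example is self-contained. Inside CAM the rank test would be masterproc and messages would go to iulog.

```fortran
program read_and_broadcast
  use mpi
  implicit none

  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ntab = 6                           ! hypothetical table length
  character(len=*), parameter :: fname = 'demo_table.dat'  ! hypothetical file name

  real(dp) :: table(ntab)
  integer  :: rank, ierr, iunit, ios, i

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ios = 0
  if (rank == 0) then
     ! Demo only: create a small input file so the sketch runs on its own.
     open(newunit=iunit, file=fname, status='replace', action='write')
     write(iunit, '(1P6E12.5)') (real(i, dp), i = 1, ntab)
     close(iunit)

     ! Only rank 0 touches the file. status='old' requires it to exist.
     open(newunit=iunit, file=fname, status='old', action='read', iostat=ios)
     if (ios == 0) then
        read(iunit, '(1P6E12.5)', iostat=ios) table
        close(iunit)
     end if
  end if

  ! Share the I/O status first so that all ranks take the same branch.
  call MPI_Bcast(ios, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
  if (ios /= 0) then
     if (rank == 0) write(*,*) 'failed to read ', fname, ', iostat=', ios
     call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Every rank, including rank 0, must make this call to receive the data.
  call MPI_Bcast(table, ntab, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  write(*,'(a,i0,a,f6.2)') 'rank ', rank, ': table(ntab) = ', table(ntab)

  call MPI_Finalize(ierr)
end program read_and_broadcast
```

Built with an MPI compiler wrapper (for example mpif90) and launched on several processes, every rank should report the same value. If the broadcast is left out, only rank 0 holds the data and the other ranks work with uninitialized arrays.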
 
Hi Sean,

Thanks for the responses. There have been some changes to my model since a month ago; that's the source of the confusion.

First, when I read in my data file using "masterproc", it seems that some processors did not get the data (some NaNs in the results). So I switched to allowing every processor to read the data files, and that's when the error messages appeared.

Second, I am using Yellowstone at the moment. Who administers MPI on Yellowstone?

Third, I am using status="OLD" for all required inputs, not status="UNKNOWN".

Fourth, what should I do in order to open a file for output? Should I open it in the phys_init subroutine as well? Should each processor open one, or should the master processor do this?

Thank you!

regards,
Feng
 

santos

Member
Hmm. It seems that some processes are not using a valid unit id (e.g. from "getunit"), so that's what I would check first.

CISL would be the ones to talk to about yellowstone (cislhelp@ucar.edu). Since yellowstone is one of the machines we test most often, issues with the yellowstone port should be rare, but there's a small chance that you could find a new issue.

Where you should output a file depends on why you need to output it. You should almost never open your own file at all, since addfld and outfld are used to output history data for you, and all debugging messages should be printed to "iulog". If you need your output in a different form from the CAM history file, you should output to CAM history during the run, and postprocess the data into a different format with your own tools later.

However, if there is an exceptional case where you do need to output the data yourself, you should probably open and write the file in the same routine where your data is generated. In that case, you should definitely not have every process writing to the same file. I recommend using PIO to output the data in this case.
 

santos

Member
Sorry, I want to clarify. If there is a problem with the yellowstone port of CESM, that should be brought up in this forum, since that's a machine that CSEG supports. However, if there's a problem with yellowstone itself, or basic software on yellowstone (compilers, MPI, or LSF), that should be submitted to CISL.
 
Hi Sean,

Thanks for the helpful suggestions and comments. I have followed your suggestion and now call all of my data-reading subroutines from phys_init. In comparison, my old program reads in the data during the run, from tphysac. The two versions use the same data-reading subroutine, which is called for each column in each chunk.

I added two timers in the run, before and after the photochemistry is called. The timed section includes data reading in the old version but not in the new version.

In the old version, the model finishes the data reading and the photochemistry calculations for one column in about 30~40 seconds. In the new version, with data reading in the init phase, the model runs very slowly. The time difference between them is more than 3 hours! This is very strange.

Because the model output files are too big, I cut and pasted the timing parts into a file and attached it to this post.

The following is how I read in my data during the init phase. A subroutine called tf_init contains the following, and tf_init is called by phys_init:

       do I_Chunk=begchunk,endchunk
        tf_state = tf_state_all(I_Chunk)
        lchnk = tf_state%lchnk
        ncol = tf_state%ncol
        do n_tf_col = 1, ncol
         TF_COL = n_tf_col
         TF_CHUNK = lchnk
         WRITE(IULOG,*) "chunk#=", lchnk, ncol, pcols, "col#=", n_tf_col
!
       CALL OPENREAD(FCO2_V(1))
         DO tf_J = 1, pver
       tf_state%q(n_tf_col,tf_J,indx_O_TF) =   USOL(LO,pver-tf_J+1)*rmass_O_TF/29.......
        enddo   ! end ncol loop. this is to store USOL into state.
!
        tf_state_all(I_Chunk) = tf_state
!
      enddo     ! end chunk loop

In the old version of the model, I just moved the OPENREAD statement to the beginning of the photochemistry calculation subroutine.

Could you please give me some hints on what is going wrong with my model? Thank you!

regards,
Feng
 

santos

Member
There's not enough information here to determine the problem. Perhaps you are doing extra copies in the "new" code, or accidentally doing something in every column instead of once per thread. I would not store any more data from the file than you absolutely have to, or multiple copies of the same data. (And you should definitely not store physics_state objects, unless you absolutely have to.)

Try to change the "new" and "old" codes so that they look as similar as possible, and copy and store data in the exact same places. (So do not do copies of state objects unless your old code did that as well.) This should narrow down the list of potential problems.
 
Hi Sean,

The issue is that the data stored in state%q in the init phase are not passed to the column chemistry subroutine correctly. Although I added my own init subroutines at the very end of the phys_init subroutine and stored the data in the state%q structure, the data there is reset somehow.

Now, because all columns in one chunk share the same memory, any change made to the mixing ratio profiles in a column will propagate to the next column in the same chunk if state%q, or another structure like it, is not used to initialize the mixing ratio profiles in the next column.

So the question is whether the mysterious resetting of state%q is only an issue during the initialization phase, or occurs at other places. Do you have any comments?

regards,
Feng
 

santos

Member
Well, the state is really not supposed to be changed at initialization; at the beginning of the run the state is determined by initial conditions or restart files. After that, the tracers are changed by all kinds of processes, so you can't count on those being preserved until your particular package runs. So my advice depends on the question: what are you intending to do when you overwrite state%q using data from your file?

1) If you are trying to change the initial condition for some tracers, the best way to do that is by changing the initial condition file before the run even starts. You can make whatever changes you want to the initial condition file, and not worry about doing this inside CAM.

2) If you are trying to prescribe the concentrations of some tracers every time step, so no other parameterization is allowed to set them, you will have to do something similar to WACCM's specified chemistry ("waccm_ghg"). In that case, the tracers may not be kept in state%q at all, but instead are generated from input files inside the chemistry routines.

3) If you are trying to add a new physical and/or chemical process, where CAM will treat the tracers normally, but you want to introduce a new package that produces a tendency on them, then you should not be setting state%q directly, because that will disregard the data in the initial conditions and/or the other physical processes. You should instead use a "ptend" object to set the rates of your processes, like the other physics parameterizations do. Furthermore, you should only be storing enough data to produce these tendencies later in tphysac; that might not be the entire array of tendencies, but instead it might be just the raw data you've read from the file.

I hope that helps.
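The difference between overwriting a tracer and producing a tendency on it, as in case 3 above, can be illustrated with a small standalone sketch. This is not CAM's ptend interface: the program name, all variable names, the relaxation process, and the numbers are hypothetical and chosen only to show the idea that a package computes a rate and the host model applies it.

```fortran
program tendency_sketch
  implicit none

  integer,  parameter :: dp   = kind(1.0d0)
  integer,  parameter :: nlev = 4             ! hypothetical number of levels
  real(dp), parameter :: dt   = 1800.0_dp     ! hypothetical time step (s)
  real(dp), parameter :: tau  = 86400.0_dp    ! hypothetical relaxation time (s)

  real(dp) :: q(nlev)      ! tracer mixing ratio as left by earlier processes
  real(dp) :: q_ref(nlev)  ! reference profile, standing in for data read at init
  real(dp) :: dqdt(nlev)   ! tendency produced by the new process (1/s)

  q     = [1.0e-6_dp, 2.0e-6_dp, 3.0e-6_dp, 4.0e-6_dp]
  q_ref = [2.0e-6_dp, 2.0e-6_dp, 2.0e-6_dp, 2.0e-6_dp]

  ! The package does not assign to q. It only computes a rate of change,
  ! here a made-up relaxation toward the reference profile.
  dqdt = (q_ref - q) / tau

  ! The host model applies the tendency, so the contributions of the
  ! initial conditions and of the other processes to q are kept.
  q = q + dt * dqdt

  print '(4es12.4)', q
end program tendency_sketch
```

Only q_ref, the raw data, has to be stored between the init phase and the time step; the tendency is recomputed from the current state each time the package runs.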
 

santos

Member
I'm glad to see that you've worked this out. I believe that you called my office last week, but I was in Illinois at the time.
 