What is the "case.st_archive" used for? Why is the status of this submission always "PEND"?

liangpeng0405@...

Hi,

I'm new to CESM, and recently I tried to configure the test case "b.e20.B1850.f19_g17.test" following the "Quick Start Guide". After building the case, I submitted it with "case.submit", and the results are as follows:


Creating component namelists
Finished creating component namelists
Check case OK
submit_jobs case.run
Submit job case.run
Submitting job script bsub  < .case.run --resubmit
Submitted job id is 55440
Submit job case.st_archive
Submitting job script bsub -w 'done(55440)' < case.st_archive --resubmit
Submitted job id is 55441
Submitted job case.run with id 55440
Submitted job case.st_archive with id 55441


I think this means the submission was successful. As I understand it, the "case.run" job runs the model, but I'm wondering what the "case.st_archive" job is for. Besides that, the status of "case.st_archive" is always "PEND", as shown below:

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
55440   liangpe RUN   normal     login1      4*comput34  *.test.run Jun 18 17:01
                                             23*comput54
                                             28*comput53
                                             28*comput56
                                             28*comput55
                                             32*comput58
                                             1*comput57
55441   liangpe PEND  normal     login1                  *t_archive Jun 18 17:01

However, there are at least 200 more CPUs available for "case.st_archive", so why does it stay in "PEND"?

Thank you very much.

Any suggestions would be very helpful.


jedwards

case.st_archive moves the output from case.run into the archive directory for post-processing. It depends on the successful completion of the model run and will stay in the queue until the run completes.
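Under LSF, this dependency is recorded in the job itself. A quick way to confirm why job 55441 is pending (a sketch, assuming the standard LSF client tools are available on the login node):

```shell
# Show the full record for the archive job; the PENDING REASONS section
# and the dependency condition "done(55440)" explain why it is waiting.
bjobs -l 55441

# Once case.run (55440) reaches DONE status, LSF releases case.st_archive
# automatically; no action is needed on your part.
bjobs 55440 55441
```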

CESM Software Engineer

liangpeng0405@...

Thank you for your reply; it was very helpful. I have another question: is there a way to check the detailed status of the model execution (e.g., which time step the model has reached, or which component is currently running)? I used the "bpeek" command (my cluster's batch system is LSF) to check the status, but only the submission information is shown, as follows:

<< output from stdout >>
Setting resource.RLIMIT_STACK to -1 from (-1, -1)
Generating namelists for /public/home/liangpeng/cases/b.e20.B1850.f19_g17.test
Creating component namelists
   Calling /public/home/liangpeng/cesm2/components/cam//cime_config/buildnml
CAM namelist copy: file1 /public/home/liangpeng/cases/b.e20.B1850.f19_g17.test/Buildconf/camconf/atm_in file2 /public/home/liangpeng/cesm2/scratch/b.e20.B1850.f19_g17.test/run/atm_in
   Calling /public/home/liangpeng/cesm2/components/clm//cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/components/cice//cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/components/pop//cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/components/mosart//cime_config/buildnml
   Running /public/home/liangpeng/cesm2/components/cism//cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/components/ww3//cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/cime/src/components/stub_comps/sesp/cime_config/buildnml
   Calling /public/home/liangpeng/cesm2/cime/src/drivers/mct/cime_config/buildnml
Finished creating component namelists
-------------------------------------------------------------------------
 - Prestage required restarts into /public/home/liangpeng/cesm2/scratch/b.e20.B1850.f19_g17.test/run
 - Case input data directory (DIN_LOC_ROOT) is /public/home/liangpeng/cesm2/inputdata
 - Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
2019-06-19 15:17:19 MODEL EXECUTION BEGINS HERE
run command is mpirun  -np 144 /public/home/liangpeng/cesm2/scratch/b.e20.B1850.f19_g17.test/bld/cesm.exe  >> cesm.log.$LID 2>&1 

<< output from stderr >>

However, when I run the ROMS ocean model, I can use the "bpeek" command to check the detailed status of the model, including the time step, input data information, output data information, etc.

I also have another question. As shown in the CIME user guide, we can control the processors and threads the case will use when executing the model. However, I found that the total number of CPUs used to execute "cesm.exe" is not simply the sum of the "NTASKS" of each component (with NTHRDS=1). Is there a method to calculate the total number of CPUs that will be used?
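For what it's worth, summing NTASKS can overcount because components may share processors (their placement is controlled by ROOTPE offsets). CIME can report the actual total itself; a sketch, run from the case directory (assumes a standard CESM2/CIME case):

```shell
cd /public/home/liangpeng/cases/b.e20.B1850.f19_g17.test

# The total PE count that CIME actually requests from the batch system:
./xmlquery TOTALPES

# Tasks, threads, and root PE per component, which shows any overlap:
./pelayout

# The exact mpirun command (including -np) that case.submit will use:
./preview_run
```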

Thanks again for the previous reply.

Best regards.

 

jedwards

CESM log output is written to various log files in the run directory. Information about the progress of a currently running model is in the cpl.log.

Information about the progress of individual component models is in the component log for that component: atm.log, lnd.log, ocn.log, etc.

Each log file name is appended with a job ID and date so that it is unique.
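The coupler log can be followed while the job is running. A sketch, assuming the run directory shown in the output above and the "tStamp" model-date lines that the CESM2 coupler writes as it advances:

```shell
cd /public/home/liangpeng/cesm2/scratch/b.e20.B1850.f19_g17.test/run

# Follow the coupler log live; file names follow the pattern
# cpl.log.<jobid>.<date>.
tail -f cpl.log.*

# Show the most recent model-date stamps to see how far the run has
# advanced so far.
grep tStamp cpl.log.* | tail -5
```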

CESM Software Engineer

liangpeng0405@...

Thank you very much for your replies. They are really very helpful.

I have checked the log files in the run directory. However, I found that the model got stuck, with the following messages:

x1v6/forcing/o2_consumption_scalef_0.30_POP_gx1v6_20180623.nc
     (open_read_netcdf) nsize =        110
     (open_read_netcdf) len(work_line) =         80
 string too short; not enough room to read comment from /public/home/liangpeng/cesm2/inputdata/ocn/pop/gx1v6/forcing/o2_consumption_scalef_0.30_POP_gx1v6_20180623.nc
     (passive_tracer_tools:read_field_3D) reading RESTORE_INV_TAU_MARGINAL_SEA_ONLY from /public/home/liangpeng/cesm2/inputdata/ocn/pop/gx1v6/forcing/ecosys_restore_inv_tau_POP_gx1v6_20170125.nc

When the POP ocean model tried to read the tracer data, there was not enough room, and the model got stuck and did not continue to run (although the submitted job is not killed by the batch system). Is this related to the hardware of the cluster? Is the available room related to the "OMP_STACKSIZE" setting in the "config_machines" file (although the MPI used when running the model is indeed "mpich")? Besides that, when I installed "netcdf-c" and "netcdf-fortran", they were not configured with "pnetcdf" support (but were built with "hdf5"). Would "pnetcdf" help reduce the room needed to run the model?

Thanks again for the insightful suggestions.

Best regards.

klindsay

Please note that the message you are seeing from POP is a warning message, not an error message (details below). It should not be leading to a model hang. I suspect that something else is causing the hang that you are experiencing. Perhaps the other log files have useful information regarding the hang.

POP has fixed length character strings that netcdf attributes are read into. Before reading attributes from a file, POP checks to see if they’ll fit. If they won’t fit, POP skips the get attribute call, to avoid overrunning the character string buffer. A warning message is written when this happens. This is what you are seeing. This usually happens with long attributes like history, which appears to be the case here. I don’t think I’ve ever seen this indicate a true problem. As far as I know, there is enough space for the attributes that POP relies on.
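To locate where the run actually stalled, it can help to compare the tails of all the logs. A sketch, assuming the same run directory as above:

```shell
cd /public/home/liangpeng/cesm2/scratch/b.e20.B1850.f19_g17.test/run

# The log that stopped growing first usually points at the stuck component.
ls -lt *.log.* | head

# MPI- and system-level errors typically land in the cesm.log rather than
# in the component logs.
tail -20 cesm.log.*
```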

