Thanks.
I tried the test run, but the machine does not seem to accept jobs on fewer than 4 nodes in the normal queue.
$ #./create_newcase --case ~/cases/FW1850_SO2 --compset FW1850 --res f09_f09_mg17 --machine fram --project NN1004K
./create_test ERS.f09_g17.A --machine fram --project NN1004K
Testnames: ['ERS.f09_g17.A.fram_intel']
Creating test directory /cluster/work/users/xiangyuli/cesm/ERS.f09_g17.A.fram_intel.20190412_144817_hica32
RUNNING TESTS:
ERS.f09_g17.A.fram_intel
Starting CREATE_NEWCASE for test ERS.f09_g17.A.fram_intel with 1 procs
Finished CREATE_NEWCASE for test ERS.f09_g17.A.fram_intel in 1.499997 seconds (PASS)
Starting XML for test ERS.f09_g17.A.fram_intel with 1 procs
Finished XML for test ERS.f09_g17.A.fram_intel in 0.272974 seconds (PASS)
Starting SETUP for test ERS.f09_g17.A.fram_intel with 1 procs
Finished SETUP for test ERS.f09_g17.A.fram_intel in 1.875918 seconds (PASS)
Starting SHAREDLIB_BUILD for test ERS.f09_g17.A.fram_intel with 1 procs
Finished SHAREDLIB_BUILD for test ERS.f09_g17.A.fram_intel in 141.881331 seconds (PASS)
Starting MODEL_BUILD for test ERS.f09_g17.A.fram_intel with 4 procs
Finished MODEL_BUILD for test ERS.f09_g17.A.fram_intel in 23.417793 seconds (PASS)
Starting RUN for test ERS.f09_g17.A.fram_intel with 1 proc on interactive node and 32 procs on compute nodes
Finished RUN for test ERS.f09_g17.A.fram_intel in 2.760764 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /cluster/work/users/xiangyuli/cesm/ERS.f09_g17.A.fram_intel.20190412_144817_hica32
Errors were:
submit_jobs case.test
Submit job case.test
ERROR: Command: 'sbatch --time 00:59:00 -p normal --account NN1004K .case.test --skip-preview-namelist' failed with error 'sbatch: error: --nodes >= 4 required for normal and optimist jobs
sbatch: error: Batch job submission failed: Node count specification invalid' from dir '/cluster/work/users/xiangyuli/cesm/ERS.f09_g17.A.fram_intel.20190412_144817_hica32'
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
At test-scheduler close, state is:
FAIL ERS.f09_g17.A.fram_intel (phase RUN)
Case dir: /cluster/work/users/xiangyuli/cesm/ERS.f09_g17.A.fram_intel.20190412_144817_hica32
test-scheduler took 172.466439962 seconds
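If I read the sbatch error correctly, the test asked for fewer nodes than the normal queue accepts. A rough sketch of the arithmetic, assuming 32 cores per node (my assumption; the log only reports 32 procs and does not state the node size):

```shell
# Whole nodes needed for a given task count (ceiling division).
# The cores-per-node value passed in is an assumption, not taken from the log.
nodes_needed() {
  ntasks="$1"
  cores_per_node="$2"
  echo $(( (ntasks + cores_per_node - 1) / cores_per_node ))
}

nodes_needed 32 32    # prints 1, below the 4 nodes sbatch asks for
nodes_needed 128 32   # prints 4, the smallest count that would pass
```

Under that assumption the test would need at least 128 tasks, or a different queue, before sbatch accepts it.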
Yes, I have write permission on all nodes.
Fresh simulations indeed work well.
This problem only occurs when restarting a crashed simulation.
For crashed simulations, there is no folder "rest/".
So I copied all the restart files from the run directory.
Could this be a problem?
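One thing I can check on my side is whether the copied set is complete. A minimal sketch, assuming the rpointer.* files sit in the same directory as the restart files and each lists one restart file name per line; the function name check_rpointers and the directory argument are only placeholders of mine:

```shell
# Report restart files that are named in rpointer.* but missing from the directory.
# Assumes one file name per line in each rpointer file; any path prefix is ignored.
check_rpointers() {
  rundir="$1"
  missing=0
  for rp in "$rundir"/rpointer.*; do
    # If the glob matched nothing, there are no rpointer files at all.
    [ -e "$rp" ] || { echo "no rpointer files in $rundir"; return 1; }
    while IFS= read -r name || [ -n "$name" ]; do
      [ -z "$name" ] && continue
      base=$(basename "$name")
      if [ ! -e "$rundir/$base" ]; then
        echo "missing: $base (listed in $(basename "$rp"))"
        missing=1
      fi
    done < "$rp"
  done
  return "$missing"
}
```

Calling `check_rpointers /path/to/run` prints nothing and returns 0 when every listed file is present.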