I tried a historical compset on two clusters:
./create_newcase --case /home/meisam/scratch/cases/histJuneTestBeluga --compset IHistClm50BgcCrop --res f19_g17 --machine beluga --walltime 02:00:00 --run-unsupported
AND
./create_newcase --case /home/meisam/scratch/cases/histJune --compset IHistClm50BgcCrop --res f19_g17 --machine narval --walltime 02:00:00 --run-unsupported
The config_machines file and other related files for both runs (one per cluster) are attached.
After I submitted the job on each cluster (Beluga and Narval), both runs failed after a couple of minutes with the following messages:
On Beluga: "case.run error
ERROR: RUN FAIL: Command 'srun -n 80 --ntasks-per-node=40 /scratch/meisam/histJuneTestBeluga/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed"
On Narval: "case.run error
ERROR: RUN FAIL: Command 'srun -n 128 --ntasks-per-node=64 /scratch/meisam/histJune/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed"
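For reference, here is how I read the two srun commands: -n is the total number of MPI tasks and --ntasks-per-node is the number of tasks per node, so each job would span two nodes. A quick sketch of that arithmetic (nodes_needed is just a helper name I made up for this post, not part of CESM or Slurm):

```shell
#!/bin/sh
# Hypothetical helper: nodes implied by a total task count and a
# tasks-per-node value, using ceiling division.
nodes_needed() {
    ntasks=$1
    per_node=$2
    echo $(( (ntasks + per_node - 1) / per_node ))
}

# Values taken from the two failing srun commands above.
echo "Beluga: $(nodes_needed 80 40) nodes"    # prints "Beluga: 2 nodes"
echo "Narval: $(nodes_needed 128 64) nodes"   # prints "Narval: 2 nodes"
```

If that reading is right, 80 and 128 are totals across two nodes, while the per-node values are 40 and 64.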
The two log files for these errors are also attached.
What do you think the issue is? Do I need to change the number of tasks per node in both configs? I ask because the task counts in the commands (80 and 128) are above the limit of these two clusters. I don't understand the error. What can we learn from the error messages above and from the two attached CESM log files?
Thank you for your help.
Attachments
- CESM log Narval.txt (284.9 KB)
- config batch narval.txt (1.2 KB)
- config compiler beluga.txt (567 bytes)
- config compiler narval.txt (575 bytes)
- config machine beluga.txt (2.4 KB)
- config machine narval.txt (2.4 KB)
- CESM log Beluga.txt (262.1 KB)