Hello,
I tried to port CESM to Azure CycleCloud, on nodes with Intel Xeon Platinum 8168 processor cores. The software builds fine, and the example case (compset: B1850, res: f19_g17) runs to completion on up to two nodes. Unfortunately, with three or more nodes the application livelocks: the output files in the scratch directory are only written up to a certain point and then remain unchanged, although CPU usage stays at 100 percent. This happens with both the Intel Parallel Studio toolchain and the GNU/OpenMPI toolchain.
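The symptom described above (output files stop changing while the CPUs stay busy) can be checked with a small script. The following is only a minimal sketch: the function names `newest_mtime` and `is_stalled` and the threshold value are illustrative assumptions, not part of CESM or CycleCloud, and the directory to watch has to be supplied by the user.

```python
import os
import sys
import time


def newest_mtime(directory):
    """Return the latest modification time of any file under directory, or None."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file disappeared between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def is_stalled(directory, threshold_seconds, now=None):
    """Return True if no file under directory changed within threshold_seconds.

    An empty directory is reported as stalled, since nothing has been written.
    """
    if now is None:
        now = time.time()
    newest = newest_mtime(directory)
    if newest is None:
        return True
    return (now - newest) > threshold_seconds


if __name__ == "__main__":
    # Hypothetical usage: python check_stall.py <run directory> [threshold in seconds]
    run_dir = sys.argv[1]
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 600.0
    print("stalled" if is_stalled(run_dir, threshold) else "still writing")
```

Because the directory is on NFS, modification times seen from another node may lag slightly due to client-side attribute caching, so the threshold should be generous.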
All relevant files are in the /mnt/nfs_shares/homes/cesm/ folder. /mnt/nfs_shares/ is an NFS directory that is shared across all nodes. Because I suspected a problem with NFS, I built NetCDF without PnetCDF, but the problem persists. Is there something else I missed?
For completeness, I have attached all important configuration files and scripts as a zip archive (this time from the build without PnetCDF, because I thought that would help solve the problem).
Best regards,
Gabriel