chris_fletcher@utoronto_ca
New Member
Hi, we're trying to port CCSM4 to the IBM Power6 in Toronto (where CCSM3 runs well) and the model builds ok but we can't get it to run. The error message in ccsm.log is:
0: INTERNAL ERROR : catalog was closed, or catalog was not initialized.
0: sayMessage will not print the error message.
which seems to be something strange at the system level. Has anyone seen it before?
Here are the details:
System: IBM Power6, AIX, xlf v12.1.0.2, NetCDF-4, IBM default MPI, LoadLeveler
Run: Compset B1850CN, 0.9x1.25_gx1v6, "generic IBM" case settings, 128 PEs.
(NOTE: the error doesn't seem to depend on the details of the run; we've tried a few different configurations, all failed)
The atmosphere initializes ok, then it crashes while initializing the land model. The last few lines of lnd.log look like this:
total runoff cells numr = 116332 numrl = 84511 numro = 31821
rtm decomp info proc = 0 begr = 1 endr = 7282 numr = 7282
proc = 0 begrl= 1 endrl= 6409 numrl= 6409
proc = 0 begro= 1 endro= 873 numro= 873
And the last few lines of ccsm.log (error message at the end):
15: proc= 15 clump no = 1 clump id= 16 beg pft = 26685 end pft = 28331 total pfts per clump = 1647
1: rtm decomp info proc = 1 begr = 7283 endr = 14564 numr = 7282
1: proc = 1 begrl= 6410 endrl= 12430 numrl= 6021
1: proc = 1 begro= 874 endro= 2134 numro= 1261
7: rtm decomp info proc = 7 begr = 51011 endr = 58302 numr = 7292
7: proc = 7 begrl= 41437 endrl= 45596 numrl= 4160
7: proc = 7 begro= 9575 endro= 12706 numro= 3132
14: rtm decomp info proc = 14 begr = 102485 endr = 109838 numr = 7354
14: proc = 14 begrl= 75192 endrl= 80532 numrl= 5341
14: proc = 14 begro= 27294 endro= 29306 numro= 2013
15: rtm decomp info proc = 15 begr = 109839 endr = 116332 numr = 6494
15: proc = 15 begrl= 80533 endrl= 84511 numrl= 3979
15: proc = 15 begro= 29307 endro= 31821 numro= 2515
0:INTERNAL ERROR : catalog was closed, or catalog was not initialized.
0: sayMessage will not print the error message.
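Since every line in ccsm.log carries a task-number prefix (`N:`), one quick way to check which MPI task actually printed the error is to group the lines by that prefix. This is only a rough sketch in Python, assuming the prefix format seen in the excerpt above; the function names `group_by_task` and `tasks_reporting` are made up for illustration and are not part of CCSM or any IBM tool.

```python
import re
from collections import defaultdict

# Matches task-labelled output such as "15: proc = 15 ..." or "0:INTERNAL ERROR ...".
# Assumption: the task id is a decimal number followed by a colon at line start.
TASK_LINE = re.compile(r"^\s*(\d+):\s*(.*)$")


def group_by_task(lines):
    """Return {task_id: [messages]} for lines that carry an 'N:' task prefix."""
    by_task = defaultdict(list)
    for line in lines:
        match = TASK_LINE.match(line)
        if match:  # unlabelled lines are skipped
            by_task[int(match.group(1))].append(match.group(2))
    return dict(by_task)


def tasks_reporting(lines, needle="INTERNAL ERROR"):
    """Return the sorted task ids whose output contains `needle`."""
    grouped = group_by_task(lines)
    return sorted(
        task for task, messages in grouped.items()
        if any(needle in message for message in messages)
    )


if __name__ == "__main__":
    # Three lines copied from the ccsm.log excerpt above.
    sample = [
        "15: proc = 15 begro= 29307 endro= 31821 numro= 2515",
        "0:INTERNAL ERROR : catalog was closed, or catalog was not initialized.",
        "0: sayMessage will not print the error message.",
    ]
    print(tasks_reporting(sample))  # prints [0]
```

On the excerpt above this reports only task 0, which matches what we see by eye; on the full log it would show whether any other task printed the same message.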
The only thing that ran successfully was compset X, where all models are dead. Every other configuration we've tried fails (e.g. compset C) with the same error each time.
Google suggested that adding -binitfini poe_remote_main to the linking step would provide more informative error messages, but this didn't change anything.
Thanks in advance for any suggestions.