replacing CLM5.0 vegetation data with my own dataset – any references to share?

chenyihui

chen
New Member
Hi everyone,
I'm new to CLM and trying to do something but have no idea where to start. I'd really appreciate some guidance.
What I want to do:
I have my own vegetation data (land cover data for China, IGBP classification, 17 types). I want to use this data to replace the default vegetation data in CLM5.0, and then run an I2000Clm50BgcCropGs case.
My situation:
  • I have CLM5.0 installed and have run simple cases successfully
  • My vegetation data is already in netCDF format
  • I know CLM stores vegetation data in the surface dataset file
  • But I have no idea what to do next
My questions:
  1. Are there any tutorials, manuals, or posts that explain how to do this?
  2. What is the general workflow? What tools do I need?
  3. Has anyone done something similar and can share some references or experience?
I'm really new to this, so beginner-friendly resources would be very helpful.
Thank you!
 

chenyihui

chen
New Member
Dear Scientist:

I have successfully generated a surface dataset for CLM5.0 following the standard workflow, but the vegetation data in it is the default one. Now I would like to replace it with my own vegetation data. I'm not sure about the correct workflow and would appreciate any advice.

My main questions are:
  1. Should I replace the vegetation data after generating the surface dataset (i.e., directly modify variables like PCT_NAT_PFT in the output .nc file), or should I replace the original vegetation input data before running ./mksurfdata.pl?
  2. Which approach is more recommended? Is there a standard procedure or any references I can follow?
 

slevis

Moderator
Staff member
Many others have done this before you, and it's doable; however, it's not something that we support, so you may not find much relevant documentation, though I suspect that you will find threads in the Forums discussing the experiences of others. Ultimately, you get to pick the method that you prefer. Either method should work. I definitely recommend reading through relevant threads in these Forums to learn from the experiences of others.
 

chenyihui

chen
New Member
Thank you. I noticed that few posts walk through a full solution to this issue, so I will continue searching for answers. If you come across similar posts later, could you recommend them to me? I would greatly appreciate it.
 

chenyihui

chen
New Member
Hi Slevis,

I have decided to try modifying the surfdata file directly using a programming tool (such as MATLAB or Python).
My dataset is a 20-class PFT map (one type per grid cell), but I am not entirely sure how to map it to the various vegetation-related variables in the surfdata file.

My PFT data consists of the following 20 types:
[Image: list of the 20 PFT types; not reproduced here]

I would like to ask the following questions:
  1. Should my types 1–14 and type 19 (barren land) be assigned to PCT_NAT_PFT?
  2. Should my type 15 (crop) be assigned to PCT_CFT?
  3. Should my types 16, 17, 18, and 20 (wetland, urban, glacier, water) be assigned to PCT_WETLAND, PCT_URBAN, PCT_GLACIER, and PCT_LAKE, respectively?
  4. Since each grid cell has only one PFT type, should I simply set the corresponding variable to 100%?
  5. Do I need to modify the phenology variables such as MONTHLY_LAI?
I would greatly appreciate it if you or someone around you could help answer these questions. Thank you!

 

oleson

Keith Oleson
CSEG and Liaisons
Staff member
The mapping of other land classifications to CLM's classification is a science question that you will need to resolve yourself in the end.
However, I think your general approach is reasonable.
So,

  1. Should my types 1–14 and type 19 (barren land) be assigned to PCT_NAT_PFT?
CLM's classification is described in section 2.2, "Surface Characterization, Vertical Discretization, and Model Input Requirements", of the ctsm release-clm5.0 documentation.
So if there is a one-to-one correspondence, then yes. It seems like type 19 could be assigned to bare soil, which is pft 0.
  2. Should my type 15 (crop) be assigned to PCT_CFT?
Either 15 (unirrigated) or 16 (irrigated), or some split between the two.
  3. Should my types 16, 17, 18, and 20 (wetland, urban, glacier, water) be assigned to PCT_WETLAND, PCT_URBAN, PCT_GLACIER, and PCT_LAKE, respectively?
We don't model wetlands in CLM5, so you'll have to figure out what to assign that to.
There are three urban landunits (tall building district, high density, and medium density), so you'll either need to assign your urban type to one of those or split it among the three somehow.
  4. Since each grid cell has only one PFT type, should I simply set the corresponding variable to 100%?
Yes.
  5. Do I need to modify the phenology variables such as MONTHLY_LAI?
MONTHLY_LAI etc. is only used in satellite phenology (SP) mode. If you are running in that mode, then it seems like you could start with the supplied values, unless you have better data. I think that in general there will be valid values for every pft in every gridcell, except possibly for climate distinctions. For example, you may not find any reasonable values for boreal trees in the tropics if your dataset specifies that for some reason, and it's possible that some values will simply have been filled in with a constant value for lack of data. You'll need to verify that your MONTHLY_LAI etc. is reasonable for all of the pfts in your entire domain.
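
For anyone following along in Python, here is a minimal sketch of the one-class-per-cell mapping discussed in the answers above. It is not an official tool, and most of it is assumption: the function name `classes_to_pct` and all of its parameters are hypothetical, the dimension sizes (15 natural pfts, 2 cfts, 3 urban classes) and the ordering (class dimension first, then the grid) should be checked against your own surfdata file, classes 1–14 are assumed to correspond one-to-one to pfts 1–14, and the wetland and urban assignments are placeholders for choices you have to make yourself.

```python
import numpy as np

def classes_to_pct(cls, n_natpft=15, n_cft=2, n_urban=3,
                   wetland_natpft=13, urban_class=2, crop_cft=0):
    """Turn a one-class-per-cell map into CLM-style percent arrays.

    All defaults are illustrative assumptions, not CLM requirements.
    """
    cls = np.asarray(cls)
    shape = cls.shape

    # Natural pft index per cell, or -1 where the cell is not natural vegetation.
    nat_index = np.full(shape, -1)
    for c in range(1, 15):                  # assumed one-to-one: class c -> pft c
        nat_index[cls == c] = c
    nat_index[cls == 19] = 0                # barren land -> bare ground (pft 0)
    nat_index[cls == 16] = wetland_natpft   # placeholder choice for wetland

    pct_nat_pft = np.zeros((n_natpft,) + shape)
    for p in range(n_natpft):
        pct_nat_pft[p][nat_index == p] = 100.0
    # Cells without natural vegetation still need PCT_NAT_PFT to sum to 100,
    # so put 100 on bare ground there.
    pct_nat_pft[0][nat_index < 0] = 100.0

    # PCT_CFT sums to 100 everywhere; it only matters where PCT_CROP > 0.
    # Position crop_cft of the cft dimension is an assumed choice.
    pct_cft = np.zeros((n_cft,) + shape)
    pct_cft[crop_cft] = 100.0

    # All urban cells go into a single, assumed urban class.
    pct_urban = np.zeros((n_urban,) + shape)
    pct_urban[urban_class][cls == 17] = 100.0

    return {
        "PCT_NATVEG": np.where(nat_index >= 0, 100.0, 0.0),
        "PCT_CROP": np.where(cls == 15, 100.0, 0.0),
        "PCT_GLACIER": np.where(cls == 18, 100.0, 0.0),
        "PCT_LAKE": np.where(cls == 20, 100.0, 0.0),
        "PCT_WETLAND": np.zeros(shape),
        "PCT_URBAN": pct_urban,
        "PCT_NAT_PFT": pct_nat_pft,
        "PCT_CFT": pct_cft,
    }
```

Cells whose class code is not one of 1–20 (for example a fill value) are left with every landunit at 0, so they need separate handling before the arrays are written back to the file.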

As you can see from the Forums, the following rules apply; I think these hold for CLM5:
PCT_CROP + PCT_NATVEG + PCT_LAKE + PCT_WETLAND + PCT_GLACIER + PCT_URBAN = 100
The sum of PCT_NAT_PFT should be 100. I think you'll find that even if, e.g., PCT_GLACIER is 100, you'll still need PCT_NAT_PFT to sum to 100, e.g., by putting 100 for bare ground.
I also think that in CLM5, the sum of PCT_CFT should be 100 everywhere. Those pfts (cfts) will only be used if PCT_CROP > 0.
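
These sum rules can be checked before a modified file is handed to the model. The following is a rough Python sketch, assuming the variables have already been read into NumPy arrays held in a dict keyed by variable name, and that PCT_URBAN, PCT_NAT_PFT, and PCT_CFT carry their class dimension first so that summing over axis 0 gives a per-cell total. The function name `check_pct_sums` and the tolerance are hypothetical.

```python
import numpy as np

def check_pct_sums(pct, tol=1e-6):
    """Return boolean masks that are True where a sum rule is violated."""
    landunits = (pct["PCT_CROP"] + pct["PCT_NATVEG"] + pct["PCT_LAKE"]
                 + pct["PCT_WETLAND"] + pct["PCT_GLACIER"]
                 + pct["PCT_URBAN"].sum(axis=0))  # total over urban classes
    return {
        "landunits": np.abs(landunits - 100.0) > tol,
        "PCT_NAT_PFT": np.abs(pct["PCT_NAT_PFT"].sum(axis=0) - 100.0) > tol,
        "PCT_CFT": np.abs(pct["PCT_CFT"].sum(axis=0) - 100.0) > tol,
    }

# Tiny 1 x 2 example: cell 0 is all glacier with bare ground at 100 (valid);
# cell 1 is all natural vegetation but has no pft assigned (invalid).
example = {
    "PCT_CROP": np.zeros((1, 2)),
    "PCT_NATVEG": np.array([[0.0, 100.0]]),
    "PCT_LAKE": np.zeros((1, 2)),
    "PCT_WETLAND": np.zeros((1, 2)),
    "PCT_GLACIER": np.array([[100.0, 0.0]]),
    "PCT_URBAN": np.zeros((3, 1, 2)),
    "PCT_NAT_PFT": np.zeros((15, 1, 2)),
    "PCT_CFT": np.zeros((2, 1, 2)),
}
example["PCT_NAT_PFT"][0, 0, 0] = 100.0   # bare ground in the glacier cell
example["PCT_CFT"][0] = 100.0             # PCT_CFT sums to 100 everywhere

bad = check_pct_sums(example)
for rule, mask in bad.items():
    print(rule, "violations:", int(mask.sum()))
```

Returning masks instead of raising an error makes it easy to see where the offending cells are and fix them in place.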
 

chenyihui

chen
New Member
Thank you so much! I understand now. Next, I'll move on to the actual operation.
 