Over the past several weeks my placement has gained momentum and gone through several stages. Once we had the datasets we needed, it was time to build and train a model to best identify varying types of tree mortality.
Mucho Mortality Dataset
The leaders at Salo created early-stage mortality datasets for large areas of California using LiDAR tree height in conjunction with high-resolution NAIP and RapidEye spectral data. We decided our next step would be to resolve the irregularities and inconsistencies in these datasets and produce one really beautiful dataset describing forest mortality in Sierra National Forest. First, I worked to assess the agreement between the NAIP and RapidEye datasets. I found that the NAIP data was more successful at identifying mortality within forested areas at the scale of individual tree crowns, but struggled to identify larger patches of dead trees left by old fires or beetle outbreaks. Fortunately, the RapidEye dataset was more successful in those areas (photos below).

Mortality dataset comparison: tan = both live, blue = NAIP dead / RapidEye live, green = RapidEye dead / NAIP live, red = both dead
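The four-color agreement map above can be sketched in a few lines of numpy. This is a hypothetical minimal example, not Salo's actual pipeline: it assumes the two mortality products have already been loaded and aligned as 0/1 "dead" masks on a shared grid, and the array names are mine.

```python
import numpy as np

# Hypothetical 0/1 mortality masks (1 = dead) on a shared, aligned grid;
# real rasters would need loading and co-registration first.
naip_dead = np.array([[0, 1],
                      [0, 1]], dtype=np.uint8)
re_dead   = np.array([[0, 0],
                      [1, 1]], dtype=np.uint8)

# Encode the four agreement classes from the map legend:
# 0 = both live (tan), 1 = NAIP dead / RE live (blue),
# 2 = RE dead / NAIP live (green), 3 = both dead (red)
agreement = naip_dead + 2 * re_dead

# Per-class pixel fractions give a quick numeric agreement summary
fractions = np.bincount(agreement.ravel(), minlength=4) / agreement.size
```

Mapping `agreement` through a tan/blue/green/red colormap reproduces the comparison image.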

Model Selection
Once we had a pretty good mortality dataset that we felt described dead trees with high confidence, we took it a step further and masked out the ground (yeah, the ground) using California LiDAR data. This gave us a tree cover map for California, which we used to define the sampling boundaries for our model. The model I tested as part of my contribution to the larger project was developed by Dr. Phil Brodrick. The EcoCNN model applies a convolutional neural network (CNN) machine learning approach to ecological applications. CNNs are particularly useful for identifying biophysical features with distinctive spatial patterns. The basic idea behind a CNN is: parse through a feature dataset (spectral bands) with a corresponding response dataset (binary dead/live tree) and record the feature signals at the response sites. Then, basically, scramble these signals with various filters (convolving them) and identify any patterns that arise before un-mixing them and continuing to look for patterns. This scrambling/filter application follows hierarchical patterns in the data's architecture and was inspired by the way our neurons interpret the stuff we see!
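The filter-sliding step at the heart of a CNN can be shown with plain numpy. This is a conceptual sketch only (the toy patch, kernel, and `conv2d` helper are mine, not part of EcoCNN): it hand-writes a single "blob detector" filter, whereas a trained CNN learns many such filters, stacked in layers, from the data itself.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the filter over the
    image and record a response at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy single-band "spectral" patch: a bright 2x2 blob (think
# crown-sized feature) on a dark background
patch = np.zeros((5, 5))
patch[1:3, 1:3] = 1.0

# Hand-written 2x2 averaging filter that responds strongly to blobs
kernel = np.ones((2, 2)) / 4.0

response = conv2d(patch, kernel)  # peaks where the filter lines up with the blob
```

The response map is largest exactly where the filter overlaps the blob, which is the "record the feature signals at the response sites" idea from above.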

Overall, the CNN-based models did pretty well. With relatively little input and parameter tuning, we were able to improve consistency across the mortality datasets. Initial applications of the model significantly under-predicted mortality, but this improved after applying the tree cover sampling boundaries to the feature datasets and increasing the sample size when training the model.

So, now what..?
Great question. Over these last few weeks I'm hoping to scale these improved mortality datasets up to Sentinel-2 and Landsat-8 resolutions. Once that happens, we'll be able to assess and compare the different spectral patterns associated with conditions of tree physiology throughout various stages of mortality. That means lots of cool data visualization (pretty pictures and graphs).
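One simple way to picture the scaling-up step: aggregate a fine-resolution dead/live map into coarser pixels that hold a dead-tree *fraction* instead of a 0/1 label. This is a hedged illustration, not the project's actual resampling method (a real workflow would handle georeferencing, reprojection, and the specific Sentinel-2/Landsat-8 grids); the `block_average` helper and toy array are my own.

```python
import numpy as np

def block_average(arr, factor):
    """Aggregate a fine-resolution array to a coarser grid by
    averaging non-overlapping factor x factor blocks."""
    h, w = arr.shape
    assert h % factor == 0 and w % factor == 0, "grid must divide evenly"
    return arr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Toy 4x4 binary mortality map (1 = dead) aggregated to 2x2;
# each coarse pixel now carries a mortality fraction
fine = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

coarse = block_average(fine, 2)
```

Fractional-mortality pixels like these are what make the coarser sensors comparable to the fine-scale dataset.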