NDVI from First Principles

Introduction: The Meaning of NDVI

The Normalized Difference Vegetation Index (NDVI) is a widely used metric for quantifying the health and density of vegetation on Earth because it is robust, versatile and easy to interpret. It is robust because it correlates well with ground measurements across many types of vegetation. It is versatile because it has applications in a wide range of fields.

It is easy to interpret, not only because of its simplicity but also because, over much of its range, it behaves roughly linearly. By construction, NDVI lies between -1 and 1. Bare ground with nothing growing on it has an NDVI close to zero (bare soil is typically slightly positive). As you move across either time or space, NDVI increases with the quantity and health of the vegetation in the region, approaching its theoretical maximum of 1.0 as the region becomes entirely covered with dense, healthy vegetation; in practice, dense canopies typically reach values around 0.8 to 0.9. Negative values usually indicate water (and sometimes snow or cloud) rather than land; open water typically has a negative NDVI, although rarely as low as -1.

The conceptual simplicity of NDVI, while attractive, can also be a trap for practitioners: interpreting NDVI correctly requires a proper theoretical understanding of how it is constructed, so that one knows when, where and how it can be deceptive. This article presents NDVI from first principles to equip practitioners with that understanding.

Derivation of the NDVI Formula

NDVI follows from a scientific understanding of how plants reflect (or absorb) different wavelengths of light. Two bands are key: red and near-infrared. First consider red light: if you shine it on a plant, little of it is reflected back, because chlorophyll, the pigment that makes plants green, absorbs it strongly. Hence, if you measure the amount of red light reflected back to you, you are implicitly measuring “greenness”: the less red light returns, the more chlorophyll is present.

The second useful band is near-infrared: if you shine near-infrared light on healthy plants, a large fraction of it is reflected back. Leaf pigments absorb very little near-infrared light, and the internal structure of leaves (in particular, the air spaces between cells in the spongy mesophyll) scatters it strongly, so much of it returns toward the observer. Thus the amount of near-infrared light reflected back from any region on Earth varies with the amount of healthy plant tissue present, whereas the amount of red light reflected back varies inversely.

For any particular location, if you measure how much near-infrared (NIR) and red (RED) light is reflected, you can compute a simple difference:

\[NIR - RED\]

This difference increases with the quantity and quality of the vegetation present. The same measurement can be made with satellite sensor data, where each band is typically converted to reflectance: a number from 0 to 1, where 1 means 100% of the incoming light is reflected back and 0 means none is.

There is one problem, however: the raw difference also varies with the intensity of the light. (When the light is 10% brighter, so is the difference between NIR and RED.) This is easily corrected by dividing by the total reflected signal, NIR + RED, hence the term “normalized difference”:

\[NDVI = \frac{NIR - RED}{NIR + RED}\]
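A quick numerical check of the normalization argument: scaling both bands by the same factor (here a hypothetical 10% increase in illumination) changes the raw difference but leaves NDVI unchanged. The band values are illustrative assumptions, not measurements.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference of NIR and red values."""
    return (nir - red) / (nir + red)

# Illustrative (assumed) band values for a vegetated pixel.
nir, red = 0.40, 0.08
scale = 1.10  # hypothetical 10% brighter light, applied equally to both bands

diff_before = nir - red
diff_after = scale * nir - scale * red
print(f"difference: {diff_before:.3f} -> {diff_after:.3f}")
print(f"NDVI:       {ndvi(nir, red):.3f} -> {ndvi(scale * nir, scale * red):.3f}")
```

The difference grows by 10% along with the light, while the common factor cancels in the ratio.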

NDVI is that simple: an elegant, unitless quantity that, by construction, lies between -1 and 1 (for non-negative band values) and increases with the health and density of the vegetation in the area being measured.
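Applied to imagery, the formula is evaluated pixel by pixel. The sketch below is a minimal pure-Python version, assuming the two bands arrive as equal-length sequences of reflectance; the function name `ndvi_pixels` and the sample values are hypothetical. Pixels where NIR + RED is zero (for example, no-data fill values) return NaN rather than raising an error.

```python
import math
from typing import Sequence

def ndvi_pixels(nir: Sequence[float], red: Sequence[float]) -> list[float]:
    """Compute NDVI for each pixel from two equal-length reflectance sequences."""
    if len(nir) != len(red):
        raise ValueError("NIR and RED inputs must have the same length")
    out = []
    for n, r in zip(nir, red):
        total = n + r
        # Guard against division by zero on empty / no-data pixels.
        out.append(math.nan if total == 0 else (n - r) / total)
    return out

# Assumed reflectances: dense canopy, sparse canopy, bare soil, water, no-data.
nir_band = [0.45, 0.30, 0.25, 0.02, 0.0]
red_band = [0.04, 0.12, 0.20, 0.05, 0.0]
values = ndvi_pixels(nir_band, red_band)
print([round(v, 3) for v in values])
```

With these assumed inputs the values fall in the order described earlier: highest for the dense canopy, near zero for bare soil and negative for water.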

Limitations of NDVI

NDVI has limitations, and using the index without regard for them can lead directly to flawed analyses.

Soil Effects and NDVI

When soil gets wet, it gets darker: it reflects less red and less near-infrared light. Where vegetation is sparse, much of what the sensor sees is soil, so both the numerator and the denominator of the NDVI formula (the difference and the total of near-infrared and red light reflected by the area in question) change with soil color. The result is an altered NDVI value caused not by vegetation changes but by soil color variation, which is of course highly undesirable.

A practitioner using NDVI to study sparsely vegetated areas has to be alert to this effect and control for it. Aware of this limitation, they may in fact decide to adopt a different index, such as the Soil Adjusted Vegetation Index (SAVI).
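The soil effect, and the way SAVI damps it, can be illustrated with a simple linear mixing model in which a pixel's reflectance is assumed to be a weighted average of a vegetation spectrum and a soil spectrum. Linear mixing is a simplifying assumption, and all reflectance values below are hypothetical. SAVI is computed as (1 + L)(NIR - RED) / (NIR + RED + L) with the commonly used soil-adjustment factor L = 0.5.

```python
def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

def savi(nir: float, red: float, L: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index; L = 0.5 is the commonly used default."""
    return (1 + L) * (nir - red) / (nir + red + L)

def mix(veg_fraction: float, veg: tuple[float, float],
        soil: tuple[float, float]) -> tuple[float, float]:
    """Linear mixture of (NIR, RED) reflectances for a partially vegetated pixel."""
    f = veg_fraction
    return tuple(f * v + (1 - f) * s for v, s in zip(veg, soil))

# Hypothetical (NIR, RED) reflectances.
VEGETATION = (0.45, 0.05)
DRY_SOIL = (0.30, 0.25)
WET_SOIL = (0.20, 0.15)  # wetting darkens the soil in both bands

cover = 0.3  # the same sparse canopy in both cases
for label, soil in [("dry soil", DRY_SOIL), ("wet soil", WET_SOIL)]:
    nir, red = mix(cover, VEGETATION, soil)
    print(f"{label}: NDVI={ndvi(nir, red):.3f}  SAVI={savi(nir, red):.3f}")
```

With these assumed values, NDVI rises by about 0.10 when the soil darkens even though the canopy is unchanged, while SAVI changes by roughly a third as much.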

Vegetation Classification and NDVI

A high NDVI, near the top of its range, tells you that the ground is essentially covered by vegetation, but it doesn’t tell you what exactly is there. Consider an old-growth rainforest and a golf course. The former holds some of the highest biomass densities on Earth; the latter most certainly does not. On a spectrum of biomass density, these two terrains would sit at opposite ends.

Yet from NDVI’s vantage point, rainforests and golf courses are indistinguishable. This illustrates an important limitation of NDVI (and of many other indices): they cannot tell you much about the nature of the vegetation they detect, because they respond to chlorophyll and leaf structure, so all vegetation looks broadly the same. For this reason, a single NDVI value is poorly suited to classifying vegetation type.

NDVI Saturation

If you start with a barren field (NDVI close to 0) and move across either time or space in the direction of more vegetation, NDVI increases. NDVI approaches 1.0 when the field, observed from above, “looks green” everywhere.

However, as you move from barren field towards dense vegetation, you reach a point where NDVI simply cannot capture additional increases in biomass. This phenomenon is called saturation. It happens because, beyond a certain quantity of vegetation, almost all red light is already absorbed by the leaves. More leaves can certainly appear, but they won’t cause significantly more red light to be absorbed, so, mathematically, NDVI can’t get much bigger.

Let’s reconcile saturation with the NDVI formula:

\[NDVI = \frac{NIR - RED}{NIR + RED}\]

As vegetation increases, NIR goes up and RED goes down. NDVI increases almost linearly with NIR - RED. (It is not exactly linear, because the denominator also changes: NIR and RED don’t change at exactly the same rate.)

At some point (before vegetation density actually reaches its maximum), RED gets close to zero and stops decreasing in proportion to vegetation density. NIR can still increase, but with RED near zero both the numerator and the denominator are approximately equal to NIR, so their ratio stays close to 1 and NDVI barely increases even as NIR does. Mathematically, RED is close to zero and essentially constant relative to NIR, so it can be replaced by a small constant \(k\):

\[NDVI = \frac{NIR - k}{NIR + k}\]

This is a function that approaches 1.0 asymptotically as NIR grows relative to \(k\).
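Rewriting the expression makes the asymptote explicit: \(\frac{NIR - k}{NIR + k} = 1 - \frac{2k}{NIR + k}\), so each further increase in NIR closes a smaller part of the remaining gap to 1.0. The sketch below tabulates this for a hypothetical constant red reflectance \(k = 0.03\); because reflectance cannot exceed 1, NDVI never actually reaches 1.0 here.

```python
K = 0.03  # hypothetical near-constant red reflectance under a dense canopy

def ndvi_saturated(nir: float, k: float = K) -> float:
    """NDVI with the red band held at a small constant k."""
    return (nir - k) / (nir + k)

previous = None
for nir in [0.20, 0.30, 0.40, 0.50, 0.60]:
    value = ndvi_saturated(nir)
    # Show how much NDVI gained relative to the previous NIR step.
    gain = "" if previous is None else f"  (+{value - previous:.3f})"
    print(f"NIR={nir:.2f}  NDVI={value:.3f}{gain}")
    previous = value
```

Each equal step in NIR produces a smaller gain in NDVI, which is the saturation behavior described above.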

When analyzing densely vegetated areas, practitioners must be cognizant of saturation. There are other indices like SAVI, EVI and EVI2 that are less prone to this limitation.
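One way to see why an index like EVI2 saturates less is to hold red reflectance small and let NIR grow, comparing the two indices. The sketch assumes the two-band EVI2 formulation, 2.5 (NIR - RED) / (NIR + 2.4 RED + 1); the reflectance values are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

def evi2(nir: float, red: float) -> float:
    """Two-band Enhanced Vegetation Index (EVI2)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

RED = 0.03                       # hypothetical red reflectance under a dense canopy
NIR_VALUES = (0.30, 0.45, 0.60)  # hypothetical, increasing canopy density

ndvi_values = [ndvi(nir, RED) for nir in NIR_VALUES]
evi2_values = [evi2(nir, RED) for nir in NIR_VALUES]
for nir, a, b in zip(NIR_VALUES, ndvi_values, evi2_values):
    print(f"NIR={nir:.2f}  NDVI={a:.3f}  EVI2={b:.3f}")
```

Over this hypothetical range NDVI moves by less than 0.1 while EVI2 moves by more than 0.3. The comparison shows sensitivity only; it says nothing about which index is more accurate for a given application.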

NDVI and Atmospheric Interference

When light travels through the Earth’s atmosphere, it is scattered and absorbed along the way. Scattering by gas molecules much smaller than the wavelength of the light is called Rayleigh scattering; aerosols, clouds and water vapor add further scattering and absorption, so the overall effect depends on atmospheric conditions. To complicate matters further, the RED band is more susceptible to Rayleigh scattering than the near-infrared band, because Rayleigh scattering is much stronger at shorter wavelengths. Thus the same sensor, measuring the same location at two different times, will likely record different levels of RED and NIR radiation even if ground conditions are unchanged.
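Rayleigh scattering strength scales approximately with the inverse fourth power of wavelength, which is why red is affected more than near-infrared. The sketch compares the two using assumed band-centre wavelengths of 665 nm (red) and 842 nm (NIR), chosen as illustrative values typical of multispectral sensors; it ignores aerosol scattering and absorption entirely.

```python
RED_NM = 665.0  # assumed red band-centre wavelength (nm)
NIR_NM = 842.0  # assumed NIR band-centre wavelength (nm)

def rayleigh_relative(wavelength_nm: float, reference_nm: float) -> float:
    """Rayleigh scattering at wavelength_nm relative to reference_nm (~ lambda**-4)."""
    return (reference_nm / wavelength_nm) ** 4

ratio = rayleigh_relative(RED_NM, NIR_NM)
print(f"Red is Rayleigh-scattered about {ratio:.1f}x more strongly than NIR")
```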

Atmospheric aerosols, for example, scatter substantial amounts of red light back toward satellite sensors. Some of that light would otherwise have reached the vegetation and been absorbed. The net result is a potentially overstated measure of red reflectance and hence an understatement of vegetation levels.
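The direction of this bias can be checked with a crude additive model, assuming the atmosphere simply adds a path contribution to the surface reflectance in each band, more in red than in NIR. Both the surface values and the offsets are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

surface_nir, surface_red = 0.40, 0.05  # hypothetical true surface reflectance
haze_nir, haze_red = 0.01, 0.03        # hypothetical additive atmospheric contributions

true_value = ndvi(surface_nir, surface_red)
observed = ndvi(surface_nir + haze_nir, surface_red + haze_red)
print(f"surface NDVI {true_value:.3f}, observed through haze {observed:.3f}")
```

Under these assumptions the hazy observation understates NDVI, matching the reasoning above.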

Left uncorrected, atmospheric effects would seriously undermine NDVI, particularly when comparing images taken at different times. Sensor readings therefore have to be routinely corrected for atmospheric conditions. Atmospheric correction relies on additional sensing mechanisms on satellites and on optical modeling. Models take measurable atmospheric conditions as input and estimate their optical impact (e.g., scattering and absorption) as output. A properly calibrated model thus yields the expected optical effect of current atmospheric conditions, allowing the raw sensor outputs to be adjusted accordingly.
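Operational correction relies on physically based models, but the basic idea of estimating and removing the atmosphere's additive contribution can be sketched with dark-object subtraction, a much simpler image-based technique swapped in here purely for illustration. It assumes the scene contains a pixel (such as deep, clear water) whose true reflectance is close to zero in both bands, so any signal recorded there is attributed to the atmosphere. The scene values and the helper name `dark_object_subtract` are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

def dark_object_subtract(band: list[float]) -> list[float]:
    """Subtract the darkest pixel's value, assumed to be pure atmospheric signal."""
    dark = min(band)
    return [value - dark for value in band]

# Hypothetical observed reflectances: pixel 0 is deep water, pixel 1 is a crop field.
observed_nir = [0.01, 0.41]
observed_red = [0.03, 0.08]

corrected_nir = dark_object_subtract(observed_nir)
corrected_red = dark_object_subtract(observed_red)

print(f"field NDVI before: {ndvi(observed_nir[1], observed_red[1]):.3f}")
print(f"field NDVI after:  {ndvi(corrected_nir[1], corrected_red[1]):.3f}")
```

This works neatly only because the hypothetical haze is purely additive and the water pixel is assumed to be truly dark; real scenes rarely satisfy both, which is one reason physically based models are used instead.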

NASA and ESA are both at the scientific forefront of this problem, and the atmospheric correction applied to Sentinel-2 and Landsat data is mature and highly effective. The quality of atmospheric correction for commercial satellite data can be inferior to what NASA and ESA offer, and practitioners should be alert to this.

Conclusion

Ratio-based vegetation indices were conceived in the 1960s, first appearing in the academic literature in a 1969 paper on multispectral recognition by Frank Kriegler and several of his colleagues. NDVI was first applied to real-world vegetation monitoring problems in 1974 by a team of researchers from Texas A&M University, who used it to measure biomass in the American Great Plains.

In 1985, the power and effectiveness of the metric became abundantly clear when Compton Tucker, John Townshend and Thomas Goff used it to classify land cover across the entire African continent. In the four decades since, NDVI has had a revolutionary impact on commercial agriculture because of its ability to delineate vegetation and vegetative stress effectively.

NDVI is easy to understand, derive and use. Because it offers a robust, intuitive measure of vegetation health, it is widely used in industry for tracking both seasonal changes and long-term structural trends.

Further Reading: