Historical Crop Patterns in the US Corn Belt
Case Study

Historical Crop Patterns in the US Corn Belt

In this case study, we build a historical profile of crop patterns across the US Corn Belt by leveraging detailed Sentinel-2 derived NDVI.

Introduction

The Corn Belt, a region in the Midwestern United States, is renowned for its high agricultural productivity, particularly in corn production. Spanning across states such as Iowa, Illinois, southern Minnesota, and eastern Nebraska, the Corn Belt plays a crucial role in the US economy. Understanding the agricultural patterns in this region is essential for predicting crop yield and managing resources effectively. One widely-used method for monitoring vegetation health and productivity is the Normalized Difference Vegetation Index (NDVI).

Questions

  1. What are the historical patterns of the growing season in the top 4 Corn Belt states as shown by NDVI data?
  2. When do the growing seasons typically start, peak, and finish?
  3. Can we identify historical anomalies in the NDVI data?

Extracting Locations from United States CDL 

The United States Cropland Data Layer (CDL) is a great resource for extracting information about corn farm locations. To perform our analysis, we first extracted a random sample of 10,000 locations from the GeoTIFF map where corn was grown. 


from rasterio.warp import transform


united_states = rasterio.open('data/us/2022_30m_cdls/2022_30m_cdls.tif')
band = united_states.read(1)


filtered_band = np.where(band == 1) # corn
sample_indices = np.random.choice(len(filtered_band[0]), size=10000, replace=False)


# Get the row and column indices
rows = filtered_band[0][sample_indices]
cols = filtered_band[1][sample_indices]


# Transform the row and column indices to the original CRS (EPSG:5070)
x_coords, y_coords = rasterio.transform.xy(united_states.transform, rows, cols)


# Perform the coordinate transformation from EPSG:5070 to EPSG:4326
longitude, latitude = transform(united_states.crs, 'EPSG:4326', x_coords, y_coords)


data = {'lat': latitude,
       'lon': longitude}


corn_loc_df = pd.DataFrame(data)

Next, we used the First Order Admin feature from Natural Earth Data to filter the original dataset for only the locations in the key Corn Belt states


# Read shapefile
shpfilename = shapereader.natural_earth(resolution='50m', category='cultural', name='admin_1_states_provinces')
reader = shapereader.Reader(shpfilename)
places = reader.records()


# Create a list of state shapes
states_of_interest = ['Iowa', 'Illinois', 'Nebraska', 'Minnesota']
state_shapes = []
for place in places:
   name = place.attributes['name']
   if name in states_of_interest:
       state_shapes.append(place.geometry)


# Convert the DataFrame to a GeoDataFrame
gdf = gpd.GeoDataFrame(corn_loc_df, geometry=gpd.points_from_xy(corn_loc_df['lon'], corn_loc_df['lat']), crs='EPSG:4326')


# Perform spatial join with state shapes
joined_gdf = gpd.sjoin(gdf, states_gdf, how='inner', op='within')


# Drop the index_right column and convert the filtered GeoDataFrame back to a DataFrame
filtered_df = pd.DataFrame(joined_gdf.drop(columns=['geometry', 'index_right']))


# Prepare points for query
points = np.vstack([filtered_df.lat.values, filtered_df.lon.values]).T.tolist()

These locations, represented by latitude and longitude coordinates, served as the basis for our subsequent data query.

Corn Farm Locations

Next we did a sanity check by plotting the locations to be used in the query:


# create a dictionary of colors for each state
state_colors = {
   'Iowa': 'blue',
   'Nebraska': 'green',
   'Minnesota': 'red',
   'Illinois': 'orange'
}


# create figure
fig = plt.figure(figsize=(8,5), facecolor='none', dpi=96)
ax = fig.add_subplot(111, projection=ccrs.Mercator())
ax.set_extent([-107.23, -84.38, 35.57, 50.46], ccrs.PlateCarree()) # Set extent for corn belt states


# plot the corn farm locations for each state with a different color
for state, color in state_colors.items():
   state_df = filtered_df[filtered_df['state'] == state]
   ax.scatter(state_df['lon'], state_df['lat'], s=5, transform=ccrs.Geodetic(), zorder=2, edgecolor='#ffffff', linewidth=0.25, color=color, label=state)


# add a title and legend
plt.title('Corn Farm Locations Across the Top Corn Belt States')
plt.legend()

Querying Data from Streambatch

We utilized the Streambatch API to obtain NDVI (Normalized Difference Vegetation Index) data from Sentinel-2 satellite imagery. NDVI is a key indicator of plant health and growth, making it a valuable metric for analyzing the corn growing season.


ndvi_request = {
   'variable': ['ndvi.sentinel2'],
   'space': points,
   'time': {
       'start':'2018-03-01',
       'end':'2022-12-31',
   }
}


# Make the request to the API
response = requests.post('streambatch.io/async', json=ndvi_request, headers=api_header)


# Parse the response to get the access URL and query ID for the result data
result_data = json.loads(response.content)
access_url = result_data.get('access_url')

The query retrieves NDVI values over a specified time range (`start` and `end`) for the given locations (`space`). We obtained NDVI values for the years 2018 to 2022.


# Read the data from the access URL into a pandas DataFrame
df = pd.read_parquet(access_url, storage_options={"anon": True})

Analysis

For the analysis, we filtered the dataset to only include data for the months April to November by adding 'year', 'month', and 'day' columns to the dataframe.


# Add 'year', 'month', and 'day' columns to the dataframe
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month
df['week'] = df['time'].dt.isocalendar().week
df['day'] = df['time'].dt.day


# Filter data for April 1st to November 30th
growing_season_df = df[(df['month'] >= 4) & (df['month'] <= 11)]

Next, we grouped the data by year and month, then computed the mean NDVI for each group. We also created a pivot table from the growing_season_monthly_mean.


# Group data by year and month, then compute the mean NDVI for each group
growing_season_monthly_mean = growing_season_df.groupby(['year', 'month']).mean().reset_index()

# Create a pivot table from the growing_season_monthly_mean
year_month_pivot = growing_season_monthly_mean.pivot_table(index='month', columns='year', values='ndvi.sentinel2')

We plotted a single heatmap of the monthly average NDVI for all corn farms across the top 4 corn belt states:


# Group data by year and month, then compute the mean NDVI for each group
growing_season_monthly_mean = growing_season_df.groupby(['year', 'month']).mean().reset_index()

# Create a pivot table from the growing_season_monthly_mean
year_month_pivot = growing_season_monthly_mean.pivot_table(index='month', columns='year', values='ndvi.sentinel2')

NDVI values typically begin to increase starting in June, although 2019 was an overall exception with a much later start to the growing season. Indeed, 2019 was a particularly bad year for US corn, with production at its lowest since 2015. Peak NDVI values range between 0.78-0.81, with peaks typically occurring in either July or August. Similar to 2019, 2022 was also associated with decreased corn production in the US, although it may be necessary to visualize the data at a higher temporal resolution (e.g. weekly) to observe any anomalies that may be present.

Exploring Weekly Average NDVI for the Corn Belt States Individually

To further understand the variations in the corn growing seasons and identify any unique patterns or differences between the top corn-producing states, we decided to take a deeper dive by analyzing the data at both the state level and a weekly temporal resolution. By separating the data by state, we can better understand the specific growing conditions and trends in each of the top corn-producing states. 

To achieve this, we added a 'State' column to the dataframe to note which state each corn farm location was in. 


# Convert the growing_season_df to a GeoDataFrame
growing_season_gdf = gpd.GeoDataFrame(growing_season_df, geometry=gpd.points_from_xy(growing_season_df['lon'], growing_season_df['lat']), crs='EPSG:4326')

# Perform spatial join with state shapes
joined_growing_season_gdf = gpd.sjoin(growing_season_gdf, states_gdf, how='inner', op='within')

# Drop the index_right column and convert the joined GeoDataFrame back to a DataFrame
growing_season_df_with_state = pd.DataFrame(joined_growing_season_gdf.drop(columns=['geometry', 'index_right']))

Then, we plotted the weekly average NDVI during the growing season for each state individually.


# Aggregate the data by state, year, and week
state_year_week_df = growing_season_df_with_state.groupby(['state', 'year', 'week'])['ndvi.sentinel2'].mean().reset_index()

# Create a heatmap for each state
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(15, 8))
for i, state in enumerate(states_of_interest):
   state_data = state_year_week_df[state_year_week_df['state'] == state]


   # Create a pivot table with 'year' as index, 'week' as columns, and 'ndvi.sentinel2' as values
   state_pivot = state_data.pivot_table(index='week', columns='year', values='ndvi.sentinel2')


   # Create a heatmap
   heatmap = sns.heatmap(state_pivot, cmap=sns.cubehelix_palette(start=2, rot=0, dark=0, light=.95, reverse=False, as_cmap=True), linewidths=.5, vmin=0, vmax=1, ax=axes[i])
   heatmap.set_yticklabels(heatmap.get_yticklabels(), rotation=0)
   heatmap.xaxis.tick_top()
   heatmap.xaxis.set_label_position('top')


   axes[i].set_title(f'Average NDVI for {state} Corn', pad=30)
   axes[i].set_xlabel('')
   axes[i].set_ylabel('Week of the Year')
   axes[i].tick_params(axis='both', which='both', length=0)

To better understand the timing changes in NDVI values quantitatively, we identified the peak NDVI value and noted the week in which it occurred for each state and year.


# Find the week with the highest NDVI value for each state and year
max_ndvi_week = state_year_week_df.groupby(['state', 'year'])['ndvi.sentinel2'].idxmax()


# Get the corresponding rows with the highest NDVI values
max_ndvi_data = state_year_week_df.loc[max_ndvi_week].reset_index(drop=True)


# Round the NDVI values to 2 decimal places
max_ndvi_data['ndvi.sentinel2'] = max_ndvi_data['ndvi.sentinel2'].round(2)


# Record max timing by state
max_timing_by_state_year = max_ndvi_data[['state', 'year', 'week']]

Next, we plotted the timing of the peak value for each state, for each year, to visualize the variations in NDVI timing over time.


# Plot the timing of peak NDVI for each state over time
plt.figure(figsize=(12, 8))
for state in max_timing_by_state_year['state'].unique():
   state_data = max_timing_by_state_year[max_timing_by_state_year['state'] == state]
   plt.plot(state_data['year'], state_data['week'], marker='o', linestyle='-', label=state)
plt.xlabel('Year')
plt.ylabel('Week Number of Peak NDVI')
plt.title('Timing of Peak NDVI for Each State Across the Corn Belt Over Time')
plt.legend(loc='upper right')
plt.grid(True, linewidth=0.3)
plt.show()

Results:

  • The 2019 growing season was delayed across all of the top four Corn Belt states, (sometimes by up to 7 weeks in the case of Illinois), likely impacting overall crop production. 
  • The 2022 growing season experienced an anomalous event during the last week in August (34th week) that was picked up across Iowa, Illinois, and Minnesota. August 2022 was already the hottest ever recorded for North America, and the anomalous NDVI decreases observed are likely a result of the intense heatwaves that took a toll on crops across the US during the last week of August 2022
  • Among the top four Corn Belt states, Nebraska consistently had lower overall NDVI across all years included in the analysis, suggesting that the growing conditions in Nebraska may be less favorable than those in the other top corn-producing states.

Conclusions:  

In conclusion, analyzing NDVI data provides valuable insights into the corn growing season, allowing us to identify trends and anomalies in the crop production process. By examining this data on a state-by-state basis, we can better understand regional variations in crop health and growth patterns, which can inform decision-making in agricultural practices and policies.

Next steps:

  • Determine how NDVI can be used as a predictor of crop yield to better understand the relationship between NDVI and production.
  • Explore the factors contributing to the anomaly in 2022, potentially through weather data or other relevant datasets.
  • Expand the analysis to include additional states and a longer time frame to identify broader trends in corn production across the United States.
  • Incorporate other satellite-derived datasets, such as precipitation or soil moisture data, to provide a more comprehensive understanding of the factors influencing crop health and productivity.

Related Posts

Ready?

The data is.

Don't spend valuable resources doing research and image processing. We've done that for you.

Get the power of daily, high resolution NDVI data with the industry's best API. And then get to work on actual analysis.

Start Now