GIS Meetup and Educational Seminar Summary #10

Presented at: Esri Geodesign Summit 2021

Presented on: 9 February 2021

Presenter:

  • Carsten Lange, Professor of Economics, Cal Poly Pomona

Predict Urban Growth Patterns Using Machine Learning video

Overview

This presentation covers the application of the random forest machine learning algorithm to a real-world problem.

Carsten Lange presents the results of his collaboration with Witold Fraczek, Esri Applications Prototype Lab, and Jian Lange, Esri Spatial Analysis Product Manager. He highlights the benefits and pitfalls of a multidisciplinary collaboration using machine learning implemented across two different software platforms: ArcGIS Pro and R.

The goal of the collaboration was to predict which areas of the tri-city region of North Carolina changed from an undeveloped classification in 2001 to an urban classification in 2016. The study covered a rectangular area of 80 miles by 50 miles, which was divided into a 30-meter by 30-meter raster grid, resulting in about 11 million raster cells.
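As a rough check on the grid size, the arithmetic can be sketched in a few lines of Python. The exact extent and cell alignment used in the study are not given in the talk summary, so the rounding below is an assumption and the result is only an order-of-magnitude estimate.

```python
# Back-of-the-envelope estimate of the raster size described above.
METERS_PER_MILE = 1609.344  # international mile

width_m = 80 * METERS_PER_MILE   # east-west extent in meters
height_m = 50 * METERS_PER_MILE  # north-south extent in meters
cell_size_m = 30                 # raster cell edge length in meters

# Rounding to whole cells is an assumption of this sketch.
columns = round(width_m / cell_size_m)
rows = round(height_m / cell_size_m)
total_cells = columns * rows

print(f"{columns} columns x {rows} rows = {total_cells:,} cells")
```

The estimate lands between 11 and 12 million cells, the same order of magnitude as the figure quoted in the presentation.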

Building the Model

Building the model involved four steps:

  1. Categorizing all raster cells as either developed or undeveloped for both 2001 and 2016.

  2. Comparing the classifications for 2001 and 2016 and identifying cells that were:

      • Undeveloped in 2001 and remained undeveloped in 2016.

      • Undeveloped in 2001 and became developed by 2016.

  3. Collecting data to explain urbanization.

  4. Applying the random forest machine learning (ML) algorithm in R to predict which areas changed from undeveloped to developed.

Data Source

The data for this study was obtained from the National Land Cover Database (NLCD). In the NLCD data, each cell falls into one of twenty categories. For this study, these were recategorized into two classes: developed or undeveloped.
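The recategorization can be illustrated with a minimal Python sketch. In the National Land Cover Database legend, codes 21 to 24 are the "Developed" classes. Treating exactly those four codes as "developed" is an assumption here, since the summary does not say which categories the study grouped together. The tile values are made up.

```python
# Land cover codes 21-24 are the "Developed" classes in the legend.
# Using exactly these four as "developed" is an assumption of this sketch.
DEVELOPED_CODES = {21, 22, 23, 24}


def reclassify(grid):
    """Map a grid of land cover codes to 1 (developed) or 0 (undeveloped)."""
    return [[1 if code in DEVELOPED_CODES else 0 for code in row] for row in grid]


# Hypothetical 3x3 tile: 11 = open water, 41 = deciduous forest, 81 = pasture/hay
land_cover_tile = [
    [41, 41, 21],
    [81, 22, 23],
    [11, 81, 24],
]

binary_tile = reclassify(land_cover_tile)
for row in binary_tile:
    print(row)
```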

Change in Classification

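ArcGIS was then used to examine which raster cells had changed classification. The resulting dataset is very unbalanced: only about 0.12% of the cells that were undeveloped in 2001 had become developed by 2016.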

The findings were as follows:

                    Developed 2016          Undeveloped 2016
Developed 2001      1,720,225               2,378
Undeveloped 2001    10,635                  8,918,415
                    (Became Urban=Yes)      (Became Urban=No)
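The cross-tabulation above was produced in ArcGIS. As a hedged illustration of the same idea, the Python sketch below counts class transitions on two tiny made-up grids and then uses the counts reported in the table to show how small the "became urban" share is. The function name and toy grids are hypothetical.

```python
from collections import Counter


def cross_tabulate(grid_2001, grid_2016):
    """Count cells per (2001 class, 2016 class) pair; 1 = developed, 0 = undeveloped."""
    counts = Counter()
    for row_a, row_b in zip(grid_2001, grid_2016):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    return counts


# Toy 3x3 grids (made-up values) to show the mechanics
grid_2001 = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
grid_2016 = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
toy_counts = cross_tabulate(grid_2001, grid_2016)
print("toy cells that became developed:", toy_counts[(0, 1)])

# Counts reported in the table above
became_urban = 10_635
stayed_undeveloped = 8_918_415
minority_share = became_urban / (became_urban + stayed_undeveloped)
print(f"share of undeveloped cells that became urban: {minority_share:.2%}")
```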

Additional Data

Next, additional data that could potentially explain urbanization was collected for each of the approximately 9 million raster cells. This data was displayed as layers in ArcGIS Pro using the Spatial Analyst extension. The following data was collected:

  • Drive time to major city

  • Predicted population growth

  • Distance to closest highway

  • Distance to closest road

  • Distance to closest airport

  • Distance to closest flood zone

  • Distance to closest protected area

R Bridge

This data was then aggregated into a single layer and pushed to R using the "R bridge," which created a dynamic link between ArcGIS Pro and R. It takes only five lines of code in RStudio to import the data from ArcGIS.

Next, the data was split into two sets: 85% training data and 15% testing data, used to validate the model. A random forest model with 500 decision trees was trained using the training data. At first glance, the predictions made by this model look impressive, with the correct category predicted for 99.8% of cells. However, these results look too good to be true. A confusion matrix gives more insight into how the model performed.

                Actual no       Actual yes
Predicted no    1,310,690       1,594
Predicted yes   0               0
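Reading the confusion matrix above in code makes the problem explicit. The following Python sketch (the helper name is hypothetical) computes accuracy alongside recall and precision for the "yes" class. Accuracy is very high, yet recall is zero because no cell that became urban was identified.

```python
def summarize(tn, fn, fp, tp):
    """Return (accuracy, recall, precision) for the positive ("yes") class."""
    total = tn + fn + fp + tp
    accuracy = (tn + tp) / total
    # Recall: share of actual "yes" cells that were predicted "yes"
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision is undefined when nothing was predicted "yes"
    precision = tp / (tp + fp) if (tp + fp) else None
    return accuracy, recall, precision


# Values from the matrix above:
#   predicted no  / actual no  = true negatives
#   predicted no  / actual yes = false negatives
#   predicted yes / actual no  = false positives
#   predicted yes / actual yes = true positives
accuracy, recall, precision = summarize(tn=1_310_690, fn=1_594, fp=0, tp=0)
print(f"accuracy = {accuracy:.4f}, recall = {recall:.4f}, precision = {precision}")
```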

Decision Tree

The decision tree simply predicted that no cells changed from undeveloped to urban. The model achieved its high accuracy only because an exceedingly small proportion of the cells actually changed. Because the training dataset was so unbalanced, this model is not useful. This outcome highlights two general weaknesses of decision tree models. They are:

  • Illustrative but do not have strong predictive power

  • Sensitive to changes in data and hyperparameters

Random Forest

The random forest algorithm averages the predictions of many decision trees to improve predictive power. To improve the performance of the random forest algorithm, the Synthetic Minority Oversampling Technique (SMOTE), i.e., up- and down-sampling, was used to balance the data. This procedure removes randomly chosen records from the majority class and generates new minority-class records that resemble existing ones, using a k-nearest neighbors (KNN) algorithm. With the rebalanced training data, the random forest model made the predictions shown below. Overall accuracy dropped to 95%, but the model now identified 1,421 of the 1,594 cells that actually became urban.

                Actual no       Actual yes
Predicted no    1,245,362       173
Predicted yes   65,328          1,421
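The balancing step described above can be sketched in plain Python. This is a simplified illustration of the SMOTE idea (interpolating between a minority record and one of its nearest minority neighbors) plus random down-sampling of the majority class. It is not the R implementation used in the talk, and the function names, feature values, and sample sizes are all hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic records by interpolating between a minority
    record and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority records by squared distance to the base record
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        t = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class."""
    return random.Random(seed).sample(majority, n_keep)


# Made-up two-feature records (e.g. distance to road, drive time to city)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority = [(float(i), float(i % 7)) for i in range(10, 110)]

balanced_minority = minority + smote_like_oversample(minority, n_new=16)
balanced_majority = undersample(majority, n_keep=20)
print(len(balanced_minority), len(balanced_majority))
```

Because each synthetic record lies on a line segment between two real minority records, the new records stay inside the region the minority class already occupies rather than being exact duplicates.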

Prediction

All the data was then used to predict the probability of a cell changing from undeveloped to urban. The data was processed in R, and the predictions were then sent back to ArcGIS Pro using the “R bridge.” This allowed the results to be viewed in ArcGIS Pro as a raster layer of color-coded predictions. The predictions made by this model have subsequently been supported by recent satellite imagery: some of the areas predicted from the 2016 data to have a high likelihood of urbanization already have houses visible in 2020 satellite images.

Reaction

For me, this was a great illustration of a real-world use of machine learning to address an issue I care about. I am interested in the preservation of open space and can see how a model like this could help planners prioritize which undeveloped areas are in most urgent need of protection. This was a straightforward application of machine learning, and I feel like I could follow all the steps in both ArcGIS and R. The talk also highlighted some of the pitfalls of a naïve application of machine learning.

Utilizing ArcGIS Urban, Dashboards and Hub for Future Land Use Change Scenarios

Presented on: 9 February 2021

Presented at: Esri Geodesign Summit 2021

Presenters:

  • Jennifer Immich, GIS Analyst II, City of Boulder
  • Kalani Pahoa, Planning and Development Services, City of Boulder

Utilizing ArcGIS Urban, Dashboards and Hub video

Overview

This short presentation covers the use of ArcGIS Urban, Operations Dashboard, and Hub by the city of Boulder to monitor, communicate, and assess future land use change scenarios. The goal of this project was to provide better information on land use planning and future land use changes to city staff, decision makers, and the wider community. This is important because land use planning affects the amount of urban development and open space, jobs, housing, resiliency, environmental stewardship, and transportation. Easy access to this information gives stakeholders and decision makers a better understanding of the issues.

Capacity Indicators and Projections

The project created an expanded set of capacity indicators and projections. Integrating Hub with ArcGIS enabled the team to communicate this complex information to stakeholders. ArcGIS was configured with the city's land use and zoning code data. This included 25 land use districts, 44 zoning districts, 24 space use types, and 75 common building types.

In addition, information on various development standards, lot coverage maximums, floor area ratio (FAR) density calculations, and overall land efficiency numbers was entered into the system. The various space use categories were linked to indicators such as energy use and carbon emissions.
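To make the FAR and lot coverage standards concrete: floor area ratio is the total building floor area divided by the lot area, and lot coverage is the share of the lot that a building footprint may occupy. The Python sketch below shows how the two combine into a simple capacity estimate. The function name and the numbers are hypothetical and are not taken from Boulder's zoning code.

```python
import math


def lot_capacity(lot_area_sqft, far, max_lot_coverage):
    """Return (max floor area, max footprint, floors needed to reach max floor area)."""
    max_floor_area = far * lot_area_sqft             # FAR = floor area / lot area
    max_footprint = max_lot_coverage * lot_area_sqft  # ground area a building may cover
    floors_needed = math.ceil(max_floor_area / max_footprint)
    return max_floor_area, max_footprint, floors_needed


# Hypothetical lot: 10,000 sq ft, FAR of 2.0, 50% maximum lot coverage
floor_area, footprint, floors = lot_capacity(10_000, far=2.0, max_lot_coverage=0.5)
print(f"max floor area: {floor_area:,.0f} sq ft")
print(f"max footprint:  {footprint:,.0f} sq ft")
print(f"floors needed:  {floors}")
```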

Because the city of Boulder did not have an existing land use GIS database available, parcel data was used to generate this information.

Scenarios

Four scenarios were modeled. Each scenario model provides summary information for total population, housing, and jobs plus a breakdown of space use. The models also calculate the impact of each potential development.

Operations Dashboard

Operations Dashboard is used to present a series of views that cover carbon and CO2 measures, land use percentages by acreage, and jobs by space use type. In addition, four transportation scenarios were modeled. The data is presented on a community engagement webpage and linked to a survey on the same website. Overall, the project provides easy access to information that was previously difficult or impossible to obtain, such as energy consumption, water use, and number of housing units.

Reaction

It was interesting to see some of the details of a project that uses the ArcGIS tools to make city planning information more easily accessible to decision makers and the public. I am particularly interested in the use of such data in the preservation of open space, resiliency planning, and the design of smarter developments which reduce energy consumption and carbon emissions and make our cities more livable and pleasant.

Building Conservation Resiliency with Geodesign

Presented on: 9 February 2021

Presented at: Esri Geodesign Summit 2021

Presenter:

  • Eliza Gutierrez-Dewar, consultant for the geodesign team and Esri's professional services

Building Conservation Resiliency with Geodesign video

Conservation Planning

Conservation planning is important to protect biodiversity in the face of diminishing and threatened natural resources. The goal of conservation planning is to conserve a certain percentage of land area for biodiversity. But how should we prioritize which areas to protect? This presentation claims that the answer is an iterative, geographically informed, and holistic planning process. This process of conservation planning is called geodesign and involves the following steps:

  • Assess current status

  • Assemble and review data

  • Map assets to conserve

  • Assess constraints and threats

  • Set goals and prioritize areas

  • Implement and take actions

ArcGIS GeoPlanner

The ArcGIS GeoPlanner product was demonstrated, showing the steps in this process with San Bernardino County as an example. The GeoPlanner tool allows planners to be strategic about where they put protected areas. In addition to land cover class, this process uses the following indicators:

  • Species richness hot spots

  • Intact habitat cores

  • Suitability layer: a weighted raster overlay model consisting of

      • Carbon biomass

      • Bioclimates

      • Ecophysiographic diversity

      • Population
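A weighted raster overlay of the kind mentioned above combines several layers cell by cell, using a weight for each layer. The Python sketch below shows the mechanics on tiny made-up grids. The layer names are borrowed from the list, while the weights, the 0 to 1 scaling, and the function name are assumptions, not values from the GeoPlanner demonstration.

```python
def weighted_overlay(layers, weights):
    """Cell-by-cell weighted average of equally sized grids (values scaled 0 to 1)."""
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    total_weight = sum(weights[name] for name in names)
    return [
        [
            sum(weights[name] * layers[name][r][c] for name in names) / total_weight
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Made-up 2x2 layers, already normalized so that 1.0 = most suitable
layers = {
    "carbon_biomass": [[1.0, 0.0], [0.5, 0.5]],
    "ecophysiographic_diversity": [[0.0, 1.0], [0.5, 1.0]],
    "bioclimates": [[1.0, 1.0], [0.0, 0.5]],
}
# Hypothetical weights expressing each layer's relative importance
weights = {"carbon_biomass": 0.5, "ecophysiographic_diversity": 0.3, "bioclimates": 0.2}

suitability = weighted_overlay(layers, weights)
for row in suitability:
    print([round(value, 2) for value in row])
```

Dividing by the total weight keeps the result on the same 0 to 1 scale as the inputs, even if the weights do not sum to one.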

Scenarios

These layers are used to create various scenarios which represent the proposed changes to zoning. These scenarios can then be analyzed from various perspectives including:

  • Constraints and threats

  • Potential habitat loss

Reaction

Because I care deeply about the preservation of open space and biodiversity, I was interested in this presentation. Although GIS provides a tool to analyze the impacts of competing land use demands, it is still up to all of us to develop the political will to protect critical habitat from development and other land uses that threaten biodiversity. Easily accessible information on these issues is a necessary step, but there is still much more to be done to change short-sighted pro-growth attitudes.

