vignettes/gat_tutorial.Rmd
This vignette is designed for a user with no experience in R who just wants to run the default version of the Geographic Aggregation Tool (GAT). It breaks down the different parts of GAT and describes what to do and what to expect, using the 2010 Census shapefiles for Hamilton and Fulton Counties in New York State that are embedded in the package.
Note: For users who wish to modify GAT, developer options have been written into several of the package functions. Check the functions’ help pages (linked from function names) and the Technical Notes for more information. Users interested only in running GAT can ignore these links.
This tutorial will walk you through the following activities. You will start with a shapefile of towns in two New York State counties. You will aggregate these towns to a minimum population of 6,000 based on population-weighted centroids, which will be calculated using census block files.
The results will be displayed in the log and as a series of maps in a PDF. The boundaries of the new areas will be saved as an ESRI shapefile and, if desired, as a Google Earth KML file so that you can import them to your preferred GIS application.
You will also learn the following:
Please note that R is case sensitive. The provided code is designed to be copy-pasted, but if you prefer to type it yourself, be aware of capitalization.
To follow this vignette, you can use the shapefiles embedded in this package. The command below returns the path to the folder in your package installation that contains them. When prompted to select a shapefile, navigate to this folder.
paste0(tools::getVignetteInfo("gatpkg", all = TRUE)[1, "Dir"], "/extdata/")
Alternatively, you can use your file manager’s search feature to locate the file “hftown.shp”.
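If you prefer a shorter call, base R's system.file() can usually locate the same folder. This is only a sketch: find_extdata() is a hypothetical helper name, and it assumes the shapefiles live in the package's "extdata" folder as described above.

```r
# Hypothetical helper: return the "extdata" folder of an installed package,
# or NA if the package or the folder cannot be found.
find_extdata <- function(pkg = "gatpkg") {
  path <- system.file("extdata", package = pkg)  # "" when nothing is found
  if (nzchar(path)) path else NA_character_
}

# Folder to navigate to when GAT asks for a shapefile (NA if gatpkg is absent).
find_extdata()
```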
To run GAT, you only need one line of code:
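The sketch below assumes that line is a call to runGATprogram() with no arguments, the same function this tutorial uses later to re-run saved settings. The wrapper start_gat() is a hypothetical convenience, not part of the package; it only launches GAT in an interactive session where gatpkg is installed.

```r
# Assumption: runGATprogram() can be called without arguments to start the
# default, dialog-driven workflow. The guard keeps this safe to source in
# non-interactive sessions or when gatpkg is not installed.
start_gat <- function() {
  ready <- interactive() && requireNamespace("gatpkg", quietly = TRUE)
  if (ready) gatpkg::runGATprogram()
  invisible(ready)
}

start_gat()
```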
The steps below will walk you through the dialogs created by GAT and describe the types of input expected using the embedded shapefiles.
GAT starts by creating initial variables and displaying a progress bar that will continually update as it runs. GAT’s processes occur in this order:
This is the interactive portion of GAT. The steps below will walk you through what to expect from GAT. If any portion of this section is cancelled, GAT will quit.
GAT starts with this progress bar, which will update as GAT proceeds:
Do not close the progress bar before GAT finishes running. If you do, GAT will crash with this error:
Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
  [tcl] bad window path name ".113".
This step requests the shapefile from the user using a pop-up dialog. This progress bar will be displayed:
In this dialog, you can navigate to any shapefile on your computer or network. To follow the steps as shown, navigate to the “extdata” folder in your installation of “gatpkg”, which you can reach via your filepath in Running GAT. Select the “hftown” shapefile.
Select “hftown.shp” and click Open to move to the next step or click Cancel to end GAT.
The function locateGATshapefile() checks that:
If any of these checks fail, you will receive an error message and will be asked to select a new shapefile. The “open file” dialog may not look exactly as shown; sometimes it includes a folder list on the left.
Learn more about the example aggregation shapefile, hftown.
This step requests a unique identifier variable from the user using a pop-up dialog. This ID will be overwritten as areas are aggregated and a crosswalk from these identifiers to the ones produced by GAT will be generated at the end.
In this dialog, select the variable you would like to use from the drop-down list. To follow the steps as shown, select “ID”.
Click Next > to move to the next step, < Back to move to the previous step (selecting a shapefile), Cancel to end the program, or Help to get further guidance.
The function identifyGATid() lists only character variables with unique values from the shapefile’s DBF. If the identifier you are looking for does not show up, it may either be numeric or contain at least one non-unique value. If you wish to use a numeric identifier, you will need to convert it to character before running GAT.
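To check a shapefile's attribute table before running GAT, you could apply the same rule yourself. The sketch below uses a made-up data frame in place of a real DBF; GEOID is a hypothetical numeric identifier, and is_id_candidate() is an illustrative helper rather than a GAT function.

```r
# Toy attribute table standing in for a shapefile's DBF (values are made up).
towns <- data.frame(
  ID     = c("a1", "a2", "a3"),
  COUNTY = c("Fulton", "Fulton", "Hamilton"),
  GEOID  = c(101, 102, 103),
  stringsAsFactors = FALSE
)

# Character columns whose values are all distinct: candidate identifiers.
is_id_candidate <- function(x) is.character(x) && !anyDuplicated(x)
names(towns)[vapply(towns, is_id_candidate, logical(1))]   # "ID"

# A numeric identifier must be converted to character before running GAT.
towns$GEOID <- as.character(towns$GEOID)
names(towns)[vapply(towns, is_id_candidate, logical(1))]   # "ID" "GEOID"
```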
This step requests the boundary variable from the user using a pop-up dialog. GAT will aggregate within this boundary first, if possible.
In this dialog, select the variable you would like to use from the drop-down list. If you would like GAT to enforce boundaries, click the checkbox as shown in the image. If you do not want to use a boundary, select “NONE” from the drop-down menu. If you select “NONE”, the program will ignore whether you checked the box.
To follow the steps as shown, select “COUNTY” and check the box.
Click Next > to move to the next step, < Back to move to the previous step (selecting the identifier), Cancel to end the program, or Help to get further guidance.
The function identifyGATboundary() lists only character variables with at least one non-unique value from the shapefile’s DBF. If the boundary you are looking for does not show up, it may either be numeric or contain no non-unique values.
This step requests the aggregation variables and their desired minimum and maximum values from the user using a pop-up dialog.
In this dialog, select the variables you would like to aggregate from the drop-down lists. Enter your desired minimum and maximum values for each aggregated area in the text boxes. You may want to include minimum values to increase the likelihood of stable rates and a maximum value to either exclude very populous areas or to ensure most areas have roughly the same population size.
To follow the steps as shown, select “TOTAL_POP”. Change the minimum value to “6,000”. Change the maximum value to “15,000”. Leave all options on the second row as “none”.
Click Next > to move to the next step, < Back to move to the previous step (selecting the boundary variable), Cancel to end the program, or Help to get further guidance.
The function identifyGATaggregators() lists only numeric variables from the shapefile’s DBF. The value boxes can support numbers, commas, and decimals. The value box also allows negative values, but the use of negative values has not been tested in GAT.
If you enter any other characters into the value boxes, you will get an error and a new dialog will appear asking you to re-enter the value. If you enter “none” in the value boxes, that triggers default values, which are the minimum value of the selected variable for “minimum value” and the sum of values for the selected variable for “maximum value”.
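The rules for the value boxes can be sketched in a few lines of base R. This is not GAT's own code: parse_limit() is a hypothetical helper, the error text is invented, and the populations are made up.

```r
# Hypothetical parser for one value box. `x` holds the selected variable's
# values; `which` says whether the box is the minimum or the maximum.
parse_limit <- function(entry, x, which = c("min", "max")) {
  which <- match.arg(which)
  entry <- trimws(entry)
  if (tolower(entry) == "none") {
    # Defaults described above: smallest value for the minimum box,
    # sum of all values for the maximum box.
    return(if (which == "min") min(x) else sum(x))
  }
  cleaned <- gsub(",", "", entry, fixed = TRUE)
  if (!grepl("^-?[0-9]*\\.?[0-9]+$", cleaned)) {
    stop("Please re-enter the value using only digits, commas, and decimals.")
  }
  as.numeric(cleaned)
}

pop <- c(1200, 4300, 8800)            # made-up populations
parse_limit("6,000", pop, "min")      # 6000
parse_limit("none", pop, "max")       # 14300
```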
This step requests up to three exclusion criteria from the user using a pop-up dialog.
In this dialog, select each variable you would like to use to define exclusions from the drop-down lists. For each variable selected, choose a condition and enter a numeric value. Areas meeting any of these criteria will be removed from the aggregation, but retained in the shapefile. If you do not want to use an exclusion criterion, select “NONE” from the drop-down menu. If you select “NONE”, the program will ignore the corresponding condition and value.
To follow the steps as shown, select “MY_FLAG”. Leave the condition as “equals” and change the value to “1”. This will exclude one area to illustrate how excluded areas are displayed on the generated maps and indicated in the generated DBF and log files. Leave the other two criteria as “NONE”.
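The "meets any criterion" logic can be illustrated with a toy table. The data below are made up, the list structure is a hypothetical way to hold the criteria, and only the “equals” condition named in this step is shown.

```r
# Toy data (made-up values); MY_FLAG mirrors the variable used in this step.
towns <- data.frame(
  ID        = c("a1", "a2", "a3", "a4"),
  TOTAL_POP = c(1200, 4300, 8800, 300),
  MY_FLAG   = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Each criterion is a variable, a comparison, and a value.
# Here: MY_FLAG equals 1.
criteria <- list(
  list(var = "MY_FLAG", op = `==`, value = 1)
)

# An area is excluded if it meets ANY criterion.
excluded <- Reduce(`|`, lapply(criteria, function(cr) {
  cr$op(towns[[cr$var]], cr$value)
}), rep(FALSE, nrow(towns)))

towns$ID[excluded]   # "a2"
```

Excluded areas stay in the table; they are simply skipped when areas are combined.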
Click Next > to continue, < Back to move to the previous step (selecting the aggregation variables), Cancel to end the program, or Help to get further guidance.
You will get the following confirmation dialog.
Click Yes to move to the next step, < Back to move to the previous step (selecting the aggregation variables), Repeat to return to the exclusion criteria, or Help to get further guidance.
The function inputGATexclusions() lists only numeric variables from the shapefile’s DBF and the option “NONE”. The value dialog can support numbers, commas, and decimals. The value box also allows negative values, but the use of negative values has not been tested in GAT.
You will get an error message and a request to reselect criteria if:
This step requests the merge method from the user using a pop-up dialog.
In this dialog, select the method you would like to use to aggregate areas. If you select any option other than “closest area by population-weighted centroid”, geographic centroid will be used. Drop-downs are ignored for options that are not selected.
To follow the steps as shown, select “closest area” and choose “population-weighted” from the drop-down.
Click Next > to move to the next step, < Back to move to the previous step (selecting the exclusion criteria), Cancel to end the program, or Help to get further guidance.
The function inputGATmerge() lists only numeric variables from the shapefile’s DBF for the ratio drop-downs. The ratio denominator (second list) lists only numeric variables without zero or missing values after removing exclusion criteria.
You will get an error message and a request to reselect criteria if:
For more information about the different merge options, including advanced options and how to trigger them, see Selecting the most eligible neighbor in the Technical Notes.
This step requests the population shapefile from the user using a pop-up dialog. If you selected an option other than “closest area by population-weighted centroid” in the previous dialog, this step is skipped.
In this dialog, you can navigate to any shapefile on your computer or network.
To follow the steps as shown, navigate to the “extdata” folder in your installation of “gatpkg”, if it does not open automatically. Select the “hfblockgrp” shapefile.
Click “Open” to move to the next step or click “Cancel” to go back to the merge selection dialog.
If there are multiple numeric variables, you will get a dialog requesting that you select one. If there is only one suitable variable, you will get this message:
Click “Yes” to continue or “No” to return to the previous step (selecting the merge method).
You can select the same file for both aggregating and population weighting. GAT treats them as two separate objects, so you will still have to identify the population variable for the new object.
The two shapefiles’ boundaries do not need to line up (for example, census tracts within counties). The population weighting intersects the population object with the aggregation object and assigns population to the intersected areas based on the proportion of each population area that falls inside each area to aggregate.
If a population file does not cover the full extent of the areas to aggregate, geographic centroids will be used for areas that cannot be assigned population-weighted centroids. You will not get a warning if this occurs, so please check your shapefiles before reading them into GAT.
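The idea behind a population-weighted centroid can be shown with points instead of polygons. This is a simplified sketch with made-up coordinates and populations: GAT works with intersected polygons, while weighted_centroid() below is a hypothetical helper that averages block centroids and uses the unweighted mean as a stand-in for the geographic centroid.

```r
# Made-up block centroids (x, y) and populations inside one town.
blocks <- data.frame(
  x   = c(0, 10, 10),
  y   = c(0,  0, 10),
  pop = c(100, 300, 0)
)

# Population-weighted centroid; falls back to the unweighted mean of the
# points when there is no population to weight by.
weighted_centroid <- function(x, y, pop) {
  if (sum(pop) == 0) return(c(x = mean(x), y = mean(y)))
  c(x = weighted.mean(x, pop), y = weighted.mean(y, pop))
}

weighted_centroid(blocks$x, blocks$y, blocks$pop)   # x = 7.5, y = 0
```

The centroid is pulled toward the block with 300 people and ignores the empty block, which is why a population-weighted merge can choose a different neighbor than a geographic one.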
Learn more about the example population shapefile, hfpop.
This step requests the rate calculation information from the user using a pop-up dialog.
In this dialog, select the numerator, denominator, and map color scheme from the drop-downs. Enter a rate name and multiplier value. If you check “Do NOT calculate a rate” at the top, all other options are ignored.
To follow the steps as shown, select “TOTAL_POP” for the numerator, “AREALAND” for the denominator, and “Greens” for the map color scheme. Name your rate variable “pop_dens” (short for “population density”) and leave the multiplier as “10,000”.
Click Next > to move to the next step, < Back to move to the previous step (selecting merge settings), Cancel to end the program, or Help to get further guidance.
This step is optional. To skip rate calculation, check the box beside “Click here if you do NOT want to calculate a rate.”
The function inputGATrate() lists only numeric variables from the shapefile’s DBF for the numerator and denominator drop-downs. The denominator lists all numeric variables, but can be modified to list only numeric variables without zero or missing values, after removing exclusion criteria.
The rate name must:
All characters that are not letters, underscores, or numbers will be removed from the rate name. Do not use the rate name “no_rate”; it is reserved to designate when the rate will not be calculated.
The multiplier box can support numbers, commas, and decimals. If you enter any other characters into the multiplier box, you will get an error and a new dialog will appear asking you to re-enter the multiplier. To calculate a ratio, set the multiplier to 1.
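As a rough illustration of the name cleaning and the arithmetic, the sketch below assumes the rate is the multiplier times the numerator divided by the denominator, which is consistent with a multiplier of 1 giving a plain ratio. Both function names are hypothetical and the numbers are made up.

```r
# Keep only letters, digits, and underscores in a rate name.
clean_rate_name <- function(name) gsub("[^A-Za-z0-9_]", "", name)

# Assumed formula: rate = multiplier * numerator / denominator
calc_rate <- function(numerator, denominator, multiplier = 10000) {
  multiplier * numerator / denominator
}

clean_rate_name("pop dens (2010)")        # "popdens2010"
calc_rate(5000, 250000000)                # 0.2
calc_rate(5000, 2500, multiplier = 1)     # 2 (a plain ratio)
```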
This step asks whether the user wants to save a KML file using a pop-up dialog.
In this dialog, select “Yes” or “No”.
To follow the steps as shown, select “No”.
Click Next > to move to the next step, < Back to move to the previous step (selecting the rate settings), Cancel to end the program, or Help to get further guidance.
The function saveGATkml() requests a simple “Yes” or “No”. While a KML file is optional, a shapefile is saved by default.
This step asks the user where to save the shapefile using a pop-up dialog.
In this dialog, you can navigate to any folder on your computer or network.
To follow the steps as shown, navigate to your documents folder. Enter “hftown_agg6k15k_popwt”.
Click “Save” to move to the next step or click “Cancel” to end GAT.
The function saveGATfiles() checks if the name you provide already exists. If so, it asks if you want to overwrite the existing files. All files created will be saved here, with this filename (including log, plots, and KML, if requested).
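If you want to check ahead of time which files a given name would produce, you could build the list yourself. The suffixes below are copied from the example file list that GAT prints at the end of a run (KML omitted); gat_output_files() is a hypothetical helper, and the comments on what each group contains are an interpretation of this tutorial rather than package documentation.

```r
# Build the file names GAT is expected to write for a given base name.
gat_output_files <- function(folder, base) {
  shp_ext <- c(".dbf", ".prj", ".shp", ".shx")
  file.path(folder, c(
    paste0(base, shp_ext),            # aggregated shapefile
    paste0(base, "in", shp_ext),      # original shapefile with crosswalk
    paste0(base, "plots.pdf"),        # maps
    paste0(base, ".log"),             # log
    paste0(base, "settings.Rdata")    # saved settings
  ))
}

files <- gat_output_files(tempdir(), "hftown_agg6k15k_popwt")
files[file.exists(files)]   # anything listed here would be overwritten
```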
This step requests confirmation of all settings from the user using a pop-up dialog.
In this dialog, you can select a step to modify or move on. To modify steps, select the first step you want to modify from top to bottom. After each modification, you will return to this dialog to select the next step.
If you choose to move on, select an item from the drop-down menu and click Confirm. To cancel, click Cancel GAT. Click Help to get further guidance. To follow the steps as shown, select “None” and click Confirm.
The function confirmGATbystep() requests a simple Yes or No. To end GAT from here, click the “x” in the upper right corner.
At this point, if the user has not cancelled the program, the shapefile processing begins.
The first processing step is to read in the shapefile. In this step, R creates a simple features object, reads its projection, and calculates the geographic centroid for each polygon. Then GAT creates a flag variable to apply exclusion criteria to relevant polygons based on the exclusions and maximum values that you selected.
This is the second processing step and probably the slowest step in the entire program, especially if you select population weighting. It opens a new progress bar specifically for the aggregation loop. This progress bar updates with the number of areas left to aggregate as each aggregation completes, then closes when the loop finishes.
This step calls the function defineGATmerge() to create the aggregation key or crosswalk. It creates a subset of polygons whose values for the aggregation variable(s) are below your desired minimum value(s). It reorders the subset from largest to smallest aggregation value. Starting with the largest value, it determines acceptable neighbors for aggregation based on your settings.
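The loop described above can be sketched on a toy example. This is a heavily simplified illustration, not defineGATmerge() itself: the data are made up, every other area is treated as a neighbor, and boundaries, maximum values, and exclusions are ignored.

```r
# Made-up areas: centroid coordinates and populations.
areas <- data.frame(
  id  = c("a", "b", "c", "d"),
  x   = c(0, 1, 10, 11),
  y   = c(0, 0, 0, 0),
  pop = c(2000, 4500, 7000, 1000),
  stringsAsFactors = FALSE
)
min_pop <- 6000

areas$group <- areas$id          # crosswalk: original id -> aggregated id
repeat {
  # Current aggregated areas: total population, population-weighted centroid.
  grp <- do.call(rbind, lapply(split(areas, areas$group), function(g) {
    data.frame(group = g$group[1], pop = sum(g$pop),
               x = weighted.mean(g$x, g$pop), y = weighted.mean(g$y, g$pop),
               stringsAsFactors = FALSE)
  }))
  small <- grp[grp$pop < min_pop, ]
  if (nrow(small) == 0 || nrow(grp) == 1) break
  # Start with the largest area that is still below the minimum.
  first  <- small[which.max(small$pop), ]
  others <- grp[grp$group != first$group, ]
  # Simplification: every other area counts as a neighbor; pick the closest.
  d <- sqrt((others$x - first$x)^2 + (others$y - first$y)^2)
  target <- others$group[which.min(d)]
  areas$group[areas$group == first$group] <- target
}

areas[, c("id", "group")]   # a and b end up in "a"; c and d end up in "c"
```

Area b (4,500) is handled first because it is the largest area below the minimum; it joins a, and then d joins its closest neighbor, c.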
This is the third processing step, which aggregates areas as defined by the crosswalk created in Step 13. If you requested rates, they are calculated in this step.
This step calls the function mergeGATareas() to perform the aggregation. It also cleans up row names and other information that may not have been properly retained during the aggregation.
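Once a crosswalk exists, combining the attribute data amounts to a grouped summary. The sketch below assumes numeric variables are summed within each aggregated area and that rates are calculated from those totals; the table is made up and this is not the code inside mergeGATareas().

```r
# Made-up crosswalk: original ID -> aggregated ID (GATid).
towns <- data.frame(
  ID        = c("a", "b", "c", "d"),
  GATid     = c("a", "a", "c", "c"),
  TOTAL_POP = c(2000, 4500, 7000, 1000),
  AREALAND  = c(50, 70, 30, 90),
  stringsAsFactors = FALSE
)

# Sum the numeric variables within each aggregated area.
agg <- aggregate(cbind(TOTAL_POP, AREALAND) ~ GATid, data = towns, FUN = sum)

# Assumed: the rate is computed on the aggregated totals.
agg$pop_dens <- 10000 * agg$TOTAL_POP / agg$AREALAND
agg
```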
This step creates a new compactness ratio variable that is added to the aggregated shapefile’s dataset.
This step runs the function calculateGATcompactness().
This section contains the maps that are created by GAT. The final PDF will contain between four and seven maps, depending on your selections. Given all seven maps:
Note: R package documentation allows you to view images, but not link to them. For all maps, view larger images by right-clicking on the maps and selecting “Open image in new tab.”
Two choropleth maps are produced in this step. The two maps use the same color scale and range, which should make comparisons easy.
This step calls the function plotGATmaps() to draw the map.
If you selected a second aggregation variable, two choropleth maps are produced in this step. One map shows the distribution of the second aggregation variable before aggregating and the other shows the distribution after aggregating.
This step calls the function plotGATmaps() to draw the map.
In this map, the aggregated area map is overlaid on top of the original map so you can see how smaller areas combined into larger ones.
This step calls the function plotGATcompare() to draw the map.
This map displays a choropleth scale of the compactness ratios for the aggregated areas.
This step calls the function plotGATmaps() to draw the map. For information about how to interpret the compactness ratio, see Compactness Ratio in the Technical Notes.
This map displays a choropleth scale of whatever rate or ratio you decided to calculate. It also includes the rate function and summary statistics.
This step calls the function plotGATmaps() to draw the map.
This section contains the file saving steps.
In this section, all maps are saved to a PDF file, with one map per page, in the same folder as the aggregated shapefile. The same save filename is used with “plots” at the end.
An Rdata file is also saved. This file includes all settings in R format, which can be opened in R or run through GAT a second time. If you want to use the Rdata file to re-run GAT, use this code and change myfilepath to the location and name of your Rdata file:
myfilepath <- "C:/users/ajs/mygatfilesettings.Rdata"
gatpkg::runGATprogram(settings = myfilepath)
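If you only want to look inside a settings file rather than re-run GAT, loading it into its own environment avoids overwriting objects in your workspace. The sketch below creates a stand-in file first so that it runs anywhere; example_settings and its contents are hypothetical, since the object names in a real GAT settings file are not listed in this tutorial.

```r
# Stand-in for a GAT settings file: the object saved here is made up.
myfilepath <- file.path(tempdir(), "mygatfilesettings.Rdata")
example_settings <- list(minvalue = 6000, maxvalue = 15000)  # hypothetical
save(example_settings, file = myfilepath)

# Load into a separate environment so nothing in your workspace is replaced.
settings_env <- new.env()
loaded <- load(myfilepath, envir = settings_env)
loaded                                  # names of the objects in the file
str(settings_env$example_settings)
```

With a real settings file, point myfilepath at that file and skip the save() step.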
File saved to the save folder if you followed the vignette:
The purpose of saving this shapefile is to provide a key/crosswalk (GATid) identifying which aggregated area each original area fell inside when aggregated. The GATflag variable flags areas that were excluded from aggregation or generated warnings in the log:
The original shapefile is saved with the following changes:
Files saved to the save folder if you followed the vignette:
In this section, the aggregated shapefile is saved. It includes all of the variables in the original shapefile with these changes:
Additional variables created as part of the aggregation process and included in the shapefile’s database:
Files saved to the save folder if you followed the vignette:
In this step, nothing happens if you are following the vignette and selected “No” when asked whether you wanted to save a KML file. If you had selected “Yes”, a KML file would be created in this step that could then be opened in Google Earth.
Files saved to the save folder, if “Yes” was selected in Step 9:
In this step, a text file is saved that includes all of your settings, basic information on data distributions for the aggregation variable(s), and warnings regarding possible issues with the aggregation.
Files saved to the save folder if you followed the vignette:
All files are saved to the save folder you selected. The log file contains a partial list of files created. The R console will display a full list of all files created and their location when the aggregation is complete, which will look something like this:
The following files have been written to the folder
C:/Users/ajs11/Documents/GAT:
hftown_agg6k15k_popwt.dbf
hftown_agg6k15k_popwt.prj
hftown_agg6k15k_popwt.shp
hftown_agg6k15k_popwt.shx
hftown_agg6k15k_popwtin.dbf
hftown_agg6k15k_popwtin.prj
hftown_agg6k15k_popwtin.shp
hftown_agg6k15k_popwtin.shx
hftown_agg6k15k_popwtplots.pdf
hftown_agg6k15k_popwt.log
hftown_agg6k15k_popwtsettings.Rdata
See the log file for more details.
To open the saved settings in R, use load("C:/filepath/settings.Rdata").