vignettes/gat_results_eval.Rmd
gat_results_eval.Rmd
The Geographic Aggregation Tool (GAT) generates and saves several files in the save folder defined by the user. Generated files include:
Log: The log includes all of your settings and a brief list of variables created by GAT, including a data dictionary for the GATflag variable, which corresponds with the highlighted areas displayed in the plots.
PDF of plots: The plots may also indicate potential issues in your merge, depending on the selections you made. If relevant, the plots will highlight the following geographic areas that are:
In addition, the compactness ratio plot will display a lighter color for areas with low compactness ratios.
Crosswalk shapefile and associated DBF: The crosswalk shapefile, which ends with “*in.shp”, is identical to your input file with a few variables added to the table by GAT. These variables are noted in the log. Note the variable GATflag, which identifies any areas excluded by the aggregation process.
Aggregated shapefile and associated DBF: The aggregated shapefile is your final product. The variable GATid links the crosswalk and aggregated shapefiles. Again, note the variable GATflag.
Optionally, aggregated KML file: This file will only be written if you request it and is not used in assessment.
If you notice highlighted areas in places you do not expect, there
are a few things you can do. First, review your settings. Note
especially aggregation minimum and maximum values, exclusion criteria,
and boundary settings. All basic settings are written to the log. In
addition, the advanced setting adjacent = TRUE
in the
function options for runGATprogram()
may cause unexpected
behavior. These will be covered in the next few subsections.
These areas meet one or more of the exclusion criteria you provided. They will be highlighted in blue-grey on the plots and labeled “1” in the variable GATflag in both shapefiles. Check that the areas you excluded are the areas you expected to exclude. Refer to the log or to the subtitles in the choropleth plots to verify how you defined your exclusion criteria.
GAT has no mechanism to include areas that meet some, but not all, exclusion criteria. For more complex exclusion rules, you should create a numeric Boolean flag (e.g. 0 = true/include, 1 = false/exclude) prior to running your shapefile through GAT. Then use that Boolean flag as your exclusion variable.
These areas had a value higher than your maximum value for one or both of your aggregation variables. They will be highlighted in magenta on the plots and labeled “5” in the variable GATflag in both shapefiles. Check that the areas highlighted are those you expected to have high values.
These areas may have aggregated at least once. However, they did not aggregate enough to meet your minimum aggregation value for one or both aggregation variables. They will be highlighted in cyan on the plots and labeled “10” in the variable GATflag in the aggregated shapefile. In general, any areas with GATflag = 10 are problematic. Some of the situations that cause variables to be flagged are listed below, followed by ways to address these situations.
An area fails to aggregate to the minimum value because it has no
neighbors eligible for aggregation. This is especially common if you use
the adjacent = TRUE
setting because it forces areas to
aggregate only to areas that it physically touches.
If you used adjacent = TRUE
, which is the default
setting in runGATprogram()
, these situations can all cause
areas to fail to aggregate to the minimum value. These situations hold
true both within enforced boundaries (when an area cannot merge to areas
outside its boundary at all) and when boundaries are not used.
If you manually set adjacent = FALSE
, #4 above may still
occur. In addition, a fifth situation is possible.
Inspect the areas that were flagged and consider which issues above apply to your data. Review which settings are essential and which could be modified, including the merge method. The solutions you choose depend largely on your available shapefiles and your ultimate goal.
Most of these issues could be addressed in one of four ways. Whether any or all of these solutions are appropriate depends on what you are trying to do. You may want to try a few of them before settling on a final solution.
adjacent = TRUE
.adjacent = FALSE
. This is not recommended in most cases,
but it may be appropriate for your data and purpose.These areas may include long, narrow shapes or windy shapes that look like spiders or snakes. They can be identified using the compactness ratio map. In this map, areas with a low compactness ratio (longer or stringier, or sometimes donut loops) will be a lighter color than areas with a high compactness ratio (rounder or more square). For a description of compactness ratio, review the final section in the Technical Notes.
Compactness ratio may be important for your project. If so,
aggregating to geographic centroids using the function option
adjacent = TRUE
is the most likely merge method to result
in highly compact areas.
This two-part process is optional. Your goal here is to create the final crosswalk file and address any flagged areas. If your aggregation has two steps (see example below), the first part of post-processing is to create a master crosswalk that contains both the original and the final area IDs.
Example:
You wish to aggregate census tracts within each county to a specific population. However, for counties with populations below your minimum, you wish to aggregate those counties to other counties within each state.
This would require you to run the aggregation twice. First, you would process your shapefile with the boundary set to county. Then you would process the aggregated shapefile from the previous step with the boundary set to state. Afterward, you would run the included function
combineGATcrosswalk()
. This function combines the crosswalk tables from the first and second aggregations and assigns the combined table to the original crosswalk shapefile.
For the second part of this process, run project-specific code to handle any areas that were still flagged after selecting the final aggregation settings. This code may include redefining GATid for flagged areas in the original crosswalk and reaggregating all areas based on the revised GATid variable to create final crosswalk and aggregation shapefiles.