GAT Results Evaluation

GAT outputs

The Geographic Aggregation Tool (GAT) generates and saves several files in the save folder defined by the user. Generated files include:

Log: The log includes all of your settings and a brief list of variables created by GAT, including a data dictionary for the GATflag variable, which corresponds with the highlighted areas displayed in the plots.

PDF of plots: The plots may also indicate potential issues in your merge, depending on the selections you made. If relevant, the plots will highlight the following geographic areas that are:

Excluded by the user
Below the minimum aggregation value
Above the maximum aggregation value

In addition, the compactness ratio plot will display a lighter color for areas with low compactness ratios.

Crosswalk shapefile and associated DBF: The crosswalk shapefile, which ends with “*in.shp”, is identical to your input file with a few variables added to the table by GAT. These variables are noted in the log. Note the variable GATflag, which identifies any areas excluded by the aggregation process.

Aggregated shapefile and associated DBF: The aggregated shapefile is your final product. The variable GATid links the crosswalk and aggregated shapefiles. Again, note the variable GATflag.

Optionally, aggregated KML file: This file will only be written if you request it and is not used in assessment.

Figuring out why areas were flagged

If you notice highlighted areas in places you do not expect, there are a few things you can do. First, review your settings. Note especially aggregation minimum and maximum values, exclusion criteria, and boundary settings. All basic settings are written to the log. In addition, the advanced setting adjacent = TRUE in the function options for runGATprogram() may cause unexpected behavior. These will be covered in the next few subsections.

Areas excluded by the user

These areas meet one or more of the exclusion criteria you provided. They will be highlighted in blue-grey on the plots and labeled “1” in the variable GATflag in both shapefiles. Check that the areas you excluded are the areas you expected to exclude. Refer to the log or to the subtitles in the choropleth plots to verify how you defined your exclusion criteria.

GAT has no mechanism to include areas that meet some, but not all, exclusion criteria. For more complex exclusion rules, you should create a numeric Boolean flag (e.g. 0 = true/include, 1 = false/exclude) prior to running your shapefile through GAT. Then use that Boolean flag as your exclusion variable.

Areas above maximum aggregation value

These areas had a value higher than your maximum value for one or both of your aggregation variables. They will be highlighted in magenta on the plots and labeled “5” in the variable GATflag in both shapefiles. Check that the areas highlighted are those you expected to have high values.

Areas below minimum aggregation value

These areas may have aggregated at least once. However, they did not aggregate enough to meet your minimum aggregation value for one or both aggregation variables. They will be highlighted in cyan on the plots and labeled “10” in the variable GATflag in the aggregated shapefile. In general, any areas with GATflag = 10 are problematic. Some of the situations that cause variables to be flagged are listed below, followed by ways to address these situations.

Causes

An area fails to aggregate to the minimum value because it has no neighbors eligible for aggregation. This is especially common if you use the adjacent = TRUE setting because it forces areas to aggregate only to areas that it physically touches.

If you used adjacent = TRUE, which is the default setting in runGATprogram(), these situations can all cause areas to fail to aggregate to the minimum value. These situations hold true both within enforced boundaries (when an area cannot merge to areas outside its boundary at all) and when boundaries are not used.

The area is an island. It is not physically touching any other areas.
The area is isolated by excluded areas. For example, the area is the tip of a peninsula, but you excluded the peninsula’s base or merging with the peninsula’s base would cause the resulting area to surpass the maximum value for one or both aggregation variables.
The area is at the edge of the boundary and neighbors are not eligible. The area may be unable to merge to neighbors because they are excluded or because merging with them would cause the resulting area to surpass the maximum value for one or both aggregation variables.
The largest possible area is smaller than the minimum aggregation value. This occurs most commonly when using enforced boundaries, when all areas within the boundary merge into a single area, but the total for that area is still smaller than the minimum aggregation value for one or both aggregation variables.

If you manually set adjacent = FALSE, #4 above may still occur. In addition, a fifth situation is possible.

All neighbors within the boundary either were excluded or have high enough values for one or both aggregation variables that merging with them would cause one or both aggregation values in the resulting area to exceed the maximum aggregation values.

Solutions

Inspect the areas that were flagged and consider which issues above apply to your data. Review which settings are essential and which could be modified, including the merge method. The solutions you choose depend largely on your available shapefiles and your ultimate goal.

Most of these issues could be addressed in one of four ways. Whether any or all of these solutions are appropriate depends on what you are trying to do. You may want to try a few of them before settling on a final solution.

Use a shapefile with contiguous areas. Shapefiles that include cutouts for rivers, lakes, and shorelines are more likely to cause problems with aggregation, especially when using adjacent = TRUE.
If you do not need contiguous aggregated areas, use adjacent = FALSE. This is not recommended in most cases, but it may be appropriate for your data and purpose.
Relax boundary restrictions. If you checked the box to enforce boundaries, but your task does not absolutely require that you remain within boundaries, try running the aggregation without checking that box. If you defined a boundary, GAT will still try to merge within that boundary before looking for neighbors outside it.
Increase or remove the restriction for maximum aggregation value. Consider the highest acceptable value or try running the aggregation without defining maximum aggregation values. After aggregation is complete, check the log for the ranges of values for each aggregation variable before and after aggregating.

Areas with low compactness ratio

These areas may include long, narrow shapes or windy shapes that look like spiders or snakes. They can be identified using the compactness ratio map. In this map, areas with a low compactness ratio (longer or stringier, or sometimes donut loops) will be a lighter color than areas with a high compactness ratio (rounder or more square). For a description of compactness ratio, review the final section in the Technical Notes.

Compactness ratio may be important for your project. If so, aggregating to geographic centroids using the function option adjacent = TRUE is the most likely merge method to result in highly compact areas.

Post-processing

This two-part process is optional. Your goal here is to create the final crosswalk file and address any flagged areas. If your aggregation has two steps (see example below), the first part of post-processing is to create a master crosswalk that contains both the original and the final area IDs.

Example:

You wish to aggregate census tracts within each county to a specific population. However, for counties with populations below your minimum, you wish to aggregate those counties to other counties within each state.

This would require you to run the aggregation twice. First, you would process your shapefile with the boundary set to county. Then you would process the aggregated shapefile from the previous step with the boundary set to state. Afterward, you would run the included function combineGATcrosswalk(). This function combines the crosswalk tables from the first and second aggregations and assigns the combined table to the original crosswalk shapefile.

For the second part of this process, run project-specific code to handle any areas that were still flagged after selecting the final aggregation settings. This code may include redefining GATid for flagged areas in the original crosswalk and reaggregating all areas based on the revised GATid variable to create final crosswalk and aggregation shapefiles.

Abigail Stamm

2023-03-15