Geography 481 Intro to GIS
Project Two: Mapping Continuous Data


This project introduces the concept of thematic mapping using continuous data read from the attribute database. Unlike the geology map, where you set patterns for discrete categories of data, a map of continuous data requires the use of data ranges. These ranges provide a generalized view of the underlying variable in which similar values are collapsed into a small number of categories. When you work with classified data you exchange the detail of the original values for a "big picture" view of the data. How you distill the data into classes is itself a form of analysis that can either illuminate or hide important spatial patterns.

In GIS, the relationship between cartography and analysis is a close one. In many cases, you can analyze data by simplifying or classifying your map into a small number of categories so that spatial patterns become more apparent.  If care is not taken by the GIS analyst/cartographer (that's you!), the resulting maps may not communicate effectively, or worse, may offer a misleading interpretation of the data. This project introduces alternative techniques for classifying continuous data, each of which conveys different information to the map user. In the exercise, you will produce different views of population density data from the U.S. You will see that there is no “correct” view; different approaches are useful for conveying different aspects of the data distribution.

 


Setup


Creating a Map of a Continuous Variable

There are three basic steps to mapping a continuous variable: 1) identify the variable to map, 2) set up the data classification ranges, and 3) assign symbology to the individual classes.

The default classification is based on 'natural breaks' with 5 classes. We'll stay with these for now.

As you assign colors and shading to continuous data, be certain to maintain the intensity gradient from low values to high values.

There are many default settings in ArcMap and most are adequate. However, you may wish to modify these settings. Throughout the exercises in this course you will be instructed to change various settings that might have seemed just fine to you. The purpose is to introduce you to the mind-numbing array of options available. The following is one such instance.

The symbols used to display your data have certain properties, two of which are the outline and the fill. These properties can be modified for each category individually or globally, that is, all at once. We are going to try a global change. The default outline color for each category is gray. Let's make them all black.

With the Layer Properties dialog still open:

BUG ALERT: There was a bug in version 10.4 that seems to have made its way to the current version.

So, instead of this:

Do this:

 

The Symbol Selector dialog will look the same if you were only changing one symbol, but in this case, the change will be applied to all symbols in the layer.

When you are finished, close the Properties dialog and view the map. The results are difficult to interpret because you didn't control for the different areas of the individual states. You will correct that shortcoming by creating a map of population density.


Adding a New Field to the Database

When polygons have different sizes, it is usually better to map "density" of a variable rather than "absolute values". But does your database contain population density data? Click on any state with the "Identify" tool (the “i” icon) and note the information in the database. You will see data for area (Sqmiles) and population (Pop2020), but none for population density. However, since density is defined as population divided by area, you can add a new density field to the database, calculate density values, and map the newly created data.

 

The first step is to add a new Popden field to the database.

Now define and add the new field:

In the table, note the new field added to the right of the existing fields. The next operation is to calculate data into the new field.

Note: part of the expression is already provided for you ("Popden ="). You will provide the rest of the expression by clicking on the appropriate fields and operators:

You have now created a formula for populating the population density field.

If the new data are missing or incomplete, try to figure out what you did wrong and rerun the operation. Since you can't "undo" this operation, you will have to delete the field.
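Note: if you are comfortable with scripting, the same two steps can also be run from the Python window in ArcMap. The following is only a sketch; it assumes your layer is named "States" in the table of contents and that the fields are named Pop2020 and Sqmiles as described above, so adjust the names to match your own data.

    import arcpy

    # Add a double-precision field to hold population density
    arcpy.AddField_management("States", "Popden", "DOUBLE")

    # Populate it: density = population / area
    # (the expression uses the Python parser's !field! syntax)
    arcpy.CalculateField_management("States", "Popden",
                                    "!Pop2020! / !Sqmiles!", "PYTHON_9.3")

Either way, the result is the same Popden field you will map in the next steps.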

Note: Some of you with previous GIS experience know that you can also map density by "normalizing" your data. While your map will look the same, creating a new field in the database allows you to search, sort, and perform various analyses based on this new field. Sometimes you need this added capability, sometimes you don't.


Creating an Equal Interval Map of Population Density

For your first population density map, you will use an equal interval classification method. The equal interval method provides a view of the data which attempts to preserve the original data ratios. Consider the following nine data values:

4, 5, 6, 14, 15, 16, 44, 45, 46

The difference between a high value like 45 and a low one like 5 (40 units) is four times the difference between a moderately low value like 15 and a low one like 5 (10 units). Now group them into five classes: 0 - 9, 10 - 19, 20 - 29, 30 - 39, 40 - 49. The original values 4, 5, and 6 are now combined into Class One; 14, 15, and 16 are combined into Class Two; 44, 45, and 46 are combined into Class Five. In effect, the three lowest original values are now all treated as if they have the value 1, the three moderately low original values are all treated as if they have the value 2, and the three highest original values are all treated as if they have the value 5. The interval of 4 classes between the highest (class 5) and the lowest (class 1) is still four times the interval of 1 class between the moderately low (class 2) and the lowest (class 1). The moderately low values from the original distribution are still closer to the low end of the new scale than they are to the high end. For this reason, equal interval classifications are viewed as the least "biased" approach to grouping values.

What the reclassification eliminates is our ability to distinguish minor differences among nearby similar values. The original values 4, 5 and 6 are now all "ones"; 14, 15 and 16 are now all "twos", etc. We are willing to give up those minor distinctions in exchange for a clearer view of the "big picture". The original distribution had an equal number of low values (3), moderately low values (3), and high values (3). It had no middle values and no moderately high values. The reclassification clarifies that relationship, permitting us to see patterns in the data distribution that might otherwise have been hidden.
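To make the arithmetic concrete, here is a small Python sketch (not part of the exercise) that assigns the nine sample values to the five classes used above. Keep in mind that ArcMap computes its breaks from the actual minimum and maximum of your data rather than from the round numbers used here.

    # Equal interval classification of the nine sample values into the
    # five classes 0-9, 10-19, 20-29, 30-39, 40-49 (each 10 units wide)
    values = [4, 5, 6, 14, 15, 16, 44, 45, 46]

    def equal_interval_class(v, width=10):
        """Return the 1-based class number for value v."""
        return v // width + 1

    print([(v, equal_interval_class(v)) for v in values])
    # [(4, 1), (5, 1), (6, 1), (14, 2), (15, 2), (16, 2), (44, 5), (45, 5), (46, 5)]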

You can create an equal interval map as follows:

Close the Layer Properties dialog and note the resulting map: it appears to be solid yellow. The original distribution of data values must be highly skewed to produce this view. Let's take a look.


Viewing the Distribution of Population Density Values

Just as a "Data View" provides a means of displaying geographic information in ArcMap, a "Table" view provides the mechanism for displaying tabular attribute information. You can see the actual distribution of data values by opening a layer's table and sorting the values from lowest to highest.

Scroll down while noting the density values. The lowest value is approximately 6 while the highest (District of Columbia) is over 10,400. Are the remaining values evenly distributed between those values? Do they fall more toward the low end or the high end? Are they clustered around the middle?

It should be apparent that the data are skewed toward the low end of the data range. There are a large number of small values and one very large value. Because the polygon with that one very large value has such a small area, it barely shows up, if at all, at the scale of the US map. Thus the map appeared to be monochrome yellow. The map produced by the equal interval approach does accurately reflect the underlying data distribution. Unfortunately, it also hides important variation within the vast group of values (48 of the 49) at the low end of the equal interval scale.
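Using the approximate figures above, you can see why: five equal-interval classes spanning roughly 6 to 10,400 are each about (10,400 - 6) / 5, or roughly 2,080 units wide, so the lowest class runs from about 6 to about 2,085 and contains every observation except the District of Columbia.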


Creating a Quantile View of the Data

The concept of "Quantiles" provides an alternative method for classifying data. Instead of dividing the data into classes based on equal data intervals, the quantiles approach divides the data into classes based on equal numbers of observations. If you wanted to divide 100 observations into four quantiles, the first class would contain the 25 lowest values, the second class the next 25 lowest values, and so on. Quantiles provide a useful view of skewed data; class value ranges tend to be small where data values are clustered and large where data values are spread out. This preserves some of the local variation between adjacent classes, though it can give a misleading picture of variation across the entire data range.
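Here is a companion Python sketch (again, not part of the exercise) showing how quantile classes fall for the same nine sample values used earlier; for simplicity it uses three classes so that each class holds exactly three observations.

    # Quantile classification: equal numbers of observations per class
    values = sorted([4, 5, 6, 14, 15, 16, 44, 45, 46])
    num_classes = 3
    per_class = len(values) // num_classes           # 3 observations per class

    classes = [values[i * per_class:(i + 1) * per_class] for i in range(num_classes)]
    print(classes)                                   # [[4, 5, 6], [14, 15, 16], [44, 45, 46]]

    # The upper break of each class is its largest member
    print([c[-1] for c in classes])                  # [6, 16, 46]

Notice that every class holds the same number of observations no matter how wide its value range has to be, which is exactly what makes quantiles useful for skewed data.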

To produce a five-class quantile map:

Let's take a look at some of the other items in the Classification dialog window. In the Classification Statistics box in the upper right you can see the number of observations (count), the range (min, max), and other information. The histogram is particularly useful. As you can see, with the exception of Washington, D.C., all of the states are in the lowest category.

Quantile classifications are often used as a first approximation for systems based on increasing (or decreasing) intervals. To produce a more readable map, you can edit the range limits so that class breaks occur at round numbers. This "Manual" classification will change the categories of some states, but may make the map and data ranges easier to interpret.

 


Modifying the Quantile classification using the Manual method

In this section you will manually change the range limits to create easier-to-read categories. Note: once you do this, your classification scheme will no longer be quantile or equal interval. It will be, as they say, user-defined.

Highlight the appropriate class cell (Range) in the Layer Properties and enter the following range limits (note: you cannot change the lowest value in the data set; it is easiest to set the upper limit of the lowest range and proceed, in order, to the highest, setting the upper limit each time; the software will automatically set the lower range limit based on the upper limit of the previous category):

While we are at it, we might as well get rid of all those decimal places:

Note that only the labels reported in the legend have been rounded. The actual values dividing the categories are unchanged.

Notice how different the map looks. In the equal interval version, nearly all the states appeared to have relatively low values compared to the extremely high value for the District of Columbia. In the quantile version, the uniqueness of the District of Columbia is sacrificed so that you can better see differences among the remaining 48 states. In this third, manual classification, we have created arbitrary class breaks at neat round intervals.


Using a Logical Filter to Redefine a Layer

One reason the data range is so skewed is that the District of Columbia is not actually a "state," whereas the other 48 polygons are. Since it is not a state, it is appropriate to exclude it from further consideration by applying a logical filter to your data. The filter acts like a true/false test: only the observations that evaluate as true are included in subsequent mapping operations. Once the District of Columbia is removed from the analysis, you should get a clearer view of the population densities of the actual "states".
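In ArcMap this kind of filter is applied as a Definition Query on the layer (Layer Properties > Definition Query tab), written as a SQL where clause. The sketch below shows the equivalent from the Python window; it assumes the layer is named "States" and that the field holding state names is called STATE_NAME, so check your own attribute table and table of contents for the actual names.

    import arcpy

    # Grab the current map document and the States layer
    mxd = arcpy.mapping.MapDocument("CURRENT")
    layer = arcpy.mapping.ListLayers(mxd, "States")[0]

    # Keep everything that is NOT the District of Columbia
    layer.definitionQuery = "STATE_NAME <> 'District of Columbia'"
    arcpy.RefreshActiveView()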

The revised view (minus the District of Columbia) provides a fourth valid picture of “state” densities, one that shows the cluster of higher densities in the Northeast while preserving the density ratios among the 48 states.

There is no "right" or "wrong" classification method. Your goal is to convey accurate, unbiased information about the distribution of data values. The Equal Interval approach maintains the numerical relationships among all the values, while a Quantile approach works well in situations where a small number of unusual values would otherwise hide important variation among the vast majority of observations. With either method, you must be certain that you have selected the correct target group of observations. It is up to you to understand your data and to choose an appropriate classification method based on the purpose of your map.


Final Product

On one (1) "printed" page, please display, in a cartographically pleasing manner, both an equal interval map and a quantile map of the 48 conterminous states. This process can be a little confusing at first, so follow along carefully. When you create a map layout, you are adding map elements to a page. One element is the data frame. It turns out you can add as many data frames as you like. Let's try it:

Notice that your map appears blank and a new data frame, called "New Data Frame," has been added to the TOC. Your map appears blank because there aren't any data layers in the data frame... yet.

You can toggle between data frames by making them "active".

The name of the active data frame appears in bold in the TOC. Right-clicking on the data frame name also allows you to change the data frame name to something more descriptive.

Note: You can also add data frames in the Layout View. In many instances I find this easier. You can also copy/paste data frames. I find this technique especially helpful in many circumstances.

 Configure your maps to meet the following criteria:

Switch to layout view by clicking the layout view button, if you haven't already. Notice that you have two data frames on your layout this time. Selecting a data frame in the layout view also activates that data frame in the TOC. When you add legends, you need to have the appropriate data frame active.

Finally, "print" your layout to Adobe PDF and then submit it using the appropriate link on Canvas.

Get "excited!"

If you would like "extra" excitement, try mapping the Census data for Fullerton included in the "Extra" directory. (optional)


Last modified 09/07/2021