Supervised and Unsupervised Land Use Classification


Abstract

To make use of the large volume of digital data available from satellite imagery, the data must be processed into a form suitable for the end user. For many projects this processing includes categorizing the land by its various uses. There are two main approaches to this task: supervised and unsupervised classification. In supervised classification, the analyst guides the image processing software in deciding how to classify features by supplying a vector layer of training polygons. In unsupervised classification, the software groups the pixels largely on its own, generally producing more categories than the user is interested in. The user must then decide which categories can be merged into a single land use category. In either case, additional image processing may help determine which method works better for a given situation. It must be kept in mind that maps are simplified representations of what actually exists on the ground and are never completely accurate.


Introduction

The image used in this project was extracted from a Landsat 5 Thematic Mapper (TM) scene acquired over northeast Kansas on May 23, 1994. Ground resolution is 30 meters for the reflective bands (the thermal band, band 6, is recorded at 120 meters). Landsat TM records data in seven spectral bands covering portions of the visible, reflective infrared, and thermal infrared regions of the electromagnetic spectrum. Together these bands provide a great deal of information about land cover that can be displayed and analyzed.

Landsat 5 TM Band Descriptions
(Jensen 2000)
Band   Wavelength (µm)   Spectral Region
1      0.45 - 0.52       Visible Blue
2      0.52 - 0.60       Visible Green
3      0.63 - 0.69       Visible Red
4      0.76 - 0.90       Reflective Infrared
5      1.55 - 1.75       Mid-Infrared
6      10.40 - 12.50     Thermal Infrared
7      2.08 - 2.35       Mid-Infrared




This composite is a subset (window) of the Landsat scene, centered on Perry Reservoir in Jefferson County, northeastern Kansas, and shown in natural color.




This composite shows the same Landsat scene using bands 3, 4, and 5, displayed as blue, green, and red respectively. In this composite, vegetation appears bright green, while agricultural fields that still contain a good deal of bare soil appear pink. The scene also contains considerable grassland, which appears as a mixture of green and pink.
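The band-to-color assignment described above can be sketched in a few lines. This is a minimal illustration assuming each band is available as a flat list of digital numbers (DNs) of equal length; the linear min-max stretch, the function names, and the pixel values are hypothetical display choices, not Idrisi's own compositing routine.

```python
def linear_stretch(values):
    """Rescale a band's DNs linearly so its minimum maps to 0 and its maximum to 255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant band carries no contrast; display it as black.
        return [0 for _ in values]
    return [round((v - lo) * 255 / (hi - lo)) for v in values]


def false_color_composite(band3, band4, band5):
    """Build (R, G, B) tuples with band 5 as red, band 4 as green, band 3 as blue."""
    red = linear_stretch(band5)
    green = linear_stretch(band4)
    blue = linear_stretch(band3)
    return list(zip(red, green, blue))


# Hypothetical DNs for three pixels: vegetation, bare soil, water.
band3 = [30, 60, 20]
band4 = [120, 70, 15]
band5 = [80, 110, 5]
print(false_color_composite(band3, band4, band5))
```

With these made-up values the bare-soil pixel comes out as (255, 134, 255), a pink tone, and the vegetation pixel is dominated by green, consistent with the description of the composite.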



The second composite shows the land use patterns quite well. Much of the bare soil lies along the Kansas River in the bottom part of the image. Tilled fields appear throughout the image, with larger concentrations in the northwest corner. Numerous small lakes and ponds (dark blue) can be identified in the enlarged image, most of them in the grassland areas. These ponds are most likely located in pastures and used as watering holes by cattle. Other important land features include the forested areas and the dam and outflow of Perry Reservoir. The forest spreads outward from the reservoir along small drainages, while the man-made dam lies at the south end of the reservoir and releases water into the Kansas River. To put this image into a more usable form, several attempts were made to classify the land uses into separate categories.

Supervised Classification


The first attempt to classify the land uses was made with supervised classification techniques in Idrisi, a GIS and image processing package. In supervised classification, spectral signatures are developed from specified locations in the image. These locations, called training sites, are defined by the user, generally by digitizing a vector layer of polygons over the raster scene, with each polygon overlaying a known land use type. The image below shows the raster image seen earlier with several training sites outlined on top of it; Idrisi uses these sites to develop spectral signatures for the outlined areas. The land use categories of interest in this example are water, agriculture (bare soil), grassland, and forest. Multiple polygons are drawn for each category to help ensure that Idrisi has enough information to create representative signatures.
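A spectral signature of this kind is commonly summarized as a per-band mean vector and a band-to-band covariance matrix computed from the training pixels. The sketch below assumes the pixels inside each training polygon have already been extracted into lists of band values; the class names, pixel values, and function names are hypothetical, and this is not Idrisi's own signature-development code.

```python
def mean_vector(pixels):
    """Per-band mean of a list of equal-length pixel vectors."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]


def covariance_matrix(pixels, mean):
    """Sample covariance (divisor n - 1) between every pair of bands."""
    n = len(pixels)
    nb = len(mean)
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1) for j in range(nb)]
        for i in range(nb)
    ]


def build_signatures(training_pixels):
    """Map each class name to its (mean vector, covariance matrix) signature."""
    signatures = {}
    for name, pixels in training_pixels.items():
        if len(pixels) < 2:
            raise ValueError(f"class {name!r} needs at least two training pixels")
        mu = mean_vector(pixels)
        signatures[name] = (mu, covariance_matrix(pixels, mu))
    return signatures


# Hypothetical two-band (band 3, band 4) pixels taken from training polygons.
training = {
    "water": [(20, 10), (22, 12), (21, 14)],
    "forest": [(25, 90), (27, 95), (26, 100)],
}
sigs = build_signatures(training)
print(sigs["water"])
```

Drawing several polygons per class, as described above, matters here: the covariance estimate needs enough varied pixels to be stable.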





Once the training sites are defined, Idrisi uses them, together with the individual band images, to create spectral signatures for the specified areas. These signatures are then used to classify every pixel in the scene. The Landsat TM bands used were bands 1-5 and 7 (the thermal band 6 was excluded). Idrisi's supervised classification module offers two basic groups of classifiers: hard classifiers and soft classifiers. Hard classifiers assign each pixel in the scene to a single discrete category based on the training sites in the vector layer. For example, if four land use types were identified with the training sites, every pixel in the scene takes on one of those four values. Soft classifiers instead report the degree (for example, the probability) to which each pixel belongs to each category; in other words, they do not make a definitive decision about the pixel's category.
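The difference between the two groups of classifiers can be shown with a single pixel. In this minimal sketch, assumed for illustration, a soft result is a mapping from each class to a membership value summing to 1, and a hard result keeps only the class with the largest membership. The membership values are invented, and neither function reproduces a specific Idrisi soft classifier.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1 and can be read as probabilities."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def harden(memberships):
    """Turn one pixel's soft output (class -> membership) into a hard label."""
    return max(memberships, key=memberships.get)


# Hypothetical soft output for a single pixel near a forest/grassland boundary.
soft = normalize({"water": 0.0, "agriculture": 1.0, "grassland": 4.0, "forest": 5.0})
print(soft)          # every class keeps a share of the membership
print(harden(soft))  # a hard classifier keeps only the most likely class
```

The soft result preserves the information that this pixel is nearly as much grassland as forest; the hard result discards it.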


This image was created by supervised classification using the training sites outlined in the previous image. The hard classifier MAXLIKE was used to classify each pixel: MAXLIKE assigns each pixel to the class it has the maximum likelihood of belonging to, based on the class signatures. Other hard classifiers may use statistics such as pixel location and proximity to other features to help make classification decisions.
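The maximum likelihood rule can be sketched by treating each class signature as a multivariate normal distribution and choosing the class that gives the pixel the highest log-likelihood, g_i(x) = -0.5 ln|Σ_i| - 0.5 (x - μ_i)ᵀ Σ_i⁻¹ (x - μ_i). This minimal version assumes equal prior probabilities for all classes and signatures stored as (mean vector, covariance matrix); the function names and values are hypothetical, and Idrisi's MAXLIKE module may differ in details such as how priors are handled.

```python
import math


def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def log_likelihood(x, mean, cov):
    """Gaussian discriminant with the constant term dropped (equal priors assumed)."""
    L = cholesky(cov)
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L y = d, so y . y is the squared Mahalanobis distance.
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(d)))
    return -0.5 * log_det - 0.5 * sum(v * v for v in y)


def maxlike(x, signatures):
    """Assign pixel x to the class whose signature gives it the highest likelihood."""
    return max(signatures, key=lambda name: log_likelihood(x, *signatures[name]))


# Hypothetical two-band (band 3, band 4) signatures.
signatures = {
    "water": ([21.0, 12.0], [[1.0, 1.0], [1.0, 4.0]]),
    "forest": ([26.0, 95.0], [[1.0, 2.5], [2.5, 25.0]]),
}
print(maxlike((22, 13), signatures), maxlike((26, 97), signatures))
```

The covariance term is what distinguishes this rule from a simple minimum-distance classifier: a class with a wide spread of values can claim pixels that lie farther from its mean.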




This image shows a scatterplot of the area using TM bands 4 and 3. Signatures created from the supervised classification are outlined on this image.





The AREA module in Idrisi was then used to calculate the area of each land cover class. These figures are later compared with those from the unsupervised classification.


Area of each land use category as assigned by the MAXLIKE classifier in Idrisi.
Category   Hectares   Land Use
1          7225       Water
2          52377      Grassland
3          23583      Bare Soil
4          41891      Forest
-          125076     Total
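The hectare figures in the table follow from pixel counts: at 30-meter resolution each pixel covers 30 × 30 = 900 m², or 0.09 ha, so the 125076 ha total corresponds to roughly 1.39 million pixels. A minimal sketch of that conversion, assuming per-class pixel counts are already available (the counts below are invented, and this is not the AREA module's code):

```python
PIXEL_SIZE_M = 30        # Landsat TM ground resolution for the reflective bands
M2_PER_HECTARE = 10_000


def class_areas_ha(pixel_counts, pixel_size_m=PIXEL_SIZE_M):
    """Convert per-class pixel counts to hectares (one 30 m pixel = 900 m^2 = 0.09 ha)."""
    return {
        name: count * pixel_size_m ** 2 / M2_PER_HECTARE
        for name, count in pixel_counts.items()
    }


# Hypothetical pixel counts, not the counts behind the table above.
print(class_areas_ha({"water": 1000, "forest": 2500}))
```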



Zooming in on the dam and outlet area of Perry Reservoir shows that the classification contains some errors. Parts of the dam are classified as agricultural land, although this area is most likely covered by riprap or other non-vegetative materials. This is a common problem: non-vegetative surfaces such as rock and concrete have spectral reflectance properties similar to bare soil, so they were placed in the same category as bare soil.




Unsupervised Classification


The second attempt to classify the land uses in Idrisi used unsupervised classification techniques, which do not require the user to specify any information about the features contained in the image. This example used the ISOCLUST module in Idrisi. With ISOCLUST, the user simply specifies which bands to use and how many classes to group the land cover into. Again, Landsat TM bands 1-5 and 7 were used. The resulting image is shown below.
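ISOCLUST's internals are more involved than can be shown here, so as a stand-in this sketch uses plain k-means, a simpler relative of the iterative clustering idea: each pixel is assigned to the nearest cluster center in band space, the centers are recomputed, and the process repeats until the assignments stop changing. The seed centers and pixel values are hypothetical, and this is not Idrisi's ISOCLUST algorithm.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, seeds, max_iter=100):
    """Cluster pixel vectors around the given seed centers (plain k-means)."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each pixel to its nearest center in spectral space.
        new_labels = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in pixels
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of the pixels assigned to it.
        for c in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = [sum(band) / len(members) for band in zip(*members)]
    return labels, centers


# Hypothetical two-band pixels and two seed centers.
labels, centers = kmeans([(20, 10), (22, 12), (25, 90), (27, 95)], seeds=[(20, 10), (25, 90)])
print(labels, centers)
```

As in the project, the output is only a set of numbered clusters; nothing in the algorithm says which cluster is water or forest.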





At this point the image is difficult to interpret, and decisions must be made about which land cover type each category represents. Other materials and knowledge of the area are useful here. Ground truthing, that is, comparing what is seen in the digital image with what was actually present when the image was recorded, makes this task more efficient and more accurate. If such knowledge is not available, scientific reasoning may be used to group the categories into land use classes. Six land cover types were identified from the original 16 categories and are shown in the image below.
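Grouping the 16 clusters into six land cover types amounts to a lookup-table reclassification. In the sketch below, which cluster goes to which type is entirely hypothetical; in this project the assignment came from interpreting the image and knowledge of the area.

```python
# Hypothetical grouping of cluster numbers 1-16 into six land cover types.
GROUPS = {
    "Water": [1],
    "Aquatic Vegetation": [2],
    "Urban/Edge": [3, 4],
    "Grassland": [5, 6, 7, 8],
    "Bare Soil": [9, 10, 11],
    "Forest": [12, 13, 14, 15, 16],
}
# Invert the grouping into a cluster -> land cover lookup table.
CLUSTER_TO_COVER = {c: cover for cover, clusters in GROUPS.items() for c in clusters}


def reclassify(cluster_labels, lookup):
    """Replace each pixel's cluster number with its land cover name."""
    unknown = set(cluster_labels) - set(lookup)
    if unknown:
        raise KeyError(f"no land cover assigned to clusters {sorted(unknown)}")
    return [lookup[c] for c in cluster_labels]


print(reclassify([1, 5, 16, 3], CLUSTER_TO_COVER))
```

Raising an error for unassigned clusters makes it obvious when a cluster has been overlooked during interpretation.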





Area of each land use category as assigned by the ISOCLUST classifier in Idrisi.
Category   Hectares   Land Use
1          6167       Water
2          6410       Aquatic Vegetation
3          10946      Urban/Edge
4          32976      Grassland
5          29926      Bare Soil
6          38651      Forest
-          125076     Total



The image below shows a close-up of the dam and outflow area of Perry Reservoir. Again, some areas are classified as agricultural (bare soil) where agriculture is unlikely to be present. Similar examples can be found throughout the image.






Combined Classification Methods


The final approach combined several band ratios with other supervised and unsupervised classification techniques. The steps used are outlined below.

  1. Isolate water and remove from image.
  2. Isolate bare soil and remove from image.
  3. Classify remaining vegetation in image with water and soil removed.

Step 1. In the band 7/3 ratio image (an overlay of bands 7 and 3), water was found to have the lowest ratios. This was verified by viewing the histogram of the image and sampling numerous points. The image was then reclassified to separate the water from the rest of the scene.

Step 2. In the same band 7/3 ratio image, bare soil was found to have the highest ratios. This was again verified by viewing the histogram and sampling numerous points, and the image was reclassified to separate the bare soil from the rest of the scene.
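Steps 1 and 2 both threshold the same band 7/3 ratio image: water at the low end and bare soil at the high end. A minimal sketch, assuming the two bands are flat lists of DNs; the threshold values and function names are hypothetical (the project picked its cut-offs from the histogram and point sampling).

```python
def band_ratio(numerator, denominator):
    """Pixel-by-pixel ratio; pixels with a zero denominator get None."""
    return [n / d if d != 0 else None for n, d in zip(numerator, denominator)]


def split_by_ratio(ratio, water_max, soil_min):
    """Label low-ratio pixels as water, high-ratio pixels as bare soil, leave the rest unresolved."""
    labels = []
    for r in ratio:
        if r is None:
            labels.append(None)          # no valid ratio: leave unresolved
        elif r <= water_max:
            labels.append("water")
        elif r >= soil_min:
            labels.append("bare soil")
        else:
            labels.append(None)          # left for the vegetation step
    return labels


# Hypothetical DNs and thresholds.
band7 = [2, 60, 30]
band3 = [20, 40, 30]
print(split_by_ratio(band_ratio(band7, band3), water_max=0.2, soil_min=1.3))
```

The ratio works here because water absorbs strongly in band 7 (mid-infrared), while dry bare soil reflects strongly there.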

Step 3. With water and bare soil identified, the remaining areas of the image were overlaid with a band 4/3 ratio image, chosen because vegetation was now the focus of the classification. The resulting image contained vegetation only. Using the ISOCLUST module in Idrisi, 6 clusters were formed and then grouped into 2 classes, identified as forest and other vegetation.

Repeated sampling of pixel values and comparison with other images aided this process. The results of this final classification attempt are included in the comparison table below.
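The vegetation step can be sketched as: compute the band 4/3 ratio only for pixels not already labeled water or bare soil, cluster those ratios, and map the clusters to two classes. This is a minimal illustration under stated assumptions: plain one-dimensional k-means stands in for ISOCLUST, two seeds are used instead of the project's six to keep the example short, and which clusters count as forest (`forest_clusters`) is hypothetical.

```python
def kmeans_1d(values, seeds, max_iter=100):
    """One-dimensional k-means; returns a cluster index for each value."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def classify_vegetation(band3, band4, prior_labels, seeds, forest_clusters):
    """Cluster the 4/3 ratio of still-unlabeled pixels and group clusters into two classes."""
    # Only pixels left unresolved by the water and bare-soil steps (and with a valid ratio).
    idx = [i for i, lab in enumerate(prior_labels) if lab is None and band3[i] != 0]
    ratios = [band4[i] / band3[i] for i in idx]
    clusters = kmeans_1d(ratios, seeds)
    labels = list(prior_labels)
    for i, c in zip(idx, clusters):
        labels[i] = "forest" if c in forest_clusters else "other vegetation"
    return labels


# Hypothetical DNs; the first two pixels were already labeled in Steps 1 and 2.
print(classify_vegetation(
    band3=[20, 40, 10, 10, 10, 10],
    band4=[5, 50, 30, 32, 60, 65],
    prior_labels=["water", "bare soil", None, None, None, None],
    seeds=[3.0, 6.0],
    forest_clusters={1},
))
```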

Summary


Comparing the data from the classification attempts side by side shows clear differences in the land use areas. In addition, the unsupervised result contains extra land use classes. These appear because ISOCLUST created 16 clusters, some of which did not correspond to any of the classes used in the supervised example.

Land Classification Area Comparison.
Land Use                     Supervised (ha)   Unsupervised (ha)   Combined Methods (ha)
Water                        7225              6167                5667
Aquatic Vegetation           -                 6410                -
Urban/Edge                   -                 10946               -
Grassland/Other Vegetation   52377             32976               65938
Bare Soil                    23583             29926               26295
Forest                       41891             38651               25175
Total                        125076            125076              123075
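Because the combined-methods total (123075 ha) differs from the other two, raw hectares are not strictly comparable across columns; expressing each class as a percentage of its own method's total puts them on the same footing. A short sketch using the values from the comparison table (the Other Vegetation row holds the grassland figures for the supervised and unsupervised methods, so it is keyed here as Grassland/Other Vegetation; the dictionary layout and function name are illustrative):

```python
# Areas (ha) copied from the comparison table.
AREAS = {
    "Supervised": {"Water": 7225, "Grassland/Other Vegetation": 52377,
                   "Bare Soil": 23583, "Forest": 41891},
    "Unsupervised": {"Water": 6167, "Aquatic Vegetation": 6410, "Urban/Edge": 10946,
                     "Grassland/Other Vegetation": 32976, "Bare Soil": 29926, "Forest": 38651},
    "Combined": {"Water": 5667, "Grassland/Other Vegetation": 65938,
                 "Bare Soil": 26295, "Forest": 25175},
}


def percent_shares(areas):
    """Express each class as a percentage of that method's own total area."""
    total = sum(areas.values())
    return {name: 100 * ha / total for name, ha in areas.items()}


for method, areas in AREAS.items():
    shares = percent_shares(areas)
    print(method, sum(areas.values()), {k: round(v, 1) for k, v in shares.items()})
```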



Based on a visual comparison of the images produced by the different techniques, unsupervised classification appears to be the better choice in this example. Human error in digitizing training sites, lack of knowledge of the study area, and other factors all contribute to inaccurate results in the supervised classification. In any case, the resulting images are useful for some applications, such as estimating the relative extent of water bodies, agricultural land, and forested areas. If more accurate results are desired, specific land use patterns may be teased out through additional processing and detailed examination of the image and data. This requires more work, however, and may not produce results that better represent what is actually present in the field.

When using any classification technique, it is best to consult additional references on the study area rather than the satellite imagery alone. Without comparing the images to maps, aerial photographs, and actual visits to the study area, the features actually present cannot be confirmed. USGS Digital Line Graphs (DLG), which are line map data in digital form, would help isolate features such as asphalt and concrete. DLG layers for transportation and hydrography (flowing water, standing water, and wetlands) would further ease the job of classification. Also available from the USGS are Multi-Resolution Land Characteristics (MRLC) land cover maps. MRLC data are derived from Landsat 7 Enhanced Thematic Mapper Plus (ETM+) data. ETM+ has several advantages over previous Landsat sensors, including a 15-meter panchromatic band and a higher-resolution (60-meter) thermal band. These maps are available at a reasonable price and already have land use classified into 21 different classes.

References