Internet and the
World Wide Web

Geospatial Analysis

James S. Aber

Table of Contents
Origin of Internet
Technology of Internet
World Wide Web
RS/GIS on the Web
Searching the Web
Geospatial issues

Origin of Internet

The Internet is a network that connects local or regional computer networks (LAN or RAN). The Internet system has undergone explosive development during the past decade. For the public, business, and society at large, the Internet came to fruition in 1995 (Castells 2001). In that year, there were about 16 million users worldwide (note 1). By 2001, Internet usage had increased to 400 million, and it exceeded one billion by 2005, a 60-fold increase in only a decade, and it continues to expand rapidly today. However, the origins of Internet go back at least five decades.

What became Internet was conceived originally as a U.S. military "fail-safe" network that would continue to function even though it might become severely damaged during an atomic war. It was initially developed by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense. This first network was based on the revolutionary communications technique called packet switching. In the early 1960s, the concept of packet switching was developed and tested by Leonard Kleinrock at MIT, and the first message was sent between two computers over the ARPANET in 1969 (Norman 2005). In this approach, a digital file is broken down into packets, which are separately routed from the source to the destination, where they are reassembled. The movement of packets is controlled by routers that forward packets through the system.
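The packet-switching idea described above can be sketched in a few lines of Python. This is a toy illustration, not a network protocol: a message is cut into numbered packets, the packets may arrive in any order, and the destination reassembles them by sequence number.

```python
import random

def to_packets(message: bytes, size: int = 4):
    """Split a message into numbered packets (sequence, payload)."""
    return [(i, message[i:i + size]) for i in range(0, len(message), size)]

def reassemble(packets):
    """Reorder packets by sequence number and rejoin the payloads."""
    return b"".join(payload for _, payload in sorted(packets))

msg = b"packet switching demo"
packets = to_packets(msg)
random.shuffle(packets)          # packets may take different routes and arrive out of order
assert reassemble(packets) == msg
```

Because each packet carries its own sequence number, no single route (and no single router) is critical to delivery, which is exactly the resilience the ARPANET designers were after.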

Ironically, the concept of packet switching was rejected at first by the Pentagon for military purposes and was ignored by telephone companies. In fact, AT&T rejected an offer to privatize the network in 1972 (Castells 2001). ARPANET was promoted and developed initially by computer scientists working at research institutions and universities. It became international in 1973 when connections were established with England and Norway. The goal of many researchers was to create a new kind of computer-based, digital communications network that was built for non-military and non-commercial reasons. ARPANET became operational in 1975 when it was transferred to the Defense Communication Agency. An uneasy alliance between military and research users led to separation of the network into MILNET (military) and ARPA-INTERNET (research) in 1983, and the National Science Foundation network (NSFNET) was created in 1984.

The number of Internet host computers grew quite slowly at first, reaching only about 10,000 in 1987 (Branscomb 2003). ARPANET had become operationally obsolete by 1990 and was decommissioned. NSF quickly began privatizing the Internet, a process completed by 1995. Rapid growth followed, and today there are hundreds of millions of Internet host computers around the world.

Technological basis of the Internet

Internet is based on packet switching, a concept developed independently by Kleinrock (MIT), Baran (California), and Davies (United Kingdom) in the 1960s. The packets travel over communication cables and optical fibers, which are operated nowadays mainly by telephone companies. The modern Internet functions on three basic principles inherited from the original ARPANET (Castells 2001).

  1. Decentralized network structure, in which there is no single "headquarters" that controls the whole system.

  2. Distributed computing power throughout many nodes of the network.

  3. Redundancy of control and functions of the network to minimize risk of disruption in service.

LANs and RANs connected to Internet generally use a set of communication protocols called TCP/IP (Transmission Control Protocol/Internet Protocol). TCP was devised in 1973 by computer scientists Cerf, Lelann, and Metcalfe, and IP was added by Cerf, Postel and Crocker in 1978. The preferred computer operating system was UNIX and later Linux, along with the Apache web-server software. All these innovations took place in the tradition of the open source movement of the 1970s and '80s, which fostered experimentation and rapid dissemination of ideas (Castells 2001; Norman 2005).

A decision was made early on, in the 1970s, to make the Net "stupid." In other words, the only function of the Internet would be to transfer files; applications, encryption, searching, and other functions would be left to the computers connected to the Internet (Abelson 2008). This allowed great flexibility and freedom to innovate devices and applications that could not be imagined at the time. However, this stupidity also left the Internet vulnerable to unscrupulous users, a problem that has become quite challenging.

Each active device (computer) within a network connected to Internet is assigned a unique numerical IP address. Server addresses usually also have text codes (domain names) that are more familiar to Internet users. A name server is a computer that, like a telephone book, matches text and numeric codes for identifying Internet sites. Originally developed as a U.S. system, the text codes had endings that marked the nature of a site, such as: edu, com, net, mil, gov, and org.
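A name server's "telephone book" role can be illustrated with a toy lookup table. The host names below are hypothetical and the addresses come from the reserved documentation range (192.0.2.0/24); real DNS distributes this table across many cooperating servers worldwide.

```python
# Hypothetical name table; real DNS spreads these records across many servers.
NAME_TABLE = {
    "www.example.edu": "192.0.2.10",
    "maps.example.gov": "192.0.2.20",
}

def resolve(hostname: str) -> str:
    """Look up the numeric IP address that matches a text host name."""
    try:
        return NAME_TABLE[hostname]
    except KeyError:
        raise LookupError(f"no address record for {hostname}")

print(resolve("www.example.edu"))   # 192.0.2.10
```

The browser works with the text name; the network routes packets using only the numeric address the name server returns.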

One of the most difficult problems for globalization of Internet was agreement on an international standard for computer communication. The European Union favored a scheme linking national networks that was not directly compatible with TCP/IP. Eventually the flexibility and openness of TCP/IP prevailed as the common standard for the global Internet. The IP naming scheme now includes country and state codes, along with additional use codes.

Internet provides a large and rapidly increasing variety of resources and services—software, data archives, library catalogs, bulletin boards, directory services, etc. Among the most popular functions of the Internet is electronic mail. Conceived in 1970 by Tomlinson, e-mail is still the most widely used application online. Other major applications are Telnet (remote login), file transfer (FTP), and the World Wide Web.

World Wide Web

The World Wide Web (WWW) is among the most exciting and rapidly growing developments on Internet. The Web may be described as the "universe of network-accessible information, an embodiment of human knowledge." (Hayes 1994) In other words, the Web is the assemblage of all information available anywhere within the Internet system.

The vast dimensions of the WWW are surprisingly easy to navigate by using software called a browser. A browser travels the Internet to retrieve a document or image. Within the document, words, phrases, or graphic elements are highlighted. Clicking (with a mouse) on a highlighted feature takes the user to another document, and so on—a hypermedia system. With each hypermedia link, the user moves effortlessly within the Internet system. For the average user, geography disappears and so do network topology and site names. The Web is a seamless, transparent interface to the entire network.

The World Wide Web was invented by Berners-Lee, an Englishman working at CERN, the European Laboratory for Particle Physics in Geneva, Switzerland, in 1990. Web function was enhanced dramatically in 1993 by Andreessen and Bina at the U.S. National Center for Supercomputing Applications. They included an advanced graphics capability in Mosaic, the first hypermedia browser to attain widespread use. In 1994, the World Wide Web Consortium was formed to lead development of the web on a global and vendor-neutral basis.

The technological basis for the Web is HyperText Transfer Protocol (HTTP), which determines communication between a browser (client) and a Web server computer. At a higher level, HyperText Markup Language (HTML) is the notation for writing documents and creating links on the Web. The locations of files and other resources on remote computers are identified by a Uniform Resource Locator (URL).
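The general form of a URL can be seen by parsing one with Python's standard library. The address below is hypothetical, used only to illustrate the parts of a locator.

```python
from urllib.parse import urlparse

# Hypothetical URL, used only to show the general structure of a locator.
url = "http://www.example.edu/geospatial/index.html"
parts = urlparse(url)

print(parts.scheme)   # http -> the protocol the browser should use
print(parts.netloc)   # www.example.edu -> the server (resolved by a name server)
print(parts.path)     # /geospatial/index.html -> the resource on that server
```

Together the three parts tell the browser how to talk (HTTP), whom to talk to (the server), and what to ask for (the file).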

The user need not understand or be aware of HTTP, HTML, or even URLs. These are built into the infrastructure of the Web and are not normally visible. Client software is the visible window to the Web. The huge popularity of the Web results from the availability of browsers, which allow the user to move through the Web with text plus graphic, video and audio capability. Netscape became the most popular commercial browser in the 1990s, developed by the same people who created Mosaic. However, Microsoft Internet Explorer has attained market dominance in recent years.

Growth of the Web was phenomenal during the 1990s. According to various estimates, the Web doubled in size every nine to 18 months (Gibbs 1996; Hayes 1997), and its growth continued without pause in the first decade of the 21st century. Many scientists and philosophers believe Internet communication represents the beginning of a major revolution in human society—an advance comparable in magnitude to the invention of printing with movable type by Gutenberg in the 15th century.

Development of the Internet is a collective phenomenon, diverse and dynamic in nature, and not controlled by any one company, country, or segment of society. As such, it does represent a fundamental shift in the way people communicate and share information. It is a proven lesson from the history of technology that users are key producers of the technology, by adapting it to their uses and values, and ultimately transforming the technology itself (Castells 2001, p. 28).

RS/GIS on the Web

Remote sensing and GIS-related subjects are well represented on the Web for several reasons. Much RS/GIS data exist in computer-compatible format, and many users are familiar and comfortable with working in a networked, computer-based environment. The subject matter is image rich, which lends itself to what the Web does best—delivering text and graphic imagery. The Internet also has the ability to move large files quickly (FTP)—a great advantage for transferring GIS databases from one location to another. And lastly, the Web offers user interaction, so that a distant user can access, manipulate, and display geographic databases from a GIS server computer.

Searching the Web for Geospatial Data

The Web continues to grow rapidly, and so does the volume of information. The explosion of information seems daunting for many people, old and young alike, and this is not a new problem. The Roman philosopher Seneca and the seventeenth-century French scholar Adrien Baillet both warned about the rapid increase of information in their days. Those concerns seem trivial in the early twenty-first century. We have nearly instantaneous access to huge libraries and datasets via computers, smartphones, and other devices. However, as Grunwald (2014) emphasized, information is neither knowledge, nor is it wisdom.

The Web is filled with "click bait" to distract, divert, and grab your attention. Fortunately, Web search engines have become more sophisticated and specialized in organizing and seeking out selected information (Pethokoukis 2003). Billions of searches are executed daily on the World Wide Web for everything from the latest medical research to local restaurant menus.

One of the early well-known Web directories was Yahoo. Originally it was an indexed listing created by human entry of web sites, but this approach has long since been replaced by automated procedures. Search strategies were developed in which web links are analyzed in combination with key words or phrases (Chakrabarti et al. 1999). Based on link analysis, webpages can be classified as authorities, to which other webpages refer, or as hubs, which link to many other webpages. Within the past few years, the Google search engine has risen to dominance on the information superhighway. This happened because Google offered two key advantages (Mostafa 2005).

  1. It could handle web crawling on a vast scale. Software crawlers probe and retrieve text from billions of online webpages.
  2. Its indexing and weighting strategy gave superior results. Link analysis is used to refine initial results of text searching.
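The hub-and-authority classification described above can be sketched as a simple mutual-scoring iteration over a tiny hypothetical web graph. This follows the HITS style of link analysis (Chakrabarti et al. 1999); Google's actual ranking methods are proprietary and differ in detail.

```python
# links: page -> pages it points to (a tiny hypothetical web)
links = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": [],
    "D": ["C"],
}

hubs = {p: 1.0 for p in links}   # good hubs link to good authorities
auths = {p: 1.0 for p in links}  # good authorities are linked from good hubs

for _ in range(20):  # repeat until the scores settle
    # authority score: sum of hub scores of pages that link to it
    auths = {p: sum(hubs[q] for q in links if p in links[q]) for p in links}
    # hub score: sum of authority scores of the pages it links to
    hubs = {p: sum(auths[q] for q in links[p]) for p in links}
    # normalize so the scores stay bounded
    na, nh = sum(auths.values()), sum(hubs.values())
    auths = {p: s / na for p, s in auths.items()}
    hubs = {p: s / nh for p, s in hubs.items()}

best = max(auths, key=auths.get)
print(best)   # C -- the strongest authority, since every other page links to it
```

Text matching finds candidate pages; a link-based score like this then pushes the pages the web itself "votes for" to the top of the results.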

The success of Google spawned expansion into geospatial data and imagery such as Google Maps and Google Earth. These ventures proved so hugely popular that Google has moved into the professional marketplace for geospatial analysis—see Google Earth Pro—and now is incorporated into field geologic mapping and other survey methods (Whitmeyer et al. 2010). Much of Google is in the public domain for anyone to use; however, its core methods and proprietary techniques remain closely guarded trade secrets. Google's decision to pull out of mainland China early in 2010 was, in fact, as much about protecting itself from sophisticated hacking as about Chinese censorship.

Google has pervaded society to such an extent, in fact, that it is changing how we think. In the past, information was distributed in a so-called transactive memory system developed through face-to-face interactions (Wegner and Ward 2013). When a person wanted to find out something, he or she ordinarily went to friends, family, and acquaintances or to professional specialists, such as doctors, lawyers, engineers, and librarians. That model is being replaced rapidly by the Internet, which we view as another transactive memory partner. In face-to-face transactions, we may know to take the advice of "Uncle Joe" with a grain of salt. However, that caution often does not extend to the anonymous nature of the Internet. Quality control and evaluation of sources are just as important online as they are in person.

In general, federal and state governmental sources are most reliable for geospatial information, such as the U.S. Geological Survey, NASA, NOAA, BLM, etc. A good source for all types of geographic information is the USGS Geographic Names Information System—see GNIS. For example, enter Emporia for the "feature name," Kansas, Lyon County, and highlight "populated place" under feature class. Click on "send query," and basic information will be returned about Emporia. Click on "Emporia" again, and links are provided for many other geographic information services.

Geospatial issues

In spite of these advances, much of the Web remains difficult or impossible to search. Many webpages are stored in non-text format, for example GIS map servers and datasets that allow user interactive query and map construction. Much geographic information is given in quite general terms, such as "60 miles southwest of Wichita," which makes actually locating sites nearly impossible. Searching for images or audio files with specific visual or sound content is likewise difficult without text metadata, although a number of promising techniques are under development (Mostafa 2005).

The Internet and World Wide Web have brought enormous resources and possibilities into the hands of billions of people, and it seems like all known information and answers are readily available. But, of course, we do not have all the answers. Nor does anybody have the means to understand the huge and rapidly increasing volume of available geospatial data. There is a danger of losing serendipity, in other words, the accidental discoveries made through seemingly random connections that only human minds can imagine. Serendipity leads to unexpected scientific understanding. As Norwegian geologist Jan Mangerud, one of the foremost Quaternary scientists of the late twentieth century, stated in 2001:

Surprising [scientific] conclusions are the most interesting.

The term cyberspace is closely related to the Internet. Its root, cybernetics, was coined in 1948 by Norbert Wiener of MIT; cyberspace refers to the collection of interconnected electronic and digital technologies that enable control and communications of all systems underpinning modern life (Elazari 2015). Cyberspace includes most commercial, military, and aerospace technologies ranging from medical devices to smart phones to GPS. Every device connected via Internet is part of cyberspace, and the number of networked devices is expected to reach 50 billion by 2020. However, cyberspace is not a public commons (Elazari 2015). It is dispersed among networks that are owned by large multinational companies that built it and run it for profit. The public and individual governments have limited control over its operation or security.

So-called cloud computing has become popular lately. People increasingly access software (apps) and datasets online as well as store their personal information and data online, especially for mobile devices that have limited data storage and computing capabilities. This is another extension of basic Internet technology. Large tech companies encourage users to link their devices and sync their personal information via wireless connections. While this may be quite convenient, it means that commercial enterprise now controls much of a person's private data and information.

The big computer companies are quietly, slowly forcing us to entrust our life's
data to them. That's a scary and dangerous development.
(Pogue 2014)

Note 1: In this same year (1995), your instructor began creating webpages and designing online curriculum. The first web-based, distance-learning course was offered by the ESU earth science department in spring semester 1996.

© J.S. Aber (2016).