John P. Slone
The World Wide Web gives users almost instant access to hyperlinked documents, images, and graphics from sites around the globe. The key technologies used to define, deploy, and retrieve Web-based information are discussed in this chapter.
The World Wide Web has been described as the world's most successful client/server application. The WWW, also known to its supporters as the Web, is a tool for retrieving network-accessible information residing on servers all over the world.
The WWW can best be exploited when it is accessed through LAN-attached workstations and when it is used to access information held by LAN-attached servers. Similarly, Web technology can be used to enhance the usefulness of a local area network, whether or not the LAN is connected to the Internet.
This chapter gives an overview of the World Wide Web, including its fundamental concepts, the protocols and specifications that make Web hyperlinks possible, and the software requirements for accessing Web-related information.
The wealth of information available on the WWW is far greater than what can be contained within a single computer, which creates the need to link information from different sources. That linkage mechanism is known as hypermedia. Hypermedia is an extension of the simpler concept known as hypertext, or linked text.
Exhibit 1. Examples of Text Linkages.
Linked text is nothing new. Text linkages have existed for centuries in such written forms as footnotes, bibliographies, and tables of contents. Hypertext is simply a mechanism for providing these links in a computerized environment.
A familiar example to most end users is the set of Help screens provided with Windows applications. When a user selects Help, a list of topics is presented. The user then selects a topic by clicking on a word or phrase. Within the resulting text, certain words or phrases are highlighted, indicating that more information related to the highlighted text is available for further exploration. Hyperlinks on the Web work the same way: clicking a highlighted entry takes the user to another document, which may reside on a different Web site. Exhibit 1 illustrates how text linkages flow between documents.
In the WWW, links may point to information objects such as pictures, movies, sound clips, or practically anything else that can be represented digitally. Similarly, links may be established between nontext objects and other objects. For example, it is not uncommon to find links from speaker icons to sound clips, or from small pictures to large pictures.
Hyperlinks in WWW documents often extend around the globe. A single document may contain links to documents in several different countries. It is, in fact, the expansion of these links from document to document on a global scale that enhances the popularity of the Web.
The original designers of the Web recognized that for a global hypermedia system to be effective, it had to be possible for users to execute hypertext links in as little as 100 milliseconds. The World Wide Web accomplishes global hypermedia linking through the Hypertext Transfer Protocol (HTTP), which is used for information sharing. In its simplest form, an HTTP transaction consists of four phases: connection, request, response, and close.
TCP/IP is the most widely used protocol set for internetwork communications. On a TCP/IP network, a TCP/IP connection is established between the client application and the server. A connection request is sent from the client to the server, followed by the return of a connection acknowledgment.
Upon receipt of the connection acknowledgment, the client issues a request. The request, which is an ASCII string, consists of the word GET followed by a space and the address of the requested document. The convention used for the document address is called a uniform resource locator (URL). The response is then returned by the server in the form of an HTML document.
When the server has returned the entire response, it closes the process by breaking the TCP/IP connection. Optionally, the client may abort the transfer before this point by breaking the connection, in which case the server will not indicate that an error has occurred. This protocol exchange is illustrated as a time-flow diagram in Exhibit 2.
Exhibit 2. HTTP Time-Flow Diagram.
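The four phases described above can be sketched in a few lines of Python. This is only an illustration of the simple exchange this chapter describes, not of how current HTTP versions frame their messages. The function names `serve_once` and `fetch`, the document address `/index.html`, and the page content are all assumptions made for the example; a toy server runs on the local machine so that the sketch is self-contained.

```python
import socket
import threading


def serve_once(listener, documents):
    """Toy server: accept one connection, answer one GET, then close."""
    conn, _ = listener.accept()                      # phase 1: connection
    with conn, conn.makefile("rb") as reader:
        line = reader.readline().decode("ascii")     # phase 2: request
        method, _, address = line.strip().partition(" ")
        if method == "GET" and address in documents:
            body = documents[address]
        else:
            # Errors are ordinary HTML text, as the chapter notes.
            body = b"<html><body>Error: document not found</body></html>"
        conn.sendall(body)                           # phase 3: response
    # Leaving the with-block breaks the connection.  # phase 4: close


def fetch(host, port, address):
    """Toy client: send 'GET <address>' and read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall("GET {}\r\n".format(address).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # an empty read means the server closed
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical document store and a listener on any free local port.
docs = {"/index.html": b"<html><body>Hello</body></html>"}
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener, docs))
worker.start()
page = fetch("127.0.0.1", port, "/index.html")
worker.join()
listener.close()

print(page.decode("ascii"))  # <html><body>Hello</body></html>
```

Note that the client learns the response is complete only because the server breaks the connection, which is why neither side needs to keep state between transactions.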
Web clients are considered well-behaved if they read the response as rapidly as possible without requiring human intervention. Error messages are sent as text in HTML syntax and can only be distinguished from normal messages by their human-readable content. There are no provisions within HTTP for error control or recovery, nor are there any provisions for flow control.
These two facets of the protocol, lack of error control and lack of flow control, allow the protocol to operate efficiently and without the need for either entity to maintain state information. Furthermore, each HTTP transaction is an independent action, unrelated to any transactions that occurred before it or that will occur subsequently.
Extensions to HTTP continue to be developed. Each extension is to be backward compatible with basic HTTP. For example, the capability to perform a search or to allow fill-in forms is accomplished by embedding the variables within the address field.
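As a rough illustration of embedding variables within the address field, the following Python sketch builds an address from a set of form fields and then recovers them. The base address `http://www.example.com/search` and the field names `topic` and `max` are hypothetical; the encoding shown is the one Python's standard `urllib.parse` module applies, which is assumed here to be representative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical search service and form fields.
base = "http://www.example.com/search"
fields = {"topic": "hypertext links", "max": "10"}

# The variables are appended to the address after a question mark.
address = base + "?" + urlencode(fields)
print(address)  # http://www.example.com/search?topic=hypertext+links&max=10

# A server can recover the variables from the address alone.
recovered = parse_qs(urlsplit(address).query)
print(recovered)  # {'topic': ['hypertext links'], 'max': ['10']}
```

Because the variables travel inside the address, the request is still a plain GET and remains compatible with the basic exchange.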
Exhibit 3. Universal Resource Identifier (URI) Reserved Characters.

Reserved Character | Meaning |
---|---|
Percent sign (%) | Used as an escape sequence identifier. The two characters that follow must represent the hexadecimal value of the character represented by the sequence. |
Slash (/) | Used to identify hierarchical relationships among the path components. Significance is left to right, with entities to the left representing elements closer to the root. This allows the construction of partial URIs representing relative paths. |
Hash mark (#) | Used to separate the URI of an object from a fragment identifier that represents a specific portion of that object. |
Question mark (?) | Used to separate the URI of an object that can be queried (e.g., a database) from the elements that make up a query applied to the object. The complete URI stands for the object that is returned as a result of the query. |
Asterisk (*) and exclamation point (!) | Reserved for definition as reserved characters within specific schemes. |
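The roles of the slash, hash mark, and question mark can be seen by taking an address apart with Python's standard `urllib.parse` module. The address below is invented for the example, and the sketch assumes that the module's behavior is a fair illustration of the rules in the exhibit.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical address containing a query and a fragment identifier.
full = "http://www.example.com/docs/guide/intro.html?lang=en#history"

parts = urlsplit(full)
print(parts.path)      # /docs/guide/intro.html  (hierarchy, root at the left)
print(parts.query)     # lang=en                 (follows the question mark)
print(parts.fragment)  # history                 (follows the hash mark)

# Partial URIs are resolved relative to the document that contains them.
base = "http://www.example.com/docs/guide/intro.html"
sibling = urljoin(base, "links.html")
parent = urljoin(base, "../faq.html")
print(sibling)  # http://www.example.com/docs/guide/links.html
print(parent)   # http://www.example.com/docs/faq.html
```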
The WWW uses an addressing structure known as a URL, an instantiation of the generic concept of a URI. URIs globally identify resources that exist within the scope of any registered name or address space, now or in the future.
Despite its impressive capability, the URI concept is remarkably simple. For any given name or address that exists within a naming or addressing scheme, a URI can be created by a simple encapsulation method that tags the address with the scheme's registered identifier. The structure of the resulting URI is as follows: prefix:path.
The prefix is an arbitrary character string registered as the identifier of the particular naming or addressing scheme. The colon is simply a delimiter, and the path follows the convention of the name or address scheme identified by the prefix.
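The encapsulation is simple enough to sketch directly. The helper names `make_uri` and `split_uri` below are hypothetical, as are the example addresses; the only rule the sketch relies on is that the first colon separates the prefix from the path.

```python
def make_uri(prefix, path):
    """Tag a scheme-specific name or address with its scheme identifier."""
    return "{}:{}".format(prefix, path)


def split_uri(uri):
    """Split a URI at the first colon into (prefix, path)."""
    prefix, colon, path = uri.partition(":")
    if not colon or not prefix:
        raise ValueError("not of the form prefix:path: {!r}".format(uri))
    return prefix, path


web = make_uri("http", "//www.example.com/index.html")
print(web)                                      # http://www.example.com/index.html
print(split_uri("mailto:someone@example.com"))  # ('mailto', 'someone@example.com')

# Only the first colon is the delimiter; later colons belong to the path.
print(split_uri("http://www.example.com:8080/"))  # ('http', '//www.example.com:8080/')
```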
To ensure effective operation across diverse systems and globally interconnected networks, URIs have been defined more precisely to identify reserved characters and escape sequences for handling reserved characters. Reserved characters are identified in Exhibit 3.
Any reserved characters that appear within the name or address of an object must be encoded as escape sequences within a URI. In addition, certain characters are considered unsafe characters, and it is recommended that they be encoded to ensure consistent operation across a wide range of systems and networks. Unsafe characters include white-space characters, control characters, and language-specific characters, among others. Once all reserved characters and unsafe characters have been properly encoded and the string has been tagged with the appropriate scheme identifier prefix, the resulting string is referred to as the canonical form of a URI.
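The escape mechanism can be illustrated with Python's standard `quote` and `unquote` functions. The file name is invented for the example. Note the assumptions: `quote` follows a later URI specification than this chapter describes, so the exact set of characters it escapes may differ in detail from Exhibit 3, and it encodes language-specific characters as UTF-8 bytes by default.

```python
from urllib.parse import quote, unquote

# An escape sequence is a percent sign followed by two hexadecimal digits.
space_escape = "%{:02X}".format(ord(" "))
print(space_escape)  # %20

# Hypothetical document name containing unsafe and reserved characters.
name = "annual report 100%.txt"
encoded = quote(name)
print(encoded)           # annual%20report%20100%25.txt
print(unquote(encoded))  # annual report 100%.txt

# Reserved characters inside a name are escaped so they lose their
# special meaning; the slash is left alone by default as a path separator.
print(quote("what is #1?"))  # what%20is%20%231%3F
print(quote("docs/a b"))     # docs/a%20b
```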