

URL Path Structure

A number of formal syntax and semantics specifications have been defined for URLs, and more types continue to be defined. Types defined for Internet protocols include HTTP, Gopher, and FTP.

Most URLs for objects found on the Internet share a common structure. As with any URI, the structure consists of a scheme identifier prefix followed by a path. The path is subdivided such that the first part represents information specific to the Internet protocol.

The Internet protocol part is identified by a leading double slash (//) and terminates at the next slash (/). This part of the path contains at least one element and may contain up to four.

The element that is required is the host’s Internet domain name or IP address. (Users will find that most people use the domain name.) A second element that is fairly common in practice is the host’s port number. If it is not present, the default port for the protocol specified by the scheme identifier is assumed. For example, port 80 is assumed for HTTP. If the port number is present, it follows the host name, separated by a colon.

The other two elements, which are used much less frequently, are the user name and password. If present, these precede the host name and are separated from the host name by the commercial “at” sign (@). If both elements are present, the user name comes first, and the two are separated by a colon.
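The rules above for the Internet protocol part can be sketched in a few lines of Python. This is a minimal illustration, not a complete URL parser: the function name `split_protocol_part` and the FTP URL used to exercise it are hypothetical, and cases such as bracketed IPv6 addresses are deliberately ignored.

```python
def split_protocol_part(url):
    """Split the part between '//' and the next '/' into its elements."""
    # The scheme identifier terminates at the first colon.
    scheme, _, rest = url.partition(":")
    if not rest.startswith("//"):
        raise ValueError("URL has no Internet protocol part")
    # The protocol part runs from the double slash to the next slash.
    protocol_part, _, _path = rest[2:].partition("/")

    # Optional user name and password precede the host, separated by '@'.
    user = password = None
    if "@" in protocol_part:
        credentials, _, protocol_part = protocol_part.rpartition("@")
        user, colon, password = credentials.partition(":")
        if not colon:
            password = None  # user name given without a password

    # An optional port follows the host, separated by a colon.
    host, _, port = protocol_part.partition(":")
    return {
        "scheme": scheme,
        "user": user,
        "password": password,
        "host": host,
        "port": int(port) if port else None,
    }


# Hypothetical URL that uses all four elements.
parts = split_protocol_part("ftp://guest:secret@ftp.example.com:2121/pub/readme.txt")
print(parts)
```

A port of `None` in the result means that no port was given, in which case the default for the scheme applies.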

The remainder of the path structure uses a hierarchical scheme, with elements (presumably directories or some logical equivalent) separated by slashes. An example of a URL path is as follows:

http://www.ncsa.uiuc.edu/SDG/Software/WinMosaic/HomePage.html

The individual components of the URL are read from left to right. First, note that the scheme identifier, which terminates at the first colon, indicates that HTTP is the protocol that will be used to access the object in question. The part that appears between the double slash and the next occurrence of a slash tells the user that the host on the Internet is known by the name of www.ncsa.uiuc.edu. Because this is an HTTP URL, user names and passwords are not supported, so including them would be meaningless (and might produce an error). Because no port is identified, the default port of 80 will be used. As such, it would have been equally valid for the URL to be:

http://www.ncsa.uiuc.edu:80/SDG/Software/WinMosaic/HomePage.html

In this case, though, it would be superfluous to include the port. Ports are typically specified when the server is listening on a nondefault port, as is commonly the case when, for example, it is desirable to operate a server on a nonprivileged port (1024 or above).

The remainder of the URL breaks down simply as a hierarchical file structure. Within the root context of the server is a directory called SDG, which in turn has a directory called Software, which has a directory called WinMosaic. Under that directory is a file called HomePage.html, which is the object of interest.
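The same breakdown can be reproduced with Python's standard `urllib.parse` module. The sketch below is illustrative only: the helper name `describe` and the `DEFAULT_PORTS` table are assumptions introduced here, and only the HTTP default of 80 comes from the text (the FTP and Gopher values are the well-known defaults for those protocols).

```python
from urllib.parse import urlsplit

# Default ports by scheme. Only the HTTP entry is stated in the text;
# the FTP and Gopher entries are added for illustration.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe(url):
    """Break a URL into scheme, host, effective port, directories, and object."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # The hierarchical part is a list of elements separated by slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "directories": segments[:-1],
        "object": segments[-1] if segments else None,
    }


implicit = describe("http://www.ncsa.uiuc.edu/SDG/Software/WinMosaic/HomePage.html")
explicit = describe("http://www.ncsa.uiuc.edu:80/SDG/Software/WinMosaic/HomePage.html")
print(implicit)
print(implicit == explicit)  # both forms of the URL describe the same object
```

Resolving the port before comparing is what makes the two spellings of the URL equivalent here; comparing the raw strings would treat them as different.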

Other Internet URL types (e.g., Gopher and FTP) have similar structures up to the end of the Internet protocol part. An example of a Whois gateway that operates on a Gopher server at MIT is as follows:

gopher://sipb.mit.edu:70/1B%3aInternet%20whois%20servers

The part of the path following the single slash is radically different from what is seen in an HTTP URL. This allows the specification of a path understandable by a Gopher server and serves to illustrate the flexibility of the URL concept.
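As a rough illustration, the Gopher URL above can be taken apart with Python's `urllib.parse`. The sequences `%3a` and `%20` are percent-encoded forms of a colon and a space. Treating the first character of the decoded path as a Gopher item type, followed by a selector string, reflects the usual Gopher URL convention and is an assumption not stated in the text; the function name `split_gopher_url` is likewise hypothetical.

```python
from urllib.parse import urlsplit, unquote


def split_gopher_url(url):
    """Return host, port, item type, and selector for a Gopher URL."""
    parts = urlsplit(url)
    # Drop the leading slash, then decode %XX escapes such as %3a and %20.
    decoded = unquote(parts.path[1:])
    # Assumption: the first character is the Gopher item type and the
    # remainder is the selector string passed to the server.
    return {
        "host": parts.hostname,
        "port": parts.port,
        "item_type": decoded[:1],
        "selector": decoded[1:],
    }


gopher = split_gopher_url("gopher://sipb.mit.edu:70/1B%3aInternet%20whois%20servers")
print(gopher)
```

The Internet protocol part is handled exactly as for an HTTP URL; only the interpretation of what follows the single slash differs.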

HYPERTEXT MARKUP LANGUAGE

Documents returned in HTTP responses are formatted in the HTML syntax. HTML is an application of the more familiar SGML (Standard Generalized Markup Language). Documents written in HTML are simple text files with special tags that specify formatting directions.

Tags in HTML are delineated by a pair of angle brackets (<>). Some tags appear in opening/closing pairs. In this case, the closing tag usually has a slash immediately following the left bracket. Exhibit 4 shows tags commonly found in Web documents.
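To make the tag structure concrete, the following sketch uses Python's standard `html.parser` module to list the opening and closing tags in a short fragment. The class name `TagLister` is hypothetical, and the sample text is assembled from the examples in Exhibit 4; it is meant only to show paired and unpaired tags, not to validate HTML.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record each opening and closing tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for a tag such as <h2>; the parser reports names in lowercase.
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        # Called for a closing tag such as </h2>, marked by the slash.
        self.events.append(("close", tag))


sample = (
    "<title>This is a Document Title</title>"
    "<h2>Size 2 Header</h2>"
    "...end of paragraph<p>Start of next paragraph..."
)

lister = TagLister()
lister.feed(sample)
lister.close()
print(lister.events)
```

In this fragment the title and header tags appear as opening/closing pairs, while the paragraph tag appears on its own.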

HTML is specified in three versions, two of which have been finalized. Version 1 is the basic, minimal subset required to be understood by all Web clients. It provides basic document formatting, hypermedia linking, and embedded images.

Version 2 also supports inline forms, providing fill-in form capabilities that include text input fields, scroll-bar option selections, radio buttons, checkboxes, and other basic form element types. Version 2 HTML is the level most commonly used on the World Wide Web.

Exhibit 4. Commonly Used HTML Tags

Tag Type | Tag Identifier | Usage | Example
Title | Title | Specifies the title that appears at the top of a client screen. | <title>This is a Document Title</title>
Header size n | Hn | Indicates a header line appearing within the text area of the client screen. The n is an inverse relative size control; smaller numbers indicate larger fonts. | <h2>Size 2 Header</h2>
Paragraph | P | Indicates a new paragraph. Usually prints a blank line. | ...end of paragraph<p>Start of next paragraph...
Italics | I | Indicates italicized print. | <i>italicized print</i> normal print
Bold | B | Indicates bold print. | <b>bold print</b> normal print
Anchor reference | a href | Indicates hyperlink point of reference. | <a href="http://your.machine.com/directory/structure/filename.html">highlighted phrase</a>
Anchor name | a name | Names a document fragment. Other references identify this fragment by appending a hash mark (#) and the name to the URL. | <a name="z50">First line of identified fragment.</a>
Image | Img | Identifies an embedded image. | <img src="[URL of image file]">

Version 3 includes more sophisticated features such as tables, figures, text that wraps around images, and mathematical equations. It is not yet finalized.

WWW clients and servers support other protocols in addition to HTTP, including Gopher, FTP, Telnet, and SMTP.

WEB SOFTWARE COMPONENTS

Client Software

Access to the World Wide Web depends on the use of a client application called a browser. The widespread popularity of the Web was brought about largely through the development and no-cost distribution of a browser called Mosaic, developed at the NCSA at the University of Illinois.

Mosaic, which to many people has become synonymous with the Web, is available in versions for Windows, Macintosh, and X Window System platforms. It is available by anonymous FTP from NCSA (ftp.ncsa.uiuc.edu) and numerous other sites.

Besides Mosaic, there are approximately 30 different browser applications available for a variety of platforms, including line-mode terminal browsers and E-mail-based browsers. Most of these browsers are free, although several commercial browser products are gaining popularity because of enhanced features and value-added services, such as help-desk support. Among the more popular browsers are Netscape, from Netscape Communications Corp., and Enhanced Mosaic, from Spyglass, Inc.

Servers

Considering that the WWW is a client/server application, the availability of no-cost servers is a critical factor in the Web’s popularity. The most widely used of the free servers are those provided by NCSA and by CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. The NCSA server is available for UNIX machines; the CERN server runs on UNIX or VMS.

As with browsers, many servers are available, both as freeware and commercially, and operate on a variety of platforms including Windows and Macintosh. Although it is designed around the client/server paradigm, the World Wide Web is envisioned by many of its proponents as being a peer-to-peer network. Users of the Web, in this vision, are both information providers and consumers.



Copyright © CRC Press LLC