Nathan J. Muller
Because there are many more links than host computers, there are more opportunities for failure on the network than in the hosts themselves. Consequently, a disaster recovery plan that provides for backup methods such as hot sites or cold sites but gives no consideration to link restoration ignores a significant source of potential problems.
Fortunately, corporations can use several methods to protect their data networks against downtime and data loss. These methods differ mostly in cost and efficiency.
A reliable network continues operations despite the failure of a critical element. The critical elements are different for each network topology.
With respect to link failures, the star topology is highly reliable. Although the loss of a link prevents communications between the hub and the affected node, all other nodes continue to operate as before unless the hub suffers a malfunction.
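The star's failure behavior can be captured in a few lines of code. The following Python sketch is illustrative only: it assumes a single hub, identifies each spoke link by the node it serves, and uses hypothetical names (`star_reachable_pairs`, `failed_links`, `hub_failed`) that do not come from any product or library.

```python
from itertools import combinations

def star_reachable_pairs(nodes, failed_links=(), hub_failed=False):
    """Return the node pairs that can still communicate through the hub.

    In a star, every path runs node -> hub -> node, so a pair stays
    connected only if the hub and both of its spoke links are up.
    A spoke link is identified here by the node it serves (an assumption).
    """
    if hub_failed:
        return set()  # the hub is the single point of failure
    down = set(failed_links)
    up = [n for n in nodes if n not in down]
    return {frozenset(pair) for pair in combinations(up, 2)}

nodes = ["A", "B", "C", "D"]
print(len(star_reachable_pairs(nodes)))                      # 6
print(len(star_reachable_pairs(nodes, failed_links=["B"])))  # 3: only B is cut off
print(len(star_reachable_pairs(nodes, hub_failed=True)))     # 0
```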
The hub is the weak link in the star topology: the reliability of the network depends on the reliability of the central hub. To ensure a high degree of reliability, the hub has redundant subsystems at critical points: the control logic, backplane, and power supply. The hub's management system can enhance the fault tolerance of these redundant subsystems by monitoring their operation and reporting anomalies. Monitoring the power supply, for example, may include hotspot detection and checks on fan operation to identify trouble before it disrupts hub operation. If the main power supply fails, the redundant unit takes over, either automatically or manually under the network manager's control, without disrupting the network.
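A hub management agent's monitoring and switchover logic might be sketched as follows. Everything here is an assumption for illustration: the `PowerSupply` record, the `anomalies` and `active_supply` functions, and the temperature and fan-speed thresholds are hypothetical, and real hubs expose this information through their own management interfaces.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical alarm thresholds; real hubs define their own limits.
MAX_TEMP_C = 70.0
MIN_FAN_RPM = 1000

@dataclass
class PowerSupply:
    name: str
    ok: bool = True
    temp_c: float = 40.0
    fan_rpm: int = 3000

def anomalies(psu: PowerSupply) -> list:
    """Report conditions that tend to precede a failure, plus outright failure."""
    alarms = []
    if not psu.ok:
        alarms.append(f"{psu.name}: failed")
    if psu.temp_c > MAX_TEMP_C:
        alarms.append(f"{psu.name}: hotspot at {psu.temp_c:.0f} C")
    if psu.fan_rpm < MIN_FAN_RPM:
        alarms.append(f"{psu.name}: fan slow at {psu.fan_rpm} rpm")
    return alarms

def active_supply(main: PowerSupply, standby: PowerSupply,
                  auto_switch: bool = True,
                  manager_approved: bool = False) -> Optional[PowerSupply]:
    """Choose which supply powers the hub.

    The standby takes over when the main unit fails, either automatically
    or, in manual mode, only after the network manager approves.
    Returns None when no healthy supply is in service.
    """
    if main.ok:
        return main
    if standby.ok and (auto_switch or manager_approved):
        return standby
    return None

main = PowerSupply("PSU-1", temp_c=78.0)
print(anomalies(main))  # ['PSU-1: hotspot at 78 C']
main.ok = False
print(active_supply(main, PowerSupply("PSU-2")).name)  # PSU-2
```

Raising an alarm on a hotspot before the supply actually fails is what lets the manager act before the switchover is needed.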
The flexibility of the hub architecture lends itself to varying degrees of fault tolerance, depending on the criticality of the applications. For example, workstations running noncritical applications may share a link to the same local area network (LAN) module at the hub. This configuration is economical, but a failure in the LAN module puts all the workstations on that link out of commission.
A slightly higher degree of fault tolerance may be achieved by distributing the workstations between two LAN modules, each with its own link. That way, the failure of one module affects only half the workstations. A one-to-one correspondence of workstations to modules offers an even greater level of fault tolerance, because the failure of one module affects only the workstation connected to it; however, this configuration is also more expensive.
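The trade-off between the number of LAN modules and the number of workstations a single module failure takes down is simple arithmetic. This hypothetical helper assumes workstations are spread as evenly as possible across modules and uses a 12-workstation example chosen only for illustration.

```python
import math

def workstations_cut_off(workstations: int, modules: int) -> int:
    """Worst-case number of workstations lost when one LAN module fails,
    assuming workstations are spread as evenly as possible across modules."""
    if modules < 1 or modules > workstations:
        raise ValueError("need between 1 and `workstations` modules")
    return math.ceil(workstations / modules)

for modules in (1, 2, 12):
    print(modules, "module(s):", workstations_cut_off(12, modules), "lost")
# 1 module(s): 12 lost
# 2 module(s): 6 lost
# 12 module(s): 1 lost
```

Each step down the list buys a smaller failure impact at the price of more modules to purchase and manage.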
A critical application may demand the highest level of fault tolerance. This can be achieved by connecting the workstation to two LAN modules at the hub with separate links. The ultimate in fault tolerance can be achieved by connecting one of those links to a different hub. In this arrangement, a transceiver is used to split the links from the application's host computer, enabling each link to connect with a different module in the hub or to a different hub. All of these levels of fault tolerance are summarized in Exhibit 1.
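One way to compare these arrangements is to combine component availabilities. The sketch below assumes that links (each including its LAN module) and hubs fail independently, treats the transceiver as perfectly reliable, and uses made-up availability figures; the function names are hypothetical.

```python
def single_homed(link: float, hub: float) -> float:
    # One link into one hub: both must be up.
    return link * hub

def dual_homed_one_hub(link: float, hub: float) -> float:
    # Two links into the same hub: either link will do, but the hub is still shared.
    return (1 - (1 - link) ** 2) * hub

def dual_homed_two_hubs(link: float, hub: float) -> float:
    # Each path is a link plus its own hub; the workstation is
    # cut off only when both paths are down at the same time.
    path = link * hub
    return 1 - (1 - path) ** 2

link, hub = 0.99, 0.999  # illustrative availabilities, not measured values
for f in (single_homed, dual_homed_one_hub, dual_homed_two_hubs):
    print(f"{f.__name__}: {f(link, hub):.6f}")
# single_homed: 0.989010
# dual_homed_one_hub: 0.998900
# dual_homed_two_hubs: 0.999879
```

Under these assumptions, the second hub helps because it removes the one component that both links of the single-hub arrangement still share.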
In its pure form, the ring topology offers poor resilience against both node and link failures. The ring uses link segments to connect adjacent nodes. Each node is actively involved in the transmissions of other nodes through token passing: the token is received by each node and passed on to the adjacent node. The loss of a link therefore not only results in the loss of a node but brings down the entire network as well. Improving the reliability of the ring topology requires adding redundant links between nodes as well as bypass circuitry. Adding such components, however, makes the ring topology less cost-effective.
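The difference between a plain ring and one with redundant links can be shown with a small model. This sketch assumes a dual counter-rotating ring that wraps around a fault, loosely in the manner of FDDI; it models link faults only (bypass circuitry for failed nodes is not modeled), and `ring_nodes_served` is a hypothetical name.

```python
def ring_nodes_served(n, failed_spans, dual_ring=False):
    """Size of the largest group of nodes that can still pass a token.

    Nodes are numbered 0..n-1 and span i joins node i to node (i + 1) % n.
    Single ring: the token must cross every span, so one failed span
    halts the whole ring.
    Dual ring (assumed to wrap like FDDI): the stations beside a fault loop
    the primary ring onto the secondary, so one fault is healed; several
    faults split the ring into separate arcs, each wrapped into its own ring.
    """
    failed = sorted({s % n for s in failed_spans})
    if not failed:
        return n
    if not dual_ring:
        return 0
    if len(failed) == 1:
        return n
    # The arc between consecutive faults f_i and f_(i+1) holds nodes f_i+1 .. f_(i+1).
    arcs = [(failed[(i + 1) % len(failed)] - failed[i]) % n
            for i in range(len(failed))]
    return max(arcs)

print(ring_nodes_served(8, []))                      # 8
print(ring_nodes_served(8, [3]))                     # 0: single ring is down
print(ring_nodes_served(8, [3], dual_ring=True))     # 8: wrap heals one fault
print(ring_nodes_served(8, [1, 5], dual_ring=True))  # 4: ring splits in two
```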
The bus topology also provides poor reliability. If the bus cable fails, that entire segment of the network is rendered useless. If a node fails, on the other hand, the rest of the network continues to operate. A redundant link for each segment increases the reliability of the bus topology, but at extra cost.
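The bus behavior described above can be sketched as follows, under simplifying assumptions: each segment is treated independently of how segments are interconnected, a redundant cable is assumed to take over completely when the primary fails, and the function and segment names are hypothetical.

```python
def bus_working_nodes(segments, failed_cables=(), failed_nodes=(), redundant=()):
    """Map each bus segment to the nodes on it that can still communicate.

    A cable fault takes out the whole segment unless that segment has a
    redundant cable; a node fault takes out only that node.
    """
    down_nodes = set(failed_nodes)
    result = {}
    for seg, nodes in segments.items():
        if seg in failed_cables and seg not in redundant:
            result[seg] = []
        else:
            result[seg] = [n for n in nodes if n not in down_nodes]
    return result

segments = {"seg1": ["A", "B", "C"], "seg2": ["D", "E"]}
print(bus_working_nodes(segments, failed_cables=["seg1"]))
# {'seg1': [], 'seg2': ['D', 'E']}
print(bus_working_nodes(segments, failed_nodes=["B"]))
# {'seg1': ['A', 'C'], 'seg2': ['D', 'E']}
print(bus_working_nodes(segments, failed_cables=["seg1"], redundant=["seg1"]))
# {'seg1': ['A', 'B', 'C'], 'seg2': ['D', 'E']}
```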
Availability is a measure of performance dealing with the LAN's ability to support all users who wish to access it. A network that is highly available provides services immediately to users, whereas a network that suffers from low availability typically forces users to wait for access.
Exhibit 1. Fault Tolerance of the Hub Architecture.
Availability on the bus topology depends on the load, the access control protocol used, and the length of the bus. With a light load, availability is virtually ensured for any user who wishes to access the network. As the load increases, however, so does the chance of collisions. When a collision occurs, the transmitting nodes back off and try again after a short interval. The chance of collisions also increases with bus length, because a longer bus means a longer propagation delay and therefore a wider window in which one node can begin transmitting before another node's signal reaches it.
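Assuming the bus uses CSMA/CD, as classic Ethernet (IEEE 802.3) does, the load and backoff behavior can be sketched as follows. `collision_probability` uses a simple slotted model in which each station transmits in a given slot independently with probability `p`, so a larger `p` stands in for a heavier load; bus length is not modeled. `backoff_slots` follows the 802.3 truncated binary exponential backoff rule. Both function names are hypothetical.

```python
import random

def collision_probability(stations: int, p: float) -> float:
    """Chance that two or more stations transmit in the same slot,
    when each station independently transmits with probability p."""
    none = (1 - p) ** stations
    exactly_one = stations * p * (1 - p) ** (stations - 1)
    return 1 - none - exactly_one

def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff (IEEE 802.3): after the n-th
    collision, wait a random number of slot times in [0, 2**min(n, 10) - 1];
    after 16 attempts the frame is discarded."""
    if collisions >= 16:
        raise RuntimeError("excessive collisions: frame discarded")
    return rng.randint(0, 2 ** min(collisions, 10) - 1)

rng = random.Random(42)  # fixed seed so runs are repeatable
for p in (0.01, 0.05, 0.10):
    print(f"p={p:.2f}: collision chance {collision_probability(20, p):.3f}")
print("backoff after 3rd collision:", backoff_slots(3, rng), "slot times")
```

Doubling the backoff range after each collision spreads retries out as contention grows, which is why a heavily loaded bus degrades gradually rather than locking up.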
A mesh topology, which is a variation of the bus topology, provides the highest degree of interconnectivity. Its multiple paths imply that the network is always available to users who require access.
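A full mesh's resilience can be checked with a breadth-first search. A full mesh of n nodes has n(n-1)/2 links, and any two nodes are joined by n-1 link-disjoint paths (the direct link plus one two-hop path through each other node), so the pair stays connected through any n-2 link failures. The sketch below is a hypothetical illustration of that property, not a routing implementation.

```python
from collections import deque
from itertools import combinations

def full_mesh(nodes):
    """Every pair of nodes gets its own link: n * (n - 1) / 2 links in all."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}

def connected(links, src, dst):
    """Breadth-first search: can src still reach dst over the surviving links?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return False

nodes = ["A", "B", "C", "D", "E"]
links = full_mesh(nodes)
print(len(links))  # 10
# Cut the direct A-B link and both links through C; A still reaches B via D or E.
survivors = links - {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"B", "C"})}
print(connected(survivors, "A", "B"))  # True
```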
A network based on a star topology can support only what the central hub can handle. Each LAN module in the hub can handle only one request at a time, which can shut out many users under heavy load conditions. Hubs equipped with multiple processors and LAN modules can alleviate this situation somewhat, but even with multiple processors, there is not usually a one-to-one correspondence between users and processors, because such a system would be cost-prohibitive.
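A rough sense of why extra processors help only somewhat comes from a burst of simultaneous requests served first come, first served. This sketch assumes every request takes the same service time and that all requests arrive at once; `mean_wait` and the figures in the example are hypothetical.

```python
def mean_wait(requests: int, servers: int, service_ms: float) -> float:
    """Average wait when `requests` arrive together at a hub whose
    `servers` LAN modules or processors each handle one request at a time
    (first come, first served)."""
    # Request i starts once the i // servers earlier "rounds" have finished.
    waits = [(i // servers) * service_ms for i in range(requests)]
    return sum(waits) / requests

for servers in (1, 2, 4):
    print(servers, mean_wait(10, servers, 2.0))
# 1 9.0
# 2 4.0
# 4 1.6
```

Waiting drops as servers are added, but it reaches zero only when there is one server per user, which is the one-to-one arrangement the text calls cost-prohibitive.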
The ring topology does not provide the same degree of availability as a mesh topology but still represents an improvement over the star topology. The ring has lower availability than the mesh because each node on the ring must wait for the token before transmitting data. As the number of nodes on the ring increases, the share of time allotted to each node for transmission decreases, and the wait for the token grows.
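The effect of ring size on token access can be estimated with a simple worst case in which every node holds the token for its full holding time. The holding time, per-hop latency, and function names below are assumptions for illustration, not parameters of any particular token-ring standard.

```python
def token_rotation_ms(nodes: int, hold_ms: float, hop_ms: float) -> float:
    """Worst-case time for the token to circle the ring once when every
    node holds it for the full holding time (hold_ms) and each hop adds hop_ms."""
    return nodes * (hold_ms + hop_ms)

def max_wait_for_token_ms(nodes: int, hold_ms: float, hop_ms: float) -> float:
    """Longest a node waits after releasing the token until it returns."""
    return token_rotation_ms(nodes, hold_ms, hop_ms) - hold_ms

def share_of_ring_time(nodes: int, hold_ms: float, hop_ms: float) -> float:
    """Fraction of each rotation during which a given node may transmit."""
    return hold_ms / token_rotation_ms(nodes, hold_ms, hop_ms)

for n in (10, 20, 40):
    wait = max_wait_for_token_ms(n, 1.0, 0.01)
    share = share_of_ring_time(n, 1.0, 0.01)
    print(f"{n} nodes: wait up to {wait:.2f} ms, share {share:.4f}")
```

Under this model, doubling the number of nodes roughly doubles the worst-case wait and halves each node's share of the ring.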
In today's distributed computing environments, with so much information traversing public and private networks, network managers must be acquainted with the available protection methods to ensure uninterrupted data flow and guard against data loss. On a wide area network (WAN), the choices include carrier-provided redundancy and protection services, customer-controlled reconfiguration, bandwidth on demand using ISDN, and dial backup. On the LAN, the choices include various recovery and reconfiguration procedures, the use of fault-tolerant servers and wiring hubs, and the implementation of redundant arrays of inexpensive disks (RAID). All these methods are discussed in detail in the following sections.