Survivable Network Systems: Full Report
Society is growing increasingly dependent upon large-scale, highly distributed systems that operate in unbounded network environments. Unbounded networks, such as the Internet, have no central administrative control and no unified security policy. Furthermore, the number and nature of the nodes connected to such networks cannot be fully known. Despite the best efforts of security practitioners, no amount of system hardening can assure that a system that is connected to an unbounded network will be invulnerable to attack. The discipline of survivability can help ensure that such systems can deliver essential services and maintain essential properties such as integrity, confidentiality, and performance, despite the presence of intrusions. Unlike traditional security measures, which require central control or administration, survivability is intended to address unbounded network environments. This report describes the survivability approach to helping assure that a system that must operate in an unbounded network is robust in the presence of attack and will survive attacks that result in successful intrusions. Included are discussions of survivability as an integrated engineering framework, the current state of survivability practice, the specification of survivability requirements, strategies for achieving survivability, and techniques and processes for analyzing survivability.
1. SURVIVABILITY IN NETWORK SYSTEMS
Contemporary large-scale networked systems that are highly distributed improve the efficiency and effectiveness of organizations by permitting whole new levels of organizational integration. However, such integration is accompanied by elevated risks of intrusion and compromise. These risks can be mitigated by incorporating survivability capabilities into an organization's systems. As an emerging discipline, survivability builds on related fields of study (e.g., security, fault tolerance, safety, reliability, reuse, performance, verification, and testing) and introduces new concepts and principles. Survivability focuses on preserving essential services in unbounded environments, even when systems in such environments are penetrated and compromised.
1.1 The New Network Paradigm: Organizational Integration
From their modest beginnings some 20 years ago, computer networks have become a critical element of modern society. These networks not only have global reach, they also have impact on virtually every aspect of human endeavor. Network systems are principal enabling agents in business, industry, government, and defense. Major economic sectors, including defense, energy, transportation, telecommunications, manufacturing, financial services, health care, and education, all depend on a vast array of networks operating on local, national, and global scales. This pervasive societal dependency on networks magnifies the consequences of intrusions, accidents, and failures, and amplifies the critical importance of ensuring network survivability.
As organizations seek to improve efficiency and competitiveness, a new network paradigm is emerging. Networks are being used to achieve radical new levels of organizational integration. This integration obliterates traditional organizational boundaries and transforms local operations into components of comprehensive, network-resident business processes. For example, commercial organizations are integrating operations with business units, suppliers, and customers through large-scale networks that enhance communication and services. These networks combine previously fragmented operations into coherent processes open to many organizational participants. This new paradigm represents a shift from bounded networks with central control to unbounded networks. Unbounded networks are characterized by distributed administrative control without central authority, limited visibility beyond the boundaries of local administration, and lack of complete information about the network. At the same time, organizational dependencies on networks are increasing and risks and consequences of intrusions and compromises are amplified.
1.2 The Definition of Survivability
We define survivability as the capability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents. We use the term system in the broadest possible sense, including networks and large-scale systems of systems. The term mission refers to a set of very high-level (i.e., abstract) requirements or goals.
Missions are not limited to military settings since any successful organization or project must have a vision of its objectives, whether expressed implicitly or as a formal mission statement. Judgments as to whether or not a mission has been successfully fulfilled are typically made in the context of external conditions that may affect the achievement of that mission. For example, assume that a financial system shuts down for 12 hours during a period of widespread power outages caused by a hurricane. If the system preserves the integrity and confidentiality of its data and resumes its essential services after the period of environmental stress is over, the system can reasonably be judged to have fulfilled its mission. However, if the same system shuts down unexpectedly for 12 hours under normal conditions (or under relatively minor environmental stress) and deprives its users of essential financial services, the system can reasonably be judged to have failed its mission, even if data integrity and confidentiality are preserved.
Timeliness is a critical factor that is typically included in (or implied by) the very high-level requirements that define a mission. However, timeliness is such an important factor that we included it explicitly in the definition of survivability. The terms attack, failure, and accident are meant to include all potentially damaging events; but these terms do not partition these events into mutually exclusive or even distinguishable sets. It is often difficult to determine if a particular detrimental event is the result of a malicious attack, a failure of a component, or an accident. Even if the cause is eventually determined, the critical immediate response cannot depend on such speculative future knowledge.
Attacks are potentially damaging events orchestrated by an intelligent adversary. Attacks include intrusions, probes, and denial of service. Moreover, the threat of an attack may have as severe an impact on a system as an actual occurrence. A system that assumes a defensive position because of the threat of an attack may reduce its functionality and divert additional resources to monitoring the environment and protecting system assets.
We include failures and accidents as part of survivability. Failures are potentially damaging events caused by deficiencies in the system or in an external element on which the system depends. Failures may be due to software design errors, hardware degradation, human errors, or corrupted data. Accidents describe the broad range of randomly occurring and potentially damaging events such as natural disasters. We tend to think of accidents as externally generated events (i.e., outside the system) and failures as internally generated events.
With respect to system survivability, a distinction between a failure and an accident is less important than the impact of the event. Nor is it often possible to distinguish between intelligently orchestrated attacks and unintentional or randomly occurring detrimental events. Our approach concentrates on the effect of a potentially damaging event. Typically, for a system to survive, it must react to (and recover from) a damaging effect (e.g., the integrity of a database is compromised) long before the underlying cause is identified. In fact, the reaction and recovery must be successful whether or not the cause is ever determined.
Our primary focus in this report is to help systems survive the acts of intelligent adversaries. This bias is based on the nature of the organization to which the authors belong. Our Survivable Network Technology Team is an outgrowth of the CERT® Coordination Center, which has been helping users respond to and recover from computer security incidents since 1988.
Finally, it is important to recognize that it is the mission fulfillment that must survive, not any particular subsystem or system component. Central to the notion of survivability is the capability of a system to fulfill its mission, even if significant portions of the system are damaged or destroyed. We will sometimes use the term survivable system as less-than-perfectly-precise shorthand for a system with the capability to fulfill a specified mission in the face of attacks, failures, or accidents. Again, it is the mission, not a particular portion of the system, that must survive.
1.3 The Domain of Survivability: Unbounded Networks
The success of a survivable system depends on the computing environment in which the survivable system operates. The trend in networked computing environments is toward largely unbounded network infrastructures. A bounded system is one in which all of the system's parts are controlled by a unified administration and can be completely characterized and controlled. At least in theory, the behavior of a bounded system can be understood and all of its various parts identified. In an unbounded system there is no unified administrative control over its parts.
We use the term administrative control in the strictest sense, which includes the power to impose and enforce sanctions and not simply to recommend an appropriate security policy. In an unbounded system, each participant has an incomplete view of the whole, must depend on and trust information supplied by its neighbors, and cannot exercise control outside its local domain.
An unbounded system can be composed of bounded and unbounded systems connected together in a network. Figure 1 illustrates an unbounded domain consisting of a collection of bounded systems in which each bounded system is under separate administrative control. Although the security policy of an individual bounded system cannot be fully enforced outside of the boundaries of its administrative control, the policy can be used as a yardstick to evaluate the security state of that bounded system.
Of course, the security policy can be advertised outside of the bounded system; but administrators are severely limited in their ability to compel or persuade outside individuals or entities to follow it. This limitation is particularly true when an unbounded domain spans jurisdictional boundaries, making legal sanctions difficult or impossible to impose.
Figure 1: An Unbounded Domain Viewed as a Collection of Bounded Systems
When an application or software-intensive system is exposed to an environment consisting of multiple, unpredictable administrative domains with no measurable bounds, the system has an unbounded environment. An unbounded environment exhibits the following properties:
• Multiple administrative domains with no central authority
• An absence of global visibility (i.e., the number and nature of the nodes in the network cannot be fully known)
• Interoperability between administrative domains determined by convention
• Widely distributed and interoperable systems
• Users and attackers who can be peers in the environment
• An inability to be partitioned into a finite number of bounded environments
The Internet is an example of an unbounded environment with many client-server network applications. A public Web server and its clients may exist within many different administrative domains on the Internet; yet there exists no central authority that requires all clients to be configured in a way expected by the Web server. In particular, a Web server can never rely on a set of client plug-ins to be present or absent for any function that the server may want to provide.
For a Web server providing a financial transaction (e.g., for a Web-based purchase), the Web server may require that the user install a plug-in on the client to support a secure transaction. However, due to the unbounded nature of the environment, previously installed plug-ins from a competitor may be present on the client that may corrupt, subvert, or damage the Web server during the transaction. For the Web server to be survivable there must be built-in protection from malicious client interactions and these protections must make no assumptions about the configuration or features of the remote client.
In this example, the Web server and its clients make up the system. The multiple administrative domains are the variety of site domains on the Internet. Many of these domains have legitimate users. Other sites are used for intrusions in an anonymous setting. These latter sites cannot be distinguished by their administrative domain, but only by client behavior. The interoperability between the server and its clients is defined by HTTP (Hypertext Transfer Protocol), a convention agreed upon between the server and clients. The system, composed of Web servers and clients, is widely distributed both geographically and logically throughout the Internet. Legitimate users and attackers are peers in the environment, and there is no method to isolate legitimate users from the attackers. In other words, there is no way to bind the environment to legitimate users using only a common administrative policy.
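The server-side consequence of this argument can be sketched in code: a survivable server validates every client interaction itself and assumes nothing about the client's configuration or plug-ins. This is a minimal illustration; the field names, whitelist, and limit below are hypothetical, not part of the report.

```python
# Server-side validation sketch: the server assumes nothing about the
# client (plug-ins, configuration) and checks every request itself.

ALLOWED_ACTIONS = {"purchase", "query"}   # illustrative whitelist
MAX_AMOUNT = 10_000                       # illustrative transaction limit

def validate_request(request: dict) -> bool:
    """Accept a transaction request only if every field passes
    server-side checks; reject anything malformed or out of range."""
    if request.get("action") not in ALLOWED_ACTIONS:
        return False
    amount = request.get("amount")
    if not isinstance(amount, int) or not (0 < amount <= MAX_AMOUNT):
        return False
    return True

assert validate_request({"action": "purchase", "amount": 50})
assert not validate_request({"action": "exploit", "amount": 50})
assert not validate_request({"action": "purchase", "amount": -1})
```

The point is the placement of trust, not the particular checks: nothing the client claims about itself is taken at face value.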
Unbounded systems are a significant component of today's computing environment and will play an even larger role in the future. The Internet, a non-hierarchical network of systems, each under local administrative control only, is a primary example of an unbounded system. While conventions exist that allow the parts of the Internet to work together, there is no global administrative control to assure that these parts behave according to these conventions. Therefore, security problems abound. Unfortunately, the security problems associated with unbounded systems are typically underestimated.
1.4 Characteristics of Survivable Systems
A key characteristic of survivable systems is their capability to deliver essential services in the face of attack, failure, or accident. Central to the delivery of essential services is the capability of a system to maintain essential properties (i.e., specified levels of integrity, confidentiality, performance, and other quality attributes) in the presence of attack, failure, or accident. Thus, it is important to define minimum levels of quality attributes that must be associated with essential services. For example, a launch of a missile by a defensive system is no longer effective if the system performance is slowed to the point that the target is out of range before the system can launch.
These quality attributes are so important that definitions of survivability are often expressed in terms of maintaining a balance among multiple quality attributes such as performance, security, reliability, availability, fault tolerance, modifiability, and affordability. The Architecture Tradeoff Analysis project at the Software Engineering Institute is using this attribute-balancing (i.e., tradeoff) view of survivability to evaluate and synthesize survivable systems. Quality attributes represent broad categories of related requirements, so a quality attribute may contain other quality attributes. For example, the security attribute traditionally includes the three attributes: confidentiality, integrity, and availability.
The capability to deliver essential services (and maintain the associated essential properties) must be sustained even if a significant portion of the system is incapacitated.
Furthermore, this capability should not be dependent upon the survival of a specific information resource, computation, or communication link. In a military setting, essential services might be those required to maintain an overwhelming technical superiority, and essential properties may include integrity, confidentiality, and a level of performance sufficient to deliver results in less than one decision cycle of the enemy. In the public sector, a survivable financial system is one that maintains the integrity, confidentiality, and availability of essential information and financial services, even if particular nodes or communication links are incapacitated through intrusion or accident, and that recovers compromised information and services in a timely manner. The financial system's survivability might be judged by using a composite measure of the disruption of stock trades or bank transactions (i.e., a measure of the disruption of essential services).
Key to the concept of survivability, then, is identifying the essential services (and the essential properties that support them) within an operational system. Essential services are defined as the functions of the system that must be maintained when the environment is hostile or failures or accidents are detected that threaten the system.
There are typically many services that can be temporarily suspended when a system is dealing with an attack or other extraordinary environmental condition. Such a suspension can help isolate areas affected by an intrusion and free system resources to deal with its effects. The overall function of a system should adapt to preserve essential services.
We have linked the capability of a survivable system to fulfill its mission in a timely manner to its ability to deliver essential services in the presence of attack, accident, or failure. Ultimately, it is mission fulfillment that must survive, not any particular portion or component of the system. If an essential service is lost, it can be replaced by another service that supports mission fulfillment in a different but equivalent way. However, we still believe that the identification and protection of essential services is an important part of a practical approach to building and analyzing survivable systems. As a result, we define essential services to include alternate sets of essential services (perhaps mutually exclusive) that need not be simultaneously available. For example, a set of essential services to support power delivery may include both the distribution of electricity and the operation of a natural gas pipeline.
To maintain their capabilities to deliver essential services, survivable systems must exhibit the four key properties illustrated in Table 1:
Table 1: The Key Properties of Survivable Systems
1.5 Survivability as an Integrated Engineering Framework
As a broadly-based engineering paradigm, survivability is a natural framework for integrating established and emerging software engineering disciplines in the service of a common goal. These established areas of software engineering, which are related to survivability, include security, fault tolerance, safety, reliability, reuse, performance, verification, and testing. Research in survivability encompasses a wide variety of research methods, including the investigation of
• Analogues to the immunological functioning of an individual organism
• Sociological analogues to public health efforts at the community level
1.5.1 Survivability and Security
The discipline of computer security has made valuable contributions to the protection and integrity of information systems over the past three decades. However, computer security has traditionally been used as a binary term that suggests that at any moment in time a system is either safe or compromised. We believe that this use of computer security engenders viewpoints that largely ignore the aspects of recovery from the compromise of a system and aspects of maintaining services during and after an intrusion. Such an approach is inadequate to support necessary improvements in the state of the practice of protecting computer systems from attack. In contrast, the term survivable system refers to systems whose components collectively accomplish their mission even under attack and despite active intrusions that effectively damage a significant portion of the system.
Robustness under attack is at least as important as hardness or resistance to attack.
Hardness contributes to survivability, but robustness under attack (and, in particular, recoverability) is the essential characteristic that distinguishes survivability from traditional computer security. At the same time, survivability can benefit from computer security research and practice, and survivability can provide a framework for integrating security with other disciplines that can contribute to system survivability.
1.5.2 Survivability and Fault Tolerance
Survivability requires robustness under conditions of intrusion, failure, or accident. The concept of survivability includes fault tolerance, but is not equivalent to it. Fault tolerance relates to the statistical probability of an accidental fault or combination of faults, not to malicious attack. For example, an analysis of a system may determine that the simultaneous occurrence of the three statistically independent faults (f1, f2, and f3) will cause the system to fail. The probability of the three independent faults occurring simultaneously by accident may be extremely small, but an intelligent adversary with knowledge of the system's internals can orchestrate the simultaneous occurrence of these three faults and bring down the system. A fault-tolerant system most likely does not address the possibility of the three faults occurring simultaneously, if the probability of occurrence is below a threshold of concern. A survivable system requires a contingency plan to deal with such a possibility.
Redundancy is another factor that can contribute to the survivability of systems. However, redundancy alone is insufficient since multiple identical backup systems share identical vulnerabilities. A survivable system requires each backup system to offer equivalent functionality, but significant variance in implementation. This variance thwarts attempts to compromise the primary system and all backup systems with a single attack strategy.
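The role of implementation variance in failover can be sketched as follows: when the primary is compromised, the system prefers a backup built on a different implementation, so the exploit that succeeded once is unlikely to transfer. The replica names and "vendor" labels are hypothetical.

```python
# Failover sketch that prefers backups with a *different* implementation
# than the compromised primary, so one attack strategy cannot take down
# the whole redundancy chain.

replicas = [
    {"name": "primary",  "impl": "vendorA"},
    {"name": "backup-1", "impl": "vendorA"},  # identical impl: shares vulnerabilities
    {"name": "backup-2", "impl": "vendorB"},  # variant impl: exploit unlikely to transfer
]

def choose_backup(failed_impl, candidates):
    """Prefer a replica whose implementation differs from the one just
    compromised; fall back to any replica if none differ."""
    variants = [r for r in candidates if r["impl"] != failed_impl]
    return (variants or candidates)[0]

backup = choose_backup("vendorA", replicas[1:])
assert backup["name"] == "backup-2"
```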
1.6 The Current State of Practice in Survivable Systems
Much of today's research and practice in computer-systems survivability takes a perilously narrow, security-based view of defense against computer intrusions. This narrow view is dangerously incomplete because it focuses almost exclusively on hardening a system (e.g., using firewall technology or an orange book approach to host protection) to prevent a break-in or other malicious attack. This view does little about how to detect an intrusion or what to do once an intrusion has occurred or is under way.
This view is also accompanied by evaluation techniques that limit their focus to the relative hardness of a system, as opposed to a system's robustness under attack and ability to recover compromised capabilities.
The architecture of secure bounded systems is built upon the existence of a security policy and its enforcement, which is imposed by the exercise of administrative control. In contrast, an unbounded system has no administrative control with which to impose global-security policy. For instance, on the Internet today the backbone architecture exists independent of security policy considerations because there is no global administrative control.
Affordability is always a significant factor in the design, implementation, and maintenance of systems that support the national infrastructure (e.g., the power grid, the public switched communications networks, and the financial networks) and our national defense. In fact, the trend toward increased sharing of common infrastructure components in the interest of economy virtually ensures that the civilian networked information infrastructure and its vulnerabilities will always be an inseparable part of our national defense.
Practical, affordable systems are almost never 100% customized, but rather are constructed from commonly available off-the-shelf components with internal structures that are well known. The trend toward developing systems through integration and reuse instead of customized design and coding efforts is a cornerstone of modern software engineering. Unfortunately, the intellectual complexity associated with software design, coding, and testing virtually ensures that exploitable bugs can and will be discovered in commercial and public domain products with internal structures that are available for analysis. When these products are incorporated as components of larger systems, those systems become vulnerable to attack strategies based on the exploitable bugs. Popular commercial and public-domain components offer attackers a ubiquitous set of targets with well-known and typically unvarying internal structures. This lack of variability among components translates into a lack of variability among systems. These systems potentially allow a single attack strategy to have a wide-ranging and devastating impact.
The natural escalation of offensive threats versus defensive countermeasures has demonstrated time and again that no practical systems can be built that are invulnerable to attack. Despite best efforts, there can be no assurance that systems will not be breached. Thus, the traditional view of information systems security must be expanded to encompass the specification and design of system behavior that helps the system survive in spite of active intrusions. Only then can systems be created that are robust in the presence of attack and are able to survive attacks that cannot be completely repelled.
In short, the nature of contemporary system development dictates that even hardened systems can and will be broken. Therefore, survivability must be designed into systems to help avoid the potentially devastating effects of system compromise and failure due to intrusion.
1.6.1 Incident Handling Has Enhanced Survivability
Although applying the term survivability to computer systems is relatively new, the practice of survivability is not. Much of the survivability practice to date has been in the realm of incident response (IR) teams. In fact, the CERT® Coordination Center (CERT/CC) has, throughout its history, enhanced system survivability in the Internet community. The CERT/CC provides incident response services (helping organizations respond to and recover from incidents) and publishes and distributes vulnerability advisories (akin to public health notices). Traditionally, the CERT/CC has been concerned about survivability and has been successful in helping sites with risk mitigation and recovery.
The experience of the CERT Coordination Center has shown that how organizations respond to and recover from computer intrusions is at least as important as the steps they take to prevent them. We believe that widespread availability and use of survivable systems by the Internet community and throughout the Internet infrastructure will provide the best hope for the dramatic improvements necessary to transform the Internet into a survivable, networked information system of systems. Survivable systems will help make the Internet a viable medium for the conduct of commerce, defense, and government.
This medium will also enable the support of major elements of the national infrastructure (e.g., power grid, public switched network, and air traffic control).
1.6.2 Firewalls Embody the Current State of Practice
Currently, little of the basic technology in security engineering and system integration applies to unbounded systems. Instead, current practice assumes that the capability exists to identify, define, and characterize the extent of administrative control over a system, all access points to that system, and all signals that may appear at those access points. In unbounded systems, such as the current Internet and the future National Information Infrastructure, these boundary conditions cannot be fully determined. The current state of practice in survivability and security evaluation tends to treat systems and their environments as static and unchanging. However, the survivability and security of systems in fact degrade over time as changes occur in their structures, configurations, and environments, and as knowledge of their vulnerabilities spreads throughout the intruder community.
On the Internet today, the cornerstone of security is the notion of a firewall, a logically bounded system within a physically unbounded one. We assert that bounded-system thinking within unbounded domains leads to security designs and architectures that are fundamentally flawed from a survivability perspective. One notable example is the use of a firewall as the basic security component of the Internet. This approach is severely limited and can be readily circumvented by exploiting the fundamental differences between bounded and unbounded systems. Traditional firewalls are the state of the art for security architectures, but not for survivable systems, because they are passive, filter-only devices. The addition of active components, such as detection and a dynamic response capability, will allow firewalls to play a role in survivable systems.
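The difference between a passive, filter-only firewall and one with active components can be shown with a toy example: static rules drop packets on filtered ports, while a detection component counts probes per source and, past a threshold, adds a dynamic block rule. The ports, threshold, and addresses are illustrative assumptions, and real firewalls operate far below this level of abstraction.

```python
# Toy filter contrasting passive filtering with an active component:
# repeated probes at filtered ports trigger a dynamic response.
from collections import Counter

BLOCKED_PORTS = {23}       # static filter rule (the passive part)
PROBE_THRESHOLD = 3        # probes tolerated before dynamic response

probe_counts = Counter()
dynamic_blocklist = set()

def handle_packet(src, port):
    """Return 'drop' or 'accept'; escalate noisy sources to a blocklist."""
    if src in dynamic_blocklist:
        return "drop"
    if port in BLOCKED_PORTS:                  # passive filtering
        probe_counts[src] += 1
        if probe_counts[src] >= PROBE_THRESHOLD:
            dynamic_blocklist.add(src)         # active response: block the source
        return "drop"
    return "accept"

for _ in range(3):
    handle_packet("10.0.0.9", 23)              # three probes at a filtered port
assert handle_packet("10.0.0.9", 80) == "drop"    # source now blocked entirely
assert handle_packet("10.0.0.5", 80) == "accept"  # other traffic unaffected
```

A purely passive filter would keep dropping port-23 probes forever without ever changing its behavior; the active component is what lets the device participate in survivability.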
2. DEFINING REQUIREMENTS FOR SURVIVABLE SYSTEMS
Survivability requirements can vary substantially depending on system scope, criticality, and the consequences of failure and interruption of service. Categories of requirements definitions for survivable systems include function, usage, development, operation, and evolution. In this section, we present definitions of survivability requirements, ways in which these requirements can be expressed, and their impact on system survivability. The new paradigm for system requirements definition and design is characterized by distributed services, distributed logic, distributed code (including executable content), distributed hardware, a shared communications and routing infrastructure, diminished trust, and a lack of unified administrative control. Assuring the survivability of mission critical systems developed under this new paradigm is a formidable high-stakes effort for software engineering research. This effort requires that traditional computer security measures be augmented by new and comprehensive system survivability strategies.
2.1 Expressing Survivability Requirements
The definition and analysis of survivability requirements is a critical first step in achieving system survivability. Figure 2 depicts an iterative model for defining these requirements. Survivability must address not only requirements for software functionality, but also requirements for software usage, development, operation, and evolution. Thus, five types of requirements definitions are relevant to survivable systems in the model. These requirements are discussed in detail in the following subsections.
Figure 2: Requirements Definition for Survivable Systems
System/Survivability Requirements: The term system requirement refers to traditional user functions that a system must provide. For example, a network management system must provide functions to enable users to monitor network operations, adjust performance parameters, etc. System requirements also include nonfunctional aspects of a system, such as timing, performance, and reliability. The term survivability requirement refers to the capabilities of a system to deliver essential services in the presence of intrusions and compromises and to recover full services.
Figure 3 depicts the integration of survivability requirements with system requirements at node and network levels.
Figure 3: Integrating Survivability Requirements with System Requirements
Survivability requires that system requirements be organized into essential services and non-essential services. Essential services must be maintained even during successful intrusions; non-essential services are recovered after intrusions have been handled. Essential services may be stratified into any number of levels, each embodying fewer and more vital services as the severity and duration of intrusion increases. Thus, definitions of requirements for essential services must be augmented with appropriate survivability requirements.
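Stratified essential-service levels can be sketched as a mapping from intrusion severity to the shrinking set of services that must stay up. The service names and level boundaries below are purely illustrative assumptions, not part of the report:

```python
# Sketch: stratified essential-service levels (service names are hypothetical).
# Higher severity levels retain fewer, more vital services.
SERVICE_LEVELS = [
    {"routing", "monitoring", "reporting", "archiving"},  # level 0: normal operation
    {"routing", "monitoring"},                            # level 1: intrusion suspected
    {"routing"},                                          # level 2: active compromise
]

def services_for(severity: int) -> set:
    """Return the services that must remain available at a given intrusion severity."""
    level = min(severity, len(SERVICE_LEVELS) - 1)
    return SERVICE_LEVELS[level]
```

Note that each level is a strict subset of the level below it, which mirrors the text's requirement that each stratum embody fewer and more vital services.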
As shown in Figure 2, survivable systems may also include legacy and acquired COTS components that were not developed with survivability as an explicit objective. Such components may provide both essential and non-essential services and may require functional requirements for isolation and control through wrappers and filters to permit their safe use in a survivable system environment.
Figure 3 shows that survivability itself imposes new types of requirements on systems. These new requirements include resistance to, recognition of, and recovery from intrusions and compromises, and adaptation and evolution to diminish the effectiveness of future intrusion attempts. These survivability requirements are supported by a variety of existing and emerging survivability strategies, as noted in Figure 2 and discussed in more detail below.
Finally, Figure 3 depicts emergent behavior requirements at the network level. These requirements are characterized as emergent because they are not associated with particular nodes, but rather emerge from the collective behavior of node services in communicating across the network. These requirements deal with the survivability of overall network capabilities (e.g., capabilities to route messages between critical sets of nodes regardless of how intrusions may damage or compromise network topology).
We envision survivable systems that are capable of adapting their behavior, function, and resource allocation in response to intrusions. For example, when necessary, functions and resources devoted to non-essential services could be reallocated to the delivery of essential services and to intrusion resistance, recognition, and recovery. Requirements for such systems must also specify how the system should adapt and reconfigure itself in response to intrusions.
Systems can exhibit large variations in survivability requirements. Small local networks may require few or no essential services and recovery times measured in hours. Conversely, large-scale networks of networks may require a core set of essential services, automated intrusion detection, and recovery times measured in minutes. Embedded command and control systems may require essential services to be maintained in real time and recovery times measured in milliseconds. The attainment and maintenance of survivability consume resources in system development, operation, and evolution. The resources allocated to a system's survivability should be based on the costs and risks to an organization associated with the loss of essential services.
Usage/Intrusion Requirements: Survivable-system testing must demonstrate the correct performance of essential and non-essential system services as well as the survivability of essential services under intrusion. Because system performance in testing (and operation) depends totally on the system's use, an effective approach to survivable-system testing is based on usage scenarios derived from usage models.
Usage models are developed from usage requirements. These requirements specify usage environments and scenarios of system use. Usage requirements for essential and non-essential services must be defined in parallel with system and survivability requirements. Furthermore, intruders and legitimate users must be considered equally. Intrusion requirements that specify intrusion-usage environments and scenarios of intrusion use must be defined as well. In this approach, intrusion use and legitimate use of system services are modeled together.
Figure 4 depicts the relationship between legitimate and intrusion use. Intruders may engage in scenarios beyond legitimate scenarios, but may also employ legitimate use for purposes of intrusion if they gain the necessary privileges.
Figure 4: The Relationship Between Legitimate and Intrusion Usage
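A usage model that covers both legitimate and intrusion use can be sketched as weighted sampling over a combined scenario population. The scenario names and weights below are illustrative assumptions:

```python
import random

# Sketch: a combined usage model for survivability testing. Legitimate and
# intrusion scenarios are modeled together, as the text prescribes.
# Scenario names and weights are hypothetical.
SCENARIOS = [
    ("query_status",   "legitimate", 60),
    ("update_config",  "legitimate", 25),
    ("probe_ports",    "intrusion",  10),
    ("replay_session", "intrusion",   5),
]

def sample_scenarios(n: int, seed: int = 0) -> list:
    """Draw n weighted test scenarios from the combined usage model."""
    rng = random.Random(seed)
    names = [s[0] for s in SCENARIOS]
    weights = [s[2] for s in SCENARIOS]
    return rng.choices(names, weights=weights, k=n)
```

Test suites generated this way exercise the system under a realistic mix of legitimate use and intrusion attempts rather than under legitimate use alone.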
Development Requirements: Survivability places stringent requirements on system development and testing practices. Inadequate functionality and software errors can have a devastating effect on system survivability and provide opportunities for intruder exploitation. Sound engineering practices are required to create survivable software.
The following five principles (four technical and one organizational) are example requirements for survivable-system development and testing practices:
• Precisely specify the system's required functions in all possible circumstances of system use.
• Verify the correctness of system implementations with respect to the functional specifications.
• Specify function usage in all possible circumstances of system use, including intruder use.
• Test and certify the system based on function usage and statistical methods.
• Establish permanent readiness teams for system monitoring, adaptation, and evolution.
Sound engineering practices are required to deal with legacy and COTS software components as well.
Operations Requirements: Survivability places demands on requirements for system operation and administration. These requirements include defining and communicating survivability policies, monitoring system use, responding to intrusions, and evolving system functions as needed to ensure survivability as usage environments and intrusion patterns change over time.
Evolution Requirements: System evolution responds to user requirements for new functions. However, this evolution is also necessary to respond to increasing intruder knowledge of system behavior and structure. In particular, survivability requires that system capabilities evolve more rapidly than intruder knowledge. This rapid evolution prevents intruders from accumulating information about otherwise invariant system behavior that they need to achieve successful penetration and exploitation.
2.1.1 Requirements Definition for Essential Services
The preceding discussion distinguishes between essential and non-essential services. Each system requirement must be examined to determine whether it corresponds to an essential service. The set of essential services must form a viable subsystem for users that is complete and coherent. If multiple levels of essential services are required, each set of services provided at each level must also be examined for completeness and coherence. In addition, requirements must be defined for making the transition to and from essential-service levels.
When distinguishing between essential and non-essential services, all of the usual requirements-definition processes and methods can be applied. Elicitation techniques such as those embodied in Software Requirements Engineering can help to identify essential services. Tradeoff and cost/benefit analysis can help to determine the sets of services that sufficiently address business survivability risks and vulnerabilities. Provisions for tracing survivability requirements through design, code, and test must be established. As previously mentioned, simulation of intrusion through intruder-usage scenarios is included in the testing process.
2.1.2 Requirements Definition for Survivability Services
After specifying requirements for essential and non-essential services, a set of requirements for survivability services must be defined. These services can be organized into four general categories: resistance, recognition, recovery, and adaptation and evolution. These survivability services must operate in an intruder environment that can be characterized by three distinct phases of intrusion: penetration, exploration, and exploitation.
Penetration Phase. In this phase, an intruder attempts to gain access to a system through various attack scenarios. These scenarios range from random inputs by hobbyist hackers to well-planned attacks by professional intruders. These attempts are designed to capitalize on known system vulnerabilities.
Exploration Phase. In this phase, the system has been penetrated and the intruder is exploring internal system organization and capabilities. By exploring, the intruder learns how to exploit the access to achieve intrusion objectives.
Exploitation Phase. In this phase, the intruder has gained access to desired system facilities and is performing operations designed to compromise system capabilities.
Penetration, exploration, and exploitation create a spiral of increasing intruder authority and a widening circle of compromise. For example, penetration at the user level is typically a means to find root-level vulnerabilities. User-level authorization is then employed to exploit those vulnerabilities to achieve root-level penetration. Finally, compromise of the weakest host in a networked system allows that host to be used as a stepping-stone to compromise other more protected hosts.
Requirements definitions for resistance, recognition, recovery, and adaptation and evolution services help select survivability strategies to deal with these phases of intrusion. Some strategies, such as firewalls, are the product of extensive research and development and currently are used extensively in bounded networks. New survivability strategies are emerging to respond to the unique challenges of unbounded networks.
Resistance Service Requirements. Resistance is the capability of a system to deter attacks. Resistance is thus important in the penetration and exploration phases of an attack, before actual exploitation. Current strategies for resistance include the use of firewalls, authentication, and encryption. Diversification is a resistance strategy that will likely become more important for unbounded networks.
Requirements for diversification must define planned variation in survivable system function, structure, and organization, and the means for achieving it. Diversification is intended to create a moving target and render ineffective the accumulation of system knowledge as an intrusion strategy. Diversification also eliminates intrusion opportunities associated with multiple nodes that execute identical software and typically exhibit identical vulnerabilities. Such systems offer tempting economies of scale to intruders, since when one node has been penetrated, all nodes can be penetrated. Requirements for diversification can include variation in programs, retained data, and network routing and communication. For example, systematic means can be defined to randomize software programs while preserving functionality.
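The "moving target" idea behind diversification can be sketched as per-node randomization of internal layout while observable behavior is preserved. The handler names and dispatch scheme below are hypothetical, not a technique prescribed by the report:

```python
import random

# Sketch: program diversification. Each deployed node shuffles its internal
# handler layout (seeded per node), so nodes differ internally even though
# every node delivers identical functionality. Handler names are illustrative.
def make_node(seed: int):
    handlers = {
        "ping":  lambda x: "pong:" + x,
        "echo":  lambda x: x,
        "upper": lambda x: x.upper(),
    }
    # Diversify: store handlers in a per-node randomized order so internal
    # layout and iteration order vary between nodes.
    items = list(handlers.items())
    random.Random(seed).shuffle(items)
    table = dict(items)

    def dispatch(op: str, arg: str) -> str:
        return table[op](arg)
    return dispatch
```

Because every node behaves identically but is laid out differently, knowledge gained by penetrating one node no longer transfers automatically to the rest.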
Recognition Service Requirements. Recognition is the capability of a system to recognize attacks or the probing that precedes attacks. The ability to react or adapt during an intrusion is central to the capacity of a system to survive an attack that cannot be completely repelled. To react or adapt, the system must first recognize it is being attacked. In fact, recognition is essential in all three phases of attack.
Current strategies for attack recognition include both state-of-the-art intrusion detection and mundane but effective techniques such as logging and frequent auditing as well as follow-up investigations of reports generated by ordinary error detection mechanisms. Advanced intrusion-detection techniques are generally of two types: anomaly detection and pattern recognition. Anomaly detection is based on models of normal user behavior. These models are often established through statistical analysis of usage patterns. Deviations from normal usage patterns are flagged as suspicious. Pattern recognition is based upon models of intruder behavior. User activity that matches a known pattern of intruder behavior raises an alarm.
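The anomaly-detection approach described above can be sketched with a simple statistical test: flag a usage reading that deviates from the historical mean by more than a chosen number of standard deviations. The metric and threshold are illustrative assumptions:

```python
import statistics

# Sketch: anomaly detection over a usage metric (e.g., logins per hour).
# A reading is flagged when it deviates from the historical mean by more
# than `threshold` population standard deviations.
def is_anomalous(history: list, reading: float, threshold: float = 3.0) -> bool:
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return reading != mean
    return abs(reading - mean) / stdev > threshold
```

Real anomaly detectors build far richer models of normal behavior, but the principle is the same: deviations from established usage patterns raise suspicion.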
Future survivable networks will likely employ additional recognition strategies such as self-awareness, trust maintenance, and black-box reporting. Self-awareness is the process of establishing a high-level semantic model of the computations that a component or system is executing or has been asked to execute. A system or component that understands what it is being asked can refuse requests that would be dangerous, compromise a security policy, or adversely impact the delivery of minimum essential services.
Trust maintenance is achieved by a system through periodic queries among its components (e.g., among the nodes in a network) to continually test and validate trust relationships. Detection of signs of intrusion would trigger an immediate test of trust relationships.
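One way to realize periodic trust queries between nodes is a challenge-response exchange over a shared key: a node proves liveness and key possession by returning a MAC over a fresh random challenge. This is a minimal sketch under that assumption; key distribution and transport are out of scope:

```python
import hashlib
import hmac
import os

# Sketch: trust maintenance via challenge-response between nodes that
# share a symmetric key. Key handling here is illustrative only.
def challenge() -> bytes:
    """Generate a fresh random challenge (nonce)."""
    return os.urandom(16)

def respond(shared_key: bytes, nonce: bytes) -> bytes:
    """Answer a challenge by computing an HMAC over the nonce."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()

def still_trusted(shared_key: bytes, nonce: bytes, answer: bytes) -> bool:
    """Validate a peer's answer in constant time."""
    expected = respond(shared_key, nonce)
    return hmac.compare_digest(expected, answer)
```

A node that fails such a periodic check, or that answers with the wrong key, would trigger the immediate trust-relationship test described above.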
Black-box reporting is a dump of system information that can be retrieved from a crashed system or component for analysis to determine the cause of the crash (e.g., design error or specific intrusion type). This analysis can help to prevent other components from suffering the same fate.
A survivable-system design must include explicit requirements for recognition of attack. These requirements ensure the use of one or more of the preceding strategies through the specification of architectural features, automated tools, and manual processes. Since intruder techniques are constantly advancing, recognition requirements should be frequently reviewed and continuously improved.
Recovery Service Requirements. Recovery is a system's ability to restore services after an intrusion has occurred. Recovery also contributes to a system's ability to maintain essential services during intrusion.
Requirements for recoverability are what most clearly distinguish survivable systems from systems that are merely secure. Traditional computer security leads to the design of systems that rely almost entirely on hardening (i.e., resistance) for protection. Once security is breached, damage may follow with little to stand in the way. The ability of a system to react during an active intrusion is central to its capacity to survive an attack that cannot be completely repelled. Recovery is thus crucial during the exploration and exploitation phases of intrusion. Recovery strategies in use today include replication of critical information and services, use of fault-tolerant designs, and incorporation of backup systems for hardware and software. These backup systems maintain master copies of critical software in isolation from the network. Some systems, such as large-scale transaction processing systems, employ elaborate, fine-grained transaction roll-back processes to maintain the consistency and integrity of state data.
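The fine-grained roll-back strategy mentioned above can be sketched with an undo log: changes made since a checkpoint are recorded and can be reversed once a compromise is detected. The data model is illustrative:

```python
# Sketch: fine-grained rollback via an undo log, in the spirit of the
# transaction roll-back strategies used by transaction processing systems.
class RecoverableStore:
    _MISSING = object()  # marker for keys that did not exist before a change

    def __init__(self):
        self.data = {}
        self._undo = []  # list of (key, previous value or _MISSING)

    def begin(self):
        """Start a new recoverable unit of work (checkpoint)."""
        self._undo = []

    def put(self, key, value):
        """Apply a change, recording the prior state for undo."""
        self._undo.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def rollback(self):
        """Reverse all changes since begin(), e.g. after a detected compromise."""
        for key, old in reversed(self._undo):
            if old is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self._undo = []
```

Reversing the log in last-in-first-out order restores a consistent pre-intrusion state even when the same key was modified several times.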
Adaptation and Evolution Service Requirements. Adaptation and evolution are critical to maintaining resistance to ever-increasing intruder knowledge of how to exploit otherwise unchanging system functions. Dynamic adaptation permanently improves a system's ability to resist, recognize, and recover from intrusion attempts. For example, an adaptation requirement may be an infrastructure that enables the system to inoculate itself against newly discovered security vulnerabilities by automatically distributing and applying security fixes to all network elements. Another adaptation requirement may be that intrusion detection rule sets are updated regularly in response to reports of known intruder activity from authoritative sources of security information, such as the CERT® Coordination Center. Adaptation requirements ensure that such capabilities are an integral part of a system's design. As in the cases of resistance, recognition, and recovery requirements, the constant evolution of intruder techniques requires that adaptation requirements be frequently reviewed and continuously improved.
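Updating intrusion-detection rule sets from an authoritative feed can be sketched as a revision-aware merge: incoming entries replace local rules only when newer. The feed format and rule names are hypothetical:

```python
# Sketch: adaptation by merging an intrusion-detection rule feed from an
# authoritative source. Rules map name -> (pattern, revision); the feed is
# a list of (name, pattern, revision) entries. All names are illustrative.
def apply_update(rules: dict, feed: list) -> dict:
    """Merge a rule feed, keeping the newest revision of each rule."""
    merged = dict(rules)  # leave the caller's rule set untouched
    for name, pattern, revision in feed:
        current = merged.get(name)
        if current is None or revision > current[1]:
            merged[name] = (pattern, revision)
    return merged
```

Run periodically, such a merge keeps detection rules current with reported intruder activity without manual intervention at every node.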
3. SURVIVABILITY DESIGN AND IMPLEMENTATION STRATEGIES
In this section we examine strategies that support the survivability of critical system functions in unbounded networks. Strategies for survivability in networked systems depend on several assumptions and constraints. Although they may seem obvious, these assumptions and constraints must be made explicit. The assumptions differ radically from the implicit assumptions traditionally made for the uniprocessor, multiprocessor, and bounded network systems on which most previous research and development has been based.
For unbounded networks, we assume that
• Any individual node of the network can be compromised
• Survivability does not require that any particular physical component of the network be preserved
• Only the essential services of the network as a whole must survive
• For reasons of reliability, design error, user error, and intentional compromise, the trustworthiness of a network node or any node with which it can communicate cannot be guaranteed
In this report, we primarily discuss unbounded networks. The term unbounded has a slightly different meaning depending on the purpose and situation involved. In all cases, however, unbounded networks share three principal characteristics: a lack of central physical or administrative control, a lack of complete visibility into all parts of the network, and no practical limit on growth in the number of nodes.
These assumptions impose the following constraints on the architecture of survivable networks and on the form of feasible survivability strategies:
• There must not be a single point of failure within the network. Essential services are distributed in a manner that is not critically dependent on any particular component or node.
• Global knowledge is impossible to achieve in a distributed system. There are no all-seeing global oracles. Instead, protocols define the interaction and knowledge shared between nodes.
• Each node must continuously validate the trustworthiness of itself and those with which it communicates.
• Computations within a given node of an unbounded network, whether for essential-service communication or trust validation, must have costs that are less than proportional to the number of nodes in the network.
3.1 Four Aspects of Survivability Solution Strategies
As introduced in Section 2, there are four aspects of the survivability solution that can serve as a basis for survivability strategies: resistance, recognition, recovery, and system adaptation and evolution. This section summarizes the approaches in each of these four areas.
There are many techniques for dealing with these four aspects. Any or all of the techniques may apply to survivable systems. We do not list all of these techniques but instead categorize them within the broader aspects. Table 2 contains the four aspects of the survivability solution and representative taxonomies of respective strategies.
Table 2: A Taxonomy of Strategies Related to Survivability
3.2 Support of Strategies by the Computing Infrastructure
The rapid growth of the Web and other Internet-based applications has encouraged the growth of a computing infrastructure to support distributed applications. While the initial Web efforts concentrated on information publishing, the application domain has expanded to encompass a much wider spectrum of an organization's computing needs. The technical focus of this growth has moved from tools such as Web browsers or servers to the development of a set of Internet-compatible, commercially provided services. Examples of these services are file, print, transaction, messaging, directory, security, and object services such as CORBA (Common Object Request Broker Architecture) and DCOM (Distributed Component Object Model).
The commercially available distributed infrastructures are in the early phases of their development and do not yet directly support system survivability. Recognition is not a supported service, and recovery is supported only indirectly by a transaction server. Typically, an organization adopts such an infrastructure to lower costs by using a common infrastructure for intranets, extranets, and Internet applications and to simplify application development by capturing the complexity of distributed computing in the infrastructure rather than in each application.
Managing user-profile data is an example of a service that a distributed infrastructure can assume. One general requirement of system survivability is to provide user authentication and manage the authority given to that user for data and systems access. Authentication can be implemented using passwords and authorizations that are validated by access-control lists. However, in many existing systems, such as database applications, access-control lists are maintained by the application.
When system users, data, and applications are geographically distributed, the maintenance of user-profile data in an application is difficult. A shared directory service, which is part of a distributed infrastructure, can provide the data storage capability and a protocol such as LDAP (Lightweight Directory Access Protocol) for application access and replace the application-specific access-control mechanisms. These infrastructure security services can provide the mechanisms for user authentication such as a public key infrastructure, mechanisms to describe access control, and the means to define a security policy. The use of shared services for user authentication and authorization should reduce application and overall system complexity as well as provide the means to define an organizational security policy.
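The shift from application-local access-control lists to a shared directory can be sketched as a single authorization function consulting one user/role store. The directory schema, users, roles, and permissions below are illustrative assumptions, not an LDAP API:

```python
# Sketch: a shared authorization service standing in for a directory
# service such as LDAP, replacing per-application access-control lists.
# Users, roles, and permissions are hypothetical.
DIRECTORY = {
    "alice": {"roles": {"operator"}},
    "bob":   {"roles": {"auditor"}},
}
ROLE_PERMISSIONS = {
    "operator": {"read", "write"},
    "auditor":  {"read"},
}

def authorize(user: str, action: str) -> bool:
    """Check an action against the shared directory instead of an app-local ACL."""
    entry = DIRECTORY.get(user)
    if entry is None:
        return False
    return any(action in ROLE_PERMISSIONS.get(role, set())
               for role in entry["roles"])
```

Because every application calls the same `authorize` function, revoking a user's access in the directory takes effect everywhere at once, which is exactly the simplification the shared-service strategy aims for.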
When this strategy is implemented, the system architecture is constrained by the infrastructure-supplied services and the protocols supported. For example, a survivability strategy may be to exchange a primary service with an alternate implementation of that service if the primary service has been compromised. At this stage of infrastructure deployment there is some interoperability among services provided by different vendors; however, there is also significant integration of services that makes it difficult or impossible to replace a service, such as a directory service, with one from a different vendor.
Using shared directory services also raises general survivability issues. A widely used infrastructure should develop a robust set of services. However, their wide use develops a large and knowledgeable intruder community and a wide dissemination of information about system vulnerabilities and security solutions. A compromised or inaccessible directory can affect multiple applications and multiple sites.
An essential part of providing system survivability is establishing operational and administrative procedures for system directories so that system administrators can monitor service and provide recovery. The design tradeoff is that implementing monitoring and recovery procedures is less costly using shared components than using an application-specific architecture. Infrastructure services provide generic support for replication and maintenance of consistency across distributed sites. However, achieving overall mission survivability requires not only understanding the impact of compromised access control data and of the design of a recovery policy, but also knowledge of the system's applications.
Commercially available infrastructure products provide general services that are independent of application domain. Some of the services listed in Figure 3, however, require application-domain knowledge. For example, recognition of an intrusion or maintenance of trust among nodes requires knowledge of expected behavior. A protocol can ensure that information is delivered, but cannot validate the appropriateness of the data. Simple recovery mechanisms can include transaction logs or file restorations; but use of transactions, rollback strategies, and more advanced techniques require domain expertise to identify consistent application states and the impact of compromised data.
The successful use of such recovery strategies has been in application-centered products, such as relational database systems that manage relatively homogeneous data structures. Applying such techniques to general distributed-computing systems is more difficult.
3.3 Survivability Design Observations
We can draw a number of observations about the questions and issues that must be addressed concerning system survivability in networked systems.
3.3.1 Survivability Requires Trust Maintenance
An open issue is how to determine the basis of trust and how an individual node of a network contributes to the survivability of the system's essential services when
• Any node can be unreliable or rogue
• There is no global view or global control
• Nodes cannot completely trust themselves or their neighbors
Depending on the application, it may be possible through architectural design or dynamic action within the system to increase the reliability, visibility, and control of components or the trustworthiness of participants. The only absolute basis for trust maintenance, however, is the consistency of behavioral feedback from interactions with other nodes and independent verification of claimed actions from nodes not directly involved in the transactions.
A closely related point is the absence of global view and control. If unreliable and untrustworthy components are found to be present in a system, determining whether the critical functions have been compromised may be extremely difficult without global view and control. Yet the absence of global view and control (and, in general, they will be absent) does not preclude effective survivable-network architectures. In particular, it should be possible for individual nodes to contribute to the survivability goals in general and, at worst, not interfere with them.
Genetic algorithms, for example, achieve their effects through the collective action of the individual participants. These participants, however, cannot measure overall effectiveness or determine whether their contribution is positive. This example suggests that survivability solutions can exist among emergent algorithms that depend on continuous interaction with neighboring nodes but do not require feedback for indications of progress and success.
3.3.2 Survivability Analysis Is Protocol-Based Not Topology-Based
Another implication for networked systems is that the important aspects of their architecture from the viewpoint of survivability relate to the conventions and rules of interaction between neighboring nodes and that the network topology is largely irrelevant. That is, network architectures must be specified, compared, and measured in terms of their interactions and not the topology of their interconnection.
Survivability requires tradeoff analysis between the responsibilities of the servers and the clients and between end-to-end protocol monitoring by the application and general protocol monitoring provided by the infrastructure. For such a recovery strategy, the application level may be the appropriate level in which to analyze application-state and user behavior and select appropriate recovery actions.
3.3.3 Survivability Is Emergent and Stochastic
Survivability goals are emergent properties that are desired for the system as a whole, but do not necessarily prevail for individual nodes of the system. This approach contrasts with traditional system designs in which specialized functions or properties are assured for particular nodes and the composition of the system must ensure that those properties and functional capabilities are preserved for the system as a whole. For survivability, we must achieve system-wide properties that typically do not exist in individual nodes. A survivable system must ensure that the desired survivability properties emerge from the interactions among its components; in effect, reliable systems must be constructed from unreliable components.
Survivability is inherently stochastic. If survivability properties are emergent, they are present only when the number of contributing component nodes of a system is sufficiently large. If the number or arrangement of nodes falls below a critical threshold, the attendant survivability property fails. An example of this type of critical survivability property is connectivity in a communications system.
The architecture of a system can be designed to maximize the number of paths between any two nodes, but if enough links are compromised to partition the network, communication between arbitrary nodes will no longer succeed. Thus, survivability properties, algorithms, and architectures should be specified, viewed, and assessed in terms of the probability of their success under given conditions of use, not as discrete quantities.
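The stochastic view of connectivity can be made concrete with a small Monte Carlo simulation: fail each link independently and estimate the probability that the network stays connected. The topology and failure rate below are illustrative:

```python
import random

# Sketch: estimating a stochastic survivability property -- the probability
# that a network remains connected when each link fails independently.
def connected(nodes, links):
    """Reachability check (graph search) over the surviving links."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, frontier = {start}, [start]
    while frontier:
        n = frontier.pop()
        for m in adj[n] - seen:
            seen.add(m)
            frontier.append(m)
    return seen == set(nodes)

def survival_probability(nodes, links, fail_rate, trials=2000, seed=0):
    """Monte Carlo estimate of the probability the network stays connected."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        survivors = [link for link in links if rng.random() > fail_rate]
        ok += connected(nodes, survivors)
    return ok / trials
```

Comparing a sparse ring against a denser mesh under the same failure rate shows how added redundant paths raise the survival probability, which is the sense in which the property is stochastic rather than discrete.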
3.3.4 Survivability Requires a Management Component
The design of a survivable system also includes management operations and administration. Poor system administration is a frequent source of vulnerabilities at centrally administered sites. In unbounded network systems, system administration must be coordinated across multiple sites. Existing system administration procedures typically assume a bounded environment and full administrative control over the required services. The complexity of infrastructure and the use of services outside an organization's immediate control require expanding the administrative services and providing a monitoring function as part of the infrastructure.
4. RESEARCH DIRECTIONS
There are a number of promising research areas in survivable systems. The plans for the Survivable Network Technology team at the SEI include
• Adapting and developing architectural description techniques to adequately describe large-scale distributed systems with survivability attributes
• Representing intruder environments through intruder usage models
• Creating an analysis method to evaluate survivability as a global emergent property from architectural specification
• Refining the analysis technology and instruments through pilot tests of real distributed systems
Security requirements are best refined in parallel with system operations and design in a spiral-type development process. Considering the possible avenues of attack during this process, i.e., making the process intrusion-aware, is critical to ensuring sufficient protection against and recovery from malicious attack. The methodological requirements described above therefore apply to methods for engineering security requirements.
Attack trees, and the use of attack patterns for reuse, help to achieve these requirements. They organize related intrusion scenarios in a compact way that relates back to the survivability of the enterprise mission. Attack trees allow the refinement of attacks to a level of detail chosen by the developer, who is free to explore certain attack paths in more depth than others while still being able to generate intrusion scenarios that make sense. Refining the leaves of the attack tree simply generates new leaves, resulting in intrusion scenarios at the new, lower level of abstraction. An attack pattern provides a structure to encode expert security knowledge for reuse. In addition, asking resistance, recognition, and recovery questions at attack-tree nodes may suggest improvements to both requirements and design.
Attack trees thus provide a powerful mechanism to document the multitude of diverse types of attacks, to abstract from intrusion details as a buffer against attack volatility, and to suggest improvements to requirements and design. They are, however, only a relatively small part of the answer as to how intrusion scenarios can improve security requirements engineering. The lack of accurate adversary models and risk analysis methods obstructs progress on this front. A rich attack pattern library, populated with attack patterns at the right level of abstraction, is needed to build enterprise attack trees more systematically.
Finally, the lack of robust resistance, recognition, and recovery countermeasures hampers our ability to improve designs and requirements. Overcoming these obstacles will require a truly interdisciplinary effort.
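The AND/OR structure of an attack tree can be sketched as a small recursive data type: OR nodes succeed if any child attack succeeds, AND nodes require every child step. The attack names below are hypothetical examples, not from the report:

```python
# Sketch: an AND/OR attack tree. OR nodes model alternative attacks;
# AND nodes model attacks requiring several steps. Names are illustrative.
class AttackNode:
    def __init__(self, name, kind="leaf", children=()):
        self.name = name
        self.kind = kind          # "leaf", "and", or "or"
        self.children = list(children)

    def feasible(self, achieved: set) -> bool:
        """Is this (sub)attack feasible given the leaf attacks an intruder can perform?"""
        if self.kind == "leaf":
            return self.name in achieved
        hits = (child.feasible(achieved) for child in self.children)
        return any(hits) if self.kind == "or" else all(hits)
```

Refining a leaf into a subtree of lower-level steps, as the text describes, corresponds to replacing a leaf node with an AND or OR node whose children are the new, more detailed attacks.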