Secure and Highly Available Architecture for SNMP Based Monitoring of Network Device
08-06-2010, 03:37 PM
A Secure and Highly Available Architecture for SNMP Based Monitoring of Network Devices
Lalit K. Awasthi Ankur Gupta Nimrita Koul
Professor & HOD, Computer Science and Engineering National Institute of Technology (NIT)
Assistant Professor, Computer Science and Engineering, Model Institute of Engineering and Technology (MIET), Jammu, India
Lecturer, Computer Science and Engineering, Model Institute of Engineering and Technology (MIET), Jammu, India
The SNMP Master Agent, which is responsible for providing management information for network devices, presents a single point of failure that can be targeted by denial-of-service attacks. By sending badly encoded SNMP packets to the SNMP Master Agents implemented on various devices in a network, it is possible to cripple the network management framework. Until recently, the typical vendor response to any such reported vulnerability was to provide fixes to the portion of code that implemented the SNMP decoding and parsing functionality. However, there is an extremely large number of possible permutations and combinations for maliciously encoding an SNMP packet. As such, fixing the SNMP decoding logic for every possible failure scenario is extremely time-consuming and may not be foolproof. This research paper presents a novel architecture for the SNMP Master Agent and associated sub-agents, which makes the SNMP framework on individual devices impervious to denial-of-service attacks, while identifying the origin of the malformed SNMP packets and discarding packets from the identified source.
1. INTRODUCTION
A Network Management Station (NMS) typically gathers device and network related information from the various entities in the network using the Simple Network Management Protocol (SNMP) [1]. SNMP is an application layer protocol that enables the exchange of management information between the device and the management station. All network devices that need to be managed remotely implement the SNMP Master Agent and various sub-agents that provide specialized information. The master agent decodes each incoming SNMP packet and passes it on to the sub-agent that implements the requested management information. It then encodes the response from the sub-agent and sends it back to the management station. Fig. 1 represents the typical SNMP agent based architecture. Further, devices can signal any malfunctions or abnormal conditions by sending out asynchronous SNMP traps to the NMS, allowing the NMS to take appropriate action.
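The division of labour between the master agent and its sub-agents can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name MasterAgent, its register and handle_get methods, and the sample sub-agents are hypothetical, and the decoding and encoding of the PDU are omitted so that only the dispatch step is shown. Routing by longest matching OID prefix is an assumption about how the lookup might be organized.

```python
from typing import Callable, Dict, Optional

# Hypothetical sub-agent signature: takes the requested OID, returns a value.
SubAgent = Callable[[str], str]


class MasterAgent:
    """Illustrative master agent that only dispatches to sub-agents."""

    def __init__(self) -> None:
        self._subagents: Dict[str, SubAgent] = {}

    def register(self, oid_prefix: str, handler: SubAgent) -> None:
        # A sub-agent announces the OID subtree it is responsible for.
        self._subagents[oid_prefix] = handler

    def handle_get(self, oid: str) -> Optional[str]:
        # Pick the registered subtree with the longest prefix matching the OID.
        best: Optional[str] = None
        for prefix in self._subagents:
            if oid == prefix or oid.startswith(prefix + "."):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            return None  # no sub-agent implements this OID
        return self._subagents[best](oid)


agent = MasterAgent()
# 1.3.6.1.2.1.1 is the MIB-II system group, 1.3.6.1.2.1.2 the interfaces group.
agent.register("1.3.6.1.2.1.1", lambda oid: "system:" + oid)
agent.register("1.3.6.1.2.1.2", lambda oid: "interfaces:" + oid)
```

In a real agent the value returned by the sub-agent would be encoded into a response PDU before being sent back to the management station.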
The SNMP agent based architecture shown in Fig. 1 is, however, susceptible to denial-of-service attacks, as demonstrated by the SNMP security test cases designed by the University of Oulu, Finland [2]. The test suite, comprising a large number of test cases, is designed to test the robustness of a given SNMP implementation. By using the injecting vector technique and creating SNMP request Protocol Data Units (PDUs) with exceptional elements inserted in them, the test suite managed to discover a host of vulnerabilities in the SNMP implementations of many vendors, requiring these vendors to hurriedly provide security fixes for their implementations while issuing security advisories for affected products. However, such an approach to handling security-related issues is not efficient. Given the number of possible combinations that can be employed to badly encode an SNMP reply or trap, it is not always possible for the management station to build bulletproof SNMP decoding logic.
Other approaches, for instance [3], deal with maliciously encoded SNMP packets by discarding all incoming SNMP PDUs except those from trusted sources within the network. However, this classification could be extremely time-consuming if there are a large number of devices within the network. Moreover, the list of trusted sources, and any subsequent change to it, needs to be communicated to all the devices that are monitored using SNMP, which renders this approach infeasible.
Traditionally, high-availability solutions have been designed for physical infrastructure like routers via the Hot Standby Router Protocol (HSRP) [4, 5] and servers/services via MC ServiceGuard [6, 7]. However, such commercial solutions are expensive to implement and are mostly vendor dependent. Such solutions are based on physical redundancy of the underlying hardware and are overkill for achieving high availability of software components such as the SNMP agents.
This research paper proposes a novel architecture that ensures the high availability and operational efficiency of the SNMP agents in the face of denial-of-service attacks. It also enables identification and isolation of the source of the maliciously encoded SNMP PDUs. Thus, it helps alleviate the security risks that the network management framework is exposed to while security-related software patches are being made available by the vendor. The rest of this paper is organized as follows: Section 2 provides a detailed overview of the system model, while Section 3 uses a custom-built simulator to examine some of the performance implications of the proposed architecture and compare it with the traditional approach. Finally, Section 4 provides the conclusions and presents some directions for future work.
2. SYSTEM MODEL
The proposed architecture achieves high availability of the SNMP monitoring framework by removing the SNMP master agent from the line of fire and introducing a distribution agent to receive SNMP requests instead. Further, there are two instances of the SNMP master agent and the associated sub-agents, to ensure that SNMP requests are responded to even if one instance of the master agent and sub-agents crashes while processing a badly formed SNMP PDU. The core component of the proposed architecture is thus the distribution agent, which binds to the standard UDP port 161. Its primary objective is to receive SNMP requests from the management station and forward them to one of the two instances of the SNMP Master Agent, to each of which it maintains an independent connection. However, only one instance of the SNMP Master Agent and its associated sub-agents is active at any time, while the other is in standby mode. Fig. 2 provides a schematic of the proposed architecture at the individual device level. A background thread within the distribution agent sends keepalive messages, as proposed in , to the active instance of the SNMP master agent. As soon as it detects that the active instance has gone down (due to non-receipt of an acknowledgement message from the active instance), it designates the standby instance as the active instance. We assume that whenever the master agent goes down, it also takes the registered sub-agents down with it. The distribution agent also executes a script that brings up the recently crashed instance of the Master Agent and its associated sub-agents in standby mode. It then proceeds to route all subsequent SNMP requests to the new active instance of the SNMP agent, thereby ensuring that the SNMP framework continues to be operational without any perceptible loss of service.
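The active/standby switch performed by the distribution agent can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the class and method names and the instance labels "snmpd-A" and "snmpd-B" are hypothetical, the restart callback stands in for the script that brings the crashed instance back up, and the threshold of two consecutive missed keepalives is the value reported in the performance analysis in Section 3. Socket handling on UDP port 161 is left out.

```python
from typing import Callable, List


class DistributionAgent:
    """Illustrative failover logic only; no network I/O is performed."""

    def __init__(self, restart: Callable[[str], None], max_missed: int = 2) -> None:
        self.active = "snmpd-A"    # hypothetical label of the active instance
        self.standby = "snmpd-B"   # hypothetical label of the standby instance
        self.max_missed = max_missed
        self.missed = 0            # consecutive unanswered keepalives
        self.restart = restart     # stand-in for the restart script
        self.events: List[str] = []

    def on_keepalive_result(self, acknowledged: bool) -> None:
        # Called by the background thread after each keepalive attempt.
        if acknowledged:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self._fail_over()

    def _fail_over(self) -> None:
        crashed = self.active
        # Promote the standby, then bring the crashed instance back as standby.
        self.active, self.standby = self.standby, crashed
        self.missed = 0
        self.restart(crashed)
        self.events.append("failover from " + crashed + " to " + self.active)

    def route(self, request: bytes) -> str:
        # Every incoming request goes to whichever instance is active now.
        return self.active
```

Keeping the routing decision in one place means the master agent instances never need to know about each other; only the distribution agent tracks which one is active.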
Another feature of this architecture requires the SNMP Master Agent to add the source IP address of the SNMP request currently being serviced to a shared file or the BAD_SNMP database. If the SNMP request is serviced normally and a response is sent out, it deletes the IP address from the shared file. In case of a failure during servicing of the SNMP request, the IP address of the device that sent the malformed SNMP packet remains recorded in the file, so that the now active instance of the SNMP Master Agent can match it against the source of future incoming SNMP PDUs, dropping potentially malformed packets before processing the complete packet. It also sends out a trap to the Network Management Station informing it of potential denial-of-service attacks and passing the list of source IP addresses from which malformed SNMP packets originated, allowing the network administrator to initiate any corrective action. Fig. 3 provides the activity diagram for the proposed architecture.
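The bookkeeping around the BAD_SNMP database can be sketched as follows. This is a minimal Python illustration, assuming a JSON file as the shared store; the names BadSnmpStore and service_request and the file format are hypothetical and are not specified in the paper. A crash of the master agent is simulated here by an exception that escapes before the cleanup step, which leaves the offending source address in the file for the next active instance to find. Sending the trap to the NMS is omitted.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional, Set


class BadSnmpStore:
    """File-backed set of source IPs shared by both master agent instances."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> Set[str]:
        if not os.path.exists(self.path):
            return set()
        with open(self.path, "r", encoding="utf-8") as fh:
            return set(json.load(fh))

    def _save(self, ips: Set[str]) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(sorted(ips), fh)

    def add(self, ip: str) -> None:
        ips = self._load()
        ips.add(ip)
        self._save(ips)

    def remove(self, ip: str) -> None:
        ips = self._load()
        ips.discard(ip)
        self._save(ips)

    def contains(self, ip: str) -> bool:
        return ip in self._load()

    def all(self) -> List[str]:
        return sorted(self._load())


def service_request(store: BadSnmpStore, source_ip: str, payload: bytes,
                    process: Callable[[bytes], bytes]) -> Optional[bytes]:
    # Drop the packet early if its source was recorded by a crashed instance.
    if store.contains(source_ip):
        return None
    store.add(source_ip)          # record the source before decoding
    response = process(payload)   # a crash here leaves the IP in the store
    store.remove(source_ip)       # normal completion clears the record
    return response
```

One consequence of this scheme, which the sketch makes visible, is that a source is blocked only after it has already brought down one instance; the standby instance is what keeps the service available in the meantime.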
The proposed architecture requires no major modifications to the code of the SNMP Master Agent and sub-agents, or to their interfaces and interactions. The only changes required are to move the UDP communication code to the distribution agent and to add code to the SNMP master agent that maintains and checks the BAD_SNMP database. Moreover, creating two instances of the SNMP master agent and associated sub-agents, one of which is in standby mode at all times, does not put any great extra demand on the computing resources of the device. Hence, the proposed architecture is feasible for implementation.
Another area of vulnerability in the SNMP based monitoring framework is the servicing by the NMS of SNMP traps, which are indicative of faults at the device and the network level. The trap agent, which binds to UDP port 162 and receives all incoming SNMP traps from devices all over the network being monitored, is again susceptible to denial-of-service attacks due to malformed SNMP trap PDUs. The proposed architecture for the SNMP master agent on individual devices is easily extensible to the trap agent as well, with a distribution agent at the NMS taking care of routing incoming trap PDUs to the active instance of the trap agent, and devices sending malformed PDUs being recorded in the BAD_SNMP database.
3. PERFORMANCE ANALYSIS
The feasibility of the proposed framework has been established by building a simulator which, for quick deployment and testing, exchanges dummy protocol packets instead of implementing the SNMP protocol. The simulation results were obtained with the simulator client (mimicking the NMS) and the distribution agent, along with the two instances of the SNMP master agent, placed within the same intranet, though on different LAN segments. The simulator makes use of the distribution agent to route incoming packets to the active dummy SNMP master agent. The client program sends dummy packets to UDP port 161 of the device on which the distribution agent is running. The client programs keep track of the messages they have sent to and received from the device on which the distribution agent is deployed, and also maintain a list of packets for which no responses were received. Moreover, the client program associates a timer with each packet sent to measure the timeout value. If no response is received within the timeout value (set to 1 sec), the packet is resent according to the configured number of retries (set to 2 by default). Client programs have various configuration options allowing control over the generated packet rate, which was fixed at 3 packets per second. Within each 10 minute interval of its operation, the client randomly introduces a special packet to be sent to the simulator, which in turn executes a script to bring down the active dummy SNMP master agent to mimic a failover scenario. The distribution agent takes timing measurements for the time taken for failover to complete. The client programs that keep track of the lost packets report the resultant packet loss. Since the testing was performed in the local network during times of low network usage, the probability of packet loss due to network latency or congestion was minimized; hence the lower timeout and retry values in the client simulator program.
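The timeout and retry behaviour of the simulator client can be sketched as below. This is a hedged Python illustration rather than the simulator's actual code: the function name send_with_retries and the injected transport callable are hypothetical, and the defaults of a 1 second timeout and 2 retries are the values quoted above. The transport is expected to return None when no response arrives within the timeout.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: sends one packet and waits up to timeout_s seconds,
# returning the reply bytes or None if nothing arrived in time.
Transport = Callable[[bytes, float], Optional[bytes]]


def send_with_retries(transport: Transport, packet: bytes,
                      timeout_s: float = 1.0,
                      retries: int = 2) -> Tuple[Optional[bytes], int]:
    """Return (reply, attempts); reply is None if the packet is counted as lost."""
    attempts = 0
    for _ in range(1 + retries):   # one initial send plus the configured retries
        attempts += 1
        reply = transport(packet, timeout_s)
        if reply is not None:
            return reply, attempts
    return None, attempts
```

A packet is reported as lost only when the initial send and every retry time out, which is how the client in this sketch would build its list of unanswered packets.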
The simulator also calculates the additional overhead incurred due to the introduction of the distribution agent and its impact on the packet service rate of the system. Over a 5 hour run, the average failover time between the two instances of the dummy SNMP master agent was found to be 2 sec and 57 microseconds. This time was measured from the instant at which the background thread in the distribution agent detected that the active instance had crashed to the point at which the standby instance had been marked as active and the script to bring up the crashed instance in standby mode had been executed.
The background thread concludes that the active instance of the master agent has gone down when it does not receive responses to two consecutive keepalive messages that it sends to the active instance of the dummy master agent. The failover scenario was executed approximately 30 times during the run. The average number of packets lost during the failover, as reported by the client program, was 4, measured by keeping track of packets for which the client received no responses. This packet loss can be overcome by appropriately setting the SNMP timeout and retry values depending on the latency of the network in question. Typical SNMP timeout values in commercial NMSs are upwards of 5 seconds (depending on the network latency), while the retry value is typically fixed at 3. These values are sufficient to overcome the failover time as measured in the distribution agent. With the same values implemented in the client program, responses were received for all the packets, albeit on the second or third retry.
The proposed architecture does introduce some processing and communication overheads. Each incoming packet is received by the distribution agent and forwarded to the active instance of the SNMP master agent, introducing extra inter-process communication between the distribution agent and the SNMP master agent. The distribution agent also incorporates the failover logic, which is triggered in case it detects that the active instance has gone down. Moreover, the SNMP master agent has to look up the BAD_SNMP database while processing each request PDU. Fig. 4 provides a graphical representation of the average time taken to process a packet by the proposed architecture versus the traditional architecture during normal operation, giving an idea of the overhead introduced by the proposed architecture, while Fig. 5 provides a comparative analysis of the two approaches in terms of the response times as measured in the simulator client program.
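The relationship between the timeout and retry settings discussed in this section and the failover time can be checked with a few lines of arithmetic. The Python sketch below assumes a fixed timeout per attempt with no backoff, which the paper does not state explicitly; the function names are hypothetical. Note also that the measured failover time of roughly 2 seconds starts at the moment the crash is detected, so the outage seen by a client is somewhat longer by the time needed to miss two keepalives, an interval the paper does not give.

```python
def retry_window_s(timeout_s: float, retries: int) -> float:
    """Total time a client keeps trying one request before giving up."""
    return timeout_s * (1 + retries)


def outlasts_outage(timeout_s: float, retries: int, outage_s: float) -> bool:
    """True if the last retry is still pending when the outage ends."""
    return retry_window_s(timeout_s, retries) > outage_s


# Simulator settings from the text: 1 s timeout, 2 retries.
simulator_window = retry_window_s(1.0, 2)
# Typical commercial NMS settings from the text: 5 s timeout, 3 retries.
nms_window = retry_window_s(5.0, 3)
# Measured failover time: 2 s and 57 microseconds.
failover_s = 2.000057
```

Under these assumptions the simulator client keeps trying for 3 seconds per request and a typical NMS for 20 seconds, so the NMS settings leave a wide margin over the measured failover time while the simulator settings leave very little once detection time is added.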
Fig 4: Comparative Processing Times For Normal And HA-SNMP
Fig 5: Comparative Response Times For Normal And HA-SNMP (chart title: Response Times For Get Request with One Variable; series: Normal Response Time, Response Time With Proposed Architecture; axis ticks from 50 to 300, units not recoverable)
Fig 3: Activity Diagram for Distribution Agent (recoverable activity labels: send keepalive to active snmpd; wait for response from active snmpd; receive msg from NMS; forward to active snmpd; if no response; crash of ...; wait for the ...; ... to Bad SNMP Database; forward response to ...; execute script to bring up the crashed snmpd instance)
The proposed architecture does reduce overall system performance by about 10-15%, a cost that is offset by the increased security it provides while performance remains acceptable.
From the above analysis, there seems to be no comparison between the proposed architecture and the traditional one in the face of denial-of-service attacks. The traditional architecture is simply not able to cope with maliciously encoded SNMP packets. The proposed architecture is extremely simple and requires no changes to be made to the SNMP parsing and decoding logic. Further, in real-world implementations, the NMS typically has a polling frequency of 5 minutes or so per device. As such, the network device in question is never inundated with SNMP packets during normal operation, and the failover mechanism in the proposed architecture may not impact its responsiveness at all, unless a failover occurs at precisely the same time as the servicing of an SNMP request from the NMS. The additional overhead incurred in the proposed architecture is therefore a worthwhile tradeoff between system performance and the security that it provides.
4. CONCLUSIONS AND FUTURE WORK
This research paper presents a novel approach to solving the problem of the susceptibility of the SNMP agent based architecture to denial-of-service attacks. The proposed architecture is entirely feasible, since the additional overheads it introduces are compensated for by marginally adjusting the SNMP response timeout and retry values. Moreover, this additional cost is justified by the increased security that is provided, which is non-existent in the traditional approach even if the SNMP master agent is restarted at every crash. The proposed architecture is also easy to implement, requiring simple modifications to the existing architecture. Future work shall involve securing the architecture against IP address spoofing and taking care of scenarios involving mobile agents, which can potentially move from node to node in the network, sending badly encoded SNMP packets and traps to various network devices. An actual implementation with the SNMP framework shall also be built and tested in a live environment to validate the simulation results.
[1] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 & 2, Third Edition, Addison-Wesley Longman, 1999.
[2] PROTOS SNMP test suite, website:
[3] Cisco SNMP security fixes description, ciscowarp/public/707/cisco-malformed-snmp -msgs-pub.shtml
[4] Using HSRP for Fault-Tolerant IP Routing,
[5] Cisco Hot Standby Router Protocol (HSRP), javvinprotocol/rfc2281.pdf
[6] High Availability with MC Service-Guard, h71028.www7.hpenterprise/cache/6468-0-0-0-121.aspx
[7] Managing MC Service-Guard, docs.hpen/B3936-90073/index.html
[8] High availability solutions for HP-UX 11i, h71028.www7.hpenterprise/cache/4174-0-0-0-121.htm
[9] R. Guerin and V. Peris, Quality-of-Service in Packet Networks: Basic Mechanisms and Directions, Computer Networks, vol. 31, no. 3, pp. 169-179, Feb. 1999.
[10] High Availability Architectures For Linux on IBM Systems, www-03.ibmservers/eserver/zseries/library/ whitepapers/pdf/HA_Architectures_for_Linux_on_Syste m_z.pdf
[11] Software Architectures for High-Availability Ethernet Services Interworking Frame Relay, ATM and Ethernet over IP/MPLS Networks.