Randomised Instruction Set Emulation
30-10-2009, 04:07 PM
Presented by: Tushar Karayil Ravindran
Randomized Instruction Set Emulation
This topic deals with a security issue facing modern computer systems. All systems follow specific standards that make them compatible with one another and portable. This makes it easy for an attacker to exploit a vulnerability, since he knows the standard that the system follows. Moreover, the same attack will succeed on all the systems that follow this standard, so by writing a single piece of code he can exploit thousands of machines. This topic deals with a way to defend against these types of attacks. The idea is to diversify systems at the machine level, a kind of de-standardisation that nevertheless keeps them compatible. RISE (Randomized Instruction Set Emulation) shows how this diversification can be achieved at the level of the machine instruction set.
1.1 What is a Code Injection Attack
A code injection attack aims at arbitrary code execution, either to spawn a remote shell or to infect the system with a worm.
A buffer overflow attack proceeds in two steps: inject attack code into a buffer, then redirect the program's control flow into that code. The system then executes the attack code automatically, completing the attack. The usual targets are the stack, the heap and the static data area, as well as parameter modification.
The basic form of defense is the "guard all the doors" type, which tries to close every individual hole. RISE is a complementary method of defense against code injection attacks.
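To make the two steps concrete, here is a toy Python model, not real exploit code. A simulated stack frame is a bytearray holding an 8-byte buffer followed by a 4-byte little-endian return address. The layout, the sizes and the addresses are hypothetical values chosen for illustration. An unchecked copy, playing the role strcpy plays in C, writes past the buffer and replaces the saved return address with the address of the injected code.

```python
import struct

# Hypothetical toy layout: [ 8-byte buffer | 4-byte little-endian return address ]
BUF_SIZE = 8
RET_OFFSET = BUF_SIZE
FRAME_SIZE = BUF_SIZE + 4


def make_frame(return_address: int) -> bytearray:
    """Build a simulated stack frame holding a saved return address."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<I", frame, RET_OFFSET, return_address)
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer with no length check, like strcpy in C.

    Bytes beyond BUF_SIZE spill into the adjacent return-address slot.
    """
    n = min(len(data), len(frame))  # stay inside the simulated memory
    frame[:n] = data[:n]


def saved_return_address(frame: bytearray) -> int:
    """Read the address the function would jump to on return."""
    return struct.unpack_from("<I", frame, RET_OFFSET)[0]


frame = make_frame(0x08048000)   # made-up address of the legitimate caller
attack_addr = 0xBFFFF000         # made-up address of the injected code
# 0x90 bytes stand in for the injected attack code.
payload = b"\x90" * BUF_SIZE + struct.pack("<I", attack_addr)
unchecked_copy(frame, payload)
print(hex(saved_return_address(frame)))  # prints 0xbffff000
```

After the overwrite, returning from the function would transfer control into the buffer, which is the control-flow redirection step.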
1.2 Why do we need Randomization
Standardised interfaces between software and hardware are implemented to increase the compatibility of systems. Although the independence they provide leads to huge productivity gains, they also invite intruders: because everything is standardised, an attacker can easily formulate an attack, and a single piece of code can affect millions of systems in the same manner.
1.3 RISE: Randomized Instruction Set Emulation
The basic idea is to give each executing program a unique and private instruction set, so that it is difficult for an outsider to design an attack. Each program has a different, secret instruction set, and a translator randomizes the instructions at load time. If the number of possible instruction sets is very large, the cost of designing an attack becomes very high, and the attack must be different for each system. Each byte of protected code in the program is individually scrambled using pseudorandom numbers seeded with a random key that is unique to each program execution. With the scrambling constants it is trivial to transform the obfuscated code back into normal instructions executable on the physical machine, but without knowledge of the key it is infeasible to produce even a short code sequence that implements any given behavior. Foreign binary code that reaches the path of execution will be descrambled without ever having been correctly scrambled, producing essentially random bits that will very likely crash the program under attack.
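The claim that a short code sequence is infeasible to produce without the key can be quantified under one simplifying assumption: each key byte k_i is uniformly random and independent of the others, and scrambling is a byte-wise XOR, which is the scheme described in Section 4. An attacker who injects bytes c_1, ..., c_n wants them to descramble to an intended sequence p_1, ..., p_n:

```latex
\[
  d_i = c_i \oplus k_i, \qquad
  \Pr[d_i = p_i] = \Pr[k_i = c_i \oplus p_i] = \tfrac{1}{256},
\]
\[
  \Pr[d_1 \dots d_n = p_1 \dots p_n] = \prod_{i=1}^{n} \tfrac{1}{256} = 2^{-8n}.
\]
```

For example, even a 4-byte sequence succeeds with probability 2^-32, about 2.3 × 10^-10.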
2 Threat model
The threat that RISE deals with is binary code injected into an executing program from the network. This excludes macro viruses, which inject something other than binary code, and data injection attacks that do not operate at the machine level. The threat model includes any attack in which native code is injected into a running binary, including misallocated malloc headers and footer tags, and format string attacks that can write a byte to an arbitrary memory location without actually overflowing a buffer. RISE will protect against injected code arriving by any of these methods. On the other hand, other buffer overflow defenses, such as address obfuscation, can prevent attacks that are specifically excluded from this code-based threat model. RISE provides NO DEFENSE against data-only attacks, which range from modifying jump addresses and parameters to calling an existing library function.
When binary attack code arriving over the network exploits a bug and manages to interpose itself into the emulator's execution path, the injected code will not have been scrambled by the loader. Consequently, when the attack code is fetched and unscrambled by the emulated instruction unit, it will appear as an essentially random string of bits. Despite the density of the x86 instruction set, the data presented in Section 7 suggest that the vast majority of random code sequences quickly encounter an address fault or an illegal instruction, aborting the program. Thus with RISE, an attack that would otherwise take control of a program is downgraded into a denial-of-service attack against the exploitable program. Regardless of what flaw is exploited in a protected program, whether well-known or entirely novel, a network binary code injection attack will fail with very high probability.
The RISE strategy is to give each program copy its own unique and private instruction set. The main implementation issues include:
1. What is the most appropriate machine abstraction level?
2. How should instructions be scrambled and descrambled?
3. When should randomization and descrambling be applied?
4. How should the interpreter's data be protected?
3.1 Machine Abstraction Level
Although Valgrind is primarily used as a tool for detecting memory leaks and other program errors, it contains a complete x86-to-x86 binary translator, and the RISE prototype is built on it. The primary drawback of Valgrind is that it is very slow, largely due to its extensive access checking.
4 Instruction set randomization
Instruction set randomization could be as radical as developing a new set of opcodes, new instruction layouts, and a key-based toolchain capable of generating randomized binary code, and it could take place at many points along the compilation-to-execution spectrum. Although performing randomization early could help distinguish code from data, it would require a full compilation environment on every machine, and recompiled randomized programs would likely keep one fixed key indefinitely. RISE randomizes as late as possible in the process: it scrambles each byte of the trusted code as it is loaded into the emulator, and then unscrambles it before execution by the virtual machine. Deferring the randomization to load time makes it possible to scramble and load existing files in the Executable and Linking Format (ELF) directly, without recompilation or source code. The unscrambling process needs to be fast, and the scrambling must be as hard as possible for an outsider to deduce. The current approach is to generate, at load time, a pseudorandom sequence as long as the overall program text using the Linux /dev/urandom device. The resulting bytes are simply XORed with the instruction bytes to scramble and unscramble them. If the underlying truly random key is long enough, it is almost certain that an attacker cannot break the entire sequence.
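The XOR scheme can be sketched in a few lines of Python. This is an illustrative model, not RISE's code; the class and method names are hypothetical, and os.urandom stands in for reading /dev/urandom directly (on Linux it draws from the same kernel randomness source). The sketch scrambles once at load time, unscrambles byte by byte at fetch time without modifying memory, and shows what happens to injected plain code.

```python
import os


class ScrambledText:
    """Illustrative sketch of load-time XOR scrambling (not RISE's actual code).

    A key as long as the program text is drawn from os.urandom, which on
    Linux uses the same kernel randomness source as /dev/urandom.
    """

    def __init__(self, plaintext: bytes) -> None:
        self._key = os.urandom(len(plaintext))  # fresh key for every load
        # Scramble every byte once, at load time.
        self.memory = bytearray(b ^ k for b, k in zip(plaintext, self._key))

    def fetch(self, addr: int) -> int:
        """Emulated fetch: unscramble one byte; memory itself is left unchanged."""
        return self.memory[addr] ^ self._key[addr]


code = bytes([0x55, 0x89, 0xE5, 0xC3])  # x86: push ebp; mov ebp, esp; ret
text = ScrambledText(code)
# The emulator sees the original instructions.
assert bytes(text.fetch(i) for i in range(len(code))) == code

# An attacker overwrites memory with plain (unscrambled) code...
text.memory[0:4] = code
seen = bytes(text.fetch(i) for i in range(4))
# ...and the emulator sees code XOR key, which equals the intended code
# only if the first four key bytes are all zero (probability 2**-32).
```

Because the same XOR undoes itself, one key serves for both directions, which keeps the per-fetch unscrambling cheap.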
5 Design decisions
Two important aspects of the RISE implementation are how it handles shared libraries and how it protects the plaintext executable.
Much of the code executed by modern programs resides in shared libraries. This form of code sharing can significantly reduce the effect of the diversification, because processes must use the same instruction set as the libraries they require. When the load-time randomization mechanism writes to memory that belongs to shared objects, the operating system performs a copy-on-write, and a private copy of the scrambled code is stored in the virtual memory of the process. This significantly increases memory requirements, but it increases interprocess diversity and avoids having the plaintext code mapped in the protected process's memory.
Protecting the plaintext instructions is the second concern. During the fetch cycle, when the next byte(s) are read from program memory, RISE intercepts the bytes and unscrambles them; the scrambled code in memory is never modified. Eventually, however, a plaintext piece of the program (semantically equivalent to a basic block) is written to the emulator's trace cache. From a security point of view, it would be best to separate the RISE address space completely from the protected program's address space, so that the plaintext is inaccessible from the vulnerable program, but in practice this would slow down emulator data accesses to an extreme and unacceptable degree. For efficiency, the RISE interpreter is best located in the same address space as the target binary, but this introduces some security concerns. A RISE-aware attacker could aim to inject code into a RISE data area rather than into that of the vulnerable process. This is a problem because the cache cannot be encrypted. To protect it, cache pages are kept read-and-execute only.
When a new translated block is ready to be written to the cache, RISE marks the affected pages as writable, performs the write, and returns the pages to their original non-writable permissions.
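The unlock-write-relock discipline can be modeled in Python. This is a hypothetical sketch: permission strings stand in for the page protection bits that a real emulator on Linux changes with the mprotect system call, and a PermissionError stands in for the fault a stray write would trigger. The try/finally ensures the pages return to read-and-execute even if the write fails.

```python
PAGE_SIZE = 4096


class TraceCache:
    """Toy model of a translation cache whose pages are read+execute only.

    Hypothetical sketch: permission strings stand in for the protection bits
    that a real emulator would change with the mprotect system call.
    """

    def __init__(self, n_pages: int) -> None:
        self.data = bytearray(n_pages * PAGE_SIZE)
        self.perms = ["r-x"] * n_pages  # never writable at rest

    def _pages(self, offset: int, length: int) -> range:
        """Indices of the pages touched by [offset, offset + length)."""
        if offset < 0 or length <= 0 or offset + length > len(self.data):
            raise ValueError("write outside the cache")
        return range(offset // PAGE_SIZE, (offset + length - 1) // PAGE_SIZE + 1)

    def store(self, offset: int, block: bytes) -> None:
        """Raw store; fails like a page fault if any touched page is not writable."""
        for page in self._pages(offset, len(block)):
            if "w" not in self.perms[page]:
                raise PermissionError(f"page {page} is not writable")
        self.data[offset:offset + len(block)] = block

    def write_block(self, offset: int, block: bytes) -> None:
        """Unlock the affected pages, write the translated block, relock them."""
        pages = self._pages(offset, len(block))
        saved = {p: self.perms[p] for p in pages}
        try:
            for p in pages:
                self.perms[p] = "rw-"  # analogous to granting PROT_WRITE
            self.store(offset, block)
        finally:
            for p, perm in saved.items():
                self.perms[p] = perm  # restore read+execute only
```

For instance, `TraceCache(2).write_block(4090, b"\x90" * 10)` unlocks and relocks both pages the block spans, while a direct `store` from anywhere else (such as injected code) is refused because the pages are not writable at rest.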
6 Experimental Results
This section contains some of the experiments and results that were obtained after implementing RISE. It is taken directly from the references. We have tested RISE's ability to run programs successfully under normal conditions and its ability to disrupt a variety of machine code injection attacks. In addition, we have tested the safety of executing instruction sequences after they have been randomized and concluded that programs randomized under RISE can execute with very low probability of doing damage. Finally, we make some observations about the performance of RISE, concluding that the approach could be used in a production system if ported to a more efficient emulator.
Two synthetic attacks and a dozen real attacks were tested on the system. The synthetic attacks, which come from published examples, create a vulnerable buffer (in one case on the heap and in the other on the stack) and inject shellcode into it. Without RISE, both attacks successfully spawned a shell; with RISE, the attacks were stopped. The real attacks were launched from the CORE Impact attack toolkit. We selected twelve attacks that satisfied the following requirements of our threat model and the chosen emulation tool: the attack is launched from a remote site; the attack injects binary code at some point in the execution; and the attack succeeds on a Linux OS. Valgrind is specifically designed to run under Linux, and we tested several different Linux distributions, reporting data from two (RedHat 6.2 to 7.3 and Mandrake 7.2). All of the attacks were tested to make sure they were successful against the vulnerable application before retesting with RISE. The attacks were all successfully defeated by RISE (column 4 of Table 1). When we analyzed the logs generated by RISE, however, we discovered that 9 of the 14 tested attacks failed without ever executing the injected attack code. This class of attacks is notoriously fragile, and the mere fact of emulation can often disrupt them; one could imagine modifying the attacks to overcome the perturbations of the emulator, and in the future we hope to test these modified attacks against RISE. The synthetic attacks and the more robust real attacks (Bind NXT, Samba trans2, and rpc.statd) were unaffected by the emulator's presence, and all managed to establish a shell successfully when run under the emulator without RISE.
7 How safe is it to execute random instructions?
Defenses such as RISE depend on randomization to prevent an attacker from knowing precisely what an attack will do. If foreign machine code is injected into a RISE-protected program without scrambling, then when it is unscrambled for execution it will be mapped to essentially random bytes and will not perform any specific function. If such random code does not behave as intended, what does it do? The expectation is that random code strings will cause the attacked program to crash quickly, but we don't know a priori what will happen. The RISE prototype produces randomized instruction sets that are in byte-for-byte correspondence with actual x86 instructions, so the transformation process does not affect code size or layout. This avoids complexity and allows randomization to be deferred until load time. But with so much of the x86 opcode space already defined, there is a significant chance that a randomly scrambled opcode will be something other than an illegal instruction. To test the safety of random instructions, the following test was performed: a small program containing a rootshell exploit coded in x86 machine code was built. When the program ran, it first randomized the exploit code in place using a random number seed supplied on the command line. It then returned into the randomized attack code, following the pattern of a real attack. Out of 30,000 test runs, almost 99.8 percent were aborted by one of the following signals: SIGILL (illegal instruction), SIGFPE (floating point exception, such as division by zero), and SIGSEGV and SIGBUS (two varieties of addressing problems). In the remaining cases, the program entered an (apparently) infinite loop. In none of the 30,000 test cases did the attack code manage to access the command interpreter /bin/sh as intended by the attacker.
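The bookkeeping of this experiment, running the test program once per seed and classifying each run by how it ended, could be scripted as below. This is a hypothetical Python harness, not the tooling used in the original tests. It relies on the POSIX convention, exposed by subprocess, that a negative return code means the child was killed by that signal, and it counts a timeout as an apparently infinite loop. The command `./random_exploit` in the comment is a made-up name for the test program.

```python
import signal
import subprocess
import sys
from collections import Counter


def run_trial(cmd, timeout=5.0):
    """Run one trial and report how it ended (POSIX only).

    Returns a signal name such as "SIGILL" or "SIGSEGV" if the process was
    killed by a signal, "timeout" if it ran past the limit (treated as an
    apparently infinite loop), or "exit N" for a normal exit.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:  # negative code = terminated by that signal number
        return signal.Signals(-proc.returncode).name
    return f"exit {proc.returncode}"


def tally(cmd_for_seed, seeds, timeout=5.0):
    """Run one trial per seed and count the outcomes."""
    return Counter(run_trial(cmd_for_seed(s), timeout) for s in seeds)


# Hypothetical real use: tally(lambda s: ["./random_exploit", str(s)], range(30000))
# Self-contained demo: children that exit with status 0 or 1 depending on the seed.
demo = tally(lambda s: [sys.executable, "-c", f"raise SystemExit({s % 2})"], range(4))
```

Using a timeout rather than waiting indefinitely is what lets the harness finish even when a randomized sequence loops forever.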
Although this is only a single case study, it suggests that the vast majority of randomizations of a genuine attack do indeed simply cause a program crash. Another caveat in this test is that we don't know exactly how many instructions were executed before the signal occurred. Random control transfers occur frequently, so the location of a signal does not correlate directly with the number of instructions executed; a cumulative total of about 6 percent of the signaling cases occurred at addresses below the starting point of the attack. Using RISE itself, we can address the question of how many instructions are executed, because it is easy for an emulator to count how many instructions it has emulated. The two synthetic attacks (described earlier) were each run one hundred times, with a new seed each time, and neither attack ever executed successfully. On average, each synthetic attack instance executed 2.35 bytes of instructions before process death. Within the RISE approach, one could avoid the problem of accidentally viable code by mapping to a larger instruction set. Its size could be tuned to reflect the desired percentage of incorrect unscramblings that will likely lead immediately to an illegal instruction.
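The tuning mentioned in the last sentence can be made concrete with a simple model. This is an illustrative assumption, not a measurement: suppose the enlarged instruction set has N encodings, of which only L correspond to legal instructions, and an unscrambled foreign code word is uniformly distributed over all N. Requiring the probability of an immediately illegal instruction to be at least p gives a lower bound on N:

```latex
\[
  \Pr[\text{illegal}] = 1 - \frac{L}{N} \ge p
  \quad\Longleftrightarrow\quad
  N \ge \frac{L}{1 - p}.
\]
```

For example, with L = 256 (treating every one-byte opcode as legal, a worst case) and p = 0.99, the bound is N ≥ 25,600, so 16-bit code words (N = 65,536) would suffice, giving Pr[illegal] = 1 - 256/65,536 ≈ 0.996.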
The memory checking engine of Valgrind introduces a significant cost. However, RISE adds only a modest performance penalty beyond that. In terms of execution time, a RISE-protected program executes about 5 percent more slowly than the same program running under Valgrind; much of that slowdown is believed to be due to the relatively high cost of the mprotect system calls used to control modifications of the trace cache. In terms of space, significant impacts come from the scrambling information and the private copies of shared libraries, each of which requires about as much space as the protected code. It has been possible to RISE-protect every one of the services used in the experiments (httpd, named, cvs pserver, smbd, sshd, rpc.statd, sendmail, wuftpd) on a 200 MHz Pentium computer with 128 MB RAM and run them with reasonable response times. This is a far smaller and slower machine than any modern x86-based server system, which gives us confidence that the memory expense does not make the scheme impractical and would be a reasonable tradeoff for increased security.
Diversity in software engineering is quite different from diversity for security. In software engineering, the basic idea is to generate multiple independent solutions to a problem (e.g., multiple versions of a software program) in the hope that they will fail independently, greatly improving the chances that some solution in the collection will perform correctly in every circumstance. The different solutions may or may not be produced manually, and the number of solutions is typically quite small, around ten. Diversity in security is introduced for a different reason: the goal is to reduce the risk of widely replicated attacks by forcing the attacker to redesign the attack each time it is applied. For example, in the case of a buffer overflow attack, the goal is to force the attacker to rewrite the attack code for each new computer that is attacked. Here the number of diverse solutions is very high, potentially equal to the total number of copies of a given program. Manual methods are infeasible here, and the diversity must be produced automatically. One classification of diversity methods applied to security (called security adaptations) classifies adaptations by what is being adapted: either the interface or the implementation. Interface adaptations modify code layout or access controls to interfaces, without changing the underlying implementation to which the interface gives access. Implementation adaptations, on the other hand, do modify the underlying implementation of some portion of the system to make it resistant to attacks. RISE can be viewed as an interface randomization at the machine code level. Earlier work in automated diversity for security has experimented with diversifying data layouts, file systems and system call interfaces. In addition, several projects and implementations address the code injection threat model directly.
Developers of buffer overflow attacks have developed a variety of workarounds, such as ramps and landing zones of no-ops and multiple return addresses, aimed at coping with variations across different versions or different compilations of the vulnerable software. Deliberate diversification via random stack padding coerces an attacker to use such generalization techniques; it also necessitates larger attack codes in proportion to the size range of the random padding employed. The StackGuard system provides a counter-defense against landing zones and similar attack techniques by interposing a hard-to-guess canary word before the return address; the value of this word is checked before the function returns. An attempt to overwrite the return address via linear stack smashing will almost surely change the canary value and thus be detected.
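The canary check can be modeled in Python by placing a random word between a toy buffer and the saved return address. This is an illustrative model of the idea, not StackGuard's implementation. The layout and sizes are hypothetical, and secrets.token_bytes stands in for however a real runtime chooses the canary value. Because the copy is linear, any overflow long enough to reach the return address must first pass through the canary.

```python
import secrets
import struct

BUF_SIZE = 8  # hypothetical buffer size


class GuardedFrame:
    """Toy frame laid out as [buffer | canary | return address].

    Illustrative model of the canary idea, not StackGuard's implementation.
    """

    def __init__(self, return_address: int) -> None:
        self.canary = secrets.token_bytes(4)  # hard-to-guess, fresh per frame
        self.mem = bytearray(BUF_SIZE) + self.canary + struct.pack("<I", return_address)

    def unchecked_copy(self, data: bytes) -> None:
        """Linear copy with no bounds check; an overflow runs through the canary."""
        n = min(len(data), len(self.mem))
        self.mem[:n] = data[:n]

    def checked_return(self) -> int:
        """Verify the canary before 'returning'; abort if it was changed."""
        if self.mem[BUF_SIZE:BUF_SIZE + 4] != self.canary:
            raise RuntimeError("stack smashing detected")
        return struct.unpack_from("<I", self.mem, BUF_SIZE + 4)[0]


safe = GuardedFrame(0x08048000)
safe.unchecked_copy(b"AAAA")  # fits in the buffer, canary untouched
assert safe.checked_return() == 0x08048000

smashed = GuardedFrame(0x08048000)
# To reach the return address, the overflow also overwrites the canary with "AAAA".
smashed.unchecked_copy(b"A" * 12 + struct.pack("<I", 0xBFFFF000))
```

Calling `smashed.checked_return()` then raises instead of returning to the attacker's address, unless the random canary happened to equal the overwritten bytes (probability 2^-32).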