Introduction
Today I would like us to touch on the area of self-modifying and obfuscated code. It’s no secret that obfuscation is a basic “survival” technique for modern malware. The logic is simple: if your “kung-fu” is good enough, the probability that your malicious software will survive (and successfully do its dirty job) is much higher. On the other side of the barricade there are countless AVs, IDSs, IPSs, firewalls, etc., which claim to be fully automated, intelligent malware detection tools. Both sides are motivated and very engaged in the battle, so winners and losers change places quite often. So today we will take a closer look at the "dark side".
The main reason for this article is my impression that many people (surprisingly, including some who work in IT security and forensics) have really no idea how malware works. It is also not clear to them how malware detection tools deal with it. And honestly, how can you fight something if you don't know what it actually is? Know Your Enemy, dude! I want to demonstrate that self-modifying code is actually quite a smart, but not necessarily complicated, mechanism. I also want us to become malware investigators and, for a start, try to hide a single (and simple) function in our binary.
Oh yes, there are actually two articles. The first (this one) is a kind of introduction to code obfuscation. The second one will be an attempt to write a fully working cryptor: software which would allow us to execute malicious code and bypass antivirus protection! There is a small application prepared which illustrates everything below, so you are welcome to [download] it before the training. I will intentionally omit many technical details, just to give a broad view. Everything you need can be found later in a thousand sources on the Internet.
Prerequisites:
- Some knowledge of programming for Windows (examples are in Delphi, but they are quite simple, so they can easily be implemented in any other programming language). You definitely have to know what pointers are. It is also highly recommended to know what the Windows API is.
- Basic knowledge of debugging binary applications for Windows is also desirable. We will be using the famous and free OllyDbg.
- Basic knowledge of assembler (I mean it: very basic. Don't run away now!). :-)
Ok, it's time to dance.
Part 1. Five Flavours of MessageBox
Definition
Self-modifying code is code that literally changes its own instructions while it is executing. Self-modification can be accomplished in a variety of ways, depending on the programming language and its support for pointers and/or access to a dynamic compiler or interpreter. Here are some examples:
- Overlay of existing instructions (or parts of instructions such as opcode, register, flags or address).
- Direct creation of whole instructions or sequences of instructions in memory.
- Creation or modification of source code statements, followed by a 'mini compile' or dynamic interpretation (see the eval statement).
- Creating an entire program dynamically and then executing it.
In this tutorial we will play with code modification during execution ('on-the-fly'), with emphasis on the direct creation of instructions in memory.
Experiments
Traditionally, we start with something easy. And what can be simpler than showing a message box? :-)
procedure TForm1.Button1Click(Sender: TObject);
begin
  //--- parameters: message, caption, style
  Application.MessageBox('This is my first text', 'Info', MB_OK);
end;
Note: we passed three parameters to the Delphi function Application.MessageBox. Executing it gives us the following (very predictable) result.
Nice. It would be interesting to see what the processor is doing when the binary is executed. Our good old OllyDbg comes to the rescue. We load our .exe into the debugger and make sure a breakpoint is set on the API function MessageBoxA.
Now run the binary and press button nr 1 in our sweet application. Woop! The debugger stops at the breakpoint and we can observe some interesting things.
Executed instructions (opcodes and disassembled code):
And the processor's stack:
What we can see is that the Delphi Application.MessageBox function is nothing but a wrapper for the Windows API function MessageBoxA, defined in the library user32.dll. This function takes not three but four parameters:
- parent window's handle (address)
- the text (address of the text in memory)
- window caption (address of the caption in memory)
- window style (buttons used, etc)
Last but not least: we also need the address of the MessageBoxA function to be able to call it. Generally, Windows API functions are called more or less the following way: their parameters are PUSHed onto the processor stack in reverse order (see the illustration above), then the address of the function is copied to EAX and CALL is executed. This is the stdcall calling convention used by most of the 32-bit Windows API: the called function removes its own parameters from the stack before returning.
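The same call can also be written in plain Delphi through a typed function pointer, which makes the parameter order and the calling convention explicit. The following is only a sketch: the type name TMessageBoxA and the variable MsgBox are hypothetical names chosen for this illustration, and it assumes a Windows target on which user32.dll can be loaded.

```delphi
program StdcallPointerDemo;
{$APPTYPE CONSOLE}
uses
  Windows;

type
  // Hypothetical type name; the signature mirrors the documented
  // MessageBoxA prototype (owner, text, caption, style).
  TMessageBoxA = function(hWnd: HWND; lpText, lpCaption: PAnsiChar;
    uType: UINT): Integer; stdcall;

var
  MsgBox: TMessageBoxA;
begin
  // LoadLibrary (rather than GetModuleHandle) is used so the lookup also
  // works in a console program that has not loaded user32.dll yet.
  @MsgBox := GetProcAddress(LoadLibrary('user32.dll'), 'MessageBoxA');
  if Assigned(MsgBox) then
    // Source order is owner, text, caption, style; the compiler pushes
    // them right to left, so the style value is pushed first.
    MsgBox(0, 'Called through a pointer', 'Info', MB_OK)
  else
    Writeln('MessageBoxA could not be resolved');
end.
```

Because the pointer type is declared stdcall, the compiler generates the reversed pushes and the indirect call itself, which is the sequence observed in the debugger.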
Now that we know what is going on, we can try to call the same function (user32.MessageBoxA) directly in assembler. Let's define some global variables first (I will tell you later why we use globals):
var
  str1, str2: pchar;  //--- our strings
  pAddr: dword;       //--- address of the function (unsigned 32-bit)
And now write the function itself:
procedure CallMessageBox();
asm
  push dword ptr 0      //--- push style: 0
  push dword ptr str1   //--- push DWORD parameter (caption)
  push dword ptr str2   //--- push DWORD parameter (message)
  push dword ptr 0      //--- push hOwner: 0
  mov eax, pAddr        //--- load the address of the API function into EAX
  call eax              //--- call the function whose address is in EAX
end;
This is how we call this function (when pressing button nr 2):
procedure TForm1.Button2Click(Sender: TObject);
begin
  str1 := 'Info';
  str2 := 'This is my second text';
  pAddr := dword(GetProcAddress(GetModuleHandle('User32.dll'), 'MessageBoxA'));
  CallMessageBox;
end;
The call GetProcAddress(GetModuleHandle('User32.dll'), 'MessageBoxA') is how we get the address of the function exported by user32.dll. Style = 0 corresponds to MB_OK, so only the "OK" button is shown. hOwner can also be 0 (the message box then has no owner window). Press button nr 2: everything works. Now let's check our executed code in the debugger:
...and the stack:
All clear, right? Now a word about why global variables are used. We want absolute addresses: full DWORDs embedded directly in the instructions. If we used local variables, the generated code would be different, because the compiler addresses locals relative to the stack frame (and may optimize them into registers). And we need a simple and very clear picture now, don't we?
Ok, everything up to this point was just a list of instructions written in a high-level language (Delphi) or a low-level one (assembler), compiled and executed as static binary code (a sequence of processor instructions).
Now let's think: we know what those processor instructions look like. Why not create exactly the same sequence of instructions, but at runtime? We can create a kind of "program A" which at runtime writes a sequence of instructions to memory (a kind of "program B") and then executes them.
But before that, let's check whether we can access the opcodes of an existing function (read them from memory), because it would be much easier to copy the opcodes from existing functions without using OllyDbg. (We are naturally lazy hackers, aren't we...) :)
One way or another, I prepared a simple function which can do that:
//--- returns the opcodes of any existing function as a hex string
function copyOpcodesFromExistingFunction(sourceFunc: pointer; maxLength: integer): string;
var
  i: integer;
  p: pointer;
  code: string;
begin
  result := '';
  for i := 0 to maxLength - 1 do  //--- read at most maxLength bytes
  begin
    p := ptr(dword(sourceFunc) + dword(i));  //--- craft pointer directly
    code := inttohex(byte(p^), 2);
    result := result + code;
    if code = 'C3' then  //--- stop at the first C3 byte (RET)
      break;
  end;
end;
The function reads bytes starting at a given place in memory (a pointer to the analysed object/function) and continues until it meets the C3 opcode, which is "return from the function" (RET), or until maxLength bytes have been read. Keep in mind that this is a simplification: a C3 byte may also occur inside an instruction operand (for example as part of an address), in which case the dump stops too early. Quick illustration of what the pointer is here:
A similar approach is used for any other structure in memory (except that the data does not necessarily end with a terminating byte). So let's check the opcodes of our function CallMessageBox.
procedure TForm1.Button3Click(Sender: TObject);
var
  myAsm: string;
begin
  str1 := 'Info';
  str2 := 'This is my third text';
  pAddr := dword(GetProcAddress(GetModuleHandle('User32.dll'), 'MessageBoxA'));
  caption := inttohex(pAddr, 8);
  //--- take opcodes from existing function
  memo1.Text := copyOpcodesFromExistingFunction(@CallMessageBox, 40);
end;
This is what you will see when you press button nr 3:
Looks familiar, doesn't it? :-) If we added some line breaks, we would see the same opcodes previously shown by the debugger.
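The "read until C3" idea can be tried safely on an ordinary byte array, without touching real code. The sketch below is an illustration under stated assumptions: DumpUntilRet is a hypothetical helper (not part of the article's application), and the sample bytes are hand-written, with the made-up address 004010C3 chosen so that a C3 byte appears inside an operand.

```delphi
program DumpDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: hex-dump bytes, separated by spaces, and stop
// after the first $C3 byte or at the end of the buffer.
function DumpUntilRet(const code: array of Byte): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(code) to High(code) do
  begin
    Result := Result + IntToHex(code[i], 2);
    if code[i] = $C3 then
      Break;
    Result := Result + ' ';
  end;
end;

const
  // push 0 / mov eax, $004010C3 / call eax / ret
  // The address is invented so that its lowest byte is $C3.
  Sample: array[0..12] of Byte = (
    $68, $00, $00, $00, $00,
    $B8, $C3, $10, $40, $00,
    $FF, $D0,
    $C3);

begin
  // The dump stops inside the mov operand, not at the real RET:
  Writeln(DumpUntilRet(Sample)); // prints 68 00 00 00 00 B8 C3
end.
```

The truncated output shows why a byte scan is only a convenience: finding the true end of a function requires decoding instruction lengths, which is what a disassembler such as OllyDbg does.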
The technique described above, by the way, is a very nice way to review the opcodes of any function in Delphi. A kind of "self-debugger" or something... Great, now let's do the same in the opposite direction: we will construct the same code at runtime. See this:
//--- construct our function manually (MessageBoxA)
procedure TForm1.Button4Click(Sender: TObject);
var
  myAsm: string;
  t: TByteArray;
  oldProtect: DWORD;
  dummyFunc: function: Integer;  //--- dummy function
begin
  str1 := 'Info';
  str2 := 'This is my fourth text';
  pAddr := dword(GetProcAddress(GetModuleHandle('User32.dll'), 'MessageBoxA'));
  myAsm := ' 6800000000' +                                   //--- push style: 0
           ' FF35' + inttohex(swapEndian(dword(@str1)), 8) + //--- push dword: caption address
           ' FF35' + inttohex(swapEndian(dword(@str2)), 8) + //--- push dword: message address
           ' 6800000000' +                                   //--- push hOwner: 0
           ' B8' + inttohex(swapEndian(pAddr), 8) +          //--- mov EAX, dword: API function address
           ' FFD0' +                                         //--- call EAX
           ' C3';                                            //--- RET (return from function)
  t := StringToArrayOfByte(myAsm);
  @dummyFunc := @t;  //--- point the function variable to our byte array
  //--- mark the page(s) holding the byte array as executable;
  //--- the first argument must be a valid process handle, 0 is not one
  VirtualProtectEx(GetCurrentProcess, @t, SizeOf(t), PAGE_EXECUTE_READWRITE, @oldProtect);
  dummyFunc;  //--- run our hand-crafted function
end;
This is what happens when you press button nr 4.
- We assign values to our string variables ("Info", "This is my fourth text").
- We retrieve the address of the function we will be calling (MessageBoxA).
- We create the string (myAsm), which is an ASCII-hex-encoded list of the instructions to be executed.
- Our instruction list (myAsm) is written as bytes into memory (the TByteArray).
- We point the dummy function variable at the TByteArray.
- We mark the memory page where our TByteArray (actually our new function) resides as executable (PAGE_EXECUTE_READWRITE), so our innocent bytes (data) become code and can be executed.
- The function (our brand new code: dummyFunc) is successfully executed.
The result:
Hurray, it works! Some important things to note:
- Pay attention to this definition: TByteArray = array[1..200] of byte (Important!)
- We must keep in mind the reversed order of the parameters.
- We must swap the byte order of all addresses (A1B1C1D1 becomes D1C1B1A1). The reason is that x86 processors are little-endian: multi-byte values are stored in memory with the least significant byte first.
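The byte swap from the last point can be checked in isolation. This is a minimal sketch, not the article's own helper: SwapBytes32 and EncodeMovEax are hypothetical names, and the address 7E4507EA is an arbitrary example value rather than a real function address.

```delphi
program EndianDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical stand-in for the swapEndian helper used in the article:
// reverses the order of the four bytes of a 32-bit value.
function SwapBytes32(value: Cardinal): Cardinal;
begin
  Result := ((value and $000000FF) shl 24) or
            ((value and $0000FF00) shl 8) or
            ((value and $00FF0000) shr 8) or
            ((value and $FF000000) shr 24);
end;

// Hex text for "mov eax, imm32": opcode B8 followed by the operand
// written in little-endian byte order.
function EncodeMovEax(address: Cardinal): string;
begin
  Result := 'B8' + IntToHex(SwapBytes32(address), 8);
end;

begin
  Writeln(IntToHex(SwapBytes32($A1B1C1D1), 8)); // prints D1C1B1A1
  Writeln(EncodeMovEax($7E4507EA));             // prints B8EA07457E
end.
```

The swap is needed only because the instructions are assembled as hex text, which reads most significant byte first; writing the value into memory as a DWORD would let the processor lay the bytes out in little-endian order by itself.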
Important remark: all strings explicitly defined in source code are clearly visible in our binary. All instructions are also visible in the binary after compilation. Armed with a hex editor, you will find them in a second. And this is essentially what signature-based antivirus scanning does: it checks for certain combinations of bytes in binaries (on the hard drive) and/or in memory, before the program is executed. See our code inside the binary:
Of course AVs don't care about our innocent MessageBoxA, but they may well pay much more attention to functions such as VirtualProtect, SetThreadContext, GetThreadContext, WriteProcessMemory, ReadProcessMemory, CreateProcess, etc.
So what can we do to make it more difficult to detect what our program will be doing?
- We may encrypt all strings (all our data, the code which will be executed, and all keywords like "User32.dll" or "MessageBoxA").
- We may use undocumented (or simply less obvious, lower-level) API functions.
Whaaat? How can you use something which is not documented? And what the hell is it? In short: it's a kind of "box inside the box". :-) Something like this:
So we already know that Delphi's Application.MessageBox actually calls MessageBoxA. But there is more: MessageBoxA internally calls another function, MessageBoxExA. This inner function may have different (or additional) parameters. We can easily prove it with our debugger by setting a breakpoint on MessageBoxExA:
The result (clearly MessageBoxA is calling MessageBoxExA):
Parameters on stack:
We clearly see that one additional parameter has appeared: languageID. Btw, it's a nice and simple example of API reverse engineering, isn't it? We may confirm our findings in MSDN (assuming the function is documented, as MessageBoxExA happens to be) or google for it. So what we can do now is construct our opcodes for MessageBoxExA instead of MessageBoxA. Here it is:
procedure TForm1.Button5Click(Sender: TObject);
var
  myAsm: string;
  t: TByteArray;
  oldProtect: DWORD;
  dummyFunc: function: Integer;  //--- dummy function
begin
  str1 := pchar(simpleDecryption('B6919990'));  //--- encrypted text: "Info"
  str2 := pchar(simpleDecryption('AB97968CDF968CDF9286DF9996998B97DF8B9A878B'));  //--- encrypted text: "This is my fifth text"
  pAddr := dword(GetProcAddress(
    GetModuleHandle(pchar(simpleDecryption('AA8C9A8DCCCDD19B9393'))),  //--- encrypted text: "User32.dll"
    pchar(simpleDecryption('B29A8C8C9E989ABD9087BA87BE'))));           //--- encrypted text: "MessageBoxExA"
  myAsm := ' 6800000000' +                                   //--- push languageId: 0
           ' 6800000000' +                                   //--- push style: 0
           ' FF35' + inttohex(swapEndian(dword(@str1)), 8) + //--- push dword: caption address
           ' FF35' + inttohex(swapEndian(dword(@str2)), 8) + //--- push dword: message address
           ' 6800000000' +                                   //--- push hOwner: 0
           ' B8' + inttohex(swapEndian(pAddr), 8) +          //--- mov EAX, dword: API function address
           ' FFD0' +                                         //--- call EAX
           ' C3';                                            //--- RET (return from function)
  memo1.Text := myAsm;
  t := StringToArrayOfByte(myAsm);
  @dummyFunc := @t;  //--- point the function variable to our byte array
  //--- mark the page(s) holding the byte array as executable;
  //--- the first argument must be a valid process handle, 0 is not one
  VirtualProtectEx(GetCurrentProcess, @t, SizeOf(t), PAGE_EXECUTE_READWRITE, @oldProtect);
  dummyFunc;  //--- run our hand-crafted function
end;
Note that the strings are also encrypted now. The myAsm variable could be encrypted too (I left it as it is to keep everything a little clearer). Some primitive encryption functions are used:
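Encryption:
function simpleEncryption(dataIn: string): string;
var
  i: integer;
begin
  result := '';  //--- a string result is not guaranteed to start out empty
  for i := 1 to length(dataIn) do
    result := result + inttohex(255 - ord(dataIn[i]), 2);  //--- complement each byte, write it as 2 hex digits
end;
Decryption:
function simpleDecryption(dataIn: string): string;
var
  i: integer;
begin
  result := '';  //--- a string result is not guaranteed to start out empty
  for i := 1 to length(dataIn) div 2 do
    result := result + chr(255 - strToInt('$' + copy(dataIn, i * 2 - 1, 2)));  //--- parse 2 hex digits, complement back
end;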
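The scheme is simply a byte-wise complement (255 minus each character code) written out as hex digits, so it can be verified against the constants used in the button 5 handler. The sketch below is a stand-alone illustration; EncodeComplement and DecodeComplement are hypothetical names that mirror the two functions above, and AnsiString is assumed so that each character is exactly one byte.

```delphi
program ComplementDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical counterpart of simpleEncryption: each byte b becomes
// the two hex digits of (255 - b).
function EncodeComplement(const s: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    Result := Result + IntToHex(255 - Ord(s[i]), 2);
end;

// Hypothetical counterpart of simpleDecryption: every pair of hex digits
// is parsed and complemented back to the original byte.
function DecodeComplement(const hex: string): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(hex) div 2 do
    Result := Result + AnsiChar(255 - StrToInt('$' + Copy(hex, i * 2 - 1, 2)));
end;

begin
  Writeln(EncodeComplement('Info'));     // prints B6919990
  Writeln(DecodeComplement('B6919990')); // prints Info
end.
```

For example, 'I' is $49 and 255 - $49 = $B6, which is the first byte of the encoded caption. This hides the text from a plain string search, but it is an encoding rather than real encryption: there is no key, and anyone who knows the scheme can reverse it.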
When you press button nr 5, the result is exactly as expected: the message box pops up.
Even if our suspicious antivirus is for some reason sensitive to the opcodes of MessageBoxA (assuming this function may be used for something bad), it may not necessarily be as sensitive to MessageBoxExA. And this is exactly what we want to achieve!
Conclusion
The simple self-modifying and obfuscating mechanism described above may be surprisingly good at evading antivirus protection. Now, knowing all this, we may try to do something more serious: write our own cryptor, which should allow us to "smuggle" 3rd-party malicious code onto a target system. Stay tuned and w8 for the 2nd part of the tutorial.
I would like to encourage you to give feedback, which would be very useful. It would help us adjust the texts and illustrations, avoid possible ambiguities and make everything even easier to read and understand. :)
RefLink: http://itsecuritylab.eu/index.php/2010/08/25/self-modifying-code-%E2%80%93-short-overview-for-beginners/