-
03-02-2012, 11:52 AM #1 Security Researcher
- Join Date
- Oct 2010
- Location
- Bangalore
- Posts
- 14
- Thanks
- 2
- Thanked 5 Times in 4 Posts
Bloodshed Dev-C++ 4.9.9.2 Compiler Analysis [Reverse Engineering Tips]
Okay folks,
This is an idea that came to my mind, and it's pretty simple. I write some sample programs with a particular compiler, this time Bloodshed Dev-C++ 4.9.9.2, then analyse the compiled binaries in IDA to figure out the code that the compiler adds automatically and, if possible, derive a pattern from it over multiple analyses. IMHO this should help when reversing binaries built with that compiler, since the reverse engineer can save time by ignoring the compiler-generated code. I don't know whether this has been done before or whether there are tools that do this; it's just my attempt. Hope it is useful to some of you.
So let's get started. I have 3 C programs.
File: 1.c
Code:
#include <stdio.h>
#include <conio.h>

int main()
{
    int a,b,c;
    a=4;
    b=5;
    c=a+b;
    printf("c=%d",c);
    return 0;
}
PEiD Signature for 1.exe

And its assembler listing generated by IDA for 1.exe (NOTE: This is not the entire ASM file, only the extracted IDA View, which is after all what we are interested in)
File: 1.asm
Code:
; Attributes: bp-based frame
; int __cdecl main(int argc,const char **argv,const char *envp)
_main proc near

var_18= dword ptr -18h
var_14= dword ptr -14h
var_10= dword ptr -10h
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
argc= dword ptr 8
argv= dword ptr 0Ch
envp= dword ptr 10h

push ebp
mov ebp, esp
sub esp, 18h ; char *
and esp, 0FFFFFFF0h
mov eax, 0
add eax, 0Fh
add eax, 0Fh
shr eax, 4
shl eax, 4
mov [ebp+var_10], eax
mov eax, [ebp+var_10]
call sub_401730
call sub_4013D0
mov [ebp+var_4], 4
mov [ebp+var_8], 5
mov eax, [ebp+var_8]
add eax, [ebp+var_4]
mov [ebp+var_C], eax
mov eax, [ebp+var_C]
mov [esp+18h+var_14], eax
mov [esp+18h+var_18], offset aCD ; "c=%d"
call printf
mov eax, 0
leave
retn
_main endp
File: 2.c
Code:
#include <stdio.h>
#include <conio.h>

void main()
{
    char *a;
    a = "hello there";
    printf("%s",a);
}
PEiD Signature for 2.exe

And its assembler listing generated by IDA for 2.exe (NOTE: This is not the entire ASM file, only the extracted IDA View, which is after all what we are interested in)
File: 2.asm
Code:
; Attributes: bp-based frame
; int __cdecl main(int argc,const char **argv,const char *envp)
_main proc near

var_18= dword ptr -18h
var_14= dword ptr -14h
var_8= dword ptr -8
var_4= dword ptr -4
argc= dword ptr 8
argv= dword ptr 0Ch
envp= dword ptr 10h

push ebp
mov ebp, esp
sub esp, 18h ; char *
and esp, 0FFFFFFF0h
mov eax, 0
add eax, 0Fh
add eax, 0Fh
shr eax, 4
shl eax, 4
mov [ebp+var_8], eax
mov eax, [ebp+var_8]
call sub_401720
call sub_4013C0
mov [ebp+var_4], offset aHelloThere ; "hello there"
mov eax, [ebp+var_4]
mov [esp+18h+var_14], eax
mov [esp+18h+var_18], offset aS ; "%s"
call printf
leave
retn
_main endp
File: 3.c
Code:
#include <stdio.h>
#include <conio.h>

void main()
{
    int b;
    b = 10;
    if (b>4)
    {
        char *a;
        a = "hello there";
        printf("%s",a);
    }
    else
    {
        printf("false");
    }
}
PEiD Signature for 3.exe

And its assembler listing generated by IDA for 3.exe (NOTE: This is not the entire ASM file, only the extracted IDA View, which is after all what we are interested in)
File: 3.asm
Code:
; Attributes: bp-based frame
; int __cdecl main(int argc,const char **argv,const char *envp)
_main proc near

var_18= dword ptr -18h
var_14= dword ptr -14h
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
argc= dword ptr 8
argv= dword ptr 0Ch
envp= dword ptr 10h

push ebp
mov ebp, esp
sub esp, 18h ; char *
and esp, 0FFFFFFF0h
mov eax, 0
add eax, 0Fh
add eax, 0Fh
shr eax, 4
shl eax, 4
mov [ebp+var_C], eax
mov eax, [ebp+var_C]
call sub_401740
call sub_4013E0
mov [ebp+var_4], 0Ah
cmp [ebp+var_4], 4
jle short loc_4012E3
mov [ebp+var_8], offset aHelloThere ; "hello there"
mov eax, [ebp+var_8]
mov [esp+18h+var_14], eax
mov [esp+18h+var_18], offset aS ; "%s"
call printf
jmp short locret_4012EF

loc_4012E3: ; "false"
mov [esp+18h+var_18], offset aFalse
call printf

locret_4012EF:
leave
retn
_main endp
For readability I will continue the explanation in the following post(s).
-
The Following User Says Thank You to nishant For This Useful Post:
abhaythehero (05-31-2012)
-
03-02-2012, 12:18 PM #2 Security Researcher
Okay, now if we closely analyse the C sources and their corresponding ASM listings, we see a pattern in the generated ASM that is pretty much the same every time.
Code:
var_18= dword ptr -18h
var_14= dword ptr -14h
var_10= dword ptr -10h
var_C= dword ptr -0Ch
var_8= dword ptr -8
var_4= dword ptr -4
argc= dword ptr 8
argv= dword ptr 0Ch
envp= dword ptr 10h
What we see is that most of these declarations (var_18, var_14, var_8, var_4, argc, argv, envp) repeat in all three listings. And the first variable assignment in a program, i.e.
Code:
a=4;
in 1.c,
Code:
a = "hello there";
in 2.c and
Code:
b = 10;
in 3.c, is shown by IDA as
Code:
mov [ebp+var_4], <value>
where the value is in hex for an integer, or the offset (address) of the string in case of a string.
Next thing to note is that
Code:
push ebp
mov ebp, esp
sub esp, 18h ; char *
and esp, 0FFFFFFF0h
mov eax, 0
add eax, 0Fh
add eax, 0Fh
shr eax, 4
shl eax, 4
mov [ebp+var_8], eax
mov eax, [ebp+var_8]
call sub_40****
call sub_40****
are common to all three ASM listings (only the var_N slot and the two call targets differ), and the actual program logic begins right after these instructions. I do understand that the first three lines, i.e.
Code:
push ebp
mov ebp, esp
sub esp, 18h
are for preparing the stack as main() starts to execute; this is the standard function prologue.
And the
Code:
leave
retn
_main endp
is the standard function epilogue, present at the end of every listing: main() has completed execution and control returns to the C runtime startup code that called it, which then ends the process.
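The part between the prologue and the two calls is easier to read once the arithmetic is worked through. Below is a minimal sketch in C that mirrors those instructions step by step. The function names rounded_alloca_size and align_down_16 are made up for illustration. In MinGW GCC builds of this era the two calls that follow are usually the stack-allocation helper and the runtime initialiser, but the listings here only show sub_40**** names, so treat that as an assumption until the call targets are checked in IDA.

```c
#include <stdint.h>

/* Mirrors: mov eax, <n> / add eax, 0Fh / add eax, 0Fh / shr eax, 4 / shl eax, 4
 * In all three listings <n> is 0. The name is hypothetical. */
uint32_t rounded_alloca_size(uint32_t requested)
{
    uint32_t eax = requested; /* mov eax, 0 in the listings */
    eax += 0x0F;              /* add eax, 0Fh */
    eax += 0x0F;              /* add eax, 0Fh */
    eax >>= 4;                /* shr eax, 4 : drop the low four bits */
    eax <<= 4;                /* shl eax, 4 : result is a multiple of 16 */
    return eax;
}

/* Mirrors: and esp, 0FFFFFFF0h
 * Clears the low four bits, so the stack pointer moves down
 * to the nearest 16-byte boundary. The name is hypothetical. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xFFFFFFF0u;
}
```

With the constant 0 used in the listings, rounded_alloca_size(0) evaluates to 16 (0 + 15 + 15 = 30, 30 >> 4 = 1, 1 << 4 = 16), so eax always holds 16 when the first call is made. The stack address 0x0022FF58 passed to align_down_16 gives 0x0022FF50; that address is only an example value, not one taken from a debugger session.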
I did this on Windows 7 x64, and I will try other Windows versions to confirm whether the pattern remains the same. In the meantime, please correct me if something I assumed is wrong. I would also request you to share any more information you may have on this topic.
Thanks
Nishant
-
The Following User Says Thank You to nishant For This Useful Post:
abhaythehero (05-31-2012)
-
03-05-2012, 07:42 AM #3 Security Researcher


- Join Date
- Jul 2010
- Posts
- 254
- Blog Entries
- 2
- Thanks
- 181
- Thanked 141 Times in 73 Posts
Namaste
Yes, this has been done earlier and is used to fingerprint compilers in the cracking process.
Every compiler has some specific code characteristics. For example, with VC++ 6.0 the very first function most of the time ends up at a fixed offset; likewise, certain versions of gcc over-allocate space for local variables; and each compiler has other specific code sequences of its own.
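Fingerprinting of this kind usually comes down to a byte-pattern search with wildcards. Here is a minimal sketch in C for the main() prologue shown in the listings above. The byte values are the usual x86 encodings that GCC's assembler emits for those instructions; they were not dumped from 1.exe, 2.exe or 3.exe, so verify them against the real binaries. The names WILDCARD, devcpp_main_prologue, find_signature and find_devcpp_prologue are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Any value above 0xFF cannot be a real byte, so it can mark "match anything". */
#define WILDCARD 0x100

/* Assumed encoding of the common prologue; the sub esp immediate is wildcarded
 * because the frame size depends on the program. */
static const uint16_t devcpp_main_prologue[] = {
    0x55,                         /* push ebp */
    0x89, 0xE5,                   /* mov ebp, esp */
    0x83, 0xEC, WILDCARD,         /* sub esp, imm8 */
    0x83, 0xE4, 0xF0,             /* and esp, 0FFFFFFF0h */
    0xB8, 0x00, 0x00, 0x00, 0x00, /* mov eax, 0 */
    0x83, 0xC0, 0x0F,             /* add eax, 0Fh */
    0x83, 0xC0, 0x0F,             /* add eax, 0Fh */
    0xC1, 0xE8, 0x04,             /* shr eax, 4 */
    0xC1, 0xE0, 0x04              /* shl eax, 4 */
};

/* Returns the offset of the first match of sig in buf, or -1 if there is none. */
long find_signature(const uint8_t *buf, size_t len,
                    const uint16_t *sig, size_t sig_len)
{
    if (sig_len == 0 || len < sig_len)
        return -1;
    for (size_t i = 0; i + sig_len <= len; i++) {
        size_t j = 0;
        while (j < sig_len && (sig[j] == WILDCARD || buf[i + j] == sig[j]))
            j++;
        if (j == sig_len)
            return (long)i;
    }
    return -1;
}

/* Convenience wrapper around the pattern above. */
long find_devcpp_prologue(const uint8_t *buf, size_t len)
{
    return find_signature(buf, len, devcpp_main_prologue,
                          sizeof devcpp_main_prologue / sizeof devcpp_main_prologue[0]);
}
```

To try it, read the .text section of a binary into a buffer and pass it to find_devcpp_prologue. A non-negative result is only a hint that main() may start at that offset; a different compiler could happen to emit the same bytes.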
"vinnu"
-
The Following 3 Users Say Thank You to "vinnu" For This Useful Post:
abhaythehero (05-31-2012), nishant (03-05-2012), prashant_uniyal (06-01-2012)
-
03-05-2012, 11:00 AM #4 Security Researcher
Thanks vinnu for the insight.
I hope some people here may still find it useful.
-
05-31-2012, 07:28 AM #5 Super Commando Dhruv


- Join Date
- Sep 2010
- Location
- Lucknow/Pune,India
- Posts
- 470
- Blog Entries
- 2
- Thanks
- 170
- Thanked 144 Times in 83 Posts
On a related note, I have got an interesting link >>
Check how GCC compiles your C++/C++11 code: GCC Explorer
In the world of 0s and 1s, are you a zero or The One!
-
The Following User Says Thank You to abhaythehero For This Useful Post:
prashant_uniyal (06-01-2012)