
Thread: Bloodshed Dev-C++ 4.9.9.2 Compiler Analysis [Reverse Engineering Tips]

  1. #1
    nishant (Security Researcher)
    Join Date: Oct 2010
    Location: Bangalore

    Bloodshed Dev-C++ 4.9.9.2 Compiler Analysis [Reverse Engineering Tips]



    Okay folks,

    This is an idea that came to my mind. It's pretty simple: I write some sample programs with a particular compiler (this time Bloodshed Dev-C++ 4.9.9.2), analyse the compiled binaries in IDA to figure out which code the compiler adds automatically, and then, if possible, derive a pattern from it across several samples. IMHO this should help when reversing binaries built with that compiler, because the reverse engineer can save time by skipping the compiler-generated code. I don't know whether this has been done before or whether any tools already do it, but this is my attempt. I hope it is useful to some of you.

    So let's get started. I have three C programs.

    File: 1.c

    Code:
    #include <stdio.h>
    #include <conio.h>
    
    int main()
    {
         int a,b,c;
         a=4;
         b=5;
         c=a+b;
         printf("c=%d",c);
         return 0;
    }
    PEiD signature for 1.exe: [screenshot: 1.jpg]

    And here is the disassembly IDA generated for 1.exe. (NOTE: This is not the entire ASM file, only the extracted IDA View of _main, which is all we are interested in after all.)

    File: 1.asm

    Code:
    ; Attributes: bp-based frame
    
    ; int __cdecl main(int argc,const char **argv,const char *envp)
    _main proc near
    
    var_18=	dword ptr -18h
    var_14=	dword ptr -14h
    var_10=	dword ptr -10h
    var_C= dword ptr -0Ch
    var_8= dword ptr -8
    var_4= dword ptr -4
    argc= dword ptr	 8
    argv= dword ptr	 0Ch
    envp= dword ptr	 10h
    
    push	ebp
    mov	ebp, esp
    sub	esp, 18h	; char *
    and	esp, 0FFFFFFF0h
    mov	eax, 0
    add	eax, 0Fh
    add	eax, 0Fh
    shr	eax, 4
    shl	eax, 4
    mov	[ebp+var_10], eax
    mov	eax, [ebp+var_10]
    call	sub_401730
    call	sub_4013D0
    mov	[ebp+var_4], 4
    mov	[ebp+var_8], 5
    mov	eax, [ebp+var_8]
    add	eax, [ebp+var_4]
    mov	[ebp+var_C], eax
    mov	eax, [ebp+var_C]
    mov	[esp+18h+var_14], eax
    mov	[esp+18h+var_18], offset aCD ; "c=%d"
    call	printf
    mov	eax, 0
    leave
    retn
    _main endp
    File: 2.c

    Code:
    #include <stdio.h>
    #include <conio.h>
    
    void main()
    {
              char *a;
              a = "hello there";
              printf("%s",a);  
    }
    PEiD signature for 2.exe: [screenshot: 2.jpg]

    And here is the IDA disassembly for 2.exe (again, only the extracted IDA View of _main):

    File: 2.asm

    Code:
    ; Attributes: bp-based frame
    
    ; int __cdecl main(int argc,const char **argv,const char *envp)
    _main proc near
    
    var_18=	dword ptr -18h
    var_14=	dword ptr -14h
    var_8= dword ptr -8
    var_4= dword ptr -4
    argc= dword ptr	 8
    argv= dword ptr	 0Ch
    envp= dword ptr	 10h
    
    push	ebp
    mov	ebp, esp
    sub	esp, 18h	; char *
    and	esp, 0FFFFFFF0h
    mov	eax, 0
    add	eax, 0Fh
    add	eax, 0Fh
    shr	eax, 4
    shl	eax, 4
    mov	[ebp+var_8], eax
    mov	eax, [ebp+var_8]
    call	sub_401720
    call	sub_4013C0
    mov	[ebp+var_4], offset aHelloThere	; "hello there"
    mov	eax, [ebp+var_4]
    mov	[esp+18h+var_14], eax
    mov	[esp+18h+var_18], offset aS ; "%s"
    call	printf
    leave
    retn
    _main endp
    File: 3.c

    Code:
    #include <stdio.h>
    #include <conio.h>
    void main()
    {
         int b;
         b = 10;
         if (b>4)
         {
              char *a;
              a = "hello there";
              printf("%s",a);   
         }
         else
         {
             printf("false");
         }
    }
    PEiD signature for 3.exe: [screenshot: 3.jpg]

    And here is the IDA disassembly for 3.exe (again, only the extracted IDA View of _main):

    File: 3.asm

    Code:
    ; Attributes: bp-based frame
    
    ; int __cdecl main(int argc,const char **argv,const char *envp)
    _main proc near
    
    var_18=	dword ptr -18h
    var_14=	dword ptr -14h
    var_C= dword ptr -0Ch
    var_8= dword ptr -8
    var_4= dword ptr -4
    argc= dword ptr	 8
    argv= dword ptr	 0Ch
    envp= dword ptr	 10h
    
    push	ebp
    mov	ebp, esp
    sub	esp, 18h	; char *
    and	esp, 0FFFFFFF0h
    mov	eax, 0
    add	eax, 0Fh
    add	eax, 0Fh
    shr	eax, 4
    shl	eax, 4
    mov	[ebp+var_C], eax
    mov	eax, [ebp+var_C]
    call	sub_401740
    call	sub_4013E0
    mov	[ebp+var_4], 0Ah
    cmp	[ebp+var_4], 4
    jle	short loc_4012E3
    mov	[ebp+var_8], offset aHelloThere	; "hello there"
    mov	eax, [ebp+var_8]
    mov	[esp+18h+var_14], eax
    mov	[esp+18h+var_18], offset aS ; "%s"
    call	printf
    jmp	short locret_4012EF
    
    loc_4012E3:		; "false"
    mov	[esp+18h+var_18], offset aFalse
    call	printf
    
    locret_4012EF:
    leave
    retn
    _main endp
    For readability, I will continue the explanation in a new post.


  3. #2
    nishant (Security Researcher)
    Join Date: Oct 2010
    Location: Bangalore
    Okay, now if we closely analyse the C sources and their corresponding ASM listings, we see a pattern in the generated code that stays pretty much the same every time.

    First, look at the stack-frame layout IDA prints at the top of each listing (this copy is from 1.asm):
    Code:
    var_18=	dword ptr -18h
    var_14=	dword ptr -14h
    var_10=	dword ptr -10h
    var_C= dword ptr -0Ch
    var_8= dword ptr -8
    var_4= dword ptr -4
    argc= dword ptr	 8
    argv= dword ptr	 0Ch
    envp= dword ptr	 10h
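    Most of these definitions repeat in every listing: var_18, var_14, var_8, var_4, argc, argv and envp appear in all three, while var_10 and var_C depend on how many local variables the function declares. And the first variable assignment in a program, i.e.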

    Code:
    a=4;
    in 1.c,
    Code:
    a = "hello there";
    in 2.c and
    Code:
    b = 10;
    in 3.c

    is compiled into the following (as displayed by IDA):

    Code:
    mov	[ebp+var_4], <value>
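    where IDA displays an integer value in hex (e.g. 0Ah for 10 in 3.asm), and for a string the stored value is the address (offset) of the string literal (e.g. offset aHelloThere in 2.asm). In all three samples, the first variable assigned is also the first local variable declared.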

    The next thing to note is that the following instructions

    Code:
    push	ebp
    mov	ebp, esp
    sub	esp, 18h	; char *
    and	esp, 0FFFFFFF0h
    mov	eax, 0
    add	eax, 0Fh
    add	eax, 0Fh
    shr	eax, 4
    shl	eax, 4
    mov	[ebp+var_8], eax
    mov	eax, [ebp+var_8]
    call	sub_40****
    call	sub_40****
    are common to all three listings (only the stack slot of the temporary, var_8 here, and the two call targets differ between the binaries). The actual program logic begins right after these instructions. I do understand that the first three lines, i.e.

    Code:
    push	ebp
    mov	ebp, esp
    sub	esp, 18h
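    set up the stack frame as main() starts to execute. This is the standard frame-pointer prologue GCC emits for functions in an unoptimized build, not something specific to main(); sub esp, 18h reserves 24 bytes for the local variables and the outgoing call arguments.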

    And the

    Code:
    leave
    retn
    _main endp
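    is the standard function epilogue at the end of every listing. leave tears down the stack frame (it is equivalent to mov esp, ebp followed by pop ebp), and retn returns from main(); _main endp is only IDA's end-of-procedure marker, not an instruction. Control does not go back to the kernel at this point: main() returns to the C runtime startup code that called it, which then performs the C library's exit processing and ends the process.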

    I did this on Windows 7 x64, and I will try it on other Windows versions to confirm whether the pattern stays the same. In the meantime, please correct me if anything I assumed is wrong, and do share any more information you may have on this topic.

    Thanks
    Nishant


  5. #3
    "vinnu" (Security Researcher)
    Join Date: Jul 2010
    Namaste
    Yes, this has been done before, and it is used to fingerprint compilers during cracking.
    Every compiler leaves some characteristic code behind. For example, with VC++ 6.0 the very first function usually lands at a fixed offset; likewise, certain versions of gcc over-allocate stack space for variables; and every compiler has other code patterns specific to it.

    "vinnu"


  7. #4
    nishant (Security Researcher)
    Join Date: Oct 2010
    Location: Bangalore
    Thanks, vinnu, for the insight. I hope some people here may still find it useful.

  8. #5
    abhaythehero (Super Commando Dhruv)
    Join Date: Sep 2010
    Location: Lucknow/Pune, India
    On a related note, here is an interesting link:

    GCC Explorer: check how GCC compiles your C++/C++11 code.
    In the world of 0s and 1s, are you a zero or The One !


