
Thread: Malicious PDF analysis

  1. #1
    Garage Newcomer m0nna is on a distinguished road
    Join Date
    Sep 2011
    Posts
    21
    Blog Entries
    2
    Thanks
    3
    Thanked 16 Times in 6 Posts

    Malicious PDF analysis



    Part 1

    This post walks through my analysis of a malicious PDF file that exploits a stack-based buffer overflow vulnerability in Adobe Acrobat Reader. Before analyzing the sample, it helps to understand how a PDF file is structured, so I will cover that first. A PDF document can include images, fonts, text, JavaScript, Flash and other content used to display the document.


    PDF document structure
    I will not explain the full PDF structure here, only the parts needed for the analysis. A PDF document starts with a header of the form %PDF-M.m (for example %PDF-1.1), which gives the version of the PDF specification the file follows. PDF readers expect this header and will generally not accept a file without it.
    The document body consists of multiple indirect objects. An indirect object starts with an object ID (1 in the example below) and a generation number (0 in the example below, sometimes loosely called the version number), followed by the keyword obj. The keyword endobj marks the end of the object.

    Code:
    %PDF-1.1        <------ PDF header
    
    1 0 obj            <-------- Indirect object, 1 is the object id and 0 is the generation number
    
    ........
    .....
    endobj        <------ This marks the end of object 1
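
    The header and object layout described above can be picked apart with a couple of regular expressions. This is only a minimal sketch in Python: the SAMPLE bytes and the function names are made up for illustration, and a real parser (such as pdf-parser.py) has to deal with streams, comments and malformed files that this ignores.

```python
import re

# Hypothetical sample; a real file would be read with open(path, "rb").read()
SAMPLE = (
    b"%PDF-1.1\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages >>\nendobj\n"
)

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# object id, generation number, then everything up to the first endobj
OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def pdf_version(data):
    """Return (major, minor) from the %PDF-M.m header, or None if absent."""
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def indirect_objects(data):
    """Map (object id, generation number) to the raw object body."""
    return {
        (int(num), int(gen)): body.strip()
        for num, gen, body in OBJECT_RE.findall(data)
    }


if __name__ == "__main__":
    print(pdf_version(SAMPLE))
    for key, body in indirect_objects(SAMPLE).items():
        print(key, body)
```

    Note that a binary stream containing the bytes "endobj" would cut an object short here, which is one reason to rely on a dedicated tool for real samples.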

    Inside an object, a series of tags (PDF names) describes the contents of the object or refers to another object. In the example below, object 7 contains JavaScript (indicated by the /JavaScript tag), and the /JS tag holds the JavaScript content itself. Object 31 instead holds a reference to a JavaScript object (object ID 34), written as /JS 34 0 R, where R stands for reference. Object 31 is shown the way pdf-parser.py prints it, which is why it starts with obj 31 0 and a token list.
    One object type that is essential when analyzing malicious PDF files is the stream object. In the example below, object 34 is a stream object. A stream object carries a stream of data between the keywords stream and endstream. This data is often compressed, which is why it looks like meaningless bytes. In the example, /FlateDecode indicates that the stream is compressed with zlib (deflate).
    There are also different types of tags, for example:
    /JS and /JavaScript indicate JavaScript,
    /RichMedia indicates Flash,
    /AA and /OpenAction indicate an automatic action to be performed when the document is viewed.

    Code:
    7 0 obj                              <---- object 7
    <<
    /Type /Action
    /S /JavaScript                <---- javascript tag
    /JS <javascript code>      <---- javascript content
    ...............
    .............
    >>
    endobj
    
    
    obj 31 0                                    <----- object 31
    Type:
    Referencing: 34 0 R
    [(2, '<<'), (2, '/S'), (2, '/JavaScript'), (2, '/JS'), (1, ' '), (3, '34'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '>>'), (1, '\r')]
    <<
    /S /JavaScript
    /JS 34 0 R                          <------ reference to a javascript object, in this case object 34
    >>
    endobj
    
    
    34 0 obj<</Subtype/Type1C/Length 5416/Filter/FlateDecode
    >>stream                                                                     <--- stream object
    H‰|T}T#W#Ÿ!d&"FI#ʼnNFW#åC                      <---- compressed stream content, in this case zlib compressed data indicated by /FlateDecode
    …
    endstream
    endobj
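
    To show what /FlateDecode means in practice, here is a small Python sketch that pulls the bytes between stream and endstream and inflates them with zlib. The sample object is built inside the snippet and is hypothetical (its content is not taken from the analyzed file), and the function name is my own. Streams that use other filters, or several chained filters, are simply returned untouched.

```python
import re
import zlib

# Everything between "stream<EOL>" and "endstream"
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data):
    """Return the contents of every stream, inflated where zlib accepts it."""
    results = []
    for raw in STREAM_RE.findall(data):
        try:
            # decompressobj ignores the end-of-line bytes left after the
            # compressed data instead of raising an error
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            results.append(raw)  # not zlib data, keep the raw bytes
    return results


# Hypothetical stream object, built here so the example is self-contained
payload = zlib.compress(b"app.alert('hello');")
sample = (
    b"34 0 obj<</Length %d/Filter/FlateDecode>>stream\n" % len(payload)
    + payload
    + b"\nendstream\nendobj\n"
)

if __name__ == "__main__":
    print(inflate_streams(sample))
```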
    Last edited by m0nna; 09-11-2011 at 05:04 PM.


  3. #2
    Part 2

    Analysis

    Now that we know the structure of a PDF document, let's analyze the malicious sample. I found it while looking through the spam folder of my personal mail account. After submitting it to VirusTotal, I found that it is flagged as malicious. The VirusTotal report for the sample is shown below. The operating system used for the analysis is BackTrack 5, which is built on Ubuntu Linux.



    [Screenshot: VirusTotal report for the sample]

    Now that I knew the PDF file was malicious, I decided to analyze it.
    The first step was to look for keywords in the document, and for that I used pdfid.py. The screenshot below shows the output of pdfid. You can see that the document contains only 1 page (counted under /Page), 2 JavaScript tags (counted under /JavaScript) and 1 JavaScript content entry (counted under /JS).



    [Screenshot: pdfid.py output for the sample]
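
    The idea behind this keyword triage can be sketched in a few lines of Python. This is not how pdfid.py is implemented, only a rough approximation: the keyword list, the SAMPLE bytes and the function name are assumptions for illustration, and unlike pdfid this sketch does not handle names obfuscated with hex escapes such as /J#61vaScript.

```python
import re

# Names worth flagging; this list is an assumption, pdfid.py checks more
KEYWORDS = [b"/Page", b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/RichMedia"]

# Hypothetical one-page document with a JavaScript open action
SAMPLE = (
    b"1 0 obj << /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    b"4 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
)


def count_keywords(data):
    """Count whole-name occurrences of each keyword in the raw bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead stops /Page from also matching /Pages
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    for name, count in count_keywords(SAMPLE).items():
        print(f"{name:<12} {count}")
```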

    JavaScript is frequently used to exploit vulnerabilities in PDF readers, and the JavaScript interpreter used by Adobe Reader (a modified version of SpiderMonkey) is itself known to have had multiple vulnerabilities. Since the document contains JavaScript tags, I decided to search for the JavaScript objects using the tool “pdf-parser.py”, which can parse the PDF constructs used in malicious PDF files. The search for JavaScript using pdf-parser returned two objects, as shown in the screenshot below.



    [Screenshot: pdf-parser.py search results for JavaScript]

    From the above screenshot you can see that object 3 references object 5 (marked by /JavaScript 5 0 R), and that object 6 references object 111611, which is a JavaScript object (marked by the /JS reference to 111611).
    At this point the relationship between these objects is not clear, so I looked at object 5 using pdf-parser. The output below shows that object 5 references object 6 (which, as seen in the above screenshot, references object 111611).



    [Screenshot: pdf-parser.py output for object 5]

    Looking at object 111611 shows that it contains a data stream (indicated by the stream keyword) and that the stream is zlib-compressed (indicated by /FlateDecode). This suggests that it is the object that contains the JavaScript code.



    [Screenshot: pdf-parser.py output for object 111611]

    Now the relationship between the objects is clear. Object 3 references object 5, which references object 6, which in turn references object 111611. Chaining references like this is a technique used to confuse security researchers.


    obj 3 --> obj 5 --> obj 6 --> obj 111611
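
    Walking such a chain by hand gets tedious, so here is a rough Python sketch that follows the first indirect reference in each object until it reaches an object without one. The SAMPLE bytes only imitate the shape of the chain above (the dictionaries inside the objects are invented), and the function name is hypothetical. Real objects often hold several references, which this sketch ignores.

```python
import re

OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")  # e.g. "5 0 R"

# Hypothetical objects shaped like the chain in the sample
SAMPLE = (
    b"3 0 obj\n<< /Names << /JavaScript 5 0 R >> >>\nendobj\n"
    b"5 0 obj\n<< /Names [(x) 6 0 R] >>\nendobj\n"
    b"6 0 obj\n<< /S /JavaScript /JS 111611 0 R >>\nendobj\n"
    b"111611 0 obj\n<< /Length 0 >>\nstream\n\nendstream\nendobj\n"
)


def reference_chain(data, start):
    """Follow the first reference found in each object, starting at `start`."""
    bodies = {int(num): body for num, _gen, body in OBJECT_RE.findall(data)}
    chain = [start]
    current = start
    while current in bodies:
        match = REF_RE.search(bodies[current])
        if match is None:
            break  # no further reference, end of the chain
        target = int(match.group(1))
        if target in chain:
            break  # guard against reference loops
        chain.append(target)
        current = target
    return chain


if __name__ == "__main__":
    print(" --> ".join(f"obj {n}" for n in reference_chain(SAMPLE, 3)))
```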

    Now that I know object 111611 contains the JavaScript code, the next step is to extract the JavaScript code from object 111611.

    Last edited by m0nna; 09-11-2011 at 05:37 PM.


  5. #3
    Part 3

    Extracting Javascript

    To extract the JavaScript from the object, we first need to decompress the zlib-compressed data; only then does the JavaScript make sense. I used pdf-parser.py to decompress the stream and extract the JavaScript code. The screenshot below shows the pdf-parser output.



    [Screenshot: pdf-parser.py output with the decompressed stream]

    The screenshot below shows just the extracted JavaScript code. This code, however, is still obfuscated.



    [Screenshot: extracted JavaScript code, obfuscated]

    After beautifying the code, the output looks a little better but is still obfuscated, as shown in the screenshot below. Obfuscation is used to confuse analysts and to keep security devices from detecting the code.



    [Screenshot: beautified JavaScript code]

    In the JavaScript code above, all the text in blue (enclosed in /* and */) consists of multiline comments. Removing them does not affect the code in any way; they are only there to confuse analysts. After deleting all the multiline comments, the JavaScript starts to make sense, as shown in the screenshot below.



    [Screenshot: JavaScript code with the comments removed]
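
    Deleting the /* ... */ comments does not have to be done by hand. A minimal Python sketch, assuming the script has no string literals that themselves contain /* (a regex cannot tell those apart from real comments); the function name and the sample line are made up:

```python
import re

# Non-greedy so each comment ends at its own closing */
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(js_source):
    """Remove every /* ... */ comment, including ones spanning lines."""
    return COMMENT_RE.sub("", js_source)


if __name__ == "__main__":
    obfuscated = "var a/*junk\nmore junk*/ = 1;/*noise*/"
    print(strip_block_comments(obfuscated))
```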

    Now we have the JavaScript. Executing this code with SpiderMonkey gives an error, as shown below. The reason is that the script tries to access “this.creator”, which is specific to Adobe's JavaScript environment and does not exist in a standalone interpreter. In this case the attacker has split the JavaScript across multiple objects (multi-stage), which is yet another technique used to confuse analysts. this.creator refers to the document's creator information (marked by the /Creator tag in the PDF document). The /Creator tag usually contains a string that identifies the creator of the PDF document.



    [Screenshot: SpiderMonkey error on this.creator]



  7. #4
    Part 4

    Searching for the creator entry in the PDF document using pdf-parser showed that object 8 contains the creator info. In this case, instead of a string that identifies the creator, it contains encoded JavaScript code, which is the second-stage code.
    The screenshot below shows the content extracted from the creator object.



    [Screenshot: encoded content extracted from the /Creator entry in object 8]

    After extracting the encoded content, I added it to the first-stage JavaScript by creating a variable called this.creator with the encoded JavaScript as its value. The screenshot below shows the modified JavaScript.



    [Screenshot: first-stage JavaScript with this.creator defined]

    In the above screenshot, the script replaces every “z” in the content of this.creator with a “%” sign, passes the result to the unescape function to decode it, and then uses eval to execute the decoded data. We want the script to print the decoded data instead of executing it, so that we can read the decoded content. There are multiple ways to do this; one way is to add “eval=print;” at the beginning of the script to override the functionality of eval. After executing the script with SpiderMonkey we get the decoded JavaScript, as shown below:



    [Screenshot: decoded second-stage JavaScript]
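
    The same decoding can be reproduced outside a JavaScript interpreter, which avoids running any attacker code at all. The Python sketch below imitates JavaScript's unescape for %XX and %uXXXX sequences after swapping the marker character for a percent sign. The function names are my own and the encoded string in the example is a harmless made-up value, not the content of the real /Creator entry.

```python
import re

ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text):
    """Rough equivalent of JavaScript unescape() for %XX and %uXXXX."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return ESCAPE_RE.sub(replace, text)


def decode_stage(encoded, marker="z"):
    """Swap the marker for '%' and unescape, as the first-stage script does."""
    return js_unescape(encoded.replace(marker, "%"))


if __name__ == "__main__":
    # Hypothetical encoded value standing in for the /Creator content
    print(decode_stage("z61z6cz65z72z74z28z31z29"))
```

    Because nothing is evaluated, there is no need for the eval=print trick with this approach; the decoded text is just a string to read.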

    In the above code you can see that the exploit targets the vulnerability in the util.printf() function (CVE-2008-2992) by passing a very large number as the second parameter. You can also see the buffer being allocated and the heap spray code that runs before the vulnerability is exploited, as well as the shellcode inside the unescape function call.

    So now we know that the JavaScript places the shellcode in the buffer it creates, exploits the vulnerability in the util.printf() function, and the shellcode is executed as a result.



  9. #5
    Part 5

    Shellcode analysis

    Now it's time to analyze the shellcode to learn its capabilities. The screenshot below shows the extracted shellcode.



    [Screenshot: extracted shellcode]
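
    Shellcode passed to unescape is normally written as %uXXXX sequences, where each sequence is one 16-bit value stored in little-endian order in memory. Turning that text into raw bytes for analysis can be sketched in Python as below. The function name and the example input are hypothetical, and the sketch only handles the %uXXXX form (anything else in the string is skipped).

```python
import re
import struct

UNICODE_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})")


def unescape_to_bytes(escaped):
    """Convert a %uXXXX string into the bytes it represents in memory."""
    raw = bytearray()
    for match in UNICODE_ESCAPE_RE.finditer(escaped):
        # Each %uXXXX is a 16-bit unit, low byte first on x86
        raw += struct.pack("<H", int(match.group(1), 16))
    return bytes(raw)


if __name__ == "__main__":
    # Made-up input: two NOPs, an int3 and a ret
    print(unescape_to_bytes("%u9090%uc3cc").hex())
```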

    The shellcode was converted to an exe for further analysis. The screenshot below shows the conversion; the converted exe is called converted_shellcode.exe.



    [Screenshot: converting the shellcode to converted_shellcode.exe]

    Looking at the strings and opening the converted exe in a debugger/hex editor, you can tell that the shellcode is a downloader. In the screenshots below you can see a reference to a malicious website (grinchalina8.com); a Google search on that website shows that the site is malicious. You can also see references to API calls that are used by downloaders, and a reference to an exe (pdfupd.exe).
    In this case the shellcode uses the “LoadLibraryA” API call to load urlmon.dll, then the “URLDownloadToFileA” API call to download the exe file “pdfupd.exe”, and finally the “WinExec” API call to execute the downloaded executable.



    [Screenshots: strings and debugger/hex editor views of converted_shellcode.exe]
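
    Pulling readable strings out of the shellcode is the quickest way to spot API names and file names like the ones above. Here is a small Python sketch of what a strings-style tool does, assuming plain ASCII strings of at least four characters. The BLOB bytes are invented for the example and only reuse names mentioned in this post; they are not the real shellcode.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# Hypothetical bytes: opcodes mixed with NUL-terminated names
BLOB = (
    b"\x90\x90\xeb\x10urlmon.dll\x00\xffURLDownloadToFileA\x00"
    b"\x01\x02WinExec\x00pdfupd.exe\x00\xcc"
)

if __name__ == "__main__":
    for text in ascii_strings(BLOB):
        print(text)
```

    Strings that the shellcode builds at run time or stores encoded will not show up this way, which is why the debugger view is still useful.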

    With this I complete my analysis. I hope it helps beginners like me understand how to analyze a malicious PDF document. :-)