• Malicious PDF analysis

    Part 1

    This post contains the analysis that I carried out on a malicious PDF file that exploits a stack-based buffer overflow vulnerability in Adobe Acrobat Reader. Before analyzing the malicious PDF, it is necessary to understand the structure of a PDF file, so I will first explain the structure of a PDF document. A PDF document can include images, fonts, text, JavaScript, Flash and other content used to display the document.


    PDF document structure
    In this post I will not explain the full PDF structure, only the parts that are required for the analysis. A PDF document starts with a header of the form %PDF-x.y (for example %PDF-1.1), which gives the version of the PDF language. Without this header, PDF readers will not accept the file.
    A PDF document also consists of multiple indirect objects. An example is shown below: an indirect object has an object ID (1 in the example below) and a version number (0 in the example below; the PDF specification calls this the generation number), followed by the keyword obj. The keyword endobj marks the end of the object.

    Code:
    %PDF-1.1        <------ PDF header
    
    1 0 obj            <-------- Indirect object, 1 is the object id and 0 is the version number
    
    ........
    .....
    endobj        <------ This marks the end of object 1
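
The header and object layout described above can be checked with a few lines of Python. This is only a minimal sketch: SAMPLE is a made-up fragment rather than a complete PDF, the names read_header and list_objects are hypothetical, and the regular expressions assume the header sits at the very start of the data and that objects are not nested or obfuscated.

```python
import re

# Hypothetical sample: a tiny PDF-like byte string, not a complete valid PDF.
SAMPLE = (b"%PDF-1.1\n"
          b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
          b"2 0 obj\n<< /Type /Pages >>\nendobj\n")

# Header: %PDF- followed by a major.minor version.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")
# Object id, version (generation) number, keyword obj, body, keyword endobj.
OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

def read_header(data):
    """Return the version string from the %PDF- header, or None if absent."""
    match = HEADER_RE.match(data)
    return match.group(1).decode("ascii") if match else None

def list_objects(data):
    """Return (object_id, version, body) for each 'N G obj ... endobj' span."""
    return [(int(m.group(1)), int(m.group(2)), m.group(3).strip())
            for m in OBJECT_RE.finditer(data)]

if __name__ == "__main__":
    print("PDF version:", read_header(SAMPLE))
    for obj_id, version, body in list_objects(SAMPLE):
        print(obj_id, version, body)
```

On a real sample the same two functions would be applied to the bytes read from the file in binary mode.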
    Inside an object there is a series of tags describing the contents of the object or a reference to another object. In the example below, object 7 contains JavaScript (indicated by the /JavaScript tag) and the /JS tag holds the JavaScript code itself, whereas object 31 has a reference to a JavaScript object (object ID 34), indicated by /JS 34 0 R; the R stands for Reference.
    One of the objects essential for analyzing malicious PDF files is the stream object. In the example below, object 34 contains a stream object. A stream object contains a stream of data between the keywords stream and endstream. This data stream is often compressed, which is why it looks like meaningless data. In the example, /FlateDecode indicates that the data stream is compressed using the zlib (deflate) compression algorithm.
    There are also different types of tags, for example:
    /JS and /JavaScript indicate JavaScript,
    /RichMedia indicates Flash,
    /AA and /OpenAction indicate an automatic action to be performed when the document is viewed.
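
A quick first triage step is to count how often these tags occur in the raw file. The sketch below is an assumption-laden illustration: the tag list is just the one from this post, the function name count_tags and the SAMPLE bytes are hypothetical, and plain byte matching will miss tags whose names are obfuscated (PDF names may encode characters as #xx hex escapes).

```python
import re

# Tags named in the text above; the list is illustrative, not exhaustive.
SUSPICIOUS_TAGS = [b"/JS", b"/JavaScript", b"/RichMedia", b"/AA", b"/OpenAction"]

# Hypothetical fragment resembling two small dictionaries.
SAMPLE = b"<< /S /JavaScript /JS 34 0 R >>\n<< /OpenAction 31 0 R >>\n"

def count_tags(data):
    """Return a dict mapping each tag to its number of occurrences in data."""
    counts = {}
    for tag in SUSPICIOUS_TAGS:
        # The lookahead stops /AA from matching inside a longer name such as /AAB.
        pattern = re.compile(re.escape(tag) + rb"(?![A-Za-z0-9])")
        counts[tag.decode("ascii")] = len(pattern.findall(data))
    return counts

if __name__ == "__main__":
    for tag, count in count_tags(SAMPLE).items():
        print(tag, count)
```

A non-zero count for /JS, /JavaScript, /AA or /OpenAction does not prove the file is malicious; it only shows which objects deserve a closer look.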

    Code:
    7 0 obj                              <---- object 7
    <<
    /Type /Action
    /S /JavaScript                <---- javascript tag
    /JS <javascript code>      <---- javascript content
    ...............
    .............
    >>
    endobj
    
    
    obj 31 0                                    <----- object 31
    Type:
    Referencing: 34 0 R
    [(2, '<<'), (2, '/S'), (2, '/JavaScript'), (2, '/JS'), (1, ' '), (3, '34'), (1, ' '), (3, '0'), (1, ' '), (3, 'R'), (2, '>>'), (1, '\r')]
    <<
    /S /JavaScript
    /JS 34 0 R                          <------ reference to a javascript object, in this case object 34
    >>
    endobj
    
    
    34 0 obj<</Subtype/Type1C/Length 5416/Filter/FlateDecode
    >>stream                                                                     <--- stream object
    H‰|T}T#W#Ÿ!d&"FI#ʼnNFW#åC                      <---- compressed stream content, in this case zlib-compressed data indicated by /FlateDecode
    …
    endstream
    endobj
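
Since /FlateDecode data is zlib-compressed, the content of a stream like the one in object 34 can be recovered with Python's zlib module. The following is a rough sketch under several assumptions: SAMPLE and PAYLOAD are invented for illustration, inflate_streams is a hypothetical helper, and locating streams with a regular expression (instead of honouring /Length and the full /Filter chain) is a simplification that can fail on real files.

```python
import re
import zlib

# Hypothetical JavaScript payload, compressed to mimic a /FlateDecode stream.
PAYLOAD = b"app.alert('hello');"
SAMPLE = (b"34 0 obj<</Filter/FlateDecode>>stream\n"
          + zlib.compress(PAYLOAD)
          + b"\nendstream\nendobj\n")

# Everything between the keywords stream and endstream (simplified).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data):
    """Return the inflated bytes of each stream, or None where zlib fails."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left before endstream.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (different filter, or an uncompressed stream).
            results.append(None)
    return results

if __name__ == "__main__":
    for content in inflate_streams(SAMPLE):
        print(content)
```

Streams that use another filter simply come back as None here; they would need the matching decoder.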
    Continued

    Malicious PDF analysis Tutorial Part 1
    Malicious PDF analysis Tutorial Part 2
    Malicious PDF analysis Tutorial Part 3 Extracting Javascript
    Malicious PDF analysis Tutorial Part 4
    Part 5 Malicious PDF analysis Tutorial Shellcode analysis
    This article was originally published in forum thread: Malicious PDF analysis, started by m0nna.