Analysis
Now that we know the structure of a PDF document, let's analyze a malicious PDF sample. I found this sample while looking through the spam folder of my personal mail account, and after submitting it to VirusTotal I confirmed that it is malicious. The VirusTotal report for the sample is shown below. The operating system used for the analysis is BackTrack 5, which is built on Ubuntu Linux.

Now that I knew the PDF file was malicious, I decided to analyze it.
The first step was to look for suspicious keywords in the PDF document. To do that I used pdfid.py; the screenshot below shows its output. From the output you can see that the document contains only 1 page (marked by the /Page tag), 2 JavaScript tags (marked by the /JavaScript tag) and 1 piece of JavaScript content (marked by /JS).
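To make the keyword count concrete, here is a minimal sketch of the same idea in Python. It is not how pdfid.py is implemented; the keyword list and the function name `count_keywords` are my own assumptions for illustration, and unlike the real tool this sketch does not decode hex-escaped names such as /J#53.

```python
import re

# Illustrative subset of names worth counting during triage (an assumption,
# not the full list that pdfid.py reports).
KEYWORDS = [b"/Page", b"/JS", b"/JavaScript", b"/OpenAction", b"/AA"]

def count_keywords(data: bytes) -> dict:
    """Count how often each PDF name appears in the raw bytes of a file."""
    counts = {}
    for kw in KEYWORDS:
        # Require that the keyword is not followed by another letter or digit,
        # so /Page does not also match /Pages.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts
```

Calling `count_keywords(open("sample.pdf", "rb").read())` on a file (the file name here is hypothetical) returns a dictionary of counts. A non-zero /JS or /JavaScript count is the cue to dig further.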

JavaScript is commonly used to exploit vulnerabilities in PDF readers, and the JavaScript interpreter they embed (a modified version of SpiderMonkey) is itself known to have had multiple vulnerabilities. Since the document contains JavaScript tags, I decided to search for the JavaScript objects using the tool "pdf-parser.py", which can parse the PDF constructs used in malicious PDF files. The search for JavaScript using pdf-parser returned two objects, as shown in the screenshot below.
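The search step can be approximated with a few lines of Python. This is only a rough sketch of the idea, not pdf-parser's actual implementation: the function name `search_objects` is hypothetical, and the regular expression assumes plain `N G obj ... endobj` blocks that are not hidden inside compressed object streams.

```python
import re

def search_objects(data: bytes, keyword: bytes) -> list:
    """Return the numbers of objects whose body mentions the keyword (case-insensitive)."""
    hits = []
    # Each indirect object looks like: "<number> <generation> obj ... endobj".
    for m in re.finditer(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", data, re.DOTALL):
        if keyword.lower() in m.group(2).lower():
            hits.append(int(m.group(1)))
    return hits
```

For example, `search_objects(data, b"javascript")` lists every object number whose dictionary or stream contains that word.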

From the screenshot above you can see that object 3 references object 5 (marked by /JavaScript 5 0 R), which is a JavaScript object, and object 6 references object 111611 (marked by /JS 111611), which is also a JavaScript object.
At this point the relationship between the two objects is not clear, so I looked at object 5 using pdf-parser. The output below shows that object 5 references object 6, which in turn references object 111611, as seen in the previous screenshot.

Looking at object 111611 shows that it contains a data stream (indicated by the stream keyword) and that the stream is zlib compressed (indicated by /FlateDecode). This suggests that this is the object which contains the JavaScript code.
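Since /FlateDecode is plain zlib compression, the stream contents can be recovered with Python's standard zlib module. The sketch below is a simplified illustration, not a replacement for pdf-parser: the function name `inflate_streams` is my own, and it assumes each stream uses /FlateDecode as its only filter and does not contain the word "endstream" inside the compressed data.

```python
import re
import zlib

def inflate_streams(data: bytes) -> list:
    """Return the decompressed contents of every stream that zlib can inflate."""
    results = []
    # Stream data sits between the 'stream' and 'endstream' keywords; the
    # lookbehind stops the 'stream' inside 'endstream' from starting a match.
    for m in re.finditer(rb"(?<!end)stream\r?\n(.*?)endstream", data, re.DOTALL):
        try:
            # decompressobj ignores the end-of-line bytes that trail the
            # compressed data before 'endstream'.
            results.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            # Not zlib data (or other filters are applied); skip it.
            pass
    return results
```

Each item in the returned list is the raw content of one stream. For a JavaScript object this would be the script source, which is usually still obfuscated.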

Now the relationship between the objects is clear: object 3 references object 5, which in turn references object 6, which references object 111611. Chaining references like this is a technique used to confuse security researchers.
obj 3 --> obj 5 --> obj 6 --> obj 111611
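Tracing such a chain by hand gets tedious when there are more hops, so it can help to build a small reference graph and search it. The following is a minimal sketch under the same simplifying assumptions as before (plain, uncompressed `N G obj ... endobj` blocks); the names `reference_graph` and `find_path` are hypothetical helpers, not part of pdf-parser.

```python
import re
from collections import deque

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")  # indirect reference, e.g. "5 0 R"

def reference_graph(data: bytes) -> dict:
    """Map each object number to the list of object numbers it references."""
    graph = {}
    for m in OBJ_RE.finditer(data):
        graph[int(m.group(1))] = [int(n) for n in REF_RE.findall(m.group(2))]
    return graph

def find_path(graph: dict, start: int, target: int):
    """Breadth-first search for a chain of references from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of references connects the two objects
```

With the graph built from the sample, `find_path(graph, 3, 111611)` would be expected to return the chain described above, and `None` would mean the two objects are not connected.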
Now that I know object 111611 contains the JavaScript code, the next step is to extract the JavaScript code from object 111611.
Continued
Malicious PDF analysis Tutorial Part 1
Malicious PDF analysis Tutorial Part 2
Malicious PDF analysis Tutorial Part 3 Extracting Javascript
Malicious PDF analysis Tutorial Part 4
Part 5 Malicious PDF analysis Tutorial Shellcode analysis


