Headline
CVE-2016-8387: TALOS-2016-0212 || Cisco Talos Intelligence Group
Summary
An exploitable heap-based buffer overflow exists in Iceni Argus. When Argus attempts to convert a malformed PDF containing an object that is encoded with multiple encoding types, the last of which is LZW, an overflow may occur because the LZW decoder performs no bounds checking. This can lead to code execution in the context of the account of the user running it.
Tested Versions
Iceni Argus Version 6.6.04 (Sep 7 2012) NK
Product URLs
http://www.iceni.com/legacy.htm
CVSSv3 Score
8.8 - CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
Details
This is a heap-based buffer overflow in Iceni Argus, a tool used primarily by MarkLogic Server to convert PDF files to (X)HTML. When decoding a PDF object that is encoded with more than one encoding type, one of which is LZW, the tool calls the ipLZWFeedCreate function to initialize the decoder for the LZW-encoded data. The decoder object is created by first allocating a 0x545c-byte buffer, of which 0x1000 bytes near the end (starting at offset 0x4454) are set aside for decoding. A pointer to this region is written into the object by the instruction at address 0x80cb2b8.
80cb1d7: c7 04 24 5c 54 00 00 movl $0x545c,(%esp) ; size
80cb1de: 8d 83 a4 5b 49 ff lea -0xb6a45c(%ebx),%eax
80cb1e4: 89 44 24 04 mov %eax,0x4(%esp) ; name
80cb1e8: e8 83 63 0d 00 call 81a1570 <icnMalloc>
80cb1ed: 89 c6 mov %eax,%esi
...
80cb2ad: 8d 86 54 44 00 00 lea 0x4454(%esi),%eax ; 0x1000 bytes
80cb2b3: ba 01 00 00 00 mov $0x1,%edx
80cb2b8: 89 86 54 54 00 00 mov %eax,0x5454(%esi) ; XXX: pointer
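The offsets in the listing above can be cross-checked with a little arithmetic. The following is a minimal C sketch, not Argus source code: the constant and function names are hypothetical, and only the numeric values come from the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the disassembly; all names are hypothetical. */
enum {
    LZW_OBJ_SIZE          = 0x545c, /* size passed to icnMalloc            */
    LZW_OUTBUF_OFFSET     = 0x4454, /* lea 0x4454(%esi),%eax               */
    LZW_OUTBUF_PTR_OFFSET = 0x5454, /* mov %eax,0x5454(%esi)               */
    LZW_OUTBUF_SIZE       = LZW_OUTBUF_PTR_OFFSET - LZW_OUTBUF_OFFSET
};

/* Bytes still available in the decode buffer after `written` bytes.
 * Returns 0 once the buffer is full (or already overrun). */
static size_t lzw_outbuf_room(size_t written)
{
    assert(LZW_OUTBUF_SIZE == 0x1000);
    return written < LZW_OUTBUF_SIZE ? LZW_OUTBUF_SIZE - written : 0;
}
```

Under these assumptions the decode buffer spans offsets 0x4454 through 0x5453, and only 8 bytes of the allocation (including the pointer itself) follow it, so any write beyond 0x1000 bytes quickly leaves the heap chunk.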
Within the same function, ipLZWFeedCreate, the constructor initializes an array containing the decoding table (code dictionary). Each entry is 4 bytes wide: a 16-bit field initialized to 0x101 (end-of-data), followed by a 16-bit field holding the entry's index. This is done for every index below 0x100, after which the 16-bit field at offset 0x4240 is set to 0x102.
80cb303: 66 c7 86 3c 02 00 00 movw $0x101,0x23c(%esi) ; end-of-data constant
80cb30a: 01 01
80cb30c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
80cb310: 66 89 50 02 mov %dx,0x2(%eax)
80cb314: 83 c2 01 add $0x1,%edx
80cb317: 66 c7 00 01 01 movw $0x101,(%eax)
80cb31c: 83 c0 04 add $0x4,%eax
80cb31f: 81 fa 00 01 00 00 cmp $0x100,%edx ; loop 256 times
80cb325: 75 e9 jne 80cb310 <ipLZWFeedCreate+0x150>
80cb327: 66 c7 86 40 42 00 00 movw $0x102,0x4240(%esi)
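The initialization above can be approximated in C as follows. This is a sketch inferred from the listing only: the struct, field, and function names are hypothetical, the table length of 0x1000 entries is an assumption based on 12-bit LZW codes, and the signedness of the fields is guessed from the movswl/movzwl loads used later in loadLZWBuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZW_END_OF_DATA 0x101   /* constant stored by movw $0x101,(%eax)    */
#define LZW_FIRST_FREE  0x102   /* constant stored at 0x4240(%esi)          */
#define LZW_TABLE_LEN   0x1000  /* assumption: 12-bit code space            */

/* One 4-byte dictionary entry (hypothetical field names). */
struct lzw_entry {
    int16_t  link;   /* offset +0: 0x101 for the 256 literal entries        */
    uint16_t value;  /* offset +2: the entry's own index for literals       */
};

struct lzw_table {
    struct lzw_entry entries[LZW_TABLE_LEN];
    int16_t next_code; /* role assumed; the listing only shows the store    */
};

/* Mirror of the constructor loop: mark every literal entry as terminal. */
static void lzw_table_init(struct lzw_table *t)
{
    assert(t != NULL);
    for (unsigned i = 0; i < 0x100; i++) {
        t->entries[i].link  = LZW_END_OF_DATA;
        t->entries[i].value = (uint16_t)i;
    }
    t->next_code = LZW_FIRST_FREE;
}
```

Only the first 0x100 entries are given a terminating link here; every other entry depends on the decoder, and therefore on the input stream, to be populated correctly.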
Inside the following loop, the application walks the code dictionary, reading each index along with its respective value and writing the value to the decode buffer. The loop terminates only when the current index points to an end-of-data entry. If no end-of-data entry is found during decoding and the loop iterates more than 0x1000 times, the write pointer runs past the end of the buffer, because the loop has no boundary check that would terminate it.
80cb587: 8d 96 54 44 00 00 lea 0x4454(%esi),%edx ; beginning of 0x1000 byte buffer
80cb58d: 89 55 f0 mov %edx,-0x10(%ebp) ; write pointer
...
80cb706: 0f b7 55 da movzwl -0x26(%ebp),%edx ; starting index
80cb70a: 66 39 96 40 42 00 00 cmp %dx,0x4240(%esi)
80cb711: 0f 8e be 00 00 00 jle 80cb7d5 <loadLZWBuffer+0x3f5>
80cb717: 89 d0 mov %edx,%eax
80cb719: 0f bf d0 movswl %ax,%edx
...
80cb728: 8b 4d f0 mov -0x10(%ebp),%ecx ; write pointer
80cb72b: 0f b7 84 96 3e 02 00 movzwl 0x23e(%esi,%edx,4),%eax
80cb732: 00
80cb733: 88 01 mov %al,(%ecx) ; XXX: crash
80cb735: 0f bf 94 96 3c 02 00 movswl 0x23c(%esi,%edx,4),%edx
80cb73c: 00
80cb73d: 83 c1 01 add $0x1,%ecx
80cb740: 89 4d f0 mov %ecx,-0x10(%ebp)
80cb743: 66 81 bc 96 3c 02 00 cmpw $0x101,0x23c(%esi,%edx,4) ; check if index points at end-of-data
80cb74a: 00 01 01
80cb74d: 75 d9 jne 80cb728 <loadLZWBuffer+0x348>
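To make the missing check concrete, here is a hedged C rendition of the loop at 0x80cb728 with a bound added. It is a reconstruction from the disassembly, not vendor code: the struct layout, names, and error convention are all hypothetical, and the index range check is an extra safeguard that the listing does not show either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZW_END_OF_DATA 0x101

/* Hypothetical view of one 4-byte dictionary entry. */
struct lzw_entry {
    int16_t  link;   /* next index in the chain (read with movswl)  */
    uint16_t value;  /* low byte is written to the output buffer    */
};

/* Follow the chain starting at `code`, writing one byte per step.
 * Returns the number of bytes written, or -1 if the chain leaves the
 * table or would write past `cap` bytes. The original loop performs
 * neither check and stops only on the end-of-data test. */
static long lzw_walk_chain(const struct lzw_entry *table, size_t table_len,
                           int16_t code, uint8_t *out, size_t cap)
{
    size_t n = 0;

    assert(table != NULL && out != NULL);
    for (;;) {
        if (code < 0 || (size_t)code >= table_len)
            return -1;                   /* index outside the dictionary */
        if (n >= cap)
            return -1;                   /* bound absent from the binary */
        out[n++] = (uint8_t)table[code].value;   /* mov %al,(%ecx)       */
        code = table[code].link;                 /* movswl 0x23c(...)    */
        if (code < 0 || (size_t)code >= table_len)
            return -1;
        if (table[code].link == LZW_END_OF_DATA) /* cmpw $0x101,...      */
            return (long)n;
    }
}
```

With a dictionary in which two entries link to each other, this version returns -1 once `cap` bytes have been produced, whereas the loop in the binary would keep writing until it faults.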
Crash Information
$ gdb --quiet --args /opt/MarkLogic/converters/cvtpdf/convert ~/config/
Reading symbols from /opt/MarkLogic/Converters/cvtpdf/convert...done.
(gdb) r
Starting program: /opt/MarkLogic/Converters/cvtpdf/convert /home/user/config/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loading configuration...
Parsing macros...
Macro synth-bookmarks='true'
Macro image-output='true'
Macro text-output='true'
Macro zones='false'
Macro ignore-text='true'
Macro remove-overprint='false'
Macro illustrations='true'
Macro line-breaks='true'
Macro image-quality='75'
Macro page-start=''
Macro page-end=''
Macro document-start=''
Macro document-end=''
features='11140221'
Processing...
Analysing '/home/user/poc.pdf'
Pages 1 to 1
Processing page 1
Catchpoint 4 (signal SIGSEGV), 0x080cb733 in loadLZWBuffer ()
(gdb) bt 5
#0 0x080cb733 in loadLZWBuffer ()
#1 0x080cb9a9 in ipLZWFeedRead ()
#2 0x08084cd7 in ipDataFeedRead ()
#3 0x08257174 in loadFlateBuffers ()
#4 0x0825751b in ipFlateFeedRead ()
(More stack frames follow...)
(gdb) h
-=[registers]=-
[eax: 0x0000004d] [ebx: 0x08f57000] [ecx: 0x09abd000] [edx: 0x00000117]
[esi: 0x09a794d0] [edi: 0x00000009] [esp: 0xfffbf8e0] [ebp: 0xfffbf918]
[eflags: NZ SF OF NC ND NI]
-=[stack]=-
fffbf8e0 | 09a5e7c8 f7fd8000 09a7970c 09a7d719 | ................
fffbf8f0 | 0117b2f8 09a7d924 09a7d919 00000009 | ....$...........
fffbf900 | 098eb38c 09a7d84d 09abd000 08f57000 | ....M........p..
fffbf910 | 00000000 09a794d0 fffbf948 080cb9a9 | ........H.......
-=[disassembly]=-
=> 0x80cb733 <loadLZWBuffer+851>: mov %al,(%ecx)
0x80cb735 <loadLZWBuffer+853>: movswl 0x23c(%esi,%edx,4),%edx
0x80cb73d <loadLZWBuffer+861>: add $0x1,%ecx
0x80cb740 <loadLZWBuffer+864>: mov %ecx,-0x10(%ebp)
0x80cb743 <loadLZWBuffer+867>: cmpw $0x101,0x23c(%esi,%edx,4)
0x80cb74d <loadLZWBuffer+877>: jne 0x80cb728 <loadLZWBuffer+840>
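The register dump can be related back to the object layout from the Details section. The sketch below is illustrative arithmetic only; the function names are hypothetical, and it assumes that esi still holds the object pointer returned by icnMalloc at the time of the crash.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from the ipLZWFeedCreate disassembly. */
#define LZW_OUTBUF_OFFSET 0x4454u
#define LZW_OBJ_SIZE      0x545cu

/* Distance of the write pointer (ecx) from the start of the decode buffer. */
static uint32_t write_ptr_offset(uint32_t esi, uint32_t ecx)
{
    assert(ecx >= esi + LZW_OUTBUF_OFFSET);
    return ecx - (esi + LZW_OUTBUF_OFFSET);
}

/* Distance of the write pointer (ecx) from the end of the allocation. */
static uint32_t bytes_past_object(uint32_t esi, uint32_t ecx)
{
    assert(ecx >= esi + LZW_OBJ_SIZE);
    return ecx - (esi + LZW_OBJ_SIZE);
}
```

For esi = 0x09a794d0 and ecx = 0x09abd000, the buffer starts at 0x09a7d924 (a value that also appears on the stack), so the write pointer is 0x3f6dc bytes past the buffer start and 0x3e6d4 bytes past the end of the 0x545c-byte object. The faulting address is page-aligned, which would be consistent with the loop having written through adjacent heap memory until it reached an unmapped page.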
Timeline
2016-10-10 - Vendor Disclosure
2017-02-27 - Public Release
Discovered by Marcin Noga of Cisco Talos.