Tuesday, March 15, 2011

Parsing Malware XML - By Riley Porter

So AMA's XML parsing was tossing an error as of 2 days ago. It turns out that
there were some invalid characters in the url section of the malware stream. This is nothing
new, as a few months back I noticed this same sort of thing. How I got around it last time was, just walking every line
before we processed the XML and re-wrote the file to disk. Like so:


for line in xmlfile.readlines():
tmpfile.writelines(filter(lambda x: x in string.printable,line))
tmpfile.close()



Why is this breaking again? Continue down the rabbit hole. Here is the
offending XML:




(note that blogger is stripping some of this content but you get the idea)
So the crazy thing is the URL element actually worked! If you were to
paste that into a browser it would have downloaded a file.
Note I have made http:// into hxxp:\\ in order to not make it an auto
link. So what is this link to? It looks like its a copy of "DarkComet Rat".
DarkComet RAT is a Remote Administration Tool (or known as a
RAT in the Malware Community). It is a zip file too so it's not super
dangerous inherently. However, there are executables inside the archive that I have
NOT examined or tested. Best not mess with them. However, here is the
link for the RAT tools webpage:
http://www.darkcomet-rat.com/

The question still remains why is our xml parser dying on this url? As it turns out the
bytes 0x0a and 0x0d are in string.printable.
We were seeing some sneaky malware authors placing the 0x0d (form feed!) in the urls. When these bytes are inserted
into a browser URL bar they are stripped right out.
However xml parsers and python file.readlines() interpreted it as a
new line which in turn broke the parsing engine! So anyhow heres the
code should anyone want to see it.



"""A bit of a hack... However it works. Some malware authors
were innovating with the use of non-printable characters a few months back.
filtering based of string.printable() fixed most of this until today. 1-17-2011
It appears the 0x\0a - 0x\0d are "PRINTABLE". What we were seeing is form feed
inserted into our xml url streams. This caused a new line in the xml and then
an invalid xml document as the tag was not closed. This code fixes that issue.
"""
for y in xmlfile.read():
try:
if iscntrl(y) and y != "\n":
y = ""
else:
pass
except:
pass
tmpfile.write(filter(lambda x: x in string.printable, y))
tmpfile.close()


With that code added to our pre-processor class our xml is now 'clean' enough to be parsed.

To see a picture of the hex in action go here.

-Riley

P.S. If you would like to discuss this post further email Riley at riley (dot) porter (at) g2-inc (dot) com.

1 comment:

  1. Those are scripts that share the common strain with XML based malware signatures. I think I can experiment with these scripts.
    anti spam service

    ReplyDelete