Intro

ClamAV is a lightweight, open-source antivirus engine with many uses. One popular use is as a mail/attachment scanner; another is inside Cuckoo Sandbox, where it provides additional detection and data points. While there is a ruleset available directly from ClamAV/Cisco, there are also several other feeds of ClamAV rules, such as SaneSecurity's offering. I recommend using several signature databases with ClamAV to get the broadest coverage.

This tutorial is not about implementing ClamAV or running it in an organization. It is simply a basic guide to help an analyst kickstart writing ClamAV signatures for commonly observed threats. Before getting into anything, I highly recommend you grab a copy of the user manual from here, as it contains much more detail than I will go over.

Environment

First, grab ClamAV 0.99 and make sure it is installed in your lab environment. The latest version can be found here. Once you have ClamAV 0.99 installed, check the version with the command clamscan -V, which should return something like:

ClamAV 0.99/21475/Fri Mar 25 17:40:45 2016.  

Secondly, grab a copy of oletools and get that set up to use in our test environment. Oletools can be downloaded here. This is an incredible toolset and will help greatly in extracting malicious macros we want to look at.

For ease, in my environment I have created two directories, one called "sigs" and one called "samples". These will serve as our two working directories for building and testing. They can be called whatever you want and placed wherever you want. Totally up to you.

Hybrid-Analysis will serve as our test ground, and this document will be what we write a signature for. Please download this sample and place it into the "samples" directory (or wherever you like in your test environment). Hybrid-Analysis has excellent references for writing these signatures in their platform, so I highly recommend checking out the report for hints. As of now (March 26th, 2016), no official ClamAV signature hits on this sample.

$ clamscan e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin
e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin: OK

----------- SCAN SUMMARY -----------
Known viruses: 4297365  
Engine version: 0.99  
Scanned directories: 0  
Scanned files: 1  
Infected files: 0  
Data scanned: 0.07 MB  
Data read: 0.04 MB (ratio 1.80:1)  
Time: 7.631 sec (0 m 7 s)  

Working with ClamAV

In your test environment, it will be useful to have a local set of rules you use for testing and tweaking. This will make troubleshooting and keeping track of what you are working on much easier. In my "sigs" folder I have created a file called "local-rules.ldb". ClamAV has several signature database types:

  • ldb
    • Logical signatures
      • Logical signatures allow combining of multiple signatures in extended format using logical operators. They can provide both more detailed and flexible pattern matching.


  • hdb
    • Hash-based Signatures
      • The easiest way to create signatures for ClamAV is to use file hash checksums; however, this method can only be used against static malware.


  • hsb
    • SHA1 and SHA256 hash-based signatures


  • mdb
    • PE section based hash signatures

Scanning files with our rules is as easy as adding the "-d" flag and providing the path to the ruleset you would like to scan with. To do a quick test, we can create a hash-based signature for our doc to ensure our setup is working properly:

samples$ sigtool --md5 e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin >> ../sigs/local-test.hdb  

Which creates a signature like:

sigs$ cat local-test.hdb  
a8b99ab2a14781acd15d7b012acfcaa9:44544:e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin  

And the result from scanning with our new signature:

sigs$ clamscan -d local-test.hdb ../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin  
../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin: e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin.UNOFFICIAL FOUND
----------- SCAN SUMMARY -----------
Known viruses: 1  
Engine version: 0.99  
Scanned directories: 0  
Scanned files: 1  
Infected files: 1  
Data scanned: 0.07 MB  
Data read: 0.04 MB (ratio 1.80:1)  
Time: 0.011 sec (0 m 0 s)  
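
The hash signature created above has three colon-separated fields: the MD5 digest, the file size in bytes, and the name reported on a hit. As a rough sketch of that format only (sigtool remains the right tool for real signatures), the same kind of line can be assembled with standard utilities. The function name hdb_entry is my own invention, using the file's base name is my choice, and md5sum is assumed to be the GNU coreutils version:

```shell
# Hypothetical helper: print an .hdb-style line (MD5:size:name) for a file.
# It only mirrors the format shown above.
hdb_entry() {
    file="$1"
    md5=$(md5sum "$file" | cut -d' ' -f1)   # keep the hex digest only
    size=$(wc -c < "$file" | tr -d ' ')     # file size in bytes
    name=$(basename "$file")                # name to report on a hit
    printf '%s:%s:%s\n' "$md5" "$size" "$name"
}
```

Running hdb_entry against a sample and comparing the result with the sigtool output is a quick way to convince yourself of what each field means.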

Writing a Logical Signature

While hash-based signatures are okay, I think we can all agree that they are not a very scalable solution. Furthermore, many email campaigns will deliver attachments with different hashes. So while you could catch one specific malicious attachment with hash-based signatures, we want to catch variants via logical signatures.

First, an overview of the ldb signature format:

SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...  

We can break this down as:

  • SignatureName
    • Straight forward. Call it whatever you want. We can go with something mundane like "ClamAV.MalDoc.VBM" for now.


  • TargetDescriptionBlock:
    • This is where we can specify quite a few things, but most importantly: the file type this signature is meant to detect on. Right now the valid values are 0-12, with each number representing a different file type.
    • In this case we will use "Target:2" as 2 == OLE2 containers, including their specific macros. The OLE2 format is primarily used by MS Office and MSI installation files.


  • Logical Expression
    • This is where we will insert our boolean logic for detecting on our content matches. This uses operators like "&" for "and", "|" for "or", as well as "=", "<", ">".


  • SubSigN
    • These are our contents that will be matched upon and serve as the basis for detection. These also will be what we use in our logical expression.

A key thing to remember with these signatures is that the contents and rule options are separated by semi-colons. This is important, because your rule will error if any are left out. Again, there are many other options and in-depth features in addition to what I explained here, but I won't be covering all of them.
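
To make the field layout concrete, here is a minimal sketch that joins the pieces with semi-colons. The helper name build_ldb and the signature name Example.Sig are made up for illustration; the two hex strings are simply "Open" and "Environ" encoded as hex:

```shell
# Hypothetical helper: join the signature fields with semi-colons.
# Usage: build_ldb NAME TARGET_BLOCK EXPRESSION SUBSIG0 [SUBSIG1 ...]
build_ldb() {
    sig="$1;$2;$3"
    shift 3
    for subsig in "$@"; do
        sig="$sig;$subsig"      # each subsig is preceded by a semi-colon
    done
    printf '%s\n' "$sig"
}

# Two hex subsignatures, both of which must match.
build_ldb "Example.Sig" "Target:2" "(0&1)" "4f70656e" "456e7669726f6e"
# prints: Example.Sig;Target:2;(0&1);4f70656e;456e7669726f6e
```

Note that subsignature numbering starts at 0, so the "0" and "1" in the expression refer to the first and second hex strings.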

File Analysis

With our target document downloaded, placed in a directory where we can scan and analyze it, we are ready to begin writing.

First, we should take a look at the doc to determine what is going on with it and what will be good to match on. For this document we will focus solely on the macros within it. We could, for example, write a signature on the "lure" if it contained one (e.g. "This document is protected, please enable macros to view!"). Taking a look at the macro contained in the document is as easy as using oletools' "olevba.py" as seen here:

oletools$ python olevba.py ../../../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin  

Which will return a lot of content, but we want to focus first on the last box of info that it spits out:

oletools-olevba-output.png

This has some valuable strings as well as good IOC data. Going back into Hybrid-Analysis, we can see a similar set of details in their platform here:

hybrid-analysis-macros.png

We should begin to make a list of suspicious strings that will be good for our signature. The idea is that this combination is likely going to end up poorly for the person opening it, thus we should make note and use them in our sig:

Output,Print #,Open,CreateObject,Environ,DoEvents  

Each of these strings becomes a SubSig that will be represented as a number in our Logical Expression. So, for here, "Output" would be content "0", "Print #" is "1", and so on. These strings should be converted to hex, as ClamAV will only match directly on hex.

One way to do this would be to run $ echo -n "Output" | xxd -p which gives us the hex version of "Output", or "4f7574707574". The -n matters: without it, echo appends a newline and the output ends in a stray "0a" that would become part of the match. Repeat this for the other strings we have identified. Furthermore, we can add modifiers to the contents, such as making them caseless. Content modifiers are enabled by adding two colons (::) after the hex string, and then placing the flag. So, to make "Output" caseless, it would look like this: 4f7574707574::i;. The last content in the list of SubSigs does not require a semi-colon.
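
One thing to watch when converting strings: a plain echo appends a newline, which shows up as a trailing 0a byte that the signature would then also require. A small sketch that sidesteps this by using printf is below. The function name to_hex is my own, and od is used only because it ships with coreutils; xxd -p works just as well if you have it:

```shell
# Hypothetical helper: hex-encode a string with no trailing newline.
to_hex() {
    # printf '%s' emits the bytes exactly as given; od prints them as hex
    # pairs, and tr strips the spaces and line breaks od inserts.
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

for s in "Output" "Print #" "Open"; do
    printf '%s -> %s\n' "$s" "$(to_hex "$s")"
done
# Output -> 4f7574707574
# Print # -> 5072696e742023
# Open -> 4f70656e
```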

Additionally, we want to make sure we cover the various Auto* strings that a macro might use (DocumentOpen, AutoExec, etc.), so it is worth keeping a running list of the different kinds and making sure they end up in the signature. It is better to have several in case the macro switches up which one it uses. With some suspicious strings identified, we also probably want to match on the main loop being iterated in the macro, which we can see in the olevba.py output:

olevba-macro.png

For this, we can utilize the PCRE abilities that ClamAV has. PCREs in ClamAV must always be anchored to a content match (more on that below), and must start and end with a forward slash ("/"). I am sure there are multiple ways to write PCREs on this guy, but this is the PCRE I wrote to detect on part of the loop:

/[A-Za-z]+\s+=\s+(?P<nums>\d{2,4})\s+Do\s+While\s+[A-Za-z]+\s+\<\s+(?P=nums)\s+\+\s+\d+/si;

ClamAV has a decent PCRE implementation and allows for using several flags as well (seen at the end of the PCRE). The ClamAV document has a longer list, but here are a couple useful ones:

  • i

    • Case insensitive


  • s

    • PCRE_DOTALL, lets "." match across line breaks


  • m

    • PCRE_MULTILINE, lets "^" and "$" match at line breaks
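
Before dropping a PCRE into a signature, it can help to sanity-check it against some text. The sketch below does that with grep, assuming GNU grep built with PCRE support (the -P option). grep is not ClamAV's matcher, so treat this as a rough pre-check only. The names LOOP_RE and matches_loop are mine, and the two sample inputs are invented stand-ins for a macro loop, not lines taken from the sample:

```shell
# The loop pattern from above, without the surrounding slashes and flags.
LOOP_RE='[A-Za-z]+\s+=\s+(?P<nums>\d{2,4})\s+Do\s+While\s+[A-Za-z]+\s+\<\s+(?P=nums)\s+\+\s+\d+'

# Hypothetical helper: succeed if the text on stdin matches LOOP_RE.
# -P selects PCRE, -z treats the input as one record so \s can span
# line breaks, -i is case-insensitive, -q suppresses output.
matches_loop() {
    grep -Pzqi "$LOOP_RE"
}

# Same number on both lines: the named backreference is satisfied.
printf 'abc = 100\nDo While abc < 100 + 5\n' | matches_loop && echo "match"

# Different numbers: (?P=nums) cannot match, so the pattern fails.
printf 'abc = 100\nDo While abc < 200 + 5\n' | matches_loop || echo "no match"
```

The second case is the interesting one: the named group captures the loop's starting value and the backreference insists that the same value reappears in the While condition.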

Putting it all together

Armed with our content matches and PCRE, we can start to build out a solid ClamAV signature. First, we will begin with the signature name. As mentioned above, we can just go with "ClamAV.MalDoc.VBM". The exact name matters less than picking a naming format and sticking with it for all of your sigs.

ClamAV.MalDoc.VBM;  

Second, we will add the TargetDescriptionBlock with a target type of 2. As discussed above, this is the number for OLE2 containers such as MS Office docs.

ClamAV.MalDoc.VBM;Target:2;  

Third, we will begin to build out the Logical Expression to put all of our contents together. Remember our contents are considered numbers now? Here is where it comes into play. So, after converting our contents into hex: 4f7574707574;5072696e742023;4f70656e;4372656174654f626a656374;456e7669726f6e;446f4576656e7473 we can start building out the Logical Expression:

ClamAV.MalDoc.VBM;Target:2;(0&1&2&3&4&5);  

Then, add in our contents:

ClamAV.MalDoc.VBM;Target:2;(0&1&2&3&4&5);4f7574707574::i;5072696e742023::i;4f70656e::i;4372656174654f626a656374::i;456e7669726f6e::i;446f4576656e7473::i;  

We can then add in our PCRE and anchor it to the first content (the leading "0" before the opening slash). Which content it is anchored to isn't super important in this case; it just needs an anchor to work. Be sure to keep track of the content numbers, and add the PCRE as a new content in the Logical Expression:

ClamAV.MalDoc.VBM;Target:2;(0&1&2&3&4&5&6);4f7574707574::i;5072696e742023::i;4f70656e::i;4372656174654f626a656374::i;456e7669726f6e::i;446f4576656e7473::i;0/[A-Za-z]+\s+=\s+(?P<nums>\d{2,4})\s+Do\s+While\s+[A-Za-z]+\s+\<\s+(?P=nums)\s+\+\s+\d+/si;  

As mentioned before, we want to account for the various Auto open functions a macro might use, so we will incorporate those and add a logical "or" to account for them. Be sure to place an ampersand (&) in front of the new group of contents or else we will get an error:

ClamAV.MalDoc.VBM;Target:2;(0&1&2&3&4&5&6&(7|8|9|10));4f7574707574::i;5072696e742023::i;4f70656e::i;4372656174654f626a656374::i;456e7669726f6e::i;446f4576656e7473::i;0/[A-Za-z]+\s+=\s+(?P<nums>\d{2,4})\s+Do\s+While\s+[A-Za-z]+\s+\<\s+(?P=nums)\s+\+\s+\d+/si;446f63756d656e745f4f70656e::i;576f726b73686565745f4f70656e::i;4175746f5f4f70656e::i;4175746f4f70656e::i  
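
Hand-assembling a line this long is error-prone, so here is a sketch that rebuilds the same signature from the plain strings. The helper names (to_hex, hex_subsigs, build_signature) are hypothetical, and the auto-open names are simply what the hex strings above decode to:

```shell
# Hypothetical helper: hex-encode a string with no trailing newline.
to_hex() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Print each argument as a caseless hex subsignature followed by ';'.
hex_subsigs() {
    for s in "$@"; do
        printf '%s::i;' "$(to_hex "$s")"
    done
}

# The PCRE subsignature, anchored to content 0, copied from above.
PCRE='0/[A-Za-z]+\s+=\s+(?P<nums>\d{2,4})\s+Do\s+While\s+[A-Za-z]+\s+\<\s+(?P=nums)\s+\+\s+\d+/si'

build_signature() {
    body=$(hex_subsigs "Output" "Print #" "Open" "CreateObject" "Environ" "DoEvents")
    autos=$(hex_subsigs "Document_Open" "Worksheet_Open" "Auto_Open" "AutoOpen")
    # ${autos%;} drops the semi-colon after the last subsignature.
    printf '%s;%s;%s;%s%s;%s\n' \
        "ClamAV.MalDoc.VBM" "Target:2" "(0&1&2&3&4&5&6&(7|8|9|10))" \
        "$body" "$PCRE" "${autos%;}"
}

build_signature
```

Contents 0 through 5 are the suspicious strings, 6 is the PCRE, and 7 through 10 are the auto-open names, which is why the ordering of the arguments has to line up with the numbers in the Logical Expression.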

We are finished! We have crafted a good signature based on what we observed. Now, the testing begins.

Testing our signature

Now that we have a completed signature, we can plug it into our local-rules.ldb file and test it using clamscan! Using your favorite method, put our new signature into the local-rules.ldb file.

A couple things to remember: ensure there is NOT a newline after the signature, and make sure it is in one line and not split up. I will use gedit and paste the rule into local-rules.ldb:

sigs$ gedit local-rules.ldb  

gedit-sig-local.png

Now, to scan. Enter the following command to scan using our new sig against our malicious document:

sigs$ clamscan -d local-rules.ldb ../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin  

Boom! We have a detection!

sigs$ clamscan -d local-rules.ldb ../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin  
../samples/e5a31b34a6c54685ec8347443c5ea6dc97ecc215adfc752af4cb83a329f481dd.bin: ClamAV.MalDoc.VBM.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1  
Engine version: 0.99  
Scanned directories: 0  
Scanned files: 1  
Infected files: 1  
Data scanned: 0.00 MB  
Data read: 0.04 MB (ratio 0.00:1)  
Time: 0.006 sec (0 m 0 s)

If it doesn't fire, or gives errors, here are some common things to check:

  • Ensure the logical expression contains the proper operators (a & in the right spots)
  • Ensure there is a semi-colon between all sections of the rule (except for the final one)
  • Ensure there is not a new-line or any other malarkey after the rule in your ldb file

ClamAV has great error output, so it should help track down problems.
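
The checks above can also be roughed out as a small pre-flight script. This is only a sketch under some loud assumptions: the function name check_ldb is mine, it expects one signature per line, and it assumes the Logical Expression uses only "&", "|" and parentheses (a count such as ">2" would be misread as a content number). It is not a substitute for ClamAV's own parser:

```shell
# Hypothetical pre-flight check for a one-signature-per-line .ldb file.
# It only covers the pitfalls listed above, not the full ldb grammar.
check_ldb() {
    awk -F';' '
        BEGIN { bad = 0 }
        /^[[:space:]]*$/ { printf "line %d: blank line\n", NR; bad = 1; next }
        NF < 4 { printf "line %d: expected at least 4 fields, got %d\n", NR, NF; bad = 1; next }
        {
            # Subsignatures are fields 4..NF; an empty last field means a
            # trailing semi-colon, which is not counted.
            nsubs = NF - 3
            if ($NF == "") nsubs--
            # Find the highest content number used in the expression.
            e = $3; max = -1
            while (match(e, /[0-9]+/)) {
                n = substr(e, RSTART, RLENGTH) + 0
                if (n > max) max = n
                e = substr(e, RSTART + RLENGTH)
            }
            if (max >= nsubs) {
                printf "line %d: expression uses content %d but only %d defined\n", NR, max, nsubs
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

Calling check_ldb local-rules.ldb prints one line per problem it finds and returns a non-zero status if there were any, so it can sit in front of clamscan in a test loop.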

Final Thoughts

ClamAV is a great solution for detecting malicious behavior in documents, executables, exploits and many other file types. I think knowing ClamAV as a whole and being able to write signatures for it is a good skill for any analyst. Here I provided some foundational knowledge on how to write and work with ClamAV signatures for the 0.99 engine.

If you have any feedback or questions please email me at jack@malwarefor.me.
Additionally, you can reach out on Twitter or follow for updates.