Monday, February 22

Writing Snort Rules Correctly

Let me start off by saying I'm not bashing the writer of this article, and I'm trying not to be super critical.  I don't want to discourage this person from writing articles about Snort rules.  It's great when people in the Snort community step up and explain some simple things out there.  There are mistakes; that comes with the territory.  If you choose to be one of the people that tries to write Snort rules, you also choose to be someone who wants to learn how to do it better.  That's why I'm writing this blog post: not to bash the writer, but to teach.

I noticed this post today over at the "Tao of Signature Writing" blog, and to be honest I glanced over most of it, figuring it was a rehash of things I've already read or things that have already been written by countless people about "Here's how you write Snort rules!".  I scrolled down quickly, skimming, not really reading at all, and noticed this part:
Now, let us look at the second question: “We have “aol” as the id and Import method name. Should we use “aol” along with “Import”?”. Just because we narrowed down to “clsid:” followed by CLSID number, does not mean that we have to narrow down in this case too. Just like how the Shellcode will change, the attackers might change the ID too, to just find out if they could evade the IDS/IPS. Why give them a chance? Hence, we should broaden our search to just the import method: content:”.Import(“. The reason why we have “.” and “(” around the key “Import” is to narrow the chances of triggering the signature on some term “Import” and to concentrate on the vulnerable method.

This post is about ActiveX and CLSID detection with a Snort rule, trying to detect an AOL 9.5 ActiveX 0day.  Okay, fair enough, so the above paragraph is trying to find the Import command to call the javascript.  So I kept reading.

Then I got to this part:
In here, I would like to position the CLSID before the method. This would help me trigger the signature specific to “AOL 9.5 ActiveX 0day Exploit (heap spray)“. I can do this ordering by using “Offset”. We cannot set the “Depth” in this case, since the position of CLSID or Method in a packet will change according to the packet size or the way in which it is sent. Hence, the content of final signature would look something like this:

content:”clsid:A105BD70-BF56-4D10-BC91-41C88321F47C”; nocase;content:”.Import(“; nocase; Offset:0;

The writer is correct in a couple things.
  • First, they say they want to position the CLSID before the method, and they want to do that using offset.
  • Second, they say they cannot set a "depth" because the position and method in the packet will change according to the packet size, which is partially correct.

However, the problem with the above signature is that the offset is placed after the second content match.

So here's what would happen with the above signature so far.  The CLSID content match is the longest, so it would be fed into the fast pattern matcher.  If the fast pattern matcher came across a packet that matched the CLSID that is specified in the rule, <leaves stuff out>, the packet would then be run through the detection engine (rule) for detection.  Contrary to popular belief, unless an offset/depth/distance/within modifier is specified, there is no order in which the contents have to match.  So if I were to write the above as this:

content:”clsid:A105BD70-BF56-4D10-BC91-41C88321F47C”; nocase;content:”.Import(“;nocase;

Snort doesn't care which order the content matches are in.  As long as both the contents are in the packet, the rule will fire.  So putting a content:".Import("; nocase; offset:0; does absolutely nothing.  You can kind of think of offset:0; as being implied, but if you don't have any relative content matches, then it really doesn't matter unless you are trying to be specific to a position match.  However, as the author already stated, you can't add a depth statement to the rule, so it just plain doesn't matter.  I see this kind of thing all the time, so I figured it's a common mistake.  So I kept on reading:
Now, let us look into the direction of traffic. Client-side exploits generally flow from server to client: “flow:to_client,established;“.

The author explains that "Client-side exploits generally flow from server to client".  Okay, correct in this instance, but not always, so let me explain:

Flow has four direction operators you can specify:

  • to_server
  • from_server
  • to_client
  • from_client

What I hear from people is that they think of "server" as that 2U thing back in the server room (hence the name), and of "client" as being "you".  But that's not how Snort thinks about it.  Snort thinks about client and server in terms of "who initiated the conversation".  So, at the beginning of a TCP conversation there is a 3-way handshake.  SYN, SYN-ACK, ACK.

The client is who initiated the conversation, the server is who is responding. So, in this case, since we are attempting to catch a web browser accessing a webpage and downloading a webpage which contains this CLSID, the flow would be to_client.  (Or from_server.)  Correct.  However, what if someone downloaded a PDF, and upon opening the PDF the PDF went and grabbed something off the internet?  This is a client side exploit, however, the flow would be reversed.  So, the author is correct in saying that "Client-side exploits are generally...", but I wanted to explain to make sure no one was confused.  The "established" keyword means that the session is established.  So beginning on the 3rd part of the 3-way handshake.
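To make the "who initiated the conversation" idea concrete, here is a minimal Python sketch.  It is not Snort code: the Endpoint class and the flow_direction function are made up for illustration, the addresses are documentation placeholders, and it assumes we already know which side sent the first SYN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical helper: one side of a TCP conversation."""
    ip: str
    port: int


def flow_direction(initiator: Endpoint, packet_src: Endpoint) -> str:
    """Label a packet the way the post describes flow.

    The initiator (sender of the first SYN) is the client, no matter
    what kind of machine it is.  Packets it sends go to_server
    (equivalently from_client); packets it receives go to_client
    (equivalently from_server).
    """
    return "to_server" if packet_src == initiator else "to_client"


browser = Endpoint("192.0.2.10", 49152)    # sent the SYN, so it is the client
web_server = Endpoint("198.51.100.5", 80)  # answered with SYN-ACK, so it is the server

print(flow_direction(browser, packet_src=web_server))  # to_client
print(flow_direction(browser, packet_src=browser))     # to_server
```

The web page carrying the CLSID travels from the responder back to the initiator, which is why the rule in this post uses flow:to_client.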
In this case some folks might believe that CLSID is already in the “content” part of the signature, and that this is a repetition if we use it in PCRE once again. We are not using this PCRE to repeat the value in the content, but to ensure that we do not miss any possibilities of matching this exploit. Let us look into the PCRE part of this signature:


In here, the signature is telling the PCRE compiler that there is “< object” followed by strings and “>” with multiple-strings possibly following it followed by “classid” & “=” with the “clsid”, “:” and “{“. The true classid is then inserted into the PCRE. The PCRE ends with /i to indicate the case-insensitive nature of this regular expression.

The first paragraph is partially correct.  If you check for a content match, you can use a pcre to clarify what you are looking for.  This is done for a couple reasons.  One, as the author states above, is to not miss the possibilities of matching the exploit, but more accurately, it's to catch obfuscated versions of the exploit.  So for example, let's go back and take a look at the content match before we look at the pcre portion.

content:”clsid:A105BD70-BF56-4D10-BC91-41C88321F47C”; nocase;

Problem with this content match is, well, I wouldn't have put the specific "clsid:" in there.  Reason?  If I was an attacker and I wanted to bypass your rule, I would put "clsid: A105BD70-BF56-4D10-BC91-41C88321F47C”. (Notice the space after the colon.)  Which completely bypasses the content match.
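Here is a quick way to see that bypass for yourself.  This is only a sketch: it uses Python's re module as a stand-in for Snort's pcre, a plain case-insensitive substring test as a stand-in for content/nocase, and the sample HTML strings are made up.

```python
import re

# Stand-in for: content:"clsid:A105BD70-..."; nocase;
CONTENT = "clsid:A105BD70-BF56-4D10-BC91-41C88321F47C"

# The tail end of the pcre discussed in this post, which tolerates whitespace.
PCRE_FRAGMENT = re.compile(
    r"clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C",
    re.IGNORECASE,
)

plain = '<object classid="clsid:A105BD70-BF56-4D10-BC91-41C88321F47C">'
evasive = '<object classid="clsid: A105BD70-BF56-4D10-BC91-41C88321F47C">'  # space after the colon


def content_hit(data: str) -> bool:
    """Case-insensitive substring test, like a content match with nocase."""
    return CONTENT.lower() in data.lower()


def pcre_hit(data: str) -> bool:
    """True if the whitespace-tolerant regular expression matches."""
    return PCRE_FRAGMENT.search(data) is not None


print(content_hit(plain), pcre_hit(plain))      # True True
print(content_hit(evasive), pcre_hit(evasive))  # False True
```

One extra space is enough to dodge the literal "clsid:" content match, while the regular expression still matches.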

So let's come back to the pcre and take a look at it.

Now, this PCRE format was written by the VRT and a lot of people have copied it blindly without understanding what it does.  So let me explain, because what the author wrote in the second paragraph quoted above is wrong.  As I said, I'm not trying to be mean or whatever, I am simply trying to teach.

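So, the pcre is this:

pcre:"/<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C/si";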


(I am going to put double quotes around the things we are trying to match that are explicit, the quotes don't actually exist in the regular expression unless specified)

So we are looking for "<OBJECT"

Then a whitespace character.  That's what "\s" is.  (It says 'followed by strings' in the above quoted paragraph.)  Whitespace is a tab (0x09), a space (0x20), a line feed, also known as a newline (0x0A), a form feed (0x0C), or a carriage return (0x0D).  The "+" sign after the "\s" means 'the item directly preceding it, as many times as possible, but there must be at least 1'.  So there must be 1 or more "\s" there.

Then you see this "[^>]", which the author says that we are positively looking for.  The thing about character classes "[ ]" is, they allow you to do some nifty things: range matching ([0-9]), multiple matches ([abc], which will look for either an a, b, or c, for one character), and negative matches, or "lack of" matches.  The way you specify a negative match is to put a caret as the first character inside the character class.  So "[^>]" means "the next character after the positively matched "\s" run can be anything except a ">"".  Directly after that is a "*" character.  The "*" is similar to a "+", but the difference is that while a "+" means you must have at least 1 match of the preceding item (in this case the negative character class), the "*" means you don't have to have a match at all.  It means "0 or more".  Put together, "[^>]*" skips over any other attributes without ever leaving the OBJECT tag.

Following that we have a "classid\s*=\s*" match.  So look for classid(maybeaspacehere,it'soptional)=(maybeanotherspacehere)

Then there is a "[\x22\x27]".  In regular expressions, if you want to specify a character by its hex value you have to write "\x" before the hex.  So, you might see a space specified like this: 0x20.  You might see it URL-encoded like this: %20.  In regular expressions, it would be "\x20".  There are two characters within this character class: 0x22 is the hex for a double quote (") and 0x27 is the hex for a single tick (').

Since this is a run of the mill character class match (not a range or something more complex) this means that the next character that the "[\x22\x27]" pattern match is looking for is either a ' or a ".  Notice the "?" after the character class?  That's an optional match, what I'll loosely call a 'lazy optional' (strictly speaking, a lone "?" is a greedy zero-or-one quantifier; true laziness needs a second "?").  So without going into a long book about lazy and greedy (which, by the way, if you are interested, I suggest checking out the book "Mastering Regular Expressions" by Jeffrey Friedl, it's the bible), the "?" basically means "The Character that is directly in front of the "?" is optional".  So, it essentially means, when all put together the match is either a ' or a " or not at all.
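A small sketch of that optional quote, using Python's re module as a stand-in for Snort's pcre (the sample strings are made up, and I trimmed the pattern down to the part being discussed):

```python
import re

# classid, optional whitespace, "=", optional whitespace,
# then an optional single or double quote, then "clsid".
pattern = re.compile(r"classid\s*=\s*[\x22\x27]?clsid", re.IGNORECASE)

samples = [
    'classid="clsid',   # double quote present
    "classid='clsid",   # single tick present
    "classid=clsid",    # no quote at all
    "classid=`clsid",   # a backtick is not in the character class
]

# Map each sample to whether the pattern matched it.
results = {sample: pattern.search(sample) is not None for sample in samples}

for sample, matched in results.items():
    print(f"{sample!r:20} {matched}")
```

The first three samples match and the backtick one does not, which is the "either a ' or a " or not at all" behavior described above.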

Then we have (maybesomewhitespacehere)clsid(maybesomemorewhitespacehere):(maybesomemorewhitespacehere){(optionally)(maybesomemorewhitespacehere)A105BD70-BF56-4D10-BC91-41C88321F47C.

Notice that I translated "\x3a" and "\x7B" (the latter of which has the "?" behind it, so it's optional) above.

Then the modifiers of the whole Regular Expression at the end are "/si".

"s" means "include new lines in the dot metacharacter".  However, there are no "." metacharacters in the regular expression, so that was probably put there by habit (and good practice), and the "i" means "anything within the regular expression treat with case insensitivity"  similar to the "nocase;" keyword in Snort's regular rule language.
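If you want to poke at the whole expression, here is a sketch that runs it through Python's re module.  Assumptions: Python's re behaves the same as PCRE for the constructs used here, re.S and re.I stand in for the "/si" modifiers, and the sample tags are made up for illustration.

```python
import re

OBJECT_CLSID = re.compile(
    r"<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*"
    r"A105BD70-BF56-4D10-BC91-41C88321F47C",
    re.S | re.I,  # the "/si" modifiers
)


def matches(html: str) -> bool:
    """True if the regular expression finds the CLSID inside an OBJECT tag."""
    return OBJECT_CLSID.search(html) is not None


samples = {
    "plain": '<object classid="clsid:A105BD70-BF56-4D10-BC91-41C88321F47C" id="aol">',
    "other attribute first": "<OBJECT id='aol' classid='clsid:A105BD70-BF56-4D10-BC91-41C88321F47C'>",
    "extra whitespace and brace": '<object\n  classid = "CLSID : {A105BD70-BF56-4D10-BC91-41C88321F47C}">',
    "classid in a different tag": '<object id="x"><embed classid="clsid:A105BD70-BF56-4D10-BC91-41C88321F47C">',
    "different clsid": '<object classid="clsid:00000000-0000-0000-0000-000000000000">',
}

for name, html in samples.items():
    print(f"{name:28} {matches(html)}")
```

The first three samples match.  The fourth does not, because "[^>]*" refuses to cross the ">" that closes the OBJECT tag, and the fifth does not because the CLSID is different.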

So the final signature that the writer comes up with is:

alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:”ActiveX Exploit Signature Sample”; flow:to_client,established; content:”clsid:A105BD70-BF56-4D10-BC91-41C88321F47C”; nocase; content:”.Import(“; nocase; Offset:0;pcre:”/<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C/si”; reference:url,; rev:1;)

Which I am going to rewrite:

alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"ActiveX Exploit Signature Sample"; flow:to_client,established; content:"A105BD70-BF56-4D10-BC91-41C88321F47C"; nocase; content:".Import("; distance:0; pcre:"/<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C/si"; reference:url,; rev:2;)

So, what did I do differently?  I removed the "clsid:" prefix from the content match.  It won't speed up detection, and it is checked for in the pcre anyway.  So, if you are going to fire up the pcre engine to confirm the long content match, you might as well kill two birds with one stone.

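What's with the "distance:0;" stuff?  I made the content match directly preceding it (".Import(") relative to the previous content match, so the search for ".Import(" starts where the CLSID match ended.  Since I don't have a within, I don't constrain how far after the CLSID it may appear.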
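Here is a toy model of the difference between two plain content matches and a second content made relative with distance:0.  This is a sketch in Python, not how Snort's detection engine is actually implemented; the function names and the sample payloads are made up, and the CLSID is treated as nocase while ".Import(" is case sensitive, as in my rewritten rule.

```python
CLSID = b"A105BD70-BF56-4D10-BC91-41C88321F47C"
IMPORT = b".Import("


def match_unordered(payload: bytes) -> bool:
    """Two contents with no relative modifiers: both must be present, in any order."""
    return CLSID.lower() in payload.lower() and IMPORT in payload


def match_distance0(payload: bytes) -> bool:
    """Second content with distance:0 and no within.

    The search for IMPORT starts where the CLSID match ended.  Using the
    first CLSID occurrence is enough here, because it leaves the largest
    remaining region to search.
    """
    start = payload.lower().find(CLSID.lower())
    if start == -1:
        return False
    return payload.find(IMPORT, start + len(CLSID)) != -1


clsid_first = (b"<object classid='clsid:" + CLSID + b"' id='aol'></object>"
               b"<script>aol.Import(x);</script>")
import_first = (b"<script>aol.Import(x);</script>"
                b"<object classid='clsid:" + CLSID + b"' id='aol'></object>")

for name, payload in (("clsid first", clsid_first), ("import first", import_first)):
    print(name, match_unordered(payload), match_distance0(payload))
# clsid first True True
# import first True False
```

So without a relative modifier the order is irrelevant, and with distance:0 the method call has to come after the CLSID.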

Why did you keep the ".Import(" stuff?  False positive reduction.  It will do nothing to speed up the match.

So, be careful when writing rules.  Unless you understand all the pieces and parts you can walk yourself right into a dark hole and do it wrong.  You can do that to yourself, but take extra care that you don't walk anyone down the hole with you.

Again, I post this, not to be mean, but to be constructive.


iamnowonmai said...

well done joel -
first one I have bookmarked. very nice explanation of a topic that is waaaaay over my head.

Robby D said...

Hey, question: "?" in perl extended regexps makes a lazy expression, but I've been thinking about it in a different way than you've outlined it.

As you put it: "the “?” basically means “The Character that is directly in front of the “?” is optional”. So, it essentially means, when all put together the match is either a ‘ or a ” or not at all."

However, I've always thought of it as "inverse greedy". That is, if you don't put the question-mark there, the regex will try to match as long a string as possible that fits the parameters, but if you do include it, it, instead, matches as *short* a string as possible.




will give you the whole string, whereas


will give you "cfooMonkey".

I feel like this doesn't quite jibe with the way you've explained it.

Is this a perl e-regex thing? Have I missed something about the overall conceptualization of the "?" in pattern matches?

Joel said...

No, you are right-ish. Inverse greedy is a good way to say it. So, as little as possible, even optionally. It's like a "*", which is 0 or more, whereas the question mark is essentially 0 or 1.

Does that help?

You can actually make the question mark lazy, (as opposed to greedy) by placing a second question mark after the first.
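Here is a small sketch of both uses of the question mark, using Python's re module as a stand-in for PCRE.  The test strings are hypothetical.

```python
import re

text = "cfooMonkeybarMonkey"  # hypothetical test string

# "?" after another quantifier makes that quantifier lazy ("inverse greedy").
greedy = re.search(r"c.*Monkey", text).group()
lazy = re.search(r"c.*?Monkey", text).group()

# "?" on its own means "0 or 1" and prefers 1 when it can get it;
# a second "?" makes it prefer 0 instead.
optional_greedy = re.match(r"x?", "xy").group()
optional_lazy = re.match(r"x??", "xy").group()

# Laziness is only a preference: if the rest of the pattern needs
# the character, it is still matched.
forced = re.match(r"x??y", "xy").group()

print(repr(greedy))           # 'cfooMonkeybarMonkey'
print(repr(lazy))             # 'cfooMonkey'
print(repr(optional_greedy))  # 'x'
print(repr(optional_lazy))    # ''
print(repr(forced))           # 'xy'
```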

Robby D said...

I think that does help, actually.

The reason I wasn't thinking in the way you were is that I've always only used the "?" as a modifier for a glob ("*") in which "inverse greedy" is the easiest way (for me) to think about it. However, when it is used by itself as a modifier, it is, as you put it, lazy: "it could be there, or not, doesn't matter".


will match any string with at least one character between "c" and "Monkey" but *not* "cMonkey"


would match either case mentioned.

Guess I need to go actually *read* Mastering Reg Exp instead of using it as a reference book.

Thanks for the clarification!

Joel said...

A couple other people have sent me other questions like that via Email, I've asked them to post them as comments on the blog so that the conversation can be kept in one place.

Glad that it helped Robby!

wolvee said...

I have read your blog post regarding "Writing Snort Rules is harder than it looks" You have explained it very well, but I have some questions.

Your final rule is

alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"ActiveX Exploit Signature Sample"; flow:to_client,established; content:"A105BD70-BF56-4D10-BC91-41C88321F47C"; nocase; content:".Import("; distance:0; pcre:"/<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C/si"; reference:url,; rev:2;)

In the exploit, the object tag can be placed anywhere, so it is not mandatory that the vulnerable method will come after the clsid.

Microsoft Works 7 WkImgSrv.dll crash POC

function payload() {
var num = -1;
obj.WksPictureInterface = num;

In the above exploit the vulnerable method is above the clsid, so I feel it is worth removing the distance modifier from the second content match.

My final rule will be

alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any (msg:"ActiveX Exploit Signature Sample"; flow:to_client,established; content:"A105BD70-BF56-4D10-BC91-41C88321F47C"; nocase; content:".Import("; nocase; pcre:"/<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*A105BD70-BF56-4D10-BC91-41C88321F47C/si"; reference:url,; rev:3;)

Please correct me if I am wrong.

Joel said...

Wolvee wrote me this via email and I asked him to put it as a comment on the blog.

This is exactly the type of discussion that I was hoping to provoke by writing this post.

The trick with this rule is, what does the ".Import(" get us? False positive reduction, I say in the post, and while that's partially correct, let's zoom back from it and look at the big picture. What does this get us?

The answer is, nothing. The point of the rule is to match the ActiveX. So why did I put the content match in there?

I put it in there to illustrate how to place two content matches and make the second one relative to the first. In other words, have match one, then match 2 after match one.

If I were to have this rule running in real life, I wouldn't have the second content match. Actually, in all reality, I wouldn't have the pcre in there at all.

Signature Analytics » Blog Archive » Errors/Correction in Tao of Signature Writing – Part 4 said...

[...] that Joel has made all possible corrections in the previous blog. You could read that from here: Thank you [...]

shadowbq said...

somebody is over engineering their snort rules :)

"If I were to have this rule running in real life, I wouldn’t have the second content match. Actually, in all reality, I wouldn’t have the pcre in there at all."

Thanks for the write up.

fimz said...

Hi Joel,

Good article; it has, however, confused me a bit. I'm relatively new to Snort. I have been using it for a while, but yesterday I found myself drawing a blank when a friend of mine asked me a question:

I've seen the terms Snort rules and Snort signatures used interchangeably across many texts. I would really like to know which is which, e.g.
which heading would the rule below come under:

Snort Rule example or Snort Signature Example:

alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL Worm propagation attempt"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"; reference:bugtraq,5310; classtype:misc-attack; reference:bugtraq,5311; sid:2003; rev:2;)

Comments, help, pointers appreciated. I'm sure there are others who have come across the same controversy.


I replied that yes, it is a Snort rule, which takes action based on finding a certain signature in the traffic.

At the back of my mind I wasn't fully sure myself, as I've seen several texts in which authors have confused the terms by using them interchangeably. That is why I believe it's best to ask someone who knows.

I've noticed you have also used them interchangeably in this article when you are proposing the corrected Snort rule.

Hoping for a positive response.

Joel said...

We call Snort rules, rules. Signatures traditionally look for "x" and match it. We have much, much more functionality within Snort rules (moving within a packet, judging numerical values and jumping, moving backwards in a packet for a match, etc.).

The one above is a simple rule. It's looking for several pieces of content.

Yaron said...

So if I have a rule that combines content:"..." terms and a pcre expression, what Snort does is the following:
1. Match the longest pattern (fast pattern)
2. If (1) matches then match all patterns
3. If (2) matches invoke pcre over the entire packet

Is that correct?

Joel Esler said...

Essentially, that is correct. There are some other things like port buckets and what not in there, but yes, what you said is correct for the most part.
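As a rough sketch of those three steps in Python (this is a simplified model, not Snort's actual engine: it leaves out port groups, relative modifiers like distance, and everything else; the evaluate function and the sample payloads are made up, and for simplicity every content is treated as nocase):

```python
import re
from typing import Sequence

CLSID = b"A105BD70-BF56-4D10-BC91-41C88321F47C"
CONTENTS = (CLSID, b".Import(")
PCRE = re.compile(
    rb"<OBJECT\s+[^>]*classid\s*=\s*[\x22\x27]?\s*clsid\s*\x3a\s*\x7B?\s*" + CLSID,
    re.S | re.I,
)


def evaluate(payload: bytes, contents: Sequence[bytes], pcre: re.Pattern) -> str:
    """Walk the three steps: fast pattern, all contents, then pcre."""
    data = payload.lower()
    # 1. The longest content acts as the fast pattern.
    fast_pattern = max(contents, key=len)
    if fast_pattern.lower() not in data:
        return "skipped: fast pattern not found"
    # 2. Every content has to be present.
    if not all(content.lower() in data for content in contents):
        return "no alert: a content is missing"
    # 3. Only now is the (expensive) regular expression run.
    if pcre.search(payload) is None:
        return "no alert: pcre did not match"
    return "alert"


payloads = [
    b"GET / HTTP/1.1",
    b"var id = '" + CLSID + b"';",
    b"var id = '" + CLSID + b"'; aol.Import(x);",
    b"<object classid='clsid:" + CLSID + b"' id='aol'></object><script>aol.Import(x);</script>",
]

for payload in payloads:
    print(evaluate(payload, CONTENTS, PCRE))
```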

Yaron said...

That was helpful.