In our everyday lives we are constantly validating ‘things’. For instance, before we drink water from a bottle, we make sure that the seal is intact. That way we help ensure that drinking the water doesn’t have undesired effects, like making us sick later on. When I go to my favorite Indian restaurant here in the Seattle area, I always verify with the waiter or waitress that my meal is at the lowest level of spice to ensure that it doesn’t have an undesired effect on my body later on (grin). We are constantly validating things in the real world. Unfortunately, in the digital world we aren’t as diligent, and where we are, the wrong technique is often being employed. As a result, malicious hackers are able to do nasty things like steal millions of credit card numbers and compromise our networks.
Performing proper and well-placed input validation is an absolutely critical first step in keeping the bad guys out of your enterprise systems and networks, and especially away from your organization’s sensitive information. While the idea of validating input is intuitive and straightforward, properly implementing an input validation strategy can be anything but. Without a sound understanding of, and strategy for, validating input, enterprises can spend resources on control mechanisms that are flawed and create a false sense of security (arguably worse than not validating input at all). In this article, I’ll show you different approaches to validating input.
Approaches to Input Validation
Step back into the real world for a moment. Say we wanted to make sure that a can of soda (input), like Coke or Pepsi, was safe to drink before consuming it into our bodies (system). What are some of the cues or indicators that would help ensure that the can of soda was not tampered with and is indeed safe to drink? For starters, you could inspect the can itself for puncture marks to ensure that it has not been tampered with after leaving the factory. You could also listen for cues like the crack of the lid and the quick wisp of gas being released when you open the can. Any one of these cues helps us verify that the drink is safe to consume, and in the digital world there also exist cues or factors that help us verify that input from potentially untrusted sources is safe to consume into our systems. They include:
- Type. This factor indicates the kinds of input you are expecting from your users. For instance, are you expecting just numbers (0-9) as in the case of credit card numbers (e.g. 1234-1234-1234-1234) or letters like the ones used in country abbreviations (e.g. CA or US)?
- Length. This factor indicates the amount of data we are expecting. For instance, if the input we are expecting to get is a credit card number, then we can typically expect the length to be 16 digits (some card types use fewer or more).
- Format. This factor indicates how the data is expected to be arranged. For instance, zip codes in the United States are typically 5 digits, so we could say the expected format is five consecutive digits. Postal codes in Canada, however, look like this: M1P2X3. So we could say the expected format is an uppercase letter followed by a digit, repeated three times.
- Range. This factor indicates the upper and/or lower bounds of the input we are expecting. For instance, we could say that we expect alphabetic characters, but only vowels.
Once you know what valid input should look like, there are two techniques for checking input against those expectations:
- Black-List Technique. The black-list technique is sometimes referred to as the Principle of Exclusions and works by first defining a set of unacceptable or bad inputs into a system. It then checks any input into the system to see if it matches any of the unacceptable input patterns defined in the ‘bad’ set. If a match is found, then the input is considered malicious or a possible threat and is rejected. If a match is not found, then the input is assumed to be valid and is consumed by the system. I placed emphasis on the word assumed on purpose: as we’ll see later in our discussion, an attacker can circumvent the black-list technique, which is why this technique should be avoided whenever possible.
- White-List Technique. The opposite of the black-list technique is the white-list technique. This technique (sometimes referred to as the Principle of Inclusions) validates inputs into a system by first defining a set of acceptable or expected inputs. It then tries to match any input into that system against the set of acceptable inputs. If a match is found, then the input is valid and the system continues to use or consume that input. If a match is not found, that is, the input is not contained in the valid set, then the input must be part of the unacceptable set and is therefore considered a possible threat to the system and rejected.
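The four factors listed earlier (type, length, format, and range) can each be turned into a small check. The following is only a rough sketch in Python, not code from this article: the function names are made up for illustration, and the 1 to 99 bounds on the quantity field are an assumption. The two patterns follow the zip code and postal code formats described in the list.

```python
import re

# Format rules based on the zip code and postal code examples in the list.
US_ZIP = re.compile(r"[0-9]{5}")               # five consecutive digits
CA_POSTAL = re.compile(r"(?:[A-Z][0-9]){3}")   # uppercase letter + digit, three times


def is_us_zip(value):
    """Type check first, then format; fullmatch also pins the length to 5."""
    return isinstance(value, str) and US_ZIP.fullmatch(value) is not None


def is_ca_postal(value):
    """Accepts values such as M1P2X3 and nothing else."""
    return isinstance(value, str) and CA_POSTAL.fullmatch(value) is not None


def is_quantity(value, low=1, high=99):
    """Hypothetical numeric field: checks type, length, and range in turn."""
    # Type and length: a string of one or two characters.
    if not isinstance(value, str) or not (1 <= len(value) <= 2):
        return False
    # Type: ASCII digits only (rejects signs, spaces, and non-ASCII digits).
    if not (value.isascii() and value.isdigit()):
        return False
    # Range: inclusive lower and upper bounds.
    return low <= int(value) <= high
```

Each function answers only "does this input look the way I expect?", which is the question the factors are meant to help with.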

Choosing the Correct Input Validation Technique
We’ve just discussed the inner workings of the black-list and white-list techniques for validating input. What has eluded our discussion so far is the answer to the question: which of the two input validation techniques should we use? I alluded to the answer before, but in case you missed it, in general you should use the white-list technique and avoid the black-list technique whenever possible. Here’s why.
Our challenge really is being able to reliably tell if an input into a system is good or bad (an attack on the system). Let’s take a look at the two input validation techniques and see how reliably we can make this conclusion.
We’ll start with the white-list technique. Recall that the white-list technique defines the set of all acceptable inputs to a given system. The set of all acceptable inputs into a system is known and can be concisely defined; that is, you could write down everything in this set if you had to. Sure, that set could be very large, but it has a limit and is therefore finite. Since we’re able to map out a complete acceptable set, a developer can easily take any input and reliably conclude that if the input does not exist in the acceptable set, it must be part of the unacceptable set and should be rejected. From a scalability perspective, this technique is easy to implement.
This conclusion about the validity of input, however, cannot be easily or reliably made using the black-list technique. Recall that the black-list technique works in the opposite fashion to the white-list technique; that is, it defines the set of all unacceptable inputs (or possible attack inputs) into a given system. The problem here is that we can’t be sure that our unacceptable set is complete, because we can’t predict all the possible actions or inputs from an attacker. We could reasonably predict the likely actions, but not all of them, which still leaves us short of a complete set. The unacceptable set is in actuality infinitely large, and therefore it is impossible to define all the possible bad inputs into a system. So if a developer tries to draw a conclusion about the validity of some input by saying that since this input doesn’t exist in the (incomplete) unacceptable set it must be valid, they’re in hot water. If they miss even a single bad input in their black-list set (which is very likely), then the attacker can mask an attack as a valid input, circumventing the security control entirely. I don’t want to mislead and leave you with the impression that the black-list technique is absolutely horrible and should be avoided like the plague, but it’s very difficult to implement correctly. Even if you were able to get your black-list to work in 99% of the cases, if the attacker is able to find the 1% case where it fails, all you’ve really achieved is 100% failure.
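To make the "missed even a single bad input" problem concrete, here is a small Python sketch. Everything in it is an assumption chosen for illustration: a hypothetical username field, a black-list of two substrings someone happened to think of, and a white-list pattern of 3 to 20 letters, digits, or underscores.

```python
import re

# Hypothetical black-list: the bad substrings we happened to think of.
BAD_SUBSTRINGS = ("<script", "javascript:")

# Hypothetical white-list: 3 to 20 ASCII letters, digits, or underscores.
GOOD_USERNAME = re.compile(r"[A-Za-z0-9_]{3,20}")


def blacklist_accepts(value):
    """Accept anything that does not contain a known-bad substring."""
    lowered = value.lower()
    return not any(bad in lowered for bad in BAD_SUBSTRINGS)


def whitelist_accepts(value):
    """Accept only values that match the expected pattern in full."""
    return GOOD_USERNAME.fullmatch(value) is not None


# A script-injection attempt that uses neither black-listed substring.
payload = "<img src=x onerror=alert(1)>"
```

With these definitions, `blacklist_accepts(payload)` returns `True` because nobody anticipated that particular input, while `whitelist_accepts(payload)` returns `False` without needing to know anything about the attack.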

Let’s take a look at an example to illustrate these two important properties of the white-list and black-list techniques. The source code for the example is available for download at the end of the article. In our example, we’ve created a program that asks the user to enter a color. If the color entered is a primary color (red, blue or yellow), the program will indicate that it is a primary color; otherwise it will indicate that it is not.

Our goal as the attacker is to trick this program into saying that a color we entered is a primary color when really it isn’t. Using the white-list technique, our example .NET (C#) validation code compares the input against a list of acceptable colors.
With the white-list technique we defined the colors RED, BLUE, and YELLOW as our acceptable list of inputs. Anything that the user (or attacker) enters that isn’t contained in this list is automatically recognized as non-primary and rejected. Hooray, our program works correctly!
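For readers who want to try this out, the same white-list check can be sketched in a few lines of Python. This is an illustrative stand-in, not the article’s C# listing; the function name and the trimming and upper-casing of the input are assumptions.

```python
# The complete set of acceptable inputs: small, finite, and easy to write down.
PRIMARY_COLORS = frozenset({"RED", "BLUE", "YELLOW"})


def is_primary_color(color):
    """White-list check: only exact members of the acceptable set pass."""
    # Normalize surrounding whitespace and case before comparing.
    return color.strip().upper() in PRIMARY_COLORS
```

Anything outside the set, whether it is ‘green’, ‘puce’, or an empty string, is reported as non-primary without the code having to know about it in advance.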

Now let’s see what happens when we modify our program to use the black-list technique. This time the example .NET (C#) validation code compares the input against a list of known non-primary colors and treats anything not on that list as a primary color.
If an attacker tries to enter a non-primary color that exists in our black-list, the program correctly recognizes the input as not a primary color. However, what happens if the attacker enters a color that is not in our black-list but isn’t a primary color either, like ‘puce’? Our program incorrectly concludes that ‘puce’ is a primary color and reports it as one. If this were a real-life scenario with security implications, we’d be sunk, so clearly we want to avoid the black-list technique whenever possible.
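The failure is easy to reproduce. The Python sketch below is again a stand-in rather than the article’s C# code, and the contents of the black-list are an assumption made up for illustration.

```python
# An (inevitably incomplete) list of colors we know are not primary.
NON_PRIMARY_COLORS = frozenset({"GREEN", "ORANGE", "PURPLE", "BLACK", "WHITE"})


def is_primary_color_blacklist(color):
    """Black-list check, flawed on purpose.

    Anything that is not on the black-list is assumed to be a primary color.
    """
    return color.strip().upper() not in NON_PRIMARY_COLORS
```

Here `is_primary_color_blacklist("green")` is correctly `False`, but `is_primary_color_blacklist("puce")` is `True`, simply because ‘puce’ never made it onto the list.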

Exceptions to the Rule: When White-Listing Fails
Earlier I said that in general we want to use the white-list technique and avoid the black-list technique; however, there will be scenarios where the white-list technique is not sufficient or applicable. This is especially so in situations where input factors such as the type, length, format or range are not known ahead of time. For instance, one of the customers we were helping had a common scenario where users could enter a description of their entry.
The description would be posted back on a web page and would be susceptible to potential cross-site scripting (XSS) attacks. We don’t know what the user will enter ahead of time, so white-listing is out the door. In this instance we used a concept called defense-in-depth, which I’ll write about later, to mitigate the overall risk. Briefly, the general idea behind defense-in-depth is that you utilize several layers of defense to reduce the probability of an attack succeeding. For instance, a defense-in-depth approach might be to use input validation as one layer of defense, but also implement another. In our description box scenario we used the ASP.NET page request validation at the server as one layer, and the Microsoft Anti-Cross Site Scripting Library V1.5 that I published when I was at Microsoft as another layer, to significantly reduce the probability that an XSS attack would succeed.
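The layering idea can be illustrated without either of those products. The Python sketch below is not the ASP.NET request validation or the Anti-Cross Site Scripting Library API; it only shows the general shape of two independent layers, using the standard library’s `html.escape` for output encoding. The 500-character limit and both function names are assumptions.

```python
import html

MAX_DESCRIPTION_LENGTH = 500  # hypothetical limit for the description box


def accept_description(text):
    """Layer 1: validate what little we do know about a free-text field."""
    if not isinstance(text, str) or len(text) > MAX_DESCRIPTION_LENGTH:
        return False
    # Allow printable characters plus newlines and tabs; reject other controls.
    return all(ch.isprintable() or ch in "\n\t" for ch in text)


def render_description(text):
    """Layer 2: encode on output so any markup is displayed, not executed."""
    return "<p>" + html.escape(text, quote=True) + "</p>"
```

If a script tag slips past the first layer (and here it would, since it is printable text), the second layer still renders it as harmless text such as `&lt;script&gt;` rather than as markup.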
There are scenarios where black-listing can be used effectively along with the white-list technique. For instance, white-listing can be implemented using regular expressions (I’ll be discussing this in a later article). One downside of regular expressions is that they can be slow if the expression is not optimized or is excessively large, which can lead to overall performance degradation. If there are certain inputs that you know should not be accepted by a system (black-list), you can check for those inputs before incurring the cost of the regular expression (white-list).
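A minimal Python sketch of that ordering might look like the following. The field (an email-like value), the list of known-bad characters, and the pattern itself are all assumptions for illustration; the point is only that the cheap black-list test runs first and the white-list pattern still has the final say.

```python
import re

# Hypothetical black-list: characters that should never appear in this field.
KNOWN_BAD = ("<", ">", "\x00")

# Hypothetical white-list pattern for an email-like value.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_acceptable(value):
    """Run the cheap black-list test first, then the white-list pattern."""
    # Fast rejection: simple substring tests, no regular expression involved.
    if any(bad in value for bad in KNOWN_BAD):
        return False
    # The white-list is still the final authority on what is accepted.
    return EMAIL_LIKE.fullmatch(value) is not None
```

Passing the black-list never makes an input valid on its own; it only saves the cost of the regular expression for inputs that are obviously bad.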
Conclusion
As layers within the enterprise such as the network and host layers become more and more secure, attackers are switching their focus to the application layer, which as of the time of this article continues to be the most neglected and least secured enterprise layer. Failing to validate input from untrusted sources leaves applications in vulnerable states that allow attackers to do things such as steal credit card information, compromise networks and bring down critical services. What’s worse, validating input improperly not only leaves applications and enterprise systems in vulnerable states, but also creates a false sense of security amongst business owners and customers.
In this article I discussed different approaches to validating input from untrusted sources, in particular the white-list and black-list techniques. Unfortunately, knowing the approaches to input validation is only one third of the story. Business owners still need to understand when to validate input and what input to validate, which will be the topic of future posts.
Downloads