I recently built a simple chat client (I know, I’m recreating late-’90s tech!) and in order to make my chat application safer, I needed to prevent XSS attacks using HTML escaping. To start, I had no idea what the frack an XSS attack was, much less how to use HTML escaping to prevent one.
After a bit of research, I learned that XSS, or cross-site scripting, is a common type of attack in which a (likely malicious) user attempts to inject a script into a website so that it runs in other visitors’ browsers. How does that look in the real world?
In my project, my peers attempted to “hack” my chat client by sending user messages (which the chat client received via AJAX). If one of those comments was simply: “hey what’s up Jordan, you’re the coolest dude that lives in your house”, then that would render on the website easily enough. However, if the comment contained a script tag, like this:
<script src='dobad stuff'></script>, what would happen?
The Bad News
Real quick, here is what this might look like (first with an error-free example):
var userComment = 'Hi, how are you?';
$('body').append(userComment); // now the userComment is a child of the body
And now for an example where the userComment isn’t a simple collection of letters and the most common punctuation, but instead a script, like so:
var badUserComment = "<script>console.log('youz been hacked')</script>";
$('body').append(badUserComment); // the script tag is inserted into the body and runs
Here’s what that does in a page which has jQuery loaded:
Because I didn’t escape the badUserComment, the browser treated it as markup and ran the script. This one only logs an innocuous message to the console, but it’s easy to see how this technique might be used to execute much more malicious attacks.
HTML Escaping to the Rescue
\n so that the new line appears where intended.
Okay, so how does this work when putting HTML in the DOM? I googled around a bit, and after crawling through too many pages with incomplete answers, I figured out how HTML escaping works.
The Nitty, the Gritty, Welcome to Escape City
&, and many would argue you should also escape any character with a character code above 127 (meaning anything outside the basic ASCII range). Then you replace those characters with their HTML entity equivalents. For the ampersand, this would be &amp;, for < this is &lt;, and so on and so forth.
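To make that replacement step concrete, here is a minimal sketch of what such a function might look like. The name escapeHtml and the entity table are my own for illustration (they are not taken from any library), and the sketch only covers the five most common offenders, not characters above 127.

```javascript
// Map each character that is significant in HTML to its entity.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// Hypothetical helper (not from any library): replace every risky
// character in a single pass, so an '&' produced by one replacement
// is never escaped a second time.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const badUserComment = "<script>console.log('youz been hacked')</script>";
console.log(escapeHtml(badUserComment));
// &lt;script&gt;console.log(&#39;youz been hacked&#39;)&lt;/script&gt;
```

Appending escapeHtml(badUserComment) instead of the raw string should make the browser display the comment as plain text rather than run it.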
If this sounds tedious to you, then that’s good, because rolling your own homegrown regex function to replace the necessary characters in HTML isn’t advisable. Google itself recommends a few alternatives, largely encouraging developers to use libraries, but I’ll point out the easiest ones I encountered. Also, please note, as all articles like this one make sure to point out, this is not an exhaustive list of ways to prevent XSS attacks, but rather a general primer on the topic.
The first ‘escaping’ utility I encountered appeared in Underscore.js, and it worked much like the approach I explained, running a regex over the most common offenders. Underscore also offers a mini-templating feature which will take care of this as well. Other, more dedicated templating libraries such as Handlebars.js and Mustache.js serve as more robust options which take care of a lot more than escaping, but they do escape as well.
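To show the general idea of a template that escapes for you, here is a rough sketch. To be clear, this is not how Underscore, Handlebars, or Mustache are actually implemented: render and escapeHtml are hypothetical helpers I made up for illustration, and I’m assuming a simple {{ name }} placeholder syntax.

```javascript
// Hypothetical helper: single-pass escape of the five common offenders.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical mini-renderer: fills each {{ name }} placeholder with the
// matching value from data, escaping it first. Unknown names become ''.
function render(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : ''
  );
}

const html = render('<li class="message">{{ comment }}</li>', {
  comment: "<script>console.log('youz been hacked')</script>"
});
console.log(html);
// <li class="message">&lt;script&gt;console.log(&#39;youz been hacked&#39;)&lt;/script&gt;</li>
```

The point of the design is that the markup I wrote myself stays as markup, while anything a user typed is escaped by default, so I can’t forget to do it.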
', it’s not the most complete solution, a factor which should not be discounted. At the very least, it now fixes my