zargony.com

#![desc = "Random thoughts of a software engineer"]

Scramble email addresses in views to reduce spam

If you put your email address on a public web page, you can usually be sure to get tons of spam from then on, because address harvesters will sooner or later visit your page and recognize the email address.

There are different ways to prevent harvesters from recognizing an email address. I personally don't like using images to display email addresses, or using feedback forms instead of displaying addresses at all. These methods hurt site usability, since a visitor can no longer easily copy an email address into their email application.

Another method is to scramble email addresses so that harvesters cannot recognize them. JavaScript then unscrambles the address and displays it to a human visitor. Even though this is not a foolproof solution, it provides the best balance between safety and usability in my opinion -- as long as you do it right.

So here's an easy way to use scrambled email addresses in Rails views.

Most tools that scramble text simply use HTML character entity references (e.g. a becomes &#97;). Obviously this isn't very helpful, since decoding such text doesn't even require JavaScript. It can be unscrambled with a simple search-and-replace operation (which would be just one line of Ruby code). I'm not an expert in address harvesting, but harvesters can most probably decode character entity references automatically nowadays.
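To illustrate how little protection entity encoding offers, here is a minimal sketch of the kind of one-line decoding mentioned above. The method name decode_entities is made up for this example, and it only handles decimal numeric references such as &#97;, not named or hexadecimal ones.

```ruby
# Hypothetical helper: turn decimal character references (e.g. "&#97;")
# back into the characters they stand for.
def decode_entities(html)
  html.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
end

# Build an entity-encoded address the way a naive scrambler might,
# then undo it with a single gsub.
scrambled = "info@example.com".each_char.map { |c| format("&#%d;", c.ord) }.join
puts scrambled
puts decode_entities(scrambled)
```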

So I made it a bit harder. The following helper method scrambles a text so that the character codes themselves don't appear in the result. It does this by starting with a random value and storing each character code relative to the previously stored value. Simple search-and-replace operations aren't sufficient to decode texts scrambled this way. To use it, simply drop the following method into application_helper.rb:

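def safe_text(text)
  # Start with a random value so the output differs on every render.
  enc = [rand(255)]
  # Store each character code relative to the previously stored value.
  text.each_codepoint do |code|
    enc << code - enc.last
  end
  # The script adds neighbouring values to recover each character code.
  javascript_tag("var t=[#{enc.join(',')}]; for (var i=1; i<t.length; i++) { document.write(String.fromCharCode(t[i]+t[i-1])); }")
end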
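To make the relative encoding easier to follow outside of Rails, here is a small plain-Ruby sketch of the same idea. The name encode_relative and the seed parameter are assumptions added for illustration; the helper above picks its starting value with rand(255) instead of accepting one.

```ruby
# Hypothetical stand-alone version of the encoding step.
# Each entry is a character code minus the previously stored entry,
# so the array holds differences rather than the codes themselves.
def encode_relative(text, seed = rand(255))
  text.each_codepoint.inject([seed]) { |enc, code| enc << code - enc.last }
end

# With a fixed seed of 10: 'a' (97) is stored as 97 - 10 = 87,
# and 'b' (98) is stored as 98 - 87 = 11.
p encode_relative("ab", 10)  # prints [10, 87, 11]
```

Fixing the seed is only useful for following the arithmetic by hand; in a view, a random starting value gives a different array on every render.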

To scramble some text in a view, simply call safe_text with the text to be scrambled:

Contact us: <%= safe_text('info@example.com') %>

The result will look like this:

Contact us: <script type="text/javascript">
//<![CDATA[
var t=[22,83,27,75,36,28,73,47,50,59,53,55,46,0,99,12,97]; for (var i=1; i<t.length; i++) { document.write(String.fromCharCode(t[i]+t[i-1])); }
//]]>
</script>
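As a sanity check, the array in that output can be decoded outside the browser. The following sketch mirrors the JavaScript loop in plain Ruby; decode_relative is a name made up for this example.

```ruby
# Hypothetical Ruby counterpart of the JavaScript loop: each character code
# is the sum of an entry and the entry before it.
def decode_relative(values)
  values.each_cons(2).map { |prev, cur| (prev + cur).chr(Encoding::UTF_8) }.join
end

t = [22, 83, 27, 75, 36, 28, 73, 47, 50, 59, 53, 55, 46, 0, 99, 12, 97]
puts decode_relative(t)  # prints "info@example.com"
```

It also shows that decoding takes only a few lines once the scheme is known, which fits the caveat that this is not a foolproof solution.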

Whether this method is worth anything remains an open question. Address harvesters may well be able to execute JavaScript code nowadays, which would render any scrambling like the above useless. As I said, I'm not a spam expert, but I suppose that most harvesters can be tricked this way (feel free to drop me a comment if you know more about it).

Update: I did some tests on different email address scrambling methods. Results are available here.