Preventing outside linking to images to prevent bandwidth stealing

By reeset / On / In General Computing

One of the hats I wear at home is IT professional for my wife, specifically when it comes to her blog. To keep things running well, I periodically monitor bandwidth and disk space usage to make sure that we keep our hosts happy. Well, over the past two months, I’d noticed a really, really large spike in bandwidth traffic. In a typical month, the blog handles approximately 60 GB of HTTP traffic. A generous portion of that comes from robots that I allow to harvest the site (~12 GB), the remainder from visitors. This changed last month, however, when bandwidth usage jumped from ~60 GB a month to a little over 120 GB. Now, our hosts are great. We pay for 80 GB of bandwidth a month, but this is a soft cap, so they didn’t complain when we went way over our allotment. At the same time, I like to be a good neighbor on our shared host, so I wanted to figure out what was causing the spike in traffic.

Looking through the log files and chatting with the hosts (who have better log files), we were able to determine that the jump in traffic was due to one image. This one (an example of linking to the file; it should appear broken unless you are reading this through Google Reader, which I allow as an exception):

 

(this one has been downloaded and placed on my blog)

It’s an image from the Thomas Jefferson Memorial. My wife had taken the picture the last time we were in DC and had posted it here: http://athomewithbooks.net/2012/10/saturday-snapshot-october-27/. In October and half of November, this single image had been responsible for close to 100 GB of bandwidth traffic. What I couldn’t figure out was where it was all coming from…but looking at the logs, we were able to determine that it was being linked to from StumbleUpon. While the linking to the image wasn’t a problem, the bandwidth usage was. So, I started to look at options, and there actually is a quite elegant one if you find yourself in a position where you need to limit bandwidth.

The simple solution is to not allow linking to any images (or specific file types) from outside the blog domain. This is actually pretty easy to accomplish with mod_rewrite and an .htaccess file, using the following snippet (replace my domain, athomewithbooks.net, with your own):

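RewriteEngine On
# Allow requests that send no referrer at all
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from the blog itself
RewriteCond %{HTTP_REFERER} !^http://(www\.)?athomewithbooks\.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?athomewithbooks\.net.*$ [NC]
# Everything else gets a 403 Forbidden for these file types
RewriteRule \.(gif|jpg|js|css|png|jpeg)$ - [F]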

I’ve directed the web server to only serve these file types (gif|jpg|jpeg|js|css|png) when they are linked from my specific domain. So, if I were to try to link to an image from a different domain, the image would come back broken (though you can also serve a “you can’t link to this image” image if you want to). The way this works is that it reads the headers passed by the browser, specifically the Referer header (exposed to mod_rewrite as HTTP_REFERER), to determine whether the request is originating from an allowed domain. If it’s not, it doesn’t serve the image; the [F] flag returns a 403 Forbidden response instead.
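For the placeholder option, a minimal sketch might look like the following. It assumes a placeholder image at /images/nohotlink.png (a hypothetical path, not something from this blog) and redirects outside requests for images to it rather than returning an error. It is limited to image types, since a placeholder makes little sense for js or css files:

```apache
RewriteEngine On
# Allow requests that send no referrer at all
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from the blog itself
RewriteCond %{HTTP_REFERER} !^http://(www\.)?athomewithbooks\.net [NC]
# Skip the placeholder itself so the redirect cannot loop
# (/images/nohotlink.png is a hypothetical path)
RewriteCond %{REQUEST_URI} !/images/nohotlink\.png$
# Send everyone else to the placeholder with a temporary redirect
RewriteRule \.(gif|jpe?g|png)$ /images/nohotlink.png [R=302,L]
```

Keep the placeholder small. Every outside request still costs a little bandwidth, so a tiny image is the point of the exercise.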

Now, this method isn’t perfect. Some browsers don’t pass this value, and some pass it poorly, so it’s likely that some people who shouldn’t see the image will see it (due to the first condition, which serves content when no referrer is sent), but in general, it provides a method for managing your bandwidth usage.
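An exception for an outside service, like the Google Reader one mentioned earlier, can be added with one more condition. The sketch below is an illustration only: the referrer pattern is an assumption rather than the exact rule used on this blog, so check your own logs for the referrer the service actually sends.

```apache
RewriteEngine On
# Allow requests that send no referrer at all
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from the blog itself
RewriteCond %{HTTP_REFERER} !^http://(www\.)?athomewithbooks\.net [NC]
# Assumed referrer pattern for an allowed outside service (hypothetical)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?google\.com/reader [NC]
# Everything else gets a 403 Forbidden for these file types
RewriteRule \.(gif|jpg|js|css|png|jpeg)$ - [F]
```

Because the conditions are combined with AND, a request is blocked only when its referrer matches none of the allowed patterns, so each new exception is just one more negated RewriteCond.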

–TR

2 thoughts on “Preventing outside linking to images to prevent bandwidth stealing”

  1. I’ve worked on a couple of sites that have been slashdotted, but I think yours might be the first I’ve heard of that has been stumbleuponed, congratulations. It’s also possible to implement http forwards with mod_rewrite, which can be used to load images from a hosting service with a lot more bandwidth.

    1. That’s true. Alyce used to keep most of her images hosted on another service, but when we migrated, it made life a little more difficult. When terms changed on that service, more difficult still. So, it reduces complications to keep everything on our locally hosted servers. It means I sometimes have to worry about bandwidth, but that’s not necessarily a bad thing.

      –tr