Free Hostia vs. the scraper sites

2008.05.04 | Posted in Web

Since creating this blog, I’ve received a fair number of pingbacks from scraper sites. It’s easy to mark them as spam and mass-delete them at my convenience, but it’s still annoying to know that somewhere, your content is appearing on someone else’s site without proper citation. Google PageRank also lowers the ranking of blogs with many pingbacks from scraper sites, which is another incentive to take action rather than sit passively.[1]

If you’re not sure what a scraper site is, Wikipedia defines it as follows[2]:

A scraper site is a website that copies all of its content from other websites using web scraping. No part of a scraper site is original. A search engine is not a scraper site: sites such as Yahoo and Google gather content from other websites and index it so that the index can be searched with keywords. Search engines then display snippets of the original site content in response to a user’s search.

One solution is to contact the hosting provider and ask whether they’ll remove the site for content violations. I recently reported 3 separate sites to Free Hostia, a company that offers free hosting, paid hosting, and domain registration. Before I continue, I want to emphasize that I am not shilling for Free Hostia: my site is hosted elsewhere, and I had never heard of them until I saw the pingbacks. However, I thought I would get a faster response from the hosting company than from Google AdSense.

[Image: Google KidSense]

Free Hostia responds in 24-36 hours

Free Hostia did exactly what they promised in response to my inquiry email: they terminated the copyright violators immediately. Whether the owners will try to resurrect their accounts is a separate matter, but all 3 scraper sites were suspended within 24-36 hours of my email and remain suspended at the time of writing. I have to say I’m happy to know that at least one company out there takes scraper sites seriously and is willing to help bloggers protect their content. If your blog is being scraped by a site with an “xxxxxxx.freehostia.com” address, report them! I think you’ll find that they deal with the problem a lot more quickly than you would by taking the long route.
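If you get a lot of pingbacks, a short script can help pick out the ones coming from Free Hostia subdomains so you know which ones to report. This is only a rough sketch: the helper name `is_freehostia_subdomain` and the sample URLs are made up for illustration, and the check looks at nothing but the hostname, so it can’t tell you whether a given site is actually a scraper.

```python
from urllib.parse import urlparse


def is_freehostia_subdomain(url: str) -> bool:
    """Hypothetical helper: True if the URL's host is a subdomain of freehostia.com."""
    # urlparse().hostname is already lowercased; it is None if the URL has no host.
    host = urlparse(url).hostname or ""
    # The leading dot means the bare freehostia.com domain itself does not match.
    return host.endswith(".freehostia.com")


# Made-up pingback source URLs, standing in for what your blog's admin panel lists.
pingbacks = [
    "http://example123.freehostia.com/2008/05/some-copied-post/",
    "http://www.example.org/a-real-citation/",
]

# Keep only the pingbacks that point at Free Hostia-hosted sites.
to_report = [u for u in pingbacks if is_freehostia_subdomain(u)]
print(to_report)
```

Parsing the URL instead of searching the raw string for “freehostia.com” avoids false hits on hosts like `freehostia.com.example.net`.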

  1. http://lorelle.wordpress.com/2007/11/27/how-to-stop-content-theft-the-best-tips/
  2. http://en.wikipedia.org/wiki/Scraper_site

Comments

Nice, I’ll remember this. I don’t know what the deal is with scraping anyway; don’t APIs take its place? ^^

Why the hell do scraper sites exist anyway?

Lawl, yeah, that’s a problem for me too, though I don’t pay much heed to it. Also, there’s a drastic measure one can take: AFAIK, there’s an anti-leech plugin that makes fake content get published on the scraper sites instead, usually stuff banned by Google Ads, owning these scraper sites in the process.

@Nagato

IIRC, these sites have ads plastered all over them, and they hope the stolen content will draw in visitors who click on their links and the like.

@Ryan: Google is getting smarter about them, as is Yahoo, but people are still trying to make money without making anything themselves. :(

@Shin: I tried one anti-leech plugin, but I could never get it to work properly. Which one do you use?

Actually I don’t use any, just came across one by RedAlt that sounded fun to use, although I never got around to implementing it due to lack of know-how ^^;

I’m glad to hear that this situation was resolved and that Free Hostia was a good neighbor in this area. I’ll be sure to note that should I need to work with them in the future.

Thank you for posting your update here; it is a big help to me.

Let me know if there are any hosts you’ve had a lot of trouble with, and I’ll see if I can help!

These scraper sites are really a pest sometimes. I’m having some minor issues with them, since they’re appearing more often. X_X

@Impz: My blog is only a few months old and already getting scraped. I figured it would happen at some point but I never realized how fast they work. x.x

You can’t really stop them from linking to your site, so the only thing you can do is mark the pingback as spam. Those sites are quite an eyesore, though, especially when you see the incoming links for your blog flooded with them.
